If you commute, you might find you’re spending more and more time in your car. Daniela Rus, a professor at MIT, has noticed this, and it prompted her to undertake a project involving data analytics to improve the operation of transportation networks.
Rus discussed her case study as part of the online course “Tackling the Challenges of Big Data,” which runs from February 3 to March 17.
Rus conducted her study in Singapore, a country of 5 million people measuring about 30 by 16 miles. Crucially for the study, Singapore has 26,000 taxis, each equipped with an easy-to-deploy device that transmits information such as GPS location. Rus had access to data from 16,000 taxis, including each vehicle's ID, GPS location, speed, status, and time stamp, logged about once every half minute. In all, she collected about 33 GB of data.
In addition, Singapore has embedded in the pavement of its roads about 10,000 inductive loop sensors that count cars passing particular locations—providing “the ground truth for what traffic looks like for a particular location,” Rus said.
That raises a question: Are taxis a good, unbiased sample of overall traffic, and if not, can we correct for the bias and use taxis as a probe for general traffic? It turns out that the taxi sample is biased, but consistently so, and therefore correctable, although not with a single linear-regression correction coefficient.
For example, taxis and general traffic correlate well at rush hour but not at other times. Challenges include finding the appropriate time-interval granularity. Too fine a granularity over-fits the data and generalizes poorly, whereas coarser granularity (workday vs. weekend, with a one-hour time interval for each learned coefficient) works well.
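To make the idea of one learned coefficient per time bucket concrete, here is a minimal Python sketch. It assumes the correction is a simple scale factor fitted by least squares through the origin for each (day type, hour) bucket; the article does not say which regression form Rus used, and the function names and sample numbers below are hypothetical.

```python
from collections import defaultdict


def fit_bucket_coefficients(samples):
    """Fit one scale factor per (day_type, hour) bucket.

    samples: iterable of (day_type, hour, taxi_count, loop_count) tuples.
    Assumption: loop_count ~= coeff * taxi_count within a bucket, fitted by
    least squares through the origin. This is illustrative only.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # bucket -> [sum(x*y), sum(x*x)]
    for day_type, hour, taxi_count, loop_count in samples:
        acc = sums[(day_type, hour)]
        acc[0] += taxi_count * loop_count
        acc[1] += taxi_count * taxi_count
    # Skip buckets with no usable taxi observations.
    return {b: sxy / sxx for b, (sxy, sxx) in sums.items() if sxx > 0}


def estimate_traffic(coeffs, day_type, hour, taxi_count):
    """Scale a taxi count up to an estimate of general traffic."""
    return coeffs[(day_type, hour)] * taxi_count


# Made-up observations: (day type, hour of day, taxis seen, loop-sensor count).
samples = [
    ("workday", 8, 10, 100),
    ("workday", 8, 20, 200),
    ("weekend", 8, 10, 40),
    ("weekend", 8, 30, 120),
]
coeffs = fit_bucket_coefficients(samples)
weekend_estimate = estimate_traffic(coeffs, "weekend", 8, 15)
```

With 2 day types and 24 hourly buckets, this approach learns 48 coefficients per road segment rather than one, which matches the coarse granularity described above.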
It turns out you don’t need 16,000 taxis; 30 taxis per road per time chunk studied is enough for statistical significance, Rus said. And taxis behave like a rapidly mixing Markov chain, independent of their initial location, or state, so they make good surrogates for general traffic.
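The "rapidly mixing" property can be illustrated with a toy example. The Python sketch below assumes a hypothetical three-zone city with an invented transition matrix; it is not drawn from the Singapore data. It shows that after a modest number of steps, the probability of finding a taxi in each zone is nearly the same regardless of where the taxi started.

```python
def step(dist, P):
    """Advance a probability distribution one step through transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def evolve(dist, P, steps):
    """Apply `steps` transitions to an initial distribution."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical zone-to-zone transition probabilities (each row sums to 1).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

from_zone_0 = evolve([1.0, 0.0, 0.0], P, 20)  # taxi starts in zone 0
from_zone_2 = evolve([0.0, 0.0, 1.0], P, 20)  # taxi starts in zone 2

# Largest per-zone difference between the two resulting distributions.
gap = max(abs(a - b) for a, b in zip(from_zone_0, from_zone_2))
```

Here `gap` shrinks toward zero as the number of steps grows, which is the sense in which the starting location stops mattering.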
The data can be used to develop fundamental diagrams for individual road segments to predict congestion, and to avoid it, possibly through variable pricing schemes or by providing information that lets drivers better schedule their trips and plan their routes.
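A fundamental diagram relates vehicle density on a road segment to the flow it carries. The article does not say which model Rus fitted, so the Python sketch below uses the textbook Greenshields model purely as a stand-in: speed falls linearly with density, flow is density times speed, and flow peaks at half the jam density. The parameter values are hypothetical.

```python
def greenshields_flow(density, free_flow_speed, jam_density):
    """Flow (vehicles/hour) under the Greenshields model.

    density: vehicles/km; free_flow_speed: km/h; jam_density: vehicles/km.
    Assumption: speed = free_flow_speed * (1 - density / jam_density).
    """
    if not 0 <= density <= jam_density:
        raise ValueError("density must lie between 0 and jam_density")
    speed = free_flow_speed * (1 - density / jam_density)
    return density * speed


def is_congested(density, jam_density):
    """Under Greenshields, flow declines once density exceeds half the jam density."""
    return density > jam_density / 2


# Hypothetical segment: 60 km/h free-flow speed, 100 vehicles/km at a standstill.
peak_flow = greenshields_flow(50, 60, 100)
congested = is_congested(60, 100)
```

In practice the two parameters would be estimated per road segment from the corrected taxi data, and a segment would be flagged as approaching congestion when its measured density nears the peak of the fitted curve.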
The Singapore study, Rus said, has demonstrated the capabilities at city scale over a long period of time, and she concluded, “We are very excited to bring these ideas to other urban areas.”
Registration for “Tackling the Challenges of Big Data” has been extended until February 10.—Rick Nelson
About the Author

Rick Nelson
Contributing Editor
Rick is currently Contributing Technical Editor. He was Executive Editor for EE in 2011-2018. Previously he served on several publications, including EDN and Vision Systems Design, and has received awards for signed editorials from the American Society of Business Publication Editors. He began as a design engineer at General Electric and Litton Industries and earned a BSEE degree from Penn State.