Our team work on traffic big data - Resuly's Blog

Introduction

Every year, we can get significant data from traffic. Traffic flow, average speed, average delay and all information are important for traffic forecast and management. In the past, it’s almost impossible to analysis data on this scale. Fortunately, the development of cloud computing technique saves us from the heavy computational works. In this post, I will introduce the traffic big data works conducted by our team.

Our cluster servers in SEU

Objectives

With the traffic data, we can do many things to make the traffic a better place. In our team, the main objective of data analysis is making a prediction, which is helpful for all traffic participants. The main concerns are listed as below:

Demanding forecast of taxi
Forecast of bus passenger volume
Forecast of traffic volume on highways
Forecast of travel time on highways
Commuters’ travel characteristics
Forecast of selecting bus lines
Analysis of Cause of traffic accident
Forecast of urban road travel time

Technological supports

There are many ways to handle magnanimous data. The related techniques and software are listed as follows:

Machine learning
Hadoop/Spark
New technology exploring and pre-researching
Abundant knowledge of traffic

Predict traffic volume on highways

Data

Average travel time, speed and traffic volume from 15000 detectors placed on California highway in 2013.

Method

Results

Comparison between different prediction model: When the traffic volume on the highway is more than 450veh/15min:

The accuracy of 15-minute prediction is over 90% for 86% of the roads.
The accuracy of 30-minute prediction is over 90% for 88% of the roads.
The accuracy of 45-minute and 60-minute prediction show is over 90% for 90% of the roads.

Multivariate data fusion analysis

Data Source

Modules

traffic volume analysis
an operational state analysis of road segment
an operational state analysis of the intersection
trip distribution character analysis

Technologies

Track recognition
Data mining
Multivariate big data fusion

Procedures

Road traffic volume analysis
Operational state analysis of road segment
Operational state analysis of intersection
Trip distribution character analysis

Result

5 kilometers around Nanjing South Railway Station—road speed:

Public transport demand prediction

Data

Bus card data of Guangzhou Bus and Guangfo Bus from August 1 to December 31 in 2014. From the behavior model of fixed crowd in public transport, we can build a Passenger Flow Forecast Model for Bus Lines and predict the number of trips hourly for each line in the coming week.

Result

The Neural network is good in this case, but it’s not unstable. The main reason about that is over-fitting.
GBDT and RF both got a high performance on final online evaluation.

The vehicle routing problem of public transport

Data

The data comes from taxi, social media and mobile in Shanghai. We build a model to reveal the relation among residents commuting behavior and identifying urban residence and employment spaces.

Method

Traffic grid OD clustering

Result

We can find the hot areas of origins and destinations from the result. As a result, the travel time and the law of commuting can be judged approximately.

Introduction

Objectives

Technological supports

Predict traffic volume on highways

Data

Method

Results

Multivariate data fusion analysis

Data Source

Modules

Technologies

Procedures

Result

Public transport demand prediction

Data

Result

The vehicle routing problem of public transport

Data

Method

Result

CATALOG