
JingqingZ / Baidutraffic

This repo includes the introduction, code, and dataset for our paper Deep Sequence Learning with Auxiliary Information for Traffic Prediction (KDD 2018).

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Baidutraffic

Transportationnetworks
Transportation Networks for Research
Stars: ✭ 312 (+118.18%)
Mutual labels:  dataset, traffic
Coffee Quality Database
Building the Coffee Quality Institute Database
Stars: ✭ 141 (-1.4%)
Mutual labels:  dataset
Awesome Italian Public Datasets
A selection of interesting Open dataset from the Italian Public Administration and Civic Data use cases
Stars: ✭ 132 (-7.69%)
Mutual labels:  dataset
Deep Qlearning Agent For Traffic Signal Control
A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.
Stars: ✭ 136 (-4.9%)
Mutual labels:  traffic
Textrecognitiondatagenerator
A synthetic data generator for text recognition
Stars: ✭ 2,075 (+1351.05%)
Mutual labels:  dataset
Dataspice
🌶 Create lightweight schema.org descriptions of your datasets
Stars: ✭ 137 (-4.2%)
Mutual labels:  dataset
Tvqa
[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
Stars: ✭ 130 (-9.09%)
Mutual labels:  dataset
Lacmus
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.
Stars: ✭ 142 (-0.7%)
Mutual labels:  dataset
Ml Datasets
Machine Learning datasets for Nepal
Stars: ✭ 139 (-2.8%)
Mutual labels:  dataset
Coronawatchnl
Numbers concerning COVID-19 disease cases in The Netherlands by RIVM, LCPS, NICE, ECML, and Rijksoverheid.
Stars: ✭ 135 (-5.59%)
Mutual labels:  dataset
Continuum
A clean and simple data loading library for Continual Learning
Stars: ✭ 136 (-4.9%)
Mutual labels:  dataset
Mams For Absa
A Multi-Aspect Multi-Sentiment Dataset for aspect-based sentiment analysis.
Stars: ✭ 135 (-5.59%)
Mutual labels:  dataset
Gossiping Chinese Corpus
PTT Gossiping board Q&A Chinese corpus
Stars: ✭ 137 (-4.2%)
Mutual labels:  dataset
Vfx Datasets
Stars: ✭ 134 (-6.29%)
Mutual labels:  dataset
Netcdf Fortran
Official GitHub repository for netCDF-Fortran libraries, which depend on the netCDF C library. Install the netCDF C library first.
Stars: ✭ 141 (-1.4%)
Mutual labels:  dataset
Hake
HAKE: Human Activity Knowledge Engine (CVPR'18/19/20, NeurIPS'20)
Stars: ✭ 132 (-7.69%)
Mutual labels:  dataset
Sensaturban
🔥Urban-scale point cloud dataset (CVPR 2021)
Stars: ✭ 135 (-5.59%)
Mutual labels:  dataset
Datasets
🎁 3,000,000+ Unsplash images made available for research and machine learning
Stars: ✭ 1,805 (+1162.24%)
Mutual labels:  dataset
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+1595.8%)
Mutual labels:  dataset
Triggerner
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Stars: ✭ 141 (-1.4%)
Mutual labels:  dataset

Deep Sequence Learning with Auxiliary Information for Traffic Prediction. KDD 2018. (Accepted)

Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, Fei Wu

Binbing Liao and Jingqing Zhang contributed equally to this article.

Paper Link: arXiv or KDD18

Contents

  1. Abstract
  2. Q-Traffic Dataset
  3. Code
  4. Citation
  5. Poster and Video

Abstract

Predicting traffic conditions from online route queries is a challenging task as there are many complicated interactions over the roads and crowds involved. In this paper, we intend to improve traffic prediction by appropriate integration of three kinds of implicit but essential factors encoded in auxiliary information. We do this within an encoder-decoder sequence learning framework that integrates the following data: 1) offline geographical and social attributes, for example, the geographical structure of roads or public social events such as national celebrations; 2) road intersection information, since traffic congestion generally occurs at major junctions; 3) online crowd queries, for example, when many online queries are issued for the same destination due to a public performance, the traffic around that destination will potentially become heavier after a while. Qualitative and quantitative experiments on a real-world dataset from Baidu have demonstrated the effectiveness of our framework.

Q-Traffic Dataset

We collected a large-scale traffic prediction dataset, the Q-Traffic dataset, which consists of three sub-datasets: a query sub-dataset, a traffic speed sub-dataset and a road network sub-dataset. We also compare the released Q-Traffic dataset with other datasets used for traffic prediction.

Access to the Q-Traffic Dataset

The dataset has been updated and is now available at BaiduNetDisk Code:umqd. Backup link.

If you downloaded the old dataset, we strongly suggest that you re-download the updated one. The old dataset at Baidu Research Open-Access Dataset (BROAD) contains some duplicated hashed_link_id values caused by the hash function. The hashed_link_id has therefore been removed from the updated dataset, which now uses only the link_id, consistent with the intermediate_files.

The intermediate data files (after pre-processing) are available at intermediate_files, so you can train the model directly.

Please feel free to raise an issue if you have any questions.

Query Sub-dataset

This sub-dataset was collected in Beijing, China between April 1, 2017 and May 31, 2017, from Baidu Map. The detailed pre-processing of this sub-dataset is described in the paper. The query sub-dataset contains about 114 million user queries, each of which records the starting time-stamp, the coordinates of the starting location, the coordinates of the destination, and the estimated travel time (minutes). Some sample queries are shown below:

2017-04-01 19:42:23, 116.88 37.88, 116.88 37.88, 33

2017-04-01 18:00:05, 116.88 37.88, 116.88 37.88, 33

2017-04-01 01:14:08, 116.88 37.88, 116.88 37.88, 33

..., ..., ..., ..., ...
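
The record layout above can be read with a few lines of Python. This is a minimal sketch, not the loader used in this repository: `Query` and `parse_query` are hypothetical names introduced for illustration, and reading each coordinate pair as "longitude latitude" is an assumption based on the sample values.

```python
from datetime import datetime
from typing import NamedTuple, Tuple


class Query(NamedTuple):
    """One route query (hypothetical container, not part of the released code)."""
    start_time: datetime
    origin: Tuple[float, float]        # assumed order: (longitude, latitude)
    destination: Tuple[float, float]   # assumed order: (longitude, latitude)
    travel_minutes: float              # estimated travel time in minutes


def _parse_point(text: str) -> Tuple[float, float]:
    """Split a space-separated coordinate pair into two floats."""
    first, second = text.split()
    return float(first), float(second)


def parse_query(line: str) -> Query:
    """Parse 'time-stamp, start coords, destination coords, minutes'."""
    stamp, origin, destination, minutes = (f.strip() for f in line.split(","))
    return Query(
        start_time=datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
        origin=_parse_point(origin),
        destination=_parse_point(destination),
        travel_minutes=float(minutes),
    )


query = parse_query("2017-04-01 19:42:23, 116.88 37.88, 116.88 37.88, 33")
print(query.start_time, query.origin, query.travel_minutes)
```

Keeping the time-stamp as a `datetime` makes it straightforward to bucket queries into the same 15-minute steps used by the traffic speed sub-dataset.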

Traffic Speed Sub-dataset

We also collected traffic speed data for the same area and time period as the query sub-dataset. This sub-dataset contains 15,073 road segments covering approximately 738.91 km. Figure 1 shows the spatial distribution of these road segments.


Figure 1. Spatial distribution of the road segments in Beijing

All road segments lie within the 6th Ring Road (bounded by the lon/lat box <116.10, 39.69, 116.71, 40.18>), which is the most crowded area of Beijing. The traffic speed of each road segment is recorded every minute. To make the traffic speed predictable, we smooth the speeds of each road segment with a simple moving average over a 15-minute window and then sample the smoothed speed every 15 minutes. This yields 5856 ($61 \times 24 \times 4$) time steps in total, and each record consists of road_segment_id, time_stamp (in [0, 5856)) and traffic_speed (km/h).
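
The smoothing and sampling described above can be sketched as follows. This is an illustration under stated assumptions, not the pre-processing code of this repository: it assumes the 15-minute window is aligned with the sampling grid (so each sampled value is the mean of the 15 one-minute readings in its interval) and that time step 0 starts at 2017-04-01 00:00; neither detail is stated here. All function names are hypothetical.

```python
from datetime import date, datetime, timedelta

STEP_MINUTES = 15
START_DAY = date(2017, 4, 1)
END_DAY = date(2017, 5, 31)  # inclusive


def count_time_steps() -> int:
    """Number of 15-minute steps in the collection period: 61 x 24 x 4."""
    days = (END_DAY - START_DAY).days + 1
    return days * 24 * (60 // STEP_MINUTES)


def smooth_and_sample(speeds_per_minute, window=STEP_MINUTES):
    """Average each aligned block of `window` one-minute speeds.

    Equivalent to a simple moving average with a `window`-minute span that
    is read out once every `window` minutes (assumed alignment).
    """
    n_steps = len(speeds_per_minute) // window
    return [
        sum(speeds_per_minute[i * window:(i + 1) * window]) / window
        for i in range(n_steps)
    ]


def step_to_datetime(step: int) -> datetime:
    """Map a time_stamp index to the assumed start of its interval."""
    origin = datetime(START_DAY.year, START_DAY.month, START_DAY.day)
    return origin + timedelta(minutes=STEP_MINUTES * step)


print(count_time_steps())
print(smooth_and_sample(list(range(30))))
print(step_to_datetime(5855))
```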

Some traffic speed samples are shown below:

15257588940, 0, 42.1175  

..., ..., ...  
  
15257588940, 5855, 33.6599  

1525758913, 0, 41.2719  

..., ..., ...  
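
Records in this format can be grouped into one fixed-length series per road segment before training. The sketch below is illustrative only: `build_speed_series` is a hypothetical helper, segment ids are kept as strings, and leaving missing time steps as `None` is a choice of this sketch rather than something the dataset prescribes.

```python
from collections import defaultdict

NUM_STEPS = 5856  # 61 days x 24 hours x 4 steps per hour


def build_speed_series(lines, num_steps=NUM_STEPS):
    """Group 'road_segment_id, time_stamp, traffic_speed' rows by segment.

    Returns a dict mapping each segment id to a list of length `num_steps`,
    where position t holds the speed (km/h) at time step t, or None if the
    row for that step was not supplied.
    """
    series = defaultdict(lambda: [None] * num_steps)
    for line in lines:
        segment_id, step, speed = (f.strip() for f in line.split(","))
        series[segment_id][int(step)] = float(speed)
    return dict(series)


records = [
    "15257588940, 0, 42.1175",
    "15257588940, 5855, 33.6599",
    "1525758913, 0, 41.2719",
]
series = build_speed_series(records)
print(len(series), series["15257588940"][0], series["15257588940"][5855])
```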

Road Network Sub-dataset

Because traffic data exhibit spatio-temporal dependencies, the topology of the road network can help to predict traffic. Table 1 shows the fields of the road network sub-dataset.


Table 1. Examples of geographical attributes of each road segment.

For each road segment in the traffic speed sub-dataset, the road network sub-dataset provides the starting node (snode) and ending node (enode) of the road segment, from which the topology of the road network can be built. The sub-dataset also provides various geographical attributes of each road segment, such as width, length, speed limit and the number of lanes. Finally, we provide social attributes such as weekdays, weekends, public holidays, peak hours and off-peak hours.
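
One way to build the topology from snode and enode is to link each road segment to the segments that start where it ends. This is a minimal sketch under that assumption (segments treated as directed from snode to enode); `build_successors` and the toy ids below are hypothetical and are not taken from the dataset or the repository code.

```python
from collections import defaultdict


def build_successors(segments):
    """Map each segment to the segments that start at its ending node.

    `segments` is a dict of segment id -> (snode, enode). Segment B is
    treated as a successor of segment A when A's enode equals B's snode.
    """
    starting_at = defaultdict(list)
    for seg_id, (snode, _enode) in segments.items():
        starting_at[snode].append(seg_id)
    return {
        seg_id: sorted(s for s in starting_at[enode] if s != seg_id)
        for seg_id, (_snode, enode) in segments.items()
    }


# Toy network: a -> {b, c}, b -> d, d -> a, c is a dead end.
toy_segments = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (2, 4),
    "d": (3, 1),
}
successors = build_successors(toy_segments)
print(successors)
```

The same dictionary can be inverted to obtain upstream neighbours, which is useful when the speeds of adjacent segments are fed to a model as auxiliary input.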

Comparison with Other Datasets

Table 2 compares different datasets for traffic speed prediction. In the past few years, researchers have performed experiments with small and/or private datasets. The release of Q-Traffic, a large-scale publicly available dataset with offline (geographical and social attributes, road network) and online (crowd map queries) information, should help advance research on traffic prediction.


Table 2. Comparison of different datasets for traffic speed prediction.

Code

The source code has been tested with:

  • Python 3.5
  • TensorFlow 1.3.0
  • TensorLayer 1.7.3
  • numpy 1.14.0
  • pandas 0.21.0
  • scikit-learn 0.19.1

The code is structured as follows:

  • model.py: Implementation of deep learning models
  • train.py: Implementation of controllers for training and testing
  • baselines.py: Implementation of baseline models including RF and SVR
  • dataloader.py: Data processing and loading, subject to change due to data format if necessary
  • preprocessing: Data preprocessing and cleaning
  • others: utilities, playground, logging, data preprocessing

Citation

If you use our dataset, please cite the following publication:

@inproceedings{bbliaojqZhangKDD18deep,  
  title = {Deep Sequence Learning with Auxiliary Information for Traffic Prediction},  
  author = {Binbing Liao and Jingqing Zhang and Chao Wu and Douglas McIlwraith and Tong Chen and Shengwen Yang and Yike Guo and Fei Wu},  
  booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},  
  pages = {537--546},
  year = {2018},  
  organization = {ACM}  
}  

Poster and Video

  • You can find our KDD 2018 poster here.
  • You can find our KDD 2018 Video here. YouTube