
A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.

Stars: ✭ 136 (-4.9%)

Mutual labels: traffic

Textrecognitiondatagenerator

A synthetic data generator for text recognition

Stars: ✭ 2,075 (+1351.05%)

Mutual labels: dataset

Dataspice

🌶 Create lightweight schema.org descriptions of your datasets

Stars: ✭ 137 (-4.2%)

Mutual labels: dataset

Tvqa

[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering

Stars: ✭ 130 (-9.09%)

Mutual labels: dataset

Lacmus

Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.

Stars: ✭ 142 (-0.7%)

Mutual labels: dataset

Ml Datasets

Machine Learning datasets for Nepal

Stars: ✭ 139 (-2.8%)

Mutual labels: dataset

Coronawatchnl

Numbers concerning COVID-19 disease cases in The Netherlands by RIVM, LCPS, NICE, ECML, and Rijksoverheid.

Stars: ✭ 135 (-5.59%)

Mutual labels: dataset

Continuum

A clean and simple data loading library for Continual Learning

Stars: ✭ 136 (-4.9%)

Mutual labels: dataset

Mams For Absa

A Multi-Aspect Multi-Sentiment Dataset for aspect-based sentiment analysis.

Stars: ✭ 135 (-5.59%)

Mutual labels: dataset

Gossiping Chinese Corpus

Chinese Q&A corpus from the PTT Gossiping board

Stars: ✭ 137 (-4.2%)

Mutual labels: dataset

Vfx Datasets

Stars: ✭ 134 (-6.29%)

Mutual labels: dataset

Netcdf Fortran

Official GitHub repository for netCDF-Fortran libraries, which depend on the netCDF C library. Install the netCDF C library first.

Stars: ✭ 141 (-1.4%)

Mutual labels: dataset

Hake

HAKE: Human Activity Knowledge Engine (CVPR'18/19/20, NeurIPS'20)

Stars: ✭ 132 (-7.69%)

Mutual labels: dataset

Sensaturban

🔥Urban-scale point cloud dataset (CVPR 2021)

Stars: ✭ 135 (-5.59%)

Mutual labels: dataset

Datasets

🎁 3,000,000+ Unsplash images made available for research and machine learning

Stars: ✭ 1,805 (+1162.24%)

Mutual labels: dataset

Clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+1595.8%)

Mutual labels: dataset

Triggerner

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

Stars: ✭ 141 (-1.4%)

Mutual labels: dataset


Deep Sequence Learning with Auxiliary Information for Traffic Prediction. KDD 2018. (Accepted)

Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, Fei Wu

Binbing Liao and Jingqing Zhang contributed equally to this article.

Paper Link: arXiv or KDD18

Abstract
Q-Traffic Dataset
Code
Citation
Poster and Video

Abstract

Predicting traffic conditions from online route queries is a challenging task, as there are many complicated interactions between roads and crowds. In this paper, we aim to improve traffic prediction by appropriately integrating three kinds of implicit but essential factors encoded in auxiliary information. We do this within an encoder-decoder sequence learning framework that integrates the following data: 1) offline geographical and social attributes, for example, the geographical structure of roads or public social events such as national celebrations; 2) road intersection information, since traffic congestion generally occurs at major junctions; 3) online crowd queries: for example, when many online queries are issued for the same destination because of a public performance, traffic around that destination will potentially become heavier after a while. Qualitative and quantitative experiments on a real-world dataset from Baidu demonstrate the effectiveness of our framework.

Q-Traffic Dataset

We collected a large-scale traffic prediction dataset, the Q-Traffic dataset, which consists of three sub-datasets: a query sub-dataset, a traffic speed sub-dataset and a road network sub-dataset. We also compare the released Q-Traffic dataset with other datasets used for traffic prediction.

Access to the Q-Traffic Dataset

This dataset has been updated and is now available at BaiduNetDisk (access code: umqd). A backup link is also provided.

If you downloaded the old dataset, we strongly suggest re-downloading the updated one. The old dataset hosted at Baidu Research Open-Access Dataset (BROAD) contains some duplicated hashed_link_id values caused by the hash function. The hashed_link_id field has therefore been removed from the updated dataset, which uses only link_id, consistent with the intermediate_files.
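If you still have a local copy of the old dataset, a quick way to see the problem described above is to group link_id values by hashed_link_id and report any hashed ID shared by more than one distinct link. The minimal sketch below assumes you have already extracted (hashed_link_id, link_id) pairs from the old files; the function name find_hash_collisions is hypothetical and not part of the released code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_hash_collisions(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Return each hashed_link_id that maps to more than one distinct link_id.

    pairs: (hashed_link_id, link_id) tuples, an assumed input format.
    """
    links_by_hash: Dict[str, Set[int]] = defaultdict(set)
    for hashed_id, link_id in pairs:
        links_by_hash[hashed_id].add(link_id)
    # Repeated rows for the same link are fine; only distinct links sharing a hash collide.
    return {h: links for h, links in links_by_hash.items() if len(links) > 1}
```

An empty result means the hashed IDs were unique in your copy; any non-empty entry is a case where joining on hashed_link_id would merge two different road segments, which is why joining on link_id is the safer choice.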

The intermediate data files (after pre-processing) are available at intermediate_files, so you can train the model directly.

Please feel free to raise an issue if you have any questions.

Query Sub-dataset

This sub-dataset was collected from Baidu Map in Beijing, China, between April 1, 2017 and May 31, 2017. The detailed pre-processing of this sub-dataset is described in the paper. The query sub-dataset contains about 114 million user queries, each of which records the starting timestamp, the coordinates of the starting location, the coordinates of the destination and the estimated travel time (minutes). Some query samples are shown below:

2017-04-01 19:42:23, 116.88 37.88, 116.88 37.88, 33

2017-04-01 18:00:05, 116.88 37.88, 116.88 37.88, 33

2017-04-01 01:14:08, 116.88 37.88, 116.88 37.88, 33

..., ..., ..., ..., ...
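Each query line can be split into four comma-separated fields. The minimal parser below assumes that format, that each coordinate field is "longitude latitude" separated by a space (matching the lon/lat order used for the bounding box of the traffic speed sub-dataset), and that no field contains a comma. The names Query and parse_query are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Tuple

Coord = Tuple[float, float]  # assumed (longitude, latitude) order


class Query(NamedTuple):
    start_time: datetime
    origin: Coord
    destination: Coord
    est_travel_min: float  # estimated travel time in minutes


def _parse_coord(field: str) -> Coord:
    """Turn 'lon lat' into a (lon, lat) tuple of floats."""
    lon, lat = field.split()
    return float(lon), float(lat)


def parse_query(line: str) -> Query:
    """Parse 'timestamp, lon lat, lon lat, minutes' into a Query."""
    ts, origin, dest, eta = (f.strip() for f in line.split(","))
    return Query(
        start_time=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        origin=_parse_coord(origin),
        destination=_parse_coord(dest),
        est_travel_min=float(eta),
    )
```

For example, parse_query("2017-04-01 19:42:23, 116.88 37.88, 116.88 37.88, 33") yields a Query whose start_time is 2017-04-01 19:42:23 and whose est_travel_min is 33.0.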

Traffic Speed Sub-dataset

We also collected traffic speed data for the same area and the same time period as the query sub-dataset. This sub-dataset contains 15,073 road segments covering approximately 738.91 km. Figure 1 shows the spatial distribution of these road segments.

Figure 1. Spatial distribution of the road segments in Beijing

All of these segments lie within Beijing's 6th Ring Road (bounded by the lon/lat box <116.10, 39.69, 116.71, 40.18>), the most crowded area of the city. The traffic speed of each road segment is recorded every minute. To make the traffic speed more predictable, we smooth each road segment's speed series with a simple moving average over a 15-minute window and then sample the speed every 15 minutes. This gives 5856 ($61 \times 24 \times 4$) time steps in total (61 days from April 1 to May 31, 24 hours per day, 4 samples per hour), and each record is represented as road_segment_id, time_stamp (in [0, 5856)) and traffic_speed (km/h).
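The smoothing and sampling step can be sketched as follows. This is an illustration under assumptions, not the authors' preprocessing code: it assumes a trailing 15-minute window and that the sample for each 15-minute step is taken at the last minute of that step, in which case the result equals the mean of each non-overlapping 15-minute block. The function names moving_average and smooth_and_sample are hypothetical.

```python
from typing import List

WINDOW = 15  # minutes per smoothing window and per sampling step


def moving_average(speeds: List[float], window: int = WINDOW) -> List[float]:
    """Trailing simple moving average: out[k] averages speeds[k .. k + window - 1]."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(speeds):
        running += v
        if i >= window:
            running -= speeds[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def smooth_and_sample(per_minute: List[float], window: int = WINDOW) -> List[float]:
    """Smooth per-minute speeds, then keep one value per 15-minute step."""
    sma = moving_average(per_minute, window)
    # sma[k] ends at minute k + window - 1, so every window-th value
    # corresponds to the end of a non-overlapping 15-minute block.
    return sma[::window]
```

With 61 days of per-minute readings (61 x 1440 = 87,840 values) for one segment, smooth_and_sample returns 5856 values, one per time step.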

Some traffic speed samples are shown below:

15257588940, 0, 42.1175

..., ..., ...

15257588940, 5855, 33.6599

1525758913, 0, 41.2719

..., ..., ...
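To relate a time_stamp index back to wall-clock time, note that each step is 15 minutes long and there are 61 x 24 x 4 = 5856 steps. The sketch below assumes that index 0 is the step starting at 2017-04-01 00:00 (Beijing time); the README does not state this alignment explicitly. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: time_stamp 0 is the 15-minute step starting 2017-04-01 00:00.
START = datetime(2017, 4, 1)
STEP = timedelta(minutes=15)
NUM_STEPS = 61 * 24 * 4  # 61 days x 24 hours x 4 steps per hour = 5856


def index_to_time(idx: int) -> datetime:
    """Return the start of the 15-minute step with index idx."""
    if not 0 <= idx < NUM_STEPS:
        raise ValueError(f"time_stamp {idx} outside [0, {NUM_STEPS})")
    return START + idx * STEP


def time_to_index(t: datetime) -> int:
    """Return the index of the 15-minute step that contains datetime t."""
    idx = (t - START) // STEP  # timedelta // timedelta gives a floored int
    if not 0 <= idx < NUM_STEPS:
        raise ValueError(f"{t} is outside the study period")
    return idx


def parse_speed_record(line: str) -> Tuple[int, int, float]:
    """Split 'road_segment_id, time_stamp, traffic_speed' into typed fields."""
    seg_id, ts, speed = (field.strip() for field in line.split(","))
    return int(seg_id), int(ts), float(speed)
```

Under this assumption, the last index 5855 maps to 2017-05-31 23:45, and any time between 00:00 and 00:14 on April 1 falls into step 0.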

Road Network Sub-dataset

Because traffic data have spatio-temporal dependencies, the topology of the road network can help predict traffic. Table 1 shows the fields of the road network sub-dataset.

Table 1. Examples of geographical attributes of each road segment.

For each road segment in the traffic speed sub-dataset, the road network sub-dataset provides its starting node (snode) and ending node (enode), from which the topology of the road network can be built. The sub-dataset also provides various geographical attributes of each road segment, such as width, length, speed limit and number of lanes. In addition, we provide social attributes such as weekdays, weekends, public holidays, peak hours and off-peak hours.
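One way to build the topology from snode and enode is to treat segment B as downstream of segment A whenever A's ending node is B's starting node. The minimal sketch below assumes each record has already been reduced to a (link_id, snode, enode) tuple; the actual file layout and column order may differ, and build_segment_graph is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_segment_graph(
    segments: Iterable[Tuple[int, int, int]],
) -> Dict[int, List[int]]:
    """Map each link_id to the link_ids that start where it ends.

    segments: (link_id, snode, enode) tuples (assumed input format).
    """
    segs = list(segments)
    starting_at: Dict[int, List[int]] = defaultdict(list)
    for link_id, snode, _ in segs:
        starting_at[snode].append(link_id)
    # Downstream neighbours of a segment are the segments starting at its enode.
    return {link_id: sorted(starting_at.get(enode, [])) for link_id, _, enode in segs}
```

The resulting adjacency list can be used, for example, to gather the speeds of neighbouring segments as extra inputs for a target segment.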

Comparison with Other Datasets

Table 2 compares different datasets for traffic speed prediction. In the past few years, researchers have run experiments on small and/or private datasets. The release of Q-Traffic, a large-scale, publicly available dataset with offline (geographical and social attributes, road network) and online (crowd map queries) information, should help advance research on traffic prediction.

Table 2. Comparison of different datasets for traffic speed prediction.

Code

The source code has been tested with:

Python 3.5
TensorFlow 1.3.0
TensorLayer 1.7.3
numpy 1.14.0
pandas 0.21.0
scikit-learn 0.19.1

Code structure:

model.py: Implementation of deep learning models
train.py: Implementation of controllers for training and testing
baselines.py: Implementation of baseline models including RF and SVR
dataloader.py: Data processing and loading; may need changes if the data format differs
preprocessing: Data preprocessing and cleaning
others: utilities, playground, logging, data preprocessing

Citation

If you use our dataset, please cite the following publication:

@inproceedings{bbliaojqZhangKDD18deep,  
  title = {Deep Sequence Learning with Auxiliary Information for Traffic Prediction},  
  author = {Binbing Liao and Jingqing Zhang and Chao Wu and Douglas McIlwraith and Tong Chen and Shengwen Yang and Yike Guo and Fei Wu},  
  booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},  
  pages = {537--546},
  year = {2018},  
  organization = {ACM}  
}

Poster and Video

You can find our KDD 2018 poster here.
You can find our KDD 2018 video here (YouTube).


Stars: ✭ 143



JingqingZ / Baidutraffic

