#275 A Scenario-based Database for Connected and Autonomous Driving in A Smart City

Principal Investigator
Ding Zhao
Start Date
Jan. 1, 2019
End Date
Dec. 31, 2019
Research Type
Grant Type
Grant Program
FAST Act - Mobility National (2016 - 2022)
Grant Cycle
2018 Traffic21


A high-quality driving dataset is a key ingredient for helping the autonomous vehicle industry in Pittsburgh thrive and for building a smart city for its residents. In this project, we aim to build the world’s first scenario-based driving database dedicated to connected and autonomous vehicles. We plan to record and model dynamic traffic information in Pittsburgh from heterogeneous driving data such as lidar point clouds, vision information, and GPS. Dynamic unsupervised learning will then be applied to automatically extract typical driving scenarios.

Our data collection platform is equipped with multiple advanced sensors, including Lidar, high-resolution cameras, radar, GPS, and IMU units, and it records vehicle information such as steering wheel and brake pedal inputs. The platform can capture complex and informative real-world driving scenarios and represent them as high-dimensional, heterogeneous time series data. An unsupervised learning approach based on nonparametric Bayesian methods will then be applied to learn and recognize driving scenarios by segmentation. A user-friendly web application will be developed to provide the dataset to the public from a scenario perspective.

In particular, we plan to work closely with the city’s department of mobility and integrate DSRC and smart-city information (e.g., traffic lights, grid, events, and weather) into our analysis. Our confidence of success stems from the lab’s accumulated efforts in developing automated vehicle platforms and unsupervised machine learning theories, supported by Toyota, Uber, Ford, Mcity, and others.
Pittsburgh has become a city of smart mobility. Hundreds of autonomous vehicles (AVs) made by Uber, Argo AI, and Aurora drive through the city daily. The AV industry is a great opportunity for the city to thrive and to provide jobs, but it also raises public safety concerns, as revealed by recent crashes involving Uber, Waymo, and Tesla. High-quality driving datasets can be the keystone for autonomous vehicle researchers. Most existing traffic datasets contain only processed data from the view of the ego vehicle and provide discretized driving data such as images and data sequences. However, this may obscure important interaction information in driving behavior and cause defects in autonomous vehicle development.

To reduce the risks, a key ingredient is to understand the behavior of multiple agents in traffic scenarios with respect to their dynamic interactions. While the dynamic model for a single vehicle is well developed, the modeling of driving scenarios involving a large number of vehicles remains unsolved. The varying number of vehicles, cyclists, and pedestrians leads to high heterogeneity in traffic data and poses an obstacle for researchers. Moreover, apart from single- or two-vehicle behaviors, which can be described by empirical narratives such as ‘left turn’, ‘car following’, or ‘overtaking’, the definition of multi-agent traffic behavior is beyond human knowledge. This prevents human researchers from recognizing and labeling traffic scenarios that involve an extremely large number of agents. To identify the dynamic patterns of vehicle interaction behaviors in large traffic datasets, an unsupervised learning approach may help to recognize and classify the data without prior knowledge.

In this project, we propose an automatic way to extract dynamic driving scenarios and build a driving library that contains interaction scenario information in Pittsburgh. Our data collection vehicle, which is equipped with several advanced sensors including high-resolution cameras, Lidar, radar, and a GPS unit, will be used to collect the data. The raw data will be processed into multi-dimensional time series. The vehicles, cyclists, and pedestrians encountered will be detected by perception algorithms.
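
As an illustration of turning heterogeneous sensor streams into one multi-dimensional time series, the following is a minimal sketch, not the project’s actual pipeline. It assumes each stream is a pair of lists (strictly increasing timestamps in seconds, scalar values) and resamples every stream onto a shared time grid by linear interpolation. The function names and the stream names `speed_mps` and `steering_deg` are hypothetical.

```python
from bisect import bisect_left


def resample(times, values, grid):
    """Linearly interpolate one sensor stream onto a common time grid.

    Assumes `times` is strictly increasing. Grid points outside the
    recorded range are clamped to the first or last recorded value.
    """
    out = []
    for t in grid:
        i = bisect_left(times, t)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


def build_time_series(streams, start, stop, step):
    """Align all streams on one grid; each row holds one time step."""
    n = int(round((stop - start) / step)) + 1
    grid = [start + k * step for k in range(n)]
    names = sorted(streams)
    columns = [resample(*streams[name], grid) for name in names]
    rows = [list(row) for row in zip(*columns)]
    return grid, names, rows


# Hypothetical streams sampled at different rates.
streams = {
    "speed_mps": ([0.0, 1.0, 2.0], [0.0, 10.0, 20.0]),
    "steering_deg": ([0.0, 2.0], [0.0, 4.0]),
}
grid, names, rows = build_time_series(streams, start=0.0, stop=2.0, step=0.5)
for t, row in zip(grid, rows):
    print(t, dict(zip(names, row)))
```

Real sensor data such as point clouds or images would first need to be reduced to per-object features (for example, positions and velocities) before this kind of alignment applies.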

Furthermore, to better understand driving interaction behaviors, an unsupervised learning approach based on nonparametric Bayesian methods will be applied to learn and recognize driving scenarios by segmentation. In our previous research, traffic primitives represent the fundamental building blocks of driving scenarios, akin to words. We applied Bayesian unsupervised learning based on the hierarchical Dirichlet process and successfully extracted traffic primitives from a massive dataset recorded in Ann Arbor. In addition, a web application will be built to provide the dataset to the public, allowing users to query driving scenario data from a dynamic interaction perspective.
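
To make the idea of segmentation into primitives and scenario-based querying concrete, here is a simplified sketch. It does not implement the hierarchical Dirichlet process model; it assumes some unsupervised model has already assigned a discrete label to every time step, and it only groups consecutive identical labels into segments that can then be filtered. The function names, labels, and timestamps are hypothetical.

```python
from itertools import groupby


def segment_primitives(labels, timestamps):
    """Group consecutive frames sharing a label into segments.

    Returns a list of (label, t_start, t_end) tuples, one per run of
    identical labels. The labels are assumed to come from an upstream
    unsupervised model, which is not part of this sketch.
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        t_start = timestamps[index]
        t_end = timestamps[index + length - 1]
        segments.append((label, t_start, t_end))
        index += length
    return segments


def query(segments, label, min_duration=0.0):
    """Return the segments with the given label lasting at least min_duration."""
    return [
        seg for seg in segments
        if seg[0] == label and seg[2] - seg[1] >= min_duration
    ]


# Hypothetical per-frame labels at 10 Hz.
labels = [0, 0, 1, 1, 1, 0]
timestamps = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
segments = segment_primitives(labels, timestamps)
print(segments)
print(query(segments, label=0, min_duration=0.05))
```

A public web interface could expose a filter of this kind, so that users retrieve segments by scenario type rather than by raw recording.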

In summary, the scenario-based dataset establishes a link between an individual autonomous vehicle and the city. The output of the project will facilitate self-driving development and testing, as well as other aspects including policy-making, insurance, business models, security, and privacy.
Tasks are described in the detailed plan section.
Task 1 (Data): 1/1/2019-8/31/2019
Task 2 (Theory): 1/1/2019-6/30/2019
Task 3 (Process): 3/1/2019-10/31/2019
Task 4 (Web): 7/1/2019-12/31/2019    
Deployment Plan
The project will be divided into the following tasks:
Task 1. Integrate a data collection platform by installing multiple advanced sensors such as cameras, radar, Lidar, and IMU. Collect data around the city.
Task 2. Develop dynamic unsupervised learning based on nonparametric Bayesian learning.
Task 3. Design the data processing method to extract traffic scenarios from the data based on the unsupervised learning method.
Task 4. Develop and set up the web application for a public database.    
Expected Accomplishments and Metrics
Deliverables in this project include:
1. Novel theories to automatically and efficiently recognize the dynamic driving scenarios.
2. A scenario-based data library for smart mobility in Pittsburgh.
3. A website (traffic-net.org) providing the world’s first scenario-based public AV driving database.

Individuals Involved

Email Name Affiliation Role Position
dingzhao@cmu.edu Zhao, Ding Carnegie Mellon School of Engineering PI Faculty - Untenured, Tenure Track


Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)


Type Name Uploaded
Data Management Plan 7-dmp.docx Jan. 3, 2019, 11:18 a.m.
Progress Report 275_Progress_Report_2019-03-30 March 25, 2019, 3:08 p.m.
Presentation A self-organized Scenario-based Heterogeneous Traffic Database for Autonomous Vehicles Sept. 27, 2019, 10:44 a.m.
Progress Report 275_Progress_Report_2019-09-30 Sept. 27, 2019, 10:44 a.m.
Progress Report 275_Progress_Report_2020-03-30 Dec. 29, 2019, 12:51 p.m.
Final Report M21_Cover_Page__1.docx.pdf Jan. 2, 2020, 8:51 a.m.
Publication How to Evaluate Proving Grounds for Self-Driving? A Quantitative Approach March 21, 2021, 5:07 p.m.
Publication Advanced Driver Assistance Strategies for a Single-Vehicle Overtaking a Platoon on the Two-Lane Two-Way Road. March 21, 2021, 5:09 p.m.
Publication Learning to collide: An adaptive safety-critical scenarios generating method March 21, 2021, 5:11 p.m.
Publication Clustering of driving encounter scenarios using connected vehicle trajectories March 21, 2021, 5:12 p.m.
Publication CMTS: A Conditional Multiple Trajectory Synthesizer for Generating Safety-Critical Driving Scenarios March 21, 2021, 5:14 p.m.
Publication Evaluation Uncertainty in Data-Driven Self-Driving Testing March 21, 2021, 5:15 p.m.
Publication A general framework of learning multi-vehicle interaction patterns from video March 21, 2021, 5:16 p.m.
Publication Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field March 21, 2021, 5:18 p.m.
Publication Probabilistic trajectory prediction for autonomous vehicles with attentive recurrent neural process March 21, 2021, 5:26 p.m.
Publication Multi-vehicle interaction scenarios generation with interpretable traffic primitives and gaussian process regression March 21, 2021, 5:27 p.m.
Publication Density-Adaptive Sampling for Heterogeneous Point Cloud Object Segmentation in Autonomous Vehicle Applications March 21, 2021, 5:30 p.m.
Publication A multi-vehicle trajectories generator to simulate vehicle-to-vehicle encountering scenarios March 21, 2021, 5:31 p.m.
Publication Crash Avoidance Systems-Safety Evaluation of an Important Class of Electronic Control Systems March 21, 2021, 5:32 p.m.
Publication Combining Reachability Analysis and Importance Sampling for Accelerated Evaluation of Highway Automated Vehicles at Pedestrian Crossing March 21, 2021, 5:33 p.m.
Publication Designing importance samplers to simulate machine learning predictors via optimization March 21, 2021, 5:34 p.m.
Publication An accelerated approach to safely and efficiently test pre-production autonomous vehicles on public streets March 21, 2021, 5:39 p.m.
Publication Accelerated evaluation of autonomous vehicles in the lane change scenario based on subset simulation technique March 21, 2021, 5:40 p.m.
Publication Synthesis of different autonomous vehicles test approaches March 21, 2021, 5:41 p.m.
Publication Improving localization accuracy in connected vehicle networks using Rao–Blackwellized particle filters: Theory, simulations, and experiments March 21, 2021, 5:42 p.m.
Publication An "Xcity" Optimization Approach to Designing Proving Grounds for Connected and Autonomous Vehicles March 21, 2021, 5:43 p.m.
Publication A learning-based approach for lane departure warning systems with a personalized driver model March 21, 2021, 5:45 p.m.
Publication A Versatile Approach to Evaluating and Testing Automated Vehicles based on Kernel Methods March 21, 2021, 5:47 p.m.
Publication Evaluation of the energy efficiency in a mixed traffic with automated vehicles and human controlled vehicles March 21, 2021, 5:48 p.m.
Publication Clustering of naturalistic driving encounters using unsupervised learning March 21, 2021, 5:49 p.m.
Publication A tempt to unify heterogeneous driving databases using traffic primitives March 21, 2021, 5:50 p.m.
Publication Extracting traffic primitives directly from naturalistically logged data for self-driving applications March 21, 2021, 5:51 p.m.
Publication From the lab to the street: Solving the challenge of accelerating automated vehicle testing March 21, 2021, 5:51 p.m.
Publication Sequential experimentation to efficiently test automated vehicles March 21, 2021, 5:53 p.m.
Publication Accelerated evaluation of automated vehicles using piecewise mixture models March 21, 2021, 5:53 p.m.
Publication Towards affordable on-track testing for autonomous vehicle—A Kriging-based statistical approach March 21, 2021, 5:55 p.m.
Publication An accelerated testing approach for automated vehicles with background traffic described by joint distributions March 21, 2021, 5:55 p.m.
Publication Towards secure and safe appified automated vehicles March 21, 2021, 5:57 p.m.
Publication Evaluation of a semi-autonomous lane departure correction system using naturalistic driving data March 21, 2021, 5:57 p.m.
Publication Evaluation of automated vehicles in the frontal cut-in scenario—An enhanced approach using piecewise mixture models March 21, 2021, 5:59 p.m.
Publication Accelerated evaluation of automated vehicles in car-following maneuvers March 21, 2021, 6 p.m.
Publication Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques March 21, 2021, 7:09 p.m.
Publication Accelerated evaluation of automated vehicles using extracted naturalistic driving data March 21, 2021, 7:10 p.m.

Match Sources

No match sources!


No partners!