#291 Labeling Roads with Different Types of Automated Driving Functional Requirements using Machine Learning

Principal Investigator
Ding Zhao
Start Date
July 1, 2019
End Date
June 30, 2020
Research Type
Grant Type
Grant Program
FAST Act - Mobility National (2016 - 2022)
Grant Cycle
2019 Mobility21 UTC


The project aims to label roads with different types of automated driving functional requirements in order to safely deploy automated vehicles in diverse communities that differ in road types, geometries, lighting facilities, and human behaviors. Novel unsupervised learning approaches will be developed to synthesize the dynamic heterogeneous data that are being collected and analyzed by the vehicle platform and the infrastructure in the PI’s projects supported by Toyota, Bosch, and Uber. Outputs are reports and suggestions to the city council and open datasets and tools for the industry partners.
Motivation and Goal of the Project

Automated vehicles (AVs) should be deployed gradually and geographically selectively to ensure safety. Self-driving technologies are under rapid development, and level 3-4 AVs, which can drive themselves in particular spatial areas or driving circumstances, are expected on our roads in large numbers this year. However, frequent collisions of AVs in certain driving scenarios, such as on dark streets or in crowded areas, have raised wide concern about AV safety. People want to know which driving circumstances or areas are easy for AVs, which are relatively hard, and how to quantify the degree of difficulty.

The question is not trivial, as it involves understanding both mobility technologies and the traffic facilities and driving environment. In the new Federal guidance, Preparing for the Future of Transportation: Automated Vehicles 3.0, released last year, the Department of Transportation proposed the concept of Operational Design Domain (ODD) to describe driving complexity in terms of roadway types, geographic area, and speed range. This concept sheds light on how to evaluate driving difficulty, but it is still unclear how to apply it in practice, because neither the automation level nor the ODD provides a numerical measure. Analyses based on them are therefore likely to be subjective, incomplete, and ambiguous when describing the complex nature of real-world traffic, which can lead to biased confidence and mis-qualification of AVs for public deployment. Moreover, the modeling of AV driving behavior remains opaque due to its multidimensional and time-variant nature. To reduce the risks and lay the foundation for autonomous vehicle deployment, this project will label the roads of the city with different automated driving requirements by systematically analyzing the streets with respect to real-world scenarios and automated functionality, based on large-scale data collected from onboard sensors and city infrastructure.

Our previous research on using unsupervised learning approaches to identify and classify driving environments has equipped us with state-of-the-art methods for AV deployment risk evaluation. In our ongoing project, funded by Toyota, Uber, Bosch, and Traffic21, which provides cost-share for this project, we are building the world’s first scenario-based driving dataset, “TrafficNet 2.0”, to facilitate the analysis and develop evaluation tools for AV safety. Building on this effort, in this project we will provide tools to evaluate driving difficulty on different roads of Pittsburgh and gain insight into how new mobility technologies should be integrated with the future smart city. The output of the project will be tools to identify the types of functional automated driving requirements needed to deploy autonomous vehicles, considering various road geometries, lighting facilities, road-user behaviors, and other related factors. These tools and analysis results provide a valuable resource for AV companies in Pittsburgh and will lay a foundation for the regulatory process.

Research Methodologies

Our ability to evaluate driving difficulty convincingly rests on our previous research on AV safety. We created the accelerated evaluation method [1] to predict the average crash rate of an AV in certain driving scenarios. We further used Nonparametric Bayesian learning (NPBayes) to identify typical driving scenarios [2]. In this project, we plan to extend the unsupervised learning methods to analyze multi-dimensional and multi-fidelity data collected from onboard sources, such as lidars, radars, and vehicle dynamics data, together with smart city infrastructure, such as digital maps, connected vehicle roadside equipment, and lighting facilities. These data comprise not only time series data but also cross-sectional data. In addition, kernel methods will be used as an additional layer of regularization in the unsupervised scheme to stream such cross-platform data into the current NPBayes setting. The types of kernels will be carefully selected to balance the complexity and tractability of the proposed architecture in extracting real-world traffic representations.
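As a rough illustration of the idea behind accelerated evaluation, reference [4] describes it as based on importance sampling: rare events are drawn from a skewed distribution, and each sample is reweighted by a likelihood ratio. The sketch below is a toy, not the method of [1] or [4]. The standard normal "disturbance", the threshold, and all function names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def estimate_tail_probability(threshold, n_samples, shift, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the skewed proposal N(shift, 1), so the rare
    event is observed often; each hit is reweighted by the likelihood
    ratio p(x) / q(x) to keep the estimate unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += normal_pdf(x, 0.0, 1.0) / normal_pdf(x, shift, 1.0)
    return total / n_samples


def exact_tail_probability(threshold):
    """Closed-form P(X > threshold) for X ~ N(0, 1), used as a check."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    estimate = estimate_tail_probability(threshold=4.0, n_samples=20000, shift=4.0)
    print(estimate, exact_tail_probability(4.0))
```

Naive sampling from N(0, 1) would need millions of draws to see even a handful of exceedances of the threshold used here, which is why skewing the sampling distribution toward the rare event pays off.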

Stochastic models will be used to represent the uncertainties of city zone dynamics based on the extracted traffic representations, which in turn guide the formulation of driving difficulty and prescribe the required AV functionalities. To account for the widely acknowledged periodic patterns of urban activities (e.g., peak traffic hours or weekly grocery trips), we consider adopting a Non-Homogeneous Poisson Process (NHPP) model with piecewise intensities [3] from the stochastic modeling literature to better capture the spatiotemporal pattern of zone-based routine activities. The sensitivity of the resulting risk estimates will be thoroughly analyzed with regard to a variety of factors, including AV automation levels, AV configurations, physical infrastructure conditions, and the derived driving difficulty measures. While all these factors are relevant to the AV and city domains, the emphasis will be tailored to the inherent quality of the collected data or the derived information. Where rare safety-critical events are the main concern, we will use the Accelerated Evaluation framework [4] to efficiently quantify the safety risk with high precision. Combining these methods, we will obtain an integrated framework that links the data-driven traffic representation scheme with efficient stochastic modeling and safety evaluation. This framework can address the highly dynamic urban activities, the widely diverse community characteristics, and the overarching need to promote social fairness and economic benefits through intelligent transportation deployment and smart city development.
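To make the NHPP with piecewise intensities concrete, here is a minimal sketch that simulates event times by thinning (the Lewis-Shedler approach), assuming a piecewise-constant rate over one day. The breakpoints, rates, and function names are hypothetical values chosen for illustration and are not taken from [3].

```python
import random


def piecewise_intensity(t, breakpoints, rates):
    """Rate at time t; rates[i] applies on [breakpoints[i], breakpoints[i + 1])."""
    for i, rate in enumerate(rates):
        if breakpoints[i] <= t < breakpoints[i + 1]:
            return rate
    return 0.0


def expected_count(breakpoints, rates):
    """Expected number of events: the integral of the piecewise-constant rate."""
    return sum(rate * (breakpoints[i + 1] - breakpoints[i])
               for i, rate in enumerate(rates))


def simulate_nhpp(breakpoints, rates, seed=0):
    """Simulate NHPP event times on [breakpoints[0], breakpoints[-1]) by thinning."""
    lam_max = max(rates)
    if lam_max <= 0:
        raise ValueError("at least one rate must be positive")
    rng = random.Random(seed)
    t, end = breakpoints[0], breakpoints[-1]
    events = []
    while True:
        # Candidate arrivals come from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t >= end:
            return events
        # Keep each candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max < piecewise_intensity(t, breakpoints, rates):
            events.append(t)


if __name__ == "__main__":
    # Hypothetical hourly rates: night, morning peak, midday, evening peak, late.
    hours = [0, 7, 9, 16, 19, 24]
    rates = [2, 20, 8, 25, 4]
    arrivals = simulate_nhpp(hours, rates, seed=1)
    print(len(arrivals), expected_count(hours, rates))
```

Thinning is used here because it only needs an upper bound on the intensity, so the same loop works if the piecewise-constant rate is later swapped for another bounded intensity function.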

[1] D. Zhao, “Accelerated Evaluation of Automated Vehicles,” Ph.D. dissertation, University of Michigan, Ann Arbor, 2016.
[2] W. Wang and D. Zhao, “Extracting Traffic Primitives Directly From Naturalistically Logged Data for Self-Driving Applications,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1223-1229, April 2018.
[3] A. Mansur, P. Glynn, and D. Zhao, “An Accelerated Approach to Safely and Efficiently Test Pre-produced Autonomous Vehicles on Public Streets,” IEEE Intelligent Transportation System Conference, 2018.
[4] D. Zhao, H. Lam, H. Peng, S. Bao, D. J. LeBlanc, K. Nobukawa, and C. S. Pan, “Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 3, pp. 595-607, 2017.

To achieve this goal, the research is divided into the following tasks:

Task 1 (Functional requirement): 1/1/2019-8/31/2019
Define types of automated driving requirements and develop algorithms to link driving environments to the requirements.

Task 2 (Evaluation): 2/1/2019-6/30/2019
Develop methodologies to evaluate the safety of AVs based on human behavior data.

Task 3 (Real-world Application): 7/1/2019-10/31/2019
Analyze the heterogeneous traffic and infrastructure data of Pittsburgh and recognize primitive traffic scenarios.

Task 4 (Tools/Reports): 7/1/2019-12/31/2019
Develop the framework for building “Safe Zones” based on driving scenario complexity by analyzing large heterogeneous data for smart cities.
Deployment Plan
Task 1: Define the types of automated driving requirements and develop algorithms to link driving environments to the requirements

In this task, our team will develop a robust unsupervised learning approach to learn the scenarios and features extracted from organized heterogeneous data. We will build a framework based on a nonparametric Bayesian approach that combines the hierarchical Dirichlet process (HDP) with the hidden Markov model (HMM). Specifically, this task addresses how to develop a dimension-invariant nonparametric Bayesian learning approach that robustly extracts scenarios and patterns from heterogeneous data. We can find unified traffic scenarios from our “TrafficNet” scenario-based traffic dataset. For benchmarking, we will evaluate the method on open driving data such as the KITTI and Oxford driving datasets. Information entropy, accuracy, and precision can be used to quantify scenario complexity for vehicles. The capacity of autonomous vehicles at a given automation level is then defined with respect to these features. In essence, the information complexity for autonomous vehicles describes how an AV with a given sensor configuration would perform in real-world scenarios.
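One way to read "information entropy" as a complexity measure is the Shannon entropy of the scenario labels observed on a road segment. The sketch below assumes the scenarios have already been extracted and labeled (for example by the HDP-HMM step); the label names and the choice of the empirical label distribution are illustrative assumptions, not the project's definition.

```python
import math
from collections import Counter


def scenario_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of scenario labels.

    A segment dominated by one scenario scores near 0; a segment where many
    scenarios occur equally often scores log2(number of scenarios).
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    total = len(labels)
    entropy = 0.0
    for count in Counter(labels).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical label sequences for two road segments.
    quiet_street = ["free_flow"] * 9 + ["car_following"]
    busy_junction = ["free_flow", "car_following", "pedestrian_crossing", "cut_in"] * 3
    print(scenario_entropy(quiet_street), scenario_entropy(busy_junction))
```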

Task 2: Develop methodologies to evaluate the safety of AVs based on human behavior data.

The performance of AVs on public streets can be evaluated via the extracted primitive traffic scenarios and city transportation patterns. We have developed a methodology to test pre-produced AVs on public roads, where “pre-produced” means that the capability of the AVs in question is yet to be assessed. The methodology leverages a robust stochastic model of the traffic system, together with recent advances in machine learning and adaptive design theory, to construct a surrogate model of that capability. By construction, the accuracy of the surrogate model gradually increases as more data are observed from real-world deployments. With this improved estimation accuracy, we can purposefully select a scenario to evaluate at each deployment iteration based on its estimated risk and potential learning gain. As a result, we obtain the risk level for each extracted traffic scenario.

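The selection step, choosing the next scenario by estimated risk and potential learning gain, can be sketched with a deliberately simple stand-in for the surrogate model. The example below assumes a Beta posterior over each scenario's failure probability with a uniform prior and scores scenarios by posterior mean plus a weighted posterior standard deviation. This scoring rule, the class, and the scenario names are assumptions for illustration, not the surrogate model described above.

```python
import math
from dataclasses import dataclass


@dataclass
class ScenarioBelief:
    """Beta(1 + failures, 1 + successes) belief about a scenario's failure rate."""
    name: str
    failures: int = 0
    trials: int = 0

    def risk_mean(self):
        # Posterior mean under a uniform Beta(1, 1) prior.
        return (self.failures + 1) / (self.trials + 2)

    def risk_std(self):
        # Posterior standard deviation, used here as a proxy for learning gain.
        a = self.failures + 1
        b = self.trials - self.failures + 1
        n = a + b
        return math.sqrt(a * b / (n * n * (n + 1)))


def select_next_scenario(beliefs, exploration_weight=1.0):
    """Pick the scenario with the highest risk-plus-uncertainty score."""
    return max(beliefs,
               key=lambda b: b.risk_mean() + exploration_weight * b.risk_std())


if __name__ == "__main__":
    beliefs = [
        ScenarioBelief("night_crosswalk", failures=2, trials=100),
        ScenarioBelief("unprotected_left", failures=0, trials=0),
        ScenarioBelief("highway_merge", failures=6, trials=10),
    ]
    print(select_next_scenario(beliefs).name)
```

Setting `exploration_weight` to 0 makes the rule purely risk-driven, while larger values favor scenarios that have been tested least.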
Task 3: Analyze the heterogeneous traffic and infrastructure data of Pittsburgh and label the functional automated driving requirements

Kernel methods will be combined with unsupervised learning to handle the heterogeneity and to learn patterns of city transportation infrastructure and public traffic behavior, which we refer to as traffic scenarios. In addition, nonparametric Bayesian learning (NPBayes) [5] will be applied to extract traffic primitives of vehicle behavior, which can be used to model various types of real-world driving behavior at a fundamental level.

Statistical analysis will be performed to further evaluate driving complexity based on the interaction of city transportation and AV behaviors. Several measures, such as information entropy, accuracy, and precision, can be computed from traffic scenarios and primitives. The local transportation of smart cities, which is characterized by traffic scenarios, and AV behaviors, which are represented by traffic primitives, are therefore comparable under the same driving complexity measure. We will qualify AV deployment in a given zone only when the combined functionalities of the vehicle meet the driving complexity (or AV functionality) required to navigate safely within the zone, that is, when they are capable of handling the driving difficulty within the zone.
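The qualification rule can be written as a simple comparison once both sides are expressed on the same complexity scale. The sketch below assumes a single scalar score per zone and per vehicle; the scalar form, the zone names, and the numbers are hypothetical.

```python
def qualified_zones(av_capability, required_by_zone):
    """Return the zones whose required driving complexity the AV can meet.

    av_capability: complexity level the AV's combined functionalities can handle.
    required_by_zone: mapping from zone name to the complexity the zone demands.
    """
    return sorted(zone for zone, required in required_by_zone.items()
                  if av_capability >= required)


if __name__ == "__main__":
    # Hypothetical complexity scores on a shared scale.
    zones = {"zone_a": 1.2, "zone_b": 2.7, "zone_c": 1.9}
    print(qualified_zones(2.0, zones))
```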

Task 4: Provide reports, tools, and datasets for the public and private partners 

We plan to visit city hall regularly to better serve the needs of the Department of Mobility of Pittsburgh in deploying automated vehicles. We will also talk to our partners, including Uber, Bosch, and Toyota, so that they can benefit from this project and thus be motivated to invest in it and engage with the Metro21 program. The team plans to open the methods, tools, and datasets to build a healthy ecosystem and help the automated driving community flourish. Tools and data will be shared on a designated website built by the PI.
Expected Accomplishments and Metrics
Deliverables in this project include:
1. A report to the city council regarding the safety benefits and risk levels of the mobility usage for different zones of Pittsburgh.
2. Data sets, tools, and an interactive website to label the functional driving requirements to deploy automated vehicles in typical areas of Pittsburgh.
3. Novel methods to synthesize heterogeneous datasets collected from sensors of autonomous vehicles and smart city infrastructure.

The team will work closely with the City of Pittsburgh and industrial partners to make sure the outputs of the project meet their needs. We will also communicate frequently with the Metro21 management team and actively participate in related events to help the community flourish and, of course, meet all requirements with high-quality research.

Individuals Involved

Email | Name | Affiliation | Role | Position
dingzhao@cmu.edu | Zhao, Ding | CMU | PI | Faculty - Untenured, Tenure Track


Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)


Type | Name | Uploaded
Data Management Plan | Labeling_Road_for_FUnctional_Automated_Driving_Requirements.docx | Feb. 19, 2019, 5:04 a.m.
Publication | How to Evaluate Test Tracks for Self-Driving? A Quantitative Approach | Sept. 24, 2019, 11:07 a.m.
Publication | Active Learning for Risk-Sensitive Inverse Reinforcement Learning | Sept. 24, 2019, 11:07 a.m.
Progress Report | 291_Progress_Report_2019-09-30 | Sept. 27, 2019, 8:35 a.m.
Progress Report | 291_Progress_Report_2020-03-30 | March 25, 2020, 8:51 p.m.
Progress Report | 291_Progress_Report_2020-06-30 | June 30, 2020, 6:55 p.m.
Final Report | Final_Report_-_291.pdf | July 7, 2020, 12:56 p.m.
Publication | Functional Optimal Transport: Mapping Estimation and Domain Adaptation for Functional data | March 21, 2021, 7:14 p.m.
Publication | Highway exiting planner for automated vehicles using reinforcement learning | March 21, 2021, 7:15 p.m.
Publication | The Impact of Road Configuration in V2V-Based Cooperative Localization: Mathematical Analysis and Real-World Evaluation | March 21, 2021, 7:16 p.m.

Match Sources

No match sources!


No partners!