#469 Generating Safety-Critical Driving Scenarios for the Design of the CAV Proving-Ground - using domain knowledge, causality, and large language models

Principal Investigator
Ding Zhao
Start Date
July 1, 2023
End Date
June 30, 2024
Project Type
Research Advanced
Grant Program
US DOT BIL, Safety21, 2023 - 2028 (4811)
Grant Cycle
Safety21 : 23-24


Connected Autonomous Vehicles (CAVs) have advanced significantly in recent years, largely due to progress in machine learning-enabled sensing and decision-making algorithms. A paramount challenge for their widespread real-world deployment, however, is safety evaluation. Most existing driving systems are trained and evaluated on naturalistic scenarios from daily life or on heuristically generated adversarial ones, yet safety-critical scenarios are extremely rare relative to the sheer volume of everyday driving. This makes driving data highly imbalanced and collecting critical cases expensive. Consequently, methods that can generate realistic risky scenarios are essential for safety assessment and cost reduction.

This research aims to enhance testing procedures for CAVs by generating critical driving scenarios. These scenarios play a pivotal role in ensuring the safety and reliability of CAVs before they are deployed on roads. We plan to incorporate domain-specific knowledge about driving and road conditions, draw on causal inference to understand the sequences of events that lead to critical situations, and use the reasoning abilities of large language models to produce realistic and diverse driving scenarios. By combining these components, we aim to develop a holistic testing framework that more accurately depicts the real-world driving challenges CAVs face. This will not only make CAV testing more robust but also guide the design of proving grounds.

Specifically, we will collaborate with PennSTART to apply our scenario generation approach to the design of a CAV proving ground in Pennsylvania. Additionally, we will use augmented reality to extend the capabilities of the physical infrastructure by integrating virtual road users, including wheelchair users and people with visual impairments. Our ultimate objective is to make a tangible real-world impact, potentially through technology transfer or the launch of a startup.


Strategic Description / RD&T
"Transportation is not just about how people or things move from one place to another; it's also about how we connect, build, consume, work, and support communities." [RD&T-p2] CAVs can play a critical role in advancing efficiency, convenience, and equity. As one of the key emerging technologies, autonomous vehicles have the potential to reshape transportation toward safer, greener, and more human-centric communities. However, safety must be guaranteed before highly intelligent, end-to-end autonomous vehicles are deployed.

Because autonomous driving and human-machine interaction tasks are complex and diverse, we use data-driven approaches to handle these intricate situations. Since autonomous vehicles (AVs) will operate physically and interact with humans in crowded environments, safety is one of the most important considerations during deployment. We will follow RD&T's safety guidance, which encompasses the "safe design, safety data, and safe technology" strategy [RD&T-p18], to ensure operational safety. Additionally, we will "identify and implement strategies to enhance safety for vulnerable road users" [RD&T-p19], particularly people with disabilities, aiming to achieve "safety equity" [RD&T-p23]. In summary, this project directly addresses autonomous-driving safety by identifying key driving scenarios for evaluation and for the design of proving grounds, including digital twins. The ultimate goal is to establish a trustworthy, convenient, and inclusive environment for all.
Deployment Plan
October - December 2023: Simulation Environment Setup and Theoretical Framework

October: Initiate the design and configuration of the simulation environment to closely emulate real-world transportation, including traffic flows and scenarios. Complete the setup of the simulation environment, ensuring an accurate representation of human behaviors and potential challenges. Concurrently, formulate the theoretical framework outlining safety constraints, user interaction protocols, and adaptive navigation strategies.
November: Begin the design of advanced algorithms for real-time decision-making, dynamic behavior, and navigation in dynamic environments.
December: Continue algorithm design, incorporating insights from safe reinforcement learning and dynamic adjustment methodologies.

January - March 2024: Algorithm Design and Integration
January: Conclude algorithm design and begin integrating the algorithms into the simulation environment for testing.
February: Commence rigorous testing of the integrated algorithms within the simulation environment, ensuring adaptability and adherence to safety constraints.
March: Conduct comprehensive simulation-based verification to evaluate vehicles’ behavior across diverse scenarios, refining algorithms based on simulation outcomes.

April - June 2024: Generating Safety-critical Scenarios in Simulations
April: Use data-driven generative models to create safety-critical scenarios by searching the database for scenarios of interest, reproducing them, and then modifying them to fit the user's preferences.
May: Use causality-based algorithms to improve the generation capability of scenario generative models.
June: Use a large language model to incorporate more human knowledge and common sense into the generative model for more realistic scenario generation.

July - September 2024: Real-World Evaluation with Generated Safety-critical Scenarios
July: Use generated scenarios to train safe algorithms for connected autonomous vehicles.
August: Analyze the results of the simulation evaluation and refine the generation algorithm accordingly.
September: Deploy the critical scenarios in the real world and analyze the evaluation feedback from different connected autonomous vehicles.


Artificial Intelligence (AI) has been widely used in software products such as facial recognition and voice-print verification. But as AI continues to grow and research expands to physical products like autonomous vehicles (AVs), safety has moved to the forefront of this cutting-edge field. Intelligent physical systems are much harder to deploy because our world is complex and long-tailed, which exposes intelligent agents to substantial uncertainty. Even humans need several months to learn to drive because traffic scenarios are so complex. AVs should therefore be trained and evaluated on many different scenarios to demonstrate their safety and their capability to handle diverse situations.

According to the 2020 disengagement report from the California Department of Motor Vehicles, at least five companies (Waymo, Cruise, AutoX, Pony.AI, Argo.AI) had AVs that drove more than 10,000 miles without a disengagement. These results are usually recorded in normal driving scenarios without risky situations. It is a great achievement that current AVs, trained on hundreds of millions of miles of driving, succeed in normal cases. However, it remains unclear whether AVs are sufficiently safe and robust in rare, atypical scenarios.
For example, suppose an AV is driving down a street when a child suddenly runs into the lane chasing a ball. This emergency leaves the AV very little time to react, and even a slight misbehavior could cause serious harm. Such situations are called safety-critical scenarios, and they are extremely rare in normal driving.

To evaluate the safety of AVs efficiently, a growing number of researchers and practitioners in government, industry, and academia have begun to focus on generating safety-critical scenarios. The National Highway Traffic Safety Administration (NHTSA), an agency of the United States Department of Transportation, has summarized testable cases and pre-crash scenarios for automated driving systems and published a review of simulation frameworks and standards related to driving scenarios. Waymo, one of the leading companies in autonomous driving, released a safety report illustrating how it reconstructs fatal crashes in simulation from collected data. Meanwhile, academic research on generating safety-critical scenarios has surged, building on deep generative models (DGMs) and adversarial attack techniques. With the massive deployment of self-driving cars approaching, a systematic overview of existing work on safety-critical scenario generation is urgently needed.

The specific research objectives encompass the following aspects:

1. Designing Scenario Generation Algorithms with Domain Knowledge: The research aims to conceptualize and construct a diverse array of scenario generation methods that use different representations of domain knowledge, including tree and graph structures, text descriptions, and numerical constraints. Incorporating domain knowledge dramatically improves the quality of generated scenarios for safety evaluation.

2. Designing Causality-based Scenario Generation: A foundational element of this research is the establishment of a causality-based generation framework. Understanding the cause-and-effect relationships between objects and events is important both for making safe decisions and for generating scenarios that evaluate the safety of decision-making algorithms.

3. Designing Large Language Model-based Scenario Generation: Large language models (LLMs) have recently shown groundbreaking capabilities in text generation and dialogue. LLMs have been shown to acquire much common-sense knowledge from text, which is useful for evaluating the safety of connected autonomous vehicles from a human-like perspective.

Methodologies and Intellectual Merits

In developing CAV testing methods and proving-ground designs, we draw inspiration and insights from current advances in causal inference and large language models. These developments are valuable stepping stones toward our overarching goal of creating intelligent vehicles that adapt seamlessly to varying safety conditions while enhancing passenger experiences in connected autonomous vehicle environments.

Domain Knowledge with Tree Structure Representation, as demonstrated in prior scenario-generation work, aligns closely with our mission. This framework has two stages that separate learning the data distribution of real-world driving scenes from searching for adversarial scenes under knowledge constraints. In the training stage, we train a tree-structured generative model that parametrizes the nodes and edges of trees with a neural network to learn a representation of structured data. In the generation stage, explicit knowledge is applied to different levels of the learned tree model, guiding generation toward scenes that degrade the performance of victim algorithms.
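
The two-stage idea can be sketched in Python. This is a minimal illustration under loud assumptions, not the framework's implementation: a hand-written random sampler (`sample_scene`) stands in for the trained tree-structured neural generative model, the road/lane/agent hierarchy and the three `CONSTRAINTS` are hypothetical examples of knowledge applied at different tree levels, and `victim_margin` is an invented stand-in for a victim algorithm's performance (smallest gap to any agent minus the ego stopping distance at an assumed speed and deceleration). The search is plain rejection sampling that keeps the hardest admissible scene.

```python
import random
from dataclasses import dataclass, field

# Hypothetical ego-vehicle parameters used by the stand-in victim metric.
EGO_SPEED_MPS = 13.4   # roughly 30 mph
EGO_DECEL_MPS2 = 6.0   # assumed braking capability


@dataclass
class Node:
    """A node of a tree-structured scene: road -> lanes -> agents."""
    kind: str
    attrs: dict
    children: list = field(default_factory=list)


def sample_scene(rng: random.Random) -> Node:
    """Stand-in for the trained tree-structured generative model (training stage).

    A learned model would parametrize nodes and edges with a neural network;
    here every level is drawn from simple hand-picked distributions.
    """
    road = Node("road", {"speed_limit_mph": rng.choice([25, 35, 45])})
    for lane_id in range(rng.randint(1, 3)):
        lane = Node("lane", {"id": lane_id})
        for _ in range(rng.randint(0, 3)):
            lane.children.append(Node("agent", {
                "type": rng.choice(["car", "pedestrian", "cyclist"]),
                "gap_m": rng.uniform(2.0, 60.0),  # distance ahead of the ego vehicle
                "speed_mps": rng.uniform(0.0, 20.0),
            }))
        road.children.append(lane)
    return road


def agents(scene: Node) -> list:
    """Collect all agent nodes from every lane of the scene."""
    return [a for lane in scene.children for a in lane.children]


# Hypothetical explicit knowledge, expressed as constraints on different tree levels.
CONSTRAINTS = [
    lambda s: s.attrs["speed_limit_mph"] <= 35,       # road level: urban street
    lambda s: len(s.children) >= 2,                   # lane level: multi-lane road
    lambda s: all(a.attrs["speed_mps"] <= 3.0         # agent level: pedestrians walk
                  for a in agents(s) if a.attrs["type"] == "pedestrian"),
]


def victim_margin(scene: Node) -> float:
    """Hypothetical victim metric: smallest gap minus the ego stopping distance.

    Smaller (or negative) values mean the scene is harder for the victim.
    """
    stopping_m = EGO_SPEED_MPS ** 2 / (2 * EGO_DECEL_MPS2)
    return min((a.attrs["gap_m"] - stopping_m for a in agents(scene)),
               default=float("inf"))


def knowledge_guided_search(n_samples: int = 2000, seed: int = 0):
    """Generation stage: sample scenes, reject knowledge violations, keep the hardest."""
    rng = random.Random(seed)
    best, best_margin = None, float("inf")
    for _ in range(n_samples):
        scene = sample_scene(rng)
        if not all(check(scene) for check in CONSTRAINTS):
            continue  # discard scenes that contradict domain knowledge
        margin = victim_margin(scene)
        if margin < best_margin:
            best, best_margin = scene, margin
    return best, best_margin
```

Calling `knowledge_guided_search()` returns the most critical admissible scene together with its margin. Rejection sampling is used only because it is the simplest search to show; a more efficient search could replace it without changing the separation between learning the scene distribution and searching under constraints.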

Furthermore, Causal Autoregressive Flow for Scenario Generation presents an intriguing avenue to explore. We model causality as a directed acyclic graph (DAG) called the Causal Graph (CG). To apply the CG to traffic scenarios, we propose a second graph, the Behavioral Graph (BG), which represents the interactions between objects in a scenario. Because both are graphs, the causality given by the CG can be used to guide the generation of the BG. Based on the BG, we propose the first generative model that integrates causality into the graph generation task. Specifically, we propose two types of causal masks: Causal Order Masks (COM), which modify the node order for node generation, and Causal Visibility Masks (CVM), which remove irrelevant information for edge generation.
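
The two masks can be illustrated on a toy causal graph. The sketch below is a simplification under stated assumptions rather than the Causal Autoregressive Flow model itself: the four-node `CAUSAL_GRAPH` is hypothetical, the causal order is simply a topological sort of the CG (standing in for what COM enforces), and the visibility mask lets each node's edges be generated while seeing only already-generated causal ancestors (standing in for CVM). It uses only the standard library; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical causal graph (CG): each node maps to the set of its direct causes.
CAUSAL_GRAPH = {
    "traffic_light": set(),
    "lead_car": {"traffic_light"},
    "pedestrian": {"traffic_light"},
    "ego": {"traffic_light", "lead_car"},
}


def causal_order(cg: dict) -> list:
    """Sketch of COM: generate nodes so that every cause precedes its effects."""
    return list(TopologicalSorter(cg).static_order())


def ancestors(cg: dict, node: str) -> set:
    """All direct and indirect causes of `node` in the CG."""
    seen, stack = set(), list(cg[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(cg[parent])
    return seen


def causal_visibility_mask(cg: dict, order: list) -> list:
    """Sketch of CVM: row i marks which nodes are visible when generating
    the edges of order[i]. Only earlier nodes that are causal ancestors stay
    visible; everything else is masked out as irrelevant."""
    mask = []
    for i, node in enumerate(order):
        anc = ancestors(cg, node)
        mask.append([1 if j < i and other in anc else 0
                     for j, other in enumerate(order)])
    return mask


order = causal_order(CAUSAL_GRAPH)
mask = causal_visibility_mask(CAUSAL_GRAPH, order)
for node, row in zip(order, mask):
    visible = [other for other, bit in zip(order, row) if bit]
    print(f"{node:>13}: sees {visible}")
```

In this toy graph, the ego vehicle's row exposes the traffic light and the lead car but hides the pedestrian, who is not among its causes. In a learned model, a mask like this would be applied to the inputs of the edge-generation step.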

Additionally, Large Language Models for Scenario Generation resonates strongly with our commitment to safe and efficient evaluation. The text input and output of LLMs provide human-understandable information for better analysis of evaluation results and more controllable preference setting. The common-sense knowledge of safety encoded in LLMs makes the generation process more efficient and adaptive. Text input also lets users communicate directly with the model to control the content and risk level of generated scenarios for more flexible evaluation.
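
One way to realize this text interface is to ask the LLM for a structured reply and validate it before it reaches the simulator. The sketch assumes a hypothetical JSON schema (`road_type`, `actors`, `risk_level`), a hypothetical actor vocabulary, and a stub `fake_llm` in place of any real model or API; none of these come from the project itself.

```python
import json
from dataclasses import dataclass

# Hypothetical vocabulary of actors the simulator can instantiate.
SUPPORTED_ACTORS = {"car", "truck", "cyclist", "pedestrian", "wheelchair_user"}


@dataclass
class ScenarioSpec:
    road_type: str
    actors: list
    risk_level: float  # 0.0 = benign, 1.0 = most critical


def build_prompt(request: str, risk_level: float) -> str:
    """Turn a user's plain-language request into an instruction for the LLM."""
    return (
        "Design a driving scenario. Reply with JSON only, using the keys "
        '"road_type" (string), "actors" (list of strings) and '
        '"risk_level" (number between 0 and 1).\n'
        f"Target risk level: {risk_level:.1f}\n"
        f"Request: {request}"
    )


def parse_reply(reply: str) -> ScenarioSpec:
    """Validate the LLM's JSON reply before it reaches the simulator."""
    data = json.loads(reply)
    actors = [a for a in data["actors"] if a in SUPPORTED_ACTORS]
    if not actors:
        raise ValueError("reply names no supported actors")
    risk = min(max(float(data["risk_level"]), 0.0), 1.0)  # clamp to [0, 1]
    return ScenarioSpec(str(data["road_type"]), actors, risk)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned reply for illustration."""
    return ('{"road_type": "urban_intersection", '
            '"actors": ["car", "wheelchair_user", "robot"], "risk_level": 1.3}')


spec = parse_reply(fake_llm(build_prompt(
    "A wheelchair user crosses in front of a turning car.", risk_level=0.8)))
print(spec)
```

Running it prints `ScenarioSpec(road_type='urban_intersection', actors=['car', 'wheelchair_user'], risk_level=1.0)`: the unsupported `robot` actor is dropped and the out-of-range risk level is clamped to 1.0. Validating the reply this way keeps a malformed LLM answer from producing an invalid scenario.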

In conclusion, our proposed work builds directly on current advances in domain knowledge and connected autonomous vehicle evaluation. On these foundations, we aspire to develop a controllable and efficient evaluation platform that transcends the limitations of conventional automation. The generated scenarios will embody the core values of safety and human-centricity while redefining passenger experiences in autonomous driving, ushering in a new era of efficient, adaptable, and empathetic service delivery.

Detailed Deployment Plan 

Task 1: Simulation Environment Setup
Create a realistic simulation environment that accounts for traffic flows, road layouts, and various scenarios. Incorporate models for human behaviors, environmental factors, and potential challenges to serve as a testing ground for algorithm development and validation.

Task 2: Theoretical Framework Formulation and Algorithm Design
Develop a comprehensive theoretical framework defining safety constraints, traffic modeling, and human driver preferences. Build advanced algorithms that integrate principles from safe reinforcement learning, deep generative models, causal inference, and large language models to enable controllable and efficient scenario generation.

Task 3: Algorithm Integration, Testing, and Verification
Integrate the algorithms into the simulation environment and rigorously test their performance, adaptability, and safety compliance. Conduct thorough simulation-based verification to evaluate the behavior of several autonomous driving algorithms across diverse scenarios, iteratively refining the algorithms to ensure safety.

Task 4: Human-Friendly Interface with Large Language Models
To make the critical scenario generation platform easy for a broad range of users to operate, we will use LLMs as a language interface for controlling the preferences and content of generated scenarios. The algorithm will follow human users' instructions and generate precise scenarios that satisfy their requirements.

Task 5: Documentation, Reporting, and Iterative Refinement
Compile a comprehensive report that details the entire deployment process, encompassing validation through simulation, real-world pilot deployment, performance evaluation, encountered challenges, and lessons learned. Present findings, successes, and recommendations to stakeholders, researchers, and industry professionals. Utilize insights gained from real-world testing to refine algorithms, enhance adaptability, and optimize user interactions.
Expected Outcomes/Impacts
The primary anticipated outcome of this research is a proof of concept for deploying safety-critical scenario generation methods in CAV testing and proving-ground design. Through collaboration on the design of the proving ground, the project has the potential to help shape new transportation policies, regulations, and practices, contributing to a safer, more reliable, and more efficient autonomous driving environment. We will work closely with PennSTART to design the physical infrastructure for the CAV proving ground in Pennsylvania and to implement the scenarios with augmented reality.
Expected Outputs
A digital twin of a connected autonomous vehicle testing ground, built on an open-source platform.
Algorithms for generating safety-critical scenarios in digital twins, including Domain Knowledge with Tree Structure Representation, Causal Autoregressive Flow for Scenario Generation, and Large Language Models for Scenario Generation.
Several designs of proving grounds for PennSTART.
Patent filings and potential tech transfer.
According to the TRIS database, this project would be the first effort to study the design of proving grounds for CAVs using realistic critical scenario generation. This project will directly complement ten existing safety-focused projects supported by Mobility 21 and Traffic 21 over the past years, including one led by the PI Zhao titled "Towards a Smart, Safe, and Sustainable Sidewalk: A Quantitative Analysis on How Sidewalk Infrastructure Affects Personal Delivery Devices." The project will introduce a virtual digital twin as a shared platform for the UTC community, facilitating a comprehensive understanding of the requirements and opportunities for safety evaluation.

Individuals Involved

Email | Name | Affiliation | Role | Position
dingzhao@cmu.edu | Zhao, Ding | CMU | PI | Faculty - Untenured, Tenure Track


Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)


Type | Name | Uploaded
Data Management Plan | Data_Management_Plan-2.pdf | Oct. 10, 2023, 6:49 p.m.
Progress Report | 469_Progress_Report_2024-03-31 | March 18, 2024, 11:51 a.m.

Match Sources

No match sources!


Name | Type
Google | Deployment Partner
PennSTART | Deployment Partner
City of Pittsburgh | Equity Partner