#597 Deep Reinforcement Learning Based Driving Strategies Bootstrapped by Suboptimal Policies


Principal Investigator
Keith Redmill
Status
Active
Start Date
July 1, 2025
End Date
June 30, 2026
Project Type
Research Advanced
Grant Program
US DOT BIL, Safety21, 2023 - 2028 (4811)
Grant Cycle
Safety21 : 25-26
Visibility
Public

Abstract

Designing autonomous driving systems requires a modular approach spanning vehicle sensing, perception, localization, scene representation, path planning, decision-making, and vehicle control [1]. Our study focuses on decision-making in complex driving scenarios. Traditional methods often use model-based, heuristic, or rule-based controllers [2]-[7]. Heuristic and rule-based controllers rely on human expertise to handle different driving situations; they include safety measures and rules to avoid deadlocks. However, these methods face limitations, as designing rules to cover every possible driving scenario is highly challenging and resource-intensive.

Reinforcement learning (RL), on the other hand, offers a self-supervised approach, learning control policies through interaction with the environment [8]-[12]. With advancements in neural networks, deep reinforcement learning (DRL) has shown the ability to efficiently map state-action transitions in high-dimensional environments. 
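
To make this concrete, the sketch below shows the RL interaction loop in its simplest tabular form. It is a generic textbook illustration assuming a gymnasium-style discrete environment with hashable states, not the project's DRL implementation.

```python
# Minimal tabular Q-learning sketch, a simple stand-in for the DRL methods
# cited above. All names and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning_episode(env, q, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode, updating state-action values in place."""
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(env.action_space.n),
                         key=lambda a: q[(state, a)])
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Temporal-difference update toward the bootstrapped target.
        best_next = max(q[(next_state, a)] for a in range(env.action_space.n))
        q[(state, action)] += alpha * (reward + gamma * best_next
                                       - q[(state, action)])
        state = next_state
    return q

q_table = defaultdict(float)  # maps (state, action) -> value estimate
```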

However, a persistent challenge in RL is achieving sufficient exploration of the environment. Research has shown that RL agents require extensive exploration to accurately map state-action values and discover high-reward states [13][14]. Even with mechanisms like stochastic DRL algorithms [15][16] that aim to embed exploration throughout the training process, agents often fail to discover optimal policies in complex scenarios. As a result, RL agents struggle with delayed rewards, where limited and inefficient exploration can result in sub-optimal performance.
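
The fragment below illustrates, with assumed names and values, how stochastic methods such as SAC [15] embed exploration: actions are sampled from a learned distribution, and an entropy term rewards the policy for staying stochastic.

```python
import torch
from torch.distributions import Normal

# Placeholder policy-head outputs; in SAC these come from a neural network.
mean = torch.tensor([0.0])
log_std = torch.tensor([-0.5])
policy = Normal(mean, log_std.exp())

action = policy.sample()                # stochastic action -> exploration
entropy_bonus = policy.entropy().sum()  # SAC adds a term like this to its
# objective, penalizing the policy for collapsing to a single action.
```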

Driving environments encompass a range of challenging scenarios, such as roundabouts, overtaking, merging, intersections, and highway exits. These scenarios involve delayed rewards, which discourage exploration and complicate the RL learning process. Examples of delayed rewards include the cumulative benefit of using a free lane to achieve optimal cruising speed, waiting for the right moment to enter a roundabout safely, or merging into traffic while avoiding collisions—even if it requires temporarily stopping in the lane. Failure to explore and identify these states during training can lead to sub-optimal policies or even unsafe behaviors, such as remaining stuck behind a slow vehicle, rushing into a roundabout prematurely, or misjudging merges in traffic. 
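
A small calculation with assumed numbers shows why such delayed rewards are hard to learn: even when the eventual payoff is large, its discounted value at the decision point is modest, and the agent must stumble onto the rewarding state in the first place.

```python
# Illustrative arithmetic only; the numbers are assumptions, not project data.
gamma = 0.99                 # discount factor
delayed_reward = 10.0        # payoff for, e.g., a well-timed roundabout entry
steps_until_reward = 50      # decision is made 50 steps before the payoff

value_at_decision = (gamma ** steps_until_reward) * delayed_reward
print(f"{value_at_decision:.2f}")  # ~6.05 with gamma=0.99; with gamma=0.9
# the same reward is worth ~0.05, nearly invisible to random exploration.
```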

In this study, we propose to address these limitations by combining the stability of rule-based sub-optimal controllers with the adaptive learning capabilities of DRL [17]-[20]. We aim to utilize sub-optimal policies to bootstrap efficient training during the RL learning process and to overcome exploration barriers. Specifically, we leverage a rule-based controller to prevent deadlocks and guide RL agents toward promising states, improving their ability to learn optimal driving strategies. This approach not only improves collision avoidance but also enhances training efficiency by guiding agents through challenging scenarios where rewards are delayed. By integrating sub-optimal strategies, our method strikes a balance between simplicity and adaptability, enabling agents to learn more effectively.
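
One plausible way to realize such guidance is to mix the rule-based controller into data collection, as in the hedged sketch below; the mixing probability, names, and replay-buffer interface are illustrative assumptions, not the project's actual design.

```python
import random

def collect_transition(env, state, rl_agent, rule_controller, beta):
    """Act with the rule-based controller with probability beta, otherwise
    with the learning agent; store the transition either way (illustrative)."""
    if random.random() < beta:
        action = rule_controller.act(state)  # safe, sub-optimal guidance
    else:
        action = rl_agent.act(state)         # learned (exploratory) policy
    next_state, reward, terminated, truncated, _ = env.step(action)
    rl_agent.replay_buffer.add(state, action, reward, next_state,
                               terminated or truncated)
    return next_state, terminated or truncated

# beta would typically be annealed from 1.0 toward 0.0, so training leans on
# the rule-based guidance early and on the agent's own policy later.
```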

Our experiments are conducted in a complex driving scenario where the driving policy must optimize long-term rewards. The key contributions of this study are summarized as follows:
1. Sub-optimal policies: We design sub-optimal controllers capable of making reasonable, well-motivated driving decisions.
2. Guidance for RL agents: The sub-optimal controllers guide the RL agent toward states associated with delayed rewards, improving exploration and learning efficiency.
3. Validation in complex driving scenarios: We validate our method in complex driving scenarios, demonstrating its ability to optimize long-term rewards and handle challenging conditions.

[1]: Yurtsever, Ekim, et al. "A survey of autonomous driving: Common practices and emerging technologies." IEEE Access 8 (2020): 58443-58469.
[2]: Bevly, David, et al. "Lane change and merge maneuvers for connected and automated vehicles: A survey." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 105-120.
[3]: Hatipoglu, Cem, Umit Ozguner, and Keith A. Redmill. "Automated lane change controller design." IEEE Transactions on Intelligent Transportation Systems 4.1 (2003): 13-22.
[4]: Chandler, Robert E., Robert Herman, and Elliott W. Montroll. "Traffic dynamics: Studies in car following." Operations Research 6.2 (1958): 165-184.
[5]: Gipps, Peter G. "A behavioural car-following model for computer simulation." Transportation Research Part B: Methodological 15.2 (1981): 105-111.
[6]: Treiber, Martin, Ansgar Hennecke, and Dirk Helbing. "Congested traffic states in empirical observations and microscopic simulations." Physical Review E 62.2 (2000): 1805.
[7]: Kesting, Arne, Martin Treiber, and Dirk Helbing. "General lane-changing model MOBIL for car-following models." Transportation Research Record 1999.1 (2007): 86-94.
[8]: Ngai, Daniel Chi Kit, and Nelson Hon Ching Yung. "A multiple-goal reinforcement learning method for complex vehicle overtaking maneuvers." IEEE Transactions on Intelligent Transportation Systems 12.2 (2011): 509-522.
[9]: Xu, Xin, et al. "A reinforcement learning approach to autonomous decision making of intelligent vehicles on highways." IEEE Transactions on Systems, Man, and Cybernetics: Systems 50.10 (2018): 3884-3897.
[10]: Peng, Jiankun, et al. "An integrated model for autonomous speed and lane change decision-making based on deep reinforcement learning." IEEE Transactions on Intelligent Transportation Systems 23.11 (2022): 21848-21860.
[11]: Wang, Pin, Ching-Yao Chan, and Arnaud de La Fortelle. "A reinforcement learning based approach for automated lane change maneuvers." 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018.
[12]: Shi, Tianyu, et al. "Driving decision and control for automated lane change behavior based on deep reinforcement learning." 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019.
[13]: Hu, Edward S., et al. "Planning goals for exploration." arXiv preprint arXiv:2303.13002 (2023).
[14]: Ladosz, Pawel, et al. "Exploration in deep reinforcement learning: A survey." Information Fusion 85 (2022): 1-22.
[15]: Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International Conference on Machine Learning. PMLR, 2018.
[16]: Fujimoto, Scott, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." International Conference on Machine Learning. PMLR, 2018.
[17]: Huang, Zhiyu, Jingda Wu, and Chen Lv. "Efficient deep reinforcement learning with imitative expert priors for autonomous driving." IEEE Transactions on Neural Networks and Learning Systems 34.10 (2022): 7391-7403.
[18]: Wu, Jingda, et al. "Prioritized experience-based reinforcement learning with human guidance for autonomous driving." IEEE Transactions on Neural Networks and Learning Systems 35.1 (2022): 855-869.
[19]: Kendall, Alex, et al. "Learning to drive in a day." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[20]: Yurtsever, Ekim, et al. "Integrating deep reinforcement learning with model-based path planners for automated driving." 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020.
Description

Timeline
Strategic Description / RD&T
Section left blank until USDOT’s new priorities and RD&T strategic goals are available in Spring 2026.
Deployment Plan
The research envisioned for this project will not immediately lead to deployment. However, we have identified project advisors from government and industry. These groups will participate in review meetings with our team at least twice during the project and will provide comments and advice covering both technical considerations and the relevance of the research, including its potential use in future production vehicles and vehicle automation systems.

July – September 2025: Quarterly report, Participate in Safety21 meetings.

October – December 2025: Quarterly report, Inform deployment partners of research progress and solicit feedback, Participate in Safety21 meetings.

January – March 2026: Quarterly report, Submit conference paper, Participate in Safety21 meetings.

April – June 2026: Quarterly report, Submit journal paper, Prepare final report, Participate in Safety21 meetings, Inform deployment partners of research progress and solicit feedback, Present at a conference.
Expected Outcomes/Impacts
This research aims to produce significant advancements in the field of autonomous driving by combining rule-based strategies with reinforcement learning to improve decision-making in driving scenarios. The key outcome is a hybrid approach that enhances exploration and long-term reward optimization, addressing challenges faced by conventional reinforcement learning methods. By incorporating sub-optimal rule-based controllers, the framework facilitates safer and more efficient driving strategies, particularly in tasks such as overtaking and collision avoidance.

The results of this study are expected to have practical applications in autonomous vehicle systems, providing a robust solution for navigation and decision-making. The proposed approach aims to improve vehicle safety, reliability, and efficiency, offering a scalable strategy that can adapt to various traffic scenarios. Simulations and case studies will demonstrate the effectiveness of this method in real-world-inspired environments, highlighting its ability to deliver practical improvements to autonomous driving performance.

Beyond its application in complex driving scenarios, this research contributes to the broader understanding of how reinforcement learning can be enhanced through the integration of human driving logic. By focusing on the intersection of rule-based strategies and machine learning, the work has the potential to set new standards in designing safer and more reliable transportation systems.
Expected Outputs
This research project will produce novel methods, open-source software, and publications describing the proposed hybrid framework that combines rule-based driving strategies with reinforcement learning for robust automated driving.

1. The findings, methodologies, and analyses of this research will be documented in a research paper. The paper will describe the hybrid framework combining rule-based strategies with reinforcement learning, explain the methods used to address exploration challenges, and present experimental results. It will serve as a useful resource for studying autonomous driving systems and reinforcement learning.

2. To promote further research and enable replication, the project will release the source code for the developed models and methods. Comprehensive documentation and user guides will accompany the code, making it accessible to the research community and supporting future advancements in driving decision-making systems.

3. To demonstrate the practical relevance of this research, a case study and a customized simulation environment will be developed. These will illustrate the application of the hybrid framework in scenarios such as lane changing, efficient navigation, and collision avoidance, providing a foundation for testing and refining autonomous driving strategies in realistic settings.
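
As an indication of what such an environment could look like, the sketch below instantiates a roundabout scenario in highway-env, a commonly used open-source driving simulator; the project does not name a specific simulator, and the random policy is a placeholder.

```python
import gymnasium as gym
import highway_env  # noqa: F401  (importing registers the driving scenarios)

# Illustrative only: simulator choice and scenario are assumptions.
env = gym.make("roundabout-v0", render_mode="rgb_array")
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```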
TRID
This study focuses on improving the synergy between autonomous systems, robotics, and machine learning by introducing a novel integration of reinforcement learning and sub-optimal rule-based strategies. By embedding sub-optimal driving strategies into reinforcement learning models, we aim to enhance the ability to interpret complex traffic environments. This methodology is designed to optimize decision-making processes in scenarios requiring efficient navigation, smooth lane changes, and improved safety in driving conditions.

We selected three search terms that seemed to capture the relevant information in the TRID projects database: “vehicle control machine learning” (39 records), “reinforcement learning vehicle” (23 records), and “deep learning vehicle” (66 records). The search results are shown in the attached document. After discarding projects related to infrastructure, traffic control, maintenance, or strictly perception (marked with a dash in the search results) and after identifying duplicate projects among the searches, we found thirty-five projects of potential relevance (marked A-Z and AA-II in the search results) and identified fourteen as being the most relevant (marked D, G, J, K, L, N, R, S, T, AA, BB, CC, EE, FF).

D Automated Lane Change and Robust Safety. This project improves safety and efficiency in autonomous lane changing by combining reinforcement learning with control barrier functions and robust control techniques to handle uncertainties and time delays.
G Lane Changing of Autonomous Vehicles in Mixed Traffic Environments: A Reinforcement Learning Approach. This project focuses on trajectory tracking and robust control.
J Learning to Drive Autonomously.  This project develops adaptive learning algorithms for CAVs to improve CACC, energy efficiency, and combined control for lane changing and path following, considering uncertainties and human driver reaction times.
K Online Competitive Algorithms and Reinforcement Learning for Traffic Management. This project uses reinforcement learning and batch scheduling to reduce traffic delays and inefficiencies at intersections.
L Development of a Data-Driven Optimal Controller Based on Adaptive Dynamic Programming.  This project develops an adaptive optimal control method for CAVs in mixed platoons using V2V communication and reinforcement learning.
N Vision-based Navigation of Autonomous Vehicle in Roadway Environments with Unexpected Hazards. This project improves autonomous vehicle navigation using DNNs with hazard detection and semantic segmentation.
R Chance-Constrained Collision Avoidance Based Motion Planning in a Cooperative Perception Framework. This project enhances motion planning for autonomous vehicles using cooperative perception and uncertainty modeling to address occlusions and data attacks.
S 5G-enabled Safe and Robust Deep Multi-agent Reinforcement Learning Framework for CAV Coordination. This project advances CAV safety and efficiency using 5G-enabled V2X communication and multi-agent reinforcement learning for robust decision-making and control in complex scenarios.
T Explaining Deep Learning Decisions to Improve Cognitive Trust of Autonomous Driving. This project builds public trust in autonomous vehicles by clarifying deep learning decisions using explainable AI.
AA Control of Connected and Autonomous Vehicles for Congestion Reduction in Mixed Traffic: A Learning-Based Approach. This project develops a learning-based algorithm to reduce traffic congestion, optimize trajectories, and improve safety for autonomous vehicles.
BB Enhancing Traffic Safety and Connectivity: A Data Driven Multi-step-Ahead Vehicle Headway Prediction Leveraging High-Resolution Vehicle Trajectories. This project uses deep learning to predict vehicle headway, addressing driver and vehicle variability with high-resolution data. 
CC Hierarchical Decision Making and Control in RL-based Autonomous Driving for Improved Safety in Complex Traffic Scenarios. This PI's 2013-2014 project addressed a similar problem in complex-scenario behavior generation and vehicle control but used a different approach.
EE Physics-Informed Learning and Control of Connected and Autonomous Vehicles for Congestion Reduction.
FF Explaining Deep Learning Decisions to Improve Cognitive Trust of Autonomous Driving.

Individuals Involved

Email | Name | Affiliation | Role | Position
redmill.1@osu.edu | Redmill, Keith | The Ohio State University | PI | Faculty - Research/Systems
yurtsever.2@osu.edu | Yurtsever, Ekim | The Ohio State University | Co-PI | Other

Budget

Amount of UTC Funds Awarded
$83,927.00
Total Project Budget (from all funding sources)
$155,789.00

Documents

Type Name Uploaded

Match Sources

No match sources!

Partners

Name Type
DriveOHIO Deployment Partner