Abstract
    This study addresses the challenge of effectively interpreting and navigating complex dynamic driving environments, using occupancy grids as a primary mode of spatial input representation. In this work, we present a novel approach that combines the strengths of reinforcement learning (RL) and transformer-based architectures [1], particularly focusing on leveraging the transformer encoder across spatial and temporal dimensions [2]-[5]. 
In the realm of autonomous systems and robotics, machine learning-based controller design can be broadly categorized into two types: supervised-learning methods and reinforcement learning methods. Supervised learning methods, such as imitation learning [6], require the collection of large amounts of data and corresponding expert behaviors. In contrast, reinforcement learning generates data and learns control strategies through repeated interaction with the environment. 
For processing image-based inputs, traditional RL approaches in robotics rely heavily on convolutional neural networks (CNNs) for spatial understanding [7]. The advent of transformer models in natural language processing (NLP) [8] and their subsequent adaptation to other domains suggest significant potential for these models in handling complex spatial data [2] in autonomous systems. From a temporal perspective, reinforcement learning based on Markov models assumes that the future state depends solely on the current state and action. Although such methods process sequences of states and actions, they do not explicitly model the entire history or treat the control problem as an extended time series [3], [9]. They focus on the immediate transition without accounting for the full sequence of past behaviors.
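The contrast between the Markov assumption and history-based modeling can be written compactly. As an illustrative sketch (the symbols $s_t$, $a_t$, $\pi$, and the window length $K$ are notation assumed here, not taken from the proposal):

```latex
% Markov assumption: the next state depends only on the current state and action
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)

% Markov policy versus a policy conditioned on a window of the last K steps
a_t \sim \pi(\cdot \mid s_t)
\qquad \text{versus} \qquad
a_t \sim \pi(\cdot \mid s_{t-K+1}, a_{t-K+1}, \dots, s_{t-1}, a_{t-1}, s_t)
```

The right-hand form is the kind of sequence-conditioned policy that a temporal transformer could represent.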
To address the inherent limitations in traditional reinforcement learning approaches, our work will focus on the integration of advanced machine learning techniques in the field of autonomous systems. We will develop a transformer-based model that effectively utilizes occupancy grids [10]-[12], a standard tool in robotics for environment mapping and navigation, combined with a spatial attention mechanism. This combination provides a structured way to represent spatial information, enhancing the model's capabilities in understanding and navigating its environment. We will also explore temporal embedding techniques so that the model can understand and interpret dynamic scenarios and effectively track and respond to dynamic sequences.
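As a minimal sketch of how an occupancy grid could be turned into tokens for a spatial attention mechanism, the example below splits a small binary grid into non-overlapping patches and applies single-head scaled dot-product attention across them. The grid size, patch size, and the helper names `make_grid`, `to_patches`, and `self_attention` are hypothetical, and the attention uses identity projections instead of learned weights, so it illustrates the mechanism only and not the proposed model.

```python
import math

def make_grid(size, obstacles):
    """Build a size x size binary occupancy grid; obstacles are (row, col) cells."""
    grid = [[0.0] * size for _ in range(size)]
    for r, c in obstacles:
        grid[r][c] = 1.0
    return grid

def to_patches(grid, patch):
    """Split the grid into non-overlapping patch x patch tiles, each flattened to a token."""
    size = len(grid)
    tokens = []
    for r0 in range(0, size, patch):
        for c0 in range(0, size, patch):
            tokens.append([grid[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens (assumption: no learned projections)."""
    d = len(tokens[0])
    out, weights = [], []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return out, weights

# Hypothetical 8x8 grid with three occupied cells
grid = make_grid(8, [(0, 0), (0, 1), (5, 6)])
tokens = to_patches(grid, patch=4)          # 4 tokens, each of length 16
attended, weights = self_attention(tokens)  # one attention row per patch
```

A trained model would replace the identity projections with learned query, key, and value matrices and would add positional embeddings so that each patch carries its location in the grid.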
Our research aims to make a significant advancement in autonomous navigation by innovatively applying both spatial and temporal transformers within a reinforcement learning framework. The first key innovation is the application of a spatial transformer in our model. This is a crucial development as it allows for more effective extraction of spatial information. This enhanced spatial understanding is vital for navigating complex and dynamic environments, allowing the model to make more accurate and reliable decisions based on a nuanced understanding of its surroundings. The second major innovation involves the incorporation of a temporal transformer. This element is vital for the model to make informed decisions based on a series of actions and states over time, predicting future scenarios and making strategic decisions that consider the trajectory of environmental changes.
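To illustrate how a temporal transformer could condition a decision on a history of observations, the sketch below adds a sinusoidal time encoding to a short sequence of frame embeddings and applies attention in which each time step sees only itself and earlier steps. The causal masking, the identity projections, the toy 4-dimensional embeddings, and the helper names `add_time_encoding` and `causal_attention` are all assumptions made for illustration; the proposal does not specify these details.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def add_time_encoding(frames):
    """Add a sinusoidal encoding of the time index to each frame embedding."""
    d = len(frames[0])
    encoded = []
    for t, frame in enumerate(frames):
        enc = []
        for i, x in enumerate(frame):
            angle = t / (10000 ** (2 * (i // 2) / d))
            # Even dimensions use sine, odd dimensions use cosine
            enc.append(x + (math.sin(angle) if i % 2 == 0 else math.cos(angle)))
        encoded.append(enc)
    return encoded

def causal_attention(frames):
    """Scaled dot-product attention where step t attends only to steps 0..t."""
    d = len(frames[0])
    outputs, weights = [], []
    for t, q in enumerate(frames):
        visible = frames[: t + 1]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        w = softmax(scores)
        # Pad with zeros so every row has one weight per time step
        weights.append(w + [0.0] * (len(frames) - t - 1))
        outputs.append([sum(wj * v[i] for wj, v in zip(w, visible)) for i in range(d)])
    return outputs, weights

# Hypothetical history of three frame embeddings (for example, pooled spatial features)
history = [[0.0, 1.0, 0.0, 0.0],
           [0.0, 0.5, 0.5, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
encoded = add_time_encoding(history)
context, weights = causal_attention(encoded)
```

The last row of `context` summarizes the whole history and is the kind of vector a policy head could consume when choosing the next action.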
We will conduct a comprehensive series of experiments to rigorously evaluate the performance of our transformer-based model in a variety of simulated environments. The purpose of these simulations will be to benchmark our model against traditional machine learning models commonly used in reinforcement learning. A primary objective of these experiments is to showcase the enhanced navigation efficiency and decision-making accuracy of our model, especially in comparison to conventional RL models. Our experiments are aimed at highlighting not just quantitative improvements in navigation and decision-making, but also qualitative advancements in traffic safety management. 
[1]: Vaswani, Ashish, et al. "Attention is all you need", Advances in Neural Information Processing Systems, 30 (2017).
[2]: Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", arXiv preprint arXiv:2010.11929, (2020).
[3]: Arnab, Anurag, et al. "ViViT: A Video Vision Transformer", Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
[4]: Raghu, Maithra, et al. "Do Vision Transformers See Like Convolutional Neural Networks?", Advances in Neural Information Processing Systems, 34 (2021), pp. 12116-12128.
[5]: Han, Kai, et al. "A Survey on Vision Transformer", IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:1 (2022), pp. 87-110.
[6]: Hussein, Ahmed, et al. "Imitation Learning: A Survey of Learning Methods", ACM Computing Surveys (CSUR), 50:2 (2017), pp. 1-35.
[7]: Yurtsever, Ekim, et al. "Integrating Deep Reinforcement Learning with Model-Based Path Planners for Automated Driving", 2020 IEEE Intelligent Vehicles Symposium (IV), 2020.
[8]: Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv preprint arXiv:1810.04805 (2018).
[9]: Chen, Lili, et al. "Decision Transformer: Reinforcement Learning via Sequence Modeling", Advances in Neural Information Processing Systems, 34 (2021), pp. 15084-15097.
[10]: Mukadam, Mustafa, et al. "Tactical Decision Making for Lane Changing with Deep Reinforcement Learning", 31st Conference on Neural Information Processing Systems (NIPS), 2017.
[11]: Pfeiffer, Mark, et al. "A Data-driven Model for Interaction-aware Pedestrian Motion Prediction in Object Cluttered Environments", 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018.
[12]: Huegle, Maria, et al. "Dynamic Input for Deep Reinforcement Learning in Autonomous Driving", 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.    
    Description
    
    Timeline
    
    Strategic Description / RD&T
    In advancing autonomous vehicle technology, our primary objective is to significantly enhance vehicle safety with a novel vehicle planning and control strategy. By integrating occupancy grids within a transformer-based reinforcement learning framework, we focus on developing a model that excels at detecting and responding to environmental hazards in complex traffic scenarios. This aligns with the U.S. DOT focus on safe technology (page 19), transformative novel automation (pages 50 and 60), and AI and machine learning (page 57 and elsewhere) in the 2022-2026 Research, Development, and Technology Strategic Plan.
The model will be finely tuned to prioritize safety, adeptly handling scenarios that test its ability to prevent collisions and adapt to sudden environmental changes. This approach aims to establish an autonomous driving system that not only navigates efficiently but also sets a new standard in vehicular safety, contributing to the reduction of road accidents and ensuring safer roadways.
In conclusion, our research plan aligns with the research priorities of data-driven system safety, with a particular emphasis on safe driving technology development. We strive to achieve safer and more human-friendly autonomous driving, contributing to the advancement of autonomous vehicle technology and its successful integration into the transportation ecosystem.
    Deployment Plan
    The research envisioned for the project will not immediately lead to deployment. However, we have identified project advisors from government and industry. These advisors will participate in review meetings with our team at least twice during the project and provide comments and advice covering both technical considerations and issues related to the relevance of the research and the potential for its use in future production vehicles and vehicle automation systems.
    Expected Outcomes/Impacts
    This research anticipates several significant outcomes and impacts in the field of autonomous systems, robotics, and machine learning, particularly through the integration of reinforcement learning with transformer-based architectures in understanding and navigating spatial-temporal environments.
A primary outcome of this research is the advancement in spatial-temporal understanding within autonomous driving systems. Leveraging transformers, renowned for their effectiveness in sequence-to-sequence tasks, in combination with occupancy grids, the model is expected to demonstrate superior performance in interpreting complex environmental data and navigating in complicated traffic scenarios. This enhanced understanding is vital for critical tasks such as autonomous navigation, environment mapping, and dynamic obstacle avoidance, with a significant emphasis on safety. To illustrate the practical applications of the research, we will present case studies and demonstrations of the model in various scenarios, including simulations of autonomous navigation, obstacle avoidance, and decision-making, with safety as a central evaluation parameter. 
Furthermore, beyond its immediate application, the research contributes to the broader field of machine learning. It offers insights into the capabilities and limitations of current transformer models when applied to novel domains. This contribution is especially valuable in the ongoing discourse in deep learning research, shedding light on the intersection of reinforcement learning, neural network architectures, and safety in autonomous systems. The emphasis on safety in this research is expected to set new benchmarks for future developments, driving the creation of more reliable and secure autonomous systems.
    Expected Outputs
    1 The findings, methodologies, and analyses will be comprehensively documented in a research paper. This paper will detail the architectural nuances of the transformer model, the implementation of reinforcement learning techniques, the use of occupancy grids, and the experimental setups and results. It will serve as a valuable resource for researchers and practitioners in related fields.
2 To facilitate further research and allow for replication and extension of the study, the source code of the model implementation, along with detailed guides and documentation, will be made available. This open-source contribution will support the academic and research community in exploring and advancing this domain.
3 To showcase the practical applications of our research and facilitate its testing and refinement, a series of case studies along with tailored simulation environments will be provided.
    TRID
    This research aims to revolutionize autonomous systems, robotics, and machine learning by integrating reinforcement learning with transformer architectures for enhanced spatial-temporal navigation and understanding. The primary innovation lies in using transformers, known for their sequence-to-sequence proficiency, combined with occupancy grids, to improve complex environmental data interpretation. This is crucial for autonomous navigation, environment mapping, and dynamic obstacle avoidance, with a strong focus on safety.
We selected four search terms that seemed to capture the relevant information in the TRID projects database: “vehicle control machine learning” (32 records), “reinforcement learning vehicle” (19 records), and “deep learning vehicle” (41 records). The search results are shown in the attached document. After discarding projects that were related to infrastructure, traffic control, maintenance, or strictly perception (marked with a dash in the search results) and after identifying duplicate projects among the four searches, we found twenty-six projects of potential relevance (marked A-Z in the search results) and identified twelve as being the most relevant (marked D, G, J, K, L, N, R, S, T, AA, BB, CC).
D Automated Lane Change and Robust Safety. Covers only the lane change maneuver plan, not low level control, in a CAV setting with safety and fault/failure issues.
G Lane Changing of Autonomous Vehicles in Mixed Traffic Environments: A Reinforcement Learning Approach.  Trajectory planning and tracking for lane change maneuver.
J Learning to Drive Autonomous. RL-based longitudinal and lateral control for cooperative adaptive cruise control in an exclusive transit bus lane.
K Development of machine-learning models for autonomous vehicle decisions on weaving sections of freeway ramps. Study driver decision making before changing lanes into/out of a weaving section and implement AV lane change decision and maneuver algorithms.
L Development of a Data-Driven Optimal Controller Based on Adaptive Dynamic Programming.  Controllers for platoon dynamics and car following.
N Vision-based Navigation of Autonomous Vehicle in Roadway Environments With Unexpected Hazards. End-to-end DNN driving augmented with object detection and semantic segmentation.
R Development of AI-based and control-based systems for safe and efficient operations of connected and autonomous vehicles.  Local and cooperative sensing with DRL for an end-to-end automation in variable traffic densities, including mitigating stop and go issues and collision avoidance issues.
S Accelerated Training for Connected and Automated Vehicles Based on Adaptive Evaluation Method. A mixed naturalistic data and RL-based training mechanism to improve training efficiency.
T Promoting CAV Deployment by Enhancing the Perception Phase of the Autonomous Driving Using Explainable AI. Developing an explainable end-to-end driving controller using deep learning. Does not deal with planning for complicated scenarios.
AA Control of Connected and Autonomous Vehicle’s for Congestion Reduction in Mixed Traffic: A Learning-Based Approach. This work focuses on traffic congestion and throughput, and is primarily concerned with velocity and flow.
BB Enhancing Traffic Safety and Connectivity: A Data Driven Multi-step-Ahead Vehicle Headway Prediction Leveraging High-Resolution Vehicle Trajectories. Similar to AA, this work focuses on traffic flow based on headway prediction and control.
CC Hierarchical Decision Making and Control in RL-based Autonomous Driving for Improved Safety in Complex Traffic Scenarios.  This PI’s 2013-2014 project, addressing a similar problem in complex scenario behavior generation and vehicle control but using a different approach.
    Individuals Involved
    
        
            
    | Email | Name | Affiliation | Role | Position |
    | ozguner.1@osu.edu | Ozguner, Umit | The Ohio State University | Other | Other |
    | redmill.1@osu.edu | Redmill, Keith | The Ohio State University | PI | Faculty - Research/Systems |
    | yurtsever.2@osu.edu | Yurtsever, Ekim | The Ohio State University | Co-PI | Other |
    
    Budget
    
    Amount of UTC Funds Awarded
 $
    
Total Project Budget (from all funding sources)
 $158,814.00
    
Documents
    
        
    
    Match Sources
    
        No match sources!
    
    Partners
    
        
            
                
    | Name | Type |
    | DriveOhio | Deployment Partner |