Abstract
    Traffic congestion costs American cities tens of billions of dollars per year, in addition to its negative impact on the environment and on people’s mental health. Novel Markov game models and advanced reinforcement learning algorithms hold the promise of drastically alleviating congestion through dynamic coordination of traffic signals and adaptive re-routing of traffic. This project involves a collaboration with Econolite, a leading provider of traffic management systems.
    Description
    Traffic congestion in cities can in part be attributed to inefficient coordination across intersections. While existing solutions attempt to locally optimize the operation of traffic signals, coordinating these decisions across large numbers of intersections could lead to significant reductions in congestion. Doing so effectively, however, requires moving beyond traditional techniques and developing adaptive models that reflect the complex nature of traffic, including the behavior of diverse users: not only car drivers but also pedestrians and cyclists. We propose to develop Markov game models and multi-agent reinforcement learning (MARL) algorithms to enhance multi-agent coordination in transportation systems. In contrast to prior work in this area, we will evaluate our techniques on large-scale, high-fidelity models to be developed in collaboration with Econolite.
MARL has shown great promise in dealing with challenging sequential decision-making problems such as train scheduling and cyber defense. Many problems in transportation systems, including traffic light control and dynamic traffic re-routing, naturally involve making a series of decisions over time to adapt to traffic conditions, which makes MARL particularly well suited to them. We plan to model these problems as Markov games in which the traffic light controller at each intersection is modeled as an agent. These agents need to coordinate with each other to achieve global efficiency. We seek to combine deep MARL approaches based on graph neural networks with distributed hierarchical training to enable large-scale learning that incorporates problem structures intrinsic to transportation (e.g., road networks).
We will leverage our past experience in designing deep RL and MARL algorithms that lead to interpretable strategies (Topin et al. 2021) or strategies robust to uncertainties (Xu et al. 2021, Li et al. 2019) and that can train a large number of agents (Long et al. 2020), as well as in applying MARL to human patrol planning to combat poaching (Wang et al. 2019) and to cooperative sensor communication (Wu et al. 2021).
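To make the modeling idea concrete, the following is a minimal sketch of a cooperative Markov game in which each intersection controller is an agent. It is a toy illustration only, not the project's model: the class names, the two-phase action set, the `downstream` link map, the fixed `service_rate`, and the longest-queue baseline policy are all assumptions introduced here. The shared reward (negative total queue length) is one simple way to express that agents are judged on network-wide efficiency rather than on their own intersection.

```python
from dataclasses import dataclass, field

NS_GREEN, EW_GREEN = 0, 1  # hypothetical two-phase action set for every agent


@dataclass
class Intersection:
    queue_ns: int = 0  # vehicles waiting on the north-south approach
    queue_ew: int = 0  # vehicles waiting on the east-west approach


@dataclass
class TrafficMarkovGame:
    intersections: dict                             # agent id -> Intersection
    downstream: dict = field(default_factory=dict)  # agent id -> agent id receiving its NS outflow
    service_rate: int = 2                           # vehicles released per step on a green approach

    def observe(self, agent_id):
        """Local observation: the agent sees only its own two queues."""
        node = self.intersections[agent_id]
        return (node.queue_ns, node.queue_ew)

    def step(self, joint_action, arrivals):
        """Advance one step given every agent's phase and the external arrivals."""
        # 1. Vehicles entering the network from outside join the queues.
        for agent_id, (ns, ew) in arrivals.items():
            node = self.intersections[agent_id]
            node.queue_ns += ns
            node.queue_ew += ew

        # 2. Each agent discharges the approach that has the green phase.
        transfers = {}
        for agent_id, node in self.intersections.items():
            if joint_action[agent_id] == NS_GREEN:
                served = min(node.queue_ns, self.service_rate)
                node.queue_ns -= served
                nxt = self.downstream.get(agent_id)
                if nxt is not None:  # released vehicles travel to the next intersection
                    transfers[nxt] = transfers.get(nxt, 0) + served
            else:
                served = min(node.queue_ew, self.service_rate)
                node.queue_ew -= served

        # 3. Transferred vehicles arrive downstream, coupling the agents' dynamics.
        for agent_id, count in transfers.items():
            self.intersections[agent_id].queue_ns += count

        # Shared reward: every agent receives the negative network-wide queue length.
        total_queue = sum(n.queue_ns + n.queue_ew for n in self.intersections.values())
        rewards = {a: -total_queue for a in self.intersections}
        observations = {a: self.observe(a) for a in self.intersections}
        return observations, rewards


def longest_queue_policy(observation):
    """Hand-written baseline: give green to the longer queue (ties go to NS)."""
    queue_ns, queue_ew = observation
    return NS_GREEN if queue_ns >= queue_ew else EW_GREEN
```

In this sketch a learned MARL policy would take the place of `longest_queue_policy`. Because one agent's outflow becomes another agent's inflow, a purely local rule can push congestion downstream, which is the coordination problem the shared reward is meant to capture.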
Our work will address three critical research questions that have thus far been insufficiently explored in the literature on deep learning for transportation: (1) How do we automatically learn optimal yet scalable strategies for large-scale multi-agent coordination? (2) How do we better incorporate intrinsic problem structure and environmental uncertainty into our algorithms? (3) How do we make the resulting systems more usable by accounting for human behavior and preferences?
Much work on designing algorithms for transportation has relied on simulated testbeds, yet these models have often been very coarse. To pave the way to real-world deployment, we will work with Econolite, a provider of smart transportation solutions whose signal control solutions manage over two-thirds of intersections nationwide. Econolite has expressed interest in sharing historical data and providing us with access to high-fidelity models it has developed. In addition, as a leading provider of transportation infrastructure solutions in the United States, in particular traffic control solutions, Econolite would be an ideal partner for piloting our solutions in the field once they are ready.
    Timeline
    This project will be organized in four phases:
1. Data analysis and simulation environment development (July – December 2022): Research on building/adapting simulation models and functionality based on historical data provided by Econolite and possibly additional data from publicly available sources.
2. Solution development I (July – October 2022): Research on modeling the problems of interest as Markov games.
3. Solution development II (November 2022 – February 2023): Research on developing MARL algorithms for the Markov game models.
4. Evaluation in the simulation environment (March 2023 – June 2023): Evaluation of the solutions through testing in the developed simulation environments. Continued improvement of the MARL algorithms based on the evaluation results.
In addition to the above four phases, we will work with Econolite (and possibly with the Traffic Management Center of Cranberry Township) to design a pilot study towards the end of the project (March 2023 – June 2023).
    Strategic Description / RD&T
    
    Deployment Plan
    The data analytics, modeling, and algorithm development will be led by Prof. Fei Fang and Prof. Norman Sadeh from CMU’s School of Computer Science. The work will be carried out with one Ph.D. student, Rex Chen (also from CMU’s School of Computer Science). This project will also leverage funding from the Tang Family Endowed Innovation fund for developing scalable MARL algorithms.
The analysis of data and the development of the simulation environment will be led by the CMU team but will involve substantial effort from Econolite to make sure the simulation environment is close to the real-world scenario.
The design of the pilot study will be led jointly by CMU and Econolite. It involves the selection of timing and intersections to be tested, as well as the evaluation metrics. If the proposed research is successful, we will continue working with Econolite to run a small-scale pilot study in the field, followed by larger deployment plans. The research team at CMU will provide technical support and strategic guidance as part of the deployment plan, and will be engaged in the data collection and analysis efforts for impact assessment.
    Expected Outcomes/Impacts
    The overall objective of the proposed research is to develop models and algorithms for problems in transportation systems that require adaptive coordination, including traffic light control and alternative route suggestion. The goal of this project is to take a solid step towards that objective through principled research and to demonstrate its potential impact using real-world data and the simulation environment built together with our industry partner. Specifically, the results will be evaluated through the following two metrics:
(1) Impact on traffic in the simulation environment. We will quantify the effect of the strategies learned through the proposed MARL algorithm on traffic in simulation, e.g., total queuing time at the intersections, average travel time, number of vehicles on the road, and estimates of travel cost reductions.
(2) Potential impact on traffic in the real world. Since there is an inevitable simulation-to-real gap, we will try to understand the potential impact of the learned strategies in the real world by running sensitivity analysis: testing the strategies in simulation environments with perturbed parameters. 
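The two metrics above can be illustrated with a small sketch: compute a traffic metric (here, total queuing time) for a fixed strategy, then re-run the evaluation with perturbed parameters to see how the metric degrades. Everything in it is an assumption made for illustration: the single-intersection fluid queue stands in for the high-fidelity simulator, the fixed `green_split` stands in for a learned strategy, and the function names, default capacity, and demand scaling factors are hypothetical.

```python
def total_queuing_time(green_split, arrival_ns, arrival_ew, capacity=1.0, steps=100):
    """Toy fluid-queue model of one intersection (a stand-in for a real simulator).

    green_split is the fraction of capacity given to the north-south approach.
    Returns the queue length summed over all steps, i.e. total vehicle-steps
    spent waiting.
    """
    queue_ns = queue_ew = 0.0
    total = 0.0
    for _ in range(steps):
        # Each approach gains its arrivals and loses its share of the capacity.
        queue_ns = max(0.0, queue_ns + arrival_ns - green_split * capacity)
        queue_ew = max(0.0, queue_ew + arrival_ew - (1.0 - green_split) * capacity)
        total += queue_ns + queue_ew
    return total


def demand_sensitivity(green_split, arrival_ns, arrival_ew,
                       scales=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Re-evaluate the same strategy with demand scaled by each perturbation factor."""
    return {
        scale: total_queuing_time(green_split, arrival_ns * scale, arrival_ew * scale)
        for scale in scales
    }


if __name__ == "__main__":
    # Hypothetical nominal demand: 0.45 vehicles per step on each approach.
    for scale, metric in demand_sensitivity(0.5, 0.45, 0.45).items():
        print(f"demand x{scale}: total queuing time = {metric:.1f}")
```

A strategy whose metric stays flat across small perturbations and then rises sharply past some threshold is only safe inside the range where the simulator's parameters are trusted. With a real simulator, the same loop would perturb calibrated quantities such as demand, turning ratios, or driver reaction times.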
In addition, we will seek feedback from stakeholders, including our collaborators at Econolite and at the Traffic Management Center of Cranberry Township.
    Expected Outputs
    
    TRID
    
    Individuals Involved
    
        
            
    | Email | Name | Affiliation | Role | Position |
    | rexc@andrew.cmu.edu | Chen, Rex | Carnegie Mellon University | Other | Student - PhD |
    | monikade@andrew.cmu.edu | DeReno, Monika | CMU | Other | Staff - Business Manager |
    | feifang@cmu.edu | Fang, Fei | Carnegie Mellon University | PI | Faculty - Untenured, Tenure Track |
    | sadeh@cs.cmu.edu | Sadeh, Norman | Carnegie Mellon University | Co-PI | Faculty - Tenured |
            
    
    Budget
    
    Amount of UTC Funds Awarded
    $100,000.00
    
    Total Project Budget (from all funding sources)
    $200,000.00
    
    Documents
    
        
            
                
    | Type | Name | Uploaded |
    | Data Management Plan | DataManagementPlan_v2.docx | Nov. 21, 2021, 6:59 p.m. |
    | Publication | 2022___ATT___The_Real_Deal__RL_for_TSC__0TeQ3pE.pdf | Oct. 2, 2022, 7:51 p.m. |
    | Presentation | traffic_signal_control_slides_5U5aS8g.pptx | Oct. 2, 2022, 7:51 p.m. |
    | Progress Report | 407_Progress_Report_2022-09-30 | Oct. 2, 2022, 7:51 p.m. |
    | Progress Report | 407_Progress_Report_2023-03-30 | March 26, 2023, 4:48 p.m. |
    | Publication | MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning | March 30, 2023, 6:16 a.m. |
    | Publication | Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation | March 30, 2023, 6:17 a.m. |
    | Publication | The Real Deal: A Review of Challenges and Opportunities in Moving Reinforcement Learning-Based Traffic Signal Control Systems Towards Reality | March 30, 2023, 6:17 a.m. |
    | Publication | Robust reinforcement learning as a stackelberg game via adaptively-regularized adversarial training | March 30, 2023, 6:18 a.m. |
    | Publication | A survey of explainable reinforcement learning | March 30, 2023, 6:19 a.m. |
    | Publication | Signal instructed coordination in cooperative multi-agent reinforcement learning | March 30, 2023, 6:19 a.m. |
    | Publication | Cooperative communication between two transiently powered sensor nodes by reinforcement learning | March 30, 2023, 6:20 a.m. |
    | Publication | Monte Carlo Forest Search: UNSAT Solver Synthesis via Reinforcement learning | April 10, 2023, 9:33 p.m. |
    | Final Report | Final_Report_-_407.pdf | July 7, 2023, 10:30 a.m. |
                    
                
            
        
    
    Match Sources
    
        No match sources!
    
    Partners
    
        
            
                
    | Name | Type |
    | Econolite | Deployment Partner |