Final Report - 497 (Final Report)
- Download File
- CMUSafety21_Project497_OccupancyGridsRLControl_FinalReport-20250518.pdf
- Description
- Autonomous driving in complex traffic environments demands decision-making systems that can interpret diverse sensory inputs, anticipate multi-agent interactions, and execute safe and efficient maneuvers. Roundabouts present a particularly challenging case due to continuous merging, yielding, and exiting maneuvers under heterogeneous and often unpredictable traffic flows. This study addresses these challenges by designing and evaluating a decision-making framework capable of handling high-complexity roundabout scenarios.
We analyze the intrinsic complexity of roundabout navigation by quantifying interaction density, conflict points, and decision latency under varying traffic densities, which yields a structured benchmark for evaluating policy robustness. To model the environment, we adopt multi-layer occupancy grids as the primary spatial representation; they densely encode occupancy, velocities, and road geometry. The backbone combines a CNN-based spatial encoder, which captures local spatial patterns, with a transformer module for temporal abstraction, enabling the model to reason over both spatial and temporal dependencies in traffic flow.
Building on this foundation, we propose the Uncertainty Weighted Decision Transformer (UWDT) to improve decision making, safety, and efficiency in rare, high-risk situations. Unlike standard Decision Transformers, which optimize a uniform sequence prediction loss, UWDT incorporates an uncertainty-weighted objective that increases the learning signal for states with higher policy or value uncertainty. Simulation results show that UWDT, combined with the spatial encoder, achieves lower collision rates and shorter traversal times than baseline Decision Transformers, Behavior Cloning (BC) Transformers, and conventional deep reinforcement learning agents such as Soft Actor-Critic (SAC) and Conservative Q-Learning (CQL). In particular, UWDT exhibits greater resilience in rare but safety-critical states, effectively balancing assertiveness with caution.
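The idea of an uncertainty-weighted objective can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the function names, the linear weighting `1 + alpha * u`, the normalization to mean 1, and the use of mean squared error on actions are hypothetical and are not taken from the report. The per-step uncertainty could come, for example, from ensemble disagreement or predictive entropy.

```python
import numpy as np

def uncertainty_weights(uncertainty, alpha=1.0):
    """Map per-step uncertainty to loss weights with mean 1.

    Hypothetical weighting: w_t is proportional to 1 + alpha * u_t,
    so alpha = 0 recovers uniform weights.
    """
    u = np.asarray(uncertainty, dtype=float)
    w = 1.0 + alpha * u
    return w / w.mean()

def uncertainty_weighted_loss(pred_actions, target_actions, uncertainty, alpha=1.0):
    """Weighted action-prediction loss over one trajectory.

    pred_actions, target_actions: arrays of shape (T, action_dim).
    uncertainty: array of shape (T,), non-negative uncertainty per step.
    """
    pred = np.asarray(pred_actions, dtype=float)
    target = np.asarray(target_actions, dtype=float)
    # Per-step squared error, averaged over action dimensions: shape (T,)
    per_step = ((pred - target) ** 2).mean(axis=-1)
    w = uncertainty_weights(uncertainty, alpha)
    return float((w * per_step).mean())

# Two-step toy trajectory: the second step is both wrong and uncertain.
pred = [[0.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [2.0, 2.0]]
uniform = uncertainty_weighted_loss(pred, target, [0.0, 1.0], alpha=0.0)
weighted = uncertainty_weighted_loss(pred, target, [0.0, 1.0], alpha=1.0)
```

In this toy case the weighted loss exceeds the uniform one because the high-error step is also the high-uncertainty step, which is the effect the abstract describes as increasing the learning signal for uncertain states.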
- Citation
- Publication Date
- None