Nearly a quarter of the roughly 1.35 million annual traffic fatalities worldwide involve pedestrians. In the US, the number of pedestrian deaths increased from 4,280 in 2010 to an estimated 7,508 in 2019, a 75% increase. Roughly one-quarter of traffic fatalities and about one-half of all traffic injuries in the US occur at intersections, where complex interactions between vehicles, pedestrians, motorcycles, and cyclists take place. This proposal aims to increase the safety of vulnerable road users (VRUs) such as pedestrians, cyclists, and scooter riders, specifically at intersections.
One approach to increasing traffic safety is to bring more autonomy into vehicles, with the goal of avoiding human-specific problems such as distraction and drunk driving. Towards this goal, computer vision algorithms are applied to data from vision sensors such as RGB cameras and LiDAR to detect other objects in the scene. Most of these algorithms perform well at detecting larger objects such as vehicles but less well at detecting pedestrians and other VRUs. Another challenge is that, whereas pedestrians can communicate with human drivers through a plethora of verbal and non-verbal cues, interaction with autonomous vehicles is a nascent concept. Further, the performance of deep learning-based vision techniques can degrade significantly in challenging conditions such as low light, sun glare, and severe occlusions, e.g., a pedestrian darting out from between two parked cars.
In line with the US DOT’s vision for connected and autonomous vehicles, this proposal aims to develop and evaluate computer vision solutions that increase VRU safety by using C-V2X capabilities. Our proposed Connected Vision for Increased Pedestrian Safety (CVIPS) system is illustrated schematically in Fig. 1 (see supplement). Each agent extracts pedestrian location and trajectory information using a video vision transformer and communicates that information to the other participating agents using a unified representation such as the Bird’s Eye View (BEV).
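As a minimal sketch of how detections from different agents could be placed in a common BEV frame, the Python below assumes that each agent knows its own 2D pose (position and heading) in a shared intersection frame and reports pedestrian positions on the ground plane in its own frame. `AgentPose`, `local_to_world`, `world_to_bev_cell`, the grid origin, and the 0.5 m cell size are hypothetical choices for illustration, not the CVIPS design. The same transform also shows why camera location errors matter: a heading error rotates every reported point about the agent.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentPose:
    """Hypothetical 2D pose of a camera-equipped agent in a shared intersection frame."""
    x: float    # metres along the world x-axis
    y: float    # metres along the world y-axis
    yaw: float  # heading in radians, counter-clockwise from the world x-axis


def local_to_world(pose: AgentPose, px: float, py: float) -> tuple[float, float]:
    """Rotate a ground-plane point by the agent's heading, then translate by its position."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return pose.x + c * px - s * py, pose.y + s * px + c * py


def world_to_bev_cell(wx: float, wy: float,
                      origin: tuple[float, float] = (-50.0, -50.0),
                      cell_size: float = 0.5) -> tuple[int, int]:
    """Return the (row, col) of a BEV grid whose lower-left corner sits at `origin`."""
    col = math.floor((wx - origin[0]) / cell_size)
    row = math.floor((wy - origin[1]) / cell_size)
    return row, col


if __name__ == "__main__":
    vehicle = AgentPose(x=10.0, y=0.0, yaw=math.pi / 2)  # heading "north"
    wx, wy = local_to_world(vehicle, 5.0, 0.0)           # pedestrian 5 m straight ahead
    print(world_to_bev_cell(wx, wy))                     # (110, 120)

    # Sensitivity to a 1-degree heading error for a pedestrian 30 m ahead.
    true_pt = local_to_world(AgentPose(0.0, 0.0, 0.0), 30.0, 0.0)
    noisy_pt = local_to_world(AgentPose(0.0, 0.0, math.radians(1.0)), 30.0, 0.0)
    print(round(math.dist(true_pt, noisy_pt), 3))        # 0.524
```

In this toy setup, a vehicle at (10, 0) heading north places a pedestrian 5 m ahead at world (10, 5), and a 1° heading error displaces a pedestrian 30 m away by about 0.52 m, the kind of pose sensitivity the GNSS study would need to quantify.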
An example of the benefits of CVIPS is shown schematically in Fig. 2 (in the supplement), which depicts a scenario with four vehicle cameras and one infrastructure (e.g., traffic light) camera. Here a pedestrian is not visible to two of the vehicle cameras but is visible to another vehicle camera and to the infrastructure camera.
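The benefit in this scenario can be illustrated with a simple late-fusion sketch: each agent sends pedestrian positions already expressed in the shared world/BEV frame, and the receiver merges detections that fall within a distance threshold. The greedy nearest-centroid rule, the 1 m radius, the agent names, the positions, and the `fuse_detections` helper are all illustrative assumptions, not the proposed vision transformer pipeline; a practical system would likely use an optimal assignment (e.g., the Hungarian algorithm) and account for localization uncertainty.

```python
import math


def fuse_detections(reports: dict[str, list[tuple[float, float]]],
                    radius: float = 1.0) -> list[tuple[tuple[float, float], list[str]]]:
    """Greedy late fusion of world-frame pedestrian detections from several agents.

    A detection joins the first existing cluster whose centroid lies within `radius`
    metres; otherwise it starts a new cluster. Each fused pedestrian is returned as
    (centroid, sorted list of agents that observed it).
    """
    clusters: list[dict] = []
    for agent_id, detections in reports.items():
        for x, y in detections:
            for cluster in clusters:
                cx, cy = cluster["centroid"]
                if math.hypot(x - cx, y - cy) <= radius:
                    cluster["points"].append((x, y))
                    cluster["agents"].add(agent_id)
                    pts = cluster["points"]
                    cluster["centroid"] = (sum(p[0] for p in pts) / len(pts),
                                           sum(p[1] for p in pts) / len(pts))
                    break
            else:  # no nearby cluster: treat as a newly observed pedestrian
                clusters.append({"points": [(x, y)], "agents": {agent_id},
                                 "centroid": (x, y)})
    return [(c["centroid"], sorted(c["agents"])) for c in clusters]


if __name__ == "__main__":
    # Invented positions: the pedestrian is occluded for veh1 and veh2, seen by veh3
    # and the infrastructure camera; veh4 reports nothing.
    reports = {
        "veh1": [], "veh2": [], "veh3": [(12.1, 3.0)], "veh4": [],
        "infra": [(11.9, 3.2)],
    }
    for centroid, agents in fuse_detections(reports):
        print([round(v, 2) for v in centroid], agents)  # [12.0, 3.1] ['infra', 'veh3']
```

The pedestrian that two vehicles cannot see still appears exactly once in the fused output, attributed to the two agents that observed it, so every participant can act on it.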
CVIPS relies on C-V2X connectivity, so it is important to understand how C-V2X parameters (e.g., bandwidth and latency) and limitations affect the data available to the multiple agents. For example, is there enough bandwidth to share full video frames, or should agents share only the bounding boxes of detected VRUs? Also, achieving a unified BEV representation requires knowledge of the locations of the participating cameras, and the impact of location errors (e.g., those caused by GNSS) needs to be studied. It is also important to consider challenging imaging conditions such as rain/snow, sun glare, and nighttime. Deep learning solutions can be highly demanding in storage and computation, so one of our goals is to develop lightweight implementations. Also, we know of no datasets that provide videos from multiple cameras covering the diverse range of pedestrian scenarios we propose to investigate. We therefore propose to generate synthetic data using the high-fidelity CARLA simulator. Initial algorithm development and evaluation will be based on synthetic data, but the best-performing algorithms will be tested on real data, to be collected after identifying relevant driving scenarios and obtaining the necessary IRB approvals.
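The full-frame versus bounding-box question can be framed with back-of-the-envelope arithmetic. The sketch below assumes uncompressed 1920×1080 RGB frames at 10 fps and a hypothetical 24-byte detection message (four float32 box coordinates, a float32 confidence, and a uint32 track ID); the resolution, frame rate, message layout, and pedestrian count are all illustrative. Real video would be compressed, which lowers the frame-sharing figure substantially, so the raw rate is an upper bound to compare against the capacity of whichever C-V2X configuration is under study.

```python
def raw_video_bps(width: int, height: int, channels: int = 3,
                  bits_per_channel: int = 8, fps: float = 10.0) -> float:
    """Bit rate for sharing uncompressed frames (an upper bound; codecs reduce it)."""
    return width * height * channels * bits_per_channel * fps


def detections_bps(num_objects: int, bytes_per_object: int = 24,
                   fps: float = 10.0) -> float:
    """Bit rate for sharing only detections.

    24 bytes is a hypothetical per-object layout: 4 x float32 box coordinates,
    1 x float32 confidence, 1 x uint32 track ID.
    """
    return num_objects * bytes_per_object * 8 * fps


if __name__ == "__main__":
    video = raw_video_bps(1920, 1080)   # 497,664,000 bit/s
    boxes = detections_bps(20)          # 38,400 bit/s for 20 pedestrians
    print(f"raw video : {video / 1e6:.1f} Mbit/s")  # 497.7 Mbit/s
    print(f"detections: {boxes / 1e3:.1f} kbit/s")  # 38.4 kbit/s
    print(f"ratio     : {video / boxes:,.0f}x")     # 12,960x
```

Under these assumptions, sharing detections needs roughly four orders of magnitude less bandwidth than sharing raw frames, at the cost of discarding image evidence that a receiving agent could otherwise re-analyze.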
The CVIPS project consists of the following major research tasks: (1) creating synthetic image sequences for VRU scenarios, (2) acquiring real data after IRB approval, (3) developing and testing baseline deep learning algorithms for pedestrian detection and pedestrian trajectory estimation, (4) investigating the impact of C-V2X parameters on the accuracy of pedestrian detection and trajectory estimation, (5) evaluating the developed algorithms under challenging imaging conditions, and (6) quantifying the pedestrian safety gains achieved through CVIPS.
Strategic Description / RD&T
The US DOT RD&T Plan states that the aim is to “contribute to a future transportation system where transportation-related serious injuries and fatalities are eliminated” (Page 14) and identifies “Vulnerable road user safety” as a critical research topic (Page 15). The main goal of the proposed CVIPS project, increased pedestrian safety, is closely aligned with this US DOT aim. CVIPS pursues this goal by combining deep learning algorithms for pedestrian detection and pedestrian trajectory estimation with the C-V2X capabilities available at a connected intersection. CVIPS will initially focus on pedestrian safety and will then extend and adapt its algorithms to increase the safety of other VRUs (e.g., bicyclists, people in wheelchairs, scooter riders, and motorcyclists). This is consistent with the US DOT’s research priority of “Identify(ing) and support(ing) strategies to increase vulnerable road user safety (e.g., pedestrians, bicyclists, motorcyclists, and people with disabilities)” (Page 19).
As stated earlier, the number of pedestrian fatalities increased by 75% during the period 2010-2021, whereas the number of all other traffic fatalities increased by 25%. Also, according to https://smartgrowthamerica.org/dangerous-by-design/, “people of color, particularly Native and Black Americans are more likely to die while walking than any other race or ethnic group,” and people walking in lower-income areas are killed at far higher rates. By increasing pedestrian safety, CVIPS will help address the US DOT’s grand challenge of “Equitable Mobility for All” (Page 11).
July – September 2023
1. Brief: Develop a presentation that summarizes the goals, methodology, and expected outcomes of the CVIPS research project.
2. CVIPS Project web page: Develop a dedicated CVIPS project web page that will provide stakeholders, collaborators, and the public with information about the CVIPS project. This web page will communicate project objectives, approach, results and findings as the project progresses, serving as a central hub for project-related information.
October – December 2023
1. CVIPS Synthetic dataset: Using the high-fidelity CARLA simulator, generate synthetic image sequences that simulate vulnerable road users (VRUs) under different C-V2X conditions and scenarios, for use in algorithm development and testing.
2. CVIPS project Poster Presentation: Present a poster at the UTC Deployment Partner’s conference to communicate the CVIPS project motivation and goals and to invite potential deployment partners to learn more about the CVIPS project.
January – March 2024
1. Publication with baseline results & simulated dataset: Publish the results achieved with the synthetic dataset and different collaborative scenarios. This publication will introduce the CVIPS dataset, summarize the baseline algorithm performance, and serve as a benchmark for further advancements.
April – June 2024
1. Demos: Develop demonstrations of the CVIPS system on videos containing partially occluded pedestrians and other VRUs.
2. Final report: Summary of the CVIPS research project motivation, technical approach, methodology, results, and conclusions.
The following are the expected outcomes.
1. Creation of synthetic image and image sequence datasets (using CARLA) that include vehicles, pedestrians, and other VRUs at intersections equipped with connected cameras. These images will reflect normal operating conditions as well as challenging imaging conditions. After appropriate vetting, this dataset will be made publicly available to spur further research into the important topic of increasing pedestrian and VRU safety at urban intersections.
2. Deep learning-based pedestrian detection and trajectory estimation algorithms that offer increased robustness to challenging conditions such as occlusions, rain/snow, and sun glare. The developed machine learning algorithms will have the potential to increase pedestrian and VRU safety at urban intersections.
3. Quantification of the tradeoffs between multi-agent communication parameters (e.g., communication delays and data dropouts) and pedestrian and VRU detection and trajectory estimation accuracy. This quantification will help in understanding how the parameters of C-V2X systems affect the VRU safety improvements achievable through collaborative vision approaches.
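One simple analytic model, assumed here purely for illustration, links data dropouts to detection accuracy. If the ego agent sees a pedestrian with probability v_0, remote agent i sees it with probability v_i, and each remote message is independently lost with probability p, then the fused system misses the pedestrian only if every source fails, so P(detect) = 1 - (1 - v_0) * prod_i (1 - v_i * (1 - p)). The independence assumptions, the helper name, and the visibility values below are hypothetical; measured results from the synthetic dataset would replace them.

```python
def fused_detection_prob(ego_visibility: float,
                         remote_visibilities: list[float],
                         drop_prob: float) -> float:
    """Probability that at least one source delivers a detection of the pedestrian.

    Assumes independent visibility per agent and independent message loss with
    probability `drop_prob` on every remote link (the ego's own view is never dropped).
    """
    miss = 1.0 - ego_visibility
    for v in remote_visibilities:
        # The remote detection is useful only if the agent sees the pedestrian
        # AND its message survives the link.
        miss *= 1.0 - v * (1.0 - drop_prob)
    return 1.0 - miss


if __name__ == "__main__":
    # Ego vehicle fully occluded; one other vehicle (0.9) and one infrastructure
    # camera (0.95) can see the pedestrian. Sweep the dropout probability.
    for p in (0.0, 0.1, 0.5):
        print(p, fused_detection_prob(0.0, [0.9, 0.95], p))
```

With these illustrative numbers, the fused detection probability is 0.995 with a perfect link, about 0.972 at 10% dropout, and about 0.711 at 50% dropout, which is the kind of tradeoff curve this outcome would characterize empirically.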
These outcomes will bring closer the vision of connected vehicles that increase safety, particularly for vulnerable road users.
The anticipated outputs from this project are as follows.
• Synthetic image and image sequence datasets that include vehicles, pedestrians, and other VRUs at intersections equipped with connected cameras.
• Publications describing the deep learning algorithms for pedestrian detection and trajectory estimation in images and image sequences collected by multiple cameras at intersections, subject to C-V2X and V2X limitations.
• Software implementations of the deep learning algorithms.
• Technical report quantifying the tradeoffs between multi-agent communication parameters and pedestrian and VRU detection and trajectory estimation accuracy.
Two TRID searches were conducted. A search using the keywords "pedestrian, safety, vulnerable road users, computer vision" identified 6 documents from the last 5 years. Similarly, a search using the keywords "vulnerable road users, cooperative perception" identified 8 documents from the last 5 years.
The works identified in the area of VRU/pedestrian detection for safety focus on using a single perception unit, such as a roadside camera or an on-vehicle camera, whereas we propose to use data from multiple perception units to detect VRUs and predict their tracks. The closest work we could find to what we propose is [Shan, et al, https://trid.trb.org/view/1896511, October 2021], which proposes a general framework for using the final results of cooperative perception to detect vehicles and pedestrians. Our methods will focus on VRUs and are thus expected to outperform more general-purpose object detection approaches. Also, most works use off-the-shelf object detection methods such as YOLOv4, whereas we propose to develop a Vision Transformer approach for VRU detection and trajectory estimation.
Our proposed work differs from the state of the art in the following ways:
a. We will consider a variety of VRUs, including pedestrians, scooter riders, cyclists, motorcyclists, people in wheelchairs, etc.
b. We will develop new method(s) specifically designed for VRU detection & trajectory prediction using collaborative vision.
c. We will investigate the impact of C-V2X parameters (e.g., bandwidth and delay) on VRU detection and tracking performance.
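Delay matters for tracking because a shared position is already stale when it arrives. A minimal sketch, assuming synchronized clocks (e.g., GNSS time) and a constant-velocity motion model, timestamps each shared track state and extrapolates it to the receiver's current time; without compensation, the position error is simply speed times delay. `PedestrianMsg`, `extrapolate`, and `stale_error` are hypothetical names, and the 1.4 m/s walking speed is an illustrative value, not a CVIPS parameter. A learned trajectory predictor, as proposed, would take the place of the constant-velocity step.

```python
import math
from dataclasses import dataclass


@dataclass
class PedestrianMsg:
    """Hypothetical shared track state in the common world/BEV frame."""
    x: float      # position (m) at capture time
    y: float
    vx: float     # velocity estimate (m/s)
    vy: float
    t_obs: float  # capture timestamp (s); assumes synchronized clocks


def extrapolate(msg: PedestrianMsg, t_now: float) -> tuple[float, float]:
    """Constant-velocity prediction of where the pedestrian is at `t_now`."""
    dt = t_now - msg.t_obs
    return msg.x + msg.vx * dt, msg.y + msg.vy * dt


def stale_error(msg: PedestrianMsg, t_now: float) -> float:
    """Distance between the reported (stale) position and the extrapolated one."""
    px, py = extrapolate(msg, t_now)
    return math.hypot(px - msg.x, py - msg.y)


if __name__ == "__main__":
    msg = PedestrianMsg(x=0.0, y=0.0, vx=1.4, vy=0.0, t_obs=10.0)
    for delay in (0.1, 0.5):
        print(delay, round(stale_error(msg, 10.0 + delay), 2))
```

For a pedestrian walking at 1.4 m/s, a 100 ms delay leaves the reported position about 0.14 m behind, and a 500 ms delay about 0.7 m, which is why delay is one of the C-V2X parameters worth sweeping.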
|Carnegie Mellon University
|Faculty - Tenured
|Carnegie Mellon University
|Faculty - Tenured
|Carnegie Mellon University
|Student - PhD
Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)
|Data Management Plan
|Oct. 16, 2023, 7:06 a.m.
|Deployment Partner
|Deployment Partner
|City of Pittsburgh Dept of Mobility and Infrastructure
|Equity Partner