#403 Robust Automatic Detection of Traffic Activity from Vehicle Perspectives

Principal Investigator: Alex Hauptmann
Status: Completed
Start Date: July 1, 2022
End Date: June 30, 2023

Project Type: Research Advanced
Grant Program: FAST Act - Mobility National (2016 - 2022)
Grant Cycle: 2022 Mobility21 UTC
Visibility: Public

Abstract

The accurate detection and prediction of actions by multiple traffic participants, such as pedestrians, vehicles, cyclists and others, is a critical prerequisite for enabling self-driving vehicles to make autonomous decisions. Current approaches to teaching an autonomous vehicle how to drive use reinforcement learning, which essentially relies on already collected situations as examples. These approaches rely purely on visual similarity, without any understanding of the semantics of the situation, and therefore have no ability to reason about similar situations that may look different. This can be overcome by methods that provide situation awareness to the vehicle. The idea is to enable semantically meaningful representations of road scenarios, which include the physical layout of the scene and the various participants' prior and current activities. The ability to abstract this semantic representation and apply it to multiple conceptually similar scenes allows much more robust decision-making strategies by autonomous vehicles. Essentially, this endows autonomous vehicles with a reasoning process.

Description

We proposed to build an efficient, robust spatio-temporal activity detection system for extended and road activity detection. The proposed system consists of four stages: Proposal Generation, Proposal Filtering, Activity Recognition, and Activity De-duplication. The major difference from prior work is the concept of cube proposals. Rather than simply adapting tube proposals, which are cropped trajectories of detected and tracked objects, we propose to merge and crop the area of detected objects across the frames.
The proposed system will provide real-time activity detection for unconstrained video streams of road scenes and will be robust across different road scenarios.
We will implement overlapping spatio-temporal cubes as the core concept of road activity proposals to ensure coverage and completeness of activity detection through oversampling.
An early version of this system, tailored to human activity detection only, achieved outstanding performance in a series of activity detection benchmarks, such as the TRECVid 2021 challenge on activity analysis in extended surveillance video.
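
As a rough illustration of the cube proposal idea described above, the Python sketch below slides overlapping temporal windows over tracked objects and, within each window, merges the per-frame detection boxes of a track into one fixed crop region. This is a minimal sketch under stated assumptions, not the project's implementation: the names `Cube`, `union_box`, and `cube_proposals`, the track data layout, and the `duration` and `stride` parameters are all hypothetical, and taking the union of boxes is only one plausible reading of "merge and crop the area of detected objects across the frames".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Cube:
    """Hypothetical cube proposal: one fixed crop over a temporal window."""
    track_id: int
    start: int  # first frame of the window (inclusive)
    end: int    # end of the window (exclusive)
    box: Box    # same spatial crop for every frame in the window


def union_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box containing all given boxes (assumed merge rule)."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def cube_proposals(
    tracks: Dict[int, Dict[int, Box]],
    num_frames: int,
    duration: int,
    stride: int,
) -> List[Cube]:
    """Generate cube proposals from tracks given as {track_id: {frame: box}}.

    Windows start every `stride` frames and last `duration` frames, so
    choosing stride < duration makes consecutive windows overlap.
    """
    if duration <= 0 or stride <= 0:
        raise ValueError("duration and stride must be positive")
    cubes: List[Cube] = []
    for start in range(0, num_frames, stride):
        end = min(start + duration, num_frames)  # clip the last windows
        for track_id, boxes_by_frame in tracks.items():
            in_window = [
                box for frame, box in boxes_by_frame.items()
                if start <= frame < end
            ]
            if in_window:  # skip tracks with no detection in this window
                cubes.append(Cube(track_id, start, end, union_box(in_window)))
    return cubes


# Usage: one object moving right by 2 pixels per frame over 6 frames.
demo_tracks = {7: {f: (10 + 2 * f, 10, 20 + 2 * f, 20) for f in range(6)}}
demo_cubes = cube_proposals(demo_tracks, num_frames=6, duration=4, stride=2)
for cube in demo_cubes:
    print(cube)
```

With a stride smaller than the window duration, each frame is covered by more than one cube, which is the oversampling the description relies on for coverage. The resulting duplicate detections are what a later de-duplication stage would then need to resolve.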

Timeline

Months 1-3: Gather and preprocess data sets, start pretraining on annotated data
Months 4-6: Experiments with implementation of different algorithms for road activity detection
Months 7-9: Improve system, participate in public challenges for automatic road activity analysis
Months 10-12: Improve system for final deliverable and report

Strategic Description / RD&T

Deployment Plan

GM will use the algorithms we develop. They will validate our results on our test data and also apply the algorithms to their own proprietary data. 

The research will adapt based on GM's feedback from their evaluation on proprietary data.

Expected Outcomes/Impacts

Our primary metrics will be detection accuracy in established data sets such as the NVIDIA AI City Challenge data and the ICCV Road Challenge data, based on the metrics already in use for these datasets.

Expected Outputs

TRID

Individuals Involved

Email	Name	Affiliation	Role	Position
ablair2@andrew.cmu.edu	Blair, Allison	CMU	Other	Staff - Business Manager
alex@cs.cmu.edu	Hauptmann, Alex	CMU/LTI	PI	Faculty - Research/Systems

Budget

Amount of UTC Funds Awarded

$100,000.00

Total Project Budget (from all funding sources)

$146,000.00

Documents

Type	Name	Uploaded
Data Management Plan	DMP_-_Robust_Automatic_Detection_of_Traffic_Activity_from_Vehicle_Perspectives.pdf	Dec. 7, 2022, 12:24 p.m.
Publication	Msnet: A multilevel instance segmentation network for natural disaster damage assessment in aerial videos	March 30, 2023, 5:31 a.m.
Publication	Scene graphs: A survey of generations and applications	March 30, 2023, 5:31 a.m.
Publication	Subspace Representation Learning for Few-shot Image Classification	March 30, 2023, 5:32 a.m.
Publication	Argus++: Robust real-time activity detection for unconstrained video streams with overlapping cube proposals	March 30, 2023, 5:33 a.m.
Publication	Trm: Temporal relocation module for video recognition	March 30, 2023, 5:33 a.m.
Publication	Rethinking spatial invariance of convolutional networks for object counting	March 30, 2023, 5:34 a.m.
Publication	Vehicle and Pedestrian Trajectory and Gap Estimation for Traffic Conflict Prediction	March 30, 2023, 5:35 a.m.
Publication	Deep discrete cross-modal hashing with multiple supervision	March 30, 2023, 5:35 a.m.
Publication	Video pivoting unsupervised multi-modal machine translation	March 30, 2023, 5:36 a.m.
Publication	MAGVIT: Masked Generative Video Transformer	March 30, 2023, 5:36 a.m.
Final Report	403_-_Final_Report.pdf	July 5, 2023, 12:31 p.m.

Match Sources

No match sources!

Partners

Name	Type
General Motors	Deployment Partner