The accurate detection and prediction of actions by multiple traffic participants, such as pedestrians, vehicles, and cyclists, is a critical prerequisite for enabling self-driving vehicles to make autonomous decisions. Current approaches to teaching an autonomous vehicle to drive use reinforcement learning, which essentially relies on previously collected situations as examples, depending purely on visual similarity without any understanding of the semantics of the situation and therefore without any ability to reason about conceptually similar situations that differ in appearance. This limitation can be overcome by methods that provide situation awareness to the vehicle. The idea is to build semantically meaningful representations of road scenarios that include the physical layout of the scene and the participants' prior and current activities. The ability to abstract this semantic representation and apply it to multiple scenes that are conceptually similar enables much more robust decision-making strategies by autonomous vehicles; essentially, it endows autonomous vehicles with a reasoning process.
We propose to build an efficient, robust spatio-temporal activity detection system for extended road activity detection. The proposed system is composed of a four-stage framework: Proposal Generation, Proposal Filtering, Activity Recognition, and Activity De-duplication. The major difference from prior work is the concept of cube proposals: rather than simply adopting tube proposals, i.e., cropped trajectories of detected and tracked objects, we propose to merge the regions of detected objects across frames and crop the resulting fixed area.
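The cube-proposal idea described above can be sketched in a few lines. This is an illustrative sketch only, not the project's implementation: the function names and the plain-Python image representation (a frame as a list of rows) are assumptions made for the example.

```python
# Sketch of a "cube proposal": instead of cropping a per-frame tube that
# follows each tracked object, merge the object's boxes across a window of
# frames into one enclosing box and crop that same region from every frame.

def enclosing_cube(boxes):
    """Merge per-frame boxes (x1, y1, x2, y2) into one enclosing box."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)

def cube_proposal(frames, track):
    """Crop the same enclosing region from every frame in the window.

    frames: list of images, each a list of rows (H x W)
    track:  list of per-frame boxes for one detected, tracked object
    """
    x1, y1, x2, y2 = enclosing_cube(track)
    return [[row[x1:x2] for row in frame[y1:y2]] for frame in frames]
```

Because the crop region is fixed across the window, every frame of the proposal has the same spatial size, which simplifies batching compared with per-frame tube crops.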
The proposed system will provide real-time activity detection for unconstrained video streams of road scenes and will be robust across different road scenarios.
We will implement overlapping spatio-temporal cubes as the core concept of road activity proposals, ensuring coverage and completeness of activity detection through oversampling.
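The oversampling step amounts to sliding fixed-length temporal windows over the stream with a stride smaller than the window length, so that adjacent cubes overlap and no activity boundary falls only on a window edge. A minimal sketch, with illustrative names and parameter values:

```python
def overlapping_windows(num_frames, length, stride):
    """Start indices of fixed-length temporal windows over a stream.

    Choosing stride < length makes consecutive windows overlap, so an
    activity straddling a window boundary is still fully contained in
    at least one proposal (oversampling for coverage).
    """
    starts = list(range(0, max(num_frames - length, 0) + 1, stride))
    # Ensure the tail of the stream is covered by a final window.
    if starts and starts[-1] + length < num_frames:
        starts.append(num_frames - length)
    return starts
```

Each start index would then be paired with the spatial enclosing boxes of the objects detected in that window to form a spatio-temporal cube proposal.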
An early version of this system, tailored to human activity detection only, achieved outstanding performance on a series of activity detection benchmarks, including the TRECVid 2021 challenge on activity analysis in extended surveillance video.
Months 1-3: Gather and preprocess data sets, start pretraining on annotated data
Months 4-6: Experiments with implementation of different algorithms for road activity detection
Months 7-9: Improve system, participate in public challenges for automatic road activity analysis
Months 10-12: Improve system for final deliverable and report
GM will use the algorithms we develop, validating our results on our test data and also applying the algorithms to their own proprietary data.
The research will adapt based on GM's feedback from their evaluation on proprietary data.
Expected Accomplishments and Metrics
Our primary metrics will be detection accuracy on established datasets such as the NVIDIA AI City Challenge data and the ICCV Road Challenge data, using the evaluation metrics already in use for these datasets.
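Activity-detection benchmarks of this kind typically match predicted activity instances to ground truth by temporal overlap before scoring accuracy. As a hedged illustration of that matching criterion (not the exact metric of any specific challenge), temporal intersection-over-union can be computed as:

```python
def temporal_iou(a, b):
    """IoU of two (start_frame, end_frame) intervals.

    A predicted activity instance is commonly counted as correct when its
    temporal IoU with a ground-truth instance exceeds a threshold
    (e.g. 0.5); the threshold and any spatial overlap requirement depend
    on the specific benchmark.
    """
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0
```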