#161 Analyzing Social Media for Improving Transportation Safety

Principal Investigator
Feng (T.) Chen
Start Date
Jan. 1, 2013
End Date
Dec. 31, 2013
Research Type
Grant Type
Grant Program
MAP-21 TSET - Tier 1 (2012 - 2016)
Grant Cycle


The goal of this project is to develop an online intelligent system that automatically monitors and collects timely and comprehensive information from social media (e.g., blogs, online forums, and Twitter) about the current status of the transportation network and traffic flow, in order to support advanced safety enhancement.

Our proposed approach is composed of five major components:

1) Public Safety Data Extraction. We plan to build a classifier (e.g., an SVM) to automatically identify transportation-safety-related posts on local social media platforms covering the area of interest. However, it is computationally expensive to train a classifier for social media because of the short length and large volume of the messages and their non-standard abbreviations. It is much cheaper to collect labels for news articles (e.g., from the National Transportation Safety Board), so transfer learning techniques can be applied to build the classifier without directly labeling social media posts.

2) Heterogeneous Safety Data Modeling. Social media is heterogeneous by nature, with a variety of entity types (e.g., user, post, hashtag, term, link, mention, location, and time) and relationships (e.g., originator, reply, friendship, and followership). To model this complex data structure, we plan to build a heterogeneous network model for the safety data.

3) Transportation Safety Topic Discovery. Transportation safety covers many different topics, such as road blockage or damage due to heavy snow or floods, missing people swept away by a flood, malfunctioning traffic lights, traffic incidents, and drunk driving, to name but a few. In addition, topics may relate to different geographic locations and time periods. We propose to design a customized spatiotemporal topic model specifically for transportation safety applications.

4) Bias Estimation Using Traditional Traffic Sensor Data. Social media could be a biased sample, and it is important to estimate this bias by cross-validating against traditional transportation census data, such as loop detector and camera data, incident reports, and transportation surveys.

5) User Interface and High-Level Applications. These will include a regional sentiment index, safety alarms, and safety recommendations.
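
As a rough illustration of the heterogeneous network model described in component 2, the sketch below stores typed entities and typed relationships in a small in-memory graph. It is not the project's implementation: the names `Node`, `HeteroGraph`, `add_edge`, and `neighbors` are hypothetical, and the edge types `has_hashtag` and `located_at` are assumptions added so that posts can be linked to hashtags and locations. The entity types and the remaining relationship types are taken from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Entity types listed in the project description.
NODE_TYPES = {"user", "post", "hashtag", "term", "link", "mention", "location", "time"}

# Relationship types from the description, plus two assumed ones
# ("has_hashtag", "located_at") added here purely for illustration.
EDGE_TYPES = {"originator", "reply", "friendship", "followership",
              "has_hashtag", "located_at"}


@dataclass(frozen=True)
class Node:
    """A typed entity; frozen so that instances are hashable."""
    kind: str  # one of NODE_TYPES
    key: str   # identifier unique within its kind


class HeteroGraph:
    """Directed graph whose nodes and edges both carry a type (hypothetical)."""

    def __init__(self):
        # source node -> edge type -> set of target nodes
        self._adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src, rel, dst):
        """Add a typed edge, rejecting unknown node or edge types."""
        for node in (src, dst):
            if node.kind not in NODE_TYPES:
                raise ValueError(f"unknown node type: {node.kind}")
        if rel not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {rel}")
        self._adj[src][rel].add(dst)

    def neighbors(self, src, rel, kind=None):
        """Return targets reachable from src via rel, optionally filtered by type."""
        targets = self._adj.get(src, {}).get(rel, ())
        matches = [n for n in targets if kind is None or n.kind == kind]
        return sorted(matches, key=lambda n: (n.kind, n.key))


# Usage: one user posts one message tagged with a hashtag and a location.
graph = HeteroGraph()
user = Node("user", "u1")
post = Node("post", "p1")
graph.add_edge(user, "originator", post)
graph.add_edge(post, "has_hashtag", Node("hashtag", "#flood"))
graph.add_edge(post, "located_at", Node("location", "Pittsburgh"))
print(graph.neighbors(post, "located_at", kind="location"))
```

Keeping the type on both nodes and edges lets later stages, such as a spatiotemporal topic model, query only the relationships they need (for example, every post linked to a given location) without scanning the whole graph.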
Deployment Plan
Expected Accomplishments and Metrics

Individuals Involved

Email | Name | Affiliation | Role | Position
noemail3@noemail.com | Chen, Feng (T.) | Carnegie Mellon Heinz College | PI | Faculty - Researcher/Post-Doc
rk2x@cmu.edu | Krishnan, Ramayya | Carnegie Mellon Heinz College | Co-PI | Faculty - Tenured


Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)


Type | Name | Uploaded
Final Report | 161_-Final_Report_Chen.pdf | June 21, 2018, 8:41 a.m.

Match Sources

No match sources!


No partners!