#334 Taxi-for-all: Incentivized Taxi Actuation System for Balanced Area-wide Service

Principal Investigator
Carlee Joe-Wong
Start Date
July 1, 2020
End Date
June 30, 2021
Research Type
Grant Type
Grant Program
FAST Act - Mobility National (2016 - 2022)
Grant Cycle
2020 Mobility21 UTC


This project develops a method to incentivize taxis to travel to different parts of a city so that they sense data or pick up passengers in under-served areas. Taxis' natural movement generally leads to them concentrating in busy locations with many potential passengers. By incentivizing taxis to move towards less busy locations, we can even out their distribution across a city. This work extends our prior UTC-funded research on incentivizing taxi movement for crowdsensing.    
In modern smart cities, understanding conditions across the entire city is paramount to improving transportation. City-wide sensing (such as air-pollution monitoring, traffic monitoring, and road-surface sensing) and city-wide transportation (taxis, buses, etc.) both require coverage of the entire city. However, vehicles such as taxis are not evenly spread and tend to cluster around the central business district. As a result, vehicle availability is often mismatched with transportation or sensing needs at the times when those vehicles are most needed.

We aim to derive a method to incentivize taxis to travel to different parts of a city, with the goal of ensuring that they sense data evenly throughout the city. We assume that taxis continuously sense data at a fixed frequency regardless of their location. Thus, controlling the distribution of sensed data is equivalent to controlling the movement of the taxis. In general, taxis' natural movement will lead to them concentrating in busy locations without many potential passengers, which is undesirable for sensing or transportation. By incentivizing taxis to move towards locations with more passengers and higher needs, we can also make the distribution of sensed data more uniform and thus useful for city operators.

In our previous work funded by Mobility21-UTC, we explicitly offered incentives for vehicles to follow a specific path through the city. These incentives were calculated so as to compensate vehicles for deviating from their "natural" paths, with the aim of ensuring that vehicles received the same utility (defined as the potential profit from picking up a rider less the cost of driving) with the incentivized path as with the path they otherwise would have followed. This scheme thus relied on predicting where the vehicle would go without incentives; assuming that prediction was accurate, we could assume the vehicle would accept our incentive. The core challenge of the problem was then to decide to which vehicles we should offer incentives, subject to budget constraints on the total amount of incentives we could offer.
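
As a rough illustration of the compensation idea described above, the sketch below computes the payment that would leave a driver with the same utility on the incentivized path as on the natural one. The function names, the clamp at zero, and the numbers are assumptions made for illustration; they are not taken from the prior work.

```python
def utility(expected_fare: float, driving_cost: float) -> float:
    """Utility as defined in the text: potential profit from a pickup less the cost of driving."""
    return expected_fare - driving_cost


def compensating_incentive(natural_utility: float, incentivized_utility: float) -> float:
    """Payment that makes the incentivized path as attractive as the natural path.

    Assumption: no payment is offered when the incentivized path is already at least as good.
    """
    return max(0.0, natural_utility - incentivized_utility)


# Hypothetical numbers, for illustration only.
natural = utility(expected_fare=20.0, driving_cost=5.0)   # utility of the predicted natural path
detour = utility(expected_fare=12.0, driving_cost=7.0)    # utility of the path we would like followed
payment = compensating_incentive(natural, detour)
print(payment)
```

Note that the payment is only as good as the prediction of the natural path: if the driver's true alternative is more profitable than predicted, the offer would be too low to be accepted.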

A drawback of our prior incentive scheme is that it is complex, and it may appear unfair to taxi drivers who wonder why some vehicles are offered incentives and some are not. A simpler solution would be to offer fixed prices for collecting data at certain locations and times. (This is also simpler than many existing crowdsensing solutions, which use auction-based frameworks.) Any taxi driver could then observe the offered prices and decide where to sense data. These prices, however, should be chosen carefully: we need to predict how drivers will respond to them and then optimize the prices offered so as to induce an even sensing distribution. For instance, it makes sense to offer lower prices in crowded areas where drivers will go anyway.

In order to compute the optimal prices, we will need (at least) the following:

Predictions of where each taxi will go. This allows us to estimate the sensing coverage without incentives.

Predictions of how taxis will respond to incentives. This can most likely be modeled as taxis examining each location of the city and then choosing to go to the location that gives them the highest utility/benefit. Note that this is complicated by the travel time between different locations, which may be affected by traffic in the city.

Predictions of traffic in different locations of the city at different times. This will help us to predict the cost of traveling to different locations in a city.

Predictions of the ride requests at different locations in a city. This will help us estimate the potential benefits of taxi drivers in driving to different locations.
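
The driver-response model mentioned in the list above (each taxi goes to the location with the highest utility) could be sketched as follows. This is a minimal sketch under stated assumptions: utility is taken to be expected fare plus offered price minus travel cost, and all names and numbers are hypothetical.

```python
def best_response(current, locations, expected_fare, price, travel_cost):
    """Return the location that maximizes expected fare + offered price - travel cost.

    Assumption: the driver evaluates every location and picks the single best one.
    """
    def net_utility(destination):
        return (expected_fare[destination]
                + price[destination]
                - travel_cost[(current, destination)])

    return max(locations, key=net_utility)


# Hypothetical two-location city.
locations = ["downtown", "suburb"]
expected_fare = {"downtown": 15.0, "suburb": 6.0}
travel_cost = {("downtown", "downtown"): 0.0, ("downtown", "suburb"): 4.0}

no_incentive = {"downtown": 0.0, "suburb": 0.0}
with_incentive = {"downtown": 0.0, "suburb": 15.0}

choice_without = best_response("downtown", locations, expected_fare, no_incentive, travel_cost)
choice_with = best_response("downtown", locations, expected_fare, with_incentive, travel_cost)
print(choice_without, choice_with)
```

In this toy setting the travel cost is a fixed table; in the project it would depend on the traffic predictions, which is what makes the response model harder to estimate.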

These variables are difficult to predict in advance, since they may change over time (traffic patterns in a city, for instance, may be different in different months) and heavily depend on individual taxi driver preferences. Thus, we plan to use reinforcement learning to simultaneously predict these variables and solve for the optimal prices offered to taxis. A learning-based approach can automatically adapt as traffic patterns in a city change, and moreover can solve complex optimization problems such as our price optimization problem, which is likely non-convex and thus difficult to solve with traditional optimization methods.

To cast our problem into a reinforcement learning framework, we must specify the actions, environment states, and reward function. Our actions will consist of the prices offered at each time and location. We will take these to be discretized variables in a given range; the output of the learning algorithm is then the prices that should be offered. We choose these prices so as to optimize a reward function that measures the discrepancy between the desired taxi distribution and the achieved taxi distribution, given the prices offered and our predictions of how taxis move around the city. We take the desired taxi distribution as given. For example, if the taxis are being used to pick up passengers, then the desired distribution should ensure that there are taxis present at each location where we predict passengers will request rides. We can specify the distribution to ensure that taxis go to traditionally under-served areas, helping to equalize accessibility to transportation across the city.
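
One possible way to express the reward and the discretized action space described above is sketched below. The text does not specify the discrepancy measure or the price grid, so the negative L1 distance and the `PRICE_LEVELS` values here are assumptions chosen only for illustration.

```python
import itertools


def distribution(counts):
    """Normalize per-location taxi counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one taxi is required")
    return [c / total for c in counts]


def reward(desired_counts, achieved_counts):
    """Negative L1 distance between desired and achieved distributions (0 is the best value)."""
    desired = distribution(desired_counts)
    achieved = distribution(achieved_counts)
    return -sum(abs(d - a) for d, a in zip(desired, achieved))


# Assumed discretized price grid; the actual range is not given in the text.
PRICE_LEVELS = [0.0, 2.0, 4.0]


def price_actions(num_locations):
    """Enumerate every assignment of one price level to each location."""
    return list(itertools.product(PRICE_LEVELS, repeat=num_locations))


print(reward([1, 1], [2, 0]))   # all taxis in one of two locations
print(len(price_actions(2)))    # number of joint price actions for two locations
```

The joint action space grows exponentially with the number of locations, which is one reason an exhaustive search over prices is impractical at city scale.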

The actions that we choose depend on the states of the environment, which we use to determine the four variables listed above that we need to predict: where taxis will go, how they will respond to incentives, traffic at each time and location, and the ride requests at each time and location. As in our prior work on using reinforcement learning to optimize taxi movements [Oda 2018], we will formalize the state variables as (1) the current distribution of taxis at each time and location, (2) the predicted number of passengers at each time and location, and (3) current travel times within each location, which serve as a proxy for the amount of traffic. Our four predicted variables are functions of these three types of state variables. We will also include the previously offered prices in the state variables, as these may influence how taxi drivers respond to incentives.
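
The state described above could be represented along the following lines. This is only a sketch: the class name, field names, and the flat-vector encoding are assumptions, and a single time step is shown for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CityState:
    """Hypothetical per-time-step state; each list holds one entry per location."""
    taxi_counts: List[float]            # (1) current distribution of taxis
    predicted_passengers: List[float]   # (2) predicted number of passengers
    travel_times: List[float]           # (3) travel times, a proxy for traffic
    previous_prices: List[float]        # previously offered prices

    def to_vector(self) -> List[float]:
        """Concatenate all components into one flat feature vector for a learner."""
        n = len(self.taxi_counts)
        for part in (self.predicted_passengers, self.travel_times, self.previous_prices):
            if len(part) != n:
                raise ValueError("every component needs one entry per location")
        return [*self.taxi_counts, *self.predicted_passengers,
                *self.travel_times, *self.previous_prices]


state = CityState(
    taxi_counts=[3, 1],
    predicted_passengers=[2, 4],
    travel_times=[10, 5],
    previous_prices=[0, 2],
)
print(state.to_vector())
```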

After completing this formulation, we can use standard reinforcement learning algorithms to solve for the optimal prices. We have built an initial simulator using actor-critic algorithms to predict the future state variables and optimize the price variables. In our evaluation, we will compare the optimized prices to those calculated from heuristics and use simulated driver reactions to prices in order to evaluate which set of prices is more effective. Our deployment plan has more details.    
We expect the project to last two years, with the timeline for each stage in the project description given below. All three PIs will contribute to each stage of the project. PI Joe-Wong will lead the formulation and theoretical framework of the work, PI Noh will lead the algorithms development stage, and PI Zhang will lead the deployment and system side of the work.

Stage 1 (Formulation): We will finish the formulation within the first year. We plan to spend months 1-3 of the project developing the formulation with vehicle actuation, building on our prior work [Xu 2019; see the project description for references], and the next 3 months (months 4-6 of the project) extending the formulation to incorporate passenger and traffic prediction.

Stage 2 (Algorithms): In months 7-9 of the project, we will develop and implement reinforcement learning algorithms to solve for the optimal prices. These will be tested in a small-scale deployment in months 10-12, as described in Stage 3 below. In the second year of the project, we will spend 6 months (months 13-18 of the project) extending our algorithms to an online solution algorithm.

Stage 3 (Deployment): In months 10-12 of the project, we will deploy our algorithm in a small-scale trial with 6 passenger taxis. In parallel, we will conduct extensive simulations based on NYC and Shenzhen taxi data to evaluate our formulation and algorithms. The initial deployment will use the taxi dispatch system to actuate a limited number of taxis. As we develop online versions of the algorithms in months 13-18 of the project, we will continue to evaluate them in simulation. We will spend the last 6 months of the project on a large-scale deployment with up to 50 vehicles. Our deployment plans are described below.
Deployment Plan
We plan to explore deployment with RoadBotics in Pittsburgh, PA to simulate sensing using their fixed-route vehicles and a few non-fixed-route vehicles. The scale and state of the deployment will be based on the results of our initial evaluation in the Shenzhen deployment. Our initial deployment will be with Tsinghua Berkeley Shenzhen Institute and Shenzhen Taxi (see collaboration letter and supplemental project description for a picture of the taxi system). During the first year, we plan to deploy our algorithm on 6 passenger taxis for actuation. The routes will be communicated through the taxi dispatch system. The evaluations are planned to be short-term (1 day per week for a total of 10 weeks). In the second year, we plan two deployments with 50 vehicles over multiple continuous weeks. For each deployment, half of the vehicles will be randomly chosen to run the developed algorithm and the other half will use the baseline method. Incentive requests will initially be issued through the dispatch system.
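
The random half-and-half assignment described above could be carried out as in the sketch below. The function name, the vehicle identifiers, and the fixed seed are assumptions made for illustration.

```python
import random


def split_vehicles(vehicle_ids, seed=None):
    """Randomly split vehicles into a group running the algorithm and a baseline group."""
    rng = random.Random(seed)       # a seed makes the assignment reproducible
    shuffled = list(vehicle_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical fleet of 50 vehicles, matching the second-year deployment size.
algorithm_group, baseline_group = split_vehicles(range(50), seed=0)
print(len(algorithm_group), len(baseline_group))
```

Recording the seed for each deployment would allow the assignment to be audited afterwards.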
Expected Accomplishments and Metrics
This project will produce three major results: (1) prediction models of user demand and vehicle mobility, (2) algorithms to optimally incentivize vehicle mobility, and (3) a system for taxi actuation. We will validate our prediction models and algorithms on data from New York and Shenzhen and in a limited deployment with RoadBotics. We expect the results of these evaluations to be published in 1-2 conference or journal papers. By incentivizing drivers to move towards more remote locations with demand, we will enable transportation services to become more efficient. In addition, city-wide sensing tasks can be accomplished on the platform. In evaluating our work, we will consider the reduction in overall cost, the resulting user wait times, and the travel time of the taxis. We will also evaluate how effectively the desired distribution of taxi service across the city is achieved with and without incentives.

Individuals Involved

Email | Name | Affiliation | Role | Position
cjoewong@andrew.cmu.edu | Joe-Wong, Carlee | Carnegie Mellon School of Engineering | PI | Faculty - Untenured, Tenure Track
noh@cmu.edu | Noh, Hae Young | Carnegie Mellon School of Engineering | Co-PI | Faculty - Adjunct
peizhang@andrew.cmu.edu | Zhang, Pei | Carnegie Mellon School of Engineering | Co-PI | Other


Amount of UTC Funds Awarded
Total Project Budget (from all funding sources)


Type | Name | Uploaded
Data Management Plan | DataManagementPlan.pdf | March 17, 2020, 9:08 a.m.
Publication | A Generative Simulation Platform for Multi-agent Systems with Incentives | Sept. 27, 2020, 1:03 p.m.
Presentation | A Generative Simulation Platform for Multi-agent Systems with Incentives | Sept. 27, 2020, 1:03 p.m.
Progress Report | 334_Progress_Report_2020-09-30 | Sept. 27, 2020, 1:04 p.m.
Publication | An incentive mechanism for crowd sensing with colluding agents. | Dec. 8, 2020, 9:51 a.m.
Publication | Incentivizing vehicle mobility to optimize sensing distribution in crowd sensing. | Dec. 8, 2020, 9:52 a.m.
Publication | Vehicle dispatching for sensing coverage optimization in mobile crowdsensing systems. | Dec. 8, 2020, 9:53 a.m.
Publication | ASC: Actuation system for city-wide crowdsensing with ride-sharing vehicular platform. | Dec. 8, 2020, 9:54 a.m.
Publication | PAS: Prediction-Based Actuation System for City-Scale Ridesharing Vehicular Mobile Crowdsensing. | Dec. 8, 2020, 9:54 a.m.
Publication | On the Real-time Vehicle Placement Problem | March 21, 2021, 5:02 p.m.
Progress Report | 334_Progress_Report_2021-03-31 | March 30, 2021, 5:04 p.m.
Final Report | Final_Report_-_334.pdf | July 23, 2021, 4:26 a.m.

Match Sources

No match sources!


Name | Type
Tsinghua Berkeley Shenzhen Institute | Deployment Partner
RoadBotics | Deployment Partner