The Personalized Trip Planner for Seniors (PTPS) will create an interactive voice agent for seniors, a portion of the population with limited smartphone ownership. The agent will give directions to any given destination, with a choice of the means of transportation that can be used. It is intended to be deployed first in Pittsburgh, but will be built in a flexible manner so that it can be adapted to any other city.
We will use reinforcement learning and neural network technology to model what seniors really say. We will also address three fundamental issues for people as they age:
- we process information more slowly as we age, so PTPS will find ways to slow down the agent’s speech, while keeping it intelligible
- we cannot attend to several things at the same time, so PTPS will get the user’s attention before giving out any important information
- we do not retain much information at a given time, so PTPS will only give one piece of information at a time and will repeat the essential words at the end of the message.
The project will take two years to reach the deployment stage. In Y1 the agent will be built with the help of our partners, AARP and Age Friendly Greater Pittsburgh. In Y2, the agent will be refined and endowed with personalization, tested and deployed.
Results of PTPS will be measured in several ways, the most important being whether real people use it, give us positive feedback on it and, especially, become return users.
Personalized Trip Planner for Seniors (PTPS)
Maxine Eskenazi, Alan W Black
Active seniors need information about how to get around their city. Most current navigation systems are designed to run on smartphones and require an Android or iOS device. In addition to the hardware, they require good eyesight and the ability to navigate the user interface. While 77% of the total population owns a smartphone, only 42% of those over 65 do (Pew 2017). PTPS will solve this problem by replacing the standard graphical user interface (GUI) with a speech-based interface, turning this service into a conversational phone agent that is better suited than a GUI to helping seniors find the best way to get where they want to go. As an agent that can be called on the phone, it will adapt to the habits and preferences of the individual traveler and will speak in a way that is easier for seniors to understand.
This voice-based agent will build on our prior work on the Let’s Go system, which gave bus information to everyone who called the Port Authority of Allegheny County in the evening and on weekends. The target geographic area is Pittsburgh.
From our past work, we have learned that older citizens can only benefit from a phone-based agent if it adapts its speech to their changing listening and cognitive abilities. As we age, we are:
- able to process less information in a given period of time,
- less able to focus on more than one thing at a time and
- less able to remember large amounts of information.
To address these three important issues, we will modify the way that the agent speaks by:
- Slowing the speaking rate (not making it shout!): as we get older we need more time to process information, but we do not necessarily need the information to be louder.
- Getting the caller’s attention before giving important information, so that they give that information their full attention.
- Uttering short phrases, each with one piece of information and repeating the important information at the end of each of the agent’s dialog turns.
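As a rough illustration of the three modifications above, the following sketch assembles one agent turn from short phrases. Everything in it is an assumption made for illustration: the names `AgentTurn` and `build_turn`, the attention phrase, and the rate value are hypothetical and are not part of the PTPS design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentTurn:
    """One dialog turn, ready to hand to a speech synthesizer."""
    phrases: List[str]   # short phrases, one piece of information each
    rate_factor: float   # below 1.0 means slower than the default rate


def build_turn(facts: List[str], key_info: str,
               attention: str = "Here is your answer.",
               rate_factor: float = 0.8) -> AgentTurn:
    """Attention-getter first, one fact per phrase, key information repeated last.

    The default phrase and rate are illustrative guesses, not tuned values.
    """
    if not 0.0 < rate_factor <= 1.0:
        raise ValueError("rate_factor must be in (0, 1]")
    phrases = [attention]
    # Keep each non-empty fact as its own short phrase.
    phrases.extend(fact.strip() for fact in facts if fact.strip())
    # Repeat the essential information at the end of the turn.
    phrases.append(f"Again: {key_info}")
    return AgentTurn(phrases=phrases, rate_factor=rate_factor)


turn = build_turn(
    facts=["Take the 61A or the 61B.", "Get off at Forbes and Murray."],
    key_info="the 61A or the 61B, to Forbes and Murray.",
)
for phrase in turn.phrases:
    print(phrase)
```

Keeping the speaking rate as a separate field, rather than baking it into the text, would let a synthesizer slow the speech without changing its loudness.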
In the first year we will build the basic spoken dialog system using machine learning representations for the core modules: natural language understanding, dialog management and natural language generation. In the second year, we will iteratively improve the dialog system and what the system says while we also look at what the users actually say. This will include having the agent adapt to the terms that the users employ, especially by letting users refer to monuments to tell the agent the start and end points of a journey. A monument can be any physical item: a building (Newell Simon Hall), a thing (Kaufmann’s clock) or a place of business (the Craig Street Starbucks). It does not necessarily have to exist at present (Kaufmann’s clock, the Isaly's on the Boulevard of the Allies). We will use information about the user to personalize the system’s output (do they have a car, are they mobile enough to walk short distances, what landmarks are they familiar with). An example of the dialog that the system could have after Y2:
SYS: Where do you want to go today?
MrsH: I need to meet my daughter at the JCC at 11 tomorrow.
SYS: Is that in Squirrel Hill?
MrsH: Yes, just next to my granddaughter's school.
SYS: As the weather will be nice tomorrow, you could leave at 10:20 and walk down to Forbes and Craig, and take either the 61A or 61B that is due at 10:42 from the stop outside the Art Museum and get off at Forbes and Murray just next to the JCC. You should arrive at about 10:50.
The speech-based agent will be available by calling a free number. It will use speech alone for communication. It will know about public transportation, driving directions, bicycling, Uber and other ride services and, of course, walking. It will access real-time information from the web but, importantly, will also take into account what the user knows and prefers (do they have a car, are they mobile enough to walk short distances, what landmarks are they familiar with).
The agent will:
- Give facts about how to get between any two points, asking for appropriate clarification when some place cannot be resolved.
- Give the facts in a way that can be remembered (and asked about again later, e.g. during the trip itself). The information will be delivered in appropriately sized, memorable chunks, without stuffing too much information into one sentence.
- Give directions with place names and terms the user is familiar with (learned from previous interactions).
- Remember locations the user has been to before, for repeat trips and for reference in future trips (e.g. "go past the restaurant you went to last week with your daughter").
- Offer alternatives when appropriate, if the user likes alternatives (e.g. biking, walking, etc.).
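One way to picture how user knowledge and preferences could shape the options offered is the sketch below. It is only an assumption-laden illustration: the `UserProfile` and `Option` types, their fields, and the filtering rules are hypothetical, and real trip options would come from a routing backend rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserProfile:
    has_car: bool
    max_walk_minutes: int      # how much walking the user is comfortable with
    likes_alternatives: bool


@dataclass
class Option:
    mode: str                  # e.g. "bus", "walk", "bike", "drive"
    total_minutes: int
    walk_minutes: int          # walking involved in this option


def suitable_options(options: List[Option], user: UserProfile,
                     raining: bool = False) -> List[Option]:
    """Drop options that do not fit the user, then order by travel time."""
    kept = []
    for opt in options:
        if opt.mode == "drive" and not user.has_car:
            continue
        if opt.walk_minutes > user.max_walk_minutes:
            continue
        if raining and opt.mode in ("walk", "bike"):
            continue
        kept.append(opt)
    kept.sort(key=lambda o: o.total_minutes)
    # Offer several options only to users who like alternatives.
    return kept if user.likes_alternatives else kept[:1]


options = [Option("walk", 35, 35), Option("bus", 30, 8), Option("drive", 12, 2)]
no_car = UserProfile(has_car=False, max_walk_minutes=10, likes_alternatives=True)
print([o.mode for o in suitable_options(options, no_car)])
```

In this sketch a user without a car who prefers short walks is offered only the bus; a different profile would yield a different list from the same options.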
For the agent, we will first use handwritten rules for the Natural Language Understanding (NLU), Natural Language Generation (NLG) and Dialog Manager (DM) modules. We will use Chrome for speech recognition (ASR) and Festival (Black et al.) for speech synthesis, since the latter gives us control over the output, such as speaking rate and pause insertion. In Y2, we will train the agent on the data we gathered in Y1 so that the NLU, DM and NLG modules can function using reinforcement learning and neural nets (NNs). The system will remain hybrid, combining machine learning techniques with handwritten rules so that we can model infrequent events and easily incorporate new functions as needed. In Y2, the agent will also deal with meta-personalization options (for example, weather: taking a bus rather than walking if it is raining).
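The hybrid idea can be sketched for the NLU module as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the two regular expressions, the slot names, and the `learned_nlu` callable (a stand-in for a trained model) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

Frame = Dict[str, str]

# Hypothetical handwritten rules: a clock time after "at", and a
# capitalized place name introduced by "to the" or "at the".
TIME_RE = re.compile(r"\bat (\d{1,2}(?::\d{2})?)\b")
DEST_RE = re.compile(r"\b(?:to|at) (the [A-Z][\w']*(?: [A-Z][\w']*)*)")


def rule_nlu(utterance: str) -> Frame:
    """Fill slots using the handwritten rules only."""
    frame: Frame = {}
    dest = DEST_RE.search(utterance)
    if dest:
        frame["destination"] = dest.group(1)
    time = TIME_RE.search(utterance)
    if time:
        frame["time"] = time.group(1)
    return frame


def hybrid_nlu(utterance: str,
               learned_nlu: Optional[Callable[[str], Frame]] = None) -> Frame:
    """Rules take precedence; a learned model fills the slots they miss."""
    frame = rule_nlu(utterance)
    if learned_nlu is not None:
        for slot, value in learned_nlu(utterance).items():
            frame.setdefault(slot, value)
    return frame


utterance = "I need to meet my daughter at the JCC at 11 tomorrow."
print(hybrid_nlu(utterance))
```

Letting the rules win where they fire is one possible design choice; it keeps rare or newly added cases under direct control while a trained model covers the rest.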
The novel contributions of this proposal are:
- using the latest findings on language generation, choosing the appropriate words and phrasing those words to ensure understandability, including adapting to the individual user and to the familiarity of the information being presented.
- using the latest findings on language adaptation during a dialog. Lexical entrainment techniques will adapt to the users' choice of phrases and the agent will "ground" references (i.e. confirm what they refer to in the real world).
- using the latest findings on dialog and user modeling to use the known information about the user (or predictably known) in the language generation process.
- using knowledge about the effects of aging to enable the agent to address the user in a much more appropriate manner.
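Lexical entrainment with grounding, as mentioned in the list above, could look roughly like the sketch below. The `Lexicon` class, the place identifier, and the default name are hypothetical and serve only to show the idea of reusing a user's own term once it has been confirmed.

```python
from typing import Dict


class Lexicon:
    """Maps a place to the name a particular user calls it."""

    def __init__(self, default_names: Dict[str, str]) -> None:
        # place id -> name the agent uses by default
        self._default = dict(default_names)
        # place id -> name this user has used and confirmed
        self._user_terms: Dict[str, str] = {}

    def ground(self, user_phrase: str, place_id: str, confirmed: bool) -> None:
        """Store the user's phrase only after the referent is confirmed."""
        if place_id not in self._default:
            raise KeyError(place_id)
        if confirmed:
            self._user_terms[place_id] = user_phrase

    def name_for(self, place_id: str) -> str:
        """Prefer the user's own term; fall back to the default name."""
        return self._user_terms.get(place_id, self._default[place_id])


lex = Lexicon({"jcc_sqh": "the Jewish Community Center in Squirrel Hill"})
print(f"Get off next to {lex.name_for('jcc_sqh')}.")   # default name
lex.ground("the JCC", "jcc_sqh", confirmed=True)        # user said "the JCC"
print(f"Get off next to {lex.name_for('jcc_sqh')}.")   # user's own term
```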
Assessment is an important part of the project. Since we hope to deploy the agent to the public in Y2, we need real potential users to test the agent as it is developed. We will:
- Create advertising that targets this population in Pittsburgh, with the help of AARP and Age Friendly (see below)
- Run regular system tests
- Between tests, analyze the data collected and make modifications to the system as needed to better respond to this population’s needs.
The main areas of work in Y1 are: the digital phone connection (finding a provider, testing reliability, etc.), which will be a major focus; creating the dialog system and linking it to the Google backend; creating prompts that give short answers with small amounts of information that are repeated at the end of the message; slowing the synthesis down so that it is more understandable; and testing the first version of the system. During this time, we will also be working with our partners to get feedback on each iteration of the system and to record three of them in Wizard of Oz (WoZ) dialogs. In a WoZ dialog, one person plays the role of the system they would like to see us create and the other plays the role of the user. We record those sessions, annotate them, and use them to shape the structure of the agent.
In Y2, the main focus will be personalizing the agent, training NNs for three of the system modules, modifying the system based on user feedback, testing and deployment. We will also deal with how users refer to monuments and ensure that the system performs 24/7.
We have interest at both the local and PA state levels of AARP (see letter of support). They have also recommended that we work with Age Friendly Greater Pittsburgh, a local group who are concerned about transportation options. We are pursuing the possibility of working with them for: meetings about what the system should be like and what we should include; testing the system when it is ready and at every iteration; being recorded in WoZ mockups of the system.
Pew 2017: http://www.pewinternet.org/fact-sheet/mobile/ (accessed on 12-12-2017)
Note on our present estimate of in-kind contributions per year (still needs to be confirmed with partners):
- Cost of 1 hr of agent testing per person: $25 x 50 people = $1250 per test round, x 4 times a year = $5000
- Cost of advising (meetings and follow-ups with AARP and Age Friendly leaders): $150 per person per hour x about 30 hours/yr = $4500 per person, x three persons = $13500
- WoZ (Wizard of Oz) work on system development: $100 per hour x at least 100 hours = $10000
- Publicity on buses, etc. (to attract the 50 testers each time) = $1800
Total = $30300.
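The arithmetic behind these line items can be checked with a short script. It uses only the figures listed above; the variable names are illustrative, and the hour counts marked "about" or "at least" in the estimate are taken at face value.

```python
# Yearly in-kind estimates, using the figures from the list above.
testing = 25 * 50 * 4       # $25 per hour x 50 testers x 4 test rounds a year
advising = 150 * 30 * 3     # $150 per hour x about 30 hours x 3 persons
woz = 100 * 100             # $100 per hour x at least 100 hours
publicity = 1800            # flat yearly estimate

items = {
    "testing": testing,
    "advising": advising,
    "WoZ": woz,
    "publicity": publicity,
}
total = sum(items.values())

for name, amount in items.items():
    print(f"{name:<10} ${amount:>6,}")
print(f"{'total':<10} ${total:>6,}")
```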
Y1 will see preliminary meetings about the working of the system as the system is set up. By month 5 we should have the WoZ recordings. By month 7 we should have gleaned what we need to put in the system from the WoZ experience. Months 9 and 12 will see testing by the users and iteration to improve the Y1 system.
Y2 will see continued meetings and reviews of the system, the addition of personalization, etc., continued testing every three months and a final version of the system that can be deployed widely in Y3.
We plan to develop the system in Y1 while having it tested by a group of potential users from our target population (see the description of our work with AARP). After Y1, the first version of the system should be up and running on our servers and, with an associated phone number, ready to call. Thus in Y2, as we add personalization and other features, we plan to advertise the service. We can advertise on campus to the Osher Lifelong Learning population and at various senior centers, but we will rely largely on help from AARP and Age Friendly for this.
After Y2, we expect that the dialog system could start to spin off into a commercial venture, with support at that time from AARP, perhaps Uber, and others who could also advertise in our publicity and on our website.
Expected Accomplishments and Metrics
At the end of Y1, we will produce a working dialog system that our testers have used successfully to get real information about how to get to places that they really intend to visit. Metrics of success include the number of completed dialogs at each 3-month test run, length of dialog (in turns and in time), success rate (did the user get what they wanted?) and user feedback via questionnaire.
At the end of Y2, we expect to have a complete working system that is ready to be deployed to a large number of users. Metrics include ability to deal with many incoming calls at the same time, success rate, percent of return users, length of dialog (in turns and in time).
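To make these metrics concrete, the sketch below computes them from a list of call records. The `Call` record and its fields are assumptions about what a call log might contain; in particular, counting a "return user" as any caller with more than one call is one possible definition, not one stated in this proposal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    caller_id: str     # anonymized identifier for the caller
    turns: int         # length of the dialog in turns
    seconds: float     # length of the dialog in time
    completed: bool    # the dialog reached its end
    success: bool      # the user got what they wanted


def summarize(calls: List[Call]) -> Dict[str, float]:
    """Compute the dialog metrics over one test run."""
    if not calls:
        raise ValueError("no calls to summarize")
    n = len(calls)
    calls_per_user: Dict[str, int] = {}
    for c in calls:
        calls_per_user[c.caller_id] = calls_per_user.get(c.caller_id, 0) + 1
    # Assumed definition: a return user has called more than once.
    returning = sum(1 for k in calls_per_user.values() if k > 1)
    return {
        "completed_dialogs": sum(c.completed for c in calls),
        "success_rate": sum(c.success for c in calls) / n,
        "mean_turns": sum(c.turns for c in calls) / n,
        "mean_seconds": sum(c.seconds for c in calls) / n,
        "percent_return_users": 100.0 * returning / len(calls_per_user),
    }


calls = [
    Call("a", 10, 120.0, True, True),
    Call("a", 8, 90.0, True, True),
    Call("b", 12, 150.0, True, False),
    Call("c", 6, 60.0, False, False),
]
stats = summarize(calls)
for name, value in stats.items():
    print(f"{name}: {value}")
```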