Chapter 4. Simple Voice Interactions
Wouldn't it be great if you could just speak to your mobile device to ask it for information or to get it to do something? This chapter looks at simple voice interactions that allow you to do just this. Two tutorial examples will show you how to implement a query to search for information as well as a request to launch one of the apps on your device.
Speech recognition is not perfect, so it is worth implementing mechanisms that select only the best recognition results. In previous chapters, we studied how to obtain confidence measures. In this chapter, we will cover two new mechanisms: similarity measures, which compare the recognized input with what the user might have said, and confirmations, which ask the user directly whether the system understood correctly.
By the end of this chapter, you should be able to develop simple voice interactions to request information and carry out commands on your device. You should also be aware of how to use similarity measures and...
As discussed in Chapter 1, Speech on Android Devices, Google Voice Actions are simple interactions in which the user speaks a question or a command and the app responds with an action or a verbal response (or a combination of both).
The following are examples of similar interactions with a simple structure and involving a small number of turns:
Example 1
User: BBC News
App: (launches BBC News)
Example 2
App: What is your query?
User: What is the capital of France?
App: (returns web pages about Paris and France)
The interactions are simple in the following ways:
Limited dialog management: The interactions consist of at most two or three turns.
Limited spoken language understanding: The user is restricted to inputs consisting of single words or phrases, such as the name of a website or of an app, or a stretch of text that can be handled by the Google search engine.
The VoiceSearch app illustrates the following:
When clicking on the Press the button to speak option, the user is prompted to say some words.
The user speaks some words.
VoiceSearch initiates a search query based on the words spoken by the user.
The opening screen has a button asking the user to press and speak. On pressing the button, the next screen displays the Google speech prompt What is your query? The results are displayed in a browser window.
In this case, the app uses the two libraries developed previously: TTSLib (see Chapter 2, Text-to-Speech Synthesis) and ASRLib (see Chapter 3, Speech Recognition). Their jar files are included in the libs folder of the VoiceSearch project. The ASR methods are used to recognize the user input and use it as the search criterion. The TTS is employed to provide spoken feedback to the user about the status of the app.
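As a rough sketch of how recognized words can become a search criterion, the following plain-Java example picks the hypothesis with the highest confidence score from an N-best list and URL-encodes it into a web search address. The class name, the method names, and the search URL are assumptions made for illustration; this is not the VoiceSearch code itself, and it does not use ASRLib.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of ASRLib or VoiceSearch.
class SearchQuerySketch {

    // Returns the hypothesis with the highest confidence score, or null
    // when the list is empty or the two inputs differ in length.
    static String bestHypothesis(List<String> nBest, float[] confidences) {
        if (nBest.isEmpty() || nBest.size() != confidences.length) {
            return null;
        }
        int best = 0;
        for (int i = 1; i < confidences.length; i++) {
            if (confidences[i] > confidences[best]) {
                best = i;
            }
        }
        return nBest.get(best);
    }

    // Builds a search URL from the criterion. The base address is an
    // assumption; any search engine that accepts a "q" parameter would do.
    static String buildSearchUrl(String criterion) {
        try {
            return "https://www.google.com/search?q="
                    + URLEncoder.encode(criterion.trim(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> nBest = Arrays.asList(
                "what is the capital of france",
                "what is the capitol of france");
        float[] confidences = {0.9f, 0.4f};
        String criterion = bestHypothesis(nBest, confidences);
        System.out.println(buildSearchUrl(criterion));
    }
}
```

On an Android device, the same criterion could be handed to the platform's web search intent (`Intent.ACTION_WEB_SEARCH` with the `SearchManager.QUERY` extra) rather than being turned into a URL by hand.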
This app combines the code that was already presented for the TTSWithLib (Chapter 2, Text-to-Speech Synthesis) and the ASRWithLib ...
The functionality of the VoiceLaunch app is as follows:
When clicking on the Press to speak button, the user is prompted for the name of an app.
The user says the name of an app.
VoiceLaunch compares the recognized input against the names of all the apps installed on the device, and launches the one whose name is the most similar.
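The matching step can be sketched in plain Java as follows. This is a minimal illustration that assumes an orthographic measure based on Levenshtein edit distance, normalized to the range 0 to 1, and a minimum similarity below which nothing is launched. The class, the method names, and the sample app names are hypothetical; the list of installed apps is hard-coded here, whereas a real app would query the device for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of orthographic matching; not the VoiceLaunch source.
class AppMatcherSketch {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means the strings are equal ignoring case.
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT).trim();
        String y = b.toLowerCase(Locale.ROOT).trim();
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(x, y) / longest;
    }

    // Returns the most similar app name, or null if even the best
    // candidate scores below the threshold.
    static String bestMatch(String recognized, List<String> appNames,
                            double threshold) {
        String best = null;
        double bestScore = -1.0;
        for (String name : appNames) {
            double score = similarity(recognized, name);
            if (score > bestScore) {
                bestScore = score;
                best = name;
            }
        }
        return bestScore >= threshold ? best : null;
    }

    public static void main(String[] args) {
        List<String> apps = Arrays.asList("Email", "Maps", "Calendar", "Camera");
        System.out.println(bestMatch("email", apps, 0.5));   // Email (similarity 1.0)
        System.out.println(bestMatch("e-mail", apps, 0.5));  // Email (one edit away)
        System.out.println(bestMatch("weather", apps, 0.5)); // null (nothing close enough)
    }
}
```

Returning null when no candidate reaches the threshold lets the caller tell the user that no matching app was found instead of launching a poor guess.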
An application like VoiceLaunch does not require any interface, as the user could just speak the name of the app they want to launch. However, for illustration purposes, we have created a simple interface in which the user can choose the values of two parameters: a similarity threshold and a similarity criterion, as shown in the following screenshot. The screenshot shows the scenario in which the user has asked to launch Email. VoiceLaunch shows the screen in the figure and launches the Email application (the one with the highest similarity, in this case 1.00):
We have introduced the technique of similarity criteria to show how to improve on the results from...
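A similarity criterion can also be based on how words sound rather than how they are spelled. The sketch below uses Soundex, a well-known phonetic code, purely as an assumption; the text here does not say which phonetic algorithm the app uses. The class, the method names, and the scoring rule (the fraction of matching positions in the two four-character codes) are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of a phonetic similarity criterion using Soundex.
class PhoneticSketch {

    // Maps a letter to its Soundex digit; '0' means "not coded".
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    // Returns the four-character Soundex code, or "" if the input
    // contains no ASCII letters.
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with equal codes
            }
            char k = code(c);
            if (k != '0' && k != prev) {
                out.append(k);
            }
            prev = k; // a vowel resets prev, so a repeated code after it is kept
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    // Fraction of positions at which the two Soundex codes agree (0 to 1).
    static double phoneticSimilarity(String a, String b) {
        String x = soundex(a);
        String y = soundex(b);
        if (x.isEmpty() || y.isEmpty()) {
            return 0.0;
        }
        int same = 0;
        for (int i = 0; i < 4; i++) {
            if (x.charAt(i) == y.charAt(i)) {
                same++;
            }
        }
        return same / 4.0;
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));                    // R163
        System.out.println(soundex("Rupert"));                    // R163
        System.out.println(phoneticSimilarity("Email", "Emale")); // 1.0
    }
}
```

A phonetic criterion of this kind tolerates spelling differences between the recognizer's output and the app name as long as the two sound alike, which is the case an orthographic measure tends to penalize.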
VoiceSearchConfirmation app
Confirmations are a very important aspect of a transactional dialog and are also used extensively by humans in service transactions to ensure that everything has been understood correctly. Since the current speech recognition technology cannot guarantee that the app heard exactly what the user said, the app should confirm what the user wants, especially if the next action could result in unrecoverable consequences. However, confirmations should be used judiciously as they prolong the interaction and can be annoying for the user if they are overused.
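One way to use confirmations judiciously can be sketched in plain Java: confirm only when the action cannot be undone or when the recognizer's confidence is low, then interpret the user's reply as yes, no, or unclear. Everything here is an assumption made for illustration, including the class and method names, the prompt wording, the word lists, and the policy itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical confirmation policy; names, wording, and word lists are
// illustrative assumptions.
class ConfirmationSketch {

    enum Reply { YES, NO, UNCLEAR }

    private static final Set<String> YES_WORDS =
            new HashSet<>(Arrays.asList("yes", "yeah", "correct", "right"));
    private static final Set<String> NO_WORDS =
            new HashSet<>(Arrays.asList("no", "nope", "wrong", "incorrect"));

    // Confirm when the action is irreversible or the confidence score
    // falls below the threshold; otherwise act without asking.
    static boolean needsConfirmation(float confidence, boolean irreversible,
                                     float threshold) {
        return irreversible || confidence < threshold;
    }

    // Text to be spoken to the user before acting on the criterion.
    static String confirmationPrompt(String criterion) {
        return "Did you say " + criterion + "?";
    }

    // Classifies the recognized reply. A reply with neither kind of word,
    // or with both kinds, is treated as unclear so the app can ask again.
    static Reply interpretReply(String recognized) {
        boolean yes = false;
        boolean no = false;
        for (String word : recognized.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (YES_WORDS.contains(word)) {
                yes = true;
            }
            if (NO_WORDS.contains(word)) {
                no = true;
            }
        }
        if (yes == no) {
            return Reply.UNCLEAR;
        }
        return yes ? Reply.YES : Reply.NO;
    }

    public static void main(String[] args) {
        String criterion = "capital of France";
        if (needsConfirmation(0.55f, false, 0.7f)) {
            System.out.println(confirmationPrompt(criterion));
        }
        System.out.println(interpretReply("Yes, please"));
    }
}
```

Treating an unclear reply as a separate outcome, rather than as a no, gives the app the chance to repeat the question instead of discarding a request the user actually wanted.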
The VoiceSearchConfirmation app has the same functionality as VoiceSearch, but it confirms the search criteria before performing the search. Two sample interactions with this app are as follows:
In this chapter, we have shown how to develop simple voice interactions using the Google speech recognition and TTS APIs. The first example showed how to take an input of some words from the user and initiate a search query. The second example involved using speech to launch apps on the device. Here we introduced the technique of using similarity measures to compare the recognition of the user's input with what might have been said. Two different measures were illustrated: orthographic similarity and phonetic similarity. The final example showed how to use confirmations in order to check with the user that the system had recognized the input correctly. These techniques, along with the use of confidence scores introduced in the previous chapter, are useful tools for the development of speech-enabled apps.
However, these interactions are limited in two ways. Firstly, they do not involve the use of dialog state information to control the interaction and to determine what the app should...