You're reading from Voice Application Development for Android

Product type Book

Published in Nov 2013

Publisher Packt

ISBN-13 9781783285297

Pages 134 pages

Edition 1st Edition

Languages

Java

Concepts

Android Development

Chapter 4. Simple Voice Interactions

Wouldn't it be great if you could just speak to your mobile device to ask it for information or to get it to do something? This chapter looks at simple voice interactions that allow you to do just this. Two tutorial examples will show you how to implement a query to search for information as well as a request to launch one of the apps on your device.

Speech recognition is not perfect, so it is worth implementing mechanisms that select only the best recognition results. In previous chapters, we studied how to obtain confidence measures. In this chapter, we will cover two new mechanisms: similarity measures, which compare the recognized input with what the user might have said, and confirmations, which directly ask the user whether the system understood correctly.

By the end of this chapter, you should be able to develop simple voice interactions to request information and carry out commands on your device. You should also be aware of how to use similarity measures and...

Voice interactions

As discussed in Chapter 1, Speech on Android Devices, Google Voice Actions are simple interactions in which the user speaks a question or a command and the app responds with an action or a verbal response (or a combination of both).

The following are examples of similar interactions with a simple structure and involving a small number of turns:

Example 1

User: BBC News

App: (launches BBC News)

Example 2

App: What is your query?

User: What is the capital of France?

App: (returns web pages about Paris and France)

The interactions are simple in the following ways:

Limited dialog management: The interactions consist of at most two or three turns.
Limited spoken language understanding: The user is restricted to inputs consisting of single words or phrases, such as the name of a website or of an app, or a stretch of text that can be handled by the Google search engine.

VoiceSearch app

This app illustrates the following:

When the user clicks on the Press the button to speak option, they are prompted to say some words.
The user speaks some words.
VoiceSearch initiates a search query based on the words spoken by the user.

The opening screen has a button asking the user to press and speak. When the button is pressed, the next screen displays the Google speech prompt "What is your query?" The results of the search are displayed in a browser window.
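The step from recognized words to a browser search can be sketched in plain Java. This is a minimal illustration rather than the VoiceSearch code itself: the class name `SearchQueryBuilder`, the method `buildSearchUrl`, and the search endpoint are assumptions introduced here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not taken from the VoiceSearch project.
final class SearchQueryBuilder {

    // Assumed endpoint; any search engine that accepts a "q" parameter would work.
    static final String SEARCH_BASE = "https://www.google.com/search?q=";

    // Turns the recognized words into a URL that a browser can open.
    static String buildSearchUrl(String recognizedText) {
        if (recognizedText == null || recognizedText.trim().isEmpty()) {
            throw new IllegalArgumentException("Nothing was recognized");
        }
        // Collapse repeated whitespace so the query stays tidy.
        String normalized = recognizedText.trim().replaceAll("\\s+", " ");
        try {
            // URL-encode the query: spaces become '+', '?' becomes "%3F", and so on.
            return SEARCH_BASE + URLEncoder.encode(normalized, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For the query in Example 2, `buildSearchUrl("What is the capital of France?")` yields `https://www.google.com/search?q=What+is+the+capital+of+France%3F`. In an Android activity, one option would be to open such a URL with an `Intent.ACTION_VIEW` intent.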

In this case, the app uses the two libraries developed previously: TTSLib (see Chapter 2, Text-to-Speech Synthesis) and ASRLib (see Chapter 3, Speech Recognition). Their jar files are included in the libs folder of the VoiceSearch project. The ASR methods are used to recognize the user input and use it as the search criterion. The TTS is employed to provide spoken feedback to the user about the status of the app.

This app combines the code that was already presented for the TTSWithLib (Chapter 2, Text-to-Speech Synthesis) and the ASRWithLib ...

VoiceLaunch app

The functionality of this app is as follows:

When the user clicks on the Press to speak button, they are prompted for the name of an app.
The user says the name of an app.
VoiceLaunch compares the recognized input against the names of all the apps installed on the device, and launches the one whose name is the most similar.

An application like VoiceLaunch does not require a graphical interface, as the user could just speak the name of the app they want to launch. However, for illustration purposes, we have created a simple interface in which the user can choose the values of two parameters: a similarity threshold and a similarity criterion, as shown in the following screenshot. The screenshot shows the scenario in which the user has asked to launch Email. VoiceLaunch shows the screen in the figure and launches the Email application (the one with the highest similarity, in this case 1.00):

We have introduced the technique of similarity criteria to show how to improve on the results from...
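The comparison described in this section can be sketched as follows. This is only one possible orthographic criterion, assuming a normalized Levenshtein edit distance; the measure actually used by VoiceLaunch may differ, and the names `AppNameMatcher`, `similarity`, and `bestMatch` are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of an orthographic similarity criterion with a threshold.
final class AppNameMatcher {

    // Levenshtein edit distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means identical once case and outer spaces are ignored.
    static double similarity(String a, String b) {
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(x, y) / longest;
    }

    // Returns the most similar app name, or null if none reaches the threshold.
    static String bestMatch(String recognized, List<String> appNames, double threshold) {
        String best = null;
        double bestScore = -1.0;
        for (String name : appNames) {
            double score = similarity(recognized, name);
            if (score > bestScore) {
                bestScore = score;
                best = name;
            }
        }
        return bestScore >= threshold ? best : null;
    }
}
```

With this sketch, a recognized input of "email" scores 1.0 against an app named Email, which corresponds to the highest-similarity case described above. Returning `null` below the threshold lets the caller decide whether to ask the user again instead of launching a poor match.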

VoiceSearchConfirmation app

Confirmations are a very important aspect of a transactional dialog and are also used extensively by humans in service transactions to ensure that everything has been understood correctly. Since current speech recognition technology cannot guarantee that the app heard exactly what the user said, the app should confirm what the user wants, especially if the next action could have irreversible consequences. However, confirmations should be used judiciously, as they prolong the interaction and can annoy the user if they are overused.
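One way to use confirmations judiciously is to make the decision depend on the recognition confidence and on whether the action can be undone. The following is a minimal sketch under those assumptions; the class `ConfirmationPolicy` and the cutoff value of 0.7 are hypothetical and are not taken from the VoiceSearchConfirmation app, which confirms every query.

```java
// Hypothetical policy for deciding when a confirmation is worth its cost.
final class ConfirmationPolicy {

    // Assumed cutoff; a real app would tune this value empirically.
    static final float CONFIDENCE_THRESHOLD = 0.7f;

    // confidence: recognizer score in [0, 1]; a negative value is treated here
    //             as "no score available", which also triggers a confirmation.
    // irreversible: true if the next action cannot be undone.
    static boolean shouldConfirm(float confidence, boolean irreversible) {
        if (irreversible) {
            return true; // always check before an action that cannot be undone
        }
        return confidence < CONFIDENCE_THRESHOLD;
    }
}
```

Under this policy, a confidently recognized web search proceeds without interruption, while a low-confidence result or an irreversible action prompts the user first.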

The VoiceSearchConfirmation app has the same functionality as VoiceSearch, but it confirms the search criteria before performing the search. Two sample interactions with this app are as follows:

Confirmation scenario: This scenario is characterized by the following steps:
1. The user pushes the button to talk and says Weather in Belfast.
2. The system understands Weather in Belfast and asks Did you say weather in Belfast?
3. The...
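The confirmation question in step 2, and the handling of the user's reply to it, can be sketched as follows. This is an illustration only: the class `ConfirmationDialog`, the `Reply` values, and the small yes/no vocabularies are assumptions, and the app's own handling of the reply is not shown in the text above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a confirmation turn.
final class ConfirmationDialog {

    enum Reply { YES, NO, UNCLEAR }

    // Assumed vocabularies; a real app would extend and localize these.
    private static final Set<String> YES_WORDS =
            new HashSet<>(Arrays.asList("yes", "yeah", "yep", "correct"));
    private static final Set<String> NO_WORDS =
            new HashSet<>(Arrays.asList("no", "nope", "wrong"));

    // Builds the question that the app speaks back to the user.
    static String confirmationPrompt(String recognizedText) {
        return "Did you say " + recognizedText.trim() + "?";
    }

    // Classifies the recognized reply; mixed or unknown answers are UNCLEAR.
    static Reply interpretReply(String recognizedReply) {
        if (recognizedReply == null) {
            return Reply.UNCLEAR;
        }
        boolean yes = false;
        boolean no = false;
        // Split on anything that is not a lowercase letter to isolate words.
        for (String word : recognizedReply.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (YES_WORDS.contains(word)) {
                yes = true;
            }
            if (NO_WORDS.contains(word)) {
                no = true;
            }
        }
        if (yes == no) {
            return Reply.UNCLEAR;
        }
        return yes ? Reply.YES : Reply.NO;
    }
}
```

Here `confirmationPrompt("weather in Belfast")` produces "Did you say weather in Belfast?". Treating ambiguous replies as `UNCLEAR` leaves room for the app to repeat the question rather than guess.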

Summary

In this chapter, we have shown how to develop simple voice interactions using the Google speech recognition and TTS APIs. The first example showed how to take an input of some words from the user and initiate a search query. The second example involved using speech to launch apps on the device. Here we introduced the technique of using similarity measures to compare the recognition of the user's input with what might have been said. Two different measures were illustrated: orthographic similarity and phonetic similarity. The final example showed how to use confirmations in order to check with the user that the system had recognized the input correctly. These techniques, along with the use of confidence scores introduced in the previous chapter, are useful tools for the development of speech-enabled apps.

However, these interactions are limited in two ways. Firstly, they do not involve the use of dialog state information to control the interaction and to determine what the app should...
