Voice Application Development for Android: A practical guide to develop advanced and exciting voice applications for Android using open source software

By Zoraida Callejas , Michael F McTear
Book Nov 2013 134 pages 1st Edition

Chapter 1. Speech on Android Devices

Have you ever wanted to create voice-based apps that you could run on your own Android device; apps that you could talk to and that could talk back to you? This chapter provides an introduction to the use of speech on Android devices, using open-source APIs from Google for text-to-speech synthesis and speech recognition. Following a brief overview of the world of Voice User Interfaces (VUIs), the chapter outlines the components of an interactive voice application (or virtual personal assistant).

By the end of this chapter you should have a good understanding of what is required to create a voice-based app using freely available resources from Google.

Using speech on an Android device

Android devices provide built-in speech-to-text and text-to-speech capabilities. The following are some examples of speech-based apps on Android:


Speech-to-text

With speech-to-text, users of Android devices can dictate into any text box on the device where textual input is required, for example, e-mail, text messaging, and search. The keyboard control contains a button with a microphone symbol and two letters indicating the language input settings, which can be changed by the user. On pressing the microphone button, a window pops up asking the user to Speak Now. The spoken input is automatically transcribed into written text. The user can then decide what to do with the transcribed text.

Accuracy rates have improved considerably for dictation on small devices, on one hand due to the use of large-scale cloud-based resources for speech recognition, and on the other, to the fact that the device is usually held close to the user's mouth so that a more reliable acoustic signal can be obtained. One of the main challenges for voice dictation is that the input is unpredictable—users can say literally anything—and so a large general vocabulary is required to cover all possible inputs. Other challenges include dealing with background noise, sloppy speech, and unfamiliar accents.


Text-to-speech

Text-to-speech (TTS) converts written text into spoken output. Various applications can take advantage of TTS. For example, TalkBack, which is available through the Accessibility option, uses TTS to help blind and visually impaired users by describing which items are touched, selected, and activated. TalkBack can also be used to read a book in the Google Play Books app. The TTS function is also available on Android Kindle as well as on Google Maps for giving step-by-step driving instructions. There is a wide range of third-party apps that make use of TTS, and alternative TTS engines are also available.

Voice Search

Voice Search provides the same functionality on Android devices as the traditional Google Search except that instead of typing a query the user speaks it. Voice Search is available using the microphone in the Google Search widget. In Voice Search the recognized text is passed to the search engine and executed in the same way that a typed query is executed.

A new feature of Voice Search is that, in addition to returning a list of links, a spoken response to the query is returned. For example, in response to the question "How tall is the Eiffel tower?", the app replies, "The Eiffel tower is 324 meters tall." It is also possible to ask follow-up questions using pronouns, for example, "When was it built?". This additional functionality is made possible by combining Google's Knowledge Graph—a knowledge base used by Google—with its conversational search technology to provide a more conversational style of interaction.

Android Voice Actions

Android Voice Actions can also be accessed using the microphone in the Google Search widget. Voice Actions allow the user to control their device using voice commands. Voice Actions require input that matches a particular structure, as shown in the following list from Google's webpage.

Note: items with * are optional. Words outside square brackets are the words to be spoken.

  • Send text messages: send text to [recipient] [message]* (example: send text to Allison Miller Running late. I will be home around 9)
  • Call businesses: call [business name] [location]* (example: call Soho Pizzeria London)
  • View a map: map of [address/city] (example: map of London)
  • Search Google: [your query] (example: pictures of Stonehenge at sunset)
  • Get directions: navigate to [address/city/business name] (examples: navigate to British Museum London; navigate to 24 Mill Street)
  • Call contacts: call [contact name] [phone type]* (example: call Allison Miller home)
  • Go to websites: go to [website] (example: go to Wikipedia)

The structures in Voice Actions allow them to be mapped onto actions that are available on the device. For example, the keyword "call" indicates a phone call, while the key phrase "go to" indicates a website to be launched. Additional processing is required to extract the parameters of the actions, such as the contact name or the website.
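The mapping from key phrases to actions can be illustrated with a minimal sketch in plain Java. This is not how Google implements Voice Actions; the class name VoiceActionParser, the action names such as CALL and OPEN_WEBSITE, and the fallback to a plain search are all assumptions made for illustration. The sketch only shows the idea of matching a leading key phrase and treating the rest of the recognized text as the parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps recognized text to an action name plus its parameter. */
final class VoiceActionParser {

    /** Result of parsing: an illustrative action name and the remaining text. */
    static final class ParsedAction {
        final String action;
        final String parameter;

        ParsedAction(String action, String parameter) {
            this.action = action;
            this.parameter = parameter;
        }
    }

    // Key phrases come from the Voice Actions list; the action names are assumptions.
    private static final Map<String, String> KEY_PHRASES = new LinkedHashMap<>();
    static {
        KEY_PHRASES.put("send text to", "SEND_TEXT");
        KEY_PHRASES.put("navigate to", "NAVIGATE");
        KEY_PHRASES.put("map of", "SHOW_MAP");
        KEY_PHRASES.put("go to", "OPEN_WEBSITE");
        KEY_PHRASES.put("call", "CALL");
    }

    static ParsedAction parse(String recognizedText) {
        String text = recognizedText.trim();
        for (Map.Entry<String, String> entry : KEY_PHRASES.entrySet()) {
            String phrase = entry.getKey();
            // Case-insensitive check that the input starts with the key phrase.
            boolean startsWithPhrase =
                    text.regionMatches(true, 0, phrase, 0, phrase.length());
            // The phrase must end the input or be followed by whitespace,
            // so that "callback" is not mistaken for "call".
            boolean endsAtWordBoundary = text.length() == phrase.length()
                    || (text.length() > phrase.length()
                        && Character.isWhitespace(text.charAt(phrase.length())));
            if (startsWithPhrase && endsAtWordBoundary) {
                String parameter = text.substring(phrase.length()).trim();
                return new ParsedAction(entry.getValue(), parameter);
            }
        }
        // No key phrase matched: treat the whole input as a search query.
        return new ParsedAction("SEARCH", text);
    }

    public static void main(String[] args) {
        ParsedAction parsed = parse("call Allison Miller home");
        System.out.println(parsed.action + " -> " + parsed.parameter);
    }
}
```

In this sketch the parameter is left as raw text. Splitting "Allison Miller home" into a contact name and a phone type is the kind of additional processing that a real app would still have to do.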

Virtual Personal Assistants

One of the most exciting speech-based apps is the Virtual Personal Assistant (VPA), which acts like a personal assistant, performing a range of tasks such as finding information about local restaurants; carrying out commands involving apps on the device, for example, using speech to set the alarm or update the calendar; and engaging in general conversation. There are at least 20 VPAs available for Android devices (see the web page for this book), although the best-known VPA is Siri, which has been available on the iPhone since 2011. Examples of interactions with Siri that are similar to those performed by Android VPAs can be found on Apple's website. Many VPAs, including Siri, have been created with a personality and an ability to respond in a humorous way to trick questions and dubious input, thus adding to their entertainment value. Examples can be seen in numerous video clips on YouTube.

It is worth mentioning that a number of technologies share some of the characteristics of VPAs as explained in the following:

  • Dialog systems, which have a long tradition in academic research, are based on the vision of developing systems that can communicate with humans in natural language (initially written text but more recently speech). The first systems were concerned with obtaining information, for example, flight times or stock quotes. The next generation enabled users to engage in some form of transaction, such as banking or making a travel reservation, while more recent systems are being developed to assist in troubleshooting, for example, guiding a user who is having difficulty setting up some item of equipment. A wide range of techniques has been used to implement dialog systems, including rule-based and statistically based dialog processing.

  • Voice User Interfaces (VUIs) are similar to dialog systems but with the emphasis on commercial deployment. Here the focus has tended to be on systems for specific purposes, such as call routing, directory assistance, and transactional dialogs, for example, travel, hotel, flight, car rental, or bank balance. Many current VUIs have been designed using VoiceXML, a markup language based on XML. The VoiceXML scripts are then interpreted on a voice browser that also provides the required speech and telephony functions.

  • Chatbots have traditionally been used to simulate human conversation. The earliest chatbots go back to the 1960s with the famous ELIZA program written by Joseph Weizenbaum, which simulated a Rogerian psychotherapist, often in a convincing way. More recently chatbots have been used in education, information retrieval, business, e-commerce, and automated help desks. Chatbots use a sophisticated pattern-matching algorithm to match the user's input and to retrieve appropriate responses. Most chatbots have been text-based, although speech-based chatbots are increasingly beginning to emerge (see further in Chapter 8, Dialogs with Virtual Personal Assistants).

  • Embodied conversational agents (ECAs) are computer-generated animated characters that combine facial expression, body stance, hand gestures, and speech to provide an enriched channel of communication. By enhancing the visual dimensions of face-to-face interaction, embodied conversational agents can appear more trustworthy and believable, and also more interesting and entertaining. Embodied conversational agents have been used in applications such as interactive language learning, virtual training environments, virtual reality game shows, and interactive fiction and storytelling systems. Increasingly they are being used in e-commerce and e-banking to provide friendly and helpful automated help. See, for example, the agent Anna at the IKEA website.
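The pattern-matching approach mentioned for chatbots can be sketched in a few lines of plain Java. This is a deliberately simplified, ELIZA-style illustration rather than the algorithm used by any particular chatbot platform: the class name PatternChatbot, the three rules, and the default response are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of pattern-based response retrieval; the rules are illustrative. */
final class PatternChatbot {

    /** A rule pairs an input pattern with a response template; %s marks the captured text. */
    private static final class Rule {
        final Pattern pattern;
        final String template;

        Rule(String regex, String template) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.template = template;
        }
    }

    // Rules are tried in order, so more specific patterns should come first.
    private static final List<Rule> RULES = Arrays.asList(
            new Rule("\\bi feel (.+)", "Why do you feel %s?"),
            new Rule("\\bi am (.+)", "How long have you been %s?"),
            new Rule("\\b(?:hello|hi)\\b", "Hello. What would you like to talk about?"));

    private static final String DEFAULT_RESPONSE = "Please tell me more.";

    static String respond(String userInput) {
        // Drop trailing punctuation so it does not end up inside the response.
        String input = userInput.trim().replaceAll("[.!?]+$", "");
        for (Rule rule : RULES) {
            Matcher matcher = rule.pattern.matcher(input);
            if (matcher.find()) {
                String captured = matcher.groupCount() >= 1 ? matcher.group(1) : "";
                return rule.template.replace("%s", captured);
            }
        }
        // No rule matched: fall back to a generic prompt that keeps the user talking.
        return DEFAULT_RESPONSE;
    }

    public static void main(String[] args) {
        System.out.println(respond("I feel tired."));
    }
}
```

The sketch echoes the captured words unchanged. A fuller implementation would also swap pronouns (for example, "my" to "your") before reusing them in the response.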

Virtual Personal Assistants differ from these technologies in that they allow users to use speech to perform many of the functions that are available on mobile devices, such as sending a text message, consulting and updating the calendar, or setting an alarm. They also provide access to web services, such as finding a restaurant, tracking a delivery, booking a flight, or using information services such as Knowledge Graph, Wolfram Alpha, or Wikipedia. Because they have access to contextual information on the device such as the user's location, time and date, contacts, and calendar, the VPA can provide information such as restaurant recommendations relevant to the user's location and preferences.

Designing and developing a speech app

Speech app design shares many of the characteristics of software design in general, but there are also some aspects unique to voice interfaces—for example, dealing with the issue that speech recognition is always going to be less than 100 percent accurate, and so is less reliable compared with input when using a GUI. Another issue is that, since speech is transient, especially on devices with no visual display, greater demands are put on the user's memory compared with a GUI app.

There are many factors that contribute to the usability of a speech-based app. It is important to perform extensive use case analysis in order to determine the requirements of the system, looking at issues such as whether the app is to replace or complement an existing app; whether speech is appropriate as a medium for input/output; the type of service to be provided by the app; the types of user who will make use of the app; and the general deployment environment for the app.

Why Google speech?

The following are our reasons for using Google speech:

  • The proliferation of Android devices: Recent information on Android states that "Android had a worldwide smartphone market share of 75% during the third quarter of 2012, with 750 million devices activated in total and 1.5 million activations per day." (Retrieved 09/07/2013).

  • The Android SDK is open source: The fact that the Android SDK is open source makes it more easily available for developers and enthusiasts to create apps, compared with some other operating systems. Anyone can develop their own apps using a free development environment such as Eclipse and then upload them to their Android device for their own personal use and enjoyment.

  • The Google Speech APIs: The Google Speech APIs are available for free for use on Android devices. This means that the Speech APIs are useful for developers wishing to try out speech without investing in expensive commercially available alternatives. As Google employs many of the top speech scientists, their speech APIs are comparable in performance to those on offer commercially.


You may also try…

Nuance NDEV Mobile, which supports a number of languages for text-to-speech synthesis and speech recognition as well as providing a PhoneGap plug-in to enable developers to implement their apps on different platforms.

The AT&T Speech Mashup, which supports the development of speech-based apps and the use of W3C standard speech recognition grammars.

What is needed to create a Virtual Personal Assistant?

The following figure shows the various components required to build a speech-enabled VPA.

A basic requirement for a VPA is that it should be able to speak and to understand speech. Text-to-speech synthesis, which provides the ability to speak, is discussed in Chapter 2, Text-to-Speech Synthesis, while speech recognition is covered in Chapter 3, Speech Recognition. However, while these capabilities are fundamental for a voice-enabled assistant, they are not sufficient. The ability to engage in dialog and connect to web services and device functions is also required as the basis of personal assistance. To do these things a VPA requires the following:

  • A method for controlling the dialog, determining who should take the dialog initiative and what topics they should cover. In practice this can be simplified by having one-shot interactions in which the user simply speaks their query and the app responds. One-shot interactions are covered in Chapter 4, Simple Voice Interactions. System-directed dialogs, in which the app asks a series of questions—as in web-based form-filling (for example, to book a hotel or rent a car), are covered in Chapter 5, Form-filling Dialogs.

  • A method for interpreting the user's input once it has been recognized. This is the task of the Spoken Language Understanding component which, among other things, provides a semantic interpretation representing the meaning of what the user said. Since in many commercial systems input is restricted to single words or phrases, the interpretation is relatively straightforward. Two different approaches will be illustrated in Chapter 6, Grammars for Dialog: how to create a hand-crafted grammar that covers the words and phrases that the user might say; and how to use statistical grammars to cover a wider range of inputs and to provide a more robust interpretation. A VPA should also offer alternative modalities when speech input and output are not possible or performance is poor, and it should be able to use different languages if required. These topics are covered in Chapter 7, Multilingual and Multimodal Dialogs.

  • A method for determining relevant actions and generating appropriate responses. These aspects of dialog management and response generation are described in Chapter 7, Multilingual and Multimodal Dialogs, and in Chapter 8, Dialogs with Virtual Personal Assistants.
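To make the idea of a system-directed, form-filling dialog more concrete, here is a minimal sketch in plain Java. It is not the code developed later in the book: the class name FormFillingDialog, the slot names city and nights, and the runHotelBooking helper are all hypothetical. Recognized answers are passed in as strings, standing in for the output of a speech recognizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a system-directed dialog that fills a form slot by slot. */
final class FormFillingDialog {

    private final Map<String, String> prompts = new LinkedHashMap<>(); // slot -> question
    private final Map<String, String> values = new LinkedHashMap<>();  // slot -> answer

    void addSlot(String name, String prompt) {
        prompts.put(name, prompt);
    }

    /** Returns the first slot that has no value yet, or null when the form is complete. */
    String nextSlot() {
        for (String slot : prompts.keySet()) {
            if (!values.containsKey(slot)) {
                return slot;
            }
        }
        return null;
    }

    /** Returns the question the app should ask next, or null when the form is complete. */
    String nextPrompt() {
        String slot = nextSlot();
        return slot == null ? null : prompts.get(slot);
    }

    /** Stores the answer for the current slot; an empty answer leaves it unfilled. */
    void answer(String recognizedText) {
        String slot = nextSlot();
        if (slot == null) {
            throw new IllegalStateException("The form is already complete");
        }
        if (recognizedText != null && !recognizedText.trim().isEmpty()) {
            values.put(slot, recognizedText.trim());
        }
    }

    boolean isComplete() {
        return nextSlot() == null;
    }

    Map<String, String> result() {
        return new LinkedHashMap<>(values);
    }

    /** Illustrative driver: feeds simulated recognition results into a hotel-booking form. */
    static Map<String, String> runHotelBooking(String... recognizedAnswers) {
        FormFillingDialog dialog = new FormFillingDialog();
        dialog.addSlot("city", "Which city would you like to stay in?");
        dialog.addSlot("nights", "How many nights will you stay?");
        for (String recognized : recognizedAnswers) {
            if (dialog.isComplete()) {
                break;
            }
            dialog.answer(recognized);
        }
        return dialog.result();
    }

    public static void main(String[] args) {
        // The empty string simulates a failed recognition, so the question is asked again.
        System.out.println(runHotelBooking("Belfast", "", "two nights"));
    }
}
```

Because the app keeps the initiative and asks for one slot at a time, a failed or empty recognition result simply causes the same question to be asked again. This is one way of coping with recognition that is less than 100 percent accurate.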

Building on the basic technologies of text-to-speech synthesis and speech recognition, as presented in Chapter 2 and Chapter 3, Chapters 4-8 cover a range of techniques that will enable developers to take the basic technologies further and create speech-based apps using the Google speech APIs.


Summary

This chapter has provided an introduction to speech technology on Android devices. We examined various types of speech app that are currently available on Android devices. We also looked at why we decided to focus on the Google Speech APIs as tools for the developer. Finally, we introduced the main technologies required to create a Virtual Personal Assistant. These technologies will be covered in the remaining chapters of this book.

In the next chapter, we will introduce you to text-to-speech synthesis (TTS) and show how to use the Google TTS API to develop applications that speak.


Key benefits

  • A comprehensive guide containing all the best practices for voice application development for Android
  • Progress quickly from basic apps to more advanced topics
  • Written in an easy-to-follow style with detailed descriptions of the included code examples to help you learn quickly and efficiently


Speech technology has been around for some time now. However, it has only recently captured the imagination of the general public with the advent of personal assistants on mobile devices that you can talk to in your own language. The potential of voice apps is huge as a novel and natural way to use mobile devices. Voice Application Development for Android is a practical, hands-on guide that provides you with a series of clear, step-by-step examples which will help you to build on the basic technologies and create more advanced and more engaging applications. With this book, you will learn how to create useful voice apps that you can deploy on your own Android device in no time at all. This book introduces you to the technologies behind voice application development in a clear and intuitive way. You will learn how to use open source software to develop apps that talk and that recognize your speech. Building on this, you will progress to developing more complex apps that can perform useful tasks, and you will learn how to develop a simple voice-based personal assistant that you can customize to suit your own needs.

What you will learn

  • Use text-to-speech synthesis so that your device can talk to you
  • Enable your device to recognize your speech
  • Create simple voice interactions to get information and carry out commands
  • Develop a voice app that engages in a dialogue with you to collect the information required to perform a transaction
  • Use grammars to enable your app to understand the meaning behind your words
  • Make use of different languages in your apps
  • Add multimodal interaction to your apps as an alternative to speech
  • Build a voice-based personal assistant using an open source development platform for chatbots

Product Details

Publication date: Nov 25, 2013
Length: 134 pages
Edition: 1st Edition
Language: English
ISBN-13: 9781783285297


Table of Contents

Voice Application Development for Android
Credits
Foreword
About the Authors
Acknowledgement
About the Reviewers
Preface
1. Speech on Android Devices
2. Text-to-Speech Synthesis
3. Speech Recognition
4. Simple Voice Interactions
5. Form-filling Dialogs
6. Grammars for Dialog
7. Multilingual and Multimodal Dialogs
8. Dialogs with Virtual Personal Assistants
9. Taking it Further
Afterword
Index
