Developing apps with the Google Speech APIs

by Michael F. McTear and Zoraida Callejas | November 2013 | Open Source

This article by Michael F. McTear and Zoraida Callejas, the authors of the book Voice Application Development for Android, covers the main aspects of developing apps with the Google Speech APIs.


Speech technology has come of age. If you own a smartphone or tablet you can perform many tasks on your device using voice. For example, you can send a text message, update your calendar, set an alarm, and do many other things with a single spoken command that would take multiple steps to complete using traditional methods such as tapping and selecting. You can also ask the sorts of queries that you would previously have typed into your Google search box and get a spoken response.

For those who wish to develop their own speech-based apps, Google provides APIs for the basic technologies of text-to-speech synthesis (TTS) and automated speech recognition (ASR). Using these APIs, developers can create useful interactive voice applications. This article provides a brief overview of the Google APIs and then shows some examples of voice-based apps built around them.

Using the Google text-to-speech synthesis API

TTS has been available on Android devices since Android 1.6 (API Level 4). The components of the Google TTS API (package android.speech.tts) are documented at http://developer.android.com/reference/android/speech/tts/package-summary.html. Interfaces and classes are listed here and further details can be obtained by clicking on these.

Starting the TTS engine involves creating an instance of the TextToSpeech class and supplying a callback to be executed when the engine has been initialized. This callback is defined through the OnInitListener interface: when initialization is complete, its onInit method is invoked.

If TTS has been initialized correctly, the speak method is invoked to speak out some words:

TextToSpeech tts = new TextToSpeech(this, new OnInitListener() {
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS)
            // speak() is a helper in the enclosing activity that calls
            // the speak method of the TextToSpeech instance
            speak("Hello world", TextToSpeech.QUEUE_ADD, null);
    }
});

Due to limited storage on some devices, not all supported languages may actually be installed on a particular device, so it is important to check whether a particular language is available before creating the TextToSpeech object. This way, it is possible to download and install the required language-specific resource files if necessary. The check is done by sending an Intent with the action ACTION_CHECK_TTS_DATA, which is part of the TextToSpeech.Engine class:

Intent intent = new Intent(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(intent, TTS_DATA_CHECK);

If the language data has been correctly installed, the onActivityResult handler receives the result code CHECK_VOICE_DATA_PASS. If the data is not available, an intent with the action ACTION_INSTALL_TTS_DATA is launched:

Intent installData = new Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
startActivity(installData);
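
A minimal sketch of how these two intents can be tied together in the onActivityResult handler is shown below. The TTS_DATA_CHECK request code and the tts field are assumed to be declared in the enclosing activity, which is assumed to implement OnInitListener:

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == TTS_DATA_CHECK) {
        if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
            // The language data is installed; it is now safe to create the TextToSpeech object
            tts = new TextToSpeech(this, this);
        } else {
            // The language data is missing; ask the system to install it
            Intent installData = new Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installData);
        }
    }
    super.onActivityResult(requestCode, resultCode, data);
}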

The next figure shows an example of an app using the TTS API.

A potential use case for this type of app is when the user accesses some text on the Web, for example, a news item, an email, or a sports report. This is useful if the user's eyes and hands are busy, or if there are problems reading the text on the screen. In this example, the app retrieves some text and the user presses the Speak button to hear it. A Stop button is provided in case the user does not wish to hear all of the text.
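
A minimal sketch of how the two buttons might drive the engine is given below. It assumes the tts object has already been initialized as described above, the retrieved text is held in a String field called text, and the handler names (which are illustrative) are wired up as onClick handlers in the layout XML:

public void onSpeakPressed(View view) {
    // Queue the retrieved text; QUEUE_FLUSH discards anything already queued
    tts.speak(text, TextToSpeech.QUEUE_FLUSH, null);
}

public void onStopPressed(View view) {
    // Interrupt playback immediately
    tts.stop();
}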

Using the Google speech recognition API

The components of the Google Speech API (package android.speech) are documented at http://developer.android.com/reference/android/speech/package-summary.html. Interfaces and classes are listed and further details can be obtained by clicking on these.

There are two ways in which speech recognition can be carried out on an Android device: based solely on a RecognizerIntent, or by creating an instance of SpeechRecognizer.

The following code shows how to start an activity to recognize speech using the first approach:

Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
// Specify the language model
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, languageModel);
// Specify how many results to receive. Results are listed in order of confidence
intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, numberRecoResults);
// Start listening
startActivityForResult(intent, ASR_CODE);
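
The second approach, creating a SpeechRecognizer instance, gives finer-grained control over the recognition process. A minimal sketch, assuming the activity holds the RECORD_AUDIO permission and implements the RecognitionListener interface:

SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(this);
// The activity implements RecognitionListener and receives callbacks such as onResults
recognizer.setRecognitionListener(this);
Intent recoIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
recoIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
// Recognition hypotheses are delivered to onResults(Bundle) under the key
// SpeechRecognizer.RESULTS_RECOGNITION
recognizer.startListening(recoIntent);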

The app shown below illustrates the following:

  1. The user selects the parameters for speech recognition.
  2. The user presses a button and says some words.
  3. The words recognized are displayed in a list along with their confidence scores (a sketch of how these are retrieved follows the list).
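
With the RecognizerIntent approach, the recognition hypotheses and their confidence scores can be retrieved in onActivityResult, roughly as follows; ASR_CODE is the request code used when starting the recognition activity:

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == ASR_CODE && resultCode == RESULT_OK) {
        // Recognition hypotheses, ordered from most to least confident
        ArrayList<String> matches =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        // Confidence scores between 0.0 and 1.0, aligned with the matches list
        float[] scores =
                data.getFloatArrayExtra(RecognizerIntent.EXTRA_CONFIDENCE_SCORES);
        // ... populate a ListView (or similar) with matches and scores ...
    }
    super.onActivityResult(requestCode, resultCode, data);
}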

Multilingual apps

It is important to be able to develop apps in languages other than English. The TTS and ASR engines can be configured for a wide range of languages. However, we cannot expect that every language will be available on a particular device. Thus, before selecting a language it is necessary to check whether it is one of the supported languages and, if it is not, to fall back to the device's currently preferred language.

In order to do this, an ordered broadcast with the action RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS is sent; it returns a Bundle from which the preferred language (RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE) and the list of supported languages (RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES) can be extracted. For speech recognition, an extra parameter is then added to the recognition intent to specify the language to be used, as shown in the following line of code:

intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, language);
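
A minimal sketch of sending the ordered broadcast and reading the returned Bundle, assuming it is issued from an activity:

Intent detailsIntent = new Intent(RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS);
sendOrderedBroadcast(detailsIntent, null, new BroadcastReceiver() {
    @Override
    public void onReceive(Context context, Intent intent) {
        Bundle results = getResultExtras(true);
        // The language the recognizer will use by default
        String preferred =
                results.getString(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE);
        // All languages currently supported by the recognizer
        ArrayList<String> supported =
                results.getStringArrayList(RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES);
        // ... check whether the desired language is in the supported list ...
    }
}, null, Activity.RESULT_OK, null, null);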

As shown in the next figure, the user is asked for a language and then the app recognizes what the user says and plays back the best recognition result in the language selected.
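
A minimal sketch of that playback step, assuming locale holds the Locale chosen by the user and matches holds the list returned by the recognizer:

// Switch the TTS engine to the selected language before speaking the best hypothesis
if (tts.isLanguageAvailable(locale) >= TextToSpeech.LANG_AVAILABLE) {
    tts.setLanguage(locale);
    tts.speak(matches.get(0), TextToSpeech.QUEUE_FLUSH, null);
}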

Creating a Virtual Personal Assistant (VPA)

Many voice-based apps need to do more than simply speak and understand speech. For example, a VPA also needs the ability to engage in dialog with the user and to perform operations such as connecting to web services and activating device functions. One way to enable these additional capabilities is to make use of chatbot technology (see, for example, the Pandorabots web service: http://www.pandorabots.com/).
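
One possible shape for the resulting loop is to pass the best recognition hypothesis to the chatbot over HTTP and then speak the reply with TTS. The sketch below must run off the UI thread (for example, in an AsyncTask); the endpoint URL, the botid parameter, and the extractReplyFromXml helper are placeholders rather than a documented API, so consult the Pandorabots documentation for the actual interface of the bot you create there:

// Send the recognized utterance to the (hypothetical) chatbot endpoint
String query = URLEncoder.encode(matches.get(0), "UTF-8");
URL url = new URL("http://www.pandorabots.com/pandora/talk-xml?botid=YOUR_BOT_ID&input=" + query);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
BufferedReader reader =
        new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder response = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    response.append(line);
}
reader.close();
connection.disconnect();
// Extract the bot's reply from the XML response (parsing omitted; hypothetical helper)
String reply = extractReplyFromXml(response.toString());
tts.speak(reply, TextToSpeech.QUEUE_FLUSH, null);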

The following figure shows two VPAs, Jack and Derek, that have been developed in this way.

Jack is a general-purpose VPA, while Derek is a specialized VPA that can answer questions about Type 2 diabetes, such as symptoms, causes, treatment, risks to children, and complications.

Summary

The Google Speech APIs can be used in countless ways to develop interesting and useful voice-based apps. This article has shown some examples. By building on these you will be able to bring the power of voice to your Android apps, making them smarter and more intuitive, and boosting your users' mobile experience.


About the Authors:


Michael F. McTear

Michael McTear is Emeritus Professor of Knowledge Engineering at the University of Ulster with a special research interest in spoken language technologies. He graduated in German Language and Literature from Queen's University Belfast in 1965, was awarded an MA in Linguistics at the University of Essex in 1975, and a PhD at the University of Ulster in 1981. He has been Visiting Professor at the University of Hawaii (1986-87), the University of Koblenz, Germany (1994-95), and the University of Granada, Spain (2006-2010). He has been researching in the field of spoken dialogue systems for more than 15 years and is the author of the widely used textbook Spoken Dialogue Technology: Toward the Conversational User Interface (Springer Verlag, 2004). He is also a co-author of the book Spoken Dialogue Systems (Morgan and Claypool, 2010).

Michael has delivered keynote addresses at many conferences and workshops, including the EU-funded DUMAS Workshop, Geneva, 2004, the SIGDial workshop, Lisbon, 2005, and the Spanish Conference on Natural Language Processing (SEPLN), Granada, 2005, and has delivered invited tutorials at the IEEE/ACL Conference on Spoken Language Technologies, Aruba, 2006, and at ACL 2007, Prague. He has presented on several occasions at SpeechTEK, a conference for speech technology professionals, in New York and London. He is a certified VoiceXML developer and has taught VoiceXML in training courses for professionals from companies including Genesys, Oracle, Orange, 3, Fujitsu, and Santander. He was the main developer of the VoiceXML-based home monitoring system for patients with type-2 diabetes, currently in use at the Ulster Hospital, Northern Ireland.

Zoraida Callejas

Zoraida Callejas is Assistant Professor at the University of Granada, Spain, where she has been teaching several subjects related to Oral and Multimodal Interfaces, Object-Oriented Programming, and Software Engineering for the last eight years. She graduated in Computer Science in 2005 and was awarded a PhD in 2008 from the University of Granada. She has been Visiting Professor at the Technical University of Liberec, Czech Republic (2007-13), the University of Trento, Italy (2008), the University of Ulster, Northern Ireland (2009), the Technical University of Berlin, Germany (2010), the University of Ulm, Germany (2012), and Telecom ParisTech, France (2013).

Zoraida focuses her research on speech technology and, in particular, on spoken and multimodal dialogue systems. She has presented at the main conferences in the area of dialogue systems and has published her research in several international journals and books. She has also coordinated training courses on the development of interactive speech processing systems, and has regularly taught object-oriented software development in Java in graduate courses for nine years. Currently, she leads a local project for the development of Android speech applications for intellectually disabled users.
