
You're reading from Voice Application Development for Android

Product type Book

Published in Nov 2013

Publisher Packt

ISBN-13 9781783285297

Pages 134 pages

Edition 1st Edition

Languages

Java

Concepts

Android Development

Chapter 6. Grammars for Dialog

You will have noticed that the inputs in the form-filling dialogs studied in the previous chapter were restricted to single words and phrases. This chapter introduces the use of grammars to interpret more complex inputs and also to extract their meaning. Two types of grammars in common use for commercial applications are hand-crafted grammars for input that is predictable and well-defined, and statistical grammars for more robust performance with the less well-formed input typical of conversational speech.

By the end of this chapter, you should be able to develop apps that support more extended user input, making use of hand-crafted as well as statistical grammars.

Grammars for speech recognition and natural language understanding

Grammars serve two different purposes in speech-based apps:

Speech recognition: In this case, grammars (also known as language models) specify the words and phrases that the recognizer can expect. For example, if the system is dealing with cities, it should not try to recognize numbers. Speech recognition grammars, as defined by the W3C (http://www.w3.org/TR/speech-grammar/), can either be specified explicitly by the developer (hand-crafted grammars) or computed from language data (statistical grammars). Restricting what the recognizer expects in this way makes speech recognition more accurate.
Natural language understanding: The idea is to take the output of the recognizer and assign a semantic interpretation (or meaning) to the words. This can be done in several ways. One method involves determining the structure of the sentence (syntactic analysis) and then assigning a semantic interpretation ...
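The cities example can be approximated in a few lines of plain Java. This is only a sketch under stated assumptions: a real recognition grammar constrains the recognizer while it decodes the audio, whereas here a hypothetical word list (`CITIES`) is applied afterwards to a list of hypotheses. The class and method names are illustrative and do not come from the book's libraries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class CityVocabulary {
    // Hypothetical word list standing in for a recognition grammar of cities.
    private static final Set<String> CITIES =
            new HashSet<>(Arrays.asList("paris", "new york", "madrid", "belfast"));

    /** Returns the first hypothesis that the word list allows, if any. */
    public static Optional<String> firstAllowed(List<String> hypotheses) {
        for (String hypothesis : hypotheses) {
            String normalized = hypothesis.trim().toLowerCase(Locale.ROOT);
            if (CITIES.contains(normalized)) {
                return Optional.of(normalized);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "nine" may sound plausible, but it is not a city, so it is skipped.
        List<String> hypotheses = Arrays.asList("nine", "Paris", "pairs");
        System.out.println(firstAllowed(hypotheses).orElse("(no city recognized)"));
    }
}
```

Filtering after recognition is weaker than a true recognition grammar, because the recognizer may never propose the intended city at all, but it shows the principle of restricting results to what the app expects.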

NLU with hand-crafted grammars

Designing a grammar involves predicting the different things the user might say and creating rules to cover them. Grammar design is an iterative process: create an initial grammar, collect data to test it against actual user input, add some phrases and remove others, and repeat until the coverage of the grammar is as complete as possible. Various tools help with grammar design. For example, Nuance provides the Nuance Grammar Builder, which can be used to test the coverage of a grammar, to check that the test phrases receive the correct semantic interpretation, and to test for over-generation, that is, to detect unnecessary or unexpected phrases that the grammar accepts (http://evolution.voxeo.com/library/grammar/grammar-gsl.pdf).
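The design loop described above can be tried out on a very small scale. The following sketch assumes a single hypothetical rule for flight requests, written as a Java regular expression with named groups rather than in any grammar format used by the book. The city and day lists, the slot names, and the `coverage` helper are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlightGrammar {
    // Hypothetical rule: [i would like] a flight from <city> to <city> [on <day>]
    private static final String CITY = "paris|new york|madrid|london";
    private static final String DAY =
            "monday|tuesday|wednesday|thursday|friday|saturday|sunday";
    private static final Pattern RULE = Pattern.compile(
            "(?:i would like )?a flight from (?<origin>" + CITY + ")"
            + " to (?<destination>" + CITY + ")"
            + "(?: on (?<day>" + DAY + "))?");

    /** Returns the slots of a covered phrase, or null if the grammar rejects it. */
    public static Map<String, String> interpret(String phrase) {
        Matcher m = RULE.matcher(phrase.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            return null;
        }
        Map<String, String> slots = new LinkedHashMap<>();
        slots.put("origin", m.group("origin"));
        slots.put("destination", m.group("destination"));
        if (m.group("day") != null) {
            slots.put("day", m.group("day"));
        }
        return slots;
    }

    /** Fraction of the test phrases that the grammar accepts. */
    public static double coverage(List<String> testPhrases) {
        int accepted = 0;
        for (String phrase : testPhrases) {
            if (interpret(phrase) != null) {
                accepted++;
            }
        }
        return testPhrases.isEmpty() ? 0.0 : (double) accepted / testPhrases.size();
    }

    public static void main(String[] args) {
        List<String> testPhrases = Arrays.asList(
                "I would like a flight from Paris to New York on Monday",
                "a flight from Madrid to London",
                "book me a flight to London"); // not covered by the rule
        System.out.println("coverage: " + coverage(testPhrases));
        System.out.println(interpret(testPhrases.get(0)));
    }
}
```

The same sketch also over-generates: `interpret("a flight from paris to paris")` succeeds even though no user is likely to mean it, which is the kind of unexpected phrase a grammar testing tool is meant to reveal.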

There are different languages for specifying speech grammars. The most popular are the XML and Augmented BNF (ABNF) forms defined by the W3C (http://www.w3.org/TR/speech-grammar/), Java Speech Grammar Format...

Statistical NLU

Hand-crafted grammars are time-consuming to develop and prone to errors. Considerable linguistic and engineering expertise is required to develop a grammar with good coverage and optimized performance. Moreover, the rules of a hand-crafted grammar cannot easily cope with the irregular input that is characteristic of spontaneous spoken language. For example, given the recognized words "I would like a um flight from Paris to New York on Monday no Tuesday afternoon", our grammar would fail, since "um" and "no" are not specified in the rules.

A statistical grammar is an alternative to a hand-crafted grammar. Statistical grammars are learned from data, which involves collecting and annotating large amounts of relevant language data. They can cope with irregular input because they do not have to match the input exactly; instead, they assign probabilities indicating the extent to which a structure or a semantic interpretation matches the input. There are different types of statistical...
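To make the idea of assigning probabilities concrete, here is a toy sketch: a tiny naive Bayes classifier with add-one smoothing, trained on a handful of invented, hand-labeled utterances. It is a swapped-in textbook technique, not the method used by any service mentioned in this chapter, and the labels, training sentences, and class name are all assumptions. A real statistical grammar would need far more annotated data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ToyIntentModel {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> examples = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int exampleCount = 0;

    /** Adds one annotated utterance to the training data. */
    public void train(String label, String utterance) {
        examples.merge(label, 1, Integer::sum);
        exampleCount++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(utterance)) {
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns a probability for each label; the values sum to 1. */
    public Map<String, Double> probabilities(String utterance) {
        Map<String, Double> logScores = new HashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (String label : examples.keySet()) {
            double score = Math.log((double) examples.get(label) / exampleCount);
            // Add-one smoothing, with one extra slot reserved for unseen words
            // such as "um", so that unknown words never zero out a label.
            double denominator = totalWords.getOrDefault(label, 0) + vocabulary.size() + 1.0;
            for (String word : tokenize(utterance)) {
                int count = wordCounts.get(label).getOrDefault(word, 0);
                score += Math.log((count + 1.0) / denominator);
            }
            logScores.put(label, score);
            max = Math.max(max, score);
        }
        double sum = 0.0;
        for (double score : logScores.values()) {
            sum += Math.exp(score - max);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> entry : logScores.entrySet()) {
            result.put(entry.getKey(), Math.exp(entry.getValue() - max) / sum);
        }
        return result;
    }

    private static String[] tokenize(String utterance) {
        return utterance.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    public static void main(String[] args) {
        ToyIntentModel model = new ToyIntentModel();
        model.train("flight", "i would like a flight from paris to new york");
        model.train("flight", "book a flight to madrid on monday");
        model.train("weather", "what is the weather in paris");
        model.train("weather", "will it rain on tuesday in madrid");
        System.out.println(model.probabilities(
                "I would like a um flight from Paris to New York on Monday no Tuesday afternoon"));
    }
}
```

Unlike an exact rule match, the model does not reject the utterance because of "um" or "no"; those words simply contribute little evidence, and the "flight" label still receives the higher probability.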

The GrammarTest app

The GrammarTest app (sandra.examples.nlu.grammartest) illustrates how to use NLULib. It has a simple GUI in which the user selects the type of grammar to be used (hand-crafted or statistical), and can also select the Check text or Check ASR button to obtain a semantic representation of the input.

In the case of Check text, the input is typed into a TextView box using the keyboard. In the case of Check ASR, the app recognizes spoken input and produces a result for each item in a 10-best list.

In the case of the hand-crafted grammar, an XML grammar is read from the specified location. The default grammar is the one presented previously. If the input (either the input text or each of the N-best results) is in the grammar, the app shows a valid message and the semantic representation. If not, it shows an invalid message. These messages are not hard-coded, but retrieved from the Strings file.
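The per-hypothesis check can be sketched independently of Android. This is a hedged illustration, not the code of the GrammarTest app or of NLULib: the grammar is passed in as a plain `Predicate<String>`, the `VALID` and `INVALID` constants stand in for the messages the app reads from its Strings file, and the sample hypotheses are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class NBestCheck {
    // Stand-ins for the messages that the app retrieves from its Strings file.
    static final String VALID = "valid";
    static final String INVALID = "invalid";

    /** Labels every hypothesis, in its original order, as valid or invalid. */
    public static List<String> check(List<String> nBest, Predicate<String> inGrammar) {
        List<String> report = new ArrayList<>();
        for (String hypothesis : nBest) {
            String verdict = inGrammar.test(hypothesis) ? VALID : INVALID;
            report.add(hypothesis + ": " + verdict);
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical grammar that accepts only "flight to <city>" for two cities.
        Predicate<String> grammar = s -> s.matches("flight to (paris|madrid)");
        List<String> nBest = Arrays.asList(
                "flight to paris", "light to paris", "flight to parish");
        for (String line : check(nBest, grammar)) {
            System.out.println(line);
        }
    }
}
```

Checking every entry rather than only the top one matters because the hypothesis that fits the grammar is not always ranked first by the recognizer.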

In the case of statistical grammar, the Maluuba service is used. In this case, we do not pose any...

Summary

This chapter has shown how to create and use grammars to check whether the user's input conforms to the words and phrases required by the app. Grammars are also used to extract a semantic representation from the user's input in terms of concepts relevant for the app. Two types of grammar were presented: a hand-crafted grammar designed by the developer to match the requirements of the app, and a statistical grammar learned from a large corpus of relevant data. Hand-crafted grammars are useful for input that is predictable and well-defined, whereas statistical grammars provide more robust performance and can handle a wider range of input that may be less well-formed.

In the chapters so far, the examples have assumed that the language used is English and that the interface is speech-only. Chapter 7, Multilingual and Multimodal Dialogs, will look at how to build apps that make use of languages other than English and other modalities in addition to speech.
