Chapter 5. Speak with Your Application

In the previous chapter, we learned to discover and understand the intent of a user based on utterances. In this chapter, we will add audio capabilities to our applications: we will learn to convert text to speech and speech to text, to identify the person speaking, and to use spoken audio to verify that a person is who they claim to be. Finally, we will briefly touch on how to customize speech recognition to suit your application's usage.

By the end of this chapter, we will have covered the following topics:

  • Converting spoken audio to text and text to spoken audio

  • Recognizing intent from spoken audio, utilizing LUIS

  • Verifying that the speaker is who they claim to be

  • Identifying the speaker

  • Tailoring the recognition API to recognize custom speaking styles and environments

Converting text to audio and vice versa


In Chapter 1, Getting Started with Microsoft Cognitive Services, we utilized a part of the Bing Speech API. We gave the example application the ability to speak sentences to us. We will use the code created in that example now, but we will dive a bit deeper into the details.
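At its core, the text-to-speech part from Chapter 1 boils down to two REST calls: one to exchange the API key for an access token, and one to post an SSML document and receive audio back. The following is only a minimal sketch of that flow; the endpoints, voice name, and output format reflect the Bing Speech API as it was at the time of writing and should be treated as assumptions to verify against the current documentation.

    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    public class TextToSpeechSketch
    {
        // Endpoints as documented for the Bing Speech API at the time of writing.
        private const string TokenUrl = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";
        private const string SynthesizeUrl = "https://speech.platform.bing.com/synthesize";

        public async Task<byte[]> SynthesizeAsync(string text, string apiKey)
        {
            using (var client = new HttpClient())
            {
                // 1. Trade the subscription key for a short-lived bearer token.
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);
                HttpResponseMessage tokenResponse = await client.PostAsync(TokenUrl, null);
                string token = await tokenResponse.Content.ReadAsStringAsync();

                // 2. Post an SSML document describing what to say and with which voice.
                string ssml =
                    "<speak version='1.0' xml:lang='en-US'>" +
                    "<voice xml:lang='en-US' xml:gender='Female' " +
                    "name='Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)'>" +
                    text + "</voice></speak>";

                var request = new HttpRequestMessage(HttpMethod.Post, SynthesizeUrl)
                {
                    Content = new StringContent(ssml, Encoding.UTF8, "application/ssml+xml")
                };
                request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
                request.Headers.Add("X-Microsoft-OutputFormat", "riff-16khz-16bit-mono-pcm");

                // 3. The response body is raw audio (a WAV file here), ready to play back.
                HttpResponseMessage response = await client.SendAsync(request);
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsByteArrayAsync();
            }
        }
    }

The returned byte array can then be handed to any audio playback component, for example NAudio.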

We will also go through the other feature of the Bing Speech API: converting spoken audio to text. The idea is that we can speak to the Smart-House application, which will recognize what we are saying. From the textual output, it will use LUIS to get the intent of our sentence. If LUIS needs more information, the application will politely ask for it with audio.

To get started, we want to modify the build definition of the Smart-House application. We need to specify whether we are running it on a 32-bit or a 64-bit OS. To utilize speech-to-text conversion, we want to install the Bing Speech NuGet client package. Search for Microsoft.ProjectOxford.SpeechRecognition, and install either...
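Whichever platform-specific package you install, the speech-to-text flow centers on a microphone recognition client created by the SDK's factory. The sketch below is a minimal example of wiring it up; the key, locale, and handler bodies are placeholders, and the namespace can differ between package versions, so treat it as an outline rather than the exact Smart-House code.

    using System;
    using Microsoft.CognitiveServices.SpeechRecognition; // namespace may vary by package version

    public class SpeechToTextSketch : IDisposable
    {
        private readonly MicrophoneRecognitionClient _micClient;

        public SpeechToTextSketch(string bingApiKey)
        {
            // ShortPhrase mode suits short commands; LongDictation allows longer utterances.
            _micClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
                SpeechRecognitionMode.ShortPhrase, "en-US", bingApiKey);

            // Partial results arrive while the user is still speaking.
            _micClient.OnPartialResponseReceived += (s, e) =>
                Console.WriteLine($"Partial: {e.PartialResult}");

            // The final response carries ranked hypotheses of what was said.
            _micClient.OnResponseReceived += (s, e) =>
            {
                if (e.PhraseResponse.Results.Length > 0)
                    Console.WriteLine($"Final: {e.PhraseResponse.Results[0].DisplayText}");
            };

            _micClient.OnConversationError += (s, e) =>
                Console.WriteLine($"Error: {e.SpeechErrorText}");
        }

        public void StartListening() => _micClient.StartMicAndRecognition();
        public void StopListening() => _micClient.EndMicAndRecognition();

        public void Dispose() => _micClient.Dispose();
    }

The factory also offers a CreateMicrophoneClientWithIntent variant, which forwards the recognized text to a LUIS application and raises an OnIntentReceived event; that is one way of wiring up the LUIS step described above.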

Knowing who is speaking


Using the Speaker Recognition API, we can identify who is speaking. By defining one or more speaker profiles, each with corresponding audio samples, we can later identify whether any of them is the person speaking at any given time.

To be able to utilize this feature, we need to go through a few steps:

  1. We add one or more speaker profiles to the service.

  2. We enroll several spoken audio samples for each speaker profile.

  3. We call the service to identify a speaker based on audio input.

    Note

    If you have not already done so, sign up for an API key for the Speaker Recognition API at https://www.microsoft.com/cognitive-services/en-us/speaker-recognition-api.

Start by adding a new NuGet package to your Smart-House application. Search for and add Microsoft.ProjectOxford.SpeakerRecognition.

Add a new class called SpeakerIdentification to the Model folder of your project. This class will hold all the functionality related to speaker identification.

Beneath the class, we add another class, containing EventArgs for status updates:

    public...
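The three steps listed earlier map onto a handful of calls on the SDK's identification client. Below is a minimal sketch of that flow; the client, contract, and method names follow the Microsoft.ProjectOxford.SpeakerRecognition package, but the exact signatures, the polling interval, and the audio handling are assumptions to verify against the installed version.

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.ProjectOxford.SpeakerRecognition;
    using Microsoft.ProjectOxford.SpeakerRecognition.Contract.Identification;

    public class SpeakerIdentificationSketch
    {
        private readonly SpeakerIdentificationServiceClient _client;

        public SpeakerIdentificationSketch(string apiKey)
        {
            _client = new SpeakerIdentificationServiceClient(apiKey);
        }

        // Step 1: create a speaker profile for a given locale.
        public async Task<Guid> CreateProfileAsync()
        {
            CreateProfileResponse response = await _client.CreateProfileAsync("en-US");
            return response.ProfileId;
        }

        // Step 2: enroll a spoken sample (WAV, PCM, 16 kHz, 16-bit, mono) for the profile.
        public async Task EnrollAsync(Guid profileId, Stream audio)
        {
            OperationLocation location = await _client.EnrollAsync(audio, profileId);

            // Enrollment runs asynchronously on the service side, so poll until it finishes.
            EnrollmentOperation operation;
            do
            {
                await Task.Delay(TimeSpan.FromSeconds(2));
                operation = await _client.CheckEnrollmentStatusAsync(location);
            } while (operation.Status == Status.Running || operation.Status == Status.NotStarted);
        }

        // Step 3: ask the service which of the enrolled profiles (if any) is speaking.
        public async Task<Guid> IdentifyAsync(Stream audio, Guid[] candidateProfiles)
        {
            OperationLocation location = await _client.IdentifyAsync(audio, candidateProfiles);

            IdentificationOperation operation;
            do
            {
                await Task.Delay(TimeSpan.FromSeconds(2));
                operation = await _client.CheckIdentificationStatusAsync(location);
            } while (operation.Status == Status.Running || operation.Status == Status.NotStarted);

            // An empty Guid in the result means no enrolled speaker matched the audio.
            return operation.ProcessingResult.IdentifiedProfileId;
        }
    }

The status polled in each loop is the kind of information the status-update EventArgs mentioned above would naturally surface to the UI.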

Verifying a person through speech


The process of verifying whether a person is who they claim to be is quite similar to the identification process. To show how it is done, we will create a new example project, as we do not need this functionality in the Smart-House application.

Add the Microsoft.ProjectOxford.SpeakerRecognition and NAudio NuGet packages to the project. We will need the Recording class, which we used earlier, so copy this from the Smart-House application's Model folder.

Open the MainView.xaml file. We need a few elements in the UI for the example to work. Add a Button element for adding speaker profiles, and add two ListBox elements: one will hold the available verification phrases, while the other will list our speaker profiles.

Add Button elements for deleting a profile, starting and stopping enrollment recording, resetting enrollment, and starting/stopping verification recording.

In the ViewModel, you will need to add two ObservableCollection properties: one of type string, the other of type...
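However the ViewModel ends up being wired, the verification flow itself is compact: fetch the allowed phrases, create a profile, enroll recordings of the chosen phrase, and then verify new recordings against the profile. The sketch below assumes the SpeakerVerificationServiceClient from the same NuGet package; method and contract names follow the Project Oxford SDK and should be checked against the installed version, and the recording callback is purely illustrative.

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.ProjectOxford.SpeakerRecognition;
    using Microsoft.ProjectOxford.SpeakerRecognition.Contract.Verification;

    public class SpeakerVerificationSketch
    {
        private readonly SpeakerVerificationServiceClient _client;

        public SpeakerVerificationSketch(string apiKey)
        {
            _client = new SpeakerVerificationServiceClient(apiKey);
        }

        // The service only accepts a fixed set of verification phrases per locale.
        public async Task PrintAvailablePhrasesAsync()
        {
            var phrases = await _client.GetPhrasesAsync("en-US");
            foreach (var phrase in phrases)
                Console.WriteLine(phrase.Phrase);
        }

        // Create a profile and enroll the same phrase until no enrollments remain.
        public async Task<Guid> CreateAndEnrollAsync(Func<Task<Stream>> recordPhraseAsync)
        {
            CreateProfileResponse profile = await _client.CreateProfileAsync("en-US");

            Enrollment enrollment;
            do
            {
                using (Stream audio = await recordPhraseAsync())
                {
                    enrollment = await _client.EnrollAsync(audio, profile.ProfileId);
                }
            } while (enrollment.RemainingEnrollments > 0);

            return profile.ProfileId;
        }

        // Verify a new recording of the phrase against the enrolled profile.
        public async Task<bool> VerifyAsync(Stream audio, Guid profileId)
        {
            Verification verification = await _client.VerifyAsync(audio, profileId);

            // The result is Accept or Reject, with a confidence level attached.
            return verification.Result == Result.Accept;
        }
    }

At the time of writing, the service expected three enrollments of the chosen phrase before a profile could be verified, which is why the enrollment loop keeps recording until no enrollments remain.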

Customizing speech recognition


At the time of writing, the Custom Recognition Intelligent Service (CRIS) is still at the private beta stage. As such, we will not spend a lot of time on this, other than going through some key concepts.

When using speech-recognition systems, several components work together. Two of the more important ones are the acoustic and language models. The first labels short fragments of audio as sound units. The second helps the system decide between words, based on the likelihood of a given word appearing in a particular sequence.

Although Microsoft has done a great job of creating comprehensive acoustic and language models, there may still be times when you need to customize them.

Imagine you have an application that is meant to be used in a factory environment. Using speech recognition there will require acoustic training for that environment, so that recognition can separate speech from the usual factory noise.

Another example is if your application is used by a specific...

Summary


Throughout this chapter, we have focused on speech. We started by looking at how we can convert spoken audio to text and text to spoken audio. Using this, we modified our LUIS implementation so that we could speak commands and have conversations with the Smart-House application. From there, we moved on to see how we can identify a person speaking using the Speaker Recognition API. Using the same API, we also learned how to verify that a person is who they claim to be. Finally, we looked briefly at the core functionality of the Custom Recognition Intelligent Service.

In the following chapter, we will move back to textual APIs, where we will learn how to explore and analyze text in different ways.
