You're reading from Learning Microsoft Cognitive Services - Third Edition

Product type: Book
Published in: Sep 2018
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789800616
Edition: 3rd Edition
Tools: Microsoft Cognitive Services
Concepts: Artificial Intelligence
Author: Leif Larsen

Chapter 5. Speaking with Your Application

In the previous chapter, we learned how to discover and understand a user's intent from their utterances. In this chapter, we will add audio capabilities to our applications by converting text to speech and speech to text, and we will learn how to identify the person speaking. We will also see how spoken audio can be used to verify that a person is who they claim to be. Finally, we will briefly touch on how to customize speech recognition to fit your application's usage.

By the end of this chapter, we will have covered the following topics:

Converting spoken audio to text and text to spoken audio
Recognizing intent from spoken audio by utilizing LUIS
Verifying that the speaker is who they claim to be
Identifying the speaker
Tailoring speech recognition to recognize custom speaking styles and environments

Converting text to audio and vice versa

In Chapter 1, Getting Started with Microsoft Cognitive Services, we utilized a part of the Bing Speech API. We gave the example application the ability to say sentences to us. We will now reuse the code from that example and dive a bit deeper into the details.

We will also go through the other feature of the Bing Speech API, that is, converting spoken audio to text. The idea is that we can speak to the smart-house application, which will recognize what we are saying. Using the textual output, the application will use LUIS to determine the intent of our sentence. If LUIS needs more information, the application will politely ask us for more via audio.
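The conversational flow described above can be sketched as a small loop. This is a minimal illustration only: IntentResult, VoiceCommandLoop, and the three delegates are hypothetical names standing in for speech-to-text, LUIS, and text-to-speech, and they do not reflect the actual Bing Speech or LUIS client types.

```csharp
using System;

// Hypothetical result type; the real LUIS response has a different shape.
public class IntentResult
{
    public string Intent { get; set; }

    // Non-null when more information is needed from the user.
    public string FollowUpQuestion { get; set; }
}

// Hypothetical helper that wires the three steps together.
public class VoiceCommandLoop
{
    private readonly Func<string> _listen;                    // speech to text
    private readonly Func<string, IntentResult> _understand;  // intent lookup
    private readonly Action<string> _speak;                   // text to speech

    public VoiceCommandLoop(
        Func<string> listen,
        Func<string, IntentResult> understand,
        Action<string> speak)
    {
        _listen = listen;
        _understand = understand;
        _speak = speak;
    }

    // Returns the resolved intent, or null if the intent is still
    // incomplete after maxTurns rounds of listening.
    public string Run(int maxTurns)
    {
        for (int turn = 0; turn < maxTurns; turn++)
        {
            string text = _listen();
            IntentResult result = _understand(text);

            if (result.FollowUpQuestion == null)
            {
                return result.Intent;
            }

            // Ask the user for the missing information via audio.
            _speak(result.FollowUpQuestion);
        }

        return null;
    }
}
```

Passing the three steps in as delegates keeps the loop independent of any particular speech or intent service, so each step can be replaced with a real client call later.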

To get started, we want to modify the build definition of the smart-house application. We need to specify whether we are running it on a 32-bit or 64-bit OS. To utilize speech-to-text conversion, we want to install the Bing Speech NuGet client package. Search for Microsoft.ProjectOxford.SpeechRecognition...

Knowing who is speaking

Using the Speaker Recognition API, we can identify who is speaking. By defining one or more speaker profiles with corresponding samples, we can identify whether any of them is speaking at a given time.

To be able to utilize this feature, we need to go through a few steps:

We need to add one or more speaker profiles to the service.
We enroll several spoken samples for each speaker profile.
We call the service to identify a speaker based on audio input.
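The three steps above can be sketched with an in-memory stand-in. This is a toy under stated assumptions, not the Speaker Recognition client: FakeSpeakerService and its methods are hypothetical, and the "samples" are small feature vectors compared by Euclidean distance rather than real audio.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a speaker identification service.
// It only mirrors the three steps; it performs no real audio analysis.
public class FakeSpeakerService
{
    // Each profile ID maps to the samples enrolled for that speaker.
    private readonly Dictionary<Guid, List<double[]>> _profiles =
        new Dictionary<Guid, List<double[]>>();

    // Step 1: add a speaker profile.
    public Guid CreateProfile()
    {
        Guid id = Guid.NewGuid();
        _profiles[id] = new List<double[]>();
        return id;
    }

    // Step 2: enroll a spoken sample (here, a toy feature vector).
    public void Enroll(Guid profileId, double[] sample)
    {
        _profiles[profileId].Add(sample);
    }

    // Step 3: identify the profile whose enrolled samples lie closest
    // to the input. Returns null when nothing has been enrolled.
    public Guid? Identify(double[] input)
    {
        Guid? best = null;
        double bestDistance = double.MaxValue;

        foreach (var profile in _profiles)
        {
            foreach (double[] sample in profile.Value)
            {
                double distance = Distance(sample, input);
                if (distance < bestDistance)
                {
                    bestDistance = distance;
                    best = profile.Key;
                }
            }
        }

        return best;
    }

    // Euclidean distance between two feature vectors.
    private static double Distance(double[] a, double[] b)
    {
        return Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());
    }
}
```

The order matters: a profile must exist before samples can be enrolled, and identification is only meaningful once at least one profile has samples.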

If you have not already done so, sign up for an API key for the Speaker Recognition API at https://portal.azure.com.

Start by adding a new NuGet package to your smart-house application. Search for and add Microsoft.ProjectOxford.SpeakerRecognition.

Add a new class called SpeakerIdentification to the Model folder of your project. This class will hold all of the functionality related to speaker identification.

Beneath the class, we will add another class, containing EventArgs for status updates:

    public class SpeakerIdentificationStatusUpdateEventArgs...

Verifying a person through speech

The process of verifying whether a person is who they claim to be is quite similar to the identification process. To show how it is done, we will create a new example project, as we do not need this functionality in our smart-house application.

Add the Microsoft.ProjectOxford.SpeakerRecognition and NAudio NuGet packages to the project. We will need the Recording class that we used earlier, so copy this from the smart-house application's Model folder.

Open the MainView.xaml file. We need a few elements in the UI for the example to work. Add a Button element to add speaker profiles. Add two ListBox elements. One will hold the available verification phrases, while the other will list our speaker profiles.

Add Button elements for deleting a profile, starting and stopping enrollment recording, resetting enrollment, and starting/stopping verification recording.

In the ViewModel, you will need to add two ObservableCollection properties: one of type string, the other of type...

Customizing speech recognition

When we use speech recognition systems, several components work together. Two of the more important components are the acoustic model and the language model. The first classifies short fragments of audio as sound units. The second helps the system decide which words were spoken, based on the likelihood of a given word appearing in certain sequences.
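To make the role of a language model concrete, here is a minimal sketch of a toy bigram model. It is an illustration only and not how the Custom Speech Service is implemented; BigramModel and its methods are hypothetical names. The model counts how often one word follows another in training sentences and uses those counts to choose between similar-sounding candidates.

```csharp
using System;
using System.Collections.Generic;

// Toy bigram language model: counts word pairs seen in training text.
public class BigramModel
{
    private readonly Dictionary<string, Dictionary<string, int>> _counts =
        new Dictionary<string, Dictionary<string, int>>();

    // Records every adjacent word pair in the sentence.
    public void Train(string sentence)
    {
        string[] words = sentence.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i + 1 < words.Length; i++)
        {
            if (!_counts.TryGetValue(words[i], out Dictionary<string, int> followers))
            {
                followers = new Dictionary<string, int>();
                _counts[words[i]] = followers;
            }

            followers.TryGetValue(words[i + 1], out int seen);
            followers[words[i + 1]] = seen + 1;
        }
    }

    // How often candidate was seen directly after previous.
    public int Count(string previous, string candidate)
    {
        if (_counts.TryGetValue(previous.ToLowerInvariant(), out Dictionary<string, int> followers)
            && followers.TryGetValue(candidate.ToLowerInvariant(), out int count))
        {
            return count;
        }

        return 0;
    }

    // Picks the candidate seen most often after previous.
    // Ties go to the earliest candidate; returns null for no candidates.
    public string PickMostLikely(string previous, IEnumerable<string> candidates)
    {
        string best = null;
        int bestCount = -1;

        foreach (string candidate in candidates)
        {
            int count = Count(previous, candidate);
            if (count > bestCount)
            {
                bestCount = count;
                best = candidate;
            }
        }

        return best;
    }
}
```

After training on sentences such as "recognize speech" and "recognize speech quickly", the model prefers "speech" over "beach" as the word following "recognize", even though the two may sound alike to the acoustic model.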

Although Microsoft has done a great job of creating comprehensive acoustic and language models, there may still be times when you need to customize these models.

Imagine that you have an application that is supposed to be used in a factory environment. Using speech recognition there will require acoustic training for that environment so that the recognition can separate speech from the usual factory noises.

Another example is if your application is used by a specific group of people, say, an application for search, where programming is the main topic. You would typically use words such as object-oriented, dot net, or debugging. This...

Translating speech on the fly

Using the Translator Speech API, you can add automatic end-to-end translation for speech. With this API, you can submit an audio stream of speech and retrieve a textual and an audio version of the translated text. It uses silence detection to determine when speech has ended. Results are streamed back once the pause is detected.
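The idea behind silence detection can be illustrated with a simple energy-based sketch. This is an assumption made for illustration, not the method the Translator Speech API actually uses: SilenceDetector, the frame size, and the threshold are all hypothetical. The code splits the samples into frames, computes the root mean square (RMS) level of each frame, and reports where a sufficiently long quiet run begins after speech has been heard.

```csharp
using System;

// Hypothetical energy-based end-of-speech detector.
public static class SilenceDetector
{
    // Returns the index of the first frame of the first run of at least
    // minSilentFrames quiet frames that follows speech, or -1 if none.
    public static int FindEndOfSpeech(
        float[] samples, int frameSize, double threshold, int minSilentFrames)
    {
        if (frameSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(frameSize));
        }

        bool speechSeen = false;
        int silentRun = 0;
        int frameCount = samples.Length / frameSize;

        for (int frame = 0; frame < frameCount; frame++)
        {
            // RMS level of the current frame.
            double sumOfSquares = 0;
            for (int i = 0; i < frameSize; i++)
            {
                float s = samples[frame * frameSize + i];
                sumOfSquares += s * s;
            }
            double rms = Math.Sqrt(sumOfSquares / frameSize);

            if (rms >= threshold)
            {
                // Loud frame: speech is ongoing, so reset the quiet run.
                speechSeen = true;
                silentRun = 0;
            }
            else if (speechSeen)
            {
                // Quiet frame after speech: extend the run.
                silentRun++;
                if (silentRun >= minSilentFrames)
                {
                    return frame - minSilentFrames + 1;
                }
            }
        }

        return -1;
    }
}
```

Leading silence is ignored on purpose, so the detector only reports a pause once the speaker has actually said something.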

For a comprehensive list of supported languages, please visit the following site: https://www.microsoft.com/en-us/translator/business/languages/.

The result received from the API will contain a stream of audio- and text-based results. The results contain the source text in its original language and the translation in the target language.

For a thorough example on how to use the Translator Speech API, please visit the following sample at GitHub: https://github.com/MicrosoftTranslator/SpeechTranslator.

Summary

Throughout this chapter, we have focused on speech. We started by looking at how we can convert spoken audio to text and text to spoken audio. Using this, we modified our LUIS implementation so that we can say commands and have conversations with the smart-house application. From there, we moved on to see how we can identify a person speaking using the Speaker Recognition API. Using the same API, we also learned how to verify that a person is who they claim to be. We briefly looked at the core functionality of the Custom Speech Service. Finally, we had a brief introduction to the Translator Speech API.

In the following chapter, we will move back to textual APIs, where we will learn how to explore and analyze text in different ways.

Author (1)

Leif Larsen

Leif Larsen is a software engineer based in Norway. After earning a degree in computer engineering, he went on to work with the design and configuration of industrial control systems, for the most part, in the oil and gas industry. Over the last few years, he has worked as a developer, developing and maintaining geographical information systems, working with .NET technology. Today, he is working with a start-up, developing a brand new SaaS product. In his spare time, he develops mobile apps and explores new technologies to keep up with the high-paced tech world. You can find out more about him by checking out his blog, "Leif Larsen", and following him on Twitter (@leif_larsen) and LinkedIn (lhlarsen).