You're reading from Practical Guide to Azure Cognitive Services

Product type Book

Published in May 2023

Publisher Packt

ISBN-13 9781801812917

Pages 454 pages

Edition 1st Edition

Concepts

Neural Networks

Authors (3):

Chris Seferlis

Christopher Nellis

Andy Roberts

Improving Customer Experience with Speech to Text

When customers call the Ocean Smart customer service line, they expect a friendly, helpful associate who will solve their problem. Years of exceptional service and quality have built that expectation. However, with massive growth and global expansion, monitoring this quality is not as easy as it once was. Ocean Smart hoped that AI could provide a way to track the quality of its customer service.

A great customer experience is becoming more and more critical for successful businesses in this climate of on-demand everything. If a person has a not-so-great experience with a company, they’re sure to let the world know as quickly as possible using as many social media outlets as possible. Because of this, Ocean Smart wanted a better system for improving how customer calls were handled and wanted to set a precedent...

Technical requirements

As with previous chapters and deployments, you will need a few things to build the examples yourself. The first is an Azure account; if you are not an owner of the subscription, you need at least Contributor rights to deploy resources. To help keep development costs down, you can use Visual Studio Code, which is available for Windows, Linux, and macOS here: https://code.visualstudio.com/download. The examples later in the chapter require several Visual Studio Code extensions, which can be installed directly within the tool. These extensions are as follows:

Python as an extension:
- You’ll also need to install the Python tools on your workstation, found here: https://www.python.org/downloads/
- Speech SDK for Python (https://pypi.org/project/azure-cognitiveservices-speech/)
Azure account
Azure Functions
Azure CLI Tools (not required, but...

Overview of Azure Speech services

The Azure Speech services are a collection of APIs for converting speech to text, converting text to speech, translating speech, and related tasks. Given the importance of speech in any business, and the ability to improve communications for accessibility and cultural reasons, it is easy to see these capabilities as transformative. As organizations globalize and language translation becomes an everyday part of internal and external communications, Microsoft has invested significantly in supporting the most popular languages worldwide. In this chapter, you will learn how those investments have evolved into solutions that significantly narrow communication gaps, which has led to an enhanced customer service experience for Ocean Smart customers.

In the Ocean Smart example, we are taking audio recordings from voice messages and...

Working with real-time audio and batch speech-to-text data

Now that we have covered the services you can leverage and the use cases you can expect to deploy with the Speech services, we will dig deeper to set up our example of building a customer service feedback system. Because of the nature of that example, we will focus on batch audio transcription; however, real-time transcription has so many applications that we will explore both options in this section, as the approaches are vastly different.
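To make the batch approach concrete, here is a minimal sketch of how a batch transcription job might be requested. Unlike real-time recognition, a batch job is submitted as an asynchronous REST request that points at audio files already in storage. The function name `build_batch_transcription_request`, the default values, and the v3.1 API version are assumptions made for illustration; the request is only built, never sent, so check the current Speech-to-text REST reference before relying on the exact path or fields.

```python
import json
import urllib.request


def build_batch_transcription_request(region, subscription_key, audio_urls,
                                      locale="en-US",
                                      display_name="customer-call-batch"):
    """Build (but do not send) a request that creates a batch transcription job.

    The endpoint path and API version below are assumptions for this sketch.
    """
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.1/transcriptions"
    )
    body = {
        # URLs of audio files the service should read, for example blob SAS URLs.
        "contentUrls": list(audio_urls),
        "locale": locale,
        "displayName": display_name,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder region, key, and audio URL; nothing is sent over the network.
    request = build_batch_transcription_request(
        "eastus", "<your-speech-key>", ["https://example.com/audio/call1.wav"]
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen`) would return a job description that you poll until the transcription finishes, which is the main practical difference from the streaming, event-driven style of real-time transcription.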

In a call center looking to improve its customer service process, real-time feedback could be provided to the customer service agent. This could take the form of a sentiment score computed as the conversation happens, based on the words being used; however, as with any conversation, the tone could change very quickly from positive to negative and could cause a distraction...
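One way to soften the rapid swings described above is to smooth the score before showing it to the agent. The following is a small sketch, not part of the Ocean Smart solution: the class name `RollingSentiment`, the window size, and the assumption that each recognized utterance arrives with a sentiment score between -1.0 and 1.0 are all hypothetical.

```python
from collections import deque


class RollingSentiment:
    """Average the most recent utterance scores so a single phrase
    cannot swing the value shown to the agent."""

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently discards the oldest score when full.
        self._scores = deque(maxlen=window)

    def add(self, score):
        """Record a score in [-1.0, 1.0] and return the smoothed value."""
        if not -1.0 <= score <= 1.0:
            raise ValueError("score must be between -1.0 and 1.0")
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)


if __name__ == "__main__":
    smoother = RollingSentiment(window=3)
    # Hypothetical per-utterance scores from a sentiment model.
    for utterance_score in (0.8, 0.6, -0.9, 0.7):
        print(round(smoother.add(utterance_score), 2))
```

A larger window gives a steadier display at the cost of reacting more slowly to a genuine change in tone.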

Improving speech-to-text accuracy with Custom Speech

Even though the Microsoft research and development teams have received tremendous acclaim for their work in developing groundbreaking machine learning technology for transcribing speech to text, they are aware that not all business domain-specific details can be captured. For this reason, they have provided the ability for customers to augment the base machine learning model with domain-specific terms directly related to the customer's business. This portion of the chapter will focus on how to work with and deploy these custom models for use in your organization.
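Before uploading anything, it can help to collect your domain-specific terms in a clean text file. The sketch below assumes a plain-text training file with one phrase per line, saved as UTF-8 with a byte order mark; verify the current Custom Speech data requirements before uploading. The helper name `write_domain_phrases` and the seafood phrases are hypothetical examples.

```python
import tempfile
from pathlib import Path


def write_domain_phrases(phrases, destination):
    """Write unique, non-empty phrases to a text file, one per line.

    Returns the cleaned list in the order the phrases were first seen.
    """
    seen = set()
    cleaned = []
    for phrase in phrases:
        text = " ".join(phrase.split())  # trim and collapse whitespace
        key = text.casefold()            # compare case-insensitively
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    # "utf-8-sig" writes a byte order mark; this encoding is an assumption.
    Path(destination).write_text(
        "".join(f"{line}\n" for line in cleaned), encoding="utf-8-sig"
    )
    return cleaned


if __name__ == "__main__":
    # Hypothetical domain terms a seafood company's callers might use.
    sample = ["lobster tails", "  Lobster   Tails ", "", "frozen scallops"]
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "domain_phrases.txt"
        print(write_domain_phrases(sample, target))
```

Deduplicating and normalizing the phrases up front keeps the training file small and avoids feeding the same term to the model several times.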

To build your augmented model, you will use the Speech Studio, which can be found at https://speech.microsoft.com/portal. After you have logged in with your Azure account, you will be presented with several options for working with various speech operations, including the following:

Speech-to-text
Text-to-speech
Voice assistant
Additional resources...

Working with different languages

As we have previously discussed in this book, globalization over the past 30-50 years has driven an evolution in technology unlike anything previously imagined. Moore’s law observes that the number of transistors in a dense integrated circuit doubles about every two years, and this roughly held true from Gordon Moore’s revised prediction in 1975 until, very recently, it was declared no longer achievable (Wikipedia, https://en.wikipedia.org/wiki/Moore%27s_law). As one result, we have seen a massive proliferation of technology to help humans adapt to the challenges of globalization, and we cannot overlook the language barriers faced by international travelers and companies. Even more compelling than translating text with a search engine is the ability to translate the spoken word on the fly. For this reason and many more, Microsoft has made significant...

Building a complete batch solution

In this chapter, we use the Speech service to transcribe audio files that are uploaded to Azure Blob Storage. Once the transcript file is created, we can choose to perform other downstream activities, for example, extracting sentiment from the document and tracking the results. The following diagram shows the process we follow: monitor the storage account, begin the async request to start the transcription, and, once the transcription is complete, write the results file to Azure Blob Storage:

Figure 13.8 – Process outline for creating a transcript from an audio file

To support this process, we create the following:

Azure Cognitive Speech service to perform the transcription activity.
Azure Storage account:
- Blob container to store audio files.
- Blob container to store transcription results.
- Storage table to track transcription job progress.
Azure Functions App:
- One function to monitor the audio...
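The job-tracking part of this process can be sketched without any Azure dependencies. The example below uses an in-memory dictionary as a stand-in for the storage table; the class names `TranscriptionJob` and `JobTracker` are hypothetical, and the status strings are assumed to match the ones the batch transcription service reports, so confirm them against the API reference.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Status names assumed to mirror those reported for batch transcription jobs.
TERMINAL_STATUSES = {"Succeeded", "Failed"}
KNOWN_STATUSES = {"NotStarted", "Running"} | TERMINAL_STATUSES


@dataclass
class TranscriptionJob:
    audio_blob: str
    transcription_url: str
    status: str = "NotStarted"
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class JobTracker:
    """In-memory stand-in for the storage table that tracks job progress."""

    def __init__(self):
        self._jobs = {}

    def start(self, audio_blob, transcription_url):
        """Record a newly submitted job, keyed by the audio blob name."""
        job = TranscriptionJob(audio_blob, transcription_url)
        self._jobs[audio_blob] = job
        return job

    def update(self, audio_blob, status):
        """Apply a polled status; finished jobs are never reopened."""
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unknown status: {status}")
        job = self._jobs[audio_blob]
        if job.status not in TERMINAL_STATUSES:
            job.status = status
            job.updated_at = datetime.now(timezone.utc)
        return job

    def pending(self):
        """Return the jobs that still need to be polled."""
        return [
            job for job in self._jobs.values()
            if job.status not in TERMINAL_STATUSES
        ]


if __name__ == "__main__":
    tracker = JobTracker()
    tracker.start("call1.wav", "https://example.com/transcriptions/1")
    tracker.update("call1.wav", "Running")
    print([job.audio_blob for job in tracker.pending()])
    tracker.update("call1.wav", "Succeeded")
    print(tracker.pending())
```

In a deployed solution, each method would read or write a table entity instead of a dictionary entry, and a scheduled function would call `pending()` to decide which jobs to poll.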

Summary

With that, you can now use the Speech service to create a transcript from an audio conversation or to capture a live audio transcript and display it in real time for captioning and other uses. From there, you can track the quality of calls using the sentiment skill available with Language services, helping your organization greatly enhance its customer service experience and its training tools. These capabilities are some of the more prevalent examples of Cognitive Services tools applied to real-world scenarios, but they are just a small portion of the overall capabilities of the Speech and Language services. Use examples such as the one laid out in this chapter, and think critically about which other skills offered within the services, as well as enhancements applied over time, might benefit your organization. Be mindful of the limitations of the service we discussed...