You're reading from  Hands-On Markov Models with Python

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788625449
Edition: 1st Edition
Authors (2):
Ankur Ankan

Ankur Ankan is a BTech graduate from IIT (BHU), Varanasi. He is currently working in the field of data science. He is an open source enthusiast and his major work includes starting pgmpy with four other members. In his free time, he likes to participate in Kaggle competitions.

Abinash Panda

Abinash Panda has been a data scientist for more than 4 years. He has worked at multiple early-stage start-ups and helped them build their data analytics pipelines. He loves to munge, plot, and analyze data. He has been a speaker at Python conferences. These days, he is busy co-founding a start-up. He has contributed to books on probabilistic graphical models by Packt Publishing.

Natural Language Processing

Automatic speech recognition has many potential applications, such as audio transcription, dictation, audio search, and virtual assistants. I am sure that everyone has interacted with at least one virtual assistant by now, be it Apple's Siri, Amazon's Alexa, or Google Assistant. At the core of all these speech recognition systems is a set of statistical models over the different words or sounds in a language. And since speech has a temporal structure, HMMs are a natural framework for modeling it.

HMMs are at the core of virtually all speech recognition systems, and the core modeling concepts haven't changed much in a long time. Over time, however, many sophisticated techniques have been developed to build better systems. In the following sections, we will try to cover the main concepts leading to the development...

Part-of-speech tagging

The first problem that we will look into is known as part-of-speech tagging (POS tagging). According to Wikipedia, POS tagging, also known as grammatical tagging or word-category disambiguation, is the process of marking up a word in a text as corresponding to a particular part of speech based on both its definition and its context, that is, its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simpler version of this, which is usually taught in schools, is classifying words as nouns, verbs, adjectives, and so on.

POS tagging is not as easy as it sounds because the same word can act as different parts of speech in different contexts. A simple example of this is the word dogs, which is usually considered a noun, but in the following sentence it acts as a verb:

The sailor dogs the hatch.

Correct grammatical tagging...
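
The ambiguity of dogs can be made concrete with a small sketch. The block below is not the book's code: it uses a hypothetical five-sentence toy corpus, a three-tag tag set, and an arbitrary add-k smoothing constant, all invented for illustration. It compares a most-frequent-tag baseline with a hand-rolled HMM decoded by the Viterbi algorithm, using only the standard library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration; not data from the book.
CORPUS = [
    [("the", "DET"), ("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("dogs", "NOUN"), ("sleep", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("the", "DET"), ("sailor", "NOUN")],
    [("the", "DET"), ("sailor", "NOUN"), ("dogs", "VERB"), ("the", "DET"), ("thief", "NOUN")],
    [("the", "DET"), ("sailor", "NOUN"), ("opens", "VERB"), ("the", "DET"), ("hatch", "NOUN")],
]
TAGS = ["DET", "NOUN", "VERB"]
START = "<START>"
SMOOTHING = 0.1  # add-k smoothing constant (an arbitrary choice)

emissions = defaultdict(Counter)    # tag -> counts of words emitted by that tag
transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
word_tags = defaultdict(Counter)    # word -> counts of tags (for the baseline)
for sentence in CORPUS:
    previous = START
    for word, tag in sentence:
        emissions[tag][word] += 1
        transitions[previous][tag] += 1
        word_tags[word][tag] += 1
        previous = tag
VOCAB_SIZE = len(word_tags)


def log_prob(counter, key, n_outcomes):
    """Smoothed log-probability of `key` under the counts in `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + SMOOTHING) / (total + SMOOTHING * n_outcomes))


def most_frequent_tags(words):
    """Baseline: give each word its most common tag, ignoring context."""
    return [word_tags[w].most_common(1)[0][0] if w in word_tags else "NOUN"
            for w in words]


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][tag] = (log-probability of the best path ending in tag, backpointer)
    best = [{}]
    for tag in TAGS:
        score = (log_prob(transitions[START], tag, len(TAGS))
                 + log_prob(emissions[tag], words[0], VOCAB_SIZE))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(transitions[p], tag, len(TAGS)))
                 for p in TAGS),
                key=lambda item: item[1],
            )
            emit = log_prob(emissions[tag], words[i], VOCAB_SIZE)
            best[i][tag] = (score + emit, prev_tag)
    # Follow the backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


sentence = "the sailor dogs the hatch".split()
print(most_frequent_tags(sentence))  # ['DET', 'NOUN', 'NOUN', 'DET', 'NOUN']
print(viterbi(sentence))             # ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
```

On this toy data, the baseline tags dogs as a noun because that is its most common tag, while the HMM prefers a verb because a noun-to-verb transition after sailor is far more likely than noun-to-noun. A real tagger would be trained on a large annotated corpus with a much richer tag set.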

Speech recognition

In the 1950s, Bell Labs was the pioneer in speech recognition. The early systems were limited to a single speaker and had a very limited vocabulary. After around 70 years of work, current speech-recognition systems are able to work with speech from multiple speakers and can recognize thousands of words in multiple languages. A detailed discussion of all the techniques used is beyond the scope of this book, as enough work has been done on each technique to fill a book of its own.

The general workflow for a speech-recognition system, however, is to first capture the audio by converting the physical sound into an electrical signal using a microphone. The electrical signal generated by the microphone is analog and needs to be converted to a digital form for storage and processing, for which analog-to-digital converters are used. Once we have the speech in digital...
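
The analog-to-digital step described above can be sketched in a few lines. This is a minimal illustration, not the book's code: the 440 Hz sine wave stands in for the microphone signal, and the 16 kHz sample rate and 16-bit depth are assumptions chosen because they are common for speech audio. The function name `digitize` is hypothetical.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed; common for speech audio)
BIT_DEPTH = 16        # bits per sample (assumed)


def analog_signal(t):
    """Stand-in for the microphone voltage at time t (seconds): a 440 Hz tone in [-1, 1]."""
    return math.sin(2 * math.pi * 440 * t)


def digitize(signal, duration, sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH):
    """Sample `signal` at regular intervals and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1           # 32767 for 16 bits
    n_samples = round(duration * sample_rate)
    samples = []
    for n in range(n_samples):
        value = signal(n / sample_rate)            # sampling: read the signal at t = n / rate
        value = max(-1.0, min(1.0, value))         # clip to the converter's input range
        samples.append(round(value * max_level))   # quantization: map to an integer level
    return samples


pcm = digitize(analog_signal, duration=0.01)
print(len(pcm))  # 160 samples for 10 ms of audio at 16 kHz
print(pcm[0])    # 0, since sin(0) is 0
```

The resulting list of integers is the kind of digital representation that later processing stages work on; a real system would read these values from audio hardware or a file rather than from a function.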

Summary

In this chapter, we looked into two of the major applications of HMMs: POS tagging and speech recognition. We coded a POS tagger using the most-frequent-tag algorithm and used the pomegranate package to build an HMM-based one. We compared the performance of the two methods and saw that the HMM-based approach outperforms the most-frequent-tag method. Then, we used the SpeechRecognition package to transcribe audio to text using Google's Web Speech API. We looked into using the package with both audio files and live audio from a microphone.

In the next chapter, we will explore more applications of HMMs, specifically in the field of image recognition.
