You're reading from Mobile Artificial Intelligence Projects

Product type: Book
Published in: Mar 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789344073
Edition: 1st Edition
Authors (3):
Karthikeyan NG

Karthikeyan NG is the Head of Engineering and Technology at an Indian lifestyle and fashion retail brand. He served as a software engineer at Symantec Corporation and has worked with two US-based startups as an early employee, building various products. He has 9+ years of experience building scalable products using Web, Mobile, ML, AR, and VR technologies. He is an aspiring entrepreneur and technology evangelist. His interests lie in exploring new technologies and innovative ideas to solve problems. He has won prizes at more than 15 hackathons, and is a TEDx speaker, a speaker at technology conferences and meetups, and a guest lecturer at a Bengaluru university. When not at work, he is found trekking.

Arun Padmanabhan

Arun Padmanabhan is a Machine Learning consultant with over 8 years of experience building end-to-end machine learning solutions and applications. Currently working with a couple of start-ups in the Financial and Insurance industries, he specializes in automating manual workflows using AI and creating Machine Vision and NLP applications. In the past, he led the data science team of a Singapore-based product startup in the restaurant domain. He has also built stand-alone and integrated Machine Learning solutions in the Manufacturing, Shipping, and e-commerce domains over the years. His interests are in the research, development, and applications of Artificial Intelligence and Deep Architectures.

Matt Cole

Matt R. Cole is a developer and author with 30 years' experience. Matt is the owner of Evolved AI Solutions, a provider of advanced Machine Learning/Bio-AI, Microservice and Swarm technologies. Matt is recognized as a leader in Microservice and Artificial Intelligence development and design. As an early pioneer of VOIP, Matt developed the VOIP system for NASA for the International Space Station and Space Shuttle. Matt also developed the first Bio Artificial Intelligence framework which completely integrates mirror and canonical neurons. In his spare time Matt authors books, and continues his education taking every available course in advanced mathematics, AI/ML/DL, Quantum Mechanics/Physics, String Theory and Computational Neuroscience.

TensorFlow on Mobile with Speech-to-Text with the WaveNet Model

In this chapter, we are going to learn how to convert audio to text using the WaveNet model. We will then build a model that takes audio as input and converts it into text, and use it in an Android application.

This chapter is based on the WaveNet: A Generative Model for Raw Audio paper, by Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. You can find this paper at https://arxiv.org/abs/1609.03499.

In this chapter, we will cover the following topics:

  • WaveNet and how it works
  • The WaveNet architecture
  • Building a model using WaveNet
  • Preprocessing datasets
  • Training the WaveNet network
  • Transforming a speech WAV file into English text
  • Building an Android application

Let's dig deeper into what WaveNet actually is.

...

WaveNet

WaveNet is a deep generative network that is used to generate raw audio waveforms. The sound waves that WaveNet generates mimic the human voice. When it was introduced, listeners rated this generated speech as more natural than that of the best existing text-to-speech systems, reducing the gap between system and human performance by over 50%.

This model is autoregressive and probabilistic: it predicts a distribution for each audio sample, conditioned on all of the samples that came before it. It can be trained efficiently on audio data with tens of thousands of samples per second. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity.
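
To make the terms autoregressive and probabilistic more concrete, here is a minimal sketch of a generation loop in Python. It is not the WaveNet network: the function next_sample_probs is a hypothetical, hand-written stand-in for a trained model, and NUM_LEVELS, the toy weighting rule, and the integer speaker_id are all assumptions made for illustration. The sketch only shows the overall shape of the idea, where each new sample is drawn from a probability distribution that depends on the samples generated so far and on a speaker identity.

```python
import math
import random

NUM_LEVELS = 8  # toy number of quantised amplitude levels (an assumption)


def next_sample_probs(history, speaker_id):
    """Hypothetical stand-in for a trained model.

    Returns a probability for each amplitude level, given the samples
    generated so far and the speaker identity used for conditioning.
    """
    # Use the most recent sample (or a mid-range default) as context.
    prev = history[-1] if history else NUM_LEVELS // 2
    # The speaker id shifts the preferred level; this is the "conditioning".
    centre = (prev + speaker_id) % NUM_LEVELS
    weights = [math.exp(-abs(level - centre)) for level in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, speaker_id, seed=0):
    """Generate samples one at a time, feeding each one back as context."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = next_sample_probs(samples, speaker_id)
        # Probabilistic: draw from the distribution instead of taking the max.
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        # Autoregressive: the new sample becomes input for the next step.
        samples.append(sample)
    return samples


if __name__ == "__main__":
    print(generate(10, speaker_id=0))
    print(generate(10, speaker_id=1))
```

Calling generate with the same seed but a different speaker_id gives a different sequence from the same function, which loosely mirrors how a single model can switch between speakers through conditioning.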

As shown in the movie Her, the long-standing dream of human-computer interaction is to allow people to talk to machines. The...

Summary

In this chapter, you learned how to build a complete speech recognizer on your own. We discussed in detail how the WaveNet model works. The application gives us a simple, working speech-to-text converter; however, many improvements and updates are still needed to get accurate results. You can build the same application on the iOS platform as well by converting the model into the Core ML format.

In the next chapter, we will move on and build a handwritten digit classifier using the Modified National Institute of Standards and Technology (MNIST) dataset.
