Reader small image

You're reading from  Java for Data Science

Product typeBook
Published inJan 2017
Reading LevelIntermediate
PublisherPackt
ISBN-139781785280115
Edition1st Edition
Languages
Concepts
Right arrow
Authors (2):
Richard M. Reese
Richard M. Reese
author image
Richard M. Reese

Richard Reese has worked in the industry and academics for the past 29 years. For 10 years he provided software development support at Lockheed and at one point developed a C based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville Texas. Richard is the author of various books and video courses some of which are as follows: Natural Language Processing with Java. Java for Data Science Getting Started with Natural Language Processing in Java
Read more about Richard M. Reese

Jennifer L. Reese
Jennifer L. Reese
author image
Jennifer L. Reese

Jennifer L. Reese studied computer science at Tarleton State University. She also earned her M.Ed. from Tarleton in December 2016. She currently teaches computer science to high-school students. Her interests include the integration of computer science concepts with other academic disciplines, increasing diversity in computer science courses, and the application of data science to the field of education. She has co-authored two books: Java for Data Science and Java 7 New Features Cookbook. She previously worked as a software engineer. In her free time she enjoys reading, cooking, and traveling—especially to any destination with a beach. She is a musician and appreciates a variety of musical genres.
Read more about Jennifer L. Reese

View More author details
Right arrow

Chapter 10. Visual and Audio Analysis

The use of sound, images, and videos is becoming a more important aspect of our day-to-day lives. Phone conversations and devices reliant on voice commands are increasingly common. People regularly conduct video chats with other people around the world. There has been a rapid proliferation of photo and video sharing sites. Applications that utilize images, video, and sound from a variety of sources are becoming more common.

In this chapter, we will demonstrate several techniques available to Java to process sounds and images. The first part of the chapter addresses sound processing. Both speech recognition and Text-To-Speech (TTS) APIs will be demonstrated. Specifically, we will use the FreeTTS (http://freetts.sourceforge.net/docs/index.php) API to convert text to speech, followed with a demonstration of the CMU Sphinx toolkit for speech recognition.

The Java Speech API (JSAPI) (http://www.oracle.com/technetwork/java/index-140170.html) provides access...

Text-to-speech


Speech synthesis generates human speech. TTS converts text to speech and is useful for a number of different applications. It is used in many places, including phone help desk systems and ordering systems. The TTS process typically consists of two parts. The first part tokenizes and otherwise processes the text into speech units. The second part converts these units into speech.

The two primary approaches for TTS uses concatenation synthesis and formant synthesis. Concatenation synthesis frequently combines prerecorded human speech to create the desired output. Formant synthesis does not use human speech but generates speech by creating electronic waveforms.

We will be using FreeTTS (http://freetts.sourceforge.net/docs/index.php) to demonstrate TTS. The latest version can be downloaded from https://sourceforge.net/projects/freetts/files/. This approach uses concatenation to generate speech.

There are several important terms used in TTS/FreeTTS:

  • Utterance - This concept corresponds...

Understanding speech recognition


Converting speech to text is an important application feature. This ability is increasingly being used in a wide variety of contexts. Voice input is used to control smart phones, automatically handle input as part of help desk applications, and to assist people with disabilities, to mention a few examples.

Speech consists of an audio stream that is complex. Sounds can be split into phones, which are sound sequences that are similar. Pairs of these phones are called diphones. Utterances consist of words and various types of pauses between them.

The essence of the conversion process involves splitting sounds by silences between utterances. These utterances are then matched to the words that most closely sound like the utterance. However, this can be difficult due to many factors. For example, these differences may be in the form of variances in how words are pronounced due to the context of the word, regional dialects, the quality of the sound, and other factors...

Extracting text from an image


The process of extracting text from an image is called O ptical Character Recognition (OCR). This can be very useful when the text data that needs to be processed is embedded in an image. For example, the information contained in license plates, road signs, and directions can be very useful at times.

We can perform OCR using Tess4j (http://tess4j.sourceforge.net/), a Java JNA wrapper for Tesseract OCR API. We will demonstrate how to use the API using an image captured from the Wikipedia article on OCR (https://en.wikipedia.org/wiki/Optical_character_recognition#Applications). The Javadoc for the API is found at http://tess4j.sourceforge.net/docs/docs-3.0/. The image we use is shown here:

Using Tess4j to extract text

The ITesseract interface contains numerous OCR methods. The doOCR method takes a file and returns a string containing the words found in the file, as shown here:

ITesseract instance = new Tesseract();  
try { 
    String result = instance.doOCR...

Identifying faces


Identifying faces within an image is useful in many situations. It can potentially classify an image as one containing people or find people in an image for further processing. We will use OpenCV 3.1 (http://opencv.org/opencv-3-1.html) for our examples.

OpenCV (http://opencv.org/) is an open source computer vision library that supports several programming languages, including Java. It supports a number of techniques, including machine learning algorithms, to perform computer vision tasks. The library supports such operations as face detection, tracking camera movements, extracting 3D models, and removing red eye from images. In this section, we will demonstrate face detection.

Using OpenCV to detect faces

The example that follows was adapted from http://docs.opencv.org/trunk/d9/d52/tutorial_java_dev_intro.html. Start by loading the native libraries added to your system when OpenCV was installed. On Windows, this requires that appropriate DLL files are available:

System.loadLibrary...

Classifying visual data


In this section, we will demonstrate one technique for classifying visual data. We will use Neuroph to accomplish this. Neuroph is a Java-based neural network framework that supports a variety of neural network architectures. Its open source library provides support and plugins for other applications. In this example, we will use its neural network editor, Neuroph Studio, to create a network. This network can be saved and used in other applications. Neuroph Studio is available for download here: http://neuroph.sourceforge.net/download.html. We are building upon the process shown here: http://neuroph.sourceforge.net/image_recognition.htm.

For our example, we will create a Multi Layer Perceptron (MLP) network. We will then train our network to recognize images. We can both train and test our network using Neuroph Studio. It is important to understand how MLP networks recognize and interpret image data. Every image is basically represented by three two-dimensional arrays...

Summary


In this chapter, we demonstrated many techniques for processing speech and images. This capability is becoming important, as electronic devices are increasingly embracing these communication mediums.

TTS was demonstrated using FreeTSS. This technique allows a computer to present results as speech as opposed to text. We learned how we can control the attributes of the voice used, such as its gender and age.

Recognizing speech is useful and helps bridge the human-computer interface gap. We demonstrated how CMUSphinx is used to recognize human speech. As there is often more than one way speech can be interpreted, we learned how the API can return various options. We also demonstrated how individual words are extracted, along with the relative confidence that the right word was identified.

Image processing is a critical aspect of many applications. We started our discussion of image processing by use Tess4J to extract text from an image. This process is sometimes referred to as OCR. We...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Java for Data Science
Published in: Jan 2017Publisher: PacktISBN-13: 9781785280115
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Richard M. Reese

Richard Reese has worked in the industry and academics for the past 29 years. For 10 years he provided software development support at Lockheed and at one point developed a C based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville Texas. Richard is the author of various books and video courses some of which are as follows: Natural Language Processing with Java. Java for Data Science Getting Started with Natural Language Processing in Java
Read more about Richard M. Reese

author image
Jennifer L. Reese

Jennifer L. Reese studied computer science at Tarleton State University. She also earned her M.Ed. from Tarleton in December 2016. She currently teaches computer science to high-school students. Her interests include the integration of computer science concepts with other academic disciplines, increasing diversity in computer science courses, and the application of data science to the field of education. She has co-authored two books: Java for Data Science and Java 7 New Features Cookbook. She previously worked as a software engineer. In her free time she enjoys reading, cooking, and traveling—especially to any destination with a beach. She is a musician and appreciates a variety of musical genres.
Read more about Jennifer L. Reese