
You're reading from OpenCV By Example

Product type: Book
Published in: Jan 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280948
Edition: 1st Edition
Authors (3):
Prateek Joshi

Prateek Joshi is the founder of Plutoshift and a published author of 9 books on Artificial Intelligence. He has been featured on Forbes 30 Under 30, NBC, Bloomberg, CNBC, TechCrunch, and The Business Journals. He has been an invited speaker at conferences such as TEDx, Global Big Data Conference, Machine Learning Developers Conference, and Silicon Valley Deep Learning. Apart from Artificial Intelligence, some of the topics that excite him are number theory, cryptography, and quantum computing. His greater goal is to make Artificial Intelligence accessible to everyone so that it can impact billions of people around the world.

David Millán Escrivá

David Millán Escrivá was 8 years old when he wrote his first program on an 8086 PC in Basic, which enabled the 2D plotting of basic equations. In 2005, he finished his studies in IT with honors, through the Universitat Politécnica de Valencia, in human-computer interaction supported by computer vision with OpenCV (v0.96). He has worked with Blender, an open source 3D software project, and on its first commercial movie, Plumiferos, as a computer graphics software developer. David has more than 10 years' experience in IT, covering computer vision, computer graphics, pattern recognition, and machine learning, gained on different projects at various start-ups and companies. He currently works as a researcher in computer vision.

Vinícius G. Mendonça

Vinícius G. Mendonça is a professor at PUCPR and a mentor at Apple Developer Academy. He has a master's degree in Computer Vision and Image Processing (PUCPR) and a specialization degree in Game Development (Universidade Positivo). He is also one of the authors of the book Learn OpenCV 4 by Building Projects, also by Packt Publishing. He has been in this field since 1996. His former experience includes designing and programming a multithreaded framework for PBX tests at Siemens, coordinating the Aurélio Dictionary software (including its apps for Android, iOS, and Windows phones), and coordinating an augmented reality educational activity for Positivo's Mesa Alfabeto, presented at CeBIT. Currently, he works with server-side Node.js at a company called Tenet Tech.


Chapter 11. Text Recognition with Tesseract

In the previous chapter, we covered the very basic OCR processing functions. Although they are quite useful for scanned or photographed documents, they are almost useless when dealing with text that appears incidentally in a picture.

In this chapter, we'll explore the OpenCV 3.0 text module, which deals specifically with scene text detection. Using this API, it is possible to detect text that appears in a webcam video, or to analyze photographed images (like the ones in Street View or taken by a surveillance camera) to extract text information in real time. This enables a wide range of applications, in fields ranging from accessibility to marketing and even robotics.

By the end of this chapter, you will be able to:

  • Understand what scene text recognition is

  • Understand how the text API works

  • Use the OpenCV 3.0 text API to detect text

  • Extract the detected text to an image

  • Use the text API and Tesseract integration to identify letters

How the text API works


The text API implements the algorithm proposed by Lukáš Neumann and Jiří Matas in the article Real-Time Scene Text Localization and Recognition, presented at the CVPR (Computer Vision and Pattern Recognition) conference in 2012. This algorithm represented a significant advance in scene text detection, achieving state-of-the-art detection results on both the CVPR database and the Google Street View database.

Before we use the API, let's take a look at how this algorithm works under the hood, and how it addresses the scene text detection problem.

Note

Remember that the OpenCV 3.0 text API does not come with the standard OpenCV modules. It is an additional module in the OpenCV contrib package (opencv_contrib). If you need to install OpenCV using the Windows Installer, refer to Chapter 1, Getting Started with OpenCV, which will help you install these modules.

The scene detection problem

Detecting text that randomly appears in a scene is a harder problem than it looks. There...

Using the text API


Enough theory. It's time to see how the text module works in practice. Let's study how to use it to perform text detection, extraction, and identification.

Text detection

Let's start by creating a simple program to perform text segmentation using ERFilters. In this program, we will use the trained classifiers from the text API samples. You can download them from the OpenCV repository, but they are also available in the book's companion code.

First, we include the necessary headers and add the using directives:

#include "opencv2/highgui.hpp"
#include "opencv2/imgproc.hpp"
#include "opencv2/text.hpp"

#include <vector>
#include <iostream>

using namespace std;
using namespace cv;
using namespace cv::text;

Recall from the previous section that the ERFilter works on each image channel separately. So, we must provide a way to split each desired channel into its own single-channel cv::Mat. This is done by the separateChannels function:

vector<Mat> separateChannels...
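
The listing above is cut off in this preview, so the following is only a minimal sketch of the underlying idea, not the book's separateChannels function. It is written in plain C++ without OpenCV so that it can be compiled on its own; with OpenCV, cv::split or cv::text::computeNMChannels would normally do this work on cv::Mat objects. The helper names splitPlanes and invertPlane are hypothetical. The inversion step is an assumption based on the OpenCV text detection sample, which also feeds negated channels to the filter so that bright text on a dark background can be found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV or the book's code): splits an
// interleaved 8-bit pixel buffer (for example B,G,R,B,G,R,...) into one
// plane per channel, so each channel can be processed independently.
std::vector<std::vector<std::uint8_t>> splitPlanes(
    const std::vector<std::uint8_t>& interleaved, std::size_t channelCount)
{
    // The buffer must hold a whole number of pixels.
    assert(channelCount > 0 && interleaved.size() % channelCount == 0);

    const std::size_t pixelCount = interleaved.size() / channelCount;
    std::vector<std::vector<std::uint8_t>> planes(
        channelCount, std::vector<std::uint8_t>(pixelCount));

    for (std::size_t p = 0; p < pixelCount; ++p)
        for (std::size_t c = 0; c < channelCount; ++c)
            planes[c][p] = interleaved[p * channelCount + c];

    return planes;
}

// Hypothetical helper: returns the negative of a plane (255 - value).
// Assumption: inverted planes are useful when text is brighter than
// its background.
std::vector<std::uint8_t> invertPlane(const std::vector<std::uint8_t>& plane)
{
    std::vector<std::uint8_t> out(plane.size());
    for (std::size_t i = 0; i < plane.size(); ++i)
        out[i] = static_cast<std::uint8_t>(255 - plane[i]);
    return out;
}
```

For a two-pixel BGR buffer, splitPlanes(buffer, 3) returns three planes of two values each, and invertPlane can then be applied to any of them before passing the planes on to a per-channel detector.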

Summary


In this chapter, we saw that scene text recognition is a far more difficult OCR situation than working with scanned texts. We studied how the text module addresses this problem with extremal region identification using the Neumann and Matas algorithm. We also saw how to use this API with the floodFill function to extract the text to an image and submit it to Tesseract OCR. Finally, we studied how the OpenCV text module integrates with Tesseract and other OCR engines, and how we can use its classes to identify what's written in the image.

This ends our journey with OpenCV. Over the course of this book, we hope you have gained an overview of the computer vision field and a better understanding of how several applications work. We also sought to show you that, although OpenCV is quite an impressive library, the field is still full of opportunities for improvement and research.

Thank you for reading! No matter whether you use OpenCV for creating impressive commercial...
