You're reading from  OpenCV By Example

Product type: Book
Published in: Jan 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785280948
Edition: 1st Edition
Authors (3):
Prateek Joshi

Prateek Joshi is the founder of Plutoshift and a published author of 9 books on Artificial Intelligence. He has been featured on Forbes 30 Under 30, NBC, Bloomberg, CNBC, TechCrunch, and The Business Journals. He has been an invited speaker at conferences such as TEDx, Global Big Data Conference, Machine Learning Developers Conference, and Silicon Valley Deep Learning. Apart from Artificial Intelligence, some of the topics that excite him are number theory, cryptography, and quantum computing. His greater goal is to make Artificial Intelligence accessible to everyone so that it can impact billions of people around the world.

David Millán Escrivá

David Millán Escrivá was 8 years old when he wrote his first program on an 8086 PC in Basic, which enabled the 2D plotting of basic equations. In 2005, he finished his studies in IT with honors, through the Universitat Politécnica de Valencia, in human-computer interaction supported by computer vision with OpenCV (v0.96). He has worked with Blender, an open source, 3D software project, and on its first commercial movie, Plumiferos, as a computer graphics software developer. David has more than 10 years' experience in IT, with experience in computer vision, computer graphics, pattern recognition, and machine learning, working on different projects, and at different start-ups, and companies. He currently works as a researcher in computer vision.

Vinícius G. Mendonça

Vinícius G. Mendonça is a professor at PUCPR and a mentor at Apple Developer Academy. He has a master's degree in Computer Vision and Image Processing (PUCPR) and a specialization degree in Game Development (Universidade Positivo). He is also one of the authors of the book Learn OpenCV 4 by Building Projects, also by Packt Publishing. He has been in this field since 1996. His former experience includes designing and programming a multithreaded framework for PBX tests at Siemens, coordination of Aurélio Dictionary software (including its apps for Android, IOS, and Windows phones), and coordination of an augmented reality educational activity for Positivo's Mesa Alfabeto, presented at CEBIT. Currently, he works with server-side Node.js at a company called Tenet Tech.

Chapter 10. Developing Segmentation Algorithms for Text Recognition

In the previous chapters, we learned about a wide range of image processing techniques, such as thresholding, contour descriptors, and mathematical morphology. In this chapter, we will discuss the common problems in dealing with scanned documents, such as identifying where the text is or adjusting its rotation. We will also learn how to combine the techniques presented in the previous chapters to solve these problems. By the end, we'll have segmented regions of text that can be sent to an OCR (optical character recognition) library.

By the end of this chapter, you should be able to answer the following questions:

  • What kind of OCR applications exist?

  • What are the common problems while writing an OCR application?

  • How do we identify regions of documents?

  • How do we deal with problems such as skewing and other elements in the middle of the text?

  • How do we use Tesseract OCR to identify the text?

Introducing optical character recognition


Identifying text in an image is a very popular application of Computer Vision. This process is commonly called OCR and is divided into the following steps:

  • Text preprocessing and segmentation: During this step, the computer must learn to deal with the image noise and rotation (skewing) and identify what areas are candidate text areas.

  • Text identification: This is a process used to identify each letter in a text. Although this is also a Computer Vision topic, we will not show you how to do this in this book using OpenCV. Instead, we will show you how to use the Tesseract library to do this step, since it was integrated with OpenCV 3.0. If you are interested in learning how to do what Tesseract does all by yourself, take a look at Mastering OpenCV, Packt Publishing, which presents a chapter about car license plate recognition.

The preprocessing and segmentation phase can vary greatly depending on the source of the text. Let's take a look at the common...

The preprocessing step


Software that identifies letters does so by comparing the text with previously recorded data. Classification results improve greatly if the input text is clear, if the letters are upright, and if no other elements, such as images, are sent to the classification software. In this section, we'll learn how to adjust the text to meet these conditions. This stage is called preprocessing.

Thresholding the image

We usually start the preprocessing stage by thresholding the image. This eliminates all the color information. Most OpenCV functions expect the information (here, the text) to be written in white and the background to be black. So, let's start by creating a threshold function that matches this criterion:

#include <opencv2/opencv.hpp>
#include <vector>

using namespace std;
using namespace cv;

Mat binarize(Mat input)
{
  // Use Otsu's method to threshold the input image
  Mat binaryImage;
  cvtColor(input, input, COLOR_BGR2GRAY);
  threshold(input, binaryImage, 0, 255, THRESH_OTSU...
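The listing above is cut off in this excerpt. As an independent illustration of what Otsu thresholding computes, here is a minimal sketch in plain C++ that does not use OpenCV. It is not the book's code: the function names otsuThreshold and binarizeInverted are hypothetical, the image is assumed to be a flat vector of 8-bit grayscale values, and the text is assumed to be dark on a light background, so the output is inverted to give white text on black.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the Otsu threshold for 8-bit grayscale pixels,
// that is, the level t that maximizes the between-class variance of the two
// groups {pixels <= t} and {pixels > t}.
int otsuThreshold(const std::vector<std::uint8_t>& gray)
{
    assert(!gray.empty());

    // Histogram of gray levels.
    std::array<double, 256> hist{};
    for (std::uint8_t p : gray) hist[p] += 1.0;

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double weightLow = 0.0;  // number of pixels with value <= t
    double sumLow = 0.0;     // sum of those pixel values
    double bestVariance = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += hist[t];
        sumLow += t * hist[t];
        if (weightLow == 0.0) continue;      // no pixels in the low class yet
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;        // all pixels are in the low class
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        // Between-class variance, up to a constant factor of 1 / total^2.
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            best = t;
        }
    }
    return best;
}

// Hypothetical helper: assumes dark text on a light background.
// Dark pixels (text) become 255 and light pixels (paper) become 0.
std::vector<std::uint8_t> binarizeInverted(const std::vector<std::uint8_t>& gray)
{
    const int t = otsuThreshold(gray);
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i)
        out[i] = gray[i] > t ? 0 : 255;
    return out;
}
```

Calling binarizeInverted on a page with dark ink and light paper yields a mask in which the ink is white, which is the convention described above. In OpenCV, the same idea is usually expressed by combining the THRESH_OTSU flag with THRESH_BINARY_INV in the call to threshold.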

Installing Tesseract OCR on your operating system


Tesseract is an open source OCR engine originally developed by Hewlett-Packard Laboratories, Bristol and Hewlett-Packard Co. All of its code is licensed under the Apache License and is hosted on GitHub at https://github.com/tesseract-ocr.

It is considered one of the most accurate OCR engines available. It can read a wide variety of image formats and can recognize text written in more than 60 languages.

In this section, we will show you how to install Tesseract on Windows or Mac. Since there are many Linux distributions, we will not cover installation on Linux.

Tesseract is normally available as an installation package in your package repository, so search there before you compile Tesseract yourself.

Installing Tesseract on Windows

Although Tesseract is hosted on GitHub, its latest Windows installer is still available in the old repository on Google Code. The latest installer version is 3.02.02, and it's recommended that you...

Using Tesseract OCR library


Although Tesseract OCR is already integrated with OpenCV 3.0, it is still worth studying its API, since it allows finer-grained control over Tesseract parameters. The integration will be studied in the next chapter.

Creating an OCR function

We'll change the previous example to work with Tesseract. We will start by adding the Tesseract baseapi header and fstream to the list of includes:

#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>

#include <vector>
#include <fstream>

Then, we'll create a global TessBaseAPI object that represents our Tesseract OCR engine:

tesseract::TessBaseAPI ocr;

Tip

The ocr engine is completely self-contained. If you want to create multithreaded OCR software, just add a different TessBaseAPI object to each thread, and the execution will be fairly thread-safe. You just need to guarantee that file writing is not done over the same file; otherwise, you'll need to guarantee safety for this operation.
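The one-engine-per-thread pattern from this tip can be sketched as follows. This is an illustration under stated assumptions, not the book's code: StubEngine is a hypothetical stand-in that lets the sketch compile without Tesseract, and recognizeInParallel is a hypothetical helper. In a real program, each thread would construct and initialize its own tesseract::TessBaseAPI object instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OCR engine so that the sketch compiles
// without Tesseract. In real code, this role would be played by a
// tesseract::TessBaseAPI object owned by the thread.
struct StubEngine {
    int calls = 0;  // per-instance state, never shared between threads
    std::string recognize(const std::string& page)
    {
        ++calls;
        return "text:" + page;
    }
};

// Hypothetical helper: processes the pages with workerCount threads.
// Each thread builds its own Engine, so no engine is shared.
template <typename Engine>
std::vector<std::string> recognizeInParallel(
    const std::vector<std::string>& pages, std::size_t workerCount)
{
    assert(workerCount > 0);
    std::vector<std::string> results(pages.size());
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&pages, &results, w, workerCount] {
            Engine engine;  // one private engine per thread
            // Thread w handles pages w, w + workerCount, w + 2 * workerCount, ...
            // so each result slot is written by exactly one thread.
            for (std::size_t i = w; i < pages.size(); i += workerCount)
                results[i] = engine.recognize(pages[i]);
        });
    }
    for (std::thread& t : workers) t.join();
    return results;
}
```

Results are collected in memory, with each slot written by a single thread, so the sketch avoids the shared-file problem mentioned in the tip. If the threads had to write to one file, that write would need its own synchronization, for example a mutex.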

Next, we will create a function called identify...

Summary


In this chapter, we presented a brief introduction to OCR applications. We saw that the preprocessing phase of such systems must be adjusted according to the type of documents that we are planning to identify. We learned the common operations while preprocessing text files, such as thresholding, cropping, skewing, and text region segmentation. Finally, we learned how to install and use Tesseract OCR to convert our image to text.

In the next chapter, we'll use a more sophisticated OCR technique to identify text in a casually taken picture or video—a situation known as scene text recognition. This is a much more complex scenario, since the text can be anywhere, in any font, and with different illuminations and orientations. There can be no text at all! We'll also learn how to use the OpenCV 3.0 text contribution module, which is fully integrated with Tesseract.
