Reader small image

You're reading from  Python Web Scraping Cookbook

Product typeBook
Published inFeb 2018
Reading LevelBeginner
PublisherPackt
ISBN-139781787285217
Edition1st Edition
Languages
Tools
Concepts
Right arrow
Author (1)
Michael Heydt
Michael Heydt
author image
Michael Heydt

Michael Heydt is an independent consultant, programmer, educator, and trainer. He has a passion for learning and sharing his knowledge of new technologies. Michael has worked in multiple industry verticals, including media, finance, energy, and healthcare. Over the last decade, he worked extensively with web, cloud, and mobile technologies and managed user experiences, interface design, and data visualization for major consulting firms and their clients. Michael's current company, Seamless Thingies , focuses on IoT development and connecting everything with everything. Michael is the author of numerous articles, papers, and books, such as D3.js By Example, Instant Lucene. NET, Learning Pandas, and Mastering Pandas for Finance, all by Packt. Michael is also a frequent speaker at .NET user groups and various mobile, cloud, and IoT conferences and delivers webinars on advanced technologies.
Read more about Michael Heydt

Right arrow

Performing OCR on an image with pytesseract

It is possible to extract text from within images using the pytesseract library. In this recipe, we will use pytesseract to extract text from an image. Tesseract is an open source OCR library sponsored by Google. The source is available here: https://github.com/tesseract-ocr/tesseract, and you can also find more information on the library there. 0;pytesseract is a thin python wrapper that provides a pythonic API to the executable.

Getting ready

Make sure you have pytesseract installed:

pip install pytesseract

You will also need to install tesseract-ocr. On Windows, there is an executable installer, which you can get here: https://github.com/tesseract-ocr/tesseract/wiki/4.0-with...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Python Web Scraping Cookbook
Published in: Feb 2018Publisher: PacktISBN-13: 9781787285217

Author (1)

author image
Michael Heydt

Michael Heydt is an independent consultant, programmer, educator, and trainer. He has a passion for learning and sharing his knowledge of new technologies. Michael has worked in multiple industry verticals, including media, finance, energy, and healthcare. Over the last decade, he worked extensively with web, cloud, and mobile technologies and managed user experiences, interface design, and data visualization for major consulting firms and their clients. Michael's current company, Seamless Thingies , focuses on IoT development and connecting everything with everything. Michael is the author of numerous articles, papers, and books, such as D3.js By Example, Instant Lucene. NET, Learning Pandas, and Mastering Pandas for Finance, all by Packt. Michael is also a frequent speaker at .NET user groups and various mobile, cloud, and IoT conferences and delivers webinars on advanced technologies.
Read more about Michael Heydt