Python for Secret Agents - Volume II - Second Edition

Product type: Book
Published: Dec 2015
ISBN-13: 9781785283406
Pages: 180
Edition: 2nd
Author: Steven F. Lott

Getting text data from a document


To extract meaningful, aggregated blocks of text, we'll need to add two more features to our class definition: a set of layout rules, and a text aggregator that applies those rules to the raw page.
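To make the idea of "layout rules plus an aggregator" concrete, here is a toy sketch in plain Python. It is not pdfminer's algorithm: the ToyLayoutRules class, the glyph tuples, and the aggregate_lines() function are hypothetical names invented for illustration. It only shows how a few numeric thresholds can turn individually positioned characters into lines of text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ToyLayoutRules:
    """Hypothetical stand-in for layout rules; NOT pdfminer's LAParams."""
    line_tolerance: float = 2.0  # max vertical distance for sharing a line
    word_gap: float = 3.0        # horizontal gap that implies a space


# A glyph is (x0, x1, y, character): left edge, right edge, baseline, text.
Glyph = Tuple[float, float, float, str]


def aggregate_lines(glyphs: Iterable[Glyph], rules: ToyLayoutRules) -> List[str]:
    """Group raw glyphs into text lines, top of the page first."""
    # PDF coordinates start at the bottom-left, so a larger y is higher up.
    ordered = sorted(glyphs, key=lambda g: (-g[2], g[0]))
    lines: List[List[Glyph]] = []
    for glyph in ordered:
        # Compare against the first glyph of the most recent line.
        if lines and abs(lines[-1][0][2] - glyph[2]) <= rules.line_tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])

    result = []
    for line in lines:
        line.sort(key=lambda g: g[0])  # left to right within the line
        text = line[0][3]
        for prev, cur in zip(line, line[1:]):
            if cur[0] - prev[1] > rules.word_gap:
                text += " "  # a wide gap is treated as a word break
            text += cur[3]
        result.append(text)
    return result


# Made-up glyph positions, purely for demonstration.
sample = [
    (0, 5, 100, "H"), (6, 11, 100.5, "i"),
    (20, 25, 100, "y"), (26, 31, 99.8, "o"),
    (0, 5, 80, "O"), (6, 11, 80, "K"),
]
lines = aggregate_lines(sample, ToyLayoutRules())
print(lines)
```

Changing the thresholds changes how the same glyphs are grouped, which is why layout rules are kept as a separate, tunable object rather than hard-coded into the aggregator.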

We'll override the init_device() method to create a more sophisticated device. Here's the next subclass, built on the foundation of the Miner_Page and Miner classes:

from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams
class Miner_Layout(Miner_Page):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
    def init_device(self, resource_manager, **params):
        """Return a PDFPageAggregator as a device."""
        self.layout_params = LAParams(**params)
        return PDFPageAggregator(resource_manager, laparams=self.layout_params)
    def page_iter(self):
        """Yield an LTPage object for each page in the document."""
        for page in super...
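
Overriding init_device() in the subclass is an instance of the hook-method pattern: the superclass asks a method for its device, and each subclass decides which device to build. The sketch below shows only that mechanism, using stand-in classes so it runs without pdfminer. BaseMiner, LayoutMiner, StubDevice, and StubAggregator are hypothetical names, and the assumption that the base class calls init_device() from its constructor may not match the book's Miner class.

```python
class StubDevice:
    """Hypothetical stand-in for a basic pdfminer device."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager


class StubAggregator(StubDevice):
    """Hypothetical stand-in for PDFPageAggregator."""

    def __init__(self, resource_manager, laparams=None):
        super().__init__(resource_manager)
        self.laparams = laparams


class BaseMiner:
    """Hypothetical base class; the book's Miner may be organized differently."""

    def __init__(self, **params):
        self.resource_manager = object()  # placeholder resource manager
        # The base class never names a concrete device; it calls the hook.
        self.device = self.init_device(self.resource_manager, **params)

    def init_device(self, resource_manager, **params):
        """Default hook: build the simplest device and ignore params."""
        return StubDevice(resource_manager)


class LayoutMiner(BaseMiner):
    """Overrides only the hook; the rest of the class is inherited as-is."""

    def init_device(self, resource_manager, **params):
        # Keyword arguments become the layout rules handed to the aggregator.
        self.layout_params = dict(params)
        return StubAggregator(resource_manager, laparams=self.layout_params)


plain = BaseMiner()
layout = LayoutMiner(line_margin=0.5)
print(type(plain.device).__name__, type(layout.device).__name__)
```

Because only the hook changes, any code in the base class that uses self.device picks up the more sophisticated device without modification.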