
You're reading from Web Scraping with Python

Product type: Book
Published in: Oct 2015
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782164364
Edition: 1st Edition
Author: Richard Penman

Richard Lawson is from Australia and studied Computer Science at the University of Melbourne. Since graduating, he built a business specializing in web scraping while travelling the world, working remotely from over 50 countries. He is a fluent Esperanto speaker, conversational in Mandarin and Korean, and active in contributing to and translating open source software. He is currently undertaking postgraduate studies at Oxford University and in his spare time enjoys developing autonomous drones.

Summary


In this chapter, we walked through a variety of ways to scrape data from a web page. Regular expressions can be useful for a one-off scrape or for avoiding the overhead of parsing the entire web page, and BeautifulSoup provides a high-level interface without requiring any hard-to-install dependencies. In general, however, lxml is the best choice because of its speed and extensive functionality, so we will use it in future examples.
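As a rough illustration of the one-off regular-expression approach, the sketch below pulls a single value out of an HTML fragment using only Python's standard `re` module. The HTML snippet, the row ids such as `places_area__row`, and the helper `scrape_field` are hypothetical, chosen purely for illustration.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
html = """
<table>
  <tr id="places_area__row"><td class="w2p_fl">Area: </td><td class="w2p_fw">244,820 square kilometres</td></tr>
  <tr id="places_population__row"><td class="w2p_fl">Population: </td><td class="w2p_fw">62,348,447</td></tr>
</table>
"""


def scrape_field(html, row_id):
    """Return the text of the value cell in the row with id=row_id, or None.

    scrape_field is a hypothetical helper written for this illustration.
    """
    # .*? is non-greedy, so the match stops at the first w2p_fw cell after the row start;
    # re.DOTALL lets . also match newlines in case the row spans several lines.
    pattern = (r'<tr id="' + re.escape(row_id) + r'">'
               r'.*?<td\s*class=["\']w2p_fw["\']>(.*?)</td>')
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1) if match else None


print(scrape_field(html, "places_area__row"))  # 244,820 square kilometres
```

Note how tightly the pattern is tied to the exact markup: if the attribute order, quoting, or nesting changes, the expression can silently stop matching. This fragility is one reason a real parser such as lxml tends to be the safer default.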

In the next chapter, we will introduce caching, which saves downloaded web pages so that each one only needs to be downloaded the first time a crawler is run.
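To make the idea concrete, here is a minimal sketch of a disk cache. It stores each page in a file named after a SHA-1 hash of its URL and only calls the downloader on a cache miss. The `DiskCache` class, the `cached_download` helper, and the injected `download` callable are hypothetical names for illustration, not the implementation presented in the next chapter.

```python
import hashlib
import os
import tempfile


class DiskCache:
    """Hypothetical sketch: one file per URL, named by a hash of the URL."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        # Hashing maps any URL to a filesystem-safe filename.
        return os.path.join(self.cache_dir, hashlib.sha1(url.encode("utf-8")).hexdigest())

    def get(self, url):
        # Return the cached page, or None on a cache miss.
        path = self._path(url)
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        return None

    def set(self, url, html):
        with open(self._path(url), "w", encoding="utf-8") as f:
            f.write(html)


def cached_download(url, cache, download):
    """Return the page from the cache if present; otherwise download and store it."""
    html = cache.get(url)
    if html is None:
        html = download(url)  # only reached the first time this URL is requested
        cache.set(url, html)
    return html


if __name__ == "__main__":
    calls = []

    def fake_download(url):
        # Stand-in for a real HTTP download so the sketch runs offline.
        calls.append(url)
        return "<html>" + url + "</html>"

    cache = DiskCache(tempfile.mkdtemp())
    cached_download("http://example.com/", cache, fake_download)
    cached_download("http://example.com/", cache, fake_download)
    print(len(calls))  # 1: the second request was served from the cache
```

Injecting the `download` function keeps the cache logic independent of how pages are actually fetched, so the same wrapper could sit in front of any downloader.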
