Reader small image

You're reading from  Applied Data Science with Python and Jupyter

Product typeBook
Published inOct 2018
Reading LevelBeginner
Publisher
ISBN-139781789958171
Edition1st Edition
Languages
Tools
Concepts
Right arrow
Author (1)
Alex Galea
Alex Galea
author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea

Right arrow

Scraping Web Page Data


In the spirit of leveraging the internet as a database, we can think about acquiring data from web pages either by scraping content or by interfacing with web APIs. Generally, scraping content means getting the computer to read data that was intended to be displayed in a human-readable format. This is in contradistinction to web APIs, where data is delivered in machine-readable formats—the most common being JSON.

In this topic, we will focus on web scraping. The exact process for doing this will depend on the page and desired content. However, as we will see, it's quite easy to scrape anything we need from an HTML page so long as we have an understanding of the underlying concepts and tools. In this topic, we'll use Wikipedia as an example and scrape tabular content from an article. Then, we'll apply the same techniques to scrape

data from a page on an entirely separate domain. But first, we'll take some time to introduce HTTP requests.

Introduction to HTTP Requests

The...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Applied Data Science with Python and Jupyter
Published in: Oct 2018Publisher: ISBN-13: 9781789958171

Author (1)

author image
Alex Galea

Alex Galea has been professionally practicing data analytics since graduating with a masters degree in physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Read more about Alex Galea