Getting Started with Python and Raspberry Pi (Redirected from Learning Python By Developing Raspberry Pi Applications)

Product type: Book
Published: Sep 2015
ISBN-13: 9781783551590
Pages: 200
Edition: 1st Edition
Author: Dan Nixon

Chapter 8. Extracting Data from the Internet

In this chapter, we will look at ways to extract data and files from the Internet using a range of data formats and services, namely web services (or Application Programming Interfaces (APIs)) that use the Extensible Markup Language (XML) and JavaScript Object Notation (JSON) data formats.

We will also look at how we can use Python to download files and extract information from web pages when a website does not offer an API to access its data.

Using urllib2 to download data


Before we move on to processing the data we extract from online sources, we will first demonstrate the use of the built-in urllib2 Python module for downloading data from the Internet.

This module will be used in all the later examples in the chapter to download the information that we then parse.

In the following example, we will write a simple script that downloads the text contents of a web page and prints them to the terminal. This is not a practical use for the module; however, it does demonstrate how to retrieve data from web resources.

We will start by importing the Python modules required for this script. We will save this script file as urllib_example.py:

import urllib2
import sys

In the next line, we take the first argument on the command line as the URL whose HTML contents we will download:

url = sys.argv[1]

Now, we will create a request object that represents a request to be sent to the web server. This is not...
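
The overall flow of creating a request object and reading the response can be sketched as follows. This is a minimal sketch rather than the book's own listing: the function name fetch_text and the decision to decode the response as UTF-8 are assumptions made for illustration. The import falls back to urllib.request because urllib2 exists only in Python 2.

```python
from contextlib import closing

try:
    from urllib2 import Request, urlopen  # Python 2, as used in this chapter
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 equivalent


def fetch_text(url):
    """Download the resource at url and return its body as text.

    Assumes the body is UTF-8 encoded (an assumption for this sketch).
    """
    # Build a request object describing what to ask the server for
    request = Request(url)
    # Send the request; closing() makes sure the response is closed
    with closing(urlopen(request)) as response:
        raw = response.read()
    return raw.decode("utf-8")
```

A script could then call fetch_text with the URL taken from the command line and print the returned text.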

Parsing JSON APIs


In this section, we will create a simple currency converter application that is run from the command line. It uses the free Fixer.io API (http://fixer.io) to provide the exchange rates. These are updated daily, which is less frequent than some paid-for APIs but good enough for our use.

This is a JSON API; an example URL is: http://api.fixer.io/latest?base=GBP&symbols=JPY,EUR

This requests the exchange rates for converting British pounds to euros and yen, and returns data in the following format:

{
    "base": "GBP",
    "date": "2015-07-08",
    "rates": {
        "JPY": 186.64,
        "EUR": 1.3941
    }
}

As we will see in the code that follows, this data can be parsed using the json Python module, which returns the structure of the JSON document as nested Python dictionaries.
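
To make the nested dictionary structure concrete, here is a small sketch that parses the sample response shown above without contacting the API. The convert function and the SAMPLE_RESPONSE name are hypothetical helpers introduced for illustration only.

```python
import json

# Sample response copied from the example above
SAMPLE_RESPONSE = """
{
    "base": "GBP",
    "date": "2015-07-08",
    "rates": {
        "JPY": 186.64,
        "EUR": 1.3941
    }
}
"""


def convert(amount, currency, response_text=SAMPLE_RESPONSE):
    """Convert an amount in the base currency into the given currency."""
    data = json.loads(response_text)  # nested dicts mirroring the JSON
    rate = data["rates"][currency]    # KeyError if the rate is missing
    return amount * rate


data = json.loads(SAMPLE_RESPONSE)
print(data["base"])          # top-level key
print(data["rates"]["JPY"])  # key inside the nested "rates" dictionary
print(convert(100, "EUR"))   # 100 GBP expressed in EUR
```

Each JSON object becomes a dictionary, so a rate is reached by indexing first with "rates" and then with the currency code.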

We will start by importing the required Python modules for this script, which we will save as currency_converter.py:

import urllib2
import json
from string import...

Parsing XML APIs


In this section, we will create a simple weather forecast application using the OpenWeatherMap 5-day forecast API (http://openweathermap.org/forecast#5days), which can return an XML document containing the forecast data.

This API is accessed through a URL in the following format; in this case, we are searching for the weather in Harwell, UK:

http://api.openweathermap.org/data/2.5/forecast?q=Harwell,GB&mode=xml

This gives output in the following format, where the time element is repeated once for each forecast available in the 5-day range:

<?xml version="1.0" encoding="UTF-8"?>
<weatherdata>
   <location>
      <name>Harwell</name>
      <type />
      <country>GB</country>
      <timezone />
      <location altitude="0" latitude="51.599468" longitude="-1.29175" geobase="geonames" geobaseid="0" />
   </location>
   <credit />
   <meta>
      <lastupdate />
...
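
As a rough illustration of how a document like this can be read, the following sketch parses a trimmed sample with the standard library's xml.etree.ElementTree module. The location part follows the output above. The forecast wrapper and the from and to attributes on the time elements are assumptions made for illustration, since the listing above is truncated.

```python
import xml.etree.ElementTree as ET

# Trimmed sample. The <location> part follows the output above; the
# <forecast>/<time> part and its "from"/"to" attributes are assumed
# here for illustration, since the listing above is truncated.
SAMPLE_XML = """
<weatherdata>
   <location>
      <name>Harwell</name>
      <country>GB</country>
      <location altitude="0" latitude="51.599468" longitude="-1.29175" />
   </location>
   <forecast>
      <time from="2015-07-08T12:00:00" to="2015-07-08T15:00:00" />
      <time from="2015-07-08T15:00:00" to="2015-07-08T18:00:00" />
   </forecast>
</weatherdata>
"""

root = ET.fromstring(SAMPLE_XML.strip())

# Child elements are found by a path relative to the root element
name = root.findtext("location/name")
country = root.findtext("location/country")

# Attributes are read from an element with get()
coords = root.find("location/location")
latitude = float(coords.get("latitude"))
longitude = float(coords.get("longitude"))

# iter() visits every <time> element at any depth, one per forecast
periods = [(t.get("from"), t.get("to")) for t in root.iter("time")]

print("%s, %s (%.4f, %.4f)" % (name, country, latitude, longitude))
print("%d forecast periods" % len(periods))
```

Using iter("time") rather than a fixed path means the loop does not depend on exactly where the time elements sit in the real response.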

Parsing a web page using BeautifulSoup


In this section, we will use the BeautifulSoup library to parse an HTML web page and extract information from it. This is particularly useful when you wish to interact with a website that does not provide an API to access its data. The drawback is that an application using this method is more likely to be broken by a change in the web page structure; an API, by contrast, is rarely changed, and when it is, developers are typically given warning of the change.

In this next example, we will write a simple script to download low-resolution previews of images from Pixiv (www.pixiv.net). This script will start in a similar way to the others we have written so far. Note that the UTF-8 character encoding is required here, as the contents of the web pages are likely to contain Japanese characters.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import urllib2
import os
import sys
from string import Template

This string template...
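
To show what extracting information from a page's markup involves, the sketch below collects the src attribute of every img tag in an HTML string. It deliberately swaps in the standard library's HTMLParser class instead of BeautifulSoup so that it runs without any third-party package; with BeautifulSoup the equivalent lookup would typically be a call such as soup.find_all("img"). The class name, function name, and sample markup are all made up for illustration and are not taken from the book's script or from any real page.

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class ImageSourceParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_sources(html):
    """Return the image URLs found in html, in document order."""
    parser = ImageSourceParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# Made-up markup for illustration; not taken from any real page
SAMPLE_HTML = """
<html><body>
  <div class="thumb"><img src="http://example.com/a_small.jpg" alt="A"></div>
  <div class="thumb"><img src="http://example.com/b_small.jpg" alt="B"></div>
  <img alt="no source">
</body></html>
"""

print(extract_image_sources(SAMPLE_HTML))
```

Because the code depends on tag names and attributes in the page, any change to the markup can silently change its results, which is the fragility described at the start of this section.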

Summary


In this chapter, we looked at the urllib2 Python module and how it can be used to download data from the Internet, as well as a series of modules and libraries for parsing that data in a variety of formats once it has been downloaded.

In the next chapter, we will start building complete applications by designing and implementing command-line interfaces.
