Chapter 8. Extracting Data from the Internet
In this chapter, we will look at ways to extract data and files from the Internet using a range of data formats and services, namely web services (also known as
Application Programming Interfaces (APIs)) that use the Extensible Markup Language (XML) and JavaScript Object Notation (JSON) data formats.
We will also look at how we can use Python to download files and extract information from web pages for cases where a website does not offer an API to access its data.
Using urllib2 to download data
Before we get on to processing the data we extract from online sources, we will first demonstrate the use of the built-in urllib2
Python module for downloading data from the internet.
This will be used in all the examples later on in the chapter for parsing information downloaded from the various online sources.
In the following example, we will write a simple script that downloads the text contents of a web page and prints them to the terminal. This is not a practical application in itself, but it demonstrates how the module is used to retrieve data from web resources.
We will start by importing the Python modules required for this script. We will save this script file as urllib_example.py
:
In this line, we take the first command-line argument as the URL whose HTML contents we will open and return:
Now, we will create a request object that represents a request to be sent to the web server. Creating the request does not contact the server; it simply describes the request (allowing, for example, extra headers to be set) before it is opened:
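Putting these steps together, the script looks something like the following sketch. The chapter targets Python 2, where the module is called urllib2; this version uses Python 3's equivalent, urllib.request, and the User-Agent header value is an arbitrary illustrative choice:

```python
# urllib_example.py -- a minimal sketch of the download script.
import sys
import urllib.request


def fetch(url):
    # Build a Request object: this describes the request (the URL plus
    # any headers) but does not contact the server until urlopen() runs.
    request = urllib.request.Request(
        url, headers={"User-Agent": "urllib-example"})
    # Opening the request sends it and returns a file-like response;
    # read() gives the raw bytes, which we decode to text.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Take the first command-line argument as the URL to download
    # and print its contents to the terminal.
    print(fetch(sys.argv[1]))
```

Running `python urllib_example.py http://www.example.com` would print that page's HTML to the terminal.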
In this section, we will create a simple currency converter application, run from the command line, using the free-to-use Fixer.io
API (http://fixer.io) to provide the exchange rates. The rates are updated daily, which is less frequent than some paid-for APIs, but is good enough for our use.
This is a JSON API; an example URL is: http://api.fixer.io/latest?base=GBP&symbols=JPY,EUR
This requests the exchange rates for converting British pounds to Japanese yen and euros, and returns data in the following format:
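A representative response body looks like this (the date and rate values shown are illustrative, not live data):

```json
{
  "base": "GBP",
  "date": "2017-03-13",
  "rates": {
    "EUR": 1.1465,
    "JPY": 131.82
  }
}
```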
As we will see in the next piece of code, this data can be parsed using the json
Python module, which returns the JSON tree as a nested structure of Python dictionaries.
We will start by importing the required Python modules for this script, which we will save as currency_converter.py
:
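The overall shape of the converter can be sketched as follows. This version uses Python 3's urllib.request and json modules (the chapter's Python 2 code would use urllib2), and the command-line argument layout is an illustrative choice rather than the book's exact interface:

```python
# currency_converter.py -- a minimal sketch of the converter.
import json
import sys
import urllib.request

API_URL = "http://api.fixer.io/latest?base={base}&symbols={symbols}"


def get_rates(base, symbols):
    """Download the latest exchange rates from base to each symbol."""
    url = API_URL.format(base=base, symbols=",".join(symbols))
    with urllib.request.urlopen(url) as response:
        # json.loads turns the JSON document into nested dictionaries;
        # the "rates" key maps currency codes to exchange rates.
        return json.loads(response.read().decode("utf-8"))["rates"]


def convert(amount, rate):
    # One unit of the base currency buys `rate` units of the target.
    return amount * rate


if __name__ == "__main__" and len(sys.argv) == 4:
    # Example usage: python currency_converter.py 100 GBP JPY
    amount, base, target = float(sys.argv[1]), sys.argv[2], sys.argv[3]
    rates = get_rates(base, [target])
    print("{:.2f} {} = {:.2f} {}".format(
        amount, base, convert(amount, rates[target]), target))
```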
In this section, we will look at creating a simple weather forecast application using the OpenWeatherMap 5 day forecast API (http://openweathermap.org/forecast#5days), which can return an XML document containing the forecast data.
This API is accessed through a URL in the following format; in this case, we are searching for the weather in Harwell, UK:
This gives output in the following format, where the time
element is repeated for each forecast available in the 5-day range:
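An XML document of that shape can be parsed with the standard library's xml.etree.ElementTree module, as sketched below. The sample document is a trimmed, hand-written stand-in for the API's output (the element and attribute names are based on the API's XML mode but should be checked against a real response):

```python
import xml.etree.ElementTree as ET

# Illustrative sample, not a live API response.
SAMPLE = """<weatherdata>
  <forecast>
    <time from="2017-06-01T09:00:00" to="2017-06-01T12:00:00">
      <symbol name="light rain"/>
      <temperature unit="celsius" value="14.5"/>
    </time>
    <time from="2017-06-01T12:00:00" to="2017-06-01T15:00:00">
      <symbol name="clear sky"/>
      <temperature unit="celsius" value="17.2"/>
    </time>
  </forecast>
</weatherdata>"""


def forecasts(xml_text):
    """Yield (start time, description, temperature) per <time> element."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree and finds every <time> element,
    # however deeply it is nested.
    for time in root.iter("time"):
        yield (time.get("from"),
               time.find("symbol").get("name"),
               float(time.find("temperature").get("value")))


for start, description, temperature in forecasts(SAMPLE):
    print(start, description, temperature)
```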
Parsing a web page using BeautifulSoup
In this section, we will use the BeautifulSoup
library to parse an HTML web page and extract information from it. This is particularly useful when you wish to interact with a web page that does not provide an API to access its data. The drawback is that an application using this method is more likely to be broken by a change in the web page's structure, whereas an API rarely changes, and when it does, developers are typically given advance warning.
In this next example, we will write a simple script to download low resolution previews of images from Pixiv (www.pixiv.net). This script will start in a similar way to the others we have written so far. Note that the UTF-8 character encoding is required here as the contents of the web pages are likely to contain Japanese characters.
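The core extraction step can be sketched as follows, using BeautifulSoup (the bs4 package, installed with pip install beautifulsoup4). The HTML below is a hand-written stand-in for a search results page, not Pixiv's real markup:

```python
from bs4 import BeautifulSoup

# Illustrative sample markup, not real Pixiv HTML.
SAMPLE_HTML = """<html><body>
  <ul class="results">
    <li><img src="http://example.com/thumb/1.jpg" alt="first"></li>
    <li><img src="http://example.com/thumb/2.jpg" alt="second"></li>
  </ul>
</body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find_all() returns every matching tag; each tag's attributes
# can be read like a dictionary.
thumbnails = [img["src"] for img in soup.find_all("img")]
print(thumbnails)
```

Against a real page, the same pattern applies, but the tags would be narrowed down with class or attribute filters to select only the preview images.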
This string template is used to build the URLs that the script will request; its placeholders are filled in with the search parameters:
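The technique can be sketched like this (the template and parameter names here are illustrative, not Pixiv's real URL scheme):

```python
from urllib.parse import quote

# A URL template with placeholders for the search term and page number.
SEARCH_URL = "http://www.pixiv.net/search.php?word={word}&p={page}"


def search_url(word, page):
    # quote() percent-encodes the search term so that non-ASCII
    # characters (such as Japanese) are safe to place in a URL.
    return SEARCH_URL.format(word=quote(word), page=page)


print(search_url("猫", 1))
```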
In this chapter, we looked at the urllib2
Python module and how this can be used to download data from the internet, as well as a series of modules and libraries for parsing the data in a variety of formats once it has been downloaded.
In the next chapter, we will start looking at building complete applications as we start designing and implementing command line interfaces.