Home Programming Python Programming Blueprints

Python Programming Blueprints

By Daniel Furtado , Marcus Pennington
books-svg-icon Book
eBook $43.99 $29.99
Print $54.99
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $43.99 $29.99
Print $54.99
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
  1. Free Chapter
    Implementing the Weather Application
About this book
Python is a very powerful, high-level, object-oriented programming language. It's known for its simplicity and huge community support. Python Programming Blueprints will help you build useful, real-world applications using Python. In this book, we will cover some of the most common tasks that Python developers face on a daily basis, including performance optimization and making web applications more secure. We will familiarize ourselves with the associated software stack and master asynchronous features in Python. We will build a weather application using command-line parsing. We will then move on to create a Spotify remote control where we'll use OAuth and the Spotify Web API. The next project will cover reactive extensions by teaching you how to cast votes on Twitter the Python way. We will also focus on web development by using the famous Django framework to create an online game store. We will then create a web-based messenger using the new Nameko microservice framework. We will cover topics like authenticating users and, storing messages in Redis. By the end of the book, you will have gained hands-on experience in coding with Python.
Publication date:
February 2018
Publisher
Packt
Pages
456
ISBN
9781786468161

 

Chapter 1. Implementing the Weather Application

The first application in this book is going to be a web scraping application that will scrape weather forecast information from https://weather.com and present it in a terminal. We will add some options that can be passed as arguments to the application, such as:

  • The temperature unit (Celsius or Fahrenheit)
  • The area where you can get the weather forecast
  • Output options where the user of our application can choose between the current forecast, a five-day forecast, a ten-day forecast, and the weekend
  • Ways to complement the output with extra information such as wind and humidity

Apart from the aforementioned arguments, this application will be designed to be extendable, which means that we can create parsers for different websites to get a weather forecast, and these parsers will be available as argument options.

In this chapter, you will learn how to:

  • Use object-oriented programming concepts in Python applications
  • Scrape data from websites using the BeautifulSoup package
  • Receive command line arguments
  • Utilize the inspect module
  • Load Python modules dynamically
  • Use Python comprehensions
  • Use Selenium to request a webpage and inspect its DOM elements

Before we get started, it is important to say that when developing web scraping applications, you should keep in mind that these types of applications are susceptible to changes. If the developers of the site that you are getting data from change a CSS class name, or the structure of the HTML DOM, the application will stop working. Also, if the URL of the site we are getting the data from changes, the application will not be able to send requests. 

 

Setting up the environment


Before we get right into writing our first example, we need to set up an environment to work and install any dependencies that the project may have. Luckily, Python has a really nice tooling system to work with virtual environments.

Virtual environments in Python are a broad subject, and beyond the scope of this book. However, if you are not familiar with virtual environments, it will suffice to know that a virtual environment is a contained Python environment that is isolated from your global Python installation. This isolation allows developers to easily work with different versions of Python, install packages within the environment, and manage project dependencies without interfering with Python's global installation.

Python's installation comes with a module called venv, which you can use to create virtual environments; the syntax is fairly straightforward. The application that we are going to create is called weatherterm (weather terminal), so we can create a virtual environment with the same name to make it simple.

To create a new virtual environment, open a terminal and run the following command:

$ python3 -m venv weatherterm

If everything goes well, you should see a directory called weatherterm in the directory you are currently at. Now that we have the virtual environment, we just need to activate it with the following command:

$ . weatherterm/bin/activate

Note

I recommend installing and using virtualenvwrapper, which is an extension of the virtualenv tool. This makes it very simple to manage, create, and delete virtual environments as well as quickly switch between them. If you wish to investigate this further, visit: https://virtualenvwrapper.readthedocs.io/en/latest/#.

Now, we need to create a directory where we are going to create our application. Don't create this directory in the same directory where you created the virtual environment; instead, create a projects directory and create the directory for the application in there. I would recommend you name it with the same name as the virtual environment for simplicity.

Note

I am setting the environment and running all the examples in a machine with Debian 9.2 installed, and at the time of writing, I am running the latest Python version (3.6.2). If you are a Mac user, it shouldn't be so different; however, if you are on Windows, the steps can be slightly different, but it is not hard to find information on how to set up virtual environments on it. A Python 3 installation works nicely on Windows nowadays.

Go into the project's directory that you just created and create a file named requirements.txt with the following content:

beautifulsoup4==4.6.0
selenium==3.6.0

These are all the dependencies that we need for this project:

  • BeautifulSoup: This is a package for parsing HTML and XML files. We will be using it to parse the HTML that we fetch from weather sites and to get the weather data we need on the terminal. It is very simple to use and it has a great documentation available online at: http://beautiful-soup-4.readthedocs.io/en/latest/.
  • Selenium: This is a well-known set of tools for testing. There are many applications, but it is mostly used for the automated testing of web applications. 

To install the required packages in our virtual environment, you can run the following command:

pip install -r requirements.txt

Note

It is always a good idea to make use of version-control tools like GIT or Mercurial. It is very helpful to control changes, check history, rollback changes, and more. If you are not familiar with any of these tools, there are plenty of tutorials on the internet. You can get started by checking the documentation for GIT at: https://git-scm.com/book/en/v1/Getting-Started.

One last tool that we need to install is PhantomJS; you can download it from: http://phantomjs.org/download.html

After downloading it, extract the contents inside the weatherterm directory and rename the folder to phantomjs.

With our virtual environment set up and PhantomJS installed, we are ready to start coding!

 

Core functionality


Let's start by creating a directory for your module. Inside of the project's root directory, create a subdirectory called weatherterm. The subdirectory weatherterm is where our module will live. The module directory needs two subdirectories - core and parsers. The project's directory structure should look like this:

weatherterm
├── phantomjs
└── weatherterm
    ├── core
    ├── parsers   

Loading parsers dynamically

This application is intended to be flexible and allow developers to create different parsers for different weather websites. We are going to create a parser loader that will dynamically discover files inside of the parsers directory, load them, and make them available to be used by the application without requiring changes to any other parts of the code. Here are the rules that our loader will require when implementing new parsers:

  • Create a file with a class implementing the methods for fetching the current weather forecast as well as five-day, ten-day, and weekend weather forecasts
  • The file name has to end with parser, for example, weather_com_parser.py
  • The file name can't start with double underscores

With that said, let's go ahead and create the parser loader. Create a file namedparser_loader.py inside of the weatherterm/core directory and add the following content:

import os
import re
import inspect


def _get_parser_list(dirname):
    files = [f.replace('.py', '')
             for f in os.listdir(dirname)
             if not f.startswith('__')]

    return files


def _import_parsers(parserfiles):

    m = re.compile('.+parser$', re.I)

    _modules = __import__('weatherterm.parsers',
                          globals(),
                          locals(),
                          parserfiles,
                          0)

    _parsers = [(k, v) for k, v in inspect.getmembers(_modules)
                if inspect.ismodule(v) and m.match(k)]

    _classes = dict()

    for k, v in _parsers:
        _classes.update({k: v for k, v in inspect.getmembers(v)
                         if inspect.isclass(v) and m.match(k)})

    return _classes


def load(dirname):
    parserfiles = _get_parser_list(dirname)
    return _import_parsers(parserfiles)

First, the _get_parser_list function is executed and returns a list of all files located in weatherterm/parsers; it will filter the files based on the rules of the parser described previously. After returning a list of files, it is time to import the module. This is done by the _import_parsers function, which first imports the weatherterm.parsers module and makes use of the inspect package in the standard library to find the parser classes within the module.

The inspect.getmembers function returns a list of tuples where the first item is a key representing a property in the module, and the second item is the value, which can be of any type. In our scenario, we are interested in a property with a key ending with parser and with the value of type class.

Assuming that we already have a parser in place in the weatherterm/parsers  directory, the value returned by the inspect.getmembers(_modules) will look something like this:

[('WeatherComParser',
  <class 'weatherterm.parsers.weather_com_parser.WeatherComParser'>),
  ...]

Note

inspect.getmembers(_module) returns many more items, but they have been omitted since it is not relevant to show all of them at this point.

Lastly, we loop through the items in the module and extract the parser classes, returning a dictionary containing the name of the class and the class object that will be later used to create instances of the parser.

Creating the application's model

Let's start creating the model that will represent all the information that our application will scrape from the weather website. The first item we are going to add is an enumeration to represent each option of the weather forecast we will provide to the users of our application. Create a file named forecast_type.py in the directory weatherterm/core with the following contents:

from enum import Enum, unique


@unique
class ForecastType(Enum):
    TODAY = 'today'
    FIVEDAYS = '5day'
    TENDAYS = '10day'
    WEEKEND = 'weekend'

Enumerations have been in Python's standard library since version 3.4 and they can be created using the syntax for creating classes. Just create a class inheriting from enum.Enum containing a set of unique properties set to constant values. Here, we have values for the four types of forecast that the application will provide, and where values such as ForecastType.TODAY, ForecastType.WEEKEND, and so on can be accessed.

Note that we are assigning constant values that are different from the property item of the enumeration, the reason being that later these values will be used to build the URL to make requests to the weather website.

The application needs one more enumeration to represent the temperature units that the user will be able to choose from in the command line. This enumeration will contain Celsius and Fahrenheit items. 

First, let's include a base enumeration. Create a file called base_enum.py in the weatherterm/core directory with the following contents:

from enum import Enum


class BaseEnum(Enum):
    def _generate_next_value_(name, start, count, last_value):
        return name

 BaseEnum is a very simple class inheriting from Enum . The only thing we want to do here is override the method _generate_next_value_ so that every enumeration that inherits from BaseEnum and has properties with the value set to auto()  will automatically get the same value as the property name.

Now, we can create an enumeration for the temperature units. Create a file called unit.py in the weatherterm/core directory with the following content:

from enum import auto, unique

from .base_enum import BaseEnum


@unique
class Unit(BaseEnum):
    CELSIUS = auto()
    FAHRENHEIT = auto()

This class inherits from the BaseEnum that we just created, and every property is set to auto(), meaning the value for every item in the enumeration will be set automatically for us. Since the Unit class inherits from BaseEnum, every time the auto() is called, the _generate_next_value_ method on BaseEnum will be invoked and will return the name of the property itself.

Before we try this out, let's create a file called __init__.py in the weatherterm/core directory and import the enumeration that we just created, like so:

from .unit import Unit

If we load this class in the Python REPL and check the values, the following will occur:

Python 3.6.2 (default, Sep 11 2017, 22:31:28) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from weatherterm.core import Unit
>>> [value for key, value in Unit.__members__.items()]
[<Unit.CELSIUS: 'CELSIUS'>, <Unit.FAHRENHEIT: 'FAHRENHEIT'>]

Another item that we also want to add to the core module of our application is a class to represent the weather forecast data that the parser returns. Let's go ahead and create a file named forecast.py in the weatherterm/core directory with the following contents:

from datetime import date

from .forecast_type import ForecastType


class Forecast:
    def __init__(
            self,
            current_temp,
            humidity,
            wind,
            high_temp=None,
            low_temp=None,
            description='',
            forecast_date=None,
            forecast_type=ForecastType.TODAY):
        self._current_temp = current_temp
        self._high_temp = high_temp
        self._low_temp = low_temp
        self._humidity = humidity
        self._wind = wind
        self._description = description
        self._forecast_type = forecast_type

        if forecast_date is None:
            self.forecast_date = date.today()
        else:
            self._forecast_date = forecast_date

    @property
    def forecast_date(self):
        return self._forecast_date

    @forecast_date.setter
    def forecast_date(self, forecast_date):
        self._forecast_date = forecast_date.strftime("%a %b %d")

    @property
    def current_temp(self):
        return self._current_temp

    @property
    def humidity(self):
        return self._humidity

    @property
    def wind(self):
        return self._wind

    @property
    def description(self):
        return self._description

    def __str__(self):
        temperature = None
        offset = ' ' * 4

        if self._forecast_type == ForecastType.TODAY:
            temperature = (f'{offset}{self._current_temp}\xb0\n'
                           f'{offset}High {self._high_temp}\xb0 / '
                           f'Low {self._low_temp}\xb0 ')
        else:
            temperature = (f'{offset}High {self._high_temp}\xb0 / '
                           f'Low {self._low_temp}\xb0 ')

        return(f'>> {self.forecast_date}\n'
               f'{temperature}'
               f'({self._description})\n'
               f'{offset}Wind: '
               f'{self._wind} / Humidity: {self._humidity}\n')

In the Forecast class, we will define properties for all the data we are going to parse: 

current_temp

Represents the current temperature. It will only be available when getting today's weather forecast.

humidity

The humidity percentage for the day.

wind

Information about today's current wind levels.

high_temp

The highest temperature for the day.

low_temp

The lowest temperature for the day.

description

A description of the weather conditions, for example, Partly Cloudy.

forecast_date

Forecast date; if not supplied, it will be set to the current date.

forecast_type

Any value in the enumeration ForecastType (today, fivedays, tendays, or weekend).

 

We can also implement two methods called forecast_date  with the decorators @property  and @forecast_date.setter . The @property  decorator will turn the method into a getter for the _forecast_date property of the Forecast class, and the @forecast_date.setter will turn the method into a setter.  The setter was defined here because, every time we need to set the date in an instance of Forecast, we need to make sure that it will be formatted accordingly. In the setter, we call the strftime method, passing the format codes %a (weekday abbreviated name), %b (monthly abbreviated name), and %d (day of the month).

Note

The format codes %a and %b will use the locale configured in the machine that the code is running on.

Lastly, we override the __str__ method to allow us to format the output the way we would like when using the print, format, and str functions.

By default, the temperature unit used by weather.com is Fahrenheit, and we want to give the users of our application the option to use Celsius instead. So, let's go ahead and create one more file in the weatherterm/core directory called unit_converter.py with the following content:

from .unit import Unit


class UnitConverter:
    def __init__(self, parser_default_unit, dest_unit=None):
        self._parser_default_unit = parser_default_unit
        self.dest_unit = dest_unit

        self._convert_functions = {
            Unit.CELSIUS: self._to_celsius,
            Unit.FAHRENHEIT: self._to_fahrenheit,
        }

    @property
def dest_unit(self):
        return self._dest_unit

    @dest_unit.setter
    def dest_unit(self, dest_unit):
        self._dest_unit = dest_unit

    def convert(self, temp):

        try:
            temperature = float(temp)
        except ValueError:
            return 0

        if (self.dest_unit == self._parser_default_unit or
                self.dest_unit is None):
            return self._format_results(temperature)

        func = self._convert_functions[self.dest_unit]
        result = func(temperature)

        return self._format_results(result)

    def _format_results(self, value):
        return int(value) if value.is_integer() else f'{value:.1f}'

    def _to_celsius(self, fahrenheit_temp):
        result = (fahrenheit_temp - 32) * 5/9
        return result

    def _to_fahrenheit(self, celsius_temp):
        result = (celsius_temp * 9/5) + 32
        return result

This is the class that is going to make the temperature conversions from Celsius to Fahrenheit and vice versa. The initializer of this class gets two arguments; the default unit used by the parser and the destination unit. In the initializer, we will define a dictionary containing the functions that will be used for temperature unit conversion.

The convert method only gets one argument, the temperature. Here, the temperature is a string, so the first thing we need to do is try converting it to a float value; if it fails, it will return a zero value right away.

You can also verify whether the destination unit is the same as the parser's default unit or not. In that case, we don't need to continue and perform any conversion; we simply format the value and return it.

If we need to perform a conversion, we can look up the _convert_functions  dictionary to find the conversion function that we need to run. If we find the function we are looking for, we invoke it and return the formatted value.

The code snippet below shows the _format_results method, which is a utility method that will format the temperature value for us:

return int(value) if value.is_integer() else f'{value:.1f}'

The _format_results method checks if the number is an integer; the value.is_integer() will return True if the number is, for example, 10.0. If True, we will use the int function to convert the value to 10; otherwise, the value is returned as a fixed-point number with a precision of 1. The default precision in Python is 6. Lastly, there are two utility methods that perform the temperature conversions, _to_celsius and _to_fahrenheit.

Now, we only need to edit the __init__.py file in the weatherterm/core directory and include the following import statements:

from .base_enum import BaseEnum
from .unit_converter import UnitConverter
from .forecast_type import ForecastType
from .forecast import Forecast

Fetching data from the weather website

We are going to add a class named Request that will be responsible for getting the data from the weather website. Let's add a file named request.py in the weatherterm/core directory with the following content:

import os
from selenium import webdriver


class Request:
    def __init__(self, base_url):
        self._phantomjs_path = os.path.join(os.curdir,
                                          'phantomjs/bin/phantomjs')
        self._base_url = base_url
        self._driver = webdriver.PhantomJS(self._phantomjs_path)

    def fetch_data(self, forecast, area):
        url = self._base_url.format(forecast=forecast, area=area)
        self._driver.get(url)

        if self._driver.title == '404 Not Found':
            error_message = ('Could not find the area that you '
                             'searching for')
            raise Exception(error_message)

        return self._driver.page_source

This class is very simple; the initializer defines the base URL and creates a PhantomJS driver, using the path where PhantomJS is installed. The fetch_data method formats the URL, adding the forecast option and the area. After that, the webdriver performs a request and returns the page source. If the title of the markup returned is 404 Not Found, it will raise an exception. Unfortunately, Selenium doesn't provide a proper way of getting the HTTP Status code; this would have been much better than comparing strings.

Note

You may notice that I prefix some of the class properties with an underscore sign. I usually do that to show that the underlying property is private and shouldn't be set outside the class. In Python, there is no need to do that because there's no way to set private or public properties; however, I like it because I can clearly show my intent.

Now, we can import it in the __init__.py  file in the weatherterm/core directory:

from .request import Request

Now we have a parser loader to load any parser that we drop into the directory weatherterm/parsers,  we have a class representing the forecast model, and an enumeration ForecastType so we can specify which type of forecast we are parsing. The enumeration represents temperature units and utility functions to convert temperatures from Fahrenheit to Celsius and Celsius to Fahrenheit. So now, we should be ready to create the application's entry point to receive all the arguments passed by the user, run the parser, and present the data on the terminal.

 

Getting the user's input with ArgumentParser


Before we run our application for the first time, we need to add the application's entry point. The entry point is the first code that will be run when our application is executed. 

We want to give the users of our application the best user experience possible, so the first features that we need to add are the ability to receive and parse command line arguments, perform argument validation, set arguments when needed, and, last but not least, show an organized and informative help system so the users can see which arguments can be used and how to use the application. 

Sounds like tedious work, right?

Luckily, Python has batteries included and the standard library contains a great module that allows us to implement this in a very simple way; the module is called argparse.

Another feature that would be good to have is for our application to be easy to distribute to our users. One approach is to create a __main__.py file in the weatherterm module directory, and you can run the module as a regular script. Python will automatically run the __main__.py file, like so:

$ python -m weatherterm

Another option is to zip the entire application's directory and execute the Python passing the name of the ZIP file instead. This is an easy, fast, and simple way to distribute our Python programs.

There are many other ways of distributing your programs, but they are beyond the scope of this book; I just wanted to give you some examples of the usage of the __main__.py file.

With that said, let's create a __main__.py file inside of the weatherterm directory with the following content:

import sys
from argparse import ArgumentParser

from weatherterm.core import parser_loader
from weatherterm.core import ForecastType
from weatherterm.core import Unit


def _validate_forecast_args(args):
    if args.forecast_option is None:
        err_msg = ('One of these arguments must be used: '
                   '-td/--today, -5d/--fivedays, -10d/--tendays, -
                    w/--weekend')
        print(f'{argparser.prog}: error: {err_msg}', 
        file=sys.stderr)
        sys.exit()

parsers = parser_loader.load('./weatherterm/parsers')

argparser = ArgumentParser(
    prog='weatherterm',
    description='Weather info from weather.com on your terminal')

required = argparser.add_argument_group('required arguments')

required.add_argument('-p', '--parser',
                      choices=parsers.keys(),
                      required=True,
                      dest='parser',
                      help=('Specify which parser is going to be  
                       used to '
                            'scrape weather information.'))

unit_values = [name.title() for name, value in Unit.__members__.items()]

argparser.add_argument('-u', '--unit',
                       choices=unit_values,
                       required=False,
                       dest='unit',
                       help=('Specify the unit that will be used to 
                       display '
                             'the temperatures.'))

required.add_argument('-a', '--areacode',
                      required=True,
                      dest='area_code',
                      help=('The code area to get the weather 
                       broadcast from. '
                            'It can be obtained at 
                              https://weather.com'))

argparser.add_argument('-v', '--version',
                       action='version',
                       version='%(prog)s 1.0')

argparser.add_argument('-td', '--today',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.TODAY,
                       help='Show the weather forecast for the 
                       current day')

args = argparser.parse_args()

_validate_forecast_args(args)

cls = parsers[args.parser]

parser = cls()
results = parser.run(args)

for result in results:
    print(results)

The weather forecast options (today, five days, ten days, and weekend forecast) that our application will accept will not be required; however, at least one option must be provided in the command line, so we create a simple function called _validate_forecast_args to perform this validation for us. This function will show a help message and exit the application.

First, we get all the parsers available in the weatherterm/parsers directory. The list of parsers will be used as valid values for the parser argument.

It is the ArgumentParser object that does the job of defining the parameters, parsing the values, and showing help, so we create an instance of ArgumentParser and also create an argument group for the required parameters. This will make the help output look much nicer and organized.

In order to make the parameters and the help output more organized, we are going to create a group within the ArgumentParser object. This group will contain all the required arguments that our application needs. This way, the users of our application can easily see which parameters are required and the ones that are not required.

We achieve this with the following statement:

required = argparser.add_argument_group('required arguments')

After creating the argument group for the required arguments, we get a list of all members of the enumeration Unit and use the title() function to make only the first letter a capital letter.

Now, we can start adding the arguments that our application will be able to receive on the command line. Most argument definitions use the same set of keyword arguments, so I will not be covering all of them.

The first argument that we will create is --parser or -p:

required.add_argument('-p', '--parser',
                      choices=parsers.keys(),
                      required=True,
                      dest='parser',
                      help=('Specify which parser is going to be 
                       used to '
                            'scrape weather information.'))

Let's break down every parameter of the add_argument  used when creating the parser flag:

  • The first two parameters are the flags. In this case, the user passes a value to this argument using either -p or --parser in the command line, for example, --parser WeatherComParser.
  • The choices parameter specifies a list of valid values for that argument that we are creating. Here, we are using parsers.keys(), which will return a list of parser names. The advantage of this implementation is that if we add a new parser, it will be automatically added to this list, and no changes will be required in this file.
  • The required parameter, as the name says, specifies if the argument will be required or not.
  • The dest parameter specifies the name of the attribute to be added to the resulting object of the parser argument. The object returned by parser_args() will contain an attribute called parser with the value that we passed to this argument in the command line.
  • Finally, the help parameter is the argument's help text, shown when using the -h or --help flag.

Moving on to the --today argument:

argparser.add_argument('-td', '--today',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.TODAY,
                       help='Show the weather forecast for the 
                       current day')

Heren we have two keyword arguments that we haven't seen before, action and const.

Actions can be bound to the arguments that we create and they can perform many things. The argparse module contains a great set of actions, but if you need to do something specific, you can create your own action that will meet your needs. Most actions defined in the argparse module are actions to store values in the parse result's object attributes.

In the previous code snippet, we use the store_const action, which will store a constant value to an attribute in the object returned by parse_args().

We also used the keyword argument const, which specifies the constant default value when the flag is used in the command line.

Remember that I mentioned that it is possible to create custom actions? The argument unit is a great use case for a custom action. The choices argument is just a list of strings, so we use this comprehension to get the list of names of every item in the Unit enumeration, as follows:

unit_values = [name.title() for name, value in Unit.__members__.items()]

required.add_argument('-u', '--unit',
                      choices=unit_values,
                      required=False,
                      dest='unit',
                      help=('Specify the unit that will be used to 
                       display '
                            'the temperatures.'))

The object returned by parse_args() will contain an attribute called unit with a string value (Celsius or Fahrenheit), but this is not exactly what we want. Wouldn't it be nice to have the value as an enumeration item instead? We can change this behavior by creating a custom action.

First, add a new file named set_unit_action.py in the weatherterm/core directory with the following contents:

from argparse import Action

from weatherterm.core import Unit


class SetUnitAction(Action):

    def __call__(self, parser, namespace, values, option_string=None):
        unit = Unit[values.upper()]
        setattr(namespace, self.dest, unit)

This action class is very simple; it just inherits from argparse.Action and overrides the __call__ method, which will be called when the argument value is parsed. This is going to be set to the destination attribute.

The parser parameter will be an instance of ArgumentParser. The namespace is an instance of argparser.Namespace and it is just a simple class containing all the attributes defined in the ArgumentParser object. If you inspect this parameter with the debugger, you will see something similar to this:

Namespace(area_code=None, fields=None, forecast_option=None, parser=None, unit=None)

The values parameter is the value that the user has passed on the command line; in our case, it can be either Celsius or Fahrenheit. Lastly, the option_string parameter is the flag defined for the argument. For the unit argument, the value of option_string will be -u.

Fortunately, enumerations in Python allow us to access their members and attributes using item access:

Unit[values.upper()]

Verifying this in Python REPL, we have:

>>> from weatherterm.core import Unit
>>> Unit['CELSIUS']
<Unit.CELSIUS: 'CELSIUS'>
>>> Unit['FAHRENHEIT']
<Unit.FAHRENHEIT: 'FAHRENHEIT'>

After getting the correct enumeration member, we set the value of the property specified by self.dest in the namespace object. That is much cleaner and we don't need to deal with magic strings.

With the custom action in place, we need to add the import statement in the __init__.py file in the weatherterm/core directory:

from .set_unit_action import SetUnitAction

Just include the line above at the end of the file. Then, we need to import it into the __main__.py file,  like so:

from weatherterm.core import SetUnitAction

And we are going to add the action keyword argument in the definition of the unit argument and set it to SetUnitAction, like so:

required.add_argument('-u', '--unit',
                      choices=unit_values,
                      required=False,
                      action=SetUnitAction,
                      dest='unit',
                      help=('Specify the unit that will be used to 
                       display '
                            'the temperatures.'))

So, when the user of our application uses the flag -u for Celsius, the value of the attribute unit in the object returned by the parse_args()  function will be:

<Unit.CELSIUS: 'CELSIUS'>

The rest of the code is very straightforward; we invoke the parse_args function to parse the arguments and set the result in the args variable. Then, we use the value of args.parser (the name of the selected parser) and access that item in the parser's dictionary. Remember that the value is the class type, so we create an instance of the parser, and lastly, invoke the method run, which will kick off website scraping.

 

Creating the parser


In order to run our code for the first time, we need to create a parser. We can quickly create a parser to run our code and check whether the values are being parsed properly.

Let's go ahead and create a file called weather_com_parser.py in the weatherterm/parsers directory. To make it simple, we are going to create just the necessary methods, and the only thing we are going to do when the methods are invoked is to raise a NotImplementedError:

from weatherterm.core import ForecastType


class WeatherComParser:

    def __init__(self):
        self._forecast = {
            ForecastType.TODAY: self._today_forecast,
            ForecastType.FIVEDAYS: self._five_and_ten_days_forecast,
            ForecastType.TENDAYS: self._five_and_ten_days_forecast,
            ForecastType.WEEKEND: self._weekend_forecast,
            }

    def _today_forecast(self, args):
        raise NotImplementedError()

    def _five_and_ten_days_forecast(self, args):
        raise NotImplementedError()

    def _weekend_forecast(self, args):
        raise NotImplementedError()

    def run(self, args):
        self._forecast_type = args.forecast_option
        forecast_function = self._forecast[args.forecast_option]
        return forecast_function(args)

In the initializer, we create a dictionary where the key is a member of the ForecasType enumeration, and the value is the method bound to any of these options. Our application will be able to present today's, a five-day, ten-day, and the weekend forecast, so we implement all four methods.

The run method only does two things; it looks up the function that needs to be executed using the forecast_option that we passed as an argument in the command line, and executes the function returning its value.

Now, the application is finally ready to be executed for the first time if you run the command in the command line:

$ python -m weatherterm --help

You should see the application's help options:

usage: weatherterm [-h] -p {WeatherComParser} [-u {Celsius,Fahrenheit}] -a AREA_CODE [-v] [-td] [-5d] [-10d] [-w]

Weather info from weather.com on your terminal

optional arguments:
  -h, --help show this help message and exit
  -u {Celsius,Fahrenheit}, --unit {Celsius,Fahrenheit}
                        Specify the unit that will be used to display  
                        the temperatures.
  -v, --version show program's version number and exit
  -td, --today Show the weather forecast for the current day

require arguments:
  -p {WeatherComParser}, --parser {WeatherComParser}
                    Specify which parser is going to be used to scrape
                    weather information.
  -a AREA_CODE, --areacode AREA_CODE
                    The code area to get the weather broadcast from. It
                     can be obtained at https://weather.com

As you can see, the ArgumentParse module already provides out-of-the-box output for help. There are ways you can customize the output how you want to, but I find the default layout really good.

Notice that the -p argument already gave you the option to choose the WeatherComParser. It wasn't necessary to hardcode it anywhere because the parser loader did all the work for us. The -u (--unit) flag also contains the items of the enumeration Unit. If someday you want to extend this application and add new units, the only thing you need to do here is to add the new item to the enumeration, and it will be automatically picked up and included as an option for the -u flag.

Now, if you run the application again and this time pass some parameters:

$ python -m weatherterm -u Celsius -a SWXX2372:1:SW -p WeatherComParser -td

You will get an exception similar to this:

Don't worry -- this is exactly what we wanted! If you follow the stack trace, you can see that everything is working as intended. When we run our code, we call the run method on the selected parser from the __main__.py file, then we select the method associated with the forecast option, in this case, _today_forecast, and finally store the result in the forecast_function variable.

When the function stored in the forecast_function variable was executed, the NotImplementedError exception was raised. So far so good; the code is working perfectly and now we can start adding the implementation for each of these methods.

Getting today's weather forecast

The core functionality is in place and the entry point of the application with the argument parser will give the users of our application a much better experience. Now, it is finally the time we all have been waiting for, the time to start implementing the parser. We will start implementing the method to get today's weather forecast.

Since I am in Sweden, I will use the area code SWXX2372:1:SW (Stockholm, Sweden); however, you can use any area code you want. To get the area code of your choice, go to https://weather.com and search for the area you want. After selecting the area, the weather forecast for the current day will be displayed. Note that the URL changes, for example, when searching Stockholm, Sweden, the URL changes to:

https://weather.com/weather/today/l/SWXX2372:1:SW

For São Paulo, Brazil it will be:

https://weather.com/weather/today/l/BRXX0232:1:BR

Note that there is only one part of the URL that changes, and this is the area code that we want to pass as an argument to our application.

Adding helper methods

To start with, we need to import some packages:

import re

from weatherterm.core import Forecast
from weatherterm.core import Request
from weatherterm.core import Unit
from weatherterm.core import UnitConverter

And in the initializer, we are going to add the following code:

self._base_url = 'http://weather.com/weather/{forecast}/l/{area}'
self._request = Request(self._base_url)

self._temp_regex = re.compile('([0-9]+)\D{,2}([0-9]+)')
self._only_digits_regex = re.compile('[0-9]+')

self._unit_converter = UnitConverter(Unit.FAHRENHEIT)

In the initializer, we define the URL template we are going to use to perform requests to the weather website; then, we create a Request object. This is the object that will perform the requests for us.

Regular expressions are only used when parsing today's weather forecast temperatures.

We also define a UnitConverter object and set the default unit to Fahrenheit.

Now, we are ready to start adding two methods that will be responsible for actually searching for HTML elements within a certain class and return its contents. The first method is called _get_data:

def _get_data(self, container, search_items):
    scraped_data = {}

    for key, value in search_items.items():
        result = container.find(value, class_=key)

        data = None if result is None else result.get_text()

        if data is not None:
            scraped_data[key] = data

    return scraped_data

The idea of this method is to search items within a container that matches some criteria. The container is just a DOM element in the HTML and the search_items is a dictionary where the key is a CSS class and the value is the type of the HTML element. It can be a DIV, SPAN, or anything that you wish to get the value from.

It starts looping through search_items.items() and uses the find method to find the element within the container. If the item is found, we use get_text to extract the text of the DOM element and add it to a dictionary that will be returned when there are no more items to search.

The second method that we will implement is the _parser method. This will make use of the _get_data that we just implemented:

def _parse(self, container, criteria):
    results = [self._get_data(item, criteria)
               for item in container.children]

    return [result for result in results if result]

Here, we also get a container and criteria like the _get_data method. The container is a DOM element and the criterion is a dictionary of nodes that we want to find. The first comprehension gets all the container's children elements and passes them to the _get_data method.

The results will be a list of dictionaries with all the items that have been found, and we will only return the dictionaries that are not empty.

There are only two more helper methods we need to implement in order to get today's weather forecast in place. Let's implement a method called _clear_str_number:

def _clear_str_number(self, str_number):
    result = self._only_digits_regex.match(str_number)
    return '--' if result is None else result.group()

This method will use a regular expression to make sure that only digits are returned.

And the last method that needs to be implemented is the _get_additional_info method:

def _get_additional_info(self, content):
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]

This method loops through the table rows, getting the text of every cell. This comprehension will return lots of information about the weather, but we are only interested in the first 2, the wind and the humidity.

Implementing today's weather forecast

It's time to start adding the implementation of the _today_forecast method, but first, we need to import BeautifulSoup. At the top of the file, add the following import statement:

from bs4 import BeautifulSoup

Now, we can start adding the _today_forecast method:

def _today_forecast(self, args):
    criteria = {
        'today_nowcard-temp': 'div',
        'today_nowcard-phrase': 'div',
        'today_nowcard-hilo': 'div',
        }

    content = self._request.fetch_data(args.forecast_option.value,
                                       args.area_code)

    bs = BeautifulSoup(content, 'html.parser')

    container = bs.find('section', class_='today_nowcard-container')

    weather_conditions = self._parse(container, criteria)

    if len(weather_conditions) < 1:
        raise Exception('Could not parse weather foreecast for 
        today.')

    weatherinfo = weather_conditions[0]

    temp_regex = re.compile(('H\s+(\d+|\-{,2}).+'
                             'L\s+(\d+|\-{,2})'))
    temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
    high_temp, low_temp = temp_info.groups()

    side = container.find('div', class_='today_nowcard-sidecar')
    humidity, wind = self._get_additional_info(side)

    curr_temp = self._clear_str_number(weatherinfo['today_nowcard- 
    temp'])

    self._unit_converter.dest_unit = args.unit

    td_forecast = Forecast(self._unit_converter.convert(curr_temp),
                           humidity,
                           wind,
                           high_temp=self._unit_converter.convert(
                               high_temp),
                           low_temp=self._unit_converter.convert(
                               low_temp),
                           description=weatherinfo['today_nowcard-
                            phrase'])

    return [td_forecast]

That is the function that will be called when the -td or --today flag is used on the command line. Let's break down this code so that we can easily understand what it does. Understanding this method is important because these methods parse data from other weather forecast options (five days, ten days, and weekend) that are very similar to this one.

The method's signature is quite simple; it only gets args, which is the Argument object that is created in the __main__ method. The first thing we do in this method is to create a criteria dictionary with all the DOM elements that we want to find in the markup:

criteria = {
    'today_nowcard-temp': 'div',
    'today_nowcard-phrase': 'div',
    'today_nowcard-hilo': 'div',
}

As mentioned before, the key to the criteria dictionary is the name of the DOM element's CSS class, and the value is the type of the HTML element:

  • The today_nowcard-temp class is a CSS class of the DOM element containing the current temperature
  • The today_nowcard-phrase class is a CSS class of the DOM element containing weather conditions text (Cloudy, Sunny, and so on)
  • The today_nowcard-hilo class is the CSS class of the DOM element containing the highest and lowest temperature

Next, we are going to fetch, create, and use BeautifulSoup to parse the DOM:

content = self._request.fetch_data(args.forecast_option.value, 
                                   args.area_code)

bs = BeautifulSoup(content, 'html.parser')

container = bs.find('section', class_='today_nowcard-container')

weather_conditions = self._parse(container, criteria)

if len(weather_conditions) < 1:
    raise Exception('Could not parse weather forecast for today.')

weatherinfo = weather_conditions[0]

First, we make use of the fetch_data method of the Request class that we created on the core module and pass two arguments; the first is the forecast option and the second argument is the area code that we passed on the command line. After fetching the data, we create a BeautifulSoup object passing the content and a parser. Since we are getting back HTML, we use html.parser.

Now is the time to start looking for the HTML elements that we are interested in. Remember, we need to find an element that will be a container, and the _parser function will search through the children elements and try to find items that we defined in the dictionary criteria. For today's weather forecast, the element that contains all the data we need is a section element with the today_nowcard-container CSS class.

BeautifulSoup contains the find method, which we can use to find elements in the HTML DOM with specific criteria. Note that the keyword argument is called class_ and not class because class is a reserved word in Python.

Now that we have the container element, we can pass it to the _parse method, which will return a list. We perform a check if the result list contains at least one element and raise an exception if it is empty. If it is not empty, we just get the first element and assign it to the weatherinfo variable. The weatherinfo variable now contains a dictionary with all the items that we were looking for.

The next step is split the highest and lowest temperature:

temp_regex = re.compile(('H\s+(\d+|\-{,2}).+'
                         'L\s+(\d+|\-{,2})'))
temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
high_temp, low_temp = temp_info.groups()

We want to parse the text that has been extracted from the DOM element with the today_nowcard-hilo CSS class, and the text should look something like H 50 L 60H -- L 60, and so on. An easy and simple way of extracting the text we want is to use a regular expression:

H\s+(\d+|\-{,2}).L\s+(\d+|\-{,2})

We can break this regular expression into two parts. First, we want to get the highest temperature—H\s+(\d+|\-{,2}); this means that it will match an H followed by some spaces, and then it will group a value that matches either numbers or a maximum of two dash symbols. After that, it will match any character. Lastly, comes the second part that basically does the same; however, it starts matching an L.

After executing the search method, it gets regular expression groups that have been returned calling the groups() function, which in this case will return two groups, one for the highest temperature and the second for the lowest.

Other information that we want to provide to our users is information about wind and humidity. The container element that contains this information has a CSS class called today_nowcard-sidecar:

side = container.find('div', class_='today_nowcard-sidecar')
wind, humidity = self._get_additional_info(side)

We just find the container and pass it into the _get_additional_info method that will loop through the children elements of the container, extracting the text and finally returning the results for us.

Finally, the last part of this method:

curr_temp = self._clear_str_number(weatherinfo['today_nowcard-temp'])

self._unit_converter.dest_unit = args.unit

td_forecast = Forecast(self._unit_converter.convert(curr_temp),
                       humidity,
                       wind,
                       high_temp=self._unit_converter.convert(
                           high_temp),
                       low_temp=self._unit_converter.convert(
                           low_temp),
                       description=weatherinfo['today_nowcard- 
                        phrase'])

return [td_forecast]

Since the current temperature contains a special character (degree sign) that we don't want to have at this point, we use the _clr_str_number method to pass the today_nowcard-temp item of the weatherinfo dictionary.

Now that we have all the information we need, we construct the Forecast object and return it. Note that we are returning an array here; this is because all other options that we are going to implement (five-day, ten-day, and weekend forecasts) will return a list, so to make it consistent; also to facilitate when we will have to display this information on the terminal, we are also returning a list.

Another thing to note is that we are making use of the convert method of our UnitConverter to convert all the temperatures to the unit selected in the command line.

When running the command again:

$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -td

You should see an output similar to this:

Congratulations! You have implemented your first web scraping application. Next up, let's add the other forecast options.

Getting five- and ten-day weather forecasts

The site that we are currently scraping the weather forecast from (weather.com) also provides the weather forecast for five and ten days, so in this section, we are going to implement methods to parse these forecast options as well.

The markup of the pages that present data for five and ten days are very similar; they have the same DOM structure and share the same CSS classes, which makes it easier for us to implement just one method that will work for both options. Let's go ahead and add a new method to the wheater_com_parser.py file with the following contents:

def _parse_list_forecast(self, content, args):
    criteria = {
        'date-time': 'span',
        'day-detail': 'span',
        'description': 'td',
        'temp': 'td',
        'wind': 'td',
        'humidity': 'td',
    }

    bs = BeautifulSoup(content, 'html.parser')

    forecast_data = bs.find('table', class_='twc-table')
    container = forecast_data.tbody

    return self._parse(container, criteria)

As I mentioned before, the DOM for the five- and ten-day weather forecasts is very similar, so we create the _parse_list_forecast method, which can be used for both options. First, we define the criteria:

  • The date-time is a span element and contains a string representing the day of the week
  • The day-detail is a span element and contains a string with the date, for example, SEP 29
  • The description is a TD element and contains the weather conditions, for example, Cloudy
  • temp is a TD element and contains temperature information such as high and low temperature
  • wind is a TD element and contains wind information
  • humidity is a TD element and contains humidity information

Now that we have the criteria, we create a BeatufulSoup object, passing the content and the html.parser. All the data that we would like to get is on the table with a CSS class named twc-table. We find the table and define the tbody element as a container. Finally, we run the _parse method, passing the container and the criteria that we defined. The return of this function will look something like this:

[{'date-time': 'Today',
  'day-detail': 'SEP 28',
  'description': 'Partly Cloudy',
  'humidity': '78%',
  'temp': '60°50°',
  'wind': 'ESE 10 mph '},
 {'date-time': 'Fri',
  'day-detail': 'SEP 29',
  'description': 'Partly Cloudy',
  'humidity': '79%',
  'temp': '57°48°',
  'wind': 'ESE 10 mph '},
 {'date-time': 'Sat',
  'day-detail': 'SEP 30',
  'description': 'Partly Cloudy',
  'humidity': '77%',
  'temp': '57°49°',
  'wind': 'SE 10 mph '},
 {'date-time': 'Sun',
  'day-detail': 'OCT 1',
  'description': 'Cloudy',
  'humidity': '74%',
  'temp': '55°51°',
  'wind': 'SE 14 mph '},
 {'date-time': 'Mon',
  'day-detail': 'OCT 2',
  'description': 'Rain',
  'humidity': '87%',
  'temp': '55°48°',
  'wind': 'SSE 18 mph '}]

Another method that we need to create is a method that will prepare the data for us, for example, parsing and converting temperature values and creating a Forecast object. Add a new method called _prepare_data with the following content:

def _prepare_data(self, results, args):
    forecast_result = []

    self._unit_converter.dest_unit = args.unit

    for item in results:
        match = self._temp_regex.search(item['temp'])
        if match is not None:
            high_temp, low_temp = match.groups()

        try:
            dateinfo = item['weather-cell']
            date_time, day_detail = dateinfo[:3], dateinfo[3:]
            item['date-time'] = date_time
            item['day-detail'] = day_detail
        except KeyError:
            pass

        day_forecast = Forecast(
            self._unit_converter.convert(item['temp']),
            item['humidity'],
            item['wind'],
            high_temp=self._unit_converter.convert(high_temp),
            low_temp=self._unit_converter.convert(low_temp),
            description=item['description'].strip(),
            forecast_date=f'{item["date-time"]} {item["day-
             detail"]}',
            forecast_type=self._forecast_type)
        forecast_result.append(day_forecast)

    return forecast_result

This method is quite simple. First, loop through the results and apply the regex that we created to split the high and low temperatures stored in item['temp']. If there's a match, it will get the groups and assign the value to high_temp and low_temp.

After that, we create a Forecast object and append it to a list that will be returned later on.

Lastly, we add the method that will be invoked when the -5d or -10d flag is used. Create another method called _five_and_ten_days_forecast with the following contents:

def _five_and_ten_days_forecast(self, args):
    content = self._request.fetch_data(args.forecast_option.value, args.area_code)
    results = self._parse_list_forecast(content, args)
    return self._prepare_data(results)

This method only fetches the contents of the page passing the forecast_option value and the area code, so it will be possible to build the URL to perform the request. When the data is returned, we pass it down to the _parse_list_forecast, which will return a list of Forecast objects (one for each day); finally, we prepare the data to be returned using the _prepare_data method.

Before we run the command, we need to enable this option in the command line tool that we implemented; go over to the __main__.py file, and, just after the definition of the -td flag, add the following code:

argparser.add_argument('-5d', '--fivedays',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.FIVEDAYS,
                       help='Shows the weather forecast for the next         
                       5 days')

Now, run the application again, but this time using the -5d or --fivedays flag:

$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -5d

It will produce the following output:

>> [Today SEP 28]
    High 60° / Low 50° (Partly Cloudy)
    Wind: ESE 10 mph / Humidity: 78%

>> [Fri SEP 29]
    High 57° / Low 48° (Partly Cloudy)
    Wind: ESE 10 mph / Humidity: 79%

>> [Sat SEP 30]
    High 57° / Low 49° (Partly Cloudy)
    Wind: SE 10 mph / Humidity: 77%

>> [Sun OCT 1]
    High 55° / Low 51° (Cloudy)
    Wind: SE 14 mph / Humidity: 74%

>> [Mon OCT 2]
    High 55° / Low 48° (Rain)
    Wind: SSE 18 mph / Humidity: 87%

To wrap this section up, let's include the option to get the weather forecast for the next ten days as well, in the __main__.py file, just below the -5d flag definition. Add the following code:

argparser.add_argument('-10d', '--tendays',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.TENDAYS,
                       help='Shows the weather forecast for the next  
                       10 days')

If you run the same command as we used to get the five-day forecast but replace the -5d flag with -10d, like so:

$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -10d

You should see the ten-day weather forecast output:

>> [Today SEP 28]
    High 60° / Low 50° (Partly Cloudy)
    Wind: ESE 10 mph / Humidity: 78%

>> [Fri SEP 29]
    High 57° / Low 48° (Partly Cloudy)
    Wind: ESE 10 mph / Humidity: 79%

>> [Sat SEP 30]
    High 57° / Low 49° (Partly Cloudy)
    Wind: SE 10 mph / Humidity: 77%

>> [Sun OCT 1]
    High 55° / Low 51° (Cloudy)
    Wind: SE 14 mph / Humidity: 74%

>> [Mon OCT 2]
    High 55° / Low 48° (Rain)
    Wind: SSE 18 mph / Humidity: 87%

>> [Tue OCT 3]
    High 56° / Low 46° (AM Clouds/PM Sun)
    Wind: S 10 mph / Humidity: 84%

>> [Wed OCT 4]
    High 58° / Low 47° (Partly Cloudy)
    Wind: SE 9 mph / Humidity: 80%

>> [Thu OCT 5]
    High 57° / Low 46° (Showers)
    Wind: SSW 8 mph / Humidity: 81%

>> [Fri OCT 6]
    High 57° / Low 46° (Partly Cloudy)
    Wind: SW 8 mph / Humidity: 76%

>> [Sat OCT 7]
    High 56° / Low 44° (Mostly Sunny)
    Wind: W 7 mph / Humidity: 80%

>> [Sun OCT 8]
    High 56° / Low 44° (Partly Cloudy)
    Wind: NNE 7 mph / Humidity: 78%

>> [Mon OCT 9]
    High 56° / Low 43° (AM Showers)
    Wind: SSW 9 mph / Humidity: 79%

>> [Tue OCT 10]
    High 55° / Low 44° (AM Showers)
    Wind: W 8 mph / Humidity: 79%

>> [Wed OCT 11]
    High 55° / Low 42° (AM Showers)
    Wind: SE 7 mph / Humidity: 79%

>> [Thu OCT 12]
    High 53° / Low 43° (AM Showers)
    Wind: NNW 8 mph / Humidity: 87%

As you can see, the weather was not so great here in Sweden while I was writing this book.

Getting the weekend weather forecast

The last weather forecast option that we are going to implement in our application is the option to get the weather forecast for the upcoming weekend. This implementation is a bit different from the others because the data returned by the weekend's weather is slightly different from today's, five, and ten days weather forecast.

The DOM structure is different and some CSS class names are different as well. If you remember the previous methods that we implemented, we always use the _parser method, which gives us arguments such as the container DOM and a dictionary with the search criteria. The return value of that method is also a dictionary where the key is the class name of the DOM that we were searching and the value is the text within that DOM element. 

Since the CSS class names of the weekend page are different, we need to implement some code to get that array of results and rename all the keys so the _prepare_data function can use scraped results properly.

With that said, let's go ahead and create a new file in the weatherterm/core directory called mapper.py with the following contents:

class Mapper:

    def __init__(self):
        self._mapping = {}

    def _add(self, source, dest):
        self._mapping[source] = dest

    def remap_key(self, source, dest):
        self._add(source, dest)

    def remap(self, itemslist):
        return [self._exec(item) for item in itemslist]

    def _exec(self, src_dict):
        dest = dict()

        if not src_dict:
            raise AttributeError('The source dictionary cannot be  
            empty or None')

        for key, value in src_dict.items():
            try:
                new_key = self._mapping[key]
                dest[new_key] = value
            except KeyError:
                dest[key] = value
        return dest

The Mapper class gets a list with dictionaries and renames specific keys that we would like to rename. The important methods here are remap_key and remap. The remap_key gets two arguments, source and dest. source is the key that we wish to rename and dest is the new name for that key. The remap_key method will add it to an internal dictionary called _mapping, which will be used later on to look up the new key name.

The remap method simply gets a list containing the dictionaries and, for every item on that list, it calls the _exec method that first creates a brand new dictionary, then checks whether the dictionary is empty. In that case, it raises an AttributeError.

If the dictionary has keys, we loop through its items, search for whether the current item's key has a new name in the mapping dictionary. If the new key name is found, will to create a new item with the new key name; otherwise, we just keep the old name. After the loop, the list is returned with all the dictionaries containing the keys with a new name.

Now, we just need to add it to the __init__.py file in the weatherterm/core directory:

from .mapper import Mapper

And, in the weather_com_parser.py file in weatherterm/parsers, we need to import the Mapper:

from weatherterm.core import Mapper

With the mapper in place, we can go ahead and create the _weekend_forecast method in the weather_com_parser.py file, like so:

def _weekend_forecast(self, args):
    criteria = {
        'weather-cell': 'header',
        'temp': 'p',
        'weather-phrase': 'h3',
        'wind-conditions': 'p',
        'humidity': 'p',
    }

    mapper = Mapper()
    mapper.remap_key('wind-conditions', 'wind')
    mapper.remap_key('weather-phrase', 'description')

    content = self._request.fetch_data(args.forecast_option.value,
                                       args.area_code)

    bs = BeautifulSoup(content, 'html.parser')

    forecast_data = bs.find('article', class_='ls-mod')
    container = forecast_data.div.div

    partial_results = self._parse(container, criteria)
    results = mapper.remap(partial_results)

    return self._prepare_data(results, args)

The method starts off by defining the criteria in exactly the same way as the other methods; however, the DOM structure is slightly different and some of the CSS names are also different:

  • weather-cell: Contains the forecast date: FriSEP 29
  • temp: Contains the temperature (high and low): 57°F48°F
  • weather-phrase: Contains the weather conditions: Cloudy
  • wind-conditions: Wind information
  • humidity: The humidity percentage

As you can see, to make it play nicely with the _prepare_data method, we will need to rename some keys in the dictionaries in the result set—wind-conditions should be wind and weather-phrase should be the description.

Luckily, we have introduced the Mapper class to help us out:

mapper = Mapper()
mapper.remap_key('wind-conditions', 'wind')
mapper.remap_key('weather-phrase', 'description')

We create a Mapper object and say, remap wind-conditions to wind and weather-phrase to description:

content = self._request.fetch_data(args.forecast_option.value,
                                   args.area_code)

bs = BeautifulSoup(content, 'html.parser')

forecast_data = bs.find('article', class_='ls-mod')
container = forecast_data.div.div

partial_results = self._parse(container, criteria)

We fetch all the data, create a BeautifulSoup object using the html.parser, and find the container element that contains the children elements that we are interested in. For the weekend forecast, we are interested in getting the article element with a CSS class called ls-mod and within that article we go down to the first child element, which is a DIV, and gets its first child element, which is also a DIV element.

The HTML should look something like this:

<article class='ls-mod'>
  <div>
    <div>
      <!-- this DIV will be our container element -->
    </div>
  </div>
</article>

That's the reason we first find the article, assign it to forecast_data, and then use forecast_data.div.div so we get the DIV element we want.

After defining the container, we pass it to the _parse method together with the container element; when we get the results back, we simply need to run the remap method of the Mapper instance, which will normalize the data for us before we call _prepare_data.

Now, the last detail before we run the application and get the weather forecast for the weekend is that we need to include the --w and --weekend flag to the ArgumentParser. Open the __main__.py file in the weatherterm directory and, just below the --tenday flag, add the following code:

argparser.add_argument('-w', '--weekend',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.WEEKEND,
                       help=('Shows the weather forecast for the 
                             next or '
                             'current weekend'))

Great! Now, run the application using the -w or --weekend flag:

>> [Fri SEP 29]
    High 13.9° / Low 8.9° (Partly Cloudy)
    Wind: ESE 10 mph / Humidity: 79%

>> [Sat SEP 30]
    High 13.9° / Low 9.4° (Partly Cloudy)
    Wind: SE 10 mph / Humidity: 77%

>> [Sun OCT 1]
    High 12.8° / Low 10.6° (Cloudy)
    Wind: SE 14 mph / Humidity: 74%

Note that this time, I used the -u flag to choose Celsius. All the temperatures in the output are represented in Celsius instead of Fahrenheit.

 

Summary


In this Chapter 1, Implementing the Weather Application, you learned the basics of object-oriented programming in Python; we covered how to create classes, use inheritance, and use the @property decorators to create getter and setters.

We covered how to use the inspect module to get more information about modules, classes, and functions. Last but not least, we made use of the powerful package Beautifulsoup to parse HTML and Selenium to make requests to the weather website.

We also learned how to implement command line tools using the argparse module from Python's standard library, which allows us to provide tools that are easier to use and with very helpful documentation.

Next up, we are going to develop a small wrapper around the Spotify Rest API and use it to create a remote control terminal.

About the Authors
  • Daniel Furtado

    Daniel Furtado is a software developer with over 20 years of experience in different technologies such as Python, C, .NET, C#, and JavaScript. He started programming at the age of 13 on his ZX Spectrum. He joined the Bioinformatics Laboratory of the Human Cancer Genome Project in Brazil, where he developed web applications and tools in Perl and Python to help researchers analyze data. He has never stopped developing in Python ever since. Daniel has worked on various open source projects; the latest one is a PyTerrier web micro-framework.

    Browse publications by this author
  • Marcus Pennington

    Marcus Pennington started his journey into computer science at Highams Park Sixth Form College where he took a Cisco CCNA course. He then went to the University of Hertfordshire, where he graduated with a degree in Computer Science with Artificial Intelligence. Since then, he has had the privilege of working with some of the best developers and learning the benefits and pitfalls of many of the software practices we use today. He has a passion for writing clean, cohesive, and beautiful code.

    Browse publications by this author
Latest Reviews (8 reviews total)
Great products. Love the price and the current-Ness of the books
You always want something else in a book. If the price is right then is acceptable
So far so good. Very informative. Not yet finished reading it
Python Programming Blueprints
Unlock this book and the full library FREE for 7 days
Start now