The first application in this book is going to be a web scraping application that will scrape weather forecast information from https://weather.com and present it in a terminal. We will add some options that can be passed as arguments to the application, such as:
- The temperature unit (Celsius or Fahrenheit)
- The area for which to get the weather forecast
- Output options where the user of our application can choose between the current forecast, a five-day forecast, a ten-day forecast, and the weekend
- Ways to complement the output with extra information such as wind and humidity
Apart from the aforementioned arguments, this application will be designed to be extendable, which means that we can create parsers for different websites to get a weather forecast, and these parsers will be available as argument options.
In this chapter, you will learn how to:
- Use object-oriented programming concepts in Python applications
- Scrape data from websites using the BeautifulSoup package
- Receive command line arguments
- Utilize the inspect module
- Load Python modules dynamically
- Use Python comprehensions
- Use Selenium to request a webpage and inspect its DOM elements
Before we get started, it is important to say that when developing web scraping applications, you should keep in mind that these types of applications are susceptible to changes. If the developers of the site that you are getting data from change a CSS class name, or the structure of the HTML DOM, the application will stop working. Also, if the URL of the site we are getting the data from changes, the application will not be able to send requests.
Before we get right into writing our first example, we need to set up an environment to work and install any dependencies that the project may have. Luckily, Python has a really nice tooling system to work with virtual environments.
Virtual environments in Python are a broad subject, and beyond the scope of this book. However, if you are not familiar with virtual environments, it will suffice to know that a virtual environment is a contained Python environment that is isolated from your global Python installation. This isolation allows developers to easily work with different versions of Python, install packages within the environment, and manage project dependencies without interfering with Python's global installation.
Python's installation comes with a module called venv, which you can use to create virtual environments; the syntax is fairly straightforward. The application that we are going to create is called weatherterm (weather terminal), so we can create a virtual environment with the same name to make it simple.
To create a new virtual environment, open a terminal and run the following command:
$ python3 -m venv weatherterm
If everything goes well, you should see a directory called weatherterm in your current directory. Now that we have the virtual environment, we just need to activate it with the following command:
$ . weatherterm/bin/activate
Note
I recommend installing and using virtualenvwrapper, which is an extension of the virtualenv tool. This makes it very simple to manage, create, and delete virtual environments as well as quickly switch between them. If you wish to investigate this further, visit: https://virtualenvwrapper.readthedocs.io/en/latest/#.
Now, we need to create a directory where we are going to create our application. Don't create this directory in the same directory where you created the virtual environment; instead, create a projects directory and create the directory for the application in there. I would recommend you name it with the same name as the virtual environment for simplicity.
Note
I am setting the environment and running all the examples in a machine with Debian 9.2 installed, and at the time of writing, I am running the latest Python version (3.6.2). If you are a Mac user, it shouldn't be so different; however, if you are on Windows, the steps can be slightly different, but it is not hard to find information on how to set up virtual environments on it. A Python 3 installation works nicely on Windows nowadays.
Go into the project's directory that you just created and create a file named requirements.txt with the following content:
beautifulsoup4==4.6.0
selenium==3.6.0
These are all the dependencies that we need for this project:
- BeautifulSoup: This is a package for parsing HTML and XML files. We will be using it to parse the HTML that we fetch from weather sites and to get the weather data we need on the terminal. It is very simple to use and it has great documentation available online at: http://beautiful-soup-4.readthedocs.io/en/latest/.
- Selenium: This is a well-known set of tools for testing. It has many applications, but it is mostly used for the automated testing of web applications.
To install the required packages in our virtual environment, you can run the following command:
pip install -r requirements.txt
Note
It is always a good idea to make use of version-control tools like Git or Mercurial. They are very helpful for controlling changes, checking history, rolling back changes, and more. If you are not familiar with any of these tools, there are plenty of tutorials on the internet. You can get started by checking the documentation for Git at: https://git-scm.com/book/en/v1/Getting-Started.
One last tool that we need to install is PhantomJS; you can download it from: http://phantomjs.org/download.html
After downloading it, extract the contents inside the weatherterm directory and rename the folder to phantomjs.
With our virtual environment set up and PhantomJS installed, we are ready to start coding!
Let's start by creating a directory for your module. Inside of the project's root directory, create a subdirectory called weatherterm.
The subdirectory weatherterm is where our module will live. The module directory needs two subdirectories - core and parsers. The project's directory structure should look like this:
weatherterm
├── phantomjs
└── weatherterm
    ├── core
    └── parsers
This application is intended to be flexible and allow developers to create different parsers for different weather websites. We are going to create a parser loader that will dynamically discover files inside of the parsers directory, load them, and make them available to be used by the application without requiring changes to any other parts of the code. Here are the rules that our loader will require when implementing new parsers:
- Create a file with a class implementing the methods for fetching the current weather forecast as well as five-day, ten-day, and weekend weather forecasts
- The file name has to end with parser, for example, weather_com_parser.py
- The file name can't start with double underscores
With that said, let's go ahead and create the parser loader. Create a file named parser_loader.py inside of the weatherterm/core directory and add the following content:
import os
import re
import inspect


def _get_parser_list(dirname):
    files = [f.replace('.py', '')
             for f in os.listdir(dirname)
             if not f.startswith('__')]

    return files


def _import_parsers(parserfiles):

    m = re.compile('.+parser$', re.I)

    _modules = __import__('weatherterm.parsers',
                          globals(),
                          locals(),
                          parserfiles,
                          0)

    _parsers = [(k, v) for k, v in inspect.getmembers(_modules)
                if inspect.ismodule(v) and m.match(k)]

    _classes = dict()

    for k, v in _parsers:
        _classes.update({k: v for k, v in inspect.getmembers(v)
                         if inspect.isclass(v) and m.match(k)})

    return _classes


def load(dirname):
    parserfiles = _get_parser_list(dirname)
    return _import_parsers(parserfiles)
First, the _get_parser_list function is executed and returns a list of all files located in weatherterm/parsers, skipping any file whose name starts with double underscores; the rule that the name must end with parser is applied later by the regular expression in _import_parsers. After returning a list of files, it is time to import the module. This is done by the _import_parsers function, which first imports the weatherterm.parsers module and makes use of the inspect module in the standard library to find the parser classes within the module.
The inspect.getmembers function returns a list of tuples where the first item is a key representing a property in the module, and the second item is the value, which can be of any type. In our scenario, we are interested in a property with a key ending with parser and with the value of type class.
Assuming that we already have a parser in place in the weatherterm/parsers directory, the value returned by inspect.getmembers for the parser module (the call inside the loop) will look something like this:
[('WeatherComParser', <class 'weatherterm.parsers.weather_com_parser.WeatherComParser'>), ...]
Note
inspect.getmembers returns many more items, but they have been omitted since it is not relevant to show all of them at this point.
Lastly, we loop through the items in the module and extract the parser classes, returning a dictionary containing the name of the class and the class object that will be later used to create instances of the parser.
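The filtering step can be tried in isolation. The following is a minimal sketch, not the loader itself: it builds a throwaway module in memory (the names fake_site_parser, FakeSiteParser, and Helper are hypothetical) and applies the same inspect.getmembers plus regular-expression filter described above.

```python
import inspect
import re
import types

# Hypothetical stand-in for a file such as weather_com_parser.py.
fake_module = types.ModuleType('fake_site_parser')


class FakeSiteParser:
    """Follows the naming rule: the class name ends with 'parser'."""


class Helper:
    """Does not follow the naming rule, so it should be skipped."""


fake_module.FakeSiteParser = FakeSiteParser
fake_module.Helper = Helper

# Same pattern as in the loader: names ending with "parser", ignoring case.
pattern = re.compile('.+parser$', re.I)

# Keep only members that are classes and whose name matches the pattern.
parsers = {name: obj
           for name, obj in inspect.getmembers(fake_module)
           if inspect.isclass(obj) and pattern.match(name)}

print(parsers)
```

Only FakeSiteParser ends up in the dictionary; Helper and the module's built-in attributes are filtered out.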
Let's start creating the model that will represent all the information that our application will scrape from the weather website. The first item we are going to add is an enumeration to represent each option of the weather forecast we will provide to the users of our application. Create a file named forecast_type.py in the directory weatherterm/core with the following contents:
from enum import Enum, unique


@unique
class ForecastType(Enum):
    TODAY = 'today'
    FIVEDAYS = '5day'
    TENDAYS = '10day'
    WEEKEND = 'weekend'
Enumerations have been in Python's standard library since version 3.4 and they can be created using the syntax for creating classes. Just create a class inheriting from enum.Enum containing a set of unique properties set to constant values. Here, we have values for the four types of forecast that the application will provide, which can be accessed as ForecastType.TODAY, ForecastType.WEEKEND, and so on.
Note that we are assigning constant values that are different from the property item of the enumeration, the reason being that later these values will be used to build the URL to make requests to the weather website.
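As a rough illustration of why the values matter, the sketch below formats a member's value into a URL template. The template and the area code AREA123 are made up for this example and are not the real weather.com URL scheme.

```python
from enum import Enum, unique


@unique
class ForecastType(Enum):
    TODAY = 'today'
    FIVEDAYS = '5day'
    TENDAYS = '10day'
    WEEKEND = 'weekend'


# Hypothetical URL template and area code, for illustration only.
base_url = 'https://example.com/weather/{forecast}/l/{area}'

url = base_url.format(forecast=ForecastType.FIVEDAYS.value, area='AREA123')
print(url)

# Looking a member up by its value returns the same member.
same_member = ForecastType('5day')
print(same_member is ForecastType.FIVEDAYS)
```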
The application needs one more enumeration to represent the temperature units that the user will be able to choose from in the command line. This enumeration will contain Celsius and Fahrenheit items.
First, let's include a base enumeration. Create a file called base_enum.py in the weatherterm/core directory with the following contents:
from enum import Enum


class BaseEnum(Enum):
    def _generate_next_value_(name, start, count, last_value):
        return name
BaseEnum is a very simple class inheriting from Enum. The only thing we want to do here is override the method _generate_next_value_ so that every enumeration that inherits from BaseEnum and has properties with the value set to auto() will automatically get the same value as the property name.
Now, we can create an enumeration for the temperature units. Create a file called unit.py in the weatherterm/core directory with the following content:
from enum import auto, unique

from .base_enum import BaseEnum


@unique
class Unit(BaseEnum):
    CELSIUS = auto()
    FAHRENHEIT = auto()
This class inherits from the BaseEnum that we just created, and every property is set to auto(), meaning the value for every item in the enumeration will be set automatically for us. Since the Unit class inherits from BaseEnum, every time auto() is called, the _generate_next_value_ method on BaseEnum will be invoked and will return the name of the property itself.
Before we try this out, let's create a file called __init__.py in the weatherterm/core directory and import the enumeration that we just created, like so:
from .unit import Unit
If we load this class in the Python REPL and check the values, we will see the following:
Python 3.6.2 (default, Sep 11 2017, 22:31:28)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from weatherterm.core import Unit
>>> [value for key, value in Unit.__members__.items()]
[<Unit.CELSIUS: 'CELSIUS'>, <Unit.FAHRENHEIT: 'FAHRENHEIT'>]
Another item that we also want to add to the core module of our application is a class to represent the weather forecast data that the parser returns. Let's go ahead and create a file named forecast.py in the weatherterm/core directory with the following contents:
from datetime import date

from .forecast_type import ForecastType


class Forecast:
    def __init__(
            self,
            current_temp,
            humidity,
            wind,
            high_temp=None,
            low_temp=None,
            description='',
            forecast_date=None,
            forecast_type=ForecastType.TODAY):
        self._current_temp = current_temp
        self._high_temp = high_temp
        self._low_temp = low_temp
        self._humidity = humidity
        self._wind = wind
        self._description = description
        self._forecast_type = forecast_type

        if forecast_date is None:
            self.forecast_date = date.today()
        else:
            self._forecast_date = forecast_date

    @property
    def forecast_date(self):
        return self._forecast_date

    @forecast_date.setter
    def forecast_date(self, forecast_date):
        self._forecast_date = forecast_date.strftime("%a %b %d")

    @property
    def current_temp(self):
        return self._current_temp

    @property
    def humidity(self):
        return self._humidity

    @property
    def wind(self):
        return self._wind

    @property
    def description(self):
        return self._description

    def __str__(self):
        temperature = None
        offset = ' ' * 4

        if self._forecast_type == ForecastType.TODAY:
            temperature = (f'{offset}{self._current_temp}\xb0\n'
                           f'{offset}High {self._high_temp}\xb0 / '
                           f'Low {self._low_temp}\xb0 ')
        else:
            temperature = (f'{offset}High {self._high_temp}\xb0 / '
                           f'Low {self._low_temp}\xb0 ')

        return(f'>> {self.forecast_date}\n'
               f'{temperature}'
               f'({self._description})\n'
               f'{offset}Wind: '
               f'{self._wind} / Humidity: {self._humidity}\n')
In the Forecast class, we define properties for all the data we are going to parse:
| current_temp | Represents the current temperature. It will only be available when getting today's weather forecast. |
| humidity | The humidity percentage for the day. |
| wind | Information about today's current wind levels. |
| high_temp | The highest temperature for the day. |
| low_temp | The lowest temperature for the day. |
| description | A description of the weather conditions, for example, Partly Cloudy. |
| forecast_date | Forecast date; if not supplied, it will be set to the current date. |
| forecast_type | Any value in the ForecastType enumeration. |
We also implement two methods called forecast_date with the decorators @property and @forecast_date.setter. The @property decorator will turn the method into a getter for the _forecast_date property of the Forecast class, and the @forecast_date.setter will turn the method into a setter. The setter was defined here because, every time we need to set the date in an instance of Forecast, we need to make sure that it will be formatted accordingly. In the setter, we call the strftime method, passing the format codes %a (abbreviated weekday name), %b (abbreviated month name), and %d (day of the month).
Note
The format codes %a and %b will use the locale configured in the machine that the code is running on.
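To get a feel for what the setter stores, here is a small sketch that applies the same format string to a fixed date. The date is arbitrary and chosen only for the example.

```python
from datetime import date

sample = date(2017, 10, 5)  # an arbitrary example date
formatted = sample.strftime('%a %b %d')

# With an English or default "C" locale this prints: Thu Oct 05
print(formatted)
```

Note that %d is zero-padded, so single-digit days appear as 05 rather than 5.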
Lastly, we override the __str__ method to allow us to format the output the way we would like when using the print, format, and str functions.
By default, the temperature unit used by weather.com is Fahrenheit, and we want to give the users of our application the option to use Celsius instead. So, let's go ahead and create one more file in the weatherterm/core directory called unit_converter.py with the following content:
from .unit import Unit


class UnitConverter:
    def __init__(self, parser_default_unit, dest_unit=None):
        self._parser_default_unit = parser_default_unit
        self.dest_unit = dest_unit

        self._convert_functions = {
            Unit.CELSIUS: self._to_celsius,
            Unit.FAHRENHEIT: self._to_fahrenheit,
        }

    @property
    def dest_unit(self):
        return self._dest_unit

    @dest_unit.setter
    def dest_unit(self, dest_unit):
        self._dest_unit = dest_unit

    def convert(self, temp):

        try:
            temperature = float(temp)
        except ValueError:
            return 0

        if (self.dest_unit == self._parser_default_unit or
                self.dest_unit is None):
            return self._format_results(temperature)

        func = self._convert_functions[self.dest_unit]
        result = func(temperature)

        return self._format_results(result)

    def _format_results(self, value):
        return int(value) if value.is_integer() else f'{value:.1f}'

    def _to_celsius(self, fahrenheit_temp):
        result = (fahrenheit_temp - 32) * 5/9
        return result

    def _to_fahrenheit(self, celsius_temp):
        result = (celsius_temp * 9/5) + 32
        return result
This is the class that is going to make the temperature conversions from Celsius to Fahrenheit and vice versa. The initializer of this class gets two arguments: the default unit used by the parser and the destination unit. In the initializer, we define a dictionary containing the functions that will be used for temperature unit conversion.
The convert method only gets one argument, the temperature. Here, the temperature is a string, so the first thing we need to do is try converting it to a float value; if it fails, it will return a zero value right away.
We then check whether the destination unit is the same as the parser's default unit, or whether no destination unit was given at all. In either case, we don't need to perform any conversion; we simply format the value and return it.
If we need to perform a conversion, we look up the _convert_functions dictionary to find the conversion function that we need to run, invoke it, and return the formatted value.
The code snippet below shows the _format_results method, which is a utility method that will format the temperature value for us:
return int(value) if value.is_integer() else f'{value:.1f}'
The _format_results method checks if the number is an integer; value.is_integer() will return True if the number is, for example, 10.0. If True, we will use the int function to convert the value to 10; otherwise, the value is returned as a fixed-point number with a precision of 1. The default precision in Python is 6. Lastly, there are two utility methods that perform the temperature conversions, _to_celsius and _to_fahrenheit.
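The formatting rule and the two formulas can be checked without the rest of the class. The sketch below restates them as standalone functions; the function names are illustrative and not part of the application.

```python
def format_result(value):
    # Whole numbers are shown without decimals, others with one decimal place.
    return int(value) if value.is_integer() else f'{value:.1f}'


def to_celsius(fahrenheit_temp):
    return (fahrenheit_temp - 32) * 5 / 9


def to_fahrenheit(celsius_temp):
    return (celsius_temp * 9 / 5) + 32


print(format_result(to_celsius(86.0)))     # 30
print(format_result(to_celsius(75.0)))     # 23.9
print(format_result(to_fahrenheit(30.0)))  # 86
```

One consequence of this design is that the result is an int for whole numbers and a str otherwise, which is fine for display purposes but worth keeping in mind if the value is used in further calculations.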
Now, we only need to edit the __init__.py file in the weatherterm/core directory and include the following import statements:
from .base_enum import BaseEnum
from .unit_converter import UnitConverter
from .forecast_type import ForecastType
from .forecast import Forecast
We are going to add a class named Request that will be responsible for getting the data from the weather website. Let's add a file named request.py in the weatherterm/core directory with the following content:
import os
from selenium import webdriver


class Request:
    def __init__(self, base_url):
        self._phantomjs_path = os.path.join(os.curdir,
                                            'phantomjs/bin/phantomjs')
        self._base_url = base_url
        self._driver = webdriver.PhantomJS(self._phantomjs_path)

    def fetch_data(self, forecast, area):
        url = self._base_url.format(forecast=forecast, area=area)
        self._driver.get(url)

        if self._driver.title == '404 Not Found':
            error_message = ('Could not find the area that you '
                             'are searching for')
            raise Exception(error_message)

        return self._driver.page_source
This class is very simple; the initializer defines the base URL and creates a PhantomJS driver, using the path where PhantomJS is installed. The fetch_data method formats the URL, adding the forecast option and the area. After that, the webdriver performs a request and returns the page source. If the title of the markup returned is 404 Not Found, it will raise an exception. Unfortunately, Selenium doesn't provide a proper way of getting the HTTP Status code; this would have been much better than comparing strings.
Note
You may notice that I prefix some of the class properties with an underscore sign. I usually do that to show that the underlying property is private and shouldn't be set outside the class. In Python, there is no need to do that because there's no way to set private or public properties; however, I like it because I can clearly show my intent.
Now, we can import it in the __init__.py file in the weatherterm/core directory:
from .request import Request
Now we have a parser loader to load any parser that we drop into the directory weatherterm/parsers, a class representing the forecast model, and an enumeration, ForecastType, so we can specify which type of forecast we are parsing. We also have an enumeration representing temperature units and a utility class to convert temperatures from Fahrenheit to Celsius and from Celsius to Fahrenheit.
So now, we should be ready to create the application's entry point to receive all the arguments passed by the user, run the parser, and present the data on the terminal.
Before we run our application for the first time, we need to add the application's entry point. The entry point is the first code that will be run when our application is executed.
We want to give the users of our application the best user experience possible, so the first features that we need to add are the ability to receive and parse command line arguments, perform argument validation, set arguments when needed, and, last but not least, show an organized and informative help system so the users can see which arguments can be used and how to use the application.
Sounds like tedious work, right?
Luckily, Python has batteries included and the standard library contains a great module that allows us to implement this in a very simple way; the module is called argparse.
Another feature that would be good to have is for our application to be easy to distribute to our users. One approach is to create a __main__.py file in the weatherterm module directory, so that you can run the module as a regular script. Python will automatically run the __main__.py file, like so:
$ python -m weatherterm
Another option is to zip the entire application's directory and run the Python interpreter, passing the name of the ZIP file instead. This is an easy, fast, and simple way to distribute our Python programs.
There are many other ways of distributing your programs, but they are beyond the scope of this book; I just wanted to give you some examples of the usage of the __main__.py file.
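As a rough sketch of the ZIP approach, the example below uses the standard library's zipapp module (one possible way to build the archive, not necessarily the method intended above) on a tiny, hypothetical application that contains only a __main__.py file.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical miniature application: a directory holding only __main__.py.
    app_dir = Path(tmp) / 'hello_app'
    app_dir.mkdir()
    (app_dir / '__main__.py').write_text('print("hello from __main__")\n')

    # Bundle the directory into a single ZIP archive.
    archive = Path(tmp) / 'hello_app.pyz'
    zipapp.create_archive(app_dir, archive)

    # Passing the archive to the interpreter runs its __main__.py.
    completed = subprocess.run([sys.executable, str(archive)],
                               stdout=subprocess.PIPE,
                               universal_newlines=True)
    output = completed.stdout.strip()
    print(output)
```

An application with third-party dependencies, such as ours, would also need those packages to be available to the interpreter that runs the archive.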
With that said, let's create a __main__.py file inside of the weatherterm directory with the following content:
import sys
from argparse import ArgumentParser

from weatherterm.core import parser_loader
from weatherterm.core import ForecastType
from weatherterm.core import Unit


def _validate_forecast_args(args):
    if args.forecast_option is None:
        err_msg = ('One of these arguments must be used: '
                   '-td/--today, -5d/--fivedays, -10d/--tendays, -w/--weekend')
        print(f'{argparser.prog}: error: {err_msg}', file=sys.stderr)
        sys.exit()


parsers = parser_loader.load('./weatherterm/parsers')

argparser = ArgumentParser(
    prog='weatherterm',
    description='Weather info from weather.com on your terminal')

required = argparser.add_argument_group('required arguments')

required.add_argument('-p', '--parser',
                      choices=parsers.keys(),
                      required=True,
                      dest='parser',
                      help=('Specify which parser is going to be used to '
                            'scrape weather information.'))

unit_values = [name.title() for name, value in Unit.__members__.items()]

argparser.add_argument('-u', '--unit',
                       choices=unit_values,
                       required=False,
                       dest='unit',
                       help=('Specify the unit that will be used to display '
                             'the temperatures.'))

required.add_argument('-a', '--areacode',
                      required=True,
                      dest='area_code',
                      help=('The code area to get the weather broadcast from. '
                            'It can be obtained at https://weather.com'))

argparser.add_argument('-v', '--version',
                       action='version',
                       version='%(prog)s 1.0')

argparser.add_argument('-td', '--today',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.TODAY,
                       help='Show the weather forecast for the current day')

args = argparser.parse_args()

_validate_forecast_args(args)

cls = parsers[args.parser]

parser = cls()
results = parser.run(args)

for result in results:
    print(result)
The weather forecast options (today, five days, ten days, and weekend forecast) that our application will accept will not be required; however, at least one option must be provided in the command line, so we create a simple function called _validate_forecast_args to perform this validation for us. If no forecast option was given, this function prints an error message to stderr and exits the application.
First, we get all the parsers available in the weatherterm/parsers directory. The list of parsers will be used as valid values for the parser argument.
It is the ArgumentParser object that does the job of defining the parameters, parsing the values, and showing help, so we create an instance of ArgumentParser and also create an argument group for the required parameters. This group will contain all the required arguments that our application needs, so the help output is better organized and the users of our application can easily see which parameters are required and which are not.
We achieve this with the following statement:
required = argparser.add_argument_group('required arguments')
After creating the argument group for the required arguments, we get a list of all members of the enumeration Unit and use the title() method to make only the first letter a capital letter.
Now, we can start adding the arguments that our application will be able to receive on the command line. Most argument definitions use the same set of keyword arguments, so I will not be covering all of them.
The first argument that we will create is --parser or -p:
required.add_argument('-p', '--parser',
                      choices=parsers.keys(),
                      required=True,
                      dest='parser',
                      help=('Specify which parser is going to be used to '
                            'scrape weather information.'))
Let's break down every parameter of the add_argument method used when creating the parser flag:
- The first two parameters are the flags. In this case, the user passes a value to this argument using either -p or --parser in the command line, for example, --parser WeatherComParser.
- The choices parameter specifies a list of valid values for the argument that we are creating. Here, we are using parsers.keys(), which will return the parser names. The advantage of this implementation is that if we add a new parser, it will be automatically added to this list, and no changes will be required in this file.
- The required parameter, as the name says, specifies if the argument will be required or not.
- The dest parameter specifies the name of the attribute to be added to the resulting object of the parser argument. The object returned by parse_args() will contain an attribute called parser with the value that we passed to this argument in the command line.
- Finally, the help parameter is the argument's help text, shown when using the -h or --help flag.
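These keyword arguments can be tried out in a standalone script. The sketch below is an illustration only: the dictionary of parser names is hard-coded and hypothetical, whereas the application gets it from the parser loader.

```python
from argparse import ArgumentParser

# Hypothetical parser names; in the application they come from the loader.
available_parsers = {'WeatherComParser': object, 'OtherSiteParser': object}

argparser = ArgumentParser(prog='demo')
required = argparser.add_argument_group('required arguments')
required.add_argument('-p', '--parser',
                      choices=available_parsers.keys(),
                      required=True,
                      dest='parser',
                      help='Specify which parser is going to be used.')

# Passing a list to parse_args() avoids reading sys.argv.
args = argparser.parse_args(['--parser', 'WeatherComParser'])
print(args.parser)
```

Passing a name that is not among the choices makes argparse print an error and exit, so the rest of the program can rely on args.parser being a valid key.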
Moving on to the --today argument:
argparser.add_argument('-td', '--today',
                       dest='forecast_option',
                       action='store_const',
                       const=ForecastType.TODAY,
                       help='Show the weather forecast for the current day')
Here, we have two keyword arguments that we haven't seen before, action and const.
Actions can be bound to the arguments that we create and they can perform many things. The argparse module contains a great set of actions, but if you need to do something specific, you can create your own action that will meet your needs. Most actions defined in the argparse module are actions to store values in the parse result's object attributes.
In the previous code snippet, we use the store_const action, which will store a constant value to an attribute in the object returned by parse_args().
We also used the keyword argument const, which specifies the constant value that is stored when the flag is used in the command line.
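The behavior of store_const is easy to observe with a short experiment. This sketch assumes a second flag, -w/--weekend, defined in the same way as --today and sharing the same destination; that definition is an assumption made for the example.

```python
from argparse import ArgumentParser
from enum import Enum


class ForecastType(Enum):
    TODAY = 'today'
    WEEKEND = 'weekend'


argparser = ArgumentParser(prog='demo')
argparser.add_argument('-td', '--today', dest='forecast_option',
                       action='store_const', const=ForecastType.TODAY)
# Assumed definition of a second flag that shares the same destination.
argparser.add_argument('-w', '--weekend', dest='forecast_option',
                       action='store_const', const=ForecastType.WEEKEND)

no_flag = argparser.parse_args([])
with_flag = argparser.parse_args(['-w'])

print(no_flag.forecast_option)    # None
print(with_flag.forecast_option)  # ForecastType.WEEKEND
```

When no flag is given, the attribute stays at None, which is exactly the condition that _validate_forecast_args checks for.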
Remember that I mentioned that it is possible to create custom actions? The unit argument is a great use case for a custom action. The choices argument is just a list of strings, so we use this comprehension to get the list of names of every item in the Unit enumeration, as follows:
unit_values = [name.title() for name, value in Unit.__members__.items()]

argparser.add_argument('-u', '--unit',
                       choices=unit_values,
                       required=False,
                       dest='unit',
                       help=('Specify the unit that will be used to display '
                             'the temperatures.'))
The object returned by parse_args() will contain an attribute called unit with a string value (Celsius or Fahrenheit), but this is not exactly what we want. Wouldn't it be nice to have the value as an enumeration item instead? We can change this behavior by creating a custom action.
First, add a new file named set_unit_action.py in the weatherterm/core directory with the following contents:
from argparse import Action

from weatherterm.core import Unit


class SetUnitAction(Action):

    def __call__(self, parser, namespace, values, option_string=None):
        unit = Unit[values.upper()]
        setattr(namespace, self.dest, unit)
This action class is very simple; it just inherits from argparse.Action and overrides the __call__ method, which will be called when the argument value is parsed. The method converts the value into a member of Unit and sets it on the destination attribute.
The parser parameter will be an instance of ArgumentParser. The namespace parameter is an instance of argparse.Namespace, a simple class containing all the attributes defined in the ArgumentParser object. If you inspect this parameter with the debugger, you will see something similar to this:
Namespace(area_code=None, fields=None, forecast_option=None, parser=None, unit=None)
The values parameter is the value that the user has passed on the command line; in our case, it can be either Celsius or Fahrenheit. Lastly, the option_string parameter is the flag that was used to invoke the argument. For the unit argument, the value of option_string will be -u (or --unit, if the long form was used).
Fortunately, enumerations in Python allow us to access their members by name using item access:
Unit[values.upper()]
Verifying this in Python REPL, we have:
>>> from weatherterm.core import Unit
>>> Unit['CELSIUS']
<Unit.CELSIUS: 'CELSIUS'>
>>> Unit['FAHRENHEIT']
<Unit.FAHRENHEIT: 'FAHRENHEIT'>
After getting the correct enumeration member, we set the value of the property specified by self.dest in the namespace object. That is much cleaner and we don't need to deal with magic strings.
With the custom action in place, we need to add the import statement in the __init__.py file in the weatherterm/core directory:
from .set_unit_action import SetUnitAction
Just include the line above at the end of the file. Then, we need to import it into the __main__.py file, like so:
from weatherterm.core import SetUnitAction
And we are going to add the action keyword argument in the definition of the unit argument and set it to SetUnitAction, like so:
argparser.add_argument('-u', '--unit',
                       choices=unit_values,
                       required=False,
                       action=SetUnitAction,
                       dest='unit',
                       help=('Specify the unit that will be used to display '
                             'the temperatures.'))
So, when the user of our application uses the flag -u for Celsius, the value of the attribute unit in the object returned by the parse_args() function will be:
<Unit.CELSIUS: 'CELSIUS'>
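The whole flow can be reproduced in a single file. The sketch below redefines a simplified Unit enumeration with explicit values instead of importing it from weatherterm.core, so that it runs on its own.

```python
from argparse import Action, ArgumentParser
from enum import Enum


class Unit(Enum):
    # Simplified stand-in for the application's Unit enumeration.
    CELSIUS = 'CELSIUS'
    FAHRENHEIT = 'FAHRENHEIT'


class SetUnitAction(Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # Translate the validated string into the matching enumeration member.
        setattr(namespace, self.dest, Unit[values.upper()])


argparser = ArgumentParser(prog='demo')
argparser.add_argument('-u', '--unit',
                       choices=[name.title() for name in Unit.__members__],
                       action=SetUnitAction,
                       dest='unit')

args = argparser.parse_args(['-u', 'Celsius'])
print(repr(args.unit))
```

Because argparse checks the value against choices before invoking the action, the lookup by name inside __call__ only ever receives Celsius or Fahrenheit.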
The rest of the code is very straightforward; we invoke the parse_args function to parse the arguments and set the result in the args variable. Then, we use the value of args.parser (the name of the selected parser) and access that item in the parsers dictionary. Remember that the value is the class type, so we create an instance of the parser, and lastly, invoke the run method, which will kick off website scraping.
In order to run our code for the first time, we need to create a parser. We can quickly create a parser to run our code and check whether the values are being parsed properly.
Let's go ahead and create a file called weather_com_parser.py in the weatherterm/parsers directory. To make it simple, we are going to create just the necessary methods, and the only thing we are going to do when the methods are invoked is to raise a NotImplementedError:
from weatherterm.core import ForecastType


class WeatherComParser:

    def __init__(self):
        self._forecast = {
            ForecastType.TODAY: self._today_forecast,
            ForecastType.FIVEDAYS: self._five_and_ten_days_forecast,
            ForecastType.TENDAYS: self._five_and_ten_days_forecast,
            ForecastType.WEEKEND: self._weekend_forecast,
        }

    def _today_forecast(self, args):
        raise NotImplementedError()

    def _five_and_ten_days_forecast(self, args):
        raise NotImplementedError()

    def _weekend_forecast(self, args):
        raise NotImplementedError()

    def run(self, args):
        self._forecast_type = args.forecast_option
        forecast_function = self._forecast[args.forecast_option]
        return forecast_function(args)
In the initializer, we create a dictionary where each key is a member of the ForecastType
enumeration, and each value is the method bound to that option. Our application will be able to present today's, the five-day, the ten-day, and the weekend forecasts, so all four options are mapped; the five-day and ten-day options share a single method.
The run
method only does two things; it looks up the function that needs to be executed using the forecast_option
that we passed as an argument in the command line, and executes the function returning its value.
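The lookup-and-call pattern used by run can be sketched in isolation. In the snippet below, DemoParser, the two-member ForecastType enumeration, and the SimpleNamespace standing in for the parsed arguments are all hypothetical and exist only to make the example runnable on its own.

```python
from enum import Enum
from types import SimpleNamespace


class ForecastType(Enum):
    # Illustrative values only; the real enumeration lives in weatherterm.core.
    TODAY = 'today'
    WEEKEND = 'weekend'


class DemoParser:
    def __init__(self):
        # Map each forecast option to the bound method that handles it.
        self._forecast = {
            ForecastType.TODAY: self._today_forecast,
            ForecastType.WEEKEND: self._weekend_forecast,
        }

    def _today_forecast(self, args):
        return f'today forecast for {args.area_code}'

    def _weekend_forecast(self, args):
        raise NotImplementedError()

    def run(self, args):
        # Look up the handler for the selected option, then call it.
        forecast_function = self._forecast[args.forecast_option]
        return forecast_function(args)


args = SimpleNamespace(forecast_option=ForecastType.TODAY,
                       area_code='SWXX2372:1:SW')
result = DemoParser().run(args)
print(result)
```

A dictionary of bound methods keeps run free of if/elif chains: supporting a new forecast option only requires a new dictionary entry.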
Now, the application is finally ready to be executed for the first time. Run the following command in the command line:
$ python -m weatherterm --help
You should see the application's help options:
usage: weatherterm [-h] -p {WeatherComParser} [-u {Celsius,Fahrenheit}] -a AREA_CODE [-v] [-td] [-5d] [-10d] [-w]

Weather info from weather.com on your terminal

optional arguments:
  -h, --help            show this help message and exit
  -u {Celsius,Fahrenheit}, --unit {Celsius,Fahrenheit}
                        Specify the unit that will be used to display the temperatures.
  -v, --version         show program's version number and exit
  -td, --today          Show the weather forecast for the current day

require arguments:
  -p {WeatherComParser}, --parser {WeatherComParser}
                        Specify which parser is going to be used to scrape weather information.
  -a AREA_CODE, --areacode AREA_CODE
                        The code area to get the weather broadcast from. It can be obtained at https://weather.com
As you can see, the argparse
module already provides out-of-the-box output for help. You can customize the output if you want to, but I find the default layout really good.
Notice that the -p
argument already gave you the option to choose the WeatherComParser
. It wasn't necessary to hardcode it anywhere because the parser loader did all the work for us. The -u
(--unit
) flag also contains the items of the enumeration Unit
. If someday you want to extend this application and add new units, the only thing you need to do here is to add the new item to the enumeration, and it will be automatically picked up and included as an option for the -u
flag.
Now, if you run the application again and this time pass some parameters:
$ python -m weatherterm -u Celsius -a SWXX2372:1:SW -p WeatherComParser -td
You will get an exception similar to this:

Don't worry -- this is exactly what we wanted! If you follow the stack trace, you can see that everything is working as intended. When we run our code, we call the run
method on the selected parser from the __main__.py
file, then we select the method associated with the forecast option, in this case, _today_forecast
, and finally store the result in the forecast_function
variable.
When the function stored in the forecast_function
variable was executed, the NotImplementedError
exception was raised. So far so good; the code is working perfectly and now we can start adding the implementation for each of these methods.
The core functionality is in place and the entry point of the application with the argument parser will give the users of our application a much better experience. Now, it is finally the time we all have been waiting for, the time to start implementing the parser. We will start implementing the method to get today's weather forecast.
Since I am in Sweden, I will use the area code SWXX2372:1:SW
(Stockholm, Sweden); however, you can use any area code you want. To get the area code of your choice, go to https://weather.com and search for the area you want. After selecting the area, the weather forecast for the current day will be displayed. Note that the URL changes, for example, when searching Stockholm, Sweden, the URL changes to:
https://weather.com/weather/today/l/SWXX2372:1:SW
For São Paulo, Brazil it will be:
https://weather.com/weather/today/l/BRXX0232:1:BR
Note that there is only one part of the URL that changes, and this is the area code that we want to pass as an argument to our application.
To start with, we need to import some packages:
import re

from weatherterm.core import Forecast
from weatherterm.core import Request
from weatherterm.core import Unit
from weatherterm.core import UnitConverter
And in the initializer, we are going to add the following code:
self._base_url = 'http://weather.com/weather/{forecast}/l/{area}'
self._request = Request(self._base_url)

# Raw strings keep the backslashes in the patterns intact.
self._temp_regex = re.compile(r'([0-9]+)\D{,2}([0-9]+)')
self._only_digits_regex = re.compile(r'[0-9]+')

self._unit_converter = UnitConverter(Unit.FAHRENHEIT)
In the initializer, we define the URL template we are going to use to perform requests to the weather website; then, we create a Request
object. This is the object that will perform the requests for us.
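As a rough illustration of how the two placeholders in the URL template get filled in, here is a minimal sketch. The build_url helper is hypothetical (the Request class does this work in the application), and the value 'today' for the forecast part is taken from the example URLs shown earlier.

```python
BASE_URL = 'http://weather.com/weather/{forecast}/l/{area}'


def build_url(forecast, area):
    """Hypothetical helper: fill the template with a forecast type and an area code."""
    return BASE_URL.format(forecast=forecast, area=area)


stockholm = build_url('today', 'SWXX2372:1:SW')
sao_paulo = build_url('today', 'BRXX0232:1:BR')

print(stockholm)
print(sao_paulo)
```

Only the forecast and area parts vary between requests, which is why a single template is enough for every forecast option.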
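The regular expressions are used to clean up temperature values: _only_digits_regex strips the degree sign from today's current temperature, and _temp_regex splits the high and low temperatures of the multi-day forecasts.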
We also define a UnitConverter
object and set the default unit to Fahrenheit
.
Now, we are ready to start adding two methods that will be responsible for actually searching for HTML elements with a certain CSS class and returning their contents. The first method is called _get_data
:
def _get_data(self, container, search_items):
    scraped_data = {}

    for key, value in search_items.items():
        # key is the CSS class, value is the type of the HTML element.
        result = container.find(value, class_=key)

        data = None if result is None else result.get_text()

        if data is not None:
            scraped_data[key] = data

    return scraped_data
The idea of this method is to search items within a container that matches some criteria. The container
is just a DOM element in the HTML and the search_items
is a dictionary where the key is a CSS class and the value is the type of the HTML element. It can be a DIV, SPAN, or anything that you wish to get the value from.
It starts looping through search_items.items()
and uses the find method to find the element within the container. If the item is found, we use get_text
to extract the text of the DOM element and add it to a dictionary that will be returned when there are no more items to search.
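To see the criteria-driven search in action without installing anything, the sketch below swaps BeautifulSoup for the standard library's xml.etree.ElementTree. This is a stand-in, not the chapter's implementation: the markup is invented, ElementTree needs well-formed input, and its attribute test matches the class value exactly, whereas BeautifulSoup also matches one class out of several.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup that mimics the structure described in the text.
MARKUP = """
<section class="today_nowcard-container">
  <div class="today_nowcard-temp">60°</div>
  <div class="today_nowcard-phrase">Cloudy</div>
</section>
"""


def get_data(container, search_items):
    """Collect the text of every element whose tag and class match the criteria."""
    scraped_data = {}
    for css_class, tag in search_items.items():
        node = container.find(f".//{tag}[@class='{css_class}']")
        if node is not None:
            scraped_data[css_class] = ''.join(node.itertext())
    return scraped_data


criteria = {
    'today_nowcard-temp': 'div',
    'today_nowcard-phrase': 'div',
    'today_nowcard-hilo': 'div',   # not present in the markup, so it is skipped
}

data = get_data(ET.fromstring(MARKUP.strip()), criteria)
print(data)
```

Criteria that find nothing simply leave no key in the result, which is what lets the caller detect and discard empty dictionaries later on.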
The second method that we will implement is the _parse
method. This will make use of the _get_data
method that we just implemented:
def _parse(self, container, criteria):
    results = [self._get_data(item, criteria)
               for item in container.children]

    return [result for result in results if result]
Here, we also get a container
and criteria
like the _get_data
method. The container is a DOM element and criteria is a dictionary of nodes that we want to find. The first comprehension gets all the container's child elements and passes them to the _get_data
method.
The results will be a list of dictionaries with all the items that have been found, and we will only return the dictionaries that are not empty.
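The two comprehensions can be tried out on plain dictionaries. In this sketch, get_data is a simplified stand-in for _get_data, and the children list is made-up data that replaces the DOM elements.

```python
def get_data(item, criteria):
    # Stand-in for _get_data: keep only the keys named in the criteria.
    return {key: item[key] for key in criteria if key in item}


# Made-up child "elements"; the second one matches nothing in the criteria.
children = [
    {'temp': '60°50°', 'wind': 'ESE 10 mph'},
    {'unrelated': 'advert'},
    {'temp': '57°48°'},
]
criteria = {'temp': 'td', 'wind': 'td'}

# First comprehension: one dictionary per child, possibly empty.
results = [get_data(item, criteria) for item in children]

# Second comprehension: an empty dictionary is falsy, so it is filtered out.
non_empty = [result for result in results if result]

print(results)
print(non_empty)
```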
There are only two more helper methods we need to implement in order to get today's weather forecast in place. Let's implement a method called _clear_str_number
:
def _clear_str_number(self, str_number):
    result = self._only_digits_regex.match(str_number)
    return '--' if result is None else result.group()
This method will use a regular expression to make sure that only digits are returned.
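A standalone version of this helper makes its behavior easy to check. The sample inputs below are invented; note that match only looks at the start of the string, so any value that does not begin with a digit falls back to '--'.

```python
import re

ONLY_DIGITS = re.compile(r'[0-9]+')


def clear_str_number(str_number):
    """Return the leading run of digits, or '--' when the string has none."""
    result = ONLY_DIGITS.match(str_number)
    return '--' if result is None else result.group()


print(clear_str_number('60°'))   # digits followed by the degree sign
print(clear_str_number('--'))    # placeholder shown when no value is available
print(clear_str_number('-5°'))   # starts with a minus sign, so nothing matches
```

One consequence worth knowing: a negative temperature such as '-5°' would come back as '--' with this pattern, because the minus sign is not a digit.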
And the last method that needs to be implemented is the _get_additional_info
method:
def _get_additional_info(self, content):
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]
This method loops through the table rows, getting the text of the span inside the first cell of each row. The comprehension returns lots of information about the weather, but we are only interested in the first two items, the wind and the humidity.
It's time to start adding the implementation of the _today_forecast
method, but first, we need to import BeautifulSoup
. At the top of the file, add the following import statement:
from bs4 import BeautifulSoup
Now, we can start adding the _today_forecast
method:
def _today_forecast(self, args):
    # Maps each CSS class to the type of HTML element that carries it.
    criteria = {
        'today_nowcard-temp': 'div',
        'today_nowcard-phrase': 'div',
        'today_nowcard-hilo': 'div',
    }

    content = self._request.fetch_data(args.forecast_option.value,
                                       args.area_code)

    bs = BeautifulSoup(content, 'html.parser')

    container = bs.find('section', class_='today_nowcard-container')

    weather_conditions = self._parse(container, criteria)

    if len(weather_conditions) < 1:
        raise Exception('Could not parse weather forecast for today.')

    weatherinfo = weather_conditions[0]

    temp_regex = re.compile(r'H\s+(\d+|\-{,2}).+'
                            r'L\s+(\d+|\-{,2})')
    temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
    high_temp, low_temp = temp_info.groups()

    side = container.find('div', class_='today_nowcard-sidecar')
    # _get_additional_info returns the wind first, then the humidity.
    wind, humidity = self._get_additional_info(side)

    curr_temp = self._clear_str_number(weatherinfo['today_nowcard-temp'])

    self._unit_converter.dest_unit = args.unit

    td_forecast = Forecast(self._unit_converter.convert(curr_temp),
                           humidity,
                           wind,
                           high_temp=self._unit_converter.convert(high_temp),
                           low_temp=self._unit_converter.convert(low_temp),
                           description=weatherinfo['today_nowcard-phrase'])

    return [td_forecast]
That is the function that will be called when the -td
or --today
flag is used on the command line. Let's break down this code so that we can easily understand what it does. Understanding this method is important because the methods that parse the data for the other forecast options (five days, ten days, and weekend) are very similar to this one.
The method's signature is quite simple; it only gets args
, which is the Namespace
object that parse_args() returns in the __main__.py
file. The first thing we do in this method is to create a criteria
dictionary with all the DOM elements that we want to find in the markup:
criteria = { 'today_nowcard-temp': 'div', 'today_nowcard-phrase': 'div', 'today_nowcard-hilo': 'div', }
As mentioned before, each key of the criteria
dictionary is the name of the DOM element's CSS class, and each value is the type of the HTML element:
- today_nowcard-temp is the CSS class of the DOM element containing the current temperature
- today_nowcard-phrase is the CSS class of the DOM element containing the weather conditions text (Cloudy, Sunny, and so on)
- today_nowcard-hilo is the CSS class of the DOM element containing the highest and lowest temperatures
Next, we are going to fetch the page content, create a BeautifulSoup
object, and use it to parse the DOM:
content = self._request.fetch_data(args.forecast_option.value,
                                   args.area_code)

bs = BeautifulSoup(content, 'html.parser')

container = bs.find('section', class_='today_nowcard-container')

weather_conditions = self._parse(container, criteria)

if len(weather_conditions) < 1:
    raise Exception('Could not parse weather forecast for today.')

weatherinfo = weather_conditions[0]
First, we make use of the fetch_data
method of the Request
class that we created on the core module and pass two arguments; the first is the forecast option and the second argument is the area code that we passed on the command line.
After fetching the data, we create a BeautifulSoup
object passing the content
and a parser
. Since we are getting back HTML, we use html.parser
.
Now is the time to start looking for the HTML elements that we are interested in. Remember, we need to find an element that will be a container, and the _parse
method will search through its child elements and try to find the items that we defined in the criteria dictionary. For today's weather forecast, the element that contains all the data we need is a section
element with the today_nowcard-container
CSS class.
BeautifulSoup
contains the find
method, which we can use to find elements in the HTML DOM with specific criteria. Note that the keyword argument is called class_
and not class
because class
is a reserved word in Python.
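The reason for the trailing underscore can be verified with the standard library alone. The sketch below only compiles two call expressions without running them, so the find name in the strings is just text and nothing from BeautifulSoup is needed.

```python
import keyword

# 'class' is a reserved word, 'class_' is an ordinary identifier.
is_class_reserved = keyword.iskeyword('class')
is_class_underscore_reserved = keyword.iskeyword('class_')

# Using the reserved word as a keyword argument is rejected at compile time.
try:
    compile("find('section', class='today_nowcard-container')", '<demo>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True

# The same call with class_ compiles without complaint.
accepted = compile("find('section', class_='today_nowcard-container')",
                   '<demo>', 'eval') is not None

print(is_class_reserved, is_class_underscore_reserved, rejected, accepted)
```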
Now that we have the container element, we can pass it to the _parse
method, which will return a list. We check whether the resulting list contains at least one element and raise an exception if it is empty. If it is not empty, we just get the first element and assign it to the weatherinfo
variable. The weatherinfo
variable now contains a dictionary with all the items that we were looking for.
The next step is to split the highest and lowest temperatures:
temp_regex = re.compile(r'H\s+(\d+|\-{,2}).+'
                        r'L\s+(\d+|\-{,2})')
temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
high_temp, low_temp = temp_info.groups()
We want to parse the text that has been extracted from the DOM element with the today_nowcard-hilo
CSS class, and the text should look something like H 50 L 60
, H -- L 60
, and so on. An easy and simple way of extracting the text we want is to use a regular expression:
H\s+(\d+|\-{,2}).+L\s+(\d+|\-{,2})
We can break this regular expression into two parts. First, we want to get the highest temperature: H\s+(\d+|\-{,2})
; this means that it will match an H
followed by some whitespace, and then it will group a value that matches either digits or a maximum of two dash symbols. After that, .+ matches one or more of any character. Lastly comes the second part, which basically does the same; however, it starts by matching an L
.
After executing the search method, we call the groups()
method of the resulting match object, which in this case returns two groups, one for the highest temperature and one for the lowest.
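The behavior of the high/low pattern can be checked against the two sample strings mentioned in the text. The pattern is the one from the listing, written as a raw string; the split_hilo wrapper is a hypothetical helper added for the demonstration.

```python
import re

HILO_REGEX = re.compile(r'H\s+(\d+|\-{,2}).+'
                        r'L\s+(\d+|\-{,2})')


def split_hilo(text):
    """Return (high, low) as strings, or None when the text does not match."""
    match = HILO_REGEX.search(text)
    return None if match is None else match.groups()


with_both = split_hilo('H 50 L 60')
missing_high = split_hilo('H -- L 60')
no_match = split_hilo('no temperatures here')

print(with_both)
print(missing_high)
print(no_match)
```

The last call returns None, which is a reminder that the listing calls groups() directly on the search result and would raise an AttributeError if the page text ever stopped matching.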
Other information that we want to provide to our users is information about wind and humidity. The container element that contains this information has a CSS class called today_nowcard-sidecar
:
side = container.find('div', class_='today_nowcard-sidecar')
wind, humidity = self._get_additional_info(side)
We just find the container and pass it into the _get_additional_info
method that will loop through the children elements of the container, extracting the text and finally returning the results for us.
Finally, the last part of this method:
curr_temp = self._clear_str_number(weatherinfo['today_nowcard-temp'])

self._unit_converter.dest_unit = args.unit

td_forecast = Forecast(self._unit_converter.convert(curr_temp),
                       humidity,
                       wind,
                       high_temp=self._unit_converter.convert(high_temp),
                       low_temp=self._unit_converter.convert(low_temp),
                       description=weatherinfo['today_nowcard-phrase'])

return [td_forecast]
Since the current temperature contains a special character (degree sign) that we don't want to have at this point, we pass the today_nowcard-temp
item of the weatherinfo
dictionary to the _clear_str_number
method, which keeps only the digits.
Now that we have all the information we need, we construct the Forecast
object and return it. Note that we are returning a list here; all the other options that we are going to implement (five-day, ten-day, and weekend forecasts) will return a list, so returning a list here as well keeps the methods consistent and makes it easier to display the information on the terminal.
Another thing to note is that we are making use of the convert method of our UnitConverter
to convert all the temperatures to the unit selected in the command line.
When running the command again:
$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -td
You should see an output similar to this:

Congratulations! You have implemented your first web scraping application. Next up, let's add the other forecast options.
The site that we are currently scraping the weather forecast from (weather.com) also provides the weather forecast for five and ten days, so in this section, we are going to implement methods to parse these forecast options as well.
The markup of the pages that present data for five and ten days is very similar; they have the same DOM structure and share the same CSS classes, which makes it possible for us to implement just one method that works for both options. Let's go ahead and add a new method to the weather_com_parser.py
file with the following contents:
def _parse_list_forecast(self, content, args):
    criteria = {
        'date-time': 'span',
        'day-detail': 'span',
        'description': 'td',
        'temp': 'td',
        'wind': 'td',
        'humidity': 'td',
    }

    bs = BeautifulSoup(content, 'html.parser')

    forecast_data = bs.find('table', class_='twc-table')
    container = forecast_data.tbody

    return self._parse(container, criteria)
As I mentioned before, the DOM for the five- and ten-day weather forecasts is very similar, so we create the _parse_list_forecast
method, which can be used for both options. First, we define the criteria:
- date-time is a span element and contains a string representing the day of the week
- day-detail is a span element and contains a string with the date, for example, SEP 29
- description is a TD element and contains the weather conditions, for example, Cloudy
- temp is a TD element and contains temperature information such as the high and low temperatures
- wind is a TD element and contains wind information
- humidity is a TD element and contains humidity information
Now that we have the criteria, we create a BeautifulSoup
object, passing the content and the html.parser
. All the data that we would like to get is on the table with a CSS class named twc-table
. We find the table and define the tbody
element as a container.
Finally, we run the _parse
method, passing the container
and the criteria
that we defined. The return of this function will look something like this:
[{'date-time': 'Today', 'day-detail': 'SEP 28', 'description': 'Partly Cloudy', 'humidity': '78%', 'temp': '60°50°', 'wind': 'ESE 10 mph '},
 {'date-time': 'Fri', 'day-detail': 'SEP 29', 'description': 'Partly Cloudy', 'humidity': '79%', 'temp': '57°48°', 'wind': 'ESE 10 mph '},
 {'date-time': 'Sat', 'day-detail': 'SEP 30', 'description': 'Partly Cloudy', 'humidity': '77%', 'temp': '57°49°', 'wind': 'SE 10 mph '},
 {'date-time': 'Sun', 'day-detail': 'OCT 1', 'description': 'Cloudy', 'humidity': '74%', 'temp': '55°51°', 'wind': 'SE 14 mph '},
 {'date-time': 'Mon', 'day-detail': 'OCT 2', 'description': 'Rain', 'humidity': '87%', 'temp': '55°48°', 'wind': 'SSE 18 mph '}]
Another method that we need to create is a method that will prepare the data for us, for example, parsing and converting temperature values and creating a Forecast
object. Add a new method called _prepare_data
with the following content:
def _prepare_data(self, results, args):
    forecast_result = []

    self._unit_converter.dest_unit = args.unit

    for item in results:
        # Split a string such as '60°50°' into the high and low values.
        # Note: when nothing matches, high_temp and low_temp are not
        # reassigned for this item.
        match = self._temp_regex.search(item['temp'])
        if match is not None:
            high_temp, low_temp = match.groups()

        # Only the weekend results carry a 'weather-cell' key; it holds
        # the day and the date in a single string.
        try:
            dateinfo = item['weather-cell']
            date_time, day_detail = dateinfo[:3], dateinfo[3:]
            item['date-time'] = date_time
            item['day-detail'] = day_detail
        except KeyError:
            pass

        day_forecast = Forecast(
            self._unit_converter.convert(item['temp']),
            item['humidity'],
            item['wind'],
            high_temp=self._unit_converter.convert(high_temp),
            low_temp=self._unit_converter.convert(low_temp),
            description=item['description'].strip(),
            forecast_date=f'{item["date-time"]} {item["day-detail"]}',
            forecast_type=self._forecast_type)
        forecast_result.append(day_forecast)

    return forecast_result
This method is quite simple. First, we loop through the results and apply the regex that we created to split the high and low temperatures stored in item['temp']
. If there's a match, it will get the groups and assign the value to high_temp
and low_temp
.
After that, we create a Forecast
object and append it to a list that will be returned later on.
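The two string-splitting steps in _prepare_data are easy to try on their own. The sketch below reuses the temperature pattern from the initializer and the slicing from the listing; split_temps and split_date are hypothetical helper names, and the sample strings come from the outputs shown in this chapter.

```python
import re

TEMP_REGEX = re.compile(r'([0-9]+)\D{,2}([0-9]+)')


def split_temps(text):
    """Split '60°50°' (or '57°F48°F') into the high and low values."""
    match = TEMP_REGEX.search(text)
    return None if match is None else match.groups()


def split_date(dateinfo):
    """Split 'FriSEP 29' into the three-letter day and the rest of the date."""
    return dateinfo[:3], dateinfo[3:]


list_page = split_temps('60°50°')
weekend_page = split_temps('57°F48°F')
weekend_date = split_date('FriSEP 29')

print(list_page)
print(weekend_page)
print(weekend_date)
```

The \D{,2} part is what lets one pattern serve both pages: it skips up to two non-digit characters, which covers '°' as well as '°F'.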
Lastly, we add the method that will be invoked when the -5d
or -10d
flag is used. Create another method called _five_and_ten_days_forecast
with the following contents:
def _five_and_ten_days_forecast(self, args):
    content = self._request.fetch_data(args.forecast_option.value,
                                       args.area_code)
    results = self._parse_list_forecast(content, args)
    # _prepare_data needs args to know the destination unit.
    return self._prepare_data(results, args)
This method fetches the contents of the page, passing the forecast_option
value and the area code so that it is possible to build the URL to perform the request. When the data is returned, we pass it down to _parse_list_forecast
, which returns a list of dictionaries (one for each day); finally, we turn them into Forecast
objects using the _prepare_data
method.
Before we run the command, we need to enable this option in the command line tool that we implemented; go over to the __main__.py
file, and, just after the definition of the -td
flag, add the following code:
argparser.add_argument('-5d', '--fivedays', dest='forecast_option', action='store_const', const=ForecastType.FIVEDAYS, help='Shows the weather forecast for the next 5 days')
Now, run the application again, but this time using the -5d
or --fivedays
flag:
$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -5d
It will produce the following output:
>> [Today SEP 28] High 60° / Low 50° (Partly Cloudy) Wind: ESE 10 mph / Humidity: 78% >> [Fri SEP 29] High 57° / Low 48° (Partly Cloudy) Wind: ESE 10 mph / Humidity: 79% >> [Sat SEP 30] High 57° / Low 49° (Partly Cloudy) Wind: SE 10 mph / Humidity: 77% >> [Sun OCT 1] High 55° / Low 51° (Cloudy) Wind: SE 14 mph / Humidity: 74% >> [Mon OCT 2] High 55° / Low 48° (Rain) Wind: SSE 18 mph / Humidity: 87%
To wrap this section up, let's include the option to get the weather forecast for the next ten days as well. In the __main__.py
file, just below the -5d
flag definition, add the following code:
argparser.add_argument('-10d', '--tendays', dest='forecast_option', action='store_const', const=ForecastType.TENDAYS, help='Shows the weather forecast for the next 10 days')
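All the forecast flags write to the same dest, which is what lets the parser read a single forecast_option attribute regardless of the flag used. The following sketch shows that mechanism in isolation; the enumeration values are assumptions, since the real ForecastType is defined elsewhere in the project.

```python
import argparse
from enum import Enum


class ForecastType(Enum):
    # Assumed values, for illustration only.
    TODAY = 'today'
    FIVEDAYS = '5day'
    TENDAYS = 'tenday'


argparser = argparse.ArgumentParser(prog='demo')

# Each flag stores a different constant in the same attribute.
argparser.add_argument('-td', '--today', dest='forecast_option',
                       action='store_const', const=ForecastType.TODAY)
argparser.add_argument('-5d', '--fivedays', dest='forecast_option',
                       action='store_const', const=ForecastType.FIVEDAYS)
argparser.add_argument('-10d', '--tendays', dest='forecast_option',
                       action='store_const', const=ForecastType.TENDAYS)

five = argparser.parse_args(['-5d']).forecast_option
ten = argparser.parse_args(['--tendays']).forecast_option
none_given = argparser.parse_args([]).forecast_option

print(five, ten, none_given)
```

When no flag is given, the attribute is None, so a real application would either set a default or check for it before looking up the forecast method.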
If you run the same command as we used to get the five-day forecast but replace the -5d
flag with -10d
, like so:
$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -10d
You should see the ten-day weather forecast output:
>> [Today SEP 28] High 60° / Low 50° (Partly Cloudy) Wind: ESE 10 mph / Humidity: 78% >> [Fri SEP 29] High 57° / Low 48° (Partly Cloudy) Wind: ESE 10 mph / Humidity: 79% >> [Sat SEP 30] High 57° / Low 49° (Partly Cloudy) Wind: SE 10 mph / Humidity: 77% >> [Sun OCT 1] High 55° / Low 51° (Cloudy) Wind: SE 14 mph / Humidity: 74% >> [Mon OCT 2] High 55° / Low 48° (Rain) Wind: SSE 18 mph / Humidity: 87% >> [Tue OCT 3] High 56° / Low 46° (AM Clouds/PM Sun) Wind: S 10 mph / Humidity: 84% >> [Wed OCT 4] High 58° / Low 47° (Partly Cloudy) Wind: SE 9 mph / Humidity: 80% >> [Thu OCT 5] High 57° / Low 46° (Showers) Wind: SSW 8 mph / Humidity: 81% >> [Fri OCT 6] High 57° / Low 46° (Partly Cloudy) Wind: SW 8 mph / Humidity: 76% >> [Sat OCT 7] High 56° / Low 44° (Mostly Sunny) Wind: W 7 mph / Humidity: 80% >> [Sun OCT 8] High 56° / Low 44° (Partly Cloudy) Wind: NNE 7 mph / Humidity: 78% >> [Mon OCT 9] High 56° / Low 43° (AM Showers) Wind: SSW 9 mph / Humidity: 79% >> [Tue OCT 10] High 55° / Low 44° (AM Showers) Wind: W 8 mph / Humidity: 79% >> [Wed OCT 11] High 55° / Low 42° (AM Showers) Wind: SE 7 mph / Humidity: 79% >> [Thu OCT 12] High 53° / Low 43° (AM Showers) Wind: NNW 8 mph / Humidity: 87%
As you can see, the weather was not so great here in Sweden while I was writing this book.
The last weather forecast option that we are going to implement in our application is the option to get the weather forecast for the upcoming weekend. This implementation is a bit different from the others because the markup of the weekend page is slightly different from that of today's, the five-day, and the ten-day forecast pages.
The DOM structure is different and some CSS class names are different as well. If you remember the previous methods that we implemented, we always use the _parse
method, which takes the container DOM element and a dictionary with the search criteria as arguments. The return value of that method is a list of dictionaries where each key is the CSS class name of the DOM element that we were searching for and each value is the text within that DOM element.
Since the CSS class names of the weekend page are different, we need to implement some code to take that list of results and rename the keys so the _prepare_data
function can use scraped results properly.
With that said, let's go ahead and create a new file in the weatherterm/core
directory called mapper.py
with the following contents:
class Mapper:

    def __init__(self):
        self._mapping = {}

    def _add(self, source, dest):
        self._mapping[source] = dest

    def remap_key(self, source, dest):
        self._add(source, dest)

    def remap(self, itemslist):
        return [self._exec(item) for item in itemslist]

    def _exec(self, src_dict):
        dest = dict()

        if not src_dict:
            raise AttributeError('The source dictionary cannot be empty or None')

        for key, value in src_dict.items():
            try:
                new_key = self._mapping[key]
                dest[new_key] = value
            except KeyError:
                dest[key] = value
        return dest
The Mapper
class gets a list with dictionaries and renames specific keys that we would like to rename. The important methods here are remap_key
and remap
. The remap_key
gets two arguments, source
and dest
. source
is the key that we wish to rename and dest
is the new name for that key. The remap_key
method will add it to an internal dictionary called _mapping
, which will be used later on to look up the new key name.
The remap
method simply gets a list containing the dictionaries and, for every item on that list, it calls the _exec
method, which first creates a brand new dictionary and then checks whether the source dictionary is empty. In that case, it raises an AttributeError
.
If the dictionary has keys, we loop through its items and check whether the current item's key has a new name in the mapping dictionary. If a new key name is found, we create a new item with the new key name; otherwise, we just keep the old name. After the loop, the new dictionary is returned, and remap collects these dictionaries into a list in which all the mapped keys carry their new names.
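A short usage example shows the renaming at work. The class below is a condensed variant of the listing (it uses dict.get instead of try/except, which behaves the same for this purpose), and the sample dictionary is made up from values that appear in this chapter.

```python
class Mapper:
    """Condensed variant of the Mapper class, for demonstration purposes."""

    def __init__(self):
        self._mapping = {}

    def remap_key(self, source, dest):
        self._mapping[source] = dest

    def remap(self, itemslist):
        return [self._exec(item) for item in itemslist]

    def _exec(self, src_dict):
        if not src_dict:
            raise AttributeError('The source dictionary cannot be empty or None')
        # Use the new name when one is registered, otherwise keep the old key.
        return {self._mapping.get(key, key): value
                for key, value in src_dict.items()}


mapper = Mapper()
mapper.remap_key('wind-conditions', 'wind')
mapper.remap_key('weather-phrase', 'description')

scraped = [{'wind-conditions': 'ESE 10 mph',
            'weather-phrase': 'Cloudy',
            'temp': '57°F48°F'}]
remapped = mapper.remap(scraped)

print(remapped)
```

Keys without a registered mapping, such as temp here, pass through unchanged, so only the names that differ between pages need to be listed.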
Now, we just need to add it to the __init__.py
file in the weatherterm/core
directory:
from .mapper import Mapper
And, in the weather_com_parser.py
file in weatherterm/parsers
, we need to import the Mapper
:
from weatherterm.core import Mapper
With the mapper in place, we can go ahead and create the _weekend_forecast
method in the weather_com_parser.py
file, like so:
def _weekend_forecast(self, args):
    criteria = {
        'weather-cell': 'header',
        'temp': 'p',
        'weather-phrase': 'h3',
        'wind-conditions': 'p',
        'humidity': 'p',
    }

    mapper = Mapper()
    mapper.remap_key('wind-conditions', 'wind')
    mapper.remap_key('weather-phrase', 'description')

    content = self._request.fetch_data(args.forecast_option.value,
                                       args.area_code)

    bs = BeautifulSoup(content, 'html.parser')

    forecast_data = bs.find('article', class_='ls-mod')
    container = forecast_data.div.div

    partial_results = self._parse(container, criteria)
    results = mapper.remap(partial_results)

    return self._prepare_data(results, args)
The method starts off by defining the criteria in exactly the same way as the other methods; however, the DOM structure is slightly different and some of the CSS names are also different:
- weather-cell: Contains the forecast date: FriSEP 29
- temp: Contains the temperature (high and low): 57°F48°F
- weather-phrase: Contains the weather conditions: Cloudy
- wind-conditions: Wind information
- humidity: The humidity percentage
As you can see, to make it play nicely with the _prepare_data
method, we will need to rename some keys in the dictionaries in the result set—wind-conditions
should be wind
and weather-phrase
should be description
.
Luckily, we have introduced the Mapper
class to help us out:
mapper = Mapper() mapper.remap_key('wind-conditions', 'wind') mapper.remap_key('weather-phrase', 'description')
We create a Mapper
object and say, remap wind-conditions
to wind
and weather-phrase
to description
:
content = self._request.fetch_data(args.forecast_option.value,
                                   args.area_code)

bs = BeautifulSoup(content, 'html.parser')

forecast_data = bs.find('article', class_='ls-mod')
container = forecast_data.div.div

partial_results = self._parse(container, criteria)
We fetch all the data, create a BeautifulSoup
object using the html.parser
, and find the container element that contains the children elements that we are interested in. For the weekend forecast, we are interested in getting the article
element with a CSS class called ls-mod
and within that article
we go down to the first child element, which is a DIV, and get its first child element, which is also a DIV element.
The HTML should look something like this:
<article class='ls-mod'> <div> <div> <!-- this DIV will be our container element --> </div> </div> </article>
That's the reason we first find the article, assign it to forecast_data
, and then use forecast_data.div.div
so we get the DIV element we want.
After defining the container, we pass it to the _parse
method together with the criteria; when we get the results back, we simply need to run the remap
method of the Mapper
instance, which will normalize the data for us before we call _prepare_data
.
Now, the last detail before we run the application and get the weather forecast for the weekend is that we need to add the -w
and --weekend
flags to the ArgumentParser
. Open the __main__.py
file in the weatherterm
directory and, just below the --tendays
flag, add the following code:
argparser.add_argument('-w', '--weekend', dest='forecast_option', action='store_const', const=ForecastType.WEEKEND, help=('Shows the weather forecast for the next or ' 'current weekend'))
Great! Now, run the application using the -w
or --weekend
flag:
$ python -m weatherterm -u Celsius -a SWXX2372:1:SW -p WeatherComParser -w
>> [Fri SEP 29] High 13.9° / Low 8.9° (Partly Cloudy) Wind: ESE 10 mph / Humidity: 79% >> [Sat SEP 30] High 13.9° / Low 9.4° (Partly Cloudy) Wind: SE 10 mph / Humidity: 77% >> [Sun OCT 1] High 12.8° / Low 10.6° (Cloudy) Wind: SE 14 mph / Humidity: 74%
Note that this time, I used the -u
flag to choose Celsius. All the temperatures in the output are represented in Celsius instead of Fahrenheit.
In this chapter, you learned the basics of object-oriented programming in Python; we covered how to create classes, use inheritance, and use the @property
decorator to create getters and setters.
We covered how to use the inspect module to get more information about modules, classes, and functions. Last but not least, we made use of the powerful package BeautifulSoup
to parse HTML and Selenium
to make requests to the weather website.
We also learned how to implement command line tools using the argparse
module from Python's standard library, which allows us to provide tools that are easier to use and with very helpful documentation.
Next up, we are going to develop a small wrapper around the Spotify REST API and use it to create a remote control terminal.