Before we begin using Beautiful Soup, we should ensure that it is properly installed on our machine. The steps required are so simple that any user can install this in no time. In this chapter, we will be covering the following topics:
Installing Beautiful Soup
Verifying the installation of Beautiful Soup
Python supports the installation of third-party modules such as Beautiful Soup. In the best case, the module developer provides a platform-specific installer: for example, an executable installer for Windows; an rpm package for Red Hat-based Linux operating systems (Red Hat, openSUSE, and so on); and a Debian package for Debian-based operating systems (Debian, Ubuntu, and so on). This is not always the case, so we should know the alternatives when a platform-specific installer is not available. We will discuss the installation options available for Beautiful Soup on different operating systems, such as Linux, Windows, and Mac OS X. The examples that follow use Python 2.7.5; the instructions for Python 3 may differ. You can go directly to the installation section that corresponds to your operating system.
Installing Beautiful Soup is pretty simple and straightforward in Linux machines. For recent versions of Debian or Ubuntu, Beautiful Soup is available as a package and we can install this using the system package manager. For other versions of Debian or Ubuntu, where Beautiful Soup is not available as a package, we can use alternative methods for installation.
Normally, there are the following three ways to install Beautiful Soup in Linux machines:
Using package manager
Using pip
Using easy_install
The choices are ranked by complexity so that we can avoid trial and error. The package manager is the easiest method because it requires the least effort from the user, so we will cover it first. If one method succeeds, we don't need to try the next one because all three methods achieve the same result.
Linux machines normally come with a package manager to install various packages. In recent versions of Debian or Ubuntu, Beautiful Soup is available as a package, so we will use the system package manager for installation. In Linux machines such as Ubuntu and Debian, the default package manager is based on apt-get, and hence we will use apt-get to do the task.
Just open up a terminal and type in the following command:
sudo apt-get install python-bs4
The preceding command will install Beautiful Soup Version 4 in our Linux operating system. Installing new packages in the system normally requires root user privileges, which is why we put sudo in front of the apt-get command. Without sudo, the command will end with a permission denied error. If the package list is already up to date, we will see a success message in the terminal.
Since we are using a recent version of Ubuntu or Debian, python-bs4 will be listed in the apt repository. But if the preceding command fails with a Package Not Found error, it means that the package list is not up to date. This normally happens when we have just installed the operating system and the package list has not yet been downloaded from the package repository. In this case, we first need to update the package list using the following command:
sudo apt-get update
The preceding command will update the package list from the online package repositories. After this, we need to run the earlier apt-get install command again to install Beautiful Soup.
In older versions of the Linux operating system, even after running the apt-get update command, we might not be able to install Beautiful Soup because it might not be available in the repositories. In these scenarios, we can rely on the other methods of installation, using either pip or easy_install. pip and easy_install are tools used for managing and installing Python packages. Either of them can be used to install Beautiful Soup.
sudo pip install beautifulsoup4
The preceding command will install Beautiful Soup Version 4 in the system after downloading the necessary packages from http://pypi.python.org/.
sudo easy_install beautifulsoup4
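The pip command shown earlier can also be launched from Python itself. The following is only a minimal sketch, assuming Python 3 and a pip module that is available to the current interpreter; the helper names pip_install_command and install are hypothetical and are not part of pip or Beautiful Soup. Running pip through sys.executable is one way to make sure the package is installed for the same interpreter that will later import it.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command line that targets the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Only print the command here; call install("beautifulsoup4") to run it.
print(" ".join(pip_install_command("beautifulsoup4")))
```

A system-wide installation may still need root privileges, just as with the sudo pip command.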
None of the previous methods to install Beautiful Soup in Linux will work without an active network connection. Even if they all fail, we can still install Beautiful Soup. The last option is to use the setup.py script that comes with every Python package downloaded from pypi.python.org. This is also the recommended method to install Beautiful Soup in Windows and in Mac OS X machines, so we will discuss it in the Installing Beautiful Soup in Windows section.
In Windows, we will make use of the recent Python package for Beautiful Soup available from https://pypi.python.org/packages/source/b/beautifulsoup4/ and use the setup.py script to install Beautiful Soup. But before doing this, it will be easier for us if we add the path of Python to the system path. The next section discusses setting up the path to Python on a Windows machine.
Often, the path to python.exe is not added to the Path environment variable by default in Windows. To check this, open the Windows command-line prompt and type the following command:
python
The preceding command will work without any errors if the path to Python is already added to the Path environment variable or if we are already within the Python installation directory. Still, it is good to check the Path variable for the Python directory entry.
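Once some Python interpreter is running, the same check can be made from a script. The following is only a sketch, assuming Python 3.3 or later, where shutil.which is available; the helper name find_python and the list of candidate names are illustrative assumptions.

```python
import shutil


def find_python(names=("python", "python3")):
    """Return the first interpreter found on the PATH, or None."""
    for name in names:
        location = shutil.which(name)
        if location:
            return location
    return None


print(find_python() or "No Python interpreter found on the PATH")
```

If the function returns None, the Python directory is probably missing from the Path variable.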
If it doesn't exist in the path variable, we have to find out the actual path, which depends entirely on where Python was installed. For Python 2.x, it is C:\Python2x by default, and for Python 3.x, it is C:\Python3x by default.
We have to add this to the Path environment variable in the Windows machine. For this, right-click on My Computer and go to Properties | Environment Variables | System Variable. Pick the Path variable and add the following section to the Path variable:
;C:\PythonXY (for example, ;C:\Python27)
This is shown in the following screenshot:
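The effect of adding the ;C:\PythonXY section can be illustrated with a small string helper. This is only a sketch under stated assumptions: the function name add_to_path is hypothetical, it works on a copy of the Path value rather than on the real environment variable, and it treats entries as case-insensitive Windows paths.

```python
import ntpath


def add_to_path(path_value, directory, sep=";"):
    """Return path_value with directory appended, unless already listed."""

    def normalize(entry):
        # Windows paths are case-insensitive; ignore a trailing backslash.
        return ntpath.normcase(entry.strip()).rstrip("\\")

    entries = [entry for entry in path_value.split(sep) if entry.strip()]
    if normalize(directory) in {normalize(entry) for entry in entries}:
        return path_value
    return sep.join(entries + [directory])


print(add_to_path(r"C:\Windows;C:\Windows\System32", r"C:\Python27"))
# prints C:\Windows;C:\Windows\System32;C:\Python27
```

The helper only computes the new value; the change itself is still made through the Environment Variables dialog.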
We can install Python packages using the setup.py script that comes with every Python package downloaded from the Python package index website: https://pypi.python.org/. The following steps are used to install Beautiful Soup using setup.py:
Download the latest Beautiful Soup source package from https://pypi.python.org/packages/source/b/beautifulsoup4/.
Unzip it to a folder (for example, BeautifulSoup).
Open up the command-line prompt, navigate to the folder where you have unzipped the package, and run the setup.py script as follows:
cd BeautifulSoup
python setup.py install
The python setup.py install line will install Beautiful Soup in our system.
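The unzip step can also be done from Python when no archive tool is at hand. The following is only a sketch, assuming Python 3 and a gzip-compressed tar archive; the helper name unpack_source is hypothetical, and the archive filename in the usage comment is a placeholder, not a real release name.

```python
import tarfile
from pathlib import Path


def unpack_source(archive_path, target_dir):
    """Extract a .tar.gz source archive and list what landed in target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Newer Python versions offer a safer "data" extraction filter.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target, **extra)
    return sorted(entry.name for entry in target.iterdir())


# Usage (placeholder filename):
# unpack_source("downloaded-archive.tar.gz", "BeautifulSoup")
```

After extraction, the python setup.py install command is run from the folder that contains setup.py.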
The installation processes that we have discussed till now normally copy the module contents to a chosen installation directory. This varies from operating system to operating system; the path is normally /usr/local/lib/pythonX.Y/site-packages in Linux operating systems such as Debian and C:\PythonXY\Lib\site-packages in Windows (where X and Y represent the corresponding versions, such as Python 2.7). When we use import statements in the Python interpreter or as a part of a Python script, the interpreter searches the directories listed in the predefined Python path variable for the module. So, installing actually means copying the module contents into one of these predefined directories, or copying them to some other location and adding that location to the Python path. The following method of using Beautiful Soup without going through the installation can be used in any operating system, such as Windows, Linux, or Mac OS X:
Download the latest version of Beautiful Soup package from https://pypi.python.org/packages/source/b/beautifulsoup4/.
Unzip the package.
Copy the bs4 directory into the directory where we want to place all our Python Beautiful Soup scripts.
After we perform all the preceding steps, we are good to use Beautiful Soup. In order to import Beautiful Soup in this case, we either need to open the terminal in the directory where the bs4 directory exists or add this directory to the Python path variable; otherwise, we will get the module not found error. This extra step is required because the method is specific to the project in which the bs4 directory is included. With the installation methods we have seen previously, Beautiful Soup is available globally and can be used in any project, so the additional steps are not required.
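The role of the Python path can be demonstrated without Beautiful Soup at all. The following is only a sketch, assuming Python 3; it creates a stand-in package (the name standin_bs4 is hypothetical) in a temporary directory that plays the part of the folder holding the copied bs4 directory, and then makes it importable by adding that folder to sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_standin_package(parent_dir, name="standin_bs4"):
    """Create a tiny package that stands in for a copied bs4 directory."""
    package_dir = Path(parent_dir) / name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text(
        "MESSAGE = 'imported from a copied directory'\n"
    )
    return package_dir


with tempfile.TemporaryDirectory() as workspace:
    make_standin_package(workspace)
    # Without this line, the import below fails with a module not found error.
    sys.path.insert(0, workspace)
    importlib.invalidate_caches()
    module = importlib.import_module("standin_bs4")
    print(module.MESSAGE)  # prints: imported from a copied directory
    sys.path.remove(workspace)
```

For a real project, the directory added to sys.path would be the one that contains the bs4 directory.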
Open up the Python interpreter in a terminal by using the following command:
python
Now, we can issue a simple import statement to see whether we have successfully installed Beautiful Soup or not by using the following command:
from bs4 import BeautifulSoup
If we did not install Beautiful Soup and instead copied the bs4 directory into the workspace, we have to change to the directory where we have placed the bs4 directory before using the preceding commands.
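The same verification can be wrapped in a short script that reports the result instead of raising an error. This is only a sketch; the function name beautiful_soup_version is hypothetical, and the version is read defensively through getattr in case the attribute is missing.

```python
def beautiful_soup_version():
    """Return the Beautiful Soup version string, or None if not importable."""
    try:
        import bs4
    except ImportError:
        return None
    return getattr(bs4, "__version__", "unknown")


version = beautiful_soup_version()
if version is None:
    print("Beautiful Soup is not importable from this interpreter")
else:
    print("Beautiful Soup", version, "is available")
```

If the bs4 directory was copied rather than installed, the script has to be run from the directory that contains it.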
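The following table is an overview of commands and their implications:
Command | Implication
sudo apt-get install python-bs4 | This command is used for installing Beautiful Soup using a package manager in Linux.
sudo pip install beautifulsoup4 | This command is used for installing Beautiful Soup using pip.
sudo easy_install beautifulsoup4 | This command is used for installing Beautiful Soup using easy_install.
python setup.py install | This command is used for installing Beautiful Soup using setup.py.
from bs4 import BeautifulSoup | This command is used for verifying installation.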
In this chapter, we covered the various options to install Beautiful Soup in Linux machines. We also discussed a way of installing Beautiful Soup in Windows, Linux, and Mac OS X using the Python setup.py script itself. We also discussed the method to use Beautiful Soup without even installing it. The verification of the Beautiful Soup installation was also covered.
In the next chapter, we are going to have a first look at Beautiful Soup by learning the different methods of converting HTML/XML content to different Beautiful Soup objects and thereby understanding the properties of Beautiful Soup.