In this section, we will cover installing Python and the environment that we will use for most of the book, the Jupyter Notebook. Furthermore, we will install the NumPy module, which we will use for the first set of examples.
Note
The Jupyter Notebook was, until very recently, called the IPython Notebook. You'll notice the term in web searches for the project. Jupyter is the new name, representing a broadening of the project beyond using just Python.
The Python programming language is a fantastic, versatile, and easy-to-use language.
For this book, we will be using Python 3.5, which is available for your system from the Python Organization's website https://www.python.org/downloads/. However, I recommend that you use Anaconda to install Python, which you can download from the official website at https://www.continuum.io/downloads.
Note
There will be two major versions to choose from, Python 3.5 and Python 2.7. Remember to download and install Python 3.5, which is the version tested throughout this book. Follow the installation instructions on that website for your system. If you have a strong reason to learn version 2 of Python, then do so by downloading the Python 2.7 version. Keep in mind that some code may not work as in the book, and some workarounds may be needed.
In this book, I assume that you have some knowledge of programming and Python itself. You do not need to be an expert with Python to complete this book, although a good level of knowledge will help. I will not be explaining general code structures and syntax in this book, except where it is different from what is considered normal Python coding practice.
If you do not have any experience with programming, I recommend that you pick up the Learning Python book from Packt Publishing, or the book Dive Into Python, available online at www.diveintopython3.net.
The Python organization also maintains a list of two online tutorials for those new to Python:
- For non-programmers who want to learn to program through the Python language:
https://wiki.python.org/moin/BeginnersGuide/NonProgrammers
- For programmers who already know how to program, but need to learn Python specifically:
https://wiki.python.org/moin/BeginnersGuide/Programmers
Windows users will need to set an environment variable to use Python from the command line, whereas on other systems Python is usually immediately executable. We set it in the following steps:
- First, find where you installed Python 3 on your computer; the default location is C:\Python35.
- Next, enter this command into the command line (the cmd program): set PATH=%PATH%;C:\Python35
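Whether the python command can be started from a command prompt depends on the interpreter's folder appearing in the PATH environment variable. The following is a minimal sketch of that check, runnable from any Python interpreter you can already start; the helper name is_on_path is hypothetical and not part of any library.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    `is_on_path` is a hypothetical helper written for this illustration.
    When `path_value` is omitted, the current PATH environment variable is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise case and separators so equivalent spellings compare equal.
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted for entry in entries
    )


# C:\Python35 is the default install location mentioned in the text.
print(is_on_path(r"C:\Python35"))
```

If this prints False on Windows, the folder has not been added for the current session.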
Once you have Python running on your system, you should be able to open a command prompt and run the following code to be sure it has installed correctly.
$ python
Python 3.5.1 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on Linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, world!")
Hello, world!
>>> exit()
Note that we will be using the dollar sign ($) to denote a command that you type into the terminal (also called a shell, or cmd on Windows). You do not need to type this character (or retype anything that already appears on your screen). Just type in the rest of the line and press Enter.
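The interpreter can also report its own version, which is a quick way to confirm that you are running Python 3 rather than Python 2. The sketch below uses only the standard sys module; the function name describe_python is hypothetical, and str.format is used instead of f-strings so that the code also runs on Python 3.5.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short report on whether the given version is Python 3.

    `describe_python` is a hypothetical helper written for this illustration.
    """
    major, minor = version_info[0], version_info[1]
    label = "Python {}.{}".format(major, minor)
    if major < 3:
        # The book's code targets Python 3; version 2 may need workarounds.
        return label + ": Python 2 detected; some examples may need workarounds"
    return label + ": Python 3 detected"


print(describe_python())
```

You can paste these lines at the >>> prompt or save them to a file and run it with the python command.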
After you have the above "Hello, world!" example running, exit the program and move on to installing a more advanced environment to run Python code, the Jupyter Notebook.
Note
Python 3.5 includes a program called pip, which is a package manager that helps to install new libraries on your system. You can verify that pip is working on your system by running the $ pip freeze command, which tells you which packages you have installed on your system. Anaconda also installs its own package manager, conda, that you can use. If unsure, use conda first, and use pip only if that fails.
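If you are not sure which of the two package managers is available, the following minimal sketch looks for both on your PATH using the standard library's shutil.which, and then applies the advice above (conda first, pip second). The function names find_package_managers and preferred_manager are hypothetical.

```python
import shutil


def find_package_managers(names=("conda", "pip")):
    """Map each tool name to its executable path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def preferred_manager(found):
    """Pick conda when it was found, otherwise pip, otherwise None."""
    for name in ("conda", "pip"):
        if found.get(name):
            return name
    return None


found = find_package_managers()
print(found)
print(preferred_manager(found))
```

A value of None for a tool means that the command would not be found if you typed it into the terminal.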
Jupyter is a platform for Python development that contains some tools and environments for running Python and has more features than the standard interpreter. It contains the powerful Jupyter Notebook, which allows you to write programs in a web browser. It also formats your code, shows output, and allows you to annotate your scripts. It is a great tool for exploring datasets and we will be using it as our main environment for the code in this book.
To install the Jupyter Notebook on your computer, you can type the following into a command line prompt (not into Python):
$ conda install jupyter notebook
You will not need administrator privileges to install this, as Anaconda keeps packages in the user's directory.
With the Jupyter Notebook installed, you can launch it with the following:
$ jupyter notebook
Running this command will do two things. First, it will create a Jupyter Notebook instance - the backend - that will run in the command prompt you just used. Second, it will launch your web browser and connect to this instance, allowing you to create a new notebook. It will look something like the following screenshot (where you need to replace /home/bob with your current working directory):
To stop the Jupyter Notebook from running, open the command prompt that has the instance running (the one you used earlier to run the jupyter notebook command). Then, press Ctrl + C and you will be prompted with "Shutdown this notebook server (y/[n])?" Type y and press Enter and the Jupyter Notebook will shut down.
The scikit-learn package is a machine learning library, written in Python (but also containing code in other languages). It contains numerous algorithms, datasets, utilities, and frameworks for performing machine learning. Scikit-learn is built upon the scientific Python stack, including libraries such as NumPy and SciPy for speed. Scikit-learn is fast and scalable in many instances and useful for all skill ranges from beginners to advanced research users. We will cover more details of scikit-learn in Chapter 2, Classifying with scikit-learn Estimators.
To install scikit-learn, you can use the conda utility that comes with Anaconda, which will also install the NumPy and SciPy libraries if you do not already have them. Open a terminal with administrator/root privileges and enter the following command:
$ conda install scikit-learn
Users of major Linux distributions such as Ubuntu or Red Hat may wish to install the official package from their package manager.
Note
Not all distributions have the latest versions of scikit-learn, so check the version before installing it. The minimum version needed for this book is 0.14. My recommendation for this book is to use Anaconda to manage this for you, rather than installing using your system's package manager.
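To check an installed copy against the 0.14 minimum, you can compare the version string that scikit-learn reports in sklearn.__version__. This is a minimal sketch with a deliberately simple parser; the names version_tuple, meets_minimum, and MINIMUM are hypothetical, and the code prints a message rather than failing if scikit-learn is not installed.

```python
MINIMUM = (0, 14)  # minimum scikit-learn version stated in the text


def version_tuple(version):
    """Turn a string such as '0.14.1' into a tuple of ints such as (0, 14, 1).

    Parsing stops at the first non-numeric part, so '0.18.dev0' gives (0, 18).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix such as 'rc1' ends the numeric part
    return tuple(parts)


def meets_minimum(version, minimum=MINIMUM):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= minimum


try:
    import sklearn
except ImportError:
    print("scikit-learn is not installed")
else:
    print(sklearn.__version__, meets_minimum(sklearn.__version__))
```

The tuple comparison avoids the pitfall of comparing version strings directly, where "0.9" would sort after "0.14".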
Those wishing to install the latest version by compiling the source, or to view more detailed installation instructions, can go to http://scikit-learn.org/stable/install.html and refer to the official documentation on installing scikit-learn.