Welcome! Let's get started. Python has become one of the de facto standard languages and platforms for data analysis and data science. The mind map that you will see shortly depicts some of the numerous libraries available in the Python ecosystem that are used by data analysts and data scientists. The NumPy, SciPy, Pandas, and Matplotlib libraries lay the foundation of Python data analysis and are now part of SciPy Stack 1.0 (http://www.scipy.org/stackspec.html). We will learn how to install SciPy Stack 1.0 and Jupyter Notebook, and write some simple data analysis code as a warm-up exercise.
The following are the libraries available in the Python ecosystem that are used by data analysts and data scientists:
NumPy: This is a general-purpose library that provides numerical arrays, and functions to manipulate the arrays efficiently.
SciPy: This is a scientific computing library that provides science and engineering related functions. SciPy supplements and slightly overlaps NumPy. NumPy and SciPy historically shared their code base but were later separated.
Pandas: This is a data-manipulation library that provides data structures and operations for manipulating tables and time series data.
Matplotlib: This is a 2D plotting library that provides support for producing plots, graphs, and figures. Matplotlib is used by SciPy and supports NumPy.
IPython: This provides a powerful interactive shell for Python, a kernel for Jupyter, and support for interactive data visualization. We will cover the IPython shell later in this chapter.
Jupyter Notebook: This provides a web-based interactive shell for creating and sharing documents with live code and visualizations. Jupyter Notebook supports multiple versions of Python through the kernel provided by IPython. We will cover the Jupyter Notebook later in this chapter.
Installation instructions for the other required software will be given throughout the book at the appropriate time. At the end of this chapter, you will find pointers on how to find additional information online if you get stuck or are uncertain about the best way of solving problems.

In this chapter, we will cover the following topics:
Installing Python 3
Using IPython as a shell
Reading manual pages
Jupyter Notebook
NumPy arrays
A simple application
Where to find help and references
Listing modules inside the Python libraries
Visualizing data using Matplotlib
The software used in this book is based on Python 3, so you need to have Python 3 installed. On some operating systems, Python 3 is already installed. There are many implementations of Python, including commercial implementations and distributions. In this book, we will focus on the standard Python implementation, which is guaranteed to be compatible with NumPy.
Note
You can download Python 3.5.x from https://www.python.org/downloads/. On this web page, you can find installers for Windows and Mac OS X, as well as source archives for Linux, Unix, and Mac OS X. You can find instructions for installing and using Python for various operating systems at https://docs.python.org/3/using/index.html.
The software we will install in this chapter has binary installers for Windows, various Linux distributions, and Mac OS X. There are also source distributions, if you prefer. You need to have Python 3.5.x or above installed on your system. The sunset date for Python 2.7 was moved from 2015 to 2020, so Python 2.7 will be supported and maintained until 2020. For these reasons, we have updated this book for Python 3.
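As a quick sanity check before installing anything, the version requirement above can be tested from Python itself. The following is a minimal sketch; the helper name meets_minimum and the constant MINIMUM_VERSION are hypothetical names chosen for illustration, not part of any library.

```python
import sys

# Minimum version taken from the text above: "Python 3.5.x or above".
MINIMUM_VERSION = (3, 5)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is running this script.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets the 3.5+ requirement:", meets_minimum())
```

Comparing tuples rather than version strings avoids mistakes such as treating 3.10 as older than 3.5.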
We will learn how to install and set up NumPy, SciPy, Pandas, Matplotlib, IPython, and Jupyter Notebook on Windows, Linux, and Mac OS X. Let's look at the process in detail. We shall use pip3 to install the libraries. From version 3.4 onwards, pip3 has been included by default with the Python installation.
To install the foundational libraries, run the following command line instruction:
It may be necessary to prepend sudo to this command if your current user doesn't have sufficient rights on your system.
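Before or after running pip3, it can be useful to see which of the foundational libraries are already importable. The sketch below uses only the standard library. The mapping of import names to pip package names is an assumption made for this example, and the helpers missing_packages and pip_command are hypothetical; the script only builds a suggested command and does not run it.

```python
import importlib.util
import sys

# Assumed mapping: import name -> name passed to pip. Adjust as needed.
FOUNDATIONAL = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "IPython": "ipython",
    "notebook": "notebook",
}


def missing_packages(packages=FOUNDATIONAL):
    """Return the pip names of packages that cannot be found for import."""
    return [
        pip_name
        for import_name, pip_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


def pip_command(pip_names):
    """Build (but do not run) a pip install command for this interpreter."""
    return [sys.executable, "-m", "pip", "install"] + list(pip_names)


missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
    print("Suggested command:", " ".join(pip_command(missing)))
else:
    print("All foundational libraries are importable.")
```

Invoking pip through the current interpreter (the -m pip form) ensures the packages are installed for the same Python that performed the check, which matters on systems with several Python versions.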
At the time of writing this book, we had the following software installed as a prerequisite on our Windows 10 virtual machine:
Python 3.6 from https://www.python.org/ftp/python/3.6.0/python-3.6.0-amd64.exe
Microsoft Visual C++ Build Tools 2015 from http://landinghub.visualstudio.com/visual-cpp-build-tools
Download and install the appropriate prebuilt NumPy and SciPy binaries for your Windows platform from http://www.lfd.uci.edu/~gohlke/pythonlibs/:
We downloaded numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and scipy-0.18.1-cp36-cp36m-win_amd64.whl.
After downloading, we executed the pip3 install Downloads\numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl and pip3 install Downloads\scipy-0.18.1-cp36-cp36m-win_amd64.whl commands.
After these prerequisites are installed, to install the rest of the foundational libraries, run the following command line instruction:
Data analysts, data scientists, and engineers are used to experimenting. IPython was created by scientists with experimentation in mind. The interactive environment that IPython provides is comparable to the interactive computing environments provided by MATLAB, Mathematica, and Maple.
The following is a list of features of the IPython shell:
Tab completion, which helps you find a command
History mechanism
Inline editing
Ability to call external Python scripts with %run
Access to system commands
Access to the Python debugger and profiler
The following list describes how to use the IPython shell:
Starting a session: To start a session with IPython, enter the following instruction on the command line:
Saving a session: We might want to be able to go back to our experiments. In IPython, it is easy to save a session for later use with the following command:
Logging can be switched off as follows:
Executing a system shell command: Execute a system shell command in the default IPython profile by prefixing the command with the ! symbol. For instance, the following input will get the current date:
In fact, any line prefixed with ! is sent to the system shell. We can also store the command output, as shown here:
Displaying history: We can show the history of our commands with the %hist command. For example:
This is a common feature in command line interface (CLI) environments. We can also search through the history with the -g switch as follows:
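Outside of IPython, the idea of sending a command to the system and storing its output can be approximated in plain Python with the standard subprocess module. This is only a rough analogue of the ! prefix, not how IPython implements it. The helper name run_and_capture is hypothetical, and the current interpreter is used as a portable stand-in for a system command such as date.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run `command` (a list of arguments) and return its output as lines."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,      # capture standard output
        universal_newlines=True,     # decode bytes to str
        check=True,                  # raise if the command fails
    )
    return completed.stdout.splitlines()


# Stand-in for a system command: ask the interpreter to print one line.
lines = run_and_capture([sys.executable, "-c", "print('hello from the shell')"])
print(lines)
```

The returned value is a list of output lines, which makes it easy to loop over or filter the result in later code.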
We saw a number of so-called magic functions in action. These functions start with the % character. If the magic function is used on a line by itself, the % prefix is optional.