Python is an open source, general-purpose language created by Guido van Rossum in the late 1980s. It is widely used by system administrators and developers for many purposes, such as automating routine tasks or creating a web server. Python is a flexible and powerful language, yet it is simple enough to be taught to schoolchildren with great success.
In the past few years, Python has also emerged as one of the leading open platforms for data science and high-performance numerical computing. This might seem surprising as Python was not originally designed for scientific computing. Python's interpreted nature makes it much slower than lower-level languages like C or Fortran, which are more amenable to number crunching and the efficient implementation of complex mathematical algorithms.
However, the performance of these low-level languages comes at a cost: they are hard to use and they require advanced knowledge of how computers work. In the late 1990s, several scientists began investigating the possibility of using Python for numerical computing by interoperating it with mainstream C/Fortran scientific libraries. This would bring together the ease-of-use of Python with the performance of C/Fortran: the dream of any scientist!
Consequently, the past 15 years have seen the development of widely used libraries such as NumPy (providing a practical array data structure), SciPy (scientific computing), matplotlib (graphical plotting), pandas (data analysis and statistics), scikit-learn (machine learning), SymPy (symbolic computing), and Jupyter/IPython (efficient interfaces for interactive computing). Python, along with this set of libraries, is sometimes referred to as the SciPy stack or PyData platform.
Tip
Competing platforms
Python has several competitors. For example, MATLAB (by MathWorks) is commercial software focusing on numerical computing that is widely used in scientific research and engineering. SPSS (by IBM) is commercial software for statistical analysis. Python, however, is free and open source, and that's one of its greatest strengths. Alternative open source platforms include R (specialized in statistics) and Julia (a young language for high-performance numerical computing).
More recently, this platform has gained popularity in non-academic communities such as finance, engineering, statistics, and data science.
This book provides a solid introduction to the whole platform by focusing on one of its main components: Jupyter/IPython.
IPython was created in 2001 by Fernando Perez (the I in IPython stands for "interactive"). It was originally meant to be a convenient command-line interface to the scientific Python platform. In scientific computing, trial and error is the rule rather than the exception, and this requires an efficient interface that allows for interactive exploration of algorithms, data, and graphs.
In 2011, IPython introduced the interactive Notebook. Inspired by commercial software such as Maple (by Maplesoft) or Mathematica (by Wolfram Research), the Notebook runs in a browser and provides a unified web interface where code, text, mathematical equations, plots, graphics, and interactive graphical controls can be combined into a single document. This is an ideal interface for scientific computing. Here is a screenshot of a notebook:

Example of a notebook
It quickly became clear that this interface could be used with languages other than Python such as R, Julia, Lua, Ruby, and many others. Further, the Notebook is not restricted to scientific computing: it can be used for academic courses, software documentation, or book writing thanks to conversion tools targeting Markdown, HTML, PDF, ODT, and many other formats. Therefore, the IPython developers decided in 2014 to acknowledge the general-purpose nature of the Notebook by giving a new name to the project: Jupyter.
Jupyter features a language-independent Notebook platform that can work with a variety of kernels. Implemented in any language, a kernel is the backend of the Notebook interface. It manages the interactive session, the variables, the data, and so on. By contrast, the Notebook interface is the frontend of the system. It manages the user interface, the text editor, the plots, and so on. IPython is henceforth the name of the Python kernel for the Jupyter Notebook. Other kernels include IR, IJulia, ILua, IRuby, and many others (50 at the time of this writing).
In August 2015, the IPython/Jupyter developers achieved the "Big Split" by splitting the previous monolithic IPython codebase into a set of smaller projects, including the language-independent Jupyter Notebook (see https://blog.jupyter.org/2015/08/12/first-release-of-jupyter/). For example, the parallel computing features of IPython are now implemented in a standalone Python package named ipyparallel, the IPython widgets are implemented in ipywidgets, and so on. This separation makes the code of the project more modular and facilitates third-party contributions. IPython itself is now a much smaller project than before since it only features the interactive Python terminal and the Python kernel for the Jupyter Notebook.
Note
You will find the list of changes in IPython 4.0 at http://ipython.readthedocs.org/en/latest/whatsnew/version4.html. Many internal IPython imports have been deprecated due to the code reorganization. Warnings are raised if you attempt to perform a deprecated import. Also, the profiles have been removed and replaced with a single default profile. However, you can simulate this functionality with environment variables. You will find more information at http://jupyter.readthedocs.org.
This book covers the Jupyter Notebook 1.0 and focuses on its Python kernel, IPython 4.0. In this chapter, we will introduce the platform, the Python language, the Jupyter Notebook interface, and IPython. In the remaining chapters, we will cover data analysis and scientific computing in Jupyter/IPython with the help of mainstream scientific libraries such as NumPy, pandas, and matplotlib.
Note
This book gives you a solid introduction to Jupyter and the SciPy platform. The IPython Interactive Computing and Visualization Cookbook (http://ipython-books.github.io/cookbook/) is the sequel to this introductory-level book. In 15 chapters and more than 500 pages, it contains a hundred recipes covering a wide range of interactive numerical computing techniques and data science topics. The IPython Cookbook is an excellent addition to the present IPython minibook if you're interested in delving into the platform in much greater detail.
Here are a few references about IPython and the Notebook:
The main Jupyter page at: http://jupyter.org/
The main Jupyter documentation at: https://jupyter.readthedocs.org/en/latest/
The main IPython page at: http://ipython.org/
Jupyter on GitHub at: https://github.com/jupyter
Try Jupyter online at: https://try.jupyter.org/
The IPython Notebook in research, a Nature note at http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261
Although Python is an open-source, cross-platform language, installing it with the usual scientific packages used to be overly complicated. Fortunately, there is now an all-in-one scientific Python distribution, Anaconda (by Continuum Analytics), that is free, cross-platform, and easy to install. Anaconda comes with Jupyter and all of the scientific packages we will use in this book. There are other distributions and installation options (like Canopy, WinPython, Python(x, y), and others), but for the purpose of this book we will use Anaconda throughout.
Tip
Running Jupyter in the cloud
You can also use Jupyter directly from your web browser, without installing anything on your local computer: go to http://try.jupyter.org. Note that the notebooks created there are not saved. Let's also mention a similar service, Wakari (https://wakari.io), by Continuum Analytics.
Anaconda comes with a package manager named conda, which lets you manage your Python distribution and install new packages.
Tip
Miniconda
Miniconda (http://conda.pydata.org/miniconda.html) is a light version of Anaconda that gives you the ability to only install the packages you need.
The first step is to download Anaconda from Continuum Analytics' website (http://continuum.io/downloads). This is actually not the easiest part since several versions are available. Three properties define a particular version:
The operating system (OS): Linux, Mac OS X, or Windows. This will depend on the computer you want to install Python on.
32-bit or 64-bit: You want the 64-bit version, unless you're on an old or low-end computer. The 64-bit version will allow you to manipulate large datasets.
The version of Python: 2.7, or 3.4 (or later). In this book, we will use Python 3.4. You can also use Python 3.5 (released in September 2015), which introduces many features, including a new @ operator for matrix multiplication. However, it is easy to temporarily switch to a Python 2.7 environment with Anaconda if necessary (see the next section).
Note
Python 3 brought a few backward-incompatible changes over Python 2 (also known as Legacy Python). This is why many people are still using Python 2.7 at this time, even though Python 3 was released in 2008. We will use Python 3 in this book, and we recommend that newcomers learn Python 3. If you need to use legacy Python code that hasn't yet been updated to Python 3, you can use conda to temporarily switch to a Python 2 interpreter.
Once you have found the right link for your OS and Python 3 64-bit, you can download the package. You should then find it in your downloads directory (depending on your OS and your browser's settings).
The Anaconda installer comes in different flavors depending on your OS, as follows:
Linux: The Linux installer is a bash .sh script. Run it with a command like bash Anaconda3-2.3.0-Linux-x86_64.sh (if necessary, replace the filename by the one you downloaded).
Mac: The Mac graphical installer is a .pkg file that you can run with a double-click.
Windows: The Windows graphical installer is an .exe file that you can run with a double-click.
Then, follow the instructions to install Anaconda on your computer. Here are a few remarks:
You don't need administrator rights to install Anaconda. In most cases, you can choose to install it in your personal user account.
Choose to put Anaconda in your system path, so that Anaconda's Python is the system default.
Note
Anaconda comes with a graphical launcher that you can use to start IPython, manage environments, and so on. You will find more details at http://docs.continuum.io/anaconda-launcher/
Before you get started with Anaconda, there are a few things you need to know:
Opening a terminal
Finding your home directory
Manipulating your system path
You can skip this section if you already know how to do these things.
A terminal is a command-line application that lets you interact with your computer by typing commands with the keyboard, instead of clicking on windows with the mouse. While most computer users only know Graphical User Interfaces, developers and scientists generally need to know how to use the command-line interface for advanced usage. To use the command-line interface, follow the instructions that are specific to your OS:
On Windows, you can use PowerShell. Press the Windows + R keys, type powershell in the Run box, and press Enter. You will find more information about PowerShell at https://blog.udemy.com/powershell-tutorial/. Alternatively, you can use the older Windows terminal by typing cmd in the Run box.
On OS X, you can open the Terminal application, for example by pressing Cmd + Space, typing terminal, and pressing Enter.
On Linux, you can open the Terminal from your application manager.
In a terminal, use the cd /path/to/directory command to move to a given directory. For example, cd ~ moves to your home directory, which is introduced in the next section.
Your home directory is specific to your user account on your computer. It generally contains your applications' settings. It is often referred to as ~.
Depending on the OS, the location of the home directory is as follows:
On Windows, its location is C:\Users\YourName\ where YourName is the name of your account.
On OS X, its location is /Users/YourName/ where YourName is the name of your account.
On Linux, its location is generally /home/yourname/ where yourname is the name of your account.
For example, the directory ~/anaconda3 refers to C:\Users\YourName\anaconda3\ on Windows and /home/yourname/anaconda3/ on Linux.
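The home directory can also be located from Python itself. Here is a minimal sketch using only the standard library; the anaconda3 folder name is just the example used above and may not exist on your computer.

```python
import os
from pathlib import Path

# Path.home() returns the current user's home directory on any OS.
home = Path.home()
print(home)

# expanduser() replaces a leading ~ with the home directory.
# "anaconda3" is only an illustrative folder name; it may not exist.
anaconda_dir = Path(os.path.expanduser("~/anaconda3"))
print(anaconda_dir)
print(anaconda_dir.exists())
```

The printed locations depend on your OS and account name, so no particular output is expected here.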
The system path is a global variable (also called an environment variable) defined by your operating system with the list of directories where executable programs are located. If you type a command like python in your terminal, you generally need to have a python (or python.exe on Windows) executable in one of the directories listed in the system path. If that's not the case, an error may be raised.
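To make this more concrete, here is a small sketch that lists the directories of the system path and asks where the python command would be found. It only uses the standard library (os and shutil); the output depends entirely on your own installation.

```python
import os
import shutil

# The system path is stored in the PATH environment variable.
# Directories are separated by os.pathsep (":" on Linux/OS X, ";" on Windows).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_dirs:
    print(directory)

# shutil.which() searches the system path for an executable and returns
# its location, or None if no such executable is found.
python_location = shutil.which("python")
print(python_location)
```

If python_location is None, no python executable is reachable from the system path, which corresponds to the error situation described above.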
You can manually add directories to your system path as follows:
On Windows, press the Windows + R keys, type rundll32.exe sysdm.cpl,EditEnvironmentVariables, and press Enter. You can then edit the PATH variable and append ;C:\path\to\directory if you want to add that directory. You will find more detailed instructions at http://www.computerhope.com/issues/ch000549.htm.
On OS X, edit or create the file ~/.bash_profile and add export PATH="$PATH:/path/to/directory" at the end of the file.
On Linux, edit or create the file ~/.bashrc and add export PATH="$PATH:/path/to/directory" at the end of the file.
To test Anaconda once it has been installed, open a terminal and type python. This opens a Python console, not to be confused with the OS terminal. The Python console is identified with a >>> prompt string, whereas the OS terminal is identified with a $ (Linux/OS X) or > (Windows) prompt string. These strings are displayed in the terminal, often preceded by your computer's name, your login, and the current directory (for example, yourname@computer:~$ on Linux or PS C:\Users\YourName> on Windows). You can type commands after the prompt string. After typing python, you should see something like the following:
$ python
Python 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Jun 4 2015, 15:29:08)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
What matters is that Anaconda or Continuum Analytics is mentioned here. Otherwise, typing python might have launched your system's default Python, which is not the one you want to use in this book.
If you have this problem, you may need to add the path to the Anaconda executables to your system path. For example, this path will be ~/anaconda3/bin if you chose to install Anaconda in ~/anaconda3. The bin directory contains Anaconda executables including python.
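The same check can be done from inside the Python console. The following sketch prints the location of the running interpreter and looks for the words mentioned above in the version banner. The looks_like_anaconda name is purely illustrative, and the test is only a heuristic: it assumes a banner like the one shown above, and other builds may label themselves differently.

```python
import sys

# Full path of the interpreter that is currently running.
print(sys.executable)

# The version banner, similar to the first lines printed by the console.
print(sys.version)

# Heuristic (an assumption, not an official check): look for the
# distribution's name in the banner.
looks_like_anaconda = any(
    word in sys.version for word in ("Anaconda", "Continuum")
)
print(looks_like_anaconda)
```

If sys.executable does not point inside your Anaconda directory, the system path probably needs to be adjusted as described above.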
If you have any problem installing and testing Anaconda, you can ask for help on the mailing list (see the link in the References section under the Installing Python with Anaconda section of this chapter).
Next, exit the Python prompt by typing exit() and pressing Enter.
Anaconda lets you create different isolated Python environments. For example, you can have a Python 2 distribution for the rare cases where you need to temporarily switch to Python 2.
To create a new environment for Python 2, type the following command in an OS terminal:
$ conda create -n py2 anaconda python=2.7
This will create a new isolated environment named py2 based on the original Anaconda distribution, but with Python 2.7.
You could also use the command conda env: type conda env -h to see the details.
You can now activate your py2 environment by typing the following command in a terminal:
Windows: activate py2 (note that you might have problems with PowerShell, see https://github.com/conda/conda/issues/626; in that case, use the old cmd terminal)
Linux and Mac OS X: source activate py2
Now, you should see a (py2) prefix in front of your terminal prompt. Typing python in your terminal with the py2 environment activated will open a Python 2 interpreter.
Type deactivate on Windows or source deactivate on Linux/OS X to deactivate the environment in the terminal.
Here is a list of common commands:
conda help: Displays the list of conda commands.
conda list: Lists all packages installed in the current environment.
conda env list: Displays the list of environments installed. The currently active one is marked by a star *.
conda install somepackage: Installs a Python package (replace somepackage by the name of the package you want to install).
conda install somepackage=0.7: Installs a specific version of a package.
conda update somepackage: Updates a Python package to the latest available version.
conda update anaconda: Updates the Anaconda distribution (the anaconda metapackage and the package versions it specifies).
conda update conda: Updates conda itself.
conda update --all: Updates all packages.
conda remove somepackage: Uninstalls a Python package.
conda remove -n myenv --all: Removes the environment named myenv (replace this by the name of the environment you want to uninstall).
conda clean -t: Removes the old tarballs that are left over after installation and updates.
Some commands ask for confirmation (you need to press y to confirm). You can also use the -y option to avoid the confirmation prompt.
If conda install somepackage fails, you can try pip install somepackage instead. This will use the Python Package Index (PyPI) instead of Anaconda. Many scientific Anaconda packages are easier to install than the corresponding PyPI packages because they are precompiled for your platform. However, many packages are available on PyPI but not on Anaconda.
Here are some references:
pip documentation at https://pip.pypa.io/en/stable/
PyPI repository at https://pypi.python.org/pypi
Here are a few references about Anaconda:
Continuum Analytics' website: http://continuum.io/
Anaconda main page: https://store.continuum.io/cshop/anaconda/
Anaconda downloads: http://continuum.io/downloads
List of Anaconda packages: http://docs.continuum.io/anaconda/pkg-docs
Conda main page: http://conda.io/
Anaconda mailing list: https://groups.google.com/a/continuum.io/forum/#!forum/anaconda
Continuum Analytics Twitter account at https://twitter.com/ContinuumIO
Conda FAQ: http://conda.pydata.org/docs/faq.html
Curated list of Python packages at http://awesome-python.com/
All of this book's code is available on GitHub as notebooks. We recommend that you download the notebooks and experiment with them as you're working through the book.
Note
GitHub is a popular online service that hosts open source projects. It is based on the Git Distributed Version Control System (DVCS). Git keeps track of file changes and enables collaborative work on a given project. Learning a version control system like Git is highly recommended for all programmers. Not using a version control system when working with code or even text documents is now considered bad practice. You will find several references at https://help.github.com/articles/good-resources-for-learning-git-and-github/. The IPython Cookbook also contains several recipes about Git and best interactive programming practices.
Here is how to download the book's notebooks:
Install git: http://git-scm.com/downloads.
Check your git installation: Open a new OS terminal and type git version. You should see the version of git and not an error message.
Type the following command (this is a single line):
$ git clone https://github.com/ipython-books/minibook-2nd-code.git "$HOME/minibook"
This will download the very latest version of the code into a minibook subdirectory in your home directory. You can also choose another directory.
From this directory, you can update to the latest version at any time by typing git pull.
Originally, IPython provided an enhanced command-line console to run Python code interactively. The Jupyter Notebook is a more recent and more sophisticated alternative to the console. Today, both tools are available, and we recommend that you learn to use both.
To run the IPython console, type ipython in an OS terminal. There, you can write Python commands and see the results instantly. Here is a screenshot:

IPython console
The IPython console is most convenient when you have a command-line-based workflow and you want to execute some quick Python commands.
You can exit the IPython console by typing exit.
Note
Let's mention the Qt console, which is similar to the IPython console but offers additional features such as multiline editing, enhanced tab completion, image support, and so on. The Qt console can also be integrated within a graphical application written with Python and Qt. See http://jupyter.org/qtconsole/stable/ for more information.
To run the Jupyter Notebook, open an OS terminal, go to ~/minibook/ (or into the directory where you've downloaded the book's notebooks), and type jupyter notebook. This will start the Jupyter server and open a new window in your browser (if that's not the case, go to the following URL: http://localhost:8888). Here is a screenshot of Jupyter's entry point, the Notebook dashboard:

The Notebook dashboard
Note
At the time of writing, the following browsers are officially supported: Chrome 13 and greater; Safari 5 and greater; and Firefox 6 or greater. Other browsers may work also. Your mileage may vary.
The Notebook is most convenient when you start a complex analysis project that will involve a substantial amount of interactive experimentation with your code. Other common use-cases include keeping track of your interactive session (like a lab notebook), or writing technical documents that involve code, equations, and figures.
In the rest of this section, we will focus on the Notebook interface.
The dashboard contains several tabs:
Files: shows all files and notebooks in the current directory
Running: shows all kernels currently running on your computer
Clusters: lets you launch kernels for parallel computing (covered in Chapter 5, High-Performance and Parallel Computing)
A notebook is an interactive document containing code, text, and other elements. A notebook is saved in a file with the .ipynb extension. This file is a plain text file storing a JSON data structure.
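To give an idea of what such a JSON structure looks like, here is a hand-written sketch built with Python's standard json module. The field names reflect our understanding of version 4 of the notebook format and should be treated as an assumption; real notebooks written by Jupyter contain more metadata.

```python
import json

# A hand-written, minimal notebook structure (assumed nbformat 4 layout).
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A title"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello world!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

# Serialize to the JSON text that would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back gives the same data structure.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain text, a notebook can be inspected in any text editor and tracked by a version control system.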
A kernel is a process running an interactive session. When using IPython, this kernel is a Python process. There are kernels in many languages other than Python.
Note
We follow the convention of using the term notebook for a file, and Notebook for the application and the web interface.
In Jupyter, notebooks and kernels are strongly separated. A notebook is a file, whereas a kernel is a process. The kernel receives snippets of code from the Notebook interface, executes them, and sends the outputs and possible errors back to the Notebook interface. Thus, in general, the kernel has no notion of a Notebook. A notebook is persistent (it's a file), whereas a kernel may be closed at the end of an interactive session and it is therefore not persistent. When a notebook is re-opened, it needs to be re-executed.
In general, no more than one Notebook interface can be connected to a given kernel. However, several IPython consoles can be connected to a given kernel.
To create a new notebook, click on the New button, and select Notebook (Python 3). A new browser tab opens and shows the Notebook interface as follows:

A new notebook
Here are the main components of the interface, from top to bottom:
The notebook name, which you can change by clicking on it. This is also the name of the .ipynb file.
The Menu bar gives you access to several actions pertaining to either the notebook or the kernel.
To the right of the menu bar is the Kernel name. You can change the kernel language of your notebook from the Kernel menu. We will see in Chapter 6, Customizing IPython, how to manage different kernel languages.
The Toolbar contains icons for common actions. In particular, the dropdown menu showing Code lets you change the type of a cell.
Following is the main component of the UI: the actual Notebook. It consists of a linear list of cells. We will detail the structure of a cell in the following sections.
There are two main types of cells: Markdown cells and code cells, and they are described as follows:
A Markdown cell contains rich text. In addition to classic formatting options like bold or italics, we can add links, images, HTML elements, LaTeX mathematical equations, and more. We will cover Markdown in more detail in the Ten Jupyter/IPython essentials section of this chapter.
A code cell contains code to be executed by the kernel. The programming language corresponds to the kernel's language. We will only use Python in this book, but you can use many other languages.
You can change the type of a cell by first clicking on a cell to select it, and then choosing the cell's type in the toolbar's dropdown menu showing Markdown or Code.
Here is a screenshot of a Markdown cell:

A Markdown cell
The top panel shows the cell in edit mode, while the bottom one shows it in render mode. The edit mode lets you edit the text, while the render mode lets you display the rendered cell. We will explain the differences between these modes in greater detail in the following section.
Here is a screenshot of a complex code cell:

Structure of a code cell
This code cell contains several parts, as follows:
The Prompt number shows the cell's number. This number increases every time you run the cell. Since you can run cells of a notebook out of order, nothing guarantees that code numbers are linearly increasing in a given notebook.
The Input area contains a multiline text editor that lets you write one or several lines of code with syntax highlighting.
The Widget area may contain graphical controls; here, it displays a slider.
The Output area can contain multiple outputs, here:
Standard output (text in black)
Error output (text with a red background)
Rich output (an HTML table and an image here)
The Notebook implements a modal interface similar to some text editors such as vim. Mastering this interface may represent a small learning curve for some users.
Use the edit mode to write code (the selected cell has a green border, and a pen icon appears at the top right of the interface). Click inside a cell to enable the edit mode for this cell (you need to double-click with Markdown cells).
Use the command mode to operate on cells (the selected cell has a gray border, and there is no pen icon). Click outside the text area of a cell to enable the command mode (you can also press the Esc key).
Keyboard shortcuts are available in the Notebook interface. Type h to show them. We review here the most common ones (for Windows and Linux; shortcuts for OS X may be slightly different).
Here are a few keyboard shortcuts that are always available when a cell is selected:
Ctrl + Enter: run the cell
Shift + Enter: run the cell and select the cell below
Alt + Enter: run the cell and insert a new cell below
Ctrl + S: save the notebook
In the edit mode, you can type code as usual, and you have access to the following keyboard shortcuts:
Esc: switch to command mode
Ctrl + Shift + -: split the cell
In the command mode, keystrokes are bound to cell operations. Don't write code in command mode or unexpected things will happen! For example, typing dd in command mode will delete the selected cell! Here are some keyboard shortcuts available in command mode:
Enter: switch to edit mode
↑ or k: select the previous cell
↓ or j: select the next cell
y / m: change the cell type to code cell/Markdown cell
a / b: insert a new cell above/below the current cell
x / c / v: cut/copy/paste the current cell
dd: delete the current cell
z: undo the last delete operation
Shift + =: merge the cell below
h: display the help menu with the list of keyboard shortcuts
Spending some time learning these shortcuts is highly recommended.
Main documentation of Jupyter at http://jupyter.readthedocs.org/en/latest/
Jupyter Notebook interface explained at http://jupyter-notebook.readthedocs.org/en/latest/notebook.html
If you don't know Python, read this section to learn the fundamentals. Python is a very accessible language and, if you have ever programmed, it will only take you a few minutes to learn the basics.
Open a new notebook and type the following in the first cell:
In [1]: print("Hello world!")
Out[1]: Hello world!
Here is a screenshot:

"Hello world" in the Notebook
Tip
Prompt string
Note that the convention chosen in this book is to show Python code (also called the input) prefixed with In [x]: (which shouldn't be typed). This is the standard IPython prompt. Here, you should just type print("Hello world!") and then press Shift + Enter.
Congratulations! You are now a Python programmer.
Tip
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You will also find the book's code on this GitHub repository: https://github.com/ipython-books/minibook-2nd-code.
Let's use Python as a calculator.
In [2]: 2 * 2
Out[2]: 4
Here, 2 * 2 is an expression statement. This operation is performed, the result is returned, and IPython displays it in the notebook cell's output.
Tip
Division
In Python 3, 3 / 2 returns 1.5 (floating-point division), whereas it returns 1 in Python 2 (integer division). This can be a source of errors when porting Python 2 code to Python 3. It is recommended to always use the explicit 3.0 / 2.0 for floating-point division (by using floating-point numbers) and 3 // 2 for integer division. Both syntaxes work in Python 2 and Python 3. See http://python3porting.com/differences.html#integer-division for more details.
Other built-in mathematical operators include +, -, ** for exponentiation, and others. You will find more details at https://docs.python.org/3/reference/expressions.html#the-power-operator.
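As a short illustration of these operators under Python 3, the following sketch stores a few results in variables and prints them. The variable names are arbitrary, and the % operator (remainder of the integer division) is one of the "others" not detailed in the text.

```python
power = 2 ** 10      # exponentiation
true_div = 3 / 2     # floating-point division in Python 3
floor_div = 3 // 2   # integer (floor) division
remainder = 7 % 3    # remainder of the integer division

print(power, true_div, floor_div, remainder)  # 1024 1.5 1 1
```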
Variables form a fundamental concept of any programming language. A variable has a name and a value. Here is how to create a new variable in Python:
In [3]: a = 2
And here is how to use an existing variable:
In [4]: a * 3
Out[4]: 6
Several variables can be defined at once (this is called unpacking):
In [5]: a, b = 2, 6
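One common use of this syntax is swapping two variables without a temporary one. Here is a small sketch; the names x, y, and z are only illustrative.

```python
a, b = 2, 6

# Swap the two values in a single statement.
a, b = b, a
print(a, b)  # 6 2

# The number of names on the left must match the number of values.
x, y, z = 1, 2, 3
print(x + y + z)  # 6
```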
There are different types of variables. Here, we have used a number (more precisely, an integer). Other important types include floating-point numbers to represent real numbers, strings to represent text, and booleans to represent True/False values. Here are a few examples:
In [6]: somefloat = 3.1415
        sometext = 'pi is about'  # You can also use double quotes.
        print(sometext, somefloat)  # Display several variables.
Out[6]: pi is about 3.1415
Note how we used the # character to write comments. Whereas Python discards the comments completely, adding comments in the code is important when the code is to be read by other humans (including yourself in the future).
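The types mentioned earlier (integers, floating-point numbers, strings, and booleans) can be inspected with the built-in type() function. This sketch reuses the variables of the previous example and adds two illustrative ones, someint and somebool.

```python
someint = 2
somefloat = 3.1415
sometext = 'pi is about'
somebool = True

# type() returns the type of a value.
print(type(someint))    # <class 'int'>
print(type(somefloat))  # <class 'float'>
print(type(sometext))   # <class 'str'>
print(type(somebool))   # <class 'bool'>
```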
String escaping refers to the ability to insert special characters in a string. For example, how can you insert ' and ", given that these characters are used to delimit a string in Python code? The backslash \ is the go-to escape character in Python (and in many other languages too). Here are a few examples:
In [7]: print("Hello \"world\"")
        print("A list:\n* item 1\n* item 2")
        print("C:\\path\\on\\windows")
        print(r"C:\path\on\windows")
Out[7]: Hello "world"
        A list:
        * item 1
        * item 2
        C:\path\on\windows
        C:\path\on\windows
The special character \n is the new line (or line feed) character. To insert a backslash, you need to escape it, which explains why it needs to be doubled as \\.
You can also disable escaping by using raw literals with an r prefix before the string, like in the last example above. In this case, backslashes are considered as normal characters.
This is convenient when writing Windows paths, since Windows uses backslash separators instead of forward slashes like on Unix systems. A very common error on Windows is forgetting to escape backslashes in paths: writing "C:\path" may lead to subtle errors.
You will find the list of special characters in Python at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals.
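As a small sketch of the escaping rules described above (the folder name new_folder is made up for illustration), the following compares an escaped string with its raw equivalent, and shows how an unescaped backslash can silently change a path:

```python
# Two ways to write the same Windows path.
escaped = "C:\\path\\on\\windows"   # every backslash is doubled
raw = r"C:\path\on\windows"         # raw literal: backslashes kept as-is
print(escaped == raw)

# Without escaping, "\n" is read as a single newline character,
# so the string is one character shorter than intended.
broken = "C:\new_folder"
intended = r"C:\new_folder"
print(len(broken), len(intended))

# Using the other kind of quote avoids escaping altogether.
print('He said "hi"', "It's fine")
```

The length difference is the kind of subtle error mentioned above: the path looks correct in the source code but contains a newline instead of a backslash followed by n.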
A list contains a sequence of items. You can concisely instruct Python to perform repeated actions on the elements of a list. Let's first create a list of numbers as follows:
In [8]: items = [1, 3, 0, 4, 1]
Note the syntax we used to create the list: square brackets []
, and commas , to separate the items.
The built-in function len()
returns the number of elements in a list:
In [9]: len(items)
Out[9]: 5
Note
Python comes with a set of built-in functions, including print()
, len()
, max()
, functional routines like filter()
and map()
, and container-related routines like all()
, any()
, range()
, and sorted()
. You will find the full list of built-in functions at https://docs.python.org/3.4/library/functions.html.
Now, let's compute the sum of all elements in the list. Python provides a built-in function for this:
In [10]: sum(items)
Out[10]: 9
We can also access individual elements in the list, using the following syntax:
In [11]: items[0]
Out[11]: 1
In [12]: items[-1]
Out[12]: 1
Note that indexing starts at 0
in Python: the first element of the list is indexed by 0
, the second by 1
, and so on. Also, -1
refers to the last element, -2
to the penultimate element, and so on.
The same syntax can be used to alter elements in the list:
In [13]: items[1] = 9
         items
Out[13]: [1, 9, 0, 4, 1]
We can access sublists with the following syntax:
In [14]: items[1:3]
Out[14]: [9, 0]
Here, 1:3
represents a slice going from element 1
included (this is the second element of the list) to element 3
excluded. Thus, we get a sublist with the second and third elements of the original list. The first-included/last-excluded asymmetry leads to an intuitive treatment of overlaps between consecutive slices. Also, note that slicing a list creates a new list (a shallow copy of the selected elements), not a view of the original list; changing elements in the sublist does not change them in the original list.
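As a quick check of how slicing behaves with built-in lists, here is a minimal sketch using the same values as the items list above. It modifies a slice, then assigns to a slice of the original list:

```python
items = [1, 9, 0, 4, 1]

sub = items[1:3]        # elements at positions 1 and 2
sub[0] = 100            # modify the sublist only
print(sub)
print(items)            # the original list is left untouched

# Consecutive slices sharing a boundary cover the list without overlap.
print(items[:2] + items[2:] == items)

# Assigning *to* a slice, on the other hand, changes the list in place.
items[1:3] = [7, 8]
print(items)
```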
Python provides several other types of containers:
Tuples are immutable and contain a fixed number of elements:
In [15]: my_tuple = (1, 2, 3)
         my_tuple[1]
Out[15]: 2
Dictionaries contain key-value pairs. They are extremely useful and common:
In [16]: my_dict = {'a': 1, 'b': 2, 'c': 3}
         print('a:', my_dict['a'])
Out[16]: a: 1
In [17]: print(my_dict.keys())
Out[17]: dict_keys(['c', 'a', 'b'])
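Before Python 3.7, there is no guaranteed order in a dictionary, which is why the keys above appear shuffled; since Python 3.7, dictionaries preserve insertion order. On older versions, the native collections module provides an OrderedDict structure that keeps the insertion order (see https://docs.python.org/3.4/library/collections.html).
Sets, like mathematical sets, contain distinct elements: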
In [18]: my_set = set([1, 2, 3, 2, 1])
         my_set
Out[18]: {1, 2, 3}
Note
A Python object is mutable if its value can change after it has been created. Otherwise, it is immutable. For example, a string is immutable; to change it, a new string needs to be created. A list, a dictionary, or a set is mutable; elements can be added or removed. By contrast, a tuple is immutable, and it is not possible to change the elements it contains without recreating the tuple. See https://docs.python.org/3.4/reference/datamodel.html for more details.
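The distinction between mutable and immutable objects can be illustrated with a short sketch (the variable names are arbitrary):

```python
my_list = [1, 2, 3]
my_list.append(4)            # a list is mutable: it is modified in place

my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 10         # a tuple is immutable: this raises TypeError
except TypeError as e:
    print("Cannot modify a tuple:", e)

text = 'hello'
upper = text.upper()         # a string is immutable: a new string is returned
print(text, upper)           # the original string is unchanged
```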
We can run through all elements of a list using a for
loop:
In [19]: for item in items:
             print(item)
Out[19]: 1
         9
         0
         4
         1
There are several things to note here:
The for item in items syntax means that a temporary variable named item is created at every iteration. This variable contains the value of every item in the list, one at a time.
Note the colon : at the end of the for statement. Forgetting it will lead to a syntax error!
The statement print(item) will be executed for all items in the list.
Note the four spaces before print: this is called the indentation. You will find more details about indentation in the next subsection.
Python supports a concise syntax to perform a given operation on all elements of a list, as follows:
In [20]: squares = [item * item for item in items]
         squares
Out[20]: [1, 81, 0, 16, 1]
This is called a list comprehension. A new list is created here; it contains the squares of all numbers in the list. This concise syntax leads to highly readable and Pythonic code.
Indentation refers to the spaces that may appear at the beginning of some lines of code. This is a particular aspect of Python's syntax.
In most programming languages, indentation is optional and is generally used to make the code visually clearer. But in Python, indentation also has a syntactic meaning. Particular indentation rules need to be followed for Python code to be correct.
In general, there are two ways to indent some text: by inserting a tab character (also referred to as \t
), or by inserting a number of spaces (typically, four). It is recommended to use spaces instead of tab characters. Your text editor should be configured such that the Tab key on the keyboard inserts four spaces instead of a tab character.
In the Notebook, indentation is automatically configured properly; so you shouldn't worry about this issue. The question only arises if you use another text editor for your Python code.
Finally, what is the meaning of indentation? In Python, indentation delimits coherent blocks of code, for example, the contents of a loop, a conditional branch, a function, and other objects. Where other languages such as C or JavaScript use curly braces to delimit such blocks, Python uses indentation.
Sometimes, you need to perform different operations on your data depending on some condition. For example, let's display all even numbers in our list:
In [21]: for item in items:
             if item % 2 == 0:
                 print(item)
Out[21]: 0
         4
Again, here are several things to note:
An if statement is followed by a boolean expression.
If a and b are two integers, the modulo operator a % b returns the remainder from the division of a by b. Here, item % 2 is 0 for even numbers, and 1 for odd numbers.
The equality is represented by a double equal sign == to avoid confusion with the assignment operator = that we use when we create variables.
Like with the for loop, the if statement ends with a colon :.
The part of the code that is executed when the condition is satisfied follows the if statement. It is indented. Indentation is cumulative: since this if is inside a for loop, there are eight spaces before the print(item) statement.
Python supports a concise syntax to select all elements in a list that satisfy certain properties. Here is how to create a sublist with only even numbers:
In [22]: even = [item for item in items if item % 2 == 0]
         even
Out[22]: [0, 4]
This is also a form of list comprehension.
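To make the link between loops, conditions, and list comprehensions explicit, the following sketch builds the same list twice: once with an explicit for loop and if statement, and once with a comprehension that both filters and transforms. The variable names are illustrative.

```python
items = [1, 9, 0, 4, 1]

# Explicit loop: keep even numbers and square them.
even_squares_loop = []
for item in items:
    if item % 2 == 0:
        even_squares_loop.append(item * item)

# Equivalent list comprehension.
even_squares = [item * item for item in items if item % 2 == 0]

print(even_squares_loop == even_squares)
```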
Code is typically organized into functions. A function encapsulates part of your code. Functions allow you to reuse bits of functionality without copy-pasting the code. Here is a function that tells whether an integer number is even or not:
In [23]: def is_even(number):
             """Return whether an integer is even or not."""
             return number % 2 == 0
There are several things to note here:
A function is defined with the def keyword.
After def comes the function name. A general convention in Python is to only use lowercase characters, and separate words with an underscore _. A function name generally starts with a verb.
The function name is followed by parentheses, with one or several variable names called the arguments. These are the inputs of the function. There is a single argument here, named number.
No type is specified for the argument. This is because Python is dynamically typed; you could pass a variable of any type. This function would work fine with floating point numbers, for example (the modulo operation works with floating point numbers in addition to integers).
The body of the function is indented (and note the colon : at the end of the def statement).
There is a docstring wrapped by triple quotes """. This is a particular form of comment that explains what the function does. It is not mandatory, but it is strongly recommended to write docstrings for the functions exposed to the user.
The return keyword in the body of the function specifies the output of the function. Here, the output is a Boolean, obtained from the expression number % 2 == 0. It is possible to return several values; just use a comma to separate them (in this case, a tuple would be returned).
Once a function is defined, it can be called like this:
In [24]: is_even(3)
Out[24]: False
In [25]: is_even(4)
Out[25]: True
Here, 3
and 4
are successively passed as arguments to the function.
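As an illustration of returning several values from a function, here is a minimal sketch. The function divide_with_remainder is a hypothetical example, not part of the original code:

```python
def divide_with_remainder(number, divisor):
    """Return the quotient and the remainder of an integer division."""
    return number // divisor, number % divisor

# The two values come back as a single tuple...
result = divide_with_remainder(17, 5)
print(result)

# ...which can be unpacked into separate variables.
quotient, rest = divide_with_remainder(17, 5)
print(quotient, rest)
```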
A Python function can accept an arbitrary number of arguments, called positional arguments. It can also accept optional named arguments, called keyword arguments. Here is an example:
In [26]: def remainder(number, divisor=2):
             return number % divisor
The second argument of this function, divisor
, is optional. If it is not provided by the caller, it will default to the number 2
, as shown here:
In [27]: remainder(5)
Out[27]: 1
There are two equivalent ways of specifying a keyword argument when calling a function. They are as follows:
In [28]: remainder(5, 3)
Out[28]: 2
In [29]: remainder(5, divisor=3)
Out[29]: 2
In the first case, 3
is understood as the second argument, divisor
. In the second case, the name of the argument is given explicitly by the caller. This second syntax is clearer and less error-prone than the first one.
Functions can also accept arbitrary sets of positional and keyword arguments, using the following syntax:
In [30]: def f(*args, **kwargs):
             print("Positional arguments:", args)
             print("Keyword arguments:", kwargs)
In [31]: f(1, 2, c=3, d=4)
Out[31]: Positional arguments: (1, 2)
         Keyword arguments: {'c': 3, 'd': 4}
Inside the function, args
is a tuple containing positional arguments, and kwargs
is a dictionary containing keyword arguments.
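The same star syntax also works in the other direction, when calling a function: an existing tuple and dictionary can be expanded into positional and keyword arguments. A minimal sketch, where describe() is a hypothetical function:

```python
def describe(*args, **kwargs):
    """Report how many positional and keyword arguments were received."""
    return "{} positional, {} keyword".format(len(args), len(kwargs))

values = (1, 2)
options = {'c': 3, 'd': 4}

# Equivalent to describe(1, 2, c=3, d=4).
summary = describe(*values, **options)
print(summary)
```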
When passing a parameter to a Python function, a reference to the object is actually passed (passage by assignment):
If the passed object is mutable, it can be modified by the function
If the passed object is immutable, it cannot be modified by the function
Here is an example:
In [32]: my_list = [1, 2]
         def add(some_list, value):
             some_list.append(value)
         add(my_list, 3)
         my_list
Out[32]: [1, 2, 3]
The add()
function modifies an object defined outside it (in this case, the object my_list
); we say this function has side-effects. A function with no side-effects is called a pure function: it doesn't modify anything in the outer context, and it deterministically returns the same result for any given set of inputs. Pure functions are to be preferred over functions with side-effects.
Knowing this can help you spot subtle bugs. There are further related concepts that are useful to know, including function scopes, naming, binding, and more. Here are a couple of links:
Passage by reference at https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
Naming, binding, and scope at https://docs.python.org/3.4/reference/executionmodel.html
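For comparison with the add() function shown earlier, here is a sketch of a pure alternative, together with a function that receives an immutable object. The names add_pure() and increment() are hypothetical:

```python
def add_pure(some_list, value):
    """Return a new list with value appended; the input list is unchanged."""
    return some_list + [value]

my_list = [1, 2]
new_list = add_pure(my_list, 3)
print(my_list, new_list)

def increment(n):
    n = n + 1      # rebinds the local name n only
    return n

x = 5
y = increment(x)
print(x, y)        # the integer bound to x cannot be modified by the function
```

The pure version costs an extra list creation, but the caller can rely on my_list being the same before and after the call.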
Let's talk about errors in Python. As you learn, you will inevitably come across errors and exceptions. The Python interpreter will most of the time tell you what the problem is, and where it occurred. It is important to understand the vocabulary used by Python so that you can more quickly find and correct your errors.
Let's see the following example:
In [33]: def divide(a, b):
             return a / b
In [34]: divide(1, 0)
Out[34]: ---------------------------------------------------------
         ZeroDivisionError       Traceback (most recent call last)
         <ipython-input-2-b77ebb6ac6f6> in <module>()
         ----> 1 divide(1, 0)
         <ipython-input-1-5c74f9fd7706> in divide(a, b)
               1 def divide(a, b):
         ----> 2     return a / b
         ZeroDivisionError: division by zero
Here, we defined a divide()
function, and called it to divide 1
by 0
. Dividing a number by 0 is an error in Python. Here, a ZeroDivisionError
exception was raised. An exception is a particular type of error that can be raised at any point in a program. It is propagated from the innards of the code up to the command that launched the code. It can be caught and processed at any point. You will find more details about exceptions at https://docs.python.org/3/tutorial/errors.html, and common exception types at https://docs.python.org/3/library/exceptions.html#bltin-exceptions.
The error message you see contains the stack trace, the exception type, and the exception message. The stack trace shows all function calls between the raised exception and the script calling point.
The top frame, indicated by the first arrow ---->
, shows the entry point of the code execution. Here, it is divide(1, 0)
, which was called directly in the Notebook. The error occurred while this function was called.
The next and last frame is indicated by the second arrow. It corresponds to line 2 in our function divide(a, b)
. It is the last frame in the stack trace: this means that the error occurred there.
We will see later in this chapter how to debug such errors interactively in IPython and in the Jupyter Notebook. Knowing how to navigate up and down in the stack trace is critical when debugging complex Python code.
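Since an exception can be caught and processed at any point, here is a minimal sketch of catching the ZeroDivisionError raised by divide(). The safe_divide() helper and its choice of returning None are assumptions made for illustration:

```python
def divide(a, b):
    return a / b

def safe_divide(a, b):
    """Return a / b, or None if b is zero (hypothetical helper)."""
    try:
        return divide(a, b)
    except ZeroDivisionError as e:
        # The exception raised inside divide() propagates up to here.
        print("Caught an exception:", e)
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

Catching only ZeroDivisionError, rather than every exception, lets unrelated errors continue to propagate with their full stack trace.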
Object-oriented programming (OOP) is a relatively advanced topic. Although we won't use it much in this book, it is useful to know the basics. Also, mastering OOP is often essential when you start to have a large code base.
In Python, everything is an object. A number, a string, or a function is an object. An object is an instance of a type (also known as class). An object has attributes and methods, as specified by its type. An attribute is a variable bound to an object, giving some information about it. A method is a function that applies to the object.
For example, the object 'hello'
is an instance of the built-in str
type (string). The type()
function returns the type of an object, as shown here:
In [35]: type('hello')
Out[35]: str
There are native types, like str
or int
(integer), and custom types, also called classes, that can be created by the user.
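To give an idea of what a custom type looks like, here is a minimal sketch of a class with one attribute and one method. The Circle class is a made-up example:

```python
import math

class Circle:
    """A hypothetical custom type representing a circle."""

    def __init__(self, radius):
        self.radius = radius          # an attribute bound to the object

    def area(self):                   # a method that applies to the object
        return math.pi * self.radius ** 2

c = Circle(2)
print(type(c).__name__)               # the object is an instance of Circle
print(c.radius, c.area())
```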
In IPython, you can discover the attributes and methods of any object with the dot syntax and tab completion. For example, typing 'hello'.u
and pressing Tab automatically shows us the existence of the upper()
method:
In [36]: 'hello'.upper()
Out[36]: 'HELLO'
Here, upper()
is a method available to all str
objects; it returns an uppercase copy of a string.
A useful string method is format()
. This simple and convenient templating system lets you generate strings dynamically, as shown in the following example:
In [37]: 'Hello {0:s}!'.format('Python')
Out[37]: 'Hello Python!'
The {0:s}
syntax means "replace this with the first argument of format()
, which should be a string". The variable type after the colon is especially useful for numbers, where you can specify how to display the number (for example, .3f
to display three decimals). The 0
makes it possible to replace a given value several times in a given string. You can also use a name instead of a position—for example 'Hello {name}!'.format(name='Python')
.
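The formatting options mentioned above can be tried out with a few short calls. The values are arbitrary:

```python
# Three decimals for a floating-point number.
pi_text = '{0:.3f}'.format(3.14159265)

# The same positional argument used twice.
twice = '{0} and {0} again'.format('spam')

# A named field instead of a position.
named = 'Hello {name}!'.format(name='Python')

# Several arguments with different format types.
mixed = '{0:d} items cost {1:.2f}'.format(3, 4.5)

print(pi_text, twice, named, mixed, sep='\n')
```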
Some methods are prefixed with an underscore _
; they are private and are generally not meant to be used directly. IPython's tab completion won't show you these private attributes and methods unless you explicitly type _
before pressing Tab.
In practice, the most important thing to remember is that appending a dot .
to any Python object and pressing Tab in IPython will show you a lot of functionality pertaining to that object.
Python is a multi-paradigm language; it notably supports imperative, object-oriented, and functional programming models. Python functions are objects and can be handled like other objects. In particular, they can be passed as arguments to other functions (also called higher-order functions). This is the essence of functional programming.
Decorators provide a convenient syntax construct to define higher-order functions. Here is an example using the is_even()
function from the previous Functions section:
In [38]: def show_output(func):
             def wrapped(*args, **kwargs):
                 output = func(*args, **kwargs)
                 print("The result is:", output)
             return wrapped
The show_output()
function transforms an arbitrary function func()
to a new function, named wrapped()
, that displays the result of the function, as follows:
In [39]: f = show_output(is_even)
         f(3)
Out[39]: The result is: False
Equivalently, this higher-order function can also be used with a decorator, as follows:
In [40]: @show_output
         def square(x):
             return x * x
In [41]: square(3)
Out[41]: The result is: 9
You can find more information about Python decorators at https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Decorators and at http://www.thecodeship.com/patterns/guide-to-python-function-decorators/.
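The wrapped() function above prints the result but does not return it, so the decorated function yields None to its caller. A possible variant, not taken from the text, returns the value as well and uses functools.wraps from the standard library so that the decorated function keeps its original name and docstring:

```python
import functools

def show_output(func):
    @functools.wraps(func)            # preserve func's name and docstring
    def wrapped(*args, **kwargs):
        output = func(*args, **kwargs)
        print("The result is:", output)
        return output                 # pass the result on to the caller
    return wrapped

@show_output
def square(x):
    """Return x squared."""
    return x * x

result = square(3)
print(result, square.__name__)
```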
Let's finish this section with a few notes about Python 2 and Python 3 compatibility issues.
There is still some Python 2 code, including some libraries, that is not compatible with Python 3. Therefore, it is sometimes useful to be aware of the differences between the two versions. One of the most obvious differences is that print
is a statement in Python 2, whereas it is a function in Python 3. Therefore, print "Hello"
(without parentheses) works in Python 2 but not in Python 3, while print("Hello")
works in both Python 2 and Python 3.
There are several non-mutually exclusive options to write portable code that works with both versions:
__future__: A built-in module that enables newer, backward-incompatible Python syntax in older versions
2to3: A tool distributed with Python to port Python 2 code to Python 3
six: An external lightweight library for writing compatible code
Here are a few references:
Official Python 2/3 wiki page at https://wiki.python.org/moin/Python2orPython3
The Porting to Python 3 book, by Lennart Regebro (CreateSpace Independent Publishing Platform), at http://www.python3porting.com/bookindex.html
__future__ at https://docs.python.org/3.4/library/__future__.html
The IPython Cookbook contains an in-depth recipe about choosing between Python 2 and 3, and how to support both.
You now know the fundamentals of Python, the bare minimum that you will need in this book. As you can imagine, there is much more to say about Python.
Following are a few further basic concepts that are often useful and that we cannot cover here, unfortunately. You are highly encouraged to have a look at them in the references given at the end of this section:
range and enumerate
pass, break, and continue, to be used in loops
Working with files
Creating and importing modules
The Python standard library provides a wide range of functionality (OS, network, file systems, compression, mathematics, and more)
Here are some slightly more advanced concepts that you might find useful if you want to strengthen your Python skills:
Regular expressions for advanced string processing
Lambda functions for defining small anonymous functions
Generators for controlling custom loops
Exceptions for handling errors
with statements for safely handling contexts
Advanced object-oriented programming
Metaprogramming for modifying Python code dynamically
The pickle module for persisting Python objects on disk and exchanging them across a network
Finally, here are a few references:
Getting started with Python: https://www.python.org/about/gettingstarted/
A Python tutorial: https://docs.python.org/3/tutorial/index.html
The Python Standard Library: https://docs.python.org/3/library/index.html
Interactive tutorial: http://www.learnpython.org/
Codecademy Python course: http://www.codecademy.com/tracks/python
Language reference (expert level): https://docs.python.org/3/reference/index.html
Python Cookbook, by David Beazley and Brian K. Jones, O'Reilly Media (advanced level, highly recommended if you want to become a Python expert)
In this section, we will cover ten essential features of Jupyter and IPython that make them so useful for interactive computing.
Note
Unfortunately, this subsection will not work well on Windows. The goal here is to demonstrate accessing the operating system's shell from IPython. We could say that, by design, the Windows shell is much more limited than those provided by Linux and OS X. Windows favors user interactions from the graphical interface, whereas Linux and OS X inherit Unix's flexible command-line capabilities. If you want to share and distribute your notebooks, you shouldn't rely on the techniques exposed in this subsection. Rather, you should use the Python equivalents, which are more verbose but also more powerful. Using the shell from IPython is only useful during interactive sessions of users already familiar with the Unix shell.
Open a terminal and type the following commands to go to the minibook's chapter1
directory and launch the Notebook server:
$ cd ~/minibook/chapter1/
$ jupyter notebook
In the Notebook dashboard, open the 15-ten.ipynb
notebook. You can also create a new notebook if you prefer not to use the book's code.
Let's illustrate how to use IPython as an extended shell. We will download an example dataset, navigate through the filesystem, and open text files, all from the Notebook. The dataset contains social network data of hundreds of volunteer Facebook users. This BSD-licensed dataset is provided freely by Stanford's SNAP project (http://snap.stanford.edu/data/).
IPython provides several magic commands that let you interact with your filesystem. These commands are prefixed with a %. For example, here is how to display the current working directory:
In [1]: %pwd
Out[1]: '/home/cyrille/minibook/chapter1'
Note
Like most other magic commands, this magic command works on all operating systems, including Windows. IPython implements several cross-platform Python equivalents of common Unix commands like pwd
. For other commands not implemented by IPython, we need to call shell commands directly with the !
prefix (as shown in the following examples). This doesn't work well on Windows since many of these commands are Unix-specific. In brief, %
-prefixed commands should work on all operating systems while !
-prefixed commands will generally only work on Linux and OS X, not Windows.
Let's download the dataset from the book's data repository (https://github.com/ipython-books/minibook-2nd-data). IPython doesn't yet provide a magic command for downloading data, but we can use another IPython trick: we can run any system or terminal command from IPython by prefixing it with an exclamation mark (!
). For example, here is how to use the wget
download utility only available on Unix systems:
In [2]: !wget https://raw.githubusercontent.com/ipython-books/minibook-2nd-data/master/facebook.zip
Note
If wget
is not installed, you can install it with your OS package manager. For example, on Ubuntu: sudo apt-get install wget
; on OS X: brew install wget
. On OS X, brew is available at http://brew.sh/. On Windows, you should download the file manually from the data repository, as explained later.
This wget
command downloads a file from a URL and saves it to a file in the local filesystem. Let's display the list of files in the current directory using the %ls
magic command (available on all systems, even on Windows, since it is a magic command provided by IPython), as follows:
In [3]: %ls
Out[3]: facebook.zip [...]
We see a new facebook.zip
file.
Note
If you are on Windows, or if downloading the file from IPython didn't work, you can always download this file manually via your web browser at the following URL: https://github.com/ipython-books/minibook-2nd-data/. Then save the Facebook dataset in the current directory (the one containing this notebook, which should be ~/minibook/chapter1/
).
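For readers who cannot use wget, a portable alternative is to download the file with Python's standard library. The following is only a sketch: the download() helper is hypothetical, and the actual call is left commented out because it requires network access.

```python
import os
import urllib.request

# URL of the dataset, as used with wget above.
URL = ('https://raw.githubusercontent.com/ipython-books/'
       'minibook-2nd-data/master/facebook.zip')

def download(url, filename=None):
    """Download url to filename unless that file already exists."""
    if filename is None:
        # Default to the last component of the URL.
        filename = url.rsplit('/', 1)[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename

# Uncomment to fetch the dataset into the current directory:
# download(URL)
```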
The next step is to unzip this file in the current directory. The first way of doing it is to use your operating system, generally with a right-click on the icon. On Linux and OS X, we can also use the unzip
command-line tool (you may need to install it first, for example with a command like sudo apt-get install unzip
on Ubuntu). Finally, it is also possible to do it in pure Python with the zipfile
module (see https://docs.python.org/3.4/library/zipfile.html).
Here, we'll call the unzip
tool, which will only work on Linux and OS X, not Windows:
In [4]: !unzip facebook.zip
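The pure Python route with the zipfile module works on all operating systems. Here is a minimal sketch: the unzip() helper is hypothetical, and it is demonstrated on a throwaway archive with made-up contents so that the example runs without the dataset.

```python
import os
import tempfile
import zipfile

def unzip(archive_path, target_dir='.'):
    """Extract all files of a zip archive into target_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()

# Build a small example archive in a temporary directory.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'example.zip')
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('example/0.edges', '1 2\n2 3\n')

# Extract it and list the extracted files.
names = unzip(archive_path, workdir)
print(names)
```

With the dataset in the current directory, the equivalent of the unzip command would be a call like unzip('facebook.zip').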
Once the archive has been extracted, a new subdirectory named facebook
appears, as shown here:
In [5]: %ls
Out[5]: facebook facebook.zip [...]
Let's enter into this subdirectory with the %cd
magic command (all operating systems), as follows:
In [6]: %cd facebook
Out[6]: /home/cyrille/minibook/chapter1/facebook
IPython provides a %bookmark
magic to create an alias to the current directory. Let's type the following:
In [7]: %bookmark fbdata
Now, in any future session, we'll be able to just type %cd fbdata
to enter into this directory. Type %bookmark?
to see all options. This magic command is helpful when dealing with many directories.
Let's display the contents of the directory:
In [8]: %ls
Out[8]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
        0.edges      1684.edges    3437.edges    3980.edges    686.edges
        107.circles  1912.circles  348.circles   414.circles   698.circles
        107.edges    1912.edges    348.edges     414.edges     698.edges
Here, every number identifies a Facebook user (called the ego user). The .edges
file contains its social graph. In this graph, nodes represent other Facebook users, and edges represent friendship links between them. The .circles
file contains lists of friends.
Let's retrieve the list of .edges
files with the following command (which won't work on Windows):
In [9]: files = !ls -1 -S | grep .edges
The Unix command ls -1 -S lists all files in the current directory, sorted by decreasing size. The pipe | grep .edges keeps only the files whose names contain .edges. Then, this list is assigned to a new Python variable named files, as follows:
In [10]: files
Out[10]: ['1912.edges', '107.edges', '1684.edges', '3437.edges', '348.edges', '0.edges', '414.edges', '686.edges', '698.edges', '3980.edges']
On Windows, you can use the following Python code to obtain the same list (if you're not on Windows, you can skip this code listing):
In [11]: import os
         from operator import itemgetter
         # Get the name and file size of all .edges files.
         files = [(file, os.stat(file).st_size)
                  for file in os.listdir('.')
                  if file.endswith('.edges')]
         # Sort the list with the second item (file size),
         # in decreasing order.
         files = sorted(files, key=itemgetter(1),
                        reverse=True)
         # Only keep the first item (file name), in the same order.
         files = [file for (file, size) in files]
Let's display the first few lines of the first file in the list (Unix-specific command):
In [12]: !head -n5 {files[0]}
Out[12]: 2290 2363
         2346 2025
         2140 2428
         2201 2506
         2425 2557
The curly braces {}
let us insert a Python variable within a system command (here, the head
Unix command which displays the first lines of a text file).
In an .edges file, each line contains the two nodes forming one edge. The .circles file contains lists of friends. Each line contains a space-separated list of the users forming one circle.
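To make the .edges format concrete, here is a minimal sketch that parses a few lines into pairs of integers. The sample text reuses the five lines displayed by head above, assuming one pair of node identifiers per line; parse_edges() is a hypothetical helper.

```python
sample = """2290 2363
2346 2025
2140 2428
2201 2506
2425 2557"""

def parse_edges(text):
    """Return a list of (node, node) integer pairs, one per non-empty line."""
    edges = []
    for line in text.splitlines():
        if line.strip():
            source, target = line.split()
            edges.append((int(source), int(target)))
    return edges

edges = parse_edges(sample)
# The set of all nodes appearing in at least one edge.
nodes = {node for edge in edges for node in edge}
print(len(edges), "edges between", len(nodes), "nodes")
```

For a real file, the text could be obtained with something like open(files[0]).read().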
Besides the filesystem commands we have seen in the previous section, IPython provides many other magic commands. You can display the list of all magic commands with the %lsmagic
magic command, as follows:
In [13]: %lsmagic
Out[13]: Available line magics:
         %alias %alias_magic %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %install_default_config %install_ext %install_profiles %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode
         Available cell magics:
         %%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%latex %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile
         Automagic is ON, % prefix IS NOT needed for line magics.
To obtain information about a magic command, append a question mark (?
) after the command, as shown in the following example:
In [14]: %history?
The %history
magic command lets you display and manipulate your command history in IPython. For example, the following command shows your last five commands:
In [15]: %history -l 5
Out[15]: files = !ls -1 -S | grep .edges
         files
         !head -n5 {files[0]}
         %lsmagic
         %history?
Let's also mention the %dhist
magic command that shows you a history of all visited directories.
Another useful magic command is %paste
, which lets you copy-paste Python code from anywhere into the IPython console (it is not available in the Notebook, where you can copy-paste as usual).
In IPython, the underscore (_
) character always contains the last output. This is useful if you ran some command and forgot to assign the output to a variable.
In [16]: # how many minutes in a day?
         24 * 60
Out[16]: 1440
In [17]: # and in a year?
         _ * 365
Out[17]: 525600
We will now see several cell magics, which are magic commands that apply to a whole code cell rather than just a line of code. They are prefixed by two percent signs (%%
).
The %%capture
cell magic lets you capture the standard output and error output of some code into a Python variable. Here is an example (the outputs are captured in the output
Python variable):
In [18]: %%capture output
         %ls
In [19]: output.stdout
Out[19]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
         0.edges      1684.edges    3437.edges    3980.edges    686.edges
         107.circles  1912.circles  348.circles   414.circles   698.circles
         107.edges    1912.edges    348.edges     414.edges     698.edges
The %%bash
cell magic is an extension of the !
shell prefix. It lets you run multiline bash code in the Notebook, as shown here:
In [20]: %%bash
         cd ..
         touch _HEY
         ls
         rm _HEY
         cd facebook
Out[20]: _HEY
         facebook
         facebook.zip
         [...]
More generally, the %%script
cell magic lets you execute code with any program installed on your system. For example, assuming Haskell is installed (see https://www.haskell.org/downloads), you can easily execute Haskell code from the Notebook, as follows:
In [21]: %%script ghci
         putStrLn "Hello world!"
Out[21]: GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
         Loading package ghc-prim ... linking ... done.
         Loading package integer-gmp ... linking ... done.
         Loading package base ... linking ... done.
         Prelude> Hello world!
         Prelude> Leaving GHCi.
The ghci
executable runs in a separate process, and the contents of the cell are passed to the executable's input. You can also put a full path after %%script
, for example, on Linux: %%script /usr/bin/ghci
.
Tip
IHaskell kernel
This way of calling external scripts is only useful for quick interactive experiments. If you want to run Haskell notebooks, you can use the IHaskell kernel for Jupyter, available at https://github.com/gibiansky/IHaskell.
Finally, the %%writefile
cell magic lets you write some text in a new file, as shown here:
In [22]: %%writefile myfile.txt
         Hello world!
Out[22]: Writing myfile.txt
In [23]: !more myfile.txt
Out[23]: Hello world!
Now, let's delete the file, as follows:
In [24]: !rm myfile.txt
There are many other magic commands available. We will see several of them later in this book. Also, in Chapter 6, Customizing IPython, we will see how to create new magic commands. This is much easier than it sounds!
Refer to the following page for up-to-date documentation about all magic commands: http://www.ipython.org/ipython-doc/dev/interactive/magics.html.
Tab completion is an incredibly useful feature in Jupyter and IPython. When you start to write something and press the Tab key on your keyboard, IPython can guess what you're trying to do, and propose a list of options that match what you have typed so far. This works for Python functions, variables, magic commands, files, and more.
Let's first make sure we are in the facebook directory (using the directory alias created previously):
In [25]: %cd fbdata
         %ls
Out[25]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook
         0.circles    1684.circles  3437.circles  3980.circles  686.circles
         0.edges      1684.edges    3437.edges    3980.edges    686.edges
         107.circles  1912.circles  348.circles   414.circles   698.circles
         107.edges    1912.edges    348.edges     414.edges     698.edges
Now, start typing a command and press Tab before finishing it (here, press the Tab key right after typing e), as follows:
!head -n5 107.e<TAB>
IPython recognizes the beginning of a file name and completes the command by adding the four remaining characters (dges). If there are several completion possibilities, IPython doesn't complete anything, but instead shows a list of all options. You can then choose the appropriate option by pressing the Up or Down keys on the keyboard, and pressing Tab again. The following screenshot shows an example:

Tab completion in the Notebook
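The completion rule described above (complete when exactly one candidate matches, otherwise list the options) can be illustrated with a few lines of plain Python. This is only a sketch of the observable behavior, not IPython's completer; the complete() helper is hypothetical, and the file names are a subset of the directory listing shown earlier.

```python
def complete(prefix, candidates):
    """Return (completed_text, matches) for a typed prefix.

    Hypothetical helper: the text is completed only when exactly one
    candidate starts with the prefix; otherwise it is left unchanged
    and all matching options are returned.
    """
    matches = sorted(c for c in candidates if c.startswith(prefix))
    if len(matches) == 1:
        return matches[0], matches
    return prefix, matches


files = ["0.circles", "0.edges", "107.circles", "107.edges",
         "1684.circles", "1684.edges"]

# A single match: the remaining characters are filled in.
unique_text, unique_matches = complete("107.e", files)

# Several matches: nothing is completed, the options are listed.
ambiguous_text, ambiguous_matches = complete("107.", files)

print(unique_text)
print(ambiguous_matches)
```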
Tab completion is extremely useful when you're getting acquainted with a new Python package. For example, to quickly see all functions provided by the NetworkX package, you can type import networkx; networkx.<TAB>.
Tip
Customizing tab completion
If you're writing a Python library, you probably want to write tab-completion-aware code. Your users who work with IPython will thank you! In most cases, you have nothing to do, and tab completion will just work. In the rare cases where you use advanced dynamic techniques in a class, you can customize tab completion by implementing a __dir__(self) method that returns all attributes available in the current class instance. See this reference for more details: https://docs.python.org/3.4/library/functions.html#dir.
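As a minimal sketch of this technique, the hypothetical Settings class below resolves some attributes dynamically through __getattr__(), so they would normally be invisible to dir() and therefore to tab completion. Implementing __dir__() advertises them. The class name and its attributes are assumptions made for the example.

```python
class Settings:
    """Hypothetical class whose attributes are resolved dynamically."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getattr__(self, name):
        # Called only when the normal attribute lookup fails.
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the dynamic names alongside the regular attributes,
        # so that dir() and completion tools can see them.
        return sorted(set(super().__dir__()) | set(self._values))


settings = Settings(host="localhost", port=8888)
names = dir(settings)
print([name for name in names if not name.startswith("_")])
```

With this method in place, typing settings.<TAB> in IPython should propose host and port in addition to the regular attributes.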
You can write code and text in the Notebook. Every cell is either a Markdown cell or a code cell. The Markdown cell lets you write text. Markdown is a text formatting syntax that supports headers, bold, italics, hypertext links, images, and code. In the Notebook, you can also write mathematical equations in a Markdown cell using LaTeX, a markup language widely used for equations. Finally, you can also write some HTML in a Markdown cell, and it will be interpreted correctly.
Here is an example of a paragraph in Markdown:
### New paragraph

This is *rich* **text** with [links](http://ipython.org),
equations:

$$\hat{f}(\xi) = \int_{-\infty}^{+\infty} f(x)\,
                 \mathrm{e}^{-i \xi x} dx$$

code with syntax highlighting:

```python
print("Hello world!")
```

and images:
If you write this in a Markdown cell, and "play" the cell (for example, by pressing Ctrl + Enter), you will see the rendered text. The following screenshot shows the two modes of the cell:

A Markdown cell in the Notebook
By using both Markdown cells and code cells in a notebook, you can write an interactive document about any technical topic. Hence, the Notebook is not only an interface to code, it is also a platform to write documents or even books. In fact, this very book is entirely written in the Notebook!
Here are a few references about Markdown and LaTeX:
Markdown on Wikipedia at http://en.wikipedia.org/wiki/Markdown
The original specification, at http://daringfireball.net/projects/markdown/
A Markdown tutorial by GitHub, at https://help.github.com/articles/markdown-basics/
CommonMark, a standardized version of Markdown, at http://commonmark.org/
LaTeX on Wikipedia at http://en.wikipedia.org/wiki/LaTeX
You can add interactive graphical elements called widgets in a notebook. Examples of rich graphical widgets include buttons, sliders, dropdown menus, interactive plots, as well as videos, audio files, and complete Graphical User Interfaces (GUIs). Widget support in Jupyter is still relatively experimental at this point, but we will use widgets on several occasions in this book. This section shows a few basic examples.
First, let's add a YouTube video in a notebook, as follows:
In [26]: from IPython.display import YouTubeVideo
         YouTubeVideo('j9YpkSX7NNM')
Following is a screenshot of a YouTube video in a notebook:

YouTube in the Notebook
The YouTubeVideo constructor accepts a YouTube identifier as input.
Next, let's show how to create a graphical control to manipulate the inputs to a Python function:
In [27]: from ipywidgets import interact  # IPython.html.widgets before
                                          # IPython 4.0
         @interact(x=(0, 10))
         def square(x):
             print("The square of %d is %d." % (x, x**2))
Out[27]: 'The square of 7 is 49.'
Here is a screenshot:

Interactive widget in the Notebook
The square(x) function just prints a sentence like The square of 7 is 49. By adding the @interact decorator above the function's definition, we tell IPython to create a widget to control the function's input x. The argument x=(0, 10) is a convention to indicate that we want a slider to control an integer between 0 and 10.
This method supports other common controls like checkboxes, dropdown menus, radio buttons, push buttons, and others.
Finally, entirely customizable widgets can be created, but this requires some knowledge of web technologies such as HTML, CSS, and JavaScript. The IPython Cookbook (http://ipython-books.github.io/cookbook/) contains many examples. You can also refer to the following links for more information:
IPython widgets tutorial at https://github.com/ipython/ipywidgets/blob/master/examples/Index.ipynb
Introducing the interactive features of the IPython Notebook, at https://github.com/rossant/euroscipy2014
A piano in the Notebook, at http://nbviewer.ipython.org/github/ipython-books/cookbook-code/blob/master/notebooks/chapter03_notebook/05_basic_widgets.ipynb
Notebooks are mainly designed for interactive exploration, not for reusability. It is currently difficult to reuse parts of a notebook in another script or notebook. Many users just copy-paste their code, which goes against the Don't Repeat Yourself (DRY) principle.
A common practice is to put frequently used code into a Python script, for example myscript.py. Such a script can be called from the system terminal like this: python myscript.py. Python will execute the script and quit at the end. If you use the -i option, Python will start the interactive prompt when the script ends.
IPython also supports this technique; just replace python with ipython. For example, ipython -i script.py runs script.py interactively with IPython.
You can also run a script from within IPython by using the %run magic command. The script runs in an empty namespace, meaning that any variable defined in the interactive namespace is not available within the executed script. However, at the end of the execution, control returns to IPython, and the variables defined in the script are imported into the interactive namespace. This lets you inspect the intermediate variables used in the script. If you use the -i option, the script will run in the interactive namespace. Any variable defined in the interactive session will be available in the script.
Let's also mention the similar %load magic command, which loads the contents of a script into the current cell instead of running it.
Note
A namespace is a dictionary mapping variable names to Python objects. The global namespace contains global variables, whereas the local namespace of a function contains the local variables defined in the function. In IPython, the interactive namespace contains all objects defined and imported within the current interactive session. The %who, %whos, and %who_ls magic commands give you some information about the interactive variables.
For example, let's write a script egos.py that lists all ego identifiers in the Facebook data folder. Since each filename is of the form <egoid>.<extension>, we list all files, remove the extensions, and take the sorted list of all unique identifiers. We can create this file from the Notebook, using the %%writefile cell magic as follows:
In [28]: %cd fbdata
         %cd ..
Out[28]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook
In [29]: %%writefile egos.py
         import sys
         import os
         # We retrieve the folder as the first positional argument
         # to the command-line call
         if len(sys.argv) > 1:
             folder = sys.argv[1]
         # We list all files in the specified folder
         files = os.listdir(folder)
         # identifiers contains the identifier of every file
         # (with duplicates)
         identifiers = [int(file.split('.')[0]) for file in files]
         # Finally, we remove duplicates with set(), and sort the list
         # with sorted().
         ids = sorted(set(identifiers))
Out[29]: Overwriting egos.py
This script accepts an argument folder as an input. It is retrieved from the Python script via the sys.argv list, which contains the list of arguments passed to the script via the command-line interface.
Let's execute this script in IPython using the %run magic command, as follows:
In [30]: %run egos.py facebook
Note
If you get an error when running this script, make sure that the facebook directory only contains <number>.xxx files (like 0.circles or 1684.edges).
In [31]: ids
Out[31]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]
The ids variable created in the script is now available in the interactive namespace.
Let's see what happens if we do not specify the folder name to the script, as follows:
In [32]: folder = 'facebook'
In [33]: %run egos.py
We get an error: NameError: name 'folder' is not defined. This is because the variable folder is defined in the interactive namespace, but is not available within the script by default. We can change this behavior with the -i option, as follows:
In [34]: %run -i egos.py
In [35]: ids
Out[35]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]
This time, the script correctly used the folder variable.
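Outside IPython, the two namespace behaviors shown above can be approximated with the standard runpy module. The sketch below is not how %run is implemented, and it differs in one respect: init_globals copies variables into the script's namespace, whereas %run -i runs the script in the interactive namespace itself. The script contents and the file name demo_script.py are hypothetical.

```python
import os
import runpy
import tempfile

# A tiny script that relies on a variable it does not define itself.
script_source = "ids = sorted(set(identifiers))\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_script.py")
    with open(path, "w") as f:
        f.write(script_source)

    # Comparable to a plain %run: the script starts from a fresh
    # namespace, so the variable it expects is unknown.
    try:
        runpy.run_path(path)
        error = None
    except NameError as exc:
        error = str(exc)

    # Comparable to %run -i: the namespace is seeded with existing
    # variables before the script runs.
    interactive = {"identifiers": [3, 1, 3, 2]}
    result = runpy.run_path(path, init_globals=interactive)

# run_path() returns the namespace left behind by the script, which is
# how the variables it defined can be inspected afterwards.
ids = result["ids"]
print(error)
print(ids)
```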
IPython can display detailed information about any Python object.
First, type ? after a variable name to get some information about it. For example, let's inspect NetworkX's Graph class, as follows:
In [36]: import networkx
In [37]: networkx.Graph?
This shows the docstring and other information in the Notebook pager, as shown in the following screenshot:

Typing ?? instead of ? shows even more information, including the whole source code of the Python object when it is available.
There are also several magic commands for inspecting Python objects:
%pdef: Displays a function definition
%pdoc: Displays the docstring of a Python object
%psource: Displays the source code of an object (function, class, or method)
%pfile: Displays the source code of the Python script where an object is defined
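In a plain Python script, similar information can be obtained with the standard inspect module. The following sketch lists rough analogues of the magic commands above; the mapping is an approximation for illustration, and json.dumps is used only because it ships with Python.

```python
import inspect
import json

# Roughly what %pdef shows: the call signature of a function.
signature = inspect.signature(json.dumps)

# Roughly what %pdoc shows: the cleaned-up docstring.
docstring = inspect.getdoc(json.dumps)

# Roughly what %psource shows: the source code of the object.
source = inspect.getsource(json.dumps)

# Related to %pfile: the file in which the object is defined.
source_file = inspect.getsourcefile(json)

print(signature)
print(docstring.splitlines()[0])
print(source_file)
```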
IPython makes it convenient to debug a script or an entire application. It provides interactive access to an enhanced version of the Python debugger.
First, when you encounter an exception, you can immediately use the %debug magic command to launch the IPython debugger at the exact point where the exception was raised.
If you activate the %pdb magic command, the debugger will automatically start at the very next exception. You can also start IPython with ipython --pdb.
Finally, you can run a whole script under the control of the debugger with the %run -d command. This command executes the specified script with a breakpoint at the first line so that you can precisely control the execution flow of the script. You can also specify explicitly where to put the first breakpoint; type %run -d -b29 script.py to pause the program execution on line 29 of script.py. In all cases, you first need to type c to start the script execution.
When the debugger starts, you enter into a special prompt, as indicated by ipdb>. The program execution is then paused at a given point in the code. You can type w to display the line and stack location where the debugger has paused. At this point, you have access to all local variables and you can precisely control how you want to resume the execution. Within the debugger, several commands are available to navigate into the traceback; they are as follows:
u/d for going up/down into the call stack
s to step into the next statement
n to continue execution until the next line in the current function
r to continue execution until the current function returns
c to continue execution until the next breakpoint or exception
Other useful commands include:
p to evaluate and print any expression
a to obtain the arguments of the current function
The ! prefix to execute any Python command within the debugger
The entire list of commands can be found in the documentation of the pdb module in Python at https://docs.python.org/3.4/library/pdb.html.
Let's also mention the IPython.embed() function that you can call anywhere in a Python script. This stops the script execution and starts IPython for debugging purposes. Leaving the embedded IPython terminal resumes the normal execution of the script.
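To make the idea of debugging "at the exact point where the exception was raised" more concrete, here is a non-interactive sketch in plain Python. It walks the traceback of a caught exception down to the innermost frame and reads the local variables there, which is the information a post-mortem debugger gives you access to. The innermost_frame() and divide() functions are hypothetical names used for the illustration.

```python
def innermost_frame(exc):
    """Return the frame in which the exception was actually raised."""
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame


def divide(a, b):
    ratio = a / b  # raises ZeroDivisionError when b == 0
    return ratio


try:
    divide(1, 0)
except ZeroDivisionError as exc:
    frame = innermost_frame(exc)
    # Name of the function where the error occurred, and the local
    # variables that existed at that point.
    func_name = frame.f_code.co_name
    local_vars = dict(frame.f_locals)
    # In an interactive session, pdb.post_mortem(exc.__traceback__)
    # would open a debugger prompt positioned on this frame.

print(func_name, local_vars)
```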
The %timeit magic function lets us estimate the execution time of any Python statement. Under the hood, it uses Python's native timeit module.
In the following example, we first load an ego graph from our Facebook dataset using the NetworkX package. Then we evaluate how much time it takes to tell whether the graph is connected or not:
Let's go to the data directory, as follows:
In [38]: %cd fbdata
Out[38]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook
We load NetworkX, as follows:
In [39]: import networkx
We can load a graph using the read_edgelist() function, as follows:
In [40]: graph = networkx.read_edgelist('107.edges')
How big is our graph?
In [41]: len(graph.nodes()), len(graph.edges())
Out[41]: (1034, 26749)
Now let's find out whether the graph is connected or not:
In [42]: networkx.is_connected(graph)
Out[42]: True
How long did this call take?
In [43]: %timeit networkx.is_connected(graph)
Out[43]: 100 loops, best of 3: 5.92 ms per loop
Multiple calls are done in order to get more reliable time estimates. The number of calls is determined automatically, but you can use the -r and -n options to specify the number of repetitions and the number of loops per repetition directly. Type %timeit? to get more information.
The %timeit magic command gives you valuable information about the total time taken by a function or a statement. This can help you find the fastest among several implementations of an algorithm, for example.
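Since %timeit relies on Python's timeit module, a comparable measurement can be made in a plain script. The sketch below fixes the number of loops and repetitions by hand, in the spirit of the -n and -r options; the timed statement and the chosen values are arbitrary assumptions for the example.

```python
import timeit

# The statement to time, with a setup step that is not included in
# the measurement.
timer = timeit.Timer(
    "sorted(data)",
    setup="data = list(range(1000, 0, -1))",
)

loops = 100    # calls per repetition, in the spirit of -n
repeats = 3    # number of repetitions, in the spirit of -r

# One total duration (in seconds) per repetition.
totals = timer.repeat(repeat=repeats, number=loops)

# The best repetition is the least disturbed by other activity.
best_per_loop = min(totals) / loops
print("%d loops, best of %d: %.1f us per loop"
      % (loops, repeats, best_per_loop * 1e6))
```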
When you find that some code is too slow, you need to profile it before you can make it faster. Profiling gives you more than the total time taken by a function; it tells you exactly what is taking too long in your code.
The %prun magic command lets you easily profile your code. It provides a convenient interface to Python's native profile module.
Let's see a simple example. We first create a function returning the number of connected components in a file, as follows:
In [44]: import networkx
In [45]: def ncomponents(file):
             graph = networkx.read_edgelist(file)
             return networkx.number_connected_components(graph)
Now we write a function that returns the number of connected components in all graphs defined in the directory, as follows:
In [46]: import glob
         def ncomponents_files():
             return [(file, ncomponents(file))
                     for file in sorted(glob.glob('*.edges'))]
The glob module (https://docs.python.org/3.4/library/glob.html) lets us find all files matching a given pattern (here, all files with the .edges file extension).
In [47]: for file, n in ncomponents_files():
             print(file.ljust(12), n, 'component(s)')
Out[47]: 0.edges      5 component(s)
         107.edges    1 component(s)
         1684.edges   4 component(s)
         1912.edges   2 component(s)
         3437.edges   2 component(s)
         348.edges    1 component(s)
         3980.edges   4 component(s)
         414.edges    2 component(s)
         686.edges    1 component(s)
         698.edges    3 component(s)
Let's first evaluate the time taken by this function:
In [48]: %timeit ncomponents_files()
Out[48]: 1 loops, best of 3: 634 ms per loop
Now, to run the profiler, we use the %prun magic function, as follows:
In [49]: %prun -s cumtime ncomponents_files()
Out[49]: 2391070 function calls in 1.038 seconds

         Ordered by: cumulative time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              1    0.000    0.000    1.038    1.038 {built-in method exec}
              1    0.000    0.000    1.038    1.038 <string>:1(<module>)
             10    0.000    0.000    0.995    0.100 <string>:1(read_edgelist)
             10    0.000    0.000    0.995    0.100 decorators.py:155(_open_file)
             10    0.376    0.038    0.995    0.099 edgelist.py:174(parse_edgelist)
         170174    0.279    0.000    0.350    0.000 graph.py:648(add_edge)
         170184    0.059    0.000    0.095    0.000 edgelist.py:366(<genexpr>)
             10    0.000    0.000    0.021    0.002 connected.py:98(number_connected_components)
             35    0.001    0.000    0.021    0.001 connected.py:22(connected_components)
Let's explain what happened here. The profiler kept track of all function calls (including functions internal to NetworkX and Python) performed while our ncomponents_files() function was running. There were 2,391,070 function calls. That's a lot! Opening a file, reading and parsing every line, creating the graphs, finding the number of connected components, and so on, are operations that involve many function calls.
The profiler shows the list of all function calls (we just showed a subset here). There are many ways to sort the functions. Here, we chose to sort them by cumulative time (-s cumtime option), which is the total time spent within every function, including the functions it calls.
For every function, the profiler shows the total number of calls, and several time statistics, described here (copied verbatim from the profiler documentation):
tottime: the total time spent in the given function (and excluding time made in calls to sub-functions)
percall: the quotient of tottime divided by ncalls
cumtime: the cumulative time spent in this and all subfunctions
percall: the quotient of cumtime divided by the number of non-recursive function calls
You will find more information by typing %prun? or by looking here: https://docs.python.org/3.4/library/profile.html
Here, we see that computing the number of connected components took considerably less time than loading the graphs from the text files. Depending on the use-case, this might suggest using a more efficient file format.
There is of course much more to say about profiling and optimization. For example, it is possible to profile a function line by line, which provides an even more fine-grained profiling report. The IPython Cookbook contains many more details.
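For readers who want to produce the same kind of report outside IPython, the following sketch uses the standard cProfile and pstats modules and sorts the results by cumulative time, as the -s cumtime option did above. The parse_line() and parse_lines() functions are hypothetical stand-ins for the file-parsing work; they are not NetworkX code, and the figures you obtain will differ from those in this chapter.

```python
import cProfile
import io
import pstats


def parse_line(line):
    # Convert a line such as "12 34" into a tuple of integers.
    return tuple(int(x) for x in line.split())


def parse_lines(lines):
    return [parse_line(line) for line in lines]


lines = ["%d %d" % (i, i + 1) for i in range(10000)]

# Record every function call made while parse_lines() is running.
profiler = cProfile.Profile()
edges = profiler.runcall(parse_lines, lines)

# Sort the statistics by cumulative time and keep the first few rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumtime").print_stats(5)
report = buffer.getvalue()
print(report)
```

The columns of the printed report are the ones described earlier: ncalls, tottime, percall, cumtime, and percall.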
In this chapter, we covered everything you need to get started with Python, IPython, and the Jupyter Notebook. We detailed how to install the software, we reviewed the basics of the Python language, and we demonstrated ten of the most essential features of IPython and the Jupyter Notebook.
In the next chapter, we will use these tools to analyze real-world datasets.