Learning IPython for Interactive Computing and Data Visualization - Second Edition

By Cyrille Rossant
About this book

Python is a user-friendly and powerful programming language. IPython offers a convenient interface to the language and its analysis libraries, while the Jupyter Notebook is a rich environment well-adapted to data science and visualization. Together, these open source tools are widely used by beginners and experts around the world, and in a huge variety of fields and endeavors.

This book is a beginner-friendly guide to the Python data analysis platform. After an introduction to the Python language, IPython, and the Jupyter Notebook, you will learn how to analyze and visualize data on real-world examples, how to create graphical user interfaces for image processing in the Notebook, and how to perform fast numerical computations for scientific simulations with NumPy, Numba, Cython, and ipyparallel. By the end of this book, you will be able to perform in-depth analyses of all sorts of data.

Publication date:
October 2015
Publisher
Packt
Pages
200
ISBN
9781783986989

 

Chapter 1. Getting Started with IPython

In this chapter, we will cover the following topics:

  • What are Python, IPython, and Jupyter?

  • Installing Python with Anaconda

  • Introducing the Notebook

  • A crash course on Python

  • Ten Jupyter/IPython essentials

 

What are Python, IPython, and Jupyter?


Python is an open source general-purpose language created by Guido van Rossum in the late 1980s. It is widely used by system administrators and developers for many purposes: for example, automating routine tasks or creating a web server. Python is a flexible and powerful language, yet it is sufficiently simple to be taught to schoolchildren with great success.

In the past few years, Python has also emerged as one of the leading open platforms for data science and high-performance numerical computing. This might seem surprising as Python was not originally designed for scientific computing. Python's interpreted nature makes it much slower than lower-level languages like C or Fortran, which are more amenable to number crunching and the efficient implementation of complex mathematical algorithms.

However, the performance of these low-level languages comes at a cost: they are hard to use and they require advanced knowledge of how computers work. In the late 1990s, several scientists began investigating the possibility of using Python for numerical computing by interoperating it with mainstream C/Fortran scientific libraries. This would bring together the ease-of-use of Python with the performance of C/Fortran: the dream of any scientist!

Consequently, the past 15 years have seen the development of widely used libraries such as NumPy (providing a practical array data structure), SciPy (scientific computing), matplotlib (graphical plotting), pandas (data analysis and statistics), scikit-learn (machine learning), SymPy (symbolic computing), and Jupyter/IPython (efficient interfaces for interactive computing). Python, along with this set of libraries, is sometimes referred to as the SciPy stack or PyData platform.

Tip

Competing platforms

Python has several competitors. For example, MATLAB (by MathWorks) is a commercial software package focusing on numerical computing that is widely used in scientific research and engineering. SPSS (by IBM) is a commercial software package for statistical analysis. Python, however, is free and open source, and that's one of its greatest strengths. Alternative open source platforms include R (specialized in statistics) and Julia (a young language for high-performance numerical computing).

More recently, this platform has gained popularity in non-academic communities such as finance, engineering, statistics, and data science.

This book provides a solid introduction to the whole platform by focusing on one of its main components: Jupyter/IPython.

Jupyter and IPython

IPython was created in 2001 by Fernando Perez (the I in IPython stands for "interactive"). It was originally meant to be a convenient command-line interface to the scientific Python platform. In scientific computing, trial and error is the rule rather than the exception, and this requires an efficient interface that allows for interactive exploration of algorithms, data, and graphs.

In 2011, IPython introduced the interactive Notebook. Inspired by commercial software such as Maple (by Maplesoft) or Mathematica (by Wolfram Research), the Notebook runs in a browser and provides a unified web interface where code, text, mathematical equations, plots, graphics, and interactive graphical controls can be combined into a single document. This is an ideal interface for scientific computing. Here is a screenshot of a notebook:

Example of a notebook

It quickly became clear that this interface could be used with languages other than Python such as R, Julia, Lua, Ruby, and many others. Further, the Notebook is not restricted to scientific computing: it can be used for academic courses, software documentation, or book writing thanks to conversion tools targeting Markdown, HTML, PDF, ODT, and many other formats. Therefore, the IPython developers decided in 2014 to acknowledge the general-purpose nature of the Notebook by giving a new name to the project: Jupyter.

Jupyter features a language-independent Notebook platform that can work with a variety of kernels. Implemented in any language, a kernel is the backend of the Notebook interface. It manages the interactive session, the variables, the data, and so on. By contrast, the Notebook interface is the frontend of the system. It manages the user interface, the text editor, the plots, and so on. IPython is henceforth the name of the Python kernel for the Jupyter Notebook. Other kernels include IR, IJulia, ILua, IRuby, and many others (50 at the time of this writing).

In August 2015, the IPython/Jupyter developers achieved the "Big Split" by splitting the previous monolithic IPython codebase into a set of smaller projects, including the language-independent Jupyter Notebook (see https://blog.jupyter.org/2015/08/12/first-release-of-jupyter/). For example, the parallel computing features of IPython are now implemented in a standalone Python package named ipyparallel, the IPython widgets are implemented in ipywidgets, and so on. This separation makes the code of the project more modular and facilitates third-party contributions. IPython itself is now a much smaller project than before since it only features the interactive Python terminal and the Python kernel for the Jupyter Notebook.

Note

You will find the list of changes in IPython 4.0 at http://ipython.readthedocs.org/en/latest/whatsnew/version4.html. Many internal IPython imports have been deprecated due to the code reorganization. Warnings are raised if you attempt to perform a deprecated import. Also, the profiles have been removed and replaced with a unique default profile. However, you can simulate this functionality with environment variables. You will find more information at http://jupyter.readthedocs.org.

What this book covers

This book covers the Jupyter Notebook 1.0 and focuses on its Python kernel, IPython 4.0. In this chapter, we will introduce the platform, the Python language, the Jupyter Notebook interface, and IPython. In the remaining chapters, we will cover data analysis and scientific computing in Jupyter/IPython with the help of mainstream scientific libraries such as NumPy, pandas, and matplotlib.

Note

This book gives you a solid introduction to Jupyter and the SciPy platform. The IPython Interactive Computing and Visualization Cookbook (http://ipython-books.github.io/cookbook/) is the sequel to this introductory-level book. In 15 chapters and more than 500 pages, it contains a hundred recipes covering a wide range of interactive numerical computing techniques and data science topics. The IPython Cookbook is an excellent addition to the present IPython minibook if you're interested in delving into the platform in much greater detail.

 

Installing Python with Anaconda


Although Python is an open-source, cross-platform language, installing it with the usual scientific packages used to be overly complicated. Fortunately, there is now an all-in-one scientific Python distribution, Anaconda (by Continuum Analytics), that is free, cross-platform, and easy to install. Anaconda comes with Jupyter and all of the scientific packages we will use in this book. There are other distributions and installation options (like Canopy, WinPython, Python(x, y), and others), but for the purpose of this book we will use Anaconda throughout.

Tip

Running Jupyter in the cloud

You can also use Jupyter directly from your web browser, without installing anything on your local computer: go to http://try.jupyter.org. Note that the notebooks created there are not saved. Let's also mention a similar service, Wakari (https://wakari.io), by Continuum Analytics.

Anaconda comes with a package manager named conda, which lets you manage your Python distribution and install new packages.

Tip

Miniconda

Miniconda (http://conda.pydata.org/miniconda.html) is a light version of Anaconda that gives you the ability to only install the packages you need.

Downloading Anaconda

The first step is to download Anaconda from Continuum Analytics' website (http://continuum.io/downloads). This is actually not the easiest part since several versions are available. Three properties define a particular version:

  • The operating system (OS): Linux, Mac OS X, or Windows. This will depend on the computer you want to install Python on.

  • 32-bit or 64-bit: You want the 64-bit version, unless you're on an old or low-end computer. The 64-bit version will allow you to manipulate large datasets.

  • The version of Python: 2.7, or 3.4 (or later). In this book, we will use Python 3.4. You can also use Python 3.5 (released in September 2015) which introduces many features, including a new @ operator for matrix multiplication. However, it is easy to temporarily switch to a Python 2.7 environment with Anaconda if necessary (see the next section).

    Note

    Python 3 brought a few backward-incompatible changes over Python 2 (also known as Legacy Python). This is why many people are still using Python 2.7 at this time, even though Python 3 was released in 2008. We will use Python 3 in this book, and we recommend that newcomers learn Python 3. If you need to use legacy Python code that hasn't yet been updated to Python 3, you can use conda to temporarily switch to a Python 2 interpreter.

Once you have found the right link for your OS and Python 3 64-bit, you can download the package. You should then find it in your downloads directory (depending on your OS and your browser's settings).

Installing Anaconda

The Anaconda installer comes in different flavors depending on your OS, as follows:

  • Linux: The Linux installer is a bash .sh script. Run it with a command like bash Anaconda3-2.3.0-Linux-x86_64.sh (if necessary, replace the filename with the one you downloaded).

  • Mac: The Mac graphical installer is a .pkg file that you can run with a double-click.

  • Windows: The Windows graphical installer is an .exe file that you can run with a double-click.

Then, follow the instructions to install Anaconda on your computer. Here are a few remarks:

  • You don't need administrator rights to install Anaconda. In most cases, you can choose to install it in your personal user account.

  • Choose to put Anaconda in your system path, so that Anaconda's Python is the system default.

Note

Anaconda comes with a graphical launcher that you can use to start IPython, manage environments, and so on. You will find more details at http://docs.continuum.io/anaconda-launcher/

Before you get started...

Before you get started with Anaconda, there are a few things you need to know:

  • Opening a terminal

  • Finding your home directory

  • Manipulating your system path

You can skip this section if you already know how to do these things.

Opening a terminal

A terminal is a command-line application that lets you interact with your computer by typing commands with the keyboard, instead of clicking on windows with the mouse. While most computer users only know Graphical User Interfaces, developers and scientists generally need to know how to use the command-line interface for advanced usage. To use the command-line interface, follow the instructions that are specific to your OS:

  • On Windows, you can use Powershell. Press the Windows + R keys, type powershell in the Run box, and press Enter. You will find more information about Powershell at https://blog.udemy.com/powershell-tutorial/. Alternatively, you can use the older Windows terminal by typing cmd in the Run box.

  • On OS X, you can open the Terminal application, for example by pressing Cmd + Space, typing terminal, and pressing Enter.

  • On Linux, you can open the Terminal from your application manager.

In a terminal, use the cd /path/to/directory command to move to a given directory. For example, cd ~ moves to your home directory, which is introduced in the next section.

Finding your home directory

Your home directory is specific to your user account on your computer. It generally contains your applications' settings. It is often referred to as ~. Depending on the OS, the location of the home directory is as follows:

  • On Windows, its location is C:\Users\YourName\ where YourName is the name of your account.

  • On OS X, its location is /Users/YourName/ where YourName is the name of your account.

  • On Linux, its location is generally /home/yourname/ where yourname is the name of your account.

For example, the directory ~/anaconda3 refers to C:\Users\YourName\anaconda3\ on Windows and /home/yourname/anaconda3/ on Linux.
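
The expansion of ~ described above can also be checked from Python. The following is a minimal sketch using only the standard library; the anaconda3 folder name is just the example used in the text and may not exist on your machine.

```python
import os

# Expand "~" to the home directory of the current user.
home = os.path.expanduser("~")

# Build the full path corresponding to ~/anaconda3 (example folder name only).
anaconda_dir = os.path.join(home, "anaconda3")

print("Home directory:", home)
print("~/anaconda3 expands to:", anaconda_dir)
print("Exists on this machine:", os.path.isdir(anaconda_dir))
```

Using os.path.join rather than concatenating strings lets Python pick the right separator for your OS.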

Manipulating your system path

The system path is a global variable (also called an environment variable) defined by your operating system with the list of directories where executable programs are located. If you type a command like python in your terminal, you generally need to have a python (or python.exe on Windows) executable in one of the directories listed in the system path. If that's not the case, an error may be raised.
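
To see what this lookup amounts to, here is a short sketch that prints the directories listed in the system path and asks Python where the python command would be found. It only uses the standard library; the result depends entirely on your own machine.

```python
import os
import shutil

# PATH is a single string; os.pathsep is ";" on Windows and ":" elsewhere.
path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]

print("Directories searched for executables:")
for directory in path_dirs:
    print("  ", directory)

# shutil.which returns the location of the first match, or None if not found.
python_location = shutil.which("python")
print("python resolves to:", python_location)
```

If python_location is None, no python executable is reachable through the system path, which is the situation where typing python in a terminal fails.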

You can manually add directories to your system path as follows:

  • On Windows, press the Windows + R keys, type rundll32.exe sysdm.cpl,EditEnvironmentVariables, and press Enter. You can then edit the PATH variable and append ;C:\path\to\directory if you want to add that directory. You will find more detailed instructions at http://www.computerhope.com/issues/ch000549.htm.

  • On OS X, edit or create the file ~/.bash_profile and add export PATH="$PATH:/path/to/directory" at the end of the file.

  • On Linux, edit or create the file ~/.bashrc and add export PATH="$PATH:/path/to/directory" at the end of the file.

Testing your installation

To test Anaconda once it has been installed, open a terminal and type python. This opens a Python console, not to be confused with the OS terminal. The Python console is identified with a >>> prompt string, whereas the OS terminal is identified with a $ (Linux/OS X) or > (Windows) prompt string. These strings are displayed in the terminal, often preceded by your computer's name, your login, and the current directory (for example, yourname@computer:~$ on Linux or PS C:\Users\YourName> on Windows). You can type commands after the prompt string. After typing python, you should see something like the following:

$ python
Python 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Jun  4 2015, 15:29:08) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

What matters is that Anaconda or Continuum Analytics is mentioned here. Otherwise, typing python might have launched your system's default Python, which is not the one you want to use in this book.

If you have this problem, you may need to add the path to the Anaconda executables to your system path. For example, this path will be ~/anaconda3/bin if you chose to install Anaconda in ~/anaconda3. The bin directory contains Anaconda executables including python.
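
The same check can be done from inside Python. This is a rough sketch, assuming that the distribution name appears in the version banner as in the output shown above; other distributions or versions may format the banner differently, so treat a negative result as a hint rather than a proof. The helper name looks_like_anaconda is made up for this example.

```python
import sys

def looks_like_anaconda(version_string):
    """Return True if a Python version banner mentions Anaconda or Continuum."""
    return "Anaconda" in version_string or "Continuum" in version_string

# Banner copied from the terminal output shown in this section.
banner_from_book = "3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Jun  4 2015, 15:29:08)"

print("Book example:", looks_like_anaconda(banner_from_book))
print("This interpreter:", looks_like_anaconda(sys.version))

# Full path of the interpreter currently running, useful to spot a system Python.
print("Executable:", sys.executable)
```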

If you have any problem installing and testing Anaconda, you can ask for help on the mailing list (see the link in the References section under the Installing Python with Anaconda section of this chapter).

Next, exit the Python prompt by typing exit() and pressing Enter.

Managing environments

Anaconda lets you create different isolated Python environments. For example, you can have a Python 2 distribution for the rare cases where you need to temporarily switch to Python 2.

To create a new environment for Python 2, type the following command in an OS terminal:

$ conda create -n py2 anaconda python=2.7

This will create a new isolated environment named py2 based on the original Anaconda distribution, but with Python 2.7. You could also use the command conda env: type conda env -h to see the details.

You can now activate your py2 environment by typing the following command in a terminal (on Windows, type activate py2 instead):

$ source activate py2

Now, you should see a (py2) prefix in front of your terminal prompt. Typing python in your terminal with the py2 environment activated will open a Python 2 interpreter.

Type deactivate on Windows or source deactivate on Linux/OS X to deactivate the environment in the terminal.
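
Once an environment is activated, you can confirm from Python which interpreter you ended up with. The sketch below assumes that conda exposes the name of the active environment in the CONDA_DEFAULT_ENV environment variable; if the variable is not set on your system, the function simply reports None. The function name describe_environment is hypothetical.

```python
import os
import sys

def describe_environment(environ=os.environ):
    """Summarize the running interpreter and the active conda environment."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "prefix": sys.prefix,  # installation directory of this interpreter
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None if not set
    }

for key, value in sorted(describe_environment().items()):
    print(key, "=", value)
```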

Common conda commands

Here is a list of common commands:

  • conda help: Displays the list of conda commands.

  • conda list: Lists all packages installed in the current environment.

  • conda info: Displays system information.

  • conda env list: Displays the list of environments installed. The currently active one is marked by a star *.

  • conda install somepackage: Installs a Python package (replace somepackage with the name of the package you want to install).

  • conda install somepackage=0.7: Installs a specific version of a package.

  • conda update somepackage: Updates a Python package to the latest available version.

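  • conda update anaconda: Updates the anaconda metapackage, which brings the packages of the distribution to the versions of the latest Anaconda release.

  • conda update conda: Updates conda itself.

  • conda update --all: Updates all packages in the current environment to their latest compatible versions.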

  • conda remove somepackage: Uninstalls a Python package.

  • conda remove -n myenv --all: Removes the environment named myenv (replace this with the name of the environment you want to uninstall).

  • conda clean -t: Removes the old tarballs that are left over after installation and updates.

Some commands ask for confirmation (you need to press y to confirm). You can also use the -y option to avoid the confirmation prompt.

If conda install somepackage fails, you can try pip install somepackage instead. This will use the Python Package Index (PyPI) instead of Anaconda. Many scientific Anaconda packages are easier to install than the corresponding PyPI packages because they are precompiled for your platform. However, many packages are available on PyPI but not on Anaconda.
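
Whichever installer you use, you can verify from Python that a package is importable in the current environment. Here is a minimal sketch based on the standard importlib module; the helper name is_installed is made up for this example, and somepackage is the placeholder name used above. Note that the import name can differ from the package name (for instance, scikit-learn is imported as sklearn).

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ["json", "numpy", "somepackage"]:
    status = "available" if is_installed(name) else "missing"
    print("{}: {}".format(name, status))
```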

References

Here are a few references about Anaconda:

Downloading the notebooks

All of this book's code is available on GitHub as notebooks. We recommend that you download the notebooks and experiment with them as you're working through the book.

Note

GitHub is a popular online service that hosts open source projects. It is based on the Git Distributed Version Control System (DVCS). Git keeps track of file changes and enables collaborative work on a given project. Learning a version control system like Git is highly recommended for all programmers. Not using a version control system when working with code or even text documents is now considered bad practice. You will find several references at https://help.github.com/articles/good-resources-for-learning-git-and-github/. The IPython Cookbook also contains several recipes about Git and best interactive programming practices.

Here is how to download the book's notebooks:

  • Install git: http://git-scm.com/downloads.

  • Check your git installation: Open a new OS terminal and type git version. You should see the version of git and not an error message.

  • Type the following command (this is a single line):

    $ git clone https://github.com/ipython-books/minibook-2nd-code.git "$HOME/minibook"

This will download the very latest version of the code into a minibook subdirectory in your home directory. You can also choose another directory.

From this directory, you can update to the latest version at any time by typing git pull.
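
After cloning, you may want to check which notebooks were downloaded. The sketch below walks a directory and lists every .ipynb file it contains. It assumes the default ~/minibook location used above and makes no assumption about how the repository is organized; list_notebooks is a hypothetical helper name.

```python
import os

def list_notebooks(directory):
    """Return the sorted relative paths of all .ipynb files under directory."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".ipynb"):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, directory))
    return sorted(found)

minibook_dir = os.path.join(os.path.expanduser("~"), "minibook")
for notebook_path in list_notebooks(minibook_dir):
    print(notebook_path)
```

If the directory does not exist, the function returns an empty list and nothing is printed.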

Tip

Notebooks on GitHub

Notebook documents stored on GitHub (with the file extension .ipynb) are automatically rendered on the GitHub website.

 

Introducing the Notebook


Originally, IPython provided an enhanced command-line console to run Python code interactively. The Jupyter Notebook is a more recent and more sophisticated alternative to the console. Today, both tools are available, and we recommend that you learn to use both.

Launching the IPython console

To run the IPython console, type ipython in an OS terminal. There, you can write Python commands and see the results instantly. Here is a screenshot:

IPython console

The IPython console is most convenient when you have a command-line-based workflow and you want to execute some quick Python commands.

You can exit the IPython console by typing exit.

Note

Let's mention the Qt console, which is similar to the IPython console but offers additional features such as multiline editing, enhanced tab completion, image support, and so on. The Qt console can also be integrated within a graphical application written with Python and Qt. See http://jupyter.org/qtconsole/stable/ for more information.

Launching the Jupyter Notebook

To run the Jupyter Notebook, open an OS terminal, go to ~/minibook/ (or into the directory where you've downloaded the book's notebooks), and type jupyter notebook. This will start the Jupyter server and open a new window in your browser (if that's not the case, go to the following URL: http://localhost:8888). Here is a screenshot of Jupyter's entry point, the Notebook dashboard:

The Notebook dashboard

Note

At the time of writing, the following browsers are officially supported: Chrome 13 and greater; Safari 5 and greater; and Firefox 6 or greater. Other browsers may work also. Your mileage may vary.

The Notebook is most convenient when you start a complex analysis project that will involve a substantial amount of interactive experimentation with your code. Other common use-cases include keeping track of your interactive session (like a lab notebook), or writing technical documents that involve code, equations, and figures.

In the rest of this section, we will focus on the Notebook interface.

Tip

Closing the Notebook server

To close the Notebook server, go to the OS terminal where you launched the server from, and press Ctrl + C. You may need to confirm with y.

The Notebook dashboard

The dashboard contains several tabs:

  • Files: shows all files and notebooks in the current directory

  • Running: shows all kernels currently running on your computer

  • Clusters: lets you launch kernels for parallel computing (covered in Chapter 5, High-Performance and Parallel Computing)

A notebook is an interactive document containing code, text, and other elements. A notebook is saved in a file with the .ipynb extension. This file is a plain text file storing a JSON data structure.
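
Because a notebook is plain JSON, it can be inspected with Python's standard json module. The following sketch builds a tiny notebook-like structure in memory and reads it back. The field names follow version 4 of the notebook format as we understand it and should be treated as an assumption; in practice, let Jupyter write the files for you.

```python
import json

# A hand-written, minimal notebook-like structure (assumed v4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello world!')"],
        },
    ],
}

# Serialize to text (what would be stored in the .ipynb file) and parse it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

The same json.loads call works on the contents of a real .ipynb file, which makes notebooks easy to process with ordinary scripts.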

A kernel is a process running an interactive session. When using IPython, this kernel is a Python process. There are kernels in many languages other than Python.

Note

We follow the convention of using the term notebook for a file, and Notebook for the application and the web interface.

In Jupyter, notebooks and kernels are strongly separated. A notebook is a file, whereas a kernel is a process. The kernel receives snippets of code from the Notebook interface, executes them, and sends the outputs and possible errors back to the Notebook interface. Thus, in general, the kernel has no notion of a Notebook. A notebook is persistent (it's a file), whereas a kernel may be closed at the end of an interactive session and it is therefore not persistent. When a notebook is re-opened, it needs to be re-executed.
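
To make this separation concrete, here is a deliberately simplified toy model of a kernel. It is not how IPython is implemented (a real kernel is a separate process that exchanges messages with the frontend); the ToyKernel class and its execute method are invented for illustration. The sketch only shows the idea that the kernel keeps the session state, receives code snippets, and returns outputs and errors.

```python
import contextlib
import io

class ToyKernel:
    """A toy stand-in for a kernel: it holds variables and runs code snippets."""

    def __init__(self):
        self.namespace = {}       # the variables of the interactive session
        self.execution_count = 0  # incremented on every execution request

    def execute(self, code):
        """Run a snippet and return its printed output and a possible error."""
        self.execution_count += 1
        stdout = io.StringIO()
        error = None
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, self.namespace)
        except Exception as exc:
            error = "{}: {}".format(type(exc).__name__, exc)
        return {
            "execution_count": self.execution_count,
            "stdout": stdout.getvalue(),
            "error": error,
        }

kernel = ToyKernel()
kernel.execute("a = 2")              # state lives in the kernel...
print(kernel.execute("print(a * 3)"))  # ...and is reused by later snippets
print(kernel.execute("1 / 0"))         # errors are sent back, not raised
```

Creating a new ToyKernel() starts with an empty namespace, which mirrors why a re-opened notebook has to be re-executed: the file persisted, but the session state did not.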

In general, no more than one Notebook interface can be connected to a given kernel. However, several IPython consoles can be connected to a given kernel.

The Notebook user interface

To create a new notebook, click on the New button, and select Notebook (Python 3). A new browser tab opens and shows the Notebook interface as follows:

A new notebook

Here are the main components of the interface, from top to bottom:

  • The notebook name, which you can change by clicking on it. This is also the name of the .ipynb file.

  • The Menu bar gives you access to several actions pertaining to either the notebook or the kernel.

  • To the right of the menu bar is the Kernel name. You can change the kernel language of your notebook from the Kernel menu. We will see in Chapter 6, Customizing IPython how to manage different kernel languages.

  • The Toolbar contains icons for common actions. In particular, the dropdown menu showing Code lets you change the type of a cell.

  • Following is the main component of the UI: the actual Notebook. It consists of a linear list of cells. We will detail the structure of a cell in the following sections.

Structure of a notebook cell

There are two main types of cells: Markdown cells and code cells, and they are described as follows:

  • A Markdown cell contains rich text. In addition to classic formatting options like bold or italics, we can add links, images, HTML elements, LaTeX mathematical equations, and more. We will cover Markdown in more detail in the Ten Jupyter/IPython essentials section of this chapter.

  • A code cell contains code to be executed by the kernel. The programming language corresponds to the kernel's language. We will only use Python in this book, but you can use many other languages.

You can change the type of a cell by first clicking on a cell to select it, and then choosing the cell's type in the toolbar's dropdown menu showing Markdown or Code.

Markdown cells

Here is a screenshot of a Markdown cell:

A Markdown cell

The top panel shows the cell in edit mode, while the bottom one shows it in render mode. The edit mode lets you edit the text, while the render mode lets you display the rendered cell. We will explain the differences between these modes in greater detail in the following section.

Code cells

Here is a screenshot of a complex code cell:

Structure of a code cell

This code cell contains several parts, as follows:

  • The Prompt number shows the cell's number. This number increases every time you run the cell. Since you can run cells of a notebook out of order, nothing guarantees that code numbers are linearly increasing in a given notebook.

  • The Input area contains a multiline text editor that lets you write one or several lines of code with syntax highlighting.

  • The Widget area may contain graphical controls; here, it displays a slider.

  • The Output area can contain multiple outputs, here:

    • Standard output (text in black)

    • Error output (text with a red background)

    • Rich output (an HTML table and an image here)

The Notebook modal interface

The Notebook implements a modal interface similar to some text editors such as vim. Getting used to this interface involves a small learning curve for some users.

  • Use the edit mode to write code (the selected cell has a green border, and a pen icon appears at the top right of the interface). Click inside a cell to enable the edit mode for this cell (you need to double-click with Markdown cells).

  • Use the command mode to operate on cells (the selected cell has a gray border, and there is no pen icon). Click outside the text area of a cell to enable the command mode (you can also press the Esc key).

Keyboard shortcuts are available in the Notebook interface. Type h to show them. We review here the most common ones (for Windows and Linux; shortcuts for OS X may be slightly different).

Keyboard shortcuts available in both modes

Here are a few keyboard shortcuts that are always available when a cell is selected:

  • Ctrl + Enter: run the cell

  • Shift + Enter: run the cell and select the cell below

  • Alt + Enter: run the cell and insert a new cell below

  • Ctrl + S: save the notebook

Keyboard shortcuts available in the edit mode

In the edit mode, you can type code as usual, and you have access to the following keyboard shortcuts:

  • Esc: switch to command mode

  • Ctrl + Shift + -: split the cell

Keyboard shortcuts available in the command mode

In the command mode, keystrokes are bound to cell operations. Don't write code in command mode or unexpected things will happen! For example, typing dd in command mode will delete the selected cell! Here are some keyboard shortcuts available in command mode:

  • Enter: switch to edit mode

  • ↑ or k: select the previous cell

  • ↓ or j: select the next cell

  • y / m: change the cell type to code cell/Markdown cell

  • a / b: insert a new cell above/below the current cell

  • x / c / v: cut/copy/paste the current cell

  • dd: delete the current cell

  • z: undo the last delete operation

  • Shift + =: merge the cell below

  • h: display the help menu with the list of keyboard shortcuts

Spending some time learning these shortcuts is highly recommended.

 

A crash course on Python


If you don't know Python, read this section to learn the fundamentals. Python is a very accessible language and, if you have ever programmed, it will only take you a few minutes to learn the basics.

Hello world

Open a new notebook and type the following in the first cell:

In [1]: print("Hello world!")
Out[1]: Hello world!

Here is a screenshot:

"Hello world" in the Notebook

Tip

Prompt string

Note that the convention chosen in this book is to show Python code (also called the input) prefixed with In [x]: (which shouldn't be typed). This is the standard IPython prompt. Here, you should just type print("Hello world!") and then press Shift + Enter.

Congratulations! You are now a Python programmer.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You will also find the book's code on this GitHub repository: https://github.com/ipython-books/minibook-2nd-code.

Variables

Let's use Python as a calculator.

In [2]: 2 * 2
Out[2]: 4

Here, 2 * 2 is an expression statement. This operation is performed, the result is returned, and IPython displays it in the notebook cell's output.

Tip

Division

In Python 3, 3 / 2 returns 1.5 (floating-point division), whereas it returns 1 in Python 2 (integer division). This can be a source of errors when porting Python 2 code to Python 3. It is recommended to always write 3.0 / 2.0 explicitly (that is, to use floating-point numbers) for floating-point division, and 3 // 2 for integer division. Both syntaxes work in Python 2 and Python 3. See http://python3porting.com/differences.html#integer-division for more details.

Other built-in mathematical operators include +, -, ** for the exponentiation, and others. You will find more details at https://docs.python.org/3/reference/expressions.html#the-power-operator.
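
As a quick illustration of these operators, here is a short sketch assuming Python 3, as used throughout this book. The variable names are arbitrary.

```python
true_division = 3 / 2     # floating-point division
floor_division = 3 // 2   # integer division, rounds down
negative_floor = -3 // 2  # rounding goes toward negative infinity
power = 2 ** 10           # exponentiation
remainder = 7 % 3         # remainder of the integer division

print(true_division)   # 1.5
print(floor_division)  # 1
print(negative_floor)  # -2
print(power)           # 1024
print(remainder)       # 1
```

Note that // rounds down rather than toward zero, which matters for negative numbers.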

Variables form a fundamental concept of any programming language. A variable has a name and a value. Here is how to create a new variable in Python:

In [3]: a = 2

And here is how to use an existing variable:

In [4]: a * 3
Out[4]: 6

Several variables can be defined at once (this is called unpacking):

In [5]: a, b = 2, 6
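Unpacking also gives a compact way to swap two variables. The following short sketch reuses the names a and b from above:

```python
a, b = 2, 6
# The right-hand side is evaluated first, then unpacked into a and b.
a, b = b, a
print(a, b)
```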

There are different types of variables. Here, we have used a number (more precisely, an integer). Other important types include floating-point numbers to represent real numbers, strings to represent text, and booleans to represent True/False values. Here are a few examples:

In [6]: somefloat = 3.1415
        sometext = 'pi is about'  # You can also use double quotes.
        print(sometext, somefloat)  # Display several variables.
Out[6]: pi is about 3.1415

Note how we used the # character to write comments. Whereas Python discards the comments completely, adding comments in the code is important when the code is to be read by other humans (including yourself in the future).

String escaping

String escaping refers to the ability to insert special characters in a string. For example, how can you insert ' and ", given that these characters are used to delimit a string in Python code? The backslash \ is the go-to escape character in Python (and in many other languages too). Here are a few examples:

In [7]: print("Hello \"world\"")
        print("A list:\n* item 1\n* item 2")
        print("C:\\path\\on\\windows")
        print(r"C:\path\on\windows")
Out[7]: Hello "world"
        A list:
        * item 1
        * item 2
        C:\path\on\windows
        C:\path\on\windows

The special character \n is the new line (or line feed) character. To insert a backslash, you need to escape it, which explains why it needs to be doubled as \\.

You can also disable escaping by using raw literals with an r prefix before the string, like in the last example above. In this case, backslashes are considered as normal characters.

This is convenient when writing Windows paths, since Windows uses backslash separators instead of forward slashes like on Unix systems. A very common error on Windows is forgetting to escape backslashes in paths: writing "C:\path" may lead to subtle errors.
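To see what such a subtle error can look like, here is a small sketch with a hypothetical Windows path (the path itself is made up for illustration). Without the r prefix, \t and \n are silently interpreted as a tab and a new line:

```python
# Hypothetical path, for illustration only.
bad_path = "C:\temp\new"     # \t becomes a tab, \n becomes a new line
good_path = r"C:\temp\new"   # raw literal: backslashes are kept as they are

# The two strings do not even have the same length.
print(len(bad_path), len(good_path))
```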

You will find the list of special characters in Python at https://docs.python.org/3.4/reference/lexical_analysis.html#string-and-bytes-literals.

Lists

A list contains a sequence of items. You can concisely instruct Python to perform repeated actions on the elements of a list. Let's first create a list of numbers as follows:

In [8]: items = [1, 3, 0, 4, 1]

Note the syntax we used to create the list: square brackets [], and commas , to separate the items.

The built-in function len() returns the number of elements in a list:

In [9]: len(items)
Out[9]: 5

Note

Python comes with a set of built-in functions, including print(), len(), max(), functional routines like filter() and map(), and container-related routines like all(), any(), range(), and sorted(). You will find the full list of built-in functions at https://docs.python.org/3.4/library/functions.html.

Now, let's compute the sum of all elements in the list. Python provides a built-in function for this:

In [10]: sum(items)
Out[10]: 9
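A few other built-in functions can be tried on the same list. This is only a sketch; the variable names on the left are ours:

```python
items = [1, 3, 0, 4, 1]

largest = max(items)                 # largest element
smallest = min(items)                # smallest element
in_order = sorted(items)             # a new, sorted list; items is unchanged
indices = list(range(len(items)))    # the valid indices of the list
some_nonzero = any(items)            # True if at least one element is nonzero
all_nonzero = all(items)             # False here, because the list contains 0

print(largest, smallest, in_order, indices, some_nonzero, all_nonzero)
```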

We can also access individual elements in the list, using the following syntax:

In [11]: items[0]
Out[11]: 1
In [12]: items[-1]
Out[12]: 1

Note that indexing starts at 0 in Python: the first element of the list is indexed by 0, the second by 1, and so on. Also, -1 refers to the last element, -2 to the penultimate element, and so on.

The same syntax can be used to alter elements in the list:

In [13]: items[1] = 9
         items
Out[13]: [1, 9, 0, 4, 1]

We can access sublists with the following syntax:

In [14]: items[1:3]
Out[14]: [9, 0]

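Here, 1:3 represents a slice going from element 1 included (this is the second element of the list) to element 3 excluded. Thus, we get a sublist with the second and third elements of the original list. The first-included/last-excluded asymmetry leads to an intuitive treatment of consecutive slices: for example, items[:2] and items[2:] split the list into two parts with no element in common. Also, note that slicing a list creates a new list (a shallow copy of the selected elements), not a dynamic view of the original list; changing elements in the sublist does not change them in the original list. NumPy arrays, which we will use later in this book, behave differently: slicing an array returns a view on the original data.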
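Here is a short check, in plain Python, of how a slice of a list relates to the original list. The names sublist, first, and rest are ours:

```python
items = [1, 9, 0, 4, 1]

sublist = items[1:3]    # slicing a list builds a new list
sublist[0] = 100        # this modifies the new list only

print(items)            # the original list is unchanged
print(sublist)

# Consecutive slices share no element and rebuild the full list.
first, rest = items[:2], items[2:]
print(first + rest == items)
```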

Python provides several other types of containers:

  • Tuples are immutable and contain a fixed number of elements:

    In [15]: my_tuple = (1, 2, 3)
             my_tuple[1]
    Out[15]: 2
    
  • Dictionaries contain key-value pairs. They are extremely useful and common:

    In [16]: my_dict = {'a': 1, 'b': 2, 'c': 3}
             print('a:', my_dict['a'])
    Out[16]: a: 1
    In [17]: print(my_dict.keys())
    Out[17]: dict_keys(['c', 'a', 'b'])
    

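    In Python 3.6 and earlier, there is no guaranteed notion of order in a dictionary, which is why the keys appear in an arbitrary order above. Since Python 3.7, dictionaries preserve the insertion order. The native collections module also provides an OrderedDict structure that keeps the insertion order in all versions (see https://docs.python.org/3.4/library/collections.html).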

  • Sets, like mathematical sets, contain distinct elements:

    In [18]: my_set = set([1, 2, 3, 2, 1])
             my_set
    Out[18]: {1, 2, 3}
    

    Note

    A Python object is mutable if its value can change after it has been created. Otherwise, it is immutable. For example, a string is immutable; to change it, a new string needs to be created. A list, a dictionary, or a set is mutable; elements can be added or removed. By contrast, a tuple is immutable, and it is not possible to change the elements it contains without recreating the tuple. See https://docs.python.org/3.4/reference/datamodel.html for more details.
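The difference between mutable and immutable containers can be observed directly. In this minimal sketch, tuple_is_immutable is a flag of our own used only to record what happened:

```python
my_tuple = (1, 2, 3)
tuple_is_immutable = False
try:
    my_tuple[0] = 10            # tuples do not support item assignment
except TypeError:
    tuple_is_immutable = True

my_dict = {'a': 1}
my_dict['b'] = 2                # dictionaries are mutable: add a new key

my_set = {1, 2}
my_set.add(2)                   # already present: the set is unchanged
my_set.add(3)                   # new element

print(tuple_is_immutable, my_dict, my_set)
```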

Loops

We can run through all elements of a list using a for loop:

In [19]: for item in items:
             print(item)
Out[19]: 1
         9
         0
         4
         1

There are several things to note here:

  • The for item in items syntax means that a temporary variable named item is created at every iteration. This variable contains the value of every item in the list, one at a time.

  • Note the colon : at the end of the for statement. Forgetting it will lead to a syntax error!

  • The statement print(item) will be executed for all items in the list.

  • Note the four spaces before print: this is called the indentation. You will find more details about indentation in the next subsection.

Python supports a concise syntax to perform a given operation on all elements of a list, as follows:

In [20]: squares = [item * item for item in items]
         squares
Out[20]: [1, 81, 0, 16, 1]

This is called a list comprehension. A new list is created here; it contains the squares of all numbers in the list. This concise syntax leads to highly readable and Pythonic code.
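For comparison, here is a sketch of the same computation written with an explicit for loop and the append() method of lists. The name squares_loop is ours:

```python
items = [1, 9, 0, 4, 1]

# Explicit loop: start from an empty list and append one square at a time.
squares_loop = []
for item in items:
    squares_loop.append(item * item)

# List comprehension: the same result in a single line.
squares = [item * item for item in items]

print(squares_loop == squares)
```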

Indentation

Indentation refers to the spaces that may appear at the beginning of some lines of code. This is a particular aspect of Python's syntax.

In most programming languages, indentation is optional and is generally used to make the code visually clearer. But in Python, indentation also has a syntactic meaning. Particular indentation rules need to be followed for Python code to be correct.

In general, there are two ways to indent some text: by inserting a tab character (also referred to as \t), or by inserting a number of spaces (typically, four). It is recommended to use spaces instead of tab characters. Your text editor should be configured such that the Tab key on the keyboard inserts four spaces instead of a tab character.

In the Notebook, indentation is automatically configured properly; so you shouldn't worry about this issue. The question only arises if you use another text editor for your Python code.

Finally, what is the meaning of indentation? In Python, indentation delimits coherent blocks of code, for example, the contents of a loop, a conditional branch, a function, and other objects. Where other languages such as C or JavaScript use curly braces to delimit such blocks, Python uses indentation.
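The following sketch shows how indentation decides which lines belong to a loop. The variable names are ours:

```python
items = [1, 9, 0, 4, 1]
total = 0
for item in items:
    # These two indented lines form the body of the loop:
    # they are executed once per item.
    doubled = 2 * item
    total = total + doubled
# This line is not indented: it runs only once, after the loop has finished.
print("Total:", total)
```

If the print() call were indented like the two lines above it, the running total would be displayed at every iteration instead.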

Conditional branches

Sometimes, you need to perform different operations on your data depending on some condition. For example, let's display all even numbers in our list:

In [21]: for item in items:
             if item % 2 == 0:
                 print(item)
Out[21]: 0
         4

Again, here are several things to note:

  • An if statement is followed by a boolean expression.

  • If a and b are two integers, the modulo operator a % b returns the remainder from the division of a by b. Here, item % 2 is 0 for even numbers, and 1 for odd numbers.

  • The equality is represented by a double equal sign == to avoid confusion with the assignment operator = that we use when we create variables.

  • Like with the for loop, the if statement ends with a colon :.

  • The part of the code that is executed when the condition is satisfied follows the if statement. It is indented. Indentation is cumulative: since this if is inside a for loop, there are eight spaces before the print(item) statement.

Python supports a concise syntax to select all elements in a list that satisfy certain properties. Here is how to create a sublist with only even numbers:

In [22]: even = [item for item in items if item % 2 == 0]
         even
Out[22]: [0, 4]

This is also a form of list comprehension.
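An if statement can be completed with elif (else if) and else branches when there are more than two cases. Here is a small sketch; the labels list and its values are ours:

```python
items = [1, 9, 0, 4, 1]
labels = []
for item in items:
    if item == 0:
        labels.append('zero')
    elif item % 2 == 0:
        labels.append('even')    # only reached when item is not 0
    else:
        labels.append('odd')     # reached when no condition above holds
print(labels)
```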

Functions

Code is typically organized into functions. A function encapsulates part of your code. Functions allow you to reuse bits of functionality without copy-pasting the code. Here is a function that tells whether an integer number is even or not:

In [23]: def is_even(number):
             """Return whether an integer is even or not."""
             return number % 2 == 0

There are several things to note here:

  • A function is defined with the def keyword.

  • After def comes the function name. A general convention in Python is to only use lowercase characters, and separate words with an underscore _. A function name generally starts with a verb.

  • The function name is followed by parentheses, with one or several variable names called the arguments. These are the inputs of the function. There is a single argument here, named number.

  • No type is specified for the argument. This is because Python is dynamically typed; you could pass a variable of any type. This function would work fine with floating point numbers, for example (the modulo operation works with floating point numbers in addition to integers).

  • The body of the function is indented (and note the colon : at the end of the def statement).

  • There is a docstring wrapped by triple quotes """. This is a particular form of comment that explains what the function does. It is not mandatory, but it is strongly recommended to write docstrings for the functions exposed to the user.

  • The return keyword in the body of the function specifies the output of the function. Here, the output is a Boolean, obtained from the expression number % 2 == 0. It is possible to return several values; just use a comma to separate them (in this case, a tuple containing these values would be returned).

Once a function is defined, it can be called like this:

In [24]: is_even(3)
Out[24]: False
In [25]: is_even(4)
Out[25]: True

Here, 3 and 4 are successively passed as arguments to the function.
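To illustrate how a function can return several values, here is a sketch with a hypothetical function, quotient_and_remainder(), which is not part of the book's code:

```python
def quotient_and_remainder(number, divisor):
    """Return the integer quotient and the remainder of a division."""
    return number // divisor, number % divisor

# The returned tuple can be kept as is...
result = quotient_and_remainder(7, 2)
# ...or unpacked into several variables.
quotient, rest = quotient_and_remainder(7, 2)
print(result, quotient, rest)
```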

Positional and keyword arguments

A Python function can accept an arbitrary number of arguments, called positional arguments. It can also accept optional named arguments, called keyword arguments. Here is an example:

In [26]: def remainder(number, divisor=2):
             return number % divisor

The second argument of this function, divisor, is optional. If it is not provided by the caller, it will default to the number 2, as shown here:

In [27]: remainder(5)
Out[27]: 1

There are two equivalent ways of specifying a keyword argument when calling a function. They are as follows:

In [28]: remainder(5, 3)
Out[28]: 2
In [29]: remainder(5, divisor=3)
Out[29]: 2

In the first case, 3 is understood as the second argument, divisor. In the second case, the name of the argument is given explicitly by the caller. This second syntax is clearer and less error-prone than the first one.

Functions can also accept arbitrary sets of positional and keyword arguments, using the following syntax:

In [30]: def f(*args, **kwargs):
             print("Positional arguments:", args)
             print("Keyword arguments:", kwargs)
In [31]: f(1, 2, c=3, d=4)
Out[31]: Positional arguments: (1, 2)
         Keyword arguments: {'c': 3, 'd': 4}

Inside the function, args is a tuple containing positional arguments, and kwargs is a dictionary containing keyword arguments.
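The same * and ** symbols can also be used when calling a function, to pass the items of a tuple and of a dictionary as positional and keyword arguments. Here is a sketch reusing the remainder() function defined above (the names positional and named are ours):

```python
def remainder(number, divisor=2):
    return number % divisor

positional = (7,)            # a tuple with a single element
named = {'divisor': 4}

# This call is equivalent to remainder(7, divisor=4).
result = remainder(*positional, **named)
print(result)
```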

Passage by assignment

When passing a parameter to a Python function, a reference to the object is actually passed (passage by assignment):

  • If the passed object is mutable, it can be modified by the function

  • If the passed object is immutable, it cannot be modified by the function

Here is an example:

In [32]: my_list = [1, 2]

         def add(some_list, value):
             some_list.append(value)

         add(my_list, 3)
         my_list
Out[32]: [1, 2, 3]

The add() function modifies an object defined outside it (in this case, the object my_list); we say this function has side-effects. A function with no side-effects is called a pure function: it doesn't modify anything in the outer context, and it deterministically returns the same result for any given set of inputs. Pure functions are to be preferred over functions with side-effects.
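As a sketch of the alternative, here is a pure variant of add() that returns a new list, next to a function that receives an immutable integer. Both add_pure() and increment() are hypothetical names introduced for this example:

```python
def add_pure(some_list, value):
    """Return a new list; the list passed as argument is left untouched."""
    return some_list + [value]

def increment(number):
    """Rebinding the local name does not affect the caller's variable."""
    number = number + 1
    return number

my_list = [1, 2]
new_list = add_pure(my_list, 3)

my_number = 5
result = increment(my_number)

print(my_list, new_list, my_number, result)
```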

Knowing this can help you spot subtle bugs. There are further related concepts that are useful to know, including function scopes, naming, binding, and more. Here are a couple of links:

Errors

Let's talk about errors in Python. As you learn, you will inevitably come across errors and exceptions. The Python interpreter will most of the time tell you what the problem is, and where it occurred. It is important to understand the vocabulary used by Python so that you can more quickly find and correct your errors.

Let's see the following example:

In [33]: def divide(a, b):
             return a / b
In [34]: divide(1, 0)
Out[34]: ---------------------------------------------------------
         ZeroDivisionError       Traceback (most recent call last)
         <ipython-input-2-b77ebb6ac6f6> in <module>()
         ----> 1 divide(1, 0)

         <ipython-input-1-5c74f9fd7706> in divide(a, b)
               1 def divide(a, b):
         ----> 2     return a / b

         ZeroDivisionError: division by zero

Here, we defined a divide() function, and called it to divide 1 by 0. Dividing a number by 0 is an error in Python. Here, a ZeroDivisionError exception was raised. An exception is a particular type of error that can be raised at any point in a program. It is propagated from the innards of the code up to the command that launched the code. It can be caught and processed at any point. You will find more details about exceptions at https://docs.python.org/3/tutorial/errors.html, and common exception types at https://docs.python.org/3/library/exceptions.html#bltin-exceptions.

The error message you see contains the stack trace, the exception type, and the exception message. The stack trace shows all function calls between the raised exception and the script calling point.

The top frame, indicated by the first arrow ---->, shows the entry point of the code execution. Here, it is divide(1, 0), which was called directly in the Notebook. The error occurred while this function was called.

The next and last frame is indicated by the second arrow. It corresponds to line 2 in our function divide(a, b). It is the last frame in the stack trace: this means that the error occurred there.

We will see later in this chapter how to debug such errors interactively in IPython and in the Jupyter Notebook. Knowing how to navigate up and down in the stack trace is critical when debugging complex Python code.
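In the meantime, here is a minimal sketch of how an exception can be caught with a try/except block. The safe_divide() function is a hypothetical helper, and returning None on failure is just one possible choice:

```python
def divide(a, b):
    return a / b

def safe_divide(a, b):
    """Return a / b, or None if b is zero (hypothetical helper)."""
    try:
        return divide(a, b)
    except ZeroDivisionError as error:
        # The exception raised inside divide() propagates up to here.
        print("Cannot divide:", error)
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```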

Object-oriented programming

Object-oriented programming (OOP) is a relatively advanced topic. Although we won't use it much in this book, it is useful to know the basics. Also, mastering OOP is often essential when you start to have a large code base.

In Python, everything is an object. A number, a string, or a function is an object. An object is an instance of a type (also known as class). An object has attributes and methods, as specified by its type. An attribute is a variable bound to an object, giving some information about it. A method is a function that applies to the object.

For example, the object 'hello' is an instance of the built-in str type (string). The type() function returns the type of an object, as shown here:

In [35]: type('hello')
Out[35]: str

There are native types, like str or int (integer), and custom types, also called classes, that can be created by the user.

In IPython, you can discover the attributes and methods of any object with the dot syntax and tab completion. For example, typing 'hello'.u and pressing Tab automatically shows us the existence of the upper() method:

In [36]: 'hello'.upper()
Out[36]: 'HELLO'

Here, upper() is a method available to all str objects; it returns an uppercase copy of a string.

A useful string method is format(). This simple and convenient templating system lets you generate strings dynamically, as shown in the following example:

In [37]: 'Hello {0:s}!'.format('Python')
Out[37]: 'Hello Python!'

The {0:s} syntax means "replace this with the first argument of format(), which should be a string". The variable type after the colon is especially useful for numbers, where you can specify how to display the number (for example, .3f to display three decimals). The 0 makes it possible to replace a given value several times in a given string. You can also use a name instead of a position—for example 'Hello {name}!'.format(name='Python').
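Here are a few more uses of format(), as a quick sketch of the options mentioned above (the strings and variable names are ours):

```python
# .3f displays a number with three decimals.
pi_text = 'pi is about {0:.3f}'.format(3.14159)

# The same position can be used several times.
repeated = '{0}, {0} and {1}'.format('spam', 'eggs')

# A name can be used instead of a position.
named = 'Hello {name}!'.format(name='Python')

print(pi_text)
print(repeated)
print(named)
```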

Some methods are prefixed with an underscore _; they are private and are generally not meant to be used directly. IPython's tab completion won't show you these private attributes and methods unless you explicitly type _ before pressing Tab.

In practice, the most important thing to remember is that appending a dot . to any Python object and pressing Tab in IPython will show you a lot of functionality pertaining to that object.
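To give an idea of what a custom type looks like, here is a minimal sketch of a class with one attribute and one method. The Circle class is hypothetical and is not used elsewhere in this book:

```python
import math

class Circle:
    """A hypothetical custom type, for illustration only."""

    def __init__(self, radius):
        # An attribute: a variable bound to the object.
        self.radius = radius

    def area(self):
        # A method: a function that applies to the object.
        return math.pi * self.radius ** 2

unit_circle = Circle(1.0)
print(type(unit_circle).__name__, unit_circle.radius, unit_circle.area())
```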

Functional programming

Python is a multi-paradigm language; it notably supports imperative, object-oriented, and functional programming models. Python functions are objects and can be handled like other objects. In particular, they can be passed as arguments to other functions; functions that accept or return other functions are called higher-order functions. This is the essence of functional programming.

Decorators provide a convenient syntax construct to define higher-order functions. Here is an example using the is_even() function from the previous Functions section:

In [38]: def show_output(func):
             def wrapped(*args, **kwargs):
                 output = func(*args, **kwargs)
                 print("The result is:", output)
             return wrapped

The show_output() function transforms an arbitrary function func() into a new function, named wrapped(), that displays the result of the function, as follows:

In [39]: f = show_output(is_even)
         f(3)
Out[39]: The result is: False

Equivalently, this higher-order function can also be used as a decorator, with the @ syntax, as follows:

In [40]: @show_output
         def square(x):
             return x * x
In [41]: square(3)
Out[41]: The result is: 9

You can find more information about Python decorators at https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Decorators and at http://www.thecodeship.com/patterns/guide-to-python-function-decorators/.
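A common variation, sketched here as one option among others, is to make the wrapped function return the original output and to apply functools.wraps from the standard library, so that the decorated function keeps its name and docstring:

```python
import functools

def show_output(func):
    @functools.wraps(func)          # keep the name and docstring of func
    def wrapped(*args, **kwargs):
        output = func(*args, **kwargs)
        print("The result is:", output)
        return output               # also hand the result back to the caller
    return wrapped

@show_output
def square(x):
    """Return the square of a number."""
    return x * x

result = square(3)
print(square.__name__, result)
```

With this variant, the decorated function can still be used in expressions, since it returns its result in addition to displaying it.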

Python 2 and 3

Let's finish this section with a few notes about Python 2 and Python 3 compatibility issues.

Some Python 2 code and libraries are still not compatible with Python 3. Therefore, it is sometimes useful to be aware of the differences between the two versions. One of the most obvious differences is that print is a statement in Python 2, whereas it is a function in Python 3. Therefore, print "Hello" (without parentheses) works in Python 2 but not in Python 3, while print("Hello") works in both Python 2 and Python 3.

There are several non-mutually exclusive options to write portable code that works with both versions:

  • __future__: A built-in module supporting backward-incompatible Python syntax

  • 2to3: A built-in Python module to port Python 2 code to Python 3

  • six: An external lightweight library for writing compatible code

Here are a few references:

Going beyond the basics

You now know the fundamentals of Python, the bare minimum that you will need in this book. As you can imagine, there is much more to say about Python.

Following are a few further basic concepts that are often useful and that we cannot cover here, unfortunately. You are highly encouraged to have a look at them in the references given at the end of this section:

  • range and enumerate

  • pass, break, and continue, to be used in loops

  • Working with files

  • Creating and importing modules

  • The Python standard library provides a wide range of functionality (OS, network, file systems, compression, mathematics, and more)
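As a small taste of the first two items of this list, here is a sketch combining range, enumerate, continue, break, and pass. The variable names and values are ours:

```python
kept = []
for number in range(10):         # the integers 0, 1, ..., 9
    if number % 2 == 1:
        continue                 # skip odd numbers, go to the next iteration
    if number > 6:
        break                    # leave the loop entirely
    kept.append(number)

# enumerate yields (index, value) pairs.
numbered = list(enumerate(['a', 'b', 'c']))

for _ in range(3):
    pass                         # a placeholder statement that does nothing

print(kept, numbered)
```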

Here are some slightly more advanced concepts that you might find useful if you want to strengthen your Python skills:

  • Regular expressions for advanced string processing

  • Lambda functions for defining small anonymous functions

  • Generators for controlling custom loops

  • Exceptions for handling errors

  • with statements for safely handling contexts

  • Advanced object-oriented programming

  • Metaprogramming for modifying Python code dynamically

  • The pickle module for persisting Python objects on disk and exchanging them across a network

Finally, here are a few references:

 

Ten Jupyter/IPython essentials


In this section, we will cover ten essential features of Jupyter and IPython that make them so useful for interactive computing.

Using IPython as an extended shell

Note

Unfortunately, this subsection will not work well on Windows. The goal here is to demonstrate accessing the operating system's shell from IPython. We could say that, by design, the Windows shell is much more limited than those provided by Linux and OS X. Windows favors user interactions from the graphical interface, whereas Linux and OS X inherit Unix's flexible command-line capabilities. If you want to share and distribute your notebooks, you shouldn't rely on the techniques exposed in this subsection. Rather, you should use the Python equivalents, which are more verbose but also more powerful. Using the shell from IPython is only useful during interactive sessions of users already familiar with the Unix shell.

Open a terminal and type the following commands to go to the minibook's chapter1 directory and launch the Notebook server:

$ cd ~/minibook/chapter1/
$ jupyter notebook

In the Notebook dashboard, open the 15-ten.ipynb notebook. You can also create a new notebook if you prefer not to use the book's code.

Let's illustrate how to use IPython as an extended shell. We will download an example dataset, navigate through the filesystem, and open text files, all from the Notebook. The dataset contains social network data of hundreds of volunteer Facebook users. This BSD-licensed dataset is provided freely by Stanford's SNAP project (http://snap.stanford.edu/data/).

IPython provides several magic commands that let you interact with your filesystem. These commands are prefixed with a %. For example, here is how to display the current working directory:

In [1]: %pwd
Out[1]: '/home/cyrille/minibook/chapter1'

Note

Like most other magic commands, this magic command works on all operating systems, including Windows. IPython implements several cross-platform Python equivalents of common Unix commands like pwd. For other commands not implemented by IPython, we need to call shell commands directly with the ! prefix (as shown in the following examples). This doesn't work well on Windows since many of these commands are Unix-specific. In brief, %-prefixed commands should work on all operating systems while !-prefixed commands will generally only work on Linux and OS X, not Windows.

Let's download the dataset from the book's data repository (https://github.com/ipython-books/minibook-2nd-data). IPython doesn't yet provide a magic command for downloading data, but we can use another IPython trick: we can run any system or terminal command from IPython by prefixing it with an exclamation mark (!). For example, here is how to use the wget download utility only available on Unix systems:

In [2]: !wget https://raw.githubusercontent.com/ipython-books/minibook-2nd-data/master/facebook.zip

Note

If wget is not installed, you can install it with your OS package manager. For example, on Ubuntu: sudo apt-get install wget; on OS X: brew install wget. On OS X, brew is available at http://brew.sh/. On Windows, you should download the file manually from the data repository, as explained later.

This wget command downloads a file from a URL and saves it to a file in the local filesystem. Let's display the list of files in the current directory using the %ls magic command (available on all systems, even on Windows, since it is a magic command provided by IPython), as follows:

In [3]: %ls
Out[3]: facebook.zip  [...]

We see a new facebook.zip file.

Note

If you are on Windows, or if downloading the file from IPython didn't work, you can always download this file manually via your web browser at the following URL: https://github.com/ipython-books/minibook-2nd-data/. Then save the Facebook dataset in the current directory (the one containing this notebook, which should be ~/minibook/chapter1/).
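Another option that should work on all operating systems is to download the file in pure Python. The following is a minimal sketch based on urlretrieve() from the standard urllib.request module; the download() helper is a hypothetical name of ours, and the actual call is left as a comment because it requires network access:

```python
from urllib.request import urlretrieve

def download(url, filename):
    """Download url into a local file and return its path (hypothetical helper)."""
    local_path, _headers = urlretrieve(url, filename)
    return local_path

# Example call for the dataset of this section (requires network access):
# download('https://raw.githubusercontent.com/ipython-books/'
#          'minibook-2nd-data/master/facebook.zip', 'facebook.zip')
```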

The next step is to unzip this file in the current directory. The first way of doing it is to use your operating system, generally with a right-click on the icon. On Linux and OS X, we can also use the unzip command-line tool (you may need to install it first, for example with a command like sudo apt-get install unzip on Ubuntu). Finally, it is also possible to do it in pure Python with the zipfile module (see https://docs.python.org/3.4/library/zipfile.html).

Here, we'll call the unzip tool, which will only work on Linux and OS X, not Windows:

In [4]: !unzip facebook.zip

Once the archive has been extracted, a new subdirectory named facebook appears, as shown here:

In [5]: %ls
Out[5]: facebook  facebook.zip  [...]
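On Windows, or if unzip is not available, the extraction can be done with the zipfile module mentioned above. So that it can run anywhere, the following sketch first creates a tiny archive in a temporary directory; the archive name example.zip and its contents are made up for illustration:

```python
import os
import tempfile
import zipfile

# Create a small hypothetical archive in a temporary directory.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, 'example.zip')
with zipfile.ZipFile(archive_path, 'w') as archive:
    archive.writestr('example/0.edges', '1 2\n2 3\n')

# Extract all files of the archive into the same directory.
with zipfile.ZipFile(archive_path) as archive:
    archive.extractall(workdir)

extracted = sorted(os.listdir(os.path.join(workdir, 'example')))
print(extracted)
```

For the dataset of this section, the equivalent would be to open facebook.zip with zipfile.ZipFile() and call extractall('.') on it.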

Let's enter this subdirectory with the %cd magic command (all operating systems), as follows:

In [6]: %cd facebook
Out[6]: /home/cyrille/minibook/chapter1/facebook

IPython provides a %bookmark magic to create an alias to the current directory. Let's type the following:

In [7]: %bookmark fbdata

Now, in any future session, we'll be able to just type %cd fbdata to enter this directory. Type %bookmark? to see all options. This magic command is helpful when dealing with many directories.

Let's display the contents of the directory:

In [8]: %ls
Out[8]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
        0.edges      1684.edges    3437.edges    3980.edges    686.edges
        107.circles  1912.circles  348.circles   414.circles   698.circles
        107.edges    1912.edges    348.edges     414.edges     698.edges

Here, every number identifies a Facebook user (called the ego user). The .edges file contains its social graph. In this graph, nodes represent other Facebook users, and edges represent friendship links between them. The .circles file contains lists of friends.

Let's retrieve the list of .edges files with the following command (which won't work on Windows):

In [9]: files = !ls -1 -S | grep .edges

The Unix command ls -1 -S lists all files in the current directory, one per line, sorted by decreasing size. The pipe | grep .edges keeps only those file names that contain .edges. Then, this list is assigned to a new Python variable named files, as follows:

In [10]: files
Out[10]: ['1912.edges',
          '107.edges',
          '1684.edges',
          '3437.edges',
          '348.edges',
          '0.edges',
          '414.edges',
          '686.edges',
          '698.edges',
          '3980.edges']

On Windows, you can use the following Python code to obtain the same list (if you're not on Windows, you can skip this code listing):

In [11]: import os
         from operator import itemgetter
         # Get the name and file size of all .edges files.
         files = [(file, os.stat(file).st_size)
                  for file in os.listdir('.')
                  if file.endswith('.edges')]
         # Sort the list with the second item (file size),
         # in decreasing order.
         files = sorted(files,
                        key=itemgetter(1),
                        reverse=True)
         # Only keep the first item (file name), in the same order.
         files = [file for (file, size) in files]

Let's display the first few lines of the first file in the list (Unix-specific command):

In [12]: !head -n5 {files[0]}
Out[12]: 2290 2363
         2346 2025
         2140 2428
         2201 2506
         2425 2557

The curly braces {} let us insert a Python variable within a system command (here, the head Unix command which displays the first lines of a text file).

In an .edges file, each line contains the two nodes forming an edge. The .circles file contains lists of friends. Each line contains a space-separated list of the users forming a circle.
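To make the .edges format concrete, here is a sketch that parses a few lines into pairs of integers. The sample lines are copied from the output shown above, so that the example does not depend on the dataset being present:

```python
# The first lines of an .edges file, as displayed above.
sample_lines = ['2290 2363', '2346 2025', '2140 2428']

edges = []
for line in sample_lines:
    source, target = line.split()          # two node identifiers per line
    edges.append((int(source), int(target)))

# The set of all nodes appearing in these edges.
nodes = set()
for source, target in edges:
    nodes.add(source)
    nodes.add(target)

print(edges)
print(len(nodes))
```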

Tip

Alias commands

If you use a complex command regularly, you can create an alias with the %alias magic command. Type %alias? for more information. See also the related %store magic command.

Learning magic commands

Besides the filesystem commands we have seen in the previous section, IPython provides many other magic commands. You can display the list of all magic commands with the %lsmagic magic command, as follows:

In [13]: %lsmagic
Out[13]: Available line magics:
         %alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

         Available cell magics:
         %%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

         Automagic is ON, % prefix IS NOT needed for line magics.

To obtain information about a magic command, append a question mark (?) after the command, as shown in the following example:

In [14]: %history?

The %history magic command lets you display and manipulate your command history in IPython. For example, the following command shows your last five commands:

In [15]: %history -l 5
Out[15]: files = !ls -1 -S | grep .edges
         files
         !head -n5 {files[0]}
         %lsmagic
         %history?

Let's also mention the %dhist magic command that shows you a history of all visited directories.

Another useful magic command is %paste, which lets you copy-paste Python code from anywhere into the IPython console (it is not available in the Notebook, where you can copy-paste as usual).

In IPython, the underscore (_) variable always contains the last output. This is useful if you ran some command and forgot to assign the output to a variable.

In [16]: # how many minutes in a day?
         24 * 60
Out[16]: 1440
In [17]: # and in a year?
         _ * 365
Out[17]: 525600

We will now see several cell magics, which are magic commands that apply to a whole code cell rather than just a line of code. They are prefixed by two percent signs (%%).

The %%capture cell magic lets you capture the standard output and error output of some code into a Python variable. Here is an example (the outputs are captured in the output Python variable):

In [18]: %%capture output
         %ls
In [19]: output.stdout
Out[19]: 0.circles    1684.circles  3437.circles  3980.circles  686.circles
         0.edges      1684.edges    3437.edges    3980.edges    686.edges
         107.circles  1912.circles  348.circles   414.circles   698.circles
         107.edges    1912.edges    348.edges     414.edges     698.edges
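Outside IPython, a comparable effect can be sketched with the standard library. The following is only an analogy, not how %%capture is implemented: it redirects Python-level writes to in-memory buffers (the file names printed here are just sample text), and unlike %%capture it does not capture the output of external programs.

```python
import contextlib
import io

# Buffers that will receive everything written to stdout and stderr.
stdout_buffer = io.StringIO()
stderr_buffer = io.StringIO()

with contextlib.redirect_stdout(stdout_buffer), \
        contextlib.redirect_stderr(stderr_buffer):
    # Anything printed inside this block is captured instead of displayed.
    print("0.circles  0.edges  107.circles")

# The captured text is now an ordinary Python string.
captured = stdout_buffer.getvalue()
print(repr(captured))
```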

The %%bash cell magic is an extension of the ! shell prefix. It lets you run multiline bash code in the Notebook, as shown here:

In [20]: %%bash
         cd ..
         touch _HEY
         ls
         rm _HEY
         cd facebook
Out[20]: _HEY
         facebook
         facebook.zip
         [...]

More generally, the %%script cell magic lets you execute code with any program installed on your system. For example, assuming Haskell is installed (see https://www.haskell.org/downloads), you can easily execute Haskell code from the Notebook, as follows:

In [21]: %%script ghci
         putStrLn "Hello world!"
Out[21]: GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
         Loading package ghc-prim ... linking ... done.
         Loading package integer-gmp ... linking ... done.
         Loading package base ... linking ... done.
         Prelude> Hello world!
         Prelude> Leaving GHCi.

The ghci executable runs in a separate process, and the contents of the cell are passed to the executable's input. You can also put a full path after %%script, for example, on Linux: %%script /usr/bin/ghci.
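The mechanism described above (a separate process that receives the cell contents on its standard input) can be sketched in plain Python with the subprocess module. This is an illustration under assumptions, not IPython's own code: sys.executable stands in for ghci so that the example runs without Haskell installed, and cell_contents is a hypothetical name for the text of the cell.

```python
import subprocess
import sys

# The "cell contents": source code intended for the external program.
cell_contents = 'print("Hello world!")\n'

# Run the program in a separate process and feed the code to its
# standard input. The "-" argument tells Python to read from stdin.
result = subprocess.run(
    [sys.executable, "-"],
    input=cell_contents,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```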

Tip

IHaskell kernel

This way of calling external scripts is only useful for quick interactive experiments. If you want to run Haskell notebooks, you can use the IHaskell notebook for Jupyter, available at https://github.com/gibiansky/IHaskell.

Finally, the %%writefile cell magic lets you write some text in a new file, as shown here:

In [22]: %%writefile myfile.txt
         Hello world!
Out[22]: Writing myfile.txt
In [23]: !more myfile.txt
Out[23]: Hello world!

Now, let's delete the file, as follows:

In [24]: !rm myfile.txt

Note

On Windows, you need to type !del myfile.txt instead.
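If you prefer not to depend on the operating system's commands, the same three steps can be written in portable Python. This is a minimal sketch using pathlib; it works in a temporary directory (an assumption made here so that no file is left behind).

```python
from pathlib import Path
import tempfile

# Work in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "myfile.txt"
    path.write_text("Hello world!\n")  # similar to %%writefile myfile.txt
    contents = path.read_text()        # similar to !more myfile.txt
    path.unlink()                      # similar to !rm, or !del on Windows
    still_exists = path.exists()

print(contents, end="")
print(still_exists)
```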

There are many other magic commands available. We will see several of them later in this book. Also, in Chapter 6, Customizing IPython, we will see how to create new magic commands. This is much easier than it sounds!

Refer to the following page for up-to-date documentation about all magic commands: http://www.ipython.org/ipython-doc/dev/interactive/magics.html.

Mastering tab completion

Tab completion is an incredibly useful feature in Jupyter and IPython. When you start to write something and press the Tab key on your keyboard, IPython can guess what you're trying to do, and propose a list of options that match what you have typed so far. This works for Python functions, variables, magic commands, files, and more.

Let's first make sure we are in the facebook directory (using the directory alias created previously):

In [25]: %cd fbdata
         %ls
Out[25]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook
         0.circles    1684.circles  3437.circles  3980.circles  686.circles
         0.edges      1684.edges    3437.edges    3980.edges    686.edges
         107.circles  1912.circles  348.circles   414.circles   698.circles
         107.edges    1912.edges    348.edges     414.edges     698.edges

Now, start typing a command and press Tab before finishing it (here, press the Tab key on your keyboard right after typing e), as follows:

!head -n5 107.e<TAB>

IPython automatically completes the command and adds the four remaining characters (dges). IPython recognized the beginning of a file name and completed the command. If there are several completion possibilities, IPython doesn't complete anything, but instead shows a list of all options. You can then choose the appropriate option by pressing the Up or Down keys on the keyboard, and pressing Tab again. The following screenshot shows an example:

Tab completion in the Notebook

Tab completion is extremely useful when you're getting acquainted with a new Python package. For example, to quickly see all functions provided by the NetworkX package, you can type import networkx; networkx.<TAB>.

Tip

Customizing tab completion

If you're writing a Python library, you probably want to write tab-completion-aware code. Your users who work with IPython will thank you! In most cases, you don't need to do anything, and tab completion will just work. In the rare cases where you use advanced dynamic techniques in a class, you can customize tab completion by implementing a __dir__(self) method that returns all attributes available in the current class instance. See this reference for more details: https://docs.python.org/3.4/library/functions.html#dir.
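Here is a minimal sketch of such a __dir__() method. The Record class is hypothetical: it exposes the keys of an internal dictionary as attributes through __getattr__(), and advertises them through __dir__() so that dir() and tab completion can list them.

```python
class Record:
    """Hypothetical class exposing dictionary keys as attributes."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, name):
        # Called only when the normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the dynamic fields alongside the regular attributes.
        return sorted(set(super().__dir__()) | set(self._fields))


record = Record(ego_id=107, n_edges=26749)
print(record.ego_id)
print("ego_id" in dir(record))
```

Without the __dir__() method, record.ego_id would still work, but ego_id would not appear in dir(record).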

Writing interactive documents in the Notebook with Markdown

You can write code and text in the Notebook. Every cell is either a Markdown cell or a code cell. The Markdown cell lets you write text. Markdown is a text formatting syntax that supports headers, bold, italics, hypertext links, images, and code. In the Notebook, you can also write mathematical equations in a Markdown cell using LaTeX, a markup language widely used for equations. Finally, you can also write some HTML in a Markdown cell, and it will be interpreted correctly.

Here is an example of a paragraph in Markdown:

### New paragraph

This is *rich* **text** with [links](http://ipython.org), equations:

$$\hat{f}(\xi) = \int_{-\infty}^{+\infty} f(x)\, \mathrm{e}^{-i \xi x} dx$$

code with syntax highlighting:

    ```python
    print("Hello world!")
    ```

and images:

![This is an image](http://ipython.org/_static/IPy_header.png)

If you write this in a Markdown cell, and "play" the cell (for example, by pressing Ctrl + Enter), you will see the rendered text. The following screenshot shows the two modes of the cell:

A Markdown cell in the Notebook

By using both Markdown cells and code cells in a notebook, you can write an interactive document about any technical topic. Hence, the Notebook is not only an interface to code, it is also a platform to write documents or even books. In fact, this very book is entirely written in the Notebook!

Here are a few references about Markdown and LaTeX:

Creating interactive widgets in the Notebook

You can add interactive graphical elements called widgets in a notebook. Examples of rich graphical widgets include buttons, sliders, dropdown menus, interactive plots, as well as videos, audio files, and complete Graphical User Interfaces (GUIs). Widget support in Jupyter is still relatively experimental at this point, but we will use them on several occasions in this book. This section shows a few basic examples.

First, let's add a YouTube video in a notebook, as follows:

In [26]: from IPython.display import YouTubeVideo
         YouTubeVideo('j9YpkSX7NNM')

Following is a screenshot of a YouTube video in a notebook:

YouTube in the Notebook

The YouTubeVideo constructor accepts a YouTube identifier as input.

Next, let's show how to create a graphical control to manipulate the inputs to a Python function:

In [27]: from ipywidgets import interact
         # IPython.html.widgets before
         # IPython 4.0
         @interact(x=(0, 10))
         def square(x):
             print("The square of %d is %d." % (x, x**2))
Out[27]: 'The square of 7 is 49.'

Here is a screenshot:

Interactive widget in the Notebook

The square(x) function just prints a sentence like The square of 7 is 49. By adding the @interact decorator above the function's definition, we tell IPython to create a widget to control the function's input x. The argument x=(0, 10) is a convention to indicate that we want a slider to control an integer between 0 and 10.

This method supports other common controls like checkboxes, dropdown menus, radio buttons, push buttons, and others.


Finally, entirely customizable widgets can be created, but this requires some knowledge of web technologies such as HTML, CSS, and JavaScript. The IPython Cookbook (http://ipython-books.github.io/cookbook/) contains many examples. You can also refer to the following links for more information:

Note

Most of these references describe APIs that were introduced in IPython 3.0, but are still experimental at this point. They may not work with future versions of Jupyter and IPython.

Running Python scripts from IPython

Notebooks are mainly designed for interactive exploration, not for reusability. It is currently difficult to reuse parts of a notebook in another script or notebook. Many users just copy-paste their code, which goes against the Don't Repeat Yourself (DRY) principle.

A common practice is to put frequently used code into a Python script, for example myscript.py. Such a script can be called from the system terminal like this: python myscript.py. Python will execute the script and quit at the end. If you use the -i option, Python will start the interactive prompt when the script ends.

IPython also supports this technique; just replace python with ipython. For example, ipython -i script.py runs script.py and then leaves you in the interactive IPython prompt.

You can also run a script from within IPython by using the %run magic command. The script runs in an empty namespace, meaning that any variable defined in the interactive namespace is not available within the executed script. However, at the end of the execution, the control returns to IPython, and the variables defined in the script are imported into the interactive namespace. This lets you inspect the intermediate variables used in the script. If you use the -i option, the script will run in the interactive namespace. Any variable defined in the interactive session will be available in the script.

Let's also mention the similar %load magic command.

Note

A namespace is a dictionary mapping variable names to Python objects. The global namespace contains global variables, whereas the local namespace of a function contains the local variables defined in the function. In IPython, the interactive namespace contains all objects defined and imported within the current interactive session. The %who, %whos, and %who_ls magic commands give you some information about the interactive variables.

For example, let's write a script egos.py that lists all ego identifiers in the Facebook data folder. Since each filename is of the form <egoid>.<extension>, we list all files, remove the extensions, and take the sorted list of all unique identifiers. We can create this file from the Notebook, using the %%writefile cell magic as follows:

In [28]: %cd fbdata
         %cd ..
Out[28]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook
In [29]: %%writefile egos.py
         import sys
         import os
         # We retrieve the folder as the first positional argument
         # to the command-line call
         if len(sys.argv) > 1:
             folder = sys.argv[1]
         # We list all files in the specified folder
         files = os.listdir(folder)
         # identifiers contains the list of identifiers, one per file
         identifiers = [int(file.split('.')[0]) for file in files]
         # Finally, we remove duplicates with set(), and sort the list
         # with sorted().
         ids = sorted(set(identifiers))
Out[29]: Overwriting egos.py

This script accepts an argument folder as an input. It is retrieved from the Python script via the sys.argv list, which contains the list of arguments passed to the script via the command-line interface.

Let's execute this script in IPython using the %run magic command, as follows:

In [30]: %run egos.py facebook

Note

If you get an error when running this script, make sure that the facebook directory only contains <number>.xxx files (like 0.circles or 1684.edges).

In [31]: ids
Out[31]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]

The ids variable created in the script is now available in the interactive namespace.
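As noted above, the script fails if the folder contains a file whose name does not start with a number. A more tolerant variant could skip such files. The following is only a sketch: the list_ego_ids() function and the file names in the demonstration are hypothetical, and the demonstration uses a temporary folder rather than the Facebook data.

```python
import os
import tempfile


def list_ego_ids(folder):
    """Return the sorted unique integer prefixes of the files in folder."""
    identifiers = set()
    for name in os.listdir(folder):
        try:
            identifiers.add(int(name.split(".")[0]))
        except ValueError:
            # The prefix is not a number: skip this file.
            continue
    return sorted(identifiers)


# Demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ["0.circles", "0.edges", "107.edges", "README.txt"]:
        open(os.path.join(folder, name), "w").close()
    ids = list_ego_ids(folder)

print(ids)
```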

Let's see what happens if we do not specify the folder name to the script, as follows:

In [32]: folder = 'facebook'
In [33]: %run egos.py

We get an error: NameError: name 'folder' is not defined. This is because the variable folder is defined in the interactive namespace, but is not available within the script by default. We can change this behavior with the -i option, as follows:

In [34]: %run -i egos.py
In [35]: ids
Out[35]: [0, 107, 348, 414, 686, 698, 1684, 1912, 3437, 3980]

This time, the script correctly used the folder variable.
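The difference between the two modes can be imitated in plain Python with the runpy module. This is an analogy rather than IPython's implementation: the tiny script written here is a hypothetical stand-in for egos.py, and init_globals plays the role of the interactive namespace.

```python
import os
import runpy
import tempfile

# A tiny stand-in script that, like egos.py, relies on a folder variable.
script_source = "ids = [folder.upper()]\n"

with tempfile.TemporaryDirectory() as tmpdir:
    script_path = os.path.join(tmpdir, "script.py")
    with open(script_path, "w") as f:
        f.write(script_source)

    # Fresh namespace (like %run): folder is unknown inside the script.
    try:
        runpy.run_path(script_path)
        error = None
    except NameError as exc:
        error = str(exc)

    # Pre-populated namespace (like %run -i): the script can see folder.
    namespace = runpy.run_path(script_path,
                               init_globals={"folder": "facebook"})

print(error)
print(namespace["ids"])
```

In both cases, run_path() returns the variables defined by the script as a dictionary, which is comparable to the way %run makes them available in the interactive namespace.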

Introspecting Python objects

IPython can display detailed information about any Python object.

First, type ? after a variable name to get some information about it. For example, let's inspect NetworkX's Graph class, as follows:

In [36]: import networkx
In [37]: networkx.Graph?

This shows the docstring and other information in the Notebook pager, as shown in the following screenshot:

Typing ?? instead of ? shows even more information, including the whole source code of the Python object when it is available.

There are also several magic commands for inspecting Python objects:

  • %pdef: Displays a function definition

  • %pdoc: Displays the docstring of a Python object

  • %psource: Displays the source code of an object (function, class, or method)

  • %pfile: Displays the source code of the Python script where an object is defined
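Similar information is available outside IPython through the standard inspect module. The following sketch shows rough equivalents of the four magic commands, applied to the glob.glob() function; the mapping is approximate, and IPython's own output is formatted differently.

```python
import glob
import inspect

# Roughly what %pdef shows: the call signature.
signature = inspect.signature(glob.glob)
print("glob.glob" + str(signature))

# Roughly what %pdoc shows: the cleaned-up docstring.
docstring = inspect.getdoc(glob.glob)
print(docstring.splitlines()[0])

# Roughly what %psource shows: the source code of the object.
source = inspect.getsource(glob.glob)
print(source.splitlines()[0])

# Roughly what %pfile refers to: the file where the object is defined.
source_file = inspect.getsourcefile(glob.glob)
print(source_file)
```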

Debugging Python code

IPython makes it convenient to debug a script or an entire application. It provides interactive access to an enhanced version of the Python debugger.

First, when you encounter an exception, you can immediately use the %debug magic command to launch the IPython debugger at the exact point where the exception was raised.

If you activate the %pdb magic command, the debugger will automatically start at the very next exception. You can also start IPython with ipython --pdb.

Finally, you can run a whole script under the control of the debugger with the %run -d command. This command executes the specified script with a breakpoint at the first line so that you can precisely control the execution flow of the script. You can also specify explicitly where to put the first breakpoint; type %run -d -b29 script.py to pause the program execution on line 29 of script.py. In all cases, you first need to type c to start the script execution.

When the debugger starts, you enter a special prompt, as indicated by ipdb>. The program execution is then paused at a given point in the code. You can type w to display the line and stack location where the debugger has paused. At this point, you have access to all local variables and you can precisely control how you want to resume the execution. Within the debugger, several commands are available to navigate into the traceback; they are as follows:

  • u/d for going up/down in the call stack

  • s to step into the next statement

  • n to continue execution until the next line in the current function

  • r to continue execution until the current function returns

  • c to continue execution until the next breakpoint or exception

Other useful commands include:

  • p to evaluate and print any expression

  • a to obtain the arguments of the current function

  • The ! prefix to execute any Python command within the debugger

The entire list of commands can be found in the documentation of the pdb module in Python at https://docs.python.org/3.4/library/pdb.html.

Let's also mention the IPython.embed() function that you can call anywhere in a Python script. This stops the script execution and starts IPython for debugging purposes. Leaving the embedded IPython terminal resumes the normal execution of the script.
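To make the idea of a call stack with local variables more concrete, here is a non-interactive sketch that walks the traceback of an exception by hand. The two functions are hypothetical examples. In an interactive session you would rather call %debug (or pdb.post_mortem() in plain Python), which lets you explore the same frames with the u and d commands.

```python
import sys


def divide(a, b):
    return a / b


def average_ratio(pairs):
    total = 0.0
    for a, b in pairs:
        total += divide(a, b)
    return total / len(pairs)


try:
    average_ratio([(1, 2), (3, 0)])
except ZeroDivisionError:
    # Walk the traceback from the outermost frame to the one that raised.
    tb = sys.exc_info()[2]
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        frames.append((frame.f_code.co_name, dict(frame.f_locals)))
        tb = tb.tb_next

# Show the two innermost frames and the names of their local variables.
for name, local_vars in frames[-2:]:
    print(name, sorted(local_vars))
```

The last frame is the one where the exception was raised, which is where the debugger would pause.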

Benchmarking Python code

The %timeit magic function lets us estimate the execution time of any Python statement. Under the hood, it uses Python's native timeit module.

In the following example, we first load an ego graph from our Facebook dataset using the NetworkX package. Then we evaluate how much time it takes to tell whether the graph is connected or not.

Let's go to the data directory, as follows:

In [38]: %cd fbdata
Out[38]: (bookmark:fbdata) -> /home/cyrille/minibook/chapter1/facebook
         /home/cyrille/minibook/chapter1/facebook

We load NetworkX, as follows:

In [39]: import networkx

We can load a graph using the read_edgelist() function, as follows:

In [40]: graph = networkx.read_edgelist('107.edges')

How big is our graph?

In [41]: len(graph.nodes()), len(graph.edges())
Out[41]: (1034, 26749)

Now let's find out whether the graph is connected or not:

In [42]: networkx.is_connected(graph)
Out[42]: True

How long did this call take?

In [43]: %timeit networkx.is_connected(graph)
Out[43]: 100 loops, best of 3: 5.92 ms per loop

Multiple calls are done in order to get more reliable time estimates. The number of calls is determined automatically, but you can use the -r and -n options to specify them directly. Type %timeit? to get more information.
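Since %timeit relies on the timeit module, the same kind of measurement can be sketched in plain Python. In this example, the number and repeat arguments play roles comparable to the -n and -r options; the statement being timed (sorting a list) is an arbitrary stand-in chosen so that the example runs without NetworkX or the dataset.

```python
import timeit

# Statement to benchmark and the setup code that prepares its data.
statement = "sorted(data)"
setup = "data = list(range(1000, 0, -1))"

# number: calls per repetition (like -n).
# repeat: number of repetitions (like -r).
timings = timeit.repeat(statement, setup=setup, number=100, repeat=3)

# Report the best repetition, divided by the number of calls in it.
best_per_call = min(timings) / 100
print("100 loops, best of 3: %.3g s per loop" % best_per_call)
```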

Profiling Python code

The %timeit magic command gives you valuable information about the total time taken by a function or a statement. This can help you find the fastest among several implementations of an algorithm, for example.

When you find that some code is too slow, you need to profile it before you can make it faster. Profiling gives you more than the total time taken by a function; it tells you exactly what is taking too long in your code.

The %prun magic command lets you easily profile your code. It provides a convenient interface to Python's native profile module.

Let's see a simple example. We first create a function returning the number of connected components in the graph stored in a file, as follows:

In [44]: import networkx
In [45]: def ncomponents(file):
             graph = networkx.read_edgelist(file)
             return networkx.number_connected_components(graph)

Now we write a function that returns the number of connected components in all graphs defined in the directory, as follows:

In [46]: import glob
         def ncomponents_files():
             return [(file, ncomponents(file))
                     for file in sorted(glob.glob('*.edges'))]

The glob module (https://docs.python.org/3.4/library/glob.html) lets us find all files matching a given pattern (here, all files with the .edges file extension).

In [47]: for file, n in ncomponents_files():
             print(file.ljust(12), n, 'component(s)')
Out[47]: 0.edges      5 component(s)
         107.edges    1 component(s)
         1684.edges   4 component(s)
         1912.edges   2 component(s)
         3437.edges   2 component(s)
         348.edges    1 component(s)
         3980.edges   4 component(s)
         414.edges    2 component(s)
         686.edges    1 component(s)
         698.edges    3 component(s)

Let's first evaluate the time taken by this function:

In [48]: %timeit ncomponents_files()
Out[48]: 1 loops, best of 3: 634 ms per loop

Now, to run the profiler, we use the %prun magic function, as follows:

In [49]: %prun -s cumtime ncomponents_files()
Out[49]: 2391070 function calls in 1.038 seconds

         Ordered by: cumulative time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              1    0.000    0.000    1.038    1.038 {built-in method exec}
              1    0.000    0.000    1.038    1.038 <string>:1(<module>)
             10    0.000    0.000    0.995    0.100 <string>:1(read_edgelist)
             10    0.000    0.000    0.995    0.100 decorators.py:155(_open_file)
             10    0.376    0.038    0.995    0.099 edgelist.py:174(parse_edgelist)
         170174    0.279    0.000    0.350    0.000 graph.py:648(add_edge)
         170184    0.059    0.000    0.095    0.000 edgelist.py:366(<genexpr>)
             10    0.000    0.000    0.021    0.002 connected.py:98(number_connected_components)
             35    0.001    0.000    0.021    0.001 connected.py:22(connected_components)

Let's explain what happened here. The profiler kept track of all function calls (including functions internal to NetworkX and Python) performed while our ncomponents_files() function was running. There were 2,391,070 function calls. That's a lot! Opening a file, reading and parsing every line, creating the graphs, finding the number of connected components, and so on, are operations that involve many function calls.

The profiler shows the list of all function calls (we just showed a subset here). There are many ways to sort the functions. Here, we chose to sort them by cumulative time (-s cumtime option), which is the total time spent within each function, including the time spent in the functions it calls.

For every function, the profiler shows the total number of calls, and several time statistics, described here (copied verbatim from the profiler documentation):

  • tottime: the total time spent in the given function (and excluding time made in calls to sub-functions)

  • percall: the quotient of tottime divided by ncalls

  • cumtime: the cumulative time spent in this and all subfunctions

  • percall: the quotient of cumtime divided by the number of non-recursive function calls

You will find more information by typing %prun? or by looking here: https://docs.python.org/3.4/library/profile.html

Here, we see that computing the number of connected components took considerably less time than loading the graphs from the text files. Depending on the use-case, this might suggest using a more efficient file format.

There is of course much more to say about profiling and optimization. For example, it is possible to profile a function line by line, which provides an even more fine-grained profiling report. The IPython Cookbook contains many more details.
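Outside IPython, a comparable report can be produced with the standard library. The following is a minimal sketch using the cProfile and pstats modules; the parse_lines() and count_edges() functions are hypothetical stand-ins for the graph-loading code, so that the example runs without NetworkX or the dataset.

```python
import cProfile
import io
import pstats


def parse_lines(lines):
    # Split every line and convert its fields to integers.
    return [tuple(int(x) for x in line.split()) for line in lines]


def count_edges(lines):
    return len(parse_lines(lines))


# Made-up edge list: one "source target" pair per line.
lines = ["%d %d" % (i, i + 1) for i in range(20000)]

profiler = cProfile.Profile()
profiler.enable()
n_edges = count_edges(lines)
profiler.disable()

# Sort by cumulative time and show only the first five rows.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report has the same columns as the %prun output shown earlier (ncalls, tottime, percall, cumtime, percall).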

 

Summary


In this chapter, we covered everything you need to get started with Python, IPython, and the Jupyter Notebook. We detailed how to install the software, we reviewed the basics of the Python language, and we demonstrated ten of the most essential features of IPython and the Jupyter Notebook.

In the next chapter, we will use these tools to analyze real-world datasets.

About the Author
  • Cyrille Rossant

    Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he gained experience in numerical computing, parallel computing, and high-performance data visualization. He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing.
