You're reading from  Learning IPython for Interactive Computing and Data Visualization, Second Edition

Product type: Book
Published in: Oct 2015
Reading level: Beginner
ISBN-13: 9781783986989
Edition: 1st Edition
Author: Cyrille Rossant

Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he gained experience in numerical computing, parallel computing, and high-performance data visualization. He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing.

Chapter 5. High-Performance and Parallel Computing

As an interpreted and dynamic language, Python is slower than C, C++, or Fortran, especially when using loops. Thus, numerical algorithms written in pure Python are generally too slow to be useful. As we saw in Chapter 3, Numerical Computing with NumPy, NumPy solves this problem by offering fast vector computations on array structures.
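To make the contrast concrete, here is a minimal sketch comparing a pure Python loop with its vectorized NumPy equivalent. The function names and the sum-of-squares task are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Pure Python loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))

# Hypothetical input: the integers 0..999 stored as floats.
x = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(x)
vector_result = sum_of_squares_vectorized(x)
```

Both functions compute the same value. On large arrays the vectorized version is typically much faster, which you can check in IPython with %timeit.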

Some algorithms cannot be easily vectorized with NumPy, so Python loops are required. In numerical computing, there are two main ways to make such loops fast: using a JIT compiler such as Numba, or using Cython to translate the loops to C.

Another general method for making computations faster is to distribute jobs across the multiple processors on a multicore computer.

In this chapter, we will cover all of these topics:

  • Accelerating Python code with Numba

  • Writing C in Python with Cython

  • Distributing tasks on several cores with IPython.parallel

  • Further high-performance computing techniques...

Accelerating Python code with Numba


When it is too difficult or impossible to vectorize an algorithm, you often need to use Python loops. However, Python loops are slow. Fortunately, Numba provides a Just-In-Time (JIT) compiler that can compile pure Python code straight to machine code thanks to the LLVM compiler architecture. This can result in massive speedups.
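As a minimal sketch of the idea, the following applies Numba's jit decorator to a loop-based function. The function mean_abs_sin is a hypothetical example, not the simulation used in this chapter. The no-op fallback decorator is also an assumption, added only so that the sketch still runs (without any speedup) when Numba is not installed.

```python
import math

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a no-op
    # decorator so the function still runs as plain Python.
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

@jit(nopython=True)
def mean_abs_sin(n):
    """Average of |sin(i)| for i in 0..n-1, written as an explicit loop."""
    total = 0.0
    for i in range(n):
        total += abs(math.sin(i))
    return total / n

# The first call triggers compilation; later calls reuse the machine code.
result = mean_abs_sin(100000)
```

Because compilation happens on the first call, any timing comparison should measure a second call rather than the first.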

In this section, we'll see how to use Numba to accelerate a mathematical modeling simulation.

To install Numba, type conda install numba on the command line.

Let's first import a few packages:

In [1]: import math
        import random
        import numpy as np
        from numba import jit, vectorize, float64
        import matplotlib.pyplot as plt
        import seaborn
        %matplotlib inline

Random walk

We will simulate a random walk with jumps. A particle is on the real line, starting at 0. At every time step, the particle makes a step to the right or to the left. If the particle crosses a threshold, it is reset at its...

Writing C in Python with Cython


Cython is a Python library that lets you combine C and Python in various ways. There are two main use cases:

  • Wrapping a C/C++ library in Python

  • Optimizing your Python code by statically compiling it to C

In this section, we will demonstrate the second use case. You will find an example of the first use case in the IPython Cookbook and at http://docs.cython.org/src/tutorial/index.html.

Installing Cython and a C compiler for Python

If you use Anaconda, you should already have Cython (you can always do conda install cython to check).

For Cython to work, you need a C compiler compatible with your version of Python. This is much easier on Unix systems. Here are the instructions given at http://docs.cython.org/src/quickstart/install.html:

  • On Linux, you can install the GNU C Compiler (gcc) via the OS package manager. On Ubuntu or Debian, for example, type sudo apt-get install build-essential.

  • On Mac OS X, you can install Apple's Xcode from http://developer.apple.com.

  • On...

Distributing tasks on several cores with IPython.parallel


In the previous sections, we covered a few methods to accelerate Python code. Here, we will see how to run multiple tasks in parallel on a multicore computer. IPython provides powerful and user-friendly facilities for interactive parallel computing in the Notebook.

We first need to install ipyparallel (also called IPython.parallel) with conda install ipyparallel. Next, let's import NumPy and ipyparallel:

In [1]: import numpy as np
        # ipyparallel was IPython.parallel before IPython 4.0
        from ipyparallel import Client

To use IPython.parallel, we need to launch a few engines.

The first way to do it is to run ipcluster start in the terminal.
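Once engines are running, for instance after ipcluster start, a client can connect to them and distribute work. The following is a minimal sketch under that assumption. The helper parallel_squares is a hypothetical name introduced for illustration, and it only works while a cluster is up.

```python
def parallel_squares(values):
    """Square each value on the running engines and return the results."""
    # Imported lazily so that defining this helper does not require
    # ipyparallel to be installed.
    from ipyparallel import Client

    rc = Client()   # connect to the cluster started by `ipcluster start`
    view = rc[:]    # direct view on all available engines
    # map_sync splits the inputs across engines and blocks until done.
    return view.map_sync(lambda x: x * x, values)
```

With engines running, calling parallel_squares(range(8)) should return the squares in the same order as the inputs.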

You can also launch engines from the Notebook dashboard. However, you first need to add c.NotebookApp.server_extensions.append('ipyparallel.nbextension') in the file ~/.jupyter/jupyter_notebook_config.py (you may need to create this file). Then, from the Notebook dashboard (accessible...

Further high-performance computing techniques


There are many high-performance computing techniques beyond those covered in this chapter. The IPython Cookbook contains many more details. Here is an overview of some of these other techniques:

MPI

The Message Passing Interface, or MPI, defines communication protocols for high-performance distributed systems. IPython.parallel has native support for MPI. Here are some other references:

Distributed computing

There are many frameworks for distributed computing and big data analysis in Python.

Summary


In this chapter, we covered some of the main high-performance computing methods in Python. Numba is one of the easiest and most efficient options. Cython is useful for more complex use cases and when it is necessary to leverage C/C++ code. IPython.parallel lets us use multicore CPUs or multiple computers for independent tasks. Finally, we discussed further high-performance computing techniques.

In the next chapter, we will explore a few customization options in IPython and the Notebook.
