IPython Interactive Computing and Visualization Cookbook

Over 100 hands-on recipes to sharpen your skills in high-performance numerical computing and data science with Python

Cyrille Rossant

Book Details

ISBN 13: 9781783284818
Paperback: 512 pages

About This Book

  • Leverage the new features of the IPython notebook for interactive web-based big data analysis and visualization
  • Become an expert in high-performance computing and visualization for data analysis and scientific modeling
  • Comprehensive coverage of scientific computing through many hands-on, example-driven recipes with detailed, step-by-step explanations

Who This Book Is For

Intended for anyone interested in numerical computing and data science: students, researchers, teachers, engineers, analysts, hobbyists... Basic knowledge of Python/NumPy is recommended. Some skills in mathematics will help you understand the theory behind the computational methods.

Table of Contents

Chapter 1: A Tour of Interactive Computing with IPython
Introduction
Introducing the IPython notebook
Getting started with exploratory data analysis in IPython
Introducing the multidimensional array in NumPy for fast array computations
Creating an IPython extension with custom magic commands
Mastering IPython's configuration system
Creating a simple kernel for IPython
Chapter 2: Best Practices in Interactive Computing
Introduction
Choosing (or not) between Python 2 and Python 3
Efficient interactive computing workflows with IPython
Learning the basics of the distributed version control system Git
A typical workflow with Git branching
Ten tips for conducting reproducible interactive computing experiments
Writing high-quality Python code
Writing unit tests with nose
Debugging your code with IPython
Chapter 3: Mastering the Notebook
Introduction
Teaching programming in the notebook with IPython blocks
Converting an IPython notebook to other formats with nbconvert
Adding custom controls in the notebook toolbar
Customizing the CSS style in the notebook
Using interactive widgets – a piano in the notebook
Creating a custom JavaScript widget in the notebook – a spreadsheet editor for pandas
Processing webcam images in real time from the notebook
Chapter 4: Profiling and Optimization
Introduction
Evaluating the time taken by a statement in IPython
Profiling your code easily with cProfile and IPython
Profiling your code line-by-line with line_profiler
Profiling the memory usage of your code with memory_profiler
Understanding the internals of NumPy to avoid unnecessary array copying
Using stride tricks with NumPy
Implementing an efficient rolling average algorithm with stride tricks
Making efficient array selections in NumPy
Processing huge NumPy arrays with memory mapping
Manipulating large arrays with HDF5 and PyTables
Manipulating large heterogeneous tables with HDF5 and PyTables
Chapter 5: High-performance Computing
Introduction
Accelerating pure Python code with Numba and just-in-time compilation
Accelerating array computations with Numexpr
Wrapping a C library in Python with ctypes
Accelerating Python code with Cython
Optimizing Cython code by writing less Python and more C
Releasing the GIL to take advantage of multicore processors with Cython and OpenMP
Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA
Writing massively parallel code for heterogeneous platforms with OpenCL
Distributing Python code across multiple cores with IPython
Interacting with asynchronous parallel tasks in IPython
Parallelizing code with MPI in IPython
Trying the Julia language in the notebook
Chapter 6: Advanced Visualization
Introduction
Making nicer matplotlib figures with prettyplotlib
Creating beautiful statistical plots with seaborn
Creating interactive web visualizations with Bokeh
Visualizing a NetworkX graph in the IPython notebook with D3.js
Converting matplotlib figures to D3.js visualizations with mpld3
Getting started with Vispy for high-performance interactive data visualizations
Chapter 7: Statistical Data Analysis
Introduction
Exploring a dataset with pandas and matplotlib
Getting started with statistical hypothesis testing – a simple z-test
Getting started with Bayesian methods
Estimating the correlation between two variables with a contingency table and a chi-squared test
Fitting a probability distribution to data with the maximum likelihood method
Estimating a probability distribution nonparametrically with a kernel density estimation
Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method
Analyzing data with the R programming language in the IPython notebook
Chapter 8: Machine Learning
Introduction
Getting started with scikit-learn
Predicting who will survive on the Titanic with logistic regression
Learning to recognize handwritten digits with a K-nearest neighbors classifier
Learning from text – Naive Bayes for Natural Language Processing
Using support vector machines for classification tasks
Using a random forest to select important features for regression
Reducing the dimensionality of a dataset with a principal component analysis
Detecting hidden structures in a dataset with clustering
Chapter 9: Numerical Optimization
Introduction
Finding the root of a mathematical function
Minimizing a mathematical function
Fitting a function to data with nonlinear least squares
Finding the equilibrium state of a physical system by minimizing its potential energy
Chapter 10: Signal Processing
Introduction
Analyzing the frequency components of a signal with a Fast Fourier Transform
Applying a linear filter to a digital signal
Computing the autocorrelation of a time series
Chapter 11: Image and Audio Processing
Introduction
Manipulating the exposure of an image
Applying filters on an image
Segmenting an image
Finding points of interest in an image
Detecting faces in an image with OpenCV
Applying digital filters to speech sounds
Creating a sound synthesizer in the notebook
Chapter 12: Deterministic Dynamical Systems
Introduction
Plotting the bifurcation diagram of a chaotic dynamical system
Simulating an elementary cellular automaton
Simulating an ordinary differential equation with SciPy
Simulating a partial differential equation – reaction-diffusion systems and Turing patterns
Chapter 13: Stochastic Dynamical Systems
Introduction
Simulating a discrete-time Markov chain
Simulating a Poisson process
Simulating a Brownian motion
Simulating a stochastic differential equation
Chapter 14: Graphs, Geometry, and Geographic Information Systems
Introduction
Manipulating and visualizing graphs with NetworkX
Analyzing a social network with NetworkX
Resolving dependencies in a directed acyclic graph with a topological sort
Computing connected components in an image
Computing the Voronoi diagram of a set of points
Manipulating geospatial data with Shapely and basemap
Creating a route planner for a road network
Chapter 15: Symbolic and Numerical Mathematics
Introduction
Diving into symbolic computing with SymPy
Solving equations and inequalities
Analyzing real-valued functions
Computing exact probabilities and manipulating random variables
A bit of number theory with SymPy
Finding a Boolean propositional formula from a truth table
Analyzing a nonlinear differential system – Lotka-Volterra (predator-prey) equations
Getting started with Sage

What You Will Learn

  • Code better by writing high-quality, readable, and well-tested programs, profiling and optimizing your code, and conducting reproducible interactive computing experiments
  • Master all of the new features of the IPython notebook, including the interactive HTML/JavaScript widgets
  • Analyze data with Bayesian and frequentist statistics (pandas, PyMC, and R), and learn from data with machine learning (scikit-learn)
  • Gain valuable insights into signals, images, and sounds with SciPy, scikit-image, and OpenCV
  • Learn how to write blazingly fast Python programs with NumPy, PyTables, ctypes, Numba, Cython, OpenMP, GPU programming (CUDA and OpenCL), parallel IPython, MPI, and many more

In Detail

IPython is at the heart of the Python scientific stack. With its widely acclaimed web-based notebook, IPython is today an ideal gateway to data analysis and numerical computing in Python.

IPython Interactive Computing and Visualization Cookbook contains many ready-to-use focused recipes for high-performance scientific computing and data analysis. The first part covers programming techniques, including code quality and reproducibility, code optimization, and high-performance computing through dynamic compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.

 

 

Read an Extract from the book

Creating a custom JavaScript widget in the notebook – a spreadsheet editor for pandas

In this recipe, we will look at how to go beyond the existing widgets provided by IPython 2.0. Specifically, we will create a custom JavaScript-based widget that communicates with the Python kernel.

We will create a basic interactive Excel-like data grid editor in the IPython notebook, compatible with pandas' DataFrame. Starting from a DataFrame object, we will be able to edit it within a GUI in the notebook. The editor is based on the Handsontable JavaScript library (http://handsontable.com). Other JavaScript data grid editors could be used as well.

Getting ready

You will need both IPython 2.0+ and the Handsontable JavaScript library for this recipe. The following are the instructions to load this JavaScript library in the IPython notebook:

  1. First, go to https://github.com/handsontable/jquery-handsontable/tree/master/dist.
  2. Then, download jquery.handsontable.full.css and jquery.handsontable.full.js, and put these two files in ~\.ipython\profile_default\static\custom\.
  3. In this folder, add the following line to custom.js:
       require(['/static/custom/jquery.handsontable.full.js']);
  4. In this folder, add the following line to custom.css:
       @import "/static/custom/jquery.handsontable.full.css";
  5. Now, refresh the notebook!

How to do it...

  1. Let's import a few functions and classes as follows:

     In [1]: from IPython.html import widgets
             from IPython.display import display
             from IPython.utils.traitlets import Unicode

  2. We create a new widget. The value trait will contain the JSON representation of the entire table. This trait will be synchronized between Python and JavaScript, thanks to IPython 2.0's widget machinery.

     In [2]: class HandsonTableWidget(widgets.DOMWidget):
                 _view_name = Unicode('HandsonTableView', sync=True)
                 value = Unicode(sync=True)

  3. Now, we write the JavaScript code for the widget. The three important functions that are responsible for the synchronization are as follows:
    • render is for the widget initialization
    • update is for Python to JavaScript update
    • handle_table_change is for JavaScript to Python update

    In [3]: %%javascript
            var table_id = 0;
            require(["widgets/js/widget"], function(WidgetManager){
                // Define the HandsonTableView
                var HandsonTableView = IPython.DOMWidgetView.extend({

                    render: function(){
                        // Initialization: creation of the HTML elements
                        // for our widget.

                        // Add a <div> in the widget area.
                        this.$table = $('<div />')
                            .attr('id', 'table_' + (table_id++))
                            .appendTo(this.$el);
                        // Create the Handsontable table.
                        this.$table.handsontable({
                        });
                    },

                    update: function() {
                        // Python --> Javascript update.

                        // Get the model's JSON string, and parse it.
                        var data = $.parseJSON(this.model.get('value'));
                        // Give it to the Handsontable widget.
                        this.$table.handsontable({data: data});

                        return HandsonTableView.__super__.update.apply(this);
                    },

                    // Tell Backbone to listen to the change event
                    // of input controls.
                    events: {"change": "handle_table_change"},

                    handle_table_change: function(event) {
                        // Javascript --> Python update.

                        // Get the table instance.
                        var ht = this.$table.handsontable('getInstance');
                        // Get the data, and serialize it in JSON.
                        var json = JSON.stringify(ht.getData());
                        // Update the model with the JSON string.
                        this.model.set('value', json);

                        this.touch();
                    },
                });

                // Register the HandsonTableView with the widget manager.
                WidgetManager.register_widget_view(
                    'HandsonTableView', HandsonTableView);
            });

  4. Now, we have a synchronized table widget that we can already use. However, we would like to integrate it with pandas. To do this, we create a light wrapper around a DataFrame instance. We create two callback functions for synchronizing the pandas object with the IPython widget. Changes in the GUI will automatically trigger a change in DataFrame, but the converse is not true. We'll need to re-display the widget if we change the DataFrame instance in Python:

     In [4]: from io import StringIO
             import numpy as np
             import pandas as pd

     In [5]: class HandsonDataFrame(object):
                 def __init__(self, df):
                     self._df = df
                     self._widget = HandsonTableWidget()
                     self._widget.on_trait_change(
                         self._on_data_changed, 'value')
                     self._widget.on_displayed(self._on_displayed)

                 def _on_displayed(self, e):
                     # DataFrame ==> Widget (upon initialization)
                     json = self._df.to_json(orient='values')
                     self._widget.value = json

                 def _on_data_changed(self, e, val):
                     # Widget ==> DataFrame (called every time the
                     # user changes a value in the widget)
                     buf = StringIO(val)
                     self._df = pd.read_json(buf, orient='values')

                 def to_dataframe(self):
                     return self._df

                 def show(self):
                     display(self._widget)

  5. Now, let's test all that! We first create a random DataFrame instance:

     In [6]: data = np.random.randint(size=(3, 5),
                                      low=100, high=900)
             df = pd.DataFrame(data)
             df
     Out[6]: 352  201  859  322  352
             326  519  848  802  642
             171  480  213  619  192

  6. We wrap it in HandsonDataFrame and show it as follows:

     In [7]: ht = HandsonDataFrame(df)
             ht.show()

  7. We can now change the values interactively, and they will be changed in Python accordingly:

     In [8]: ht.to_dataframe()
     Out[8]: 352  201  859   322  352
             326  519  848  1024  642
             171  480  213   619  192
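
The value trait in the steps above carries the whole table as a single JSON string. The following minimal sketch imitates that round trip with the standard json module only, so it runs without IPython or pandas. It assumes the string is an array of row arrays, which is the shape produced by to_json(orient='values'); the variable names (rows, value, edited) are illustrative and not part of the recipe.

```python
import json

# The table from step 5, as a plain list of rows (no pandas needed here).
rows = [
    [352, 201, 859, 322, 352],
    [326, 519, 848, 802, 642],
    [171, 480, 213, 619, 192],
]

# Python --> JavaScript: the whole table travels as one JSON string,
# an array of row arrays.
value = json.dumps(rows)

# JavaScript --> Python: simulate the user editing one cell in the grid,
# after which the full table is serialized and sent back.
edited = json.loads(value)
edited[1][3] = 1024
value = json.dumps(edited)

# Parsing the string again gives the updated table.
print(json.loads(value)[1])  # [326, 519, 848, 1024, 642]
```

Because every edit serializes and parses the full table, the cost of one cell change grows with the size of the table.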

How it works...

Let's briefly explain the architecture underlying the interactive Python-JavaScript communication in IPython 2.0+.

The implementation follows the Model-View-Controller (MVC) design pattern, which is popular in GUI applications. There is a model in the backend (Python kernel) that holds some data. In the frontend (browser), there are one or several views of that model. Those views are dynamically synchronized with the model. When an attribute of the model changes on Python's side, it also changes on JavaScript's side, and vice versa. We can implement Python and JavaScript functions to respond to model changes. These changes are generally triggered by a user action.

In Python, dynamic attributes are implemented as traits. These special class attributes automatically trigger callback functions when they are updated. In JavaScript, the Backbone.js MVC library is used. The communication between Python and the browser is done via Comms, a special communication protocol in IPython.
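
To make the idea of a trait more concrete, here is a minimal sketch of a class attribute that triggers callbacks when it is updated, written with a plain Python descriptor. It is only an illustration of the principle: ObservedAttribute, Model, and on_change are hypothetical names, and this is not how IPython's traitlets are actually implemented.

```python
class ObservedAttribute:
    """Descriptor that notifies registered callbacks when its value changes.

    Hypothetical illustration; this is not the traitlets implementation.
    """

    def __init__(self, default=None):
        self.default = default

    def __set_name__(self, owner, name):
        # Remember the attribute name the descriptor was assigned to.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, new_value):
        old_value = self.__get__(obj)
        obj.__dict__[self.name] = new_value
        # Only notify when the value actually changed.
        if new_value != old_value:
            for callback in obj._callbacks.get(self.name, []):
                callback(self.name, old_value, new_value)


class Model:
    value = ObservedAttribute(default='')

    def __init__(self):
        self._callbacks = {}

    def on_change(self, callback, name):
        """Register a callback for changes of the attribute `name`."""
        self._callbacks.setdefault(name, []).append(callback)


log = []
model = Model()
model.on_change(lambda name, old, new: log.append((name, old, new)), 'value')
model.value = '[[1, 2]]'
model.value = '[[1, 2]]'  # same value again: no callback is triggered
model.value = '[[1, 3]]'
print(log)
```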

To create a new widget, we need to create a class deriving from DOMWidget. Then, we define trait attributes that can be synchronized between Python and JavaScript if sync=True is passed to the trait constructors. We can register callback functions that react to trait changes (from either Python or JavaScript), using widget.on_trait_change(callback, trait_name). The callback() function can have one of the following signatures:

    callback()
    callback(trait_name)
    callback(trait_name, new_value)
    callback(trait_name, old_value, new_value)
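
One way a dispatcher could support these four signatures is to count the parameters the callback declares and pass the matching arguments. The helper below is a hypothetical sketch of that idea; call_with_matching_args is not an IPython function, and the actual dispatch logic inside IPython may differ.

```python
import inspect


def call_with_matching_args(callback, trait_name, old_value, new_value):
    """Call `callback` with as many arguments as it declares (0 to 3)."""
    n_params = len(inspect.signature(callback).parameters)
    if n_params == 0:
        return callback()
    if n_params == 1:
        return callback(trait_name)
    if n_params == 2:
        return callback(trait_name, new_value)
    if n_params == 3:
        return callback(trait_name, old_value, new_value)
    raise TypeError('callback must accept between 0 and 3 arguments')


seen = []
call_with_matching_args(
    lambda: seen.append('changed'), 'value', 'a', 'b')
call_with_matching_args(
    lambda name, new: seen.append((name, new)), 'value', 'a', 'b')
call_with_matching_args(
    lambda name, old, new: seen.append((name, old, new)), 'value', 'a', 'b')
print(seen)  # ['changed', ('value', 'b'), ('value', 'a', 'b')]
```

For a bound method, inspect.signature does not count self, so a method such as _on_data_changed(self, e, val) is treated as a two-argument callback and receives the trait name and the new value.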

In JavaScript, the render() function creates the HTML elements in the cell's widget area upon initialization. The update() method allows us to react to changes in the model on the backend (Python). In addition, we can use Backbone.js to react to changes in the frontend (browser). By extending the widget with the {"change": "callback"} events, we tell Backbone.js to call the callback() JavaScript function as soon as the HTML input controls change. This is how we react to user-triggered actions here.

There's more...

The following are the ways this proof-of-concept could be improved:

  • Synchronizing only changes instead of synchronizing the whole array every time (the method used here would be slow on large tables)
  • Updating the same DataFrame instance in place instead of creating a new one upon every change
  • Supporting named columns
  • Hiding the wrapper, that is, making HandsonDataFrame the default rich representation of DataFrame in the notebook
  • Implementing everything in an easy-to-use extension
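
The first two improvements could be approached along the following lines. This is a rough sketch on plain nested lists rather than on a DataFrame or a real widget, under the assumption that both sides hold tables of the same shape; table_diff and apply_diff are hypothetical helper names, and the small tables are made up for the illustration.

```python
import json


def table_diff(old_rows, new_rows):
    """Return (row, column, new_value) for every cell that differs.

    Assumes both tables have the same shape.
    """
    return [
        (i, j, new)
        for i, (old_row, new_row) in enumerate(zip(old_rows, new_rows))
        for j, (old, new) in enumerate(zip(old_row, new_row))
        if old != new
    ]


def apply_diff(rows, changes):
    """Update `rows` in place instead of rebuilding the whole table."""
    for i, j, new in changes:
        rows[i][j] = new


# Table held on the Python side.
backend = [[352, 201, 859], [326, 519, 848]]
# Table after the user edited one cell in the browser.
frontend = [[352, 201, 859], [326, 1024, 848]]

changes = table_diff(backend, frontend)
message = json.dumps(changes)  # only the changed cells are sent
apply_diff(backend, json.loads(message))
print(message)  # [[1, 1, 1024]]
```

With this approach, the message size depends on the number of edited cells rather than on the size of the table, and the existing table object is modified rather than replaced.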

Here are a few references about the widget architecture in the IPython notebook 2.0+:

 
