You're reading from Learning IPython for Interactive Computing and Data Visualization, Second Edition

Product type Book

Published in Oct 2015

ISBN-13 9781783986989

Pages 200 pages

Edition 2nd Edition

Languages

Python

Concepts

Scientific Computing

Author (1):

Cyrille Rossant

Table of Contents (13) Chapters

Learning IPython for Interactive Computing and Data Visualization Second Edition

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Getting Started with IPython

Interactive Data Analysis with pandas

Numerical Computing with NumPy

Interactive Plotting and Graphical Interfaces

High-Performance and Parallel Computing

Customizing IPython

Index

Chapter 3. Numerical Computing with NumPy

NumPy is the library that underlies the entire SciPy/PyData ecosystem. NumPy provides a multidimensional array data type that is widely used in numerical computing.

In this chapter, we will apply NumPy to data analysis and scientific modeling examples, covering the following topics:

A primer to vector computing
Creating and loading arrays
Basic array manipulations
Computing with NumPy arrays

A primer to vector computing

Vector computing is about efficiently performing mathematical operations on numerical arrays. Many problems in science and engineering actually consist of a sequence of such operations.
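As a minimal sketch of this idea, the following snippet computes the same result twice: once with an explicit Python loop and once with a single expression on whole arrays. The array values and the variable names are illustrative assumptions, not taken from the book.

```python
import numpy as np

# Two small arrays of real numbers (illustrative values only).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Pure-Python approach: process the elements one by one.
loop_result = [a[i] * b[i] + 1.0 for i in range(len(a))]

# Vectorized approach: the same computation written on whole arrays.
# NumPy applies the multiplication and the addition element-wise.
vec_result = a * b + 1.0

print(vec_result)
```

Both versions produce the same numbers. The vectorized form is shorter, and on large arrays it is generally much faster because the loop runs in compiled code rather than in the Python interpreter.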

This section introduces and demonstrates the multidimensional array data type for numerical computing.

Multidimensional arrays

What is a multidimensional array? Consider a vector containing 1000 real numbers. It has one dimension, since numbers are stored along a single axis. Now, consider a matrix with 1000 rows and 1000 columns. It contains 1,000,000 numbers. Because it has two dimensions, you need to specify both the row and column to refer to a specific number.

More generally, an n-dimensional array, also called an ndarray, is an n-dimensional matrix (or tensor). Every number is identified by n indices (i_1, ..., i_n).
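The following sketch illustrates these definitions with NumPy's ndim, shape, and size attributes. The three-dimensional shape (2, 3, 4) is an arbitrary choice made for this example.

```python
import numpy as np

vector = np.zeros(1000)            # one axis: 1000 numbers
matrix = np.zeros((1000, 1000))    # two axes: 1000 rows, 1000 columns
tensor = np.zeros((2, 3, 4))       # three axes (arbitrary example shape)

# ndim is the number of dimensions, size the total number of elements.
print(vector.ndim, matrix.ndim, tensor.ndim)
print(matrix.size)

# An element of an n-dimensional array is addressed with n indices.
matrix[0, 999] = 1.0       # row 0, column 999
tensor[1, 2, 3] = 7.0      # one index per axis
print(tensor[1, 2, 3])
```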

Many types of real-world data can be represented as ndarrays:

The evolution of a stock exchange price is a 1D array (vector) with one value per day (or per hour, per...

Creating and loading arrays

In this section, we will see how to create and load NumPy arrays.

Creating arrays

First, there are several NumPy functions for creating common types of arrays. For example, np.zeros(shape) creates an array containing only zeros. The shape argument is a tuple giving the size of every axis. Hence, np.zeros((3, 4)) creates an array of shape (3, 4) (note the double parentheses, because we pass a tuple to the function).
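A short sketch of the shape argument in practice follows. It also shows what happens when the inner parentheses are forgotten: NumPy then interprets the second argument as a data type and raises a TypeError. The variable names are chosen for this example only.

```python
import numpy as np

# The shape is passed as a single tuple argument.
z = np.zeros((3, 4))
print(z.shape)
print(z.dtype)   # zeros are floating-point numbers by default

# For a 1D array, a plain integer is also accepted as the shape.
v = np.zeros(3)
print(v.shape)

# Forgetting the inner parentheses passes 4 as the dtype argument,
# which is not a valid data type.
try:
    np.zeros(3, 4)
    forgot_tuple_failed = False
except TypeError:
    forgot_tuple_failed = True
print(forgot_tuple_failed)
```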

Here are some further examples:

In [1]: import numpy as np
        print("ones", np.ones(5))
        print("arange", np.arange(5))
        print("linspace", np.linspace(0., 1., 5))
        print("random", np.random.uniform(size=3))
        print("custom", np.array([2, 3, 5]))
Out[1]: ones [ 1.  1.  1.  1.  1.]
        arange [0 1 2 3 4]
        linspace [ 0.    0.25  0.5   0.75  1.  ]
        random [ 0.68361911  0.33585308  0.70733934]
        custom [2 3 5]

The np.arange() and np.linspace() functions create arrays with regularly spaced numbers. The np...

Basic array manipulations

Let's see some basic array manipulations, using multiplication tables as an example.

In [1]: import numpy as np

We first create an array of integers between 1 and 10, as shown here:

In [2]: x = np.arange(1, 11)
In [3]: x
Out[3]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Note that in np.arange(start, end), start is included while end is excluded.
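The sketch below checks this convention and contrasts it with np.linspace(), which includes its endpoint by default. The optional step argument and the variable names evens and grid are additions made for this illustration.

```python
import numpy as np

# start=1 is included, end=11 is excluded: we get 1, 2, ..., 10.
x = np.arange(1, 11)
print(x[0], x[-1], len(x))

# An optional third argument gives the step between consecutive values.
evens = np.arange(0, 10, 2)
print(evens)

# By contrast, np.linspace() includes the end value by default,
# and takes the number of points rather than a step.
grid = np.linspace(0., 1., 5)
print(grid[0], grid[-1], len(grid))
```

This mirrors the behavior of Python's built-in range(), which also excludes its end value.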

To create our multiplication table, we first need to transform x into a row vector and a column vector. Our vector x is a 1D array, whereas row and column vectors are 2D arrays (also known as matrices). There are many ways to transform a 1D array into a 2D array. We will see the two most common methods here.

The first method is to use reshape():

In [4]: x_row = x.reshape((1, -1))
        x_row
Out[4]: array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])

The reshape() method takes the new shape as a parameter. The total number of elements must be unchanged. For example, reshaping a (2, 3) array to a (5,) array would raise an error. The number -1 can...
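Here is a minimal sketch of the reshape() behavior described above. In NumPy, a -1 in the shape asks reshape() to infer that dimension from the total number of elements. The last step, which multiplies the column vector by the row vector to obtain a multiplication table through broadcasting, is one possible way to finish the example and is an assumption of this sketch; the names x_col and table are chosen here for illustration.

```python
import numpy as np

x = np.arange(1, 11)

# -1 lets NumPy infer the size of that axis from the number of elements.
x_row = x.reshape((1, -1))   # row vector, shape (1, 10)
x_col = x.reshape((-1, 1))   # column vector, shape (10, 1)
print(x_row.shape, x_col.shape)

# The total number of elements must be unchanged: a (2, 3) array
# holds 6 elements, so it cannot be reshaped to (5,).
try:
    np.zeros((2, 3)).reshape((5,))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)

# Assumed final step: multiplying a (10, 1) array by a (1, 10) array
# broadcasts to a (10, 10) table of all pairwise products.
table = x_col * x_row
print(table.shape)
print(table[2, 3])   # entry for 3 times 4
```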

Computing with NumPy arrays

We now get to the substance of array programming with NumPy. We will perform manipulations and computations on ndarrays.

Let's first import NumPy, pandas, matplotlib, and seaborn:

In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sns
        %matplotlib inline

We load the NYC taxi dataset with pandas:

In [2]: data = pd.read_csv('../chapter2/data/nyc_data.csv',
                           parse_dates=['pickup_datetime',
                                        'dropoff_datetime'])

We get the pickup and dropoff locations of the taxi rides as ndarrays, using the .values attribute of pandas DataFrames:

In [3]: pickup = data[['pickup_longitude', 'pickup_latitude']].values
        dropoff = data[['dropoff_longitude',
                        'dropoff_latitude']].values
        pickup
Out[3]: array([[-73.955925,  40.781887],
               [-74.005501,  40.745735],
               [-73.969955,  40.79977...

Summary

In this chapter, we introduced NumPy and the ndarray structure. We explained the main concepts of array computing and the performance benefits it brings over Python loops. We also showed how to use NumPy in conjunction with pandas for advanced data analysis tasks.

In the next chapter, we will explore several options for plotting, visualization, and graphical interfaces.