Learning IPython for Interactive Computing and Data Visualization, Second Edition

Product type: Book
Published in: Oct 2015
ISBN-13: 9781783986989
Pages: 200
Edition: 1st Edition
Author: Cyrille Rossant

Chapter 3. Numerical Computing with NumPy

NumPy is the foundation of the entire SciPy/PyData ecosystem. It provides a multidimensional array data type, the ndarray, that is widely used in numerical computing.

In this chapter, we will apply NumPy to data analysis and scientific modeling examples, covering the following topics:

  • A primer to vector computing

  • Creating and loading arrays

  • Basic array manipulations

  • Computing with NumPy arrays

A primer to vector computing


Vector computing is about efficiently performing mathematical operations on entire numerical arrays. Many problems in science and engineering can be expressed as a sequence of such operations.
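To make the idea concrete, here is a minimal sketch comparing a pure Python loop with the equivalent vectorized NumPy expression. The function names `sum_of_squares_loop` and `sum_of_squares_vectorized` are hypothetical, chosen for illustration only; the printed timings depend on the machine and are not reproduced here.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Accumulate the sum of squares one element at a time in Python."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(arr):
    """Square every element and sum them in a single NumPy expression."""
    return float(np.sum(arr * arr))


data = np.random.uniform(size=100_000)
values = data.tolist()  # plain Python floats for the loop version

loop_result = sum_of_squares_loop(values)
vec_result = sum_of_squares_vectorized(data)

# Both approaches compute the same quantity, up to floating-point rounding
# (the summation order differs between the two).
print(np.isclose(loop_result, vec_result))

# Rough timing comparison; absolute numbers vary from machine to machine.
t_loop = timeit.timeit(lambda: sum_of_squares_loop(values), number=10)
t_vec = timeit.timeit(lambda: sum_of_squares_vectorized(data), number=10)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```

The vectorized version hands the iteration over to compiled code inside NumPy, which is typically where its speed advantage over the Python loop comes from.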

This section introduces and demonstrates the multidimensional array data type for numerical computing.

Multidimensional arrays

What is a multidimensional array? Consider a vector containing 1000 real numbers. It has one dimension, since the numbers are stored along a single axis. Now, consider a matrix with 1000 rows and 1000 columns. It contains 1,000,000 numbers. Because it has two dimensions, you need to specify both a row index and a column index to refer to a specific number.
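The vector and matrix described above can be created directly, and the `ndim`, `shape`, and `size` attributes confirm their dimensionality. This is a minimal illustration; the index values (10, 20) and the stored value are arbitrary.

```python
import numpy as np

# A 1D array (vector) of 1000 real numbers: a single axis.
vector = np.zeros(1000)
print(vector.ndim, vector.shape)   # 1 (1000,)

# A 2D array (matrix) with 1000 rows and 1000 columns.
matrix = np.zeros((1000, 1000))
print(matrix.ndim, matrix.size)    # 2 1000000

# Referring to one number requires both a row and a column index.
matrix[10, 20] = 3.5
print(matrix[10, 20])              # 3.5
```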

More generally, an n-dimensional array, also called an ndarray, is an n-dimensional matrix (or tensor). Every number is identified by n indices (i_1, ..., i_n).
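Indexing an ndarray follows the same rule: a 3D array needs three indices (i_1, i_2, i_3) to identify a single number. The shape (2, 3, 4) below is chosen only for illustration.

```python
import numpy as np

# A 3D array: 2 x 3 x 4 = 24 numbers, filled with 0, 1, ..., 23.
a = np.arange(24).reshape((2, 3, 4))
print(a.ndim)    # 3
print(a.shape)   # (2, 3, 4)

# One index per axis identifies a single number.
print(a[1, 2, 3])  # 23, the last element

# Supplying fewer indices selects a lower-dimensional sub-array.
print(a[0].shape)  # (3, 4)
```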

Many types of real-world data can be represented as ndarrays:

  • The evolution of a stock exchange price is a 1D array (vector) with one value per day (or per hour, per...

Creating and loading arrays


In this section, we will see how to create and load NumPy arrays.

Creating arrays

First, NumPy provides several functions for creating common types of arrays. For example, np.zeros(shape) creates an array containing only zeros. The shape argument is a tuple giving the size of each axis. Hence, np.zeros((3, 4)) creates an array of shape (3, 4), that is, with 3 rows and 4 columns (note the double parentheses: we pass a single tuple to the function).
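A quick check of the shape argument follows. By default `np.zeros` returns floating-point zeros; its standard `dtype` parameter can request another element type, as the integer example shows.

```python
import numpy as np

# The shape is passed as a single tuple: (rows, columns).
z = np.zeros((3, 4))
print(z.shape)   # (3, 4)
print(z.dtype)   # float64

# An integer array of zeros with the same shape.
zi = np.zeros((3, 4), dtype=int)
print(zi.dtype.kind)  # i (signed integer)
```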

Here are some further examples:

In [1]: import numpy as np
        print("ones", np.ones(5))
        print("arange", np.arange(5))
        print("linspace", np.linspace(0., 1., 5))
        print("random", np.random.uniform(size=3))
        print("custom", np.array([2, 3, 5]))
Out[1]: ones [ 1.  1.  1.  1.  1.]
        arange [0 1 2 3 4]
        linspace [ 0.    0.25  0.5   0.75  1.  ]
        random [ 0.68361911  0.33585308  0.70733934]
        custom [2 3 5]
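The output above hints at element types: `np.ones()` and `np.linspace()` produce floating-point numbers (printed with a trailing dot), while `np.arange()` with integer arguments and `np.array([2, 3, 5])` produce integers. The `dtype` attribute of each array makes this explicit. The exact integer width (for example 64-bit versus 32-bit) can depend on the platform, so this sketch inspects the kind of type rather than its size.

```python
import numpy as np

arrays = {
    "ones": np.ones(5),
    "arange": np.arange(5),
    "linspace": np.linspace(0., 1., 5),
    "random": np.random.uniform(size=3),
    "custom": np.array([2, 3, 5]),
}

for name, arr in arrays.items():
    # dtype.kind is 'f' for floating point, 'i' for signed integers.
    print(name, arr.dtype, arr.dtype.kind)
```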

The np.arange() and np.linspace() functions create arrays with regularly spaced numbers. The np...

Basic array manipulations


Let's look at some basic array manipulations by building a multiplication table.

In [1]: import numpy as np

We first create an array containing the integers from 1 to 10, as shown here:

In [2]: x = np.arange(1, 11)
In [3]: x
Out[3]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Note that in np.arange(start, stop), start is included while stop is excluded.
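The excluded end point is a common source of off-by-one mistakes. This sketch contrasts `np.arange()`, which takes an optional step and excludes the end point, with `np.linspace()`, which takes a number of points and includes the end point by default (its `endpoint` parameter controls this).

```python
import numpy as np

# arange(start, stop): stop is excluded, so 11 is needed to reach 10.
x = np.arange(1, 11)
print(x[0], x[-1], len(x))  # 1 10 10

# arange also accepts a step.
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# linspace(start, stop, num): stop is included by default.
grid = np.linspace(1, 10, 10)
print(grid[-1])  # 10.0

# endpoint=False mimics the half-open behavior of arange.
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)
```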

To create our multiplication table, we first need to transform x into a row vector and a column vector. Our vector x is a 1D array, whereas row and column vectors are 2D arrays (also known as matrices). There are many ways to transform a 1D array into a 2D array; we will see the two most common methods here.

The first method is to use reshape():

In [4]: x_row = x.reshape((1, -1))
        x_row
Out[4]: array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
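To complete the multiplication table, one possible approach (a sketch, not necessarily the method the chapter goes on to use) is to reshape x into a column vector as well and multiply it by the row vector. NumPy's broadcasting rules stretch the (10, 1) and (1, 10) arrays to a common (10, 10) shape, so their product is the full table.

```python
import numpy as np

x = np.arange(1, 11)

# Row vector: shape (1, 10). The -1 lets NumPy infer the length of that axis.
x_row = x.reshape((1, -1))
# Column vector: shape (10, 1).
x_col = x.reshape((-1, 1))

# Broadcasting (10, 1) * (1, 10) yields a (10, 10) array
# where table[i, j] == (i + 1) * (j + 1).
table = x_col * x_row
print(table.shape)  # (10, 10)
print(table[2, 3])  # 12, i.e. 3 * 4
print(table[:3, :3])
```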

The reshape() method takes the new shape as its argument. The total number of elements must remain unchanged. For example, reshaping a (2, 3) array, which holds 6 elements, into a (5,) array would raise an error. The number -1 can...
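The element-count rule can be checked directly. In this sketch, a (2, 3) array holds 6 elements, so it can become (3, 2) or (6,) but not (5,); NumPy reports the mismatch with a `ValueError`.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))  # 6 elements

print(a.reshape((3, 2)).shape)  # (3, 2)
print(a.reshape((6,)).shape)    # (6,)

try:
    a.reshape((5,))
except ValueError as exc:
    # 6 elements cannot fill a shape that holds 5.
    print("ValueError:", exc)
```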

Computing with NumPy arrays


We now get to the substance of array programming with NumPy. We will perform manipulations and computations on ndarrays.

Let's first import NumPy, pandas, matplotlib, and seaborn:

In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sns
        %matplotlib inline

We load the NYC taxi dataset with pandas:

In [2]: data = pd.read_csv('../chapter2/data/nyc_data.csv',
                           parse_dates=['pickup_datetime',
                                        'dropoff_datetime'])

We get the pickup and dropoff locations of the taxi rides as ndarrays, using the .values attribute of pandas DataFrames:

In [3]: pickup = data[['pickup_longitude', 'pickup_latitude']].values
        dropoff = data[['dropoff_longitude',
                        'dropoff_latitude']].values
        pickup
Out[3]: array([[-73.955925,  40.781887],
               [-74.005501,  40.745735],
               [-73.969955,  40.79977...
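Since the NYC taxi CSV file is not reproduced here, the following sketch builds a tiny stand-in: a two-row CSV with the same column names, read with `pd.read_csv()` and `parse_dates` as above. The pickup coordinates are the first two values shown in the output above; the timestamps and dropoff coordinates are made-up placeholders. The sketch also uses `DataFrame.to_numpy()`, which recent pandas versions recommend over `.values`; both return an ndarray here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row CSV with the same columns as nyc_data.csv.
# Timestamps and dropoff coordinates are placeholders, not real data.
csv_text = """pickup_datetime,dropoff_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
2013-01-01 00:00:00,2013-01-01 00:10:00,-73.955925,40.781887,-73.960000,40.770000
2013-01-01 00:05:00,2013-01-01 00:20:00,-74.005501,40.745735,-73.990000,40.750000
"""

data = pd.read_csv(io.StringIO(csv_text),
                   parse_dates=['pickup_datetime', 'dropoff_datetime'])

# Selecting two columns gives an (n_rides, 2) ndarray.
pickup = data[['pickup_longitude', 'pickup_latitude']].values
dropoff = data[['dropoff_longitude', 'dropoff_latitude']].to_numpy()
print(type(pickup).__name__, pickup.shape)  # ndarray (2, 2)

# Column 0 holds longitudes and column 1 holds latitudes.
print(pickup[:, 0])

# Arithmetic applies element-wise to whole arrays without a Python loop:
# coordinate displacement of each ride, in degrees.
displacement = dropoff - pickup
print(displacement.shape)  # (2, 2)
```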

Summary


In this chapter, we introduced NumPy and the ndarray structure. We explained the main concepts of array computing and the performance benefits it brings over Python loops. We also showed how to use NumPy in conjunction with pandas for advanced data analysis tasks.

In the next chapter, we will explore several options for plotting, visualization, and graphical interfaces.
