
You're reading from Learning NumPy Array

Product type: Book
Published in: Jun 2014
Reading level: Intermediate
ISBN-13: 9781783983902
Edition: 1st Edition

Author: Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. He enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook, both published by Packt Publishing.

Chapter 2. NumPy Basics

After installing NumPy and getting some code to work, it's time to cover NumPy basics. This chapter introduces you to the fundamentals of NumPy and arrays. By the end of this chapter, you will have a basic understanding of NumPy arrays and their associated functions.

The topics that we shall cover in this chapter are as follows:

  • Data types

  • Array types

  • Type conversions

  • Creating arrays

  • Indexing

  • Fancy indexing

  • Slicing

  • Manipulating shapes

The NumPy array object


NumPy has a multidimensional array object called ndarray. It consists of two parts as follows:

  • The actual data

  • Some metadata describing the data

Many array operations, such as reshaping and transposing, leave the raw data untouched and change only the metadata.
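Here is a small sketch of the data/metadata split, using only standard NumPy calls. Reshaping and transposing produce new array objects with different metadata (shape and strides). For a contiguous array like this one, those objects still point at the same underlying buffer. np.shares_memory() checks whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(6)        # the raw data: six integers in one buffer
b = a.reshape(2, 3)     # new metadata (shape), same data
c = b.T                 # transposing only swaps the shape and the strides

print(b.shape, b.strides)
print(c.shape, c.strides)
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True True

b[0, 0] = 100           # writing through one object changes the shared data
print(a[0])             # 100
```

The exact stride values depend on the platform's default integer size, so they are printed rather than hard-coded.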

We have already learned in the previous chapter how to create an array using the arange() function. Actually, we created a one-dimensional array that contained a set of numbers. The ndarray object can have more than one dimension.

The advantages of using NumPy arrays

A NumPy array is a general homogeneous array: the items in an array have to be of the same type (there is a special array type that is heterogeneous). The advantage is that, because all items have the same type, it is easy to determine the storage size required for the array. NumPy arrays can also perform vectorized operations that work on a whole array at once. Contrast this with Python lists, where normally you have to loop through the list and perform operations...
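The sketch below illustrates both advantages with arbitrary sample data. The list needs an explicit loop (here a comprehension), while the array uses one vectorized expression. Because every item is a 64-bit integer, the storage size is simply the number of items times the size of one item.

```python
import numpy as np

values = list(range(1_000))
arr = np.arange(1_000, dtype=np.int64)

# Python list: loop over the elements explicitly.
squares_list = [v * v for v in values]

# NumPy array: one vectorized expression works on the whole array.
squares_arr = arr * arr

# Homogeneous items make the storage size easy to compute:
# number of items times the size of one item in bytes.
print(arr.size * arr.itemsize, arr.nbytes)    # 8000 8000
print(squares_arr.tolist() == squares_list)   # True
```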

Creating a multidimensional array


Now that we know how to create a vector, we are ready to create a multidimensional NumPy array. After we create the matrix, we will again want to display its shape (see the arrayattributes.py file in the Chapter02 folder of this book's code bundle), as shown in the following code snippets:

  • To create a multidimensional array, see the following code:

    In: m = array([arange(2), arange(2)])
    In: m
    Out:
    array([[0, 1],
           [0, 1]])
  • To display the array shape, see the following lines of code:

    In: m.shape
    Out: (2, 2)

We created a 2 x 2 array with the arange() function. Without any warning, the array() function appeared on the stage.

The array() function creates an array from an object that you give to it. The object needs to be array-like, for instance, a Python list. In the preceding example, we passed in a list of arrays. The object is the only required argument of the array() function. NumPy functions tend to have a lot of optional arguments with predefined defaults.
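As a short illustration of the required and optional arguments of array(), the sketch below uses the explicit np. prefix instead of the star import used in the transcripts. dtype and ndmin are two of the real optional arguments of np.array().

```python
import numpy as np

# Any array-like object works; here, a list of lists.
m = np.array([[0, 1], [0, 1]])
print(m.shape, m.ndim)       # (2, 2) 2

# The object is the only required argument; dtype is one of the optional ones.
f = np.array([[0, 1], [0, 1]], dtype=float)
print(f.dtype)               # float64

# ndmin is another optional argument: it pads the shape with leading 1s.
v = np.array([1, 2, 3], ndmin=2)
print(v.shape)               # (1, 3)
```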

Selecting array elements


From time to time, we will want to select a particular element of an array. We will take a look at how to do this, but first, let's create a 2 x 2 matrix again (see the elementselection.py file in the Chapter02 folder of this book's code bundle):

In: a = array([[1,2],[3,4]])
In: a
Out:
array([[1, 2],
       [3, 4]])

The matrix was created this time by passing a list of lists to the array() function. We will now select each item of the matrix one at a time, as shown in the following code snippet. Remember, the indices are numbered starting from 0.

In: a[0,0]
Out: 1
In: a[0,1]
Out: 2
In: a[1,0]
Out: 3
In: a[1,1]
Out: 4

As you can see, selecting elements of the array is pretty simple. For the array a, we just use the notation a[m,n], where m and n are the indices of the item in the array.
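Two related forms are worth knowing, shown here on the same 2 x 2 matrix. Chained indexing a[m][n] gives the same value but first builds the row a[m]. Negative indices count from the end, as with Python lists.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a[1, 0])    # 3: row 1, column 0 (indices start at 0)
print(a[1][0])    # 3 as well, but a[1] first builds the row, then indexes it
print(a[-1, -1])  # 4: negative indices count from the end
```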

NumPy numerical types


Python has an integer type, a float type, and a complex type; however, this is not enough for scientific computing. In practice, we need more data types with varying precision and, therefore, different memory sizes. For this reason, NumPy has many more data types. The names of most NumPy numerical types end with a number, which indicates the number of bits associated with the type. The table of NumPy numerical types, adapted from the NumPy user guide, gives an overview of them.
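The bits-in-the-name convention can be checked directly. np.dtype(...).itemsize gives the size of one item in bytes, so multiplying by 8 recovers the number in the type name. np.iinfo reports the range of an integer type.

```python
import numpy as np

# The number in the type name is the size in bits; itemsize is in bytes.
for t in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64):
    print(t.__name__, np.dtype(t).itemsize * 8)

# iinfo reports the range of an integer type.
info = np.iinfo(np.int16)
print(info.min, info.max)   # -32768 32767
```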

Creating a record data type


A record data type is a heterogeneous data type; think of it as representing a row in a spreadsheet or a database. To give an example of a record data type, we will create a record for a shop inventory. This record contains the name of an item represented by a 40-character string, the number of items in the store represented by a 32-bit integer, and finally, the price of the item represented by a 32-bit float. The following steps show how to create a record data type (see the record.py file in the Chapter02 folder of this book's code bundle):

  1. To create the record data type, use the following code:

    In: t = dtype([('name', str_, 40), ('numitems', int32), ('price', float32)])
    In: t
    Out: dtype([('name', '|S40'), ('numitems', '<i4'), ('price', '<f4')])
  2. To view the type of a field, use the following code:

    In: t['name']
    Out: dtype('|S40')

If you don't give the array() function a data type, it infers one from the input data, so it will not produce the record type defined above on its own. To create...
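The rest of this step is cut off, so here is a minimal sketch rather than the book's code. It passes the record type as the dtype argument of np.array(). The item names and values are made up for illustration. On Python 3, np.str_ is a Unicode string type, so the name field appears as a 'U40' field instead of the '|S40' shown in the transcript above.

```python
import numpy as np

# Same layout as the record type above: name, count, and price.
t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

# Hypothetical inventory rows, one tuple per record.
items = np.array([('Widget', 42, 3.14), ('Gadget', 13, 2.72)], dtype=t)

print(items['name'])             # a field name selects one column of the records
print(items[1])                  # an integer index selects one whole record
print(items['numitems'].sum())   # 55
```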

One-dimensional slicing and indexing


Slicing of one-dimensional NumPy arrays works just like slicing of Python lists. We can select a piece of an array from index 3 to 7, which extracts the elements 3 through 6 (see the slicing1d.py file in the Chapter02 folder of this book's code bundle), as shown in the following code snippet:

In: a = arange(9)
In: a[3:7]
Out: array([3, 4, 5, 6])

We can select elements from the index 0 to 7 with a step of two, as shown in the following lines of code:

In: a[:7:2]
Out: array([0, 2, 4, 6])

Just as in Python, we can use a negative step to reverse the array, as shown in the following lines of code:

In: a[::-1]
Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
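A few more slices on the same arange(9) array follow. Negative start indices and negative steps behave as they do for Python lists. One difference matters later in this chapter: slicing a Python list copies the elements, whereas slicing a NumPy array returns a view of the same data.

```python
import numpy as np

a = np.arange(9)

print(a[-3:])     # [6 7 8]: the last three elements
print(a[7:2:-2])  # [7 5 3]: a negative step walks backwards

# Unlike slicing a Python list, slicing a NumPy array returns a view.
s = a[3:7]
s[0] = 99
print(a[3])       # 99: the original array changed too
```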

Manipulating array shapes


Another recurring task is flattening of arrays. Flattening in this context means transforming a multidimensional array into a one-dimensional array. In this example, we will demonstrate a number of ways to manipulate array shapes starting with flattening:

  • ravel(): We can accomplish flattening with the ravel() function (see the shapemanipulation.py file in the Chapter02 folder of this book's code bundle), as shown in the following code:

    In: b = arange(24).reshape(2,3,4)
    In: b
    Out:
    array([[[ 0,  1,  2,  3],[ 4,  5,  6,  7],[ 8,  9, 10, 11]],[[12, 13, 14, 15],[16, 17, 18, 19],[20, 21, 22, 23]]])
    In: b.ravel()
    Out:
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])
  • flatten(): The appropriately named flatten() function does the same as ravel(), but flatten() always allocates new memory, whereas ravel() might return a view of the array. This means that changes to the result of flatten() never affect the original array, as follows:

    In: b.flatten()
    Out:
    array([ 0,  1,  2,  3,  4,  5...
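The difference between the two functions is easiest to see by writing into their results. In this sketch, b is the same 2 x 3 x 4 array as above. Because b is contiguous, ravel() returns a view here; np.shares_memory() confirms which result shares b's data.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()      # may be a view: b is contiguous, so here it is one
f = b.flatten()    # always a fresh copy

r[0] = -1
print(b[0, 0, 0])  # -1: writing through the ravel() result changed b
f[1] = -1
print(b[0, 0, 1])  # 1: the flatten() copy is independent

print(np.shares_memory(b, r), np.shares_memory(b, f))  # True False
```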

Creating views and copies


In the example about the ravel() function, views were mentioned. Views should not be confused with database views: views in the NumPy world are not read-only, so changing a view changes the underlying data. It is important to know when we are dealing with a shared array view and when we have a copy of array data. A slice, for instance, creates a view. This means that if you assign a slice to a variable and then change the underlying array, the value of this variable will change as well. We will create an array from the famous Lena image, copy the array, create a view, and at the end, modify the view. The Lena image array comes from a SciPy function.

  1. To create a copy of the Lena array, the following line of code is used:

    acopy = lena.copy()
  2. Now, to create a view of the array, use the following line of code:

    aview = lena.view()
  3. Set all the values of the view to 0 with a flat iterator, as follows:

    aview.flat = 0

The end result is that only one...
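The Lena image is no longer shipped with SciPy (scipy.misc.lena() was removed in later releases). This sketch therefore repeats the same three steps on a small hypothetical array named img, which stands in for the image.

```python
import numpy as np

# Hypothetical stand-in for the image array.
img = np.arange(12).reshape(3, 4)

acopy = img.copy()   # new memory: independent of img
aview = img.view()   # new array object that shares img's data

aview.flat = 0       # set every element through the flat iterator

print(img.sum())     # 0: the original changed through the view
print(acopy.sum())   # 66: the copy kept the old values
```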

Fancy indexing


Fancy indexing is indexing that does not involve integers or slices (indexing with integers and slices is normal indexing). In this section, we will apply fancy indexing to set the diagonal values of the Lena image to 0. This will draw black lines along the diagonals and cross the image out, not because there is something wrong with the image, but just as an exercise. Perform the following steps for fancy indexing:

  1. Set the values of the first diagonal to 0. To set the diagonal values to 0, we need to define two different ranges for the x and y values as follows:

    lena[range(xmax), range(ymax)] = 0
  2. Now, set the values of the other diagonal to 0. To set the values of the other diagonal, we require a different set of ranges, but the principles stay the same, as follows:

    lena[range(xmax-1,-1,-1), range(ymax)] = 0

At the end, we get an image with the diagonals crossed out.

The following code for this section is without comments. The complete code for this is in the fancy.py file in the Chapter02 folder of...
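The two diagonal assignments above can be tried on a small hypothetical 5 x 5 array of ones instead of the Lena image. Each pair of index sequences is matched element by element, so the two ranges must have the same length; a square array keeps that simple.

```python
import numpy as np

# Hypothetical 5 x 5 "image" of ones instead of the Lena array.
img = np.ones((5, 5), dtype=int)
xmax, ymax = img.shape

# Pair row i with column i: the main diagonal.
img[range(xmax), range(ymax)] = 0
# Pair rows xmax-1..0 with columns 0..ymax-1: the other diagonal.
img[range(xmax - 1, -1, -1), range(ymax)] = 0

print(img)
```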

Indexing with a list of locations


Let's use the ix_() function to shuffle the Lena image. This function creates a mesh from multiple sequences. As arguments, we give one-dimensional sequences, and the function returns a tuple of NumPy arrays. For example, check the following code snippet:

In: ix_([0,1], [2,3])
Out:
(array([[0], [1]]), array([[2, 3]]))

To index the array with a list of locations, perform the following steps:

  1. Shuffle the array indices. Create a random indices array with the shuffle() function of the numpy.random module, as shown in the following lines of code. Note that the function shuffles the array in place.

    def shuffle_indices(size):
       arr = np.arange(size)
       np.random.shuffle(arr)
    
       return arr
  2. Now plot the image indexed with the shuffled indices, as follows:

    plt.imshow(lena[np.ix_(xindices, yindices)])

What we get is a completely scrambled Lena image.

The following code for this section is without comments. The complete code for this can be found in the ix.py file in the...
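The ix_() mesh can be seen at work on a small hypothetical 4 x 4 array instead of the Lena image. The sketch uses np.random.permutation in place of the shuffle_indices() helper, and the result is equivalent: a shuffled index array. np.argsort of a permutation gives its inverse, which is used here to undo the shuffle.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# ix_ turns two 1-D index lists into arrays that broadcast to a grid:
# rows 0-1 crossed with columns 2-3, i.e. [[2, 3], [6, 7]].
print(a[np.ix_([0, 1], [2, 3])])

# Shuffle all rows and all columns, as with the Lena image.
xindices = np.random.permutation(a.shape[0])
yindices = np.random.permutation(a.shape[1])
scrambled = a[np.ix_(xindices, yindices)]

# Fancy indexing copies the data, so a is untouched; argsort of each
# permutation gives the inverse permutation, which undoes the shuffle.
restored = scrambled[np.ix_(np.argsort(xindices), np.argsort(yindices))]
print(np.array_equal(restored, a))  # True
```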

Indexing arrays with Booleans


Boolean indexing is indexing based on a Boolean array, and it falls into the category of fancy indexing. As a result, it works in basically the same way: indexing happens with the help of a special iterator object. Perform the following steps to index an array:

  1. First, we create an image with dots on the diagonal. This is similar in some ways to the Fancy indexing section. This time, we select every fourth point on the diagonal of the image, as shown in the following code snippet:

    def get_indices(size):
       arr = np.arange(size)
       return arr % 4 == 0
  2. Then we just apply this selection and plot the points, as shown in the following code snippet:

    lena1 = lena.copy() 
    xindices = get_indices(lena.shape[0])
    yindices = get_indices(lena.shape[1])
    lena1[xindices, yindices] = 0
    plt.subplot(211)
    plt.imshow(lena1)
  3. Select array values between a quarter and three-quarters of the maximum value, and set them to 0, as shown in the...
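Step 3 is cut off, so the following is a minimal sketch of both Boolean selections on a small hypothetical 8 x 8 array. It is not the book's code. get_indices() is copied from step 1. The thresholds follow the wording of step 3; combining the two comparisons with & is a choice made for this sketch.

```python
import numpy as np

img = np.arange(64).reshape(8, 8)   # hypothetical stand-in image


def get_indices(size):
    # True at every index divisible by 4.
    arr = np.arange(size)
    return arr % 4 == 0


# Steps 1 and 2: every fourth point on the diagonal.
img1 = img.copy()
xindices = get_indices(img.shape[0])   # True at 0 and 4
yindices = get_indices(img.shape[1])
img1[xindices, yindices] = 0           # zeroes (0, 0) and (4, 4)

# Step 3 idea: combine two comparisons with & (not the keyword "and").
img2 = img.copy()
top = img.max()
mask = (img > top / 4) & (img < 3 * top / 4)
img2[mask] = 0
```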

Stride tricks for Sudoku


We can do even more fancy things with NumPy. The ndarray class has a field, strides, which is a tuple indicating the number of bytes to step in each dimension when going through an array. Sudoku is a popular puzzle originally from Japan, although it was known in a similar form in other countries before that. If you don't know about Sudoku, it may be better that way, because it is highly addictive. Let's apply some stride tricks to the problem of splitting a Sudoku puzzle into the 3 x 3 squares it is composed of:

  1. First, define the Sudoku puzzle array, as shown in the following code snippet. This one is filled with the contents of an actual solved Sudoku puzzle (part of the array is omitted for brevity).

    sudoku = np.array([[2, 8, 7, 1, 6, 5, 9, 4, 3],[9, 5, 4, 7, 3, 2, 1, 6, 8],…[7, 3, 6, 2, 8, 4, 5, 1, 9]])
  2. Now calculate the strides. The itemsize field of ndarray gives us the number of bytes per array element. We use itemsize to calculate the strides as follows:

    strides = sudoku.itemsize ...
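Because step 2 is cut off, here is a self-contained sketch of the technique rather than the book's exact code. It builds a hypothetical solved grid from a simple formula instead of using the puzzle above. It reads the row and column strides from sudoku.strides, then calls numpy.lib.stride_tricks.as_strided to view the grid as a 3 x 3 arrangement of 3 x 3 squares without copying any data. as_strided does no bounds checking, so the shape and strides must be chosen with care.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical stand-in grid: a valid solved Sudoku built from a simple
# pattern, used here instead of the puzzle from the text.
r, c = np.indices((9, 9))
sudoku = (r * 3 + r // 3 + c) % 9 + 1

row_stride, col_stride = sudoku.strides  # bytes to the next row / next column

# View the 9 x 9 grid as a 3 x 3 grid of 3 x 3 squares without copying:
# axes are (square row, square column, row in square, column in square).
squares = as_strided(
    sudoku,
    shape=(3, 3, 3, 3),
    strides=(3 * row_stride, 3 * col_stride, row_stride, col_stride),
)

print(squares[1, 2])  # the middle-right 3 x 3 square
```

Moving one square down skips three rows (3 * row_stride bytes), and moving one square right skips three columns (3 * col_stride bytes). Within a square, the ordinary row and column strides apply.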

Broadcasting arrays


In a nutshell, NumPy tries to perform an operation even when the operands do not have the same shape. In this section, we will multiply an array and a scalar. The scalar is extended to the shape of the array operand, and then the multiplication is performed. We will download an audio file and make a new version that is quieter:

  1. First, read the WAV file. We will use standard Python code to download an audio file of Austin Powers saying "Smashing, baby". SciPy has a wavfile module that allows you to load sound data or generate WAV files. If SciPy is installed, then we should already have this module. The read() function returns the sample rate and a data array. In this example, we only care about the data.

    sample_rate, data = scipy.io.wavfile.read(WAV_FILE)
  2. Plot the original WAV data with Matplotlib. Give the subplot the title, Original, as shown in the following lines of code:

    plt.subplot(2, 1, 1)
    plt.title("Original")
    plt.plot(data)
  3. Now create a new array. We will use NumPy to...
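Step 3 is cut off, so here is a minimal sketch of the broadcasting idea on hypothetical 16-bit samples rather than the downloaded WAV data. The factor 0.2 is an arbitrary choice. The scalar is broadcast across every sample. Multiplying integers by a float yields a floating point array, so the result is cast back to the original sample type.

```python
import numpy as np

# Hypothetical stand-in for the WAV samples: 16-bit integers.
data = (np.sin(np.linspace(0, 2 * np.pi, 8)) * 10000).astype(np.int16)

# The scalar 0.2 is broadcast to data's shape; the product is a float
# array, so cast it back to the original sample type.
quieter = (data * 0.2).astype(data.dtype)

# Broadcasting also works between arrays: (3, 1) and (4,) combine to (3, 4).
grid = np.arange(3).reshape(3, 1) + np.arange(4)
print(grid.shape)   # (3, 4)
```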

Summary


We learned a lot in this chapter about the NumPy fundamentals: data types and arrays. Arrays have several attributes describing them. We learned that one of these attributes is the data type, which in NumPy is represented by a full-fledged object.

NumPy arrays can be sliced and indexed in an efficient manner, just as in the case of Python lists. NumPy arrays have the added ability of working with multiple dimensions.

The shape of an array can be manipulated in many ways, such as stacking, resizing, reshaping, and splitting. A great number of convenience functions for shape manipulation were demonstrated in this chapter.

Having learned about the basics, it's time to move on to data analysis with commonly used functions in Chapter 3, Basic Data Analysis with NumPy. This includes the usage of basic statistical and mathematical functions.


NumPy numerical types (adapted from the NumPy user guide):

Type      Description
bool      This stores a Boolean (True or False) as a byte
inti      This is a platform integer (normally either int32 or int64)
int8      This is an integer ranging from -128 to 127
int16     This is an integer ranging from -32768 to 32767
int32     This is an integer ranging from -2 ** 31 to 2 ** 31 - 1
int64     This is an integer ranging from -2 ** 63 to 2 ** 63 - 1
uint8     This is an unsigned integer ranging from 0 to 255
uint16    This is an unsigned integer...