NumPy Cookbook — Save 50%
Over 70 interesting recipes for learning the Python open source mathematical library NumPy with this book and ebook.
In this article by Ivan Idris, the author of NumPy Cookbook, we will learn some of NumPy's more advanced and tricky indexing techniques. NumPy is known for its efficient arrays, and part of that reputation comes from how easy they are to index. We will demonstrate advanced indexing tricks using images.
In this article, we will cover:

Installing SciPy

Installing PIL

Resizing images

Comparing views and copies

Flipping Lena

Fancy indexing

Indexing with a list of locations

Indexing with booleans

Stride tricks for Sudoku

Broadcasting arrays
Before diving into indexing, we will install the necessary software: SciPy and PIL. Some of the examples in this article involve manipulating images. For that we need the Python Imaging Library (PIL); instructions and pointers to help you install PIL and other necessary Python software are given throughout the article where needed.
Installing SciPy
SciPy is the scientific Python library and is closely related to NumPy. In fact, SciPy and NumPy used to be one and the same project many years ago. In this recipe, we will install SciPy.
How to do it...
In this recipe, we will go through the steps for installing SciPy.

Installing from source: If you have Git installed, you can clone the SciPy repository and build it using the following commands:
git clone https://github.com/scipy/scipy.git
cd scipy
python setup.py build
python setup.py install --user
This installs to your home directory and requires Python 2.6 or higher.
Before building, you will also need to install the following packages on which SciPy depends:

BLAS and LAPACK libraries

C and Fortran compilers
There is a chance that you have already installed this software as a part of the NumPy installation.


Installing SciPy on Linux: Most Linux distributions have SciPy packages. We will go through the necessary steps for some of the popular Linux distributions:

In order to install SciPy on Red Hat, Fedora, and CentOS, run the following instructions from the command line:
yum install python-scipy

In order to install SciPy on Mandriva, run the following command line instruction:
urpmi python-scipy

In order to install SciPy on Gentoo, run the following command line instruction:
sudo emerge scipy

On Debian or Ubuntu, we need to type the following:
sudo apt-get install python-scipy


Installing SciPy on Mac OS X: Apple Developer Tools (Xcode) is required, because it contains the BLAS and LAPACK libraries. It can be found in the App Store or on the installation DVD that came with your Mac, or you can get the latest version from Apple Developer's connection at https://developer.apple.com/technologies/tools/. Make sure that everything, including all the optional packages, is installed.
You probably already have a Fortran compiler installed for NumPy. The binaries for gfortran can be found at http://r.research.att.com/tools/.

Installing SciPy using easy_install or pip: Install with either of the following two commands:
sudo pip install scipy
easy_install scipy 
Installing on Windows: If you have Python installed already, the preferred method is to download and use the binary distribution. Alternatively, you may want to install the Enthought Python distribution, which comes with other scientific Python software packages.

Check your installation: Check the SciPy installation with the following code:
import scipy
print(scipy.__version__)
print(scipy.__file__)
This should print the correct SciPy version.
How it works...
Most package managers will take care of any dependencies for you. However, in some cases, you will need to install them manually. Unfortunately, this is beyond the scope of this book. If you run into problems, you can ask for help at:

The #scipy IRC channel of freenode, or

The SciPy mailing lists at http://www.scipy.org/Mailing_Lists
Installing PIL
PIL, the Python imaging library, is a prerequisite for the image processing recipes in this article.
How to do it...
Let's see how to install PIL.

Installing PIL on Windows: Install using the Windows executable from the PIL website http://www.pythonware.com/products/pil/.

Installing on Debian or Ubuntu: On Debian or Ubuntu, install PIL using the following command:
sudo apt-get install python-imaging

Installing with easy_install or pip: At the time of writing this book, it appeared that the package managers of Red Hat, Fedora, and CentOS did not have direct support for PIL. Therefore, please follow this step if you are using one of these Linux distributions.
Install with either of the following commands:
easy_install PIL
sudo pip install PIL
Resizing images
In this recipe, we will load a sample image of Lena, which is available in the SciPy distribution, into an array. This article is not about image manipulation, by the way; we will just use the image data as an input.
Lena Soderberg appeared in a 1972 Playboy magazine. For historical reasons, one of those images is often used in the field of image processing. Don't worry; the picture in question is completely safe for work.
We will resize the image using the repeat function. This function repeats each element of an array a given number of times along an axis, which in practice means enlarging the image by an integer factor in that direction.
Getting ready
A prerequisite for this recipe is to have SciPy, Matplotlib, and PIL installed.
How to do it...

Load the Lena image into an array.
SciPy has a lena function, which can load the image into a NumPy array:
lena = scipy.misc.lena()
Some refactoring has occurred since version 0.10, so if you are using an older version, the correct code is:
lena = scipy.lena()

Check the shape.
Check the shape of the Lena array using the assert_equal function from the numpy.testing package (this is an optional sanity check):
numpy.testing.assert_equal((LENA_X, LENA_Y), lena.shape)

Resize the Lena array.
Resize the Lena array with the repeat function. We give this function a resize factor in the x and y direction:
resized = lena.repeat(yfactor, axis=0).repeat(xfactor, axis=1)

Plot the arrays.
We will plot the Lena image and the resized image in two subplots that are a part of the same grid. Plot the Lena array in a subplot:
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena)
The Matplotlib subplot function creates a subplot. This function accepts a 3-digit integer as the parameter, where the first digit is the number of rows, the second digit is the number of columns, and the last digit is the index of the subplot starting with 1. The imshow function shows images. Finally, the show function displays the end result.
Plot the resized array in another subplot and display it. The index is now 2:
matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(resized)
matplotlib.pyplot.show()
The following screenshot is the result with the original image (first) and the resized image (second):
The following is the complete code for this recipe:
import scipy.misc
import sys
import matplotlib.pyplot
import numpy.testing

# This script resizes the Lena image from SciPy.
if len(sys.argv) != 3:
    print("Usage: python %s yfactor xfactor" % sys.argv[0])
    sys.exit()

# Load the Lena image into an array
lena = scipy.misc.lena()

# Lena's dimensions
LENA_X = 512
LENA_Y = 512

# Check the shape of the Lena array
numpy.testing.assert_equal((LENA_X, LENA_Y), lena.shape)

# Get the resize factors (repeat counts must be integers)
yfactor = int(sys.argv[1])
xfactor = int(sys.argv[2])

# Resize the Lena array
resized = lena.repeat(yfactor, axis=0).repeat(xfactor, axis=1)

# Check the shape of the resized array
numpy.testing.assert_equal((yfactor * LENA_X, xfactor * LENA_Y), resized.shape)

# Plot the Lena array
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena)

# Plot the resized array
matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(resized)
matplotlib.pyplot.show()
How it works...
The repeat function repeats arrays, which, in this case, resulted in changing the size of the original image. The Matplotlib subplot function creates a subplot. The imshow function shows images. Finally, the show function displays the end result.
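To see what repeat does to the data itself, here is a minimal sketch that uses a tiny 2 by 2 array as a stand-in for the image. The array contents and the names image, yfactor, and xfactor are illustrative assumptions, not part of the recipe:

```python
import numpy

# A tiny 2x2 array stands in for the image (hypothetical data).
image = numpy.array([[1, 2],
                     [3, 4]])

# Repeat every row twice (axis 0), then every column three times (axis 1).
yfactor, xfactor = 2, 3
resized = image.repeat(yfactor, axis=0).repeat(xfactor, axis=1)

print(resized.shape)  # (4, 6)
print(resized)
```

Each original value becomes a 2 by 3 block of identical values, which is why the image appears enlarged rather than smoothed.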
See also

The Installing SciPy recipe

The Installing PIL recipe
Creating views and copies
It is important to know when we are dealing with a shared array view, and when we have a copy of the array data. A slice, for instance, will create a view. This means that if you assign the slice to a variable and then change the underlying array, the value of this variable will change. We will create an array from the famous Lena image, copy the array, create a view, and, at the end, modify the view.
Getting ready
The prerequisites are the same as in the previous recipe.
How to do it...
Let's create a copy and views of the Lena array:

Create a copy of the Lena array:
acopy = lena.copy()

Create a view of the array:
aview = lena.view()

Set all the values of the view to 0 with a flat iterator:
aview.flat = 0
The end result is that only one of the images shows the Playboy model. The other ones get censored completely:
The following is the code of this tutorial showing the behavior of array views and copies:
import scipy.misc
import matplotlib.pyplot

lena = scipy.misc.lena()
acopy = lena.copy()
aview = lena.view()

# Plot the Lena array
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.imshow(lena)

# Plot the copy
matplotlib.pyplot.subplot(222)
matplotlib.pyplot.imshow(acopy)

# Plot the view
matplotlib.pyplot.subplot(223)
matplotlib.pyplot.imshow(aview)

# Plot the view after changes
aview.flat = 0
matplotlib.pyplot.subplot(224)
matplotlib.pyplot.imshow(aview)
matplotlib.pyplot.show()
How it works...
As you can see, by changing the view at the end of the program, we changed the original Lena array. This resulted in three blue images (or black, if you are looking at a black and white image); only the copied array was unaffected. It is important to remember that views are not read-only.
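The difference between a view and a copy can also be checked without plotting. The following is a minimal sketch on a small synthetic array (the names original and aslice are assumptions for illustration); it relies on numpy.shares_memory to test whether two arrays use the same underlying buffer:

```python
import numpy

# Small synthetic array in place of the image (hypothetical data).
original = numpy.arange(6).reshape(2, 3)
acopy = original.copy()     # independent data
aview = original.view()     # shares data with original
aslice = original[:, :2]    # a slice is a view as well

print(numpy.shares_memory(original, aview))   # True
print(numpy.shares_memory(original, aslice))  # True
print(numpy.shares_memory(original, acopy))   # False

# Writing through the view changes the original and the slice,
# but leaves the copy untouched.
aview.flat = 0
print(original)
print(acopy)
```

After the assignment, original and aslice contain only zeros, while acopy still holds the values 0 through 5.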
Flipping Lena
We will be flipping the SciPy Lena image—all in the name of science, of course, or at least as a demo. In addition to flipping the image, we will slice it and apply a mask to it.
How to do it...
The steps to follow are listed below:

Plot the flipped image.
Flip the Lena array around the vertical axis using the following code:
matplotlib.pyplot.imshow(lena[:,::-1])

Plot a slice of the image.
Take a slice out of the image and plot it. In this step, we will have a look at the shape of the Lena array. The shape is a tuple representing the dimensions of the array. The following code effectively selects the upper-left quadrant of the Playboy picture. Integer division (//) keeps the slice bounds integral:
matplotlib.pyplot.imshow(lena[:lena.shape[0]//2, :lena.shape[1]//2])

Apply a mask to the image.
Apply a mask to the image by finding all the values in the Lena array that are even (this is just arbitrary for demo purposes). Copy the array and change the even values to 0. This has the effect of putting lots of blue dots (dark spots if you are looking at a black and white image) on the image:
mask = lena % 2 == 0
masked_lena = lena.copy()
masked_lena[mask] = 0
All these efforts result in a 2 by 2 image grid, as shown in the following screenshot:
The following is the complete code for this recipe:
import scipy.misc
import matplotlib.pyplot

# Load the Lena array
lena = scipy.misc.lena()

# Plot the Lena array
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.imshow(lena)

# Plot the flipped array
matplotlib.pyplot.subplot(222)
matplotlib.pyplot.imshow(lena[:,::-1])

# Plot a slice array
matplotlib.pyplot.subplot(223)
matplotlib.pyplot.imshow(lena[:lena.shape[0]//2, :lena.shape[1]//2])

# Apply a mask
mask = lena % 2 == 0
masked_lena = lena.copy()
masked_lena[mask] = 0
matplotlib.pyplot.subplot(224)
matplotlib.pyplot.imshow(masked_lena)
matplotlib.pyplot.show()
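The three operations in this recipe are easier to follow on an array small enough to print. This is a minimal sketch with a 4 by 4 synthetic array standing in for the image; the variable names are illustrative assumptions:

```python
import numpy

# 4x4 synthetic array in place of the image (hypothetical data).
image = numpy.arange(16).reshape(4, 4)

# Flip around the vertical axis: a step of -1 reverses the columns.
flipped = image[:, ::-1]

# Upper-left quadrant: half of the rows and half of the columns.
quadrant = image[:image.shape[0] // 2, :image.shape[1] // 2]

# Boolean mask of the even values; zero them out in a copy.
mask = image % 2 == 0
masked = image.copy()
masked[mask] = 0

print(flipped[0])   # [3 2 1 0]
print(quadrant)
print(masked[0])    # [0 1 0 3]
```

Note that flipped and quadrant are views of image, whereas masked is a copy, so only the copy is modified by the mask assignment.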
See also

The Installing SciPy recipe

The Installing PIL recipe
Fancy indexing
In this tutorial, we will apply fancy indexing to set the diagonal values of the Lena image to 0. This will draw black lines along the diagonals, crossing the image out, not because there is something wrong with the image, but just as an exercise. Indexing with plain integers or slices is normal indexing; fancy indexing is indexing with anything else, such as lists or arrays of indices.
How to do it...
We will start with the first diagonal:

Set the values of the first diagonal to 0.
To set the diagonal values to 0, we need to define two different ranges for the x and y values:
lena[range(xmax), range(ymax)] = 0

Set the values of the other diagonal to 0.
To set the values of the other diagonal, we require a different set of ranges, but the principles stay the same:
lena[range(xmax-1,-1,-1), range(ymax)] = 0
At the end, we get this image with the diagonals crossed off, as shown in the following screenshot:
The following is the complete code for this recipe:
import scipy.misc
import matplotlib.pyplot

# This script demonstrates fancy indexing by setting values
# on the diagonals to 0.

# Load the Lena array
lena = scipy.misc.lena()
xmax = lena.shape[0]
ymax = lena.shape[1]

# Fancy indexing
# Set values on diagonal to 0
# x 0-xmax
# y 0-ymax
lena[range(xmax), range(ymax)] = 0

# Set values on other diagonal to 0
# x xmax-0
# y 0-ymax
lena[range(xmax-1,-1,-1), range(ymax)] = 0

# Plot Lena with diagonal lines set to 0
matplotlib.pyplot.imshow(lena)
matplotlib.pyplot.show()
How it works...
We defined separate ranges for the x values and y values. These ranges were used to index the Lena array. Fancy indexing is performed based on an internal NumPy iterator object. The following three steps are performed:

The iterator object is created.

The iterator object gets bound to the array.

Array elements are accessed via the iterator.
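The pairing of the two index sequences can be seen on a small array. The sketch below is an illustration under stated assumptions: it uses a 4 by 4 array of ones instead of the image, and numpy.arange in place of range, which selects the same elements:

```python
import numpy

# 4x4 array of ones in place of the image (hypothetical data).
grid = numpy.ones((4, 4), dtype=int)
n = grid.shape[0]

rows = numpy.arange(n)               # 0, 1, 2, 3
cols = numpy.arange(n)               # 0, 1, 2, 3
reversed_rows = rows[::-1]           # 3, 2, 1, 0

# The index arrays are paired element by element:
# (0,0), (1,1), (2,2), (3,3) form the main diagonal.
grid[rows, cols] = 0

# (3,0), (2,1), (1,2), (0,3) form the other diagonal.
grid[reversed_rows, cols] = 0

print(grid)
```

The result has zeros on both diagonals and ones everywhere else, which is the small-scale equivalent of the crossed-out image.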
Indexing with a list of locations
Let's use the ix_ function to shuffle the Lena image. This function creates a mesh from multiple sequences.
How to do it...
We will start by randomly shuffling the array indices:

Shuffle array indices.
Create a random indices array with the shuffle function of the numpy.random module:
def shuffle_indices(size):
    arr = numpy.arange(size)
    numpy.random.shuffle(arr)
    return arr

Plot the shuffled indices:
matplotlib.pyplot.imshow(lena[numpy.ix_(xindices, yindices)])
What we get is a completely scrambled Lena image, as shown in the following screenshot:
The following is the complete code for the recipe:
import scipy.misc
import matplotlib.pyplot
import numpy.random
import numpy.testing

# Load the Lena array
lena = scipy.misc.lena()
xmax = lena.shape[0]
ymax = lena.shape[1]

def shuffle_indices(size):
    arr = numpy.arange(size)
    numpy.random.shuffle(arr)
    return arr

xindices = shuffle_indices(xmax)
numpy.testing.assert_equal(len(xindices), xmax)
yindices = shuffle_indices(ymax)
numpy.testing.assert_equal(len(yindices), ymax)

# Plot Lena
matplotlib.pyplot.imshow(lena[numpy.ix_(xindices, yindices)])
matplotlib.pyplot.show()
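To make the mesh that ix_ builds more concrete, here is a minimal sketch with fixed (not random) index orders on a 3 by 4 array. The array and the chosen permutations are assumptions for illustration only:

```python
import numpy

# 3x4 synthetic array in place of the image (hypothetical data).
image = numpy.arange(12).reshape(3, 4)

# Fixed permutations of the row and column indices.
rows = numpy.array([2, 0, 1])
cols = numpy.array([3, 1, 0, 2])

# ix_ reshapes the sequences so that they broadcast into a full mesh:
# one index array of shape (3, 1) and one of shape (1, 4).
row_idx, col_idx = numpy.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)  # (3, 1) (1, 4)

# Element [i, j] of the result is image[rows[i], cols[j]].
shuffled = image[row_idx, col_idx]
print(shuffled)
```

Every row and column of the original appears exactly once in the result, just in a different order, which is why the full-size image looks scrambled but loses no pixels.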
Indexing with booleans
Boolean indexing is indexing based on a boolean array and falls in the category of fancy indexing.
How to do it...
We will apply this indexing technique to an image:

Image with dots on the diagonal.
This is in some way similar to the Fancy indexing recipe, in this article. This time we select modulo 4 points on the diagonal of the image:
def get_indices(size):
    arr = numpy.arange(size)
    return arr % 4 == 0
Then we just apply this selection and plot the points:
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
lena1[xindices, yindices] = 0
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)

Set to 0 based on value.
Select array values between a quarter and three-quarters of the maximum value and set them to 0 in a copy of the array:
lena2 = lena.copy()
lena2[(lena > lena.max()/4) & (lena < 3 * lena.max()/4)] = 0
The plot with the two new images will look like the following screenshot:
The following is the complete code for this recipe:
import scipy.misc
import matplotlib.pyplot
import numpy

# Load the Lena array
lena = scipy.misc.lena()

def get_indices(size):
    arr = numpy.arange(size)
    return arr % 4 == 0

# Plot Lena
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
lena1[xindices, yindices] = 0
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)

lena2 = lena.copy()
# Between quarter and 3 quarters of the max value
lena2[(lena > lena.max()/4) & (lena < 3 * lena.max()/4)] = 0
matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(lena2)
matplotlib.pyplot.show()
How it works...
Because boolean indexing is a form of fancy indexing, the way it works is basically the same. This means that indexing happens with the help of a special iterator object.
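Both uses of boolean indexing in this recipe can be reproduced on a small array. The following sketch assumes a 4 by 4 array holding the values 1 to 16 in place of the image, and selects every second index instead of every fourth so that the effect is visible at this size:

```python
import numpy

# 4x4 synthetic array with values 1..16 (hypothetical data).
image = numpy.arange(1, 17).reshape(4, 4)

# One boolean vector per axis: True at indices 0 and 2.
picks = numpy.arange(4) % 2 == 0

# Two boolean vectors are paired like integer index arrays,
# so only the diagonal points (0,0) and (2,2) are selected.
dotted = image.copy()
dotted[picks, picks] = 0

# A boolean array with the same shape as the data selects by value:
# everything strictly between 1/4 and 3/4 of the maximum is zeroed.
banded = image.copy()
banded[(image > image.max() / 4) & (image < 3 * image.max() / 4)] = 0

print(dotted)
print(banded)
```

In dotted only two elements change, while in banded the values 5 through 11 are replaced by 0, regardless of where they sit in the array.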
See also

The Fancy Indexing recipe
Stride tricks for Sudoku
The ndarray class has a strides field, which is a tuple indicating the number of bytes to step in each dimension when going through an array. Let's apply some stride tricks to the problem of splitting a Sudoku puzzle into the 3 by 3 squares of which it is composed.
For more information see http://en.wikipedia.org/wiki/Sudoku.
How to do it...

Define the Sudoku puzzle array
Let's define the Sudoku puzzle array. This one is filled with the contents of an actual, solved Sudoku puzzle:
sudoku = numpy.array([
    [2, 8, 7, 1, 6, 5, 9, 4, 3],
    [9, 5, 4, 7, 3, 2, 1, 6, 8],
    [6, 1, 3, 8, 4, 9, 7, 5, 2],
    [8, 7, 9, 6, 5, 1, 2, 3, 4],
    [4, 2, 1, 3, 9, 8, 6, 7, 5],
    [3, 6, 5, 4, 2, 7, 8, 9, 1],
    [1, 9, 8, 5, 7, 3, 4, 2, 6],
    [5, 4, 2, 9, 1, 6, 3, 8, 7],
    [7, 3, 6, 2, 8, 4, 5, 1, 9]
    ])

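Calculate the strides.
The itemsize field of ndarray gives us the number of bytes used by each array element. Counted in elements, moving to the next row of squares skips 27 items, moving to the next square within a row skips 3, moving to the next row inside a square skips 9, and moving to the next item skips 1. Using the itemsize, calculate the strides in bytes: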
strides = sudoku.itemsize * numpy.array([27, 3, 9, 1])

Split into squares.
Now we can split the puzzle into squares with the as_strided function of the numpy.lib.stride_tricks module:
shape = (3, 3, 3, 3)
squares = numpy.lib.stride_tricks.as_strided(sudoku, shape=shape, strides=strides)
print(squares)
This prints separate Sudoku squares:
[[[[2 8 7]
   [9 5 4]
   [6 1 3]]

  [[1 6 5]
   [7 3 2]
   [8 4 9]]

  [[9 4 3]
   [1 6 8]
   [7 5 2]]]


 [[[8 7 9]
   [4 2 1]
   [3 6 5]]

  [[6 5 1]
   [3 9 8]
   [4 2 7]]

  [[2 3 4]
   [6 7 5]
   [8 9 1]]]


 [[[1 9 8]
   [5 4 2]
   [7 3 6]]

  [[5 7 3]
   [9 1 6]
   [2 8 4]]

  [[4 2 6]
   [3 8 7]
   [5 1 9]]]]
The following is the complete source code for this recipe:
import numpy

sudoku = numpy.array([
    [2, 8, 7, 1, 6, 5, 9, 4, 3],
    [9, 5, 4, 7, 3, 2, 1, 6, 8],
    [6, 1, 3, 8, 4, 9, 7, 5, 2],
    [8, 7, 9, 6, 5, 1, 2, 3, 4],
    [4, 2, 1, 3, 9, 8, 6, 7, 5],
    [3, 6, 5, 4, 2, 7, 8, 9, 1],
    [1, 9, 8, 5, 7, 3, 4, 2, 6],
    [5, 4, 2, 9, 1, 6, 3, 8, 7],
    [7, 3, 6, 2, 8, 4, 5, 1, 9]
    ])

shape = (3, 3, 3, 3)
strides = sudoku.itemsize * numpy.array([27, 3, 9, 1])
squares = numpy.lib.stride_tricks.as_strided(sudoku, shape=shape, strides=strides)
print(squares)
How it works...
We applied stride tricks to decompose a Sudoku puzzle into its constituent 3 by 3 squares. The strides tell us how many bytes we need to skip at each step when going through the Sudoku array.
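One way to convince yourself that the strides (27, 3, 9, 1) are right is to apply them to a 9 by 9 array whose values equal their own flat positions, and to compare the result with a reshape-based split. This is a sketch under stated assumptions: the board array and the reshape/swapaxes comparison are not part of the recipe and are only used here as a cross-check:

```python
import numpy
from numpy.lib.stride_tricks import as_strided

# 9x9 board whose values are their own flat positions (hypothetical data).
board = numpy.arange(81).reshape(9, 9)

# A contiguous 9x9 array steps 9 items per row and 1 item per column.
print(board.strides == (9 * board.itemsize, board.itemsize))  # True

# Strides, in items, for (square row, square column, row, column).
strides = board.itemsize * numpy.array([27, 3, 9, 1])
squares = as_strided(board, shape=(3, 3, 3, 3), strides=tuple(strides))

# Cross-check: split both axes into (3, 3) and reorder the axes.
alternative = board.reshape(3, 3, 3, 3).swapaxes(1, 2)
print(numpy.array_equal(squares, alternative))  # True

# The square in the first row of squares, second column.
print(squares[0, 1])
```

Because as_strided does not check bounds, wrong strides can silently read memory outside the array, so a cross-check like the one above is a cheap safeguard when experimenting.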
Broadcasting arrays
Without knowing it, you might have broadcast arrays. In a nutshell, NumPy tries to perform an operation even though the operands do not have the same shape. In this recipe, we will multiply an array and a scalar. The scalar is "extended" to the shape of the array operand and then the multiplication is performed. We will download an audio file and make a new version that is quieter.
How to do it...
Let's start by reading a WAV file:

Reading a WAV file.
We will use standard Python code to download an audio file of Austin Powers called "Smashing, baby". SciPy has a wavfile module, which allows you to load sound data or generate WAV files. If SciPy is installed, then we should have this module already. The read function returns a data array and sample rate. In this example, we only care about the data:
sample_rate, data = scipy.io.wavfile.read(WAV_FILE)

Plot the original WAV data.
Plot the original WAV data with Matplotlib. Give the subplot the title Original.
matplotlib.pyplot.subplot(2, 1, 1)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(data)

Create a new array.
Now we will use NumPy to make a quieter audio sample. It's just a matter of creating a new array with smaller values by multiplying with a constant. This is where the magic of broadcasting occurs. At the end, we need to make sure that we have the same data type as in the original array, because of the WAV format:
newdata = data * 0.2
newdata = newdata.astype(numpy.uint8)

Write to a WAV file.
This new array can be written into a new WAV file as follows:
scipy.io.wavfile.write("quiet.wav", sample_rate, newdata)

Plot the new WAV data.
Plot the new data array with Matplotlib:
matplotlib.pyplot.subplot(2, 1, 2)
matplotlib.pyplot.title("Quiet")
matplotlib.pyplot.plot(newdata)
matplotlib.pyplot.show()
The result is a plot of the original WAV file data and a new array with smaller values, as shown in the following screenshot:
The following is the complete code for this recipe:
import scipy.io.wavfile
import matplotlib.pyplot
import urllib2
import numpy

response = urllib2.urlopen('http://www.thesoundarchive.com/austinpowers/smashingbaby.wav')
print(response.info())
WAV_FILE = 'smashingbaby.wav'
# Open in binary mode, because WAV data is not text
filehandle = open(WAV_FILE, 'wb')
filehandle.write(response.read())
filehandle.close()

sample_rate, data = scipy.io.wavfile.read(WAV_FILE)
print("Data type %s Shape %s" % (data.dtype, data.shape))

matplotlib.pyplot.subplot(2, 1, 1)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(data)

newdata = data * 0.2
newdata = newdata.astype(numpy.uint8)
print("Data type %s Shape %s" % (newdata.dtype, newdata.shape))

scipy.io.wavfile.write("quiet.wav", sample_rate, newdata)

matplotlib.pyplot.subplot(2, 1, 2)
matplotlib.pyplot.title("Quiet")
matplotlib.pyplot.plot(newdata)
matplotlib.pyplot.show()
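The broadcasting step can be tried without downloading anything. The sketch below uses a handful of made-up sample values in place of the WAV data, and adds a second, hypothetical example (a two-channel array scaled by one gain per channel) to show that broadcasting is not limited to scalars:

```python
import numpy

# A few made-up 8-bit samples in place of the WAV data (hypothetical).
samples = numpy.array([0, 50, 100, 200], dtype=numpy.uint8)

# The scalar is broadcast over the array; the result is floating point.
scaled = samples * 0.2
print(scaled.dtype)  # float64

# Cast back to the original data type before writing a WAV file.
quieter = scaled.astype(samples.dtype)
print(quieter)

# Broadcasting also aligns arrays of different shapes:
# shape (3, 2) times shape (2,) applies one gain per column.
stereo = numpy.array([[10, 20],
                      [30, 40],
                      [50, 60]])
gains = numpy.array([0.5, 2.0])
balanced = stereo * gains
print(balanced.shape)  # (3, 2)
print(balanced)
```

The cast back to an integer type truncates any fractional part, so very small sample values may collapse to the same integer after scaling.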
Summary
NumPy has very efficient arrays that are easy to use due to their powerful indexing mechanism. In this article, we demonstrated advanced indexing tricks using images: views and copies, slicing, fancy and boolean indexing, stride tricks, and broadcasting.
About the Author:
Ivan Idris
Ivan Idris has an MSc in Experimental Physics. His graduation thesis had a strong emphasis on Applied Computer Science. After graduating, he worked for several companies as a Java Developer, Data warehouse Developer, and QA Analyst. His main professional interests are Business Intelligence, Big Data, and Cloud Computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. Ivan Idris is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook by Packt Publishing. You can find more information and a blog with a few NumPy examples at ivanidris.net.