# Advanced Indexing and Array Concepts


Over 70 interesting recipes for learning the Python open source mathematical library NumPy with this book and ebook.

by Ivan Idris | December 2012 | Cookbooks Open Source

In this article by Ivan Idris, the author of NumPy Cookbook, we will learn some of NumPy's more advanced and tricky indexing techniques. NumPy is famous for its efficient arrays, and this fame is partly due to the ease of indexing them. We will demonstrate advanced indexing tricks using images.

• Installing SciPy

• Installing PIL

• Resizing images

• Comparing views and copies

• Flipping Lena

• Fancy indexing

• Indexing with a list of locations

• Indexing with booleans

• Stride tricks for Sudoku

Before diving into indexing, we will install the necessary software: SciPy and PIL. Some of the examples in this article involve manipulating images, for which we require the Python Imaging Library (PIL). Don't worry: instructions and pointers to help you install PIL and other necessary Python software are given throughout the article.


# Installing SciPy

SciPy is the scientific Python library and is closely related to NumPy. In fact, SciPy and NumPy used to be one and the same project many years ago. In this recipe, we will install SciPy.

## How to do it...

In this recipe, we will go through the steps for installing SciPy.

• Installing from source: If you have Git installed, you can clone the SciPy repository using the following command:

`git clone https://github.com/scipy/scipy.git`

Then build and install from inside the cloned directory:

`python setup.py build`

`python setup.py install --user`

This installs to your home directory and requires Python 2.6 or higher.

Before building, you will also need to install the following packages on which SciPy depends:

• BLAS and LAPACK libraries

• C and Fortran compilers

There is a chance that you have already installed this software as a part of the NumPy installation.

• Installing SciPy on Linux: Most Linux distributions have SciPy packages. We will go through the necessary steps for some of the popular Linux distributions:

• In order to install SciPy on Red Hat, Fedora, and CentOS, run the following instructions from the command line:

`yum install python-scipy`
• In order to install SciPy on Mandriva, run the following command line instruction:

`urpmi python-scipy`
• In order to install SciPy on Gentoo, run the following command line instruction:

`sudo emerge scipy`
• On Debian or Ubuntu, we need to type the following:

`sudo apt-get install python-scipy`
• Installing SciPy on Mac OS X: Apple Developer Tools (XCode) is required, because it contains the BLAS and LAPACK libraries. It can be found either in the App Store, or in the installation DVD that came with your Mac, or you can get the latest version from Apple Developer's connection at https://developer.apple.com/technologies/tools/. Make sure that everything, including all the optional packages is installed.

You probably already have a Fortran compiler installed for NumPy. The binaries for gfortran can be found at http://r.research.att.com/tools/.

• Installing SciPy using easy_install or pip: Install with either of the following two commands:

`sudo pip install scipy`

`easy_install scipy`
• Installing on Windows: If you have Python installed already, the preferred method is to download and use the binary distribution. Alternatively, you may want to install the Enthought Python distribution, which comes with other scientific Python software packages.

• Check your installation: Check the SciPy installation with the following code:

```
import scipy
print(scipy.__version__)
print(scipy.__file__)
```

This should print the correct SciPy version.

## How it works...

Most package managers will take care of any dependencies for you. However, in some cases, you will need to install them manually. Unfortunately, this is beyond the scope of this book. If you run into problems, you can ask for help at:

# Installing PIL

PIL, the Python Imaging Library, is a prerequisite for the image processing recipes in this article.

## How to do it...

Let's see how to install PIL.

• Installing PIL on Windows: Install using the Windows executable from the PIL website http://www.pythonware.com/products/pil/.

• Installing on Debian or Ubuntu: On Debian or Ubuntu, install PIL using the following command:

`sudo apt-get install python-imaging`
• Installing with easy_install or pip: At the time of writing this book, it appeared that the package managers of Red Hat, Fedora, and CentOS did not have direct support for PIL. Therefore, please follow this step if you are using one of these Linux distributions.

Install with either of the following commands:

`easy_install PIL`

`sudo pip install PIL`

# Resizing images

In this recipe, we will load a sample image of Lena, which is available in the SciPy distribution, into an array. This article is not about image manipulation, by the way; we will just use the image data as an input.

Lena Soderberg appeared in a 1972 Playboy magazine. For historical reasons, one of those images is often used in the field of image processing. Don't worry; the picture in question is completely safe for work.

We will resize the image using the repeat function. This function repeats an array, which in practice means resizing the image by a certain factor.

A prerequisite for this recipe is to have SciPy, Matplotlib, and PIL installed.

## How to do it...

1. Load the Lena image into an array.

SciPy has a lena function, which can load the image into a NumPy array:

`lena = scipy.misc.lena()`

Some refactoring has occurred since version 0.10, so if you are using an older version, the correct code is:

`lena = scipy.lena()`
2. Check the shape.

Check the shape of the Lena array using the assert_equal function from the numpy.testing package—this is an optional sanity check test:

`numpy.testing.assert_equal((LENA_X, LENA_Y), lena.shape)`
3. Resize the Lena array.

Resize the Lena array with the repeat function. We give this function a resize factor in the x and y direction:

`resized = lena.repeat(yfactor, axis=0).repeat(xfactor, axis=1)`
4. Plot the arrays.

We will plot the Lena image and the resized image in two subplots that are a part of the same grid. Plot the Lena array in a subplot:

```
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena)```

The Matplotlib subplot function creates a subplot. This function accepts a 3-digit integer as the parameter, where the first digit is the number of rows, the second digit is the number of columns, and the last digit is the index of the subplot starting with 1. The imshow function shows images. Finally, the show function displays the end result.

Plot the resized array in another subplot and display it. The index is now 2:

```matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(resized)
matplotlib.pyplot.show()```

The following screenshot is the result with the original image (first) and the resized image (second):

The following is the complete code for this recipe:

```
import scipy.misc
import sys
import matplotlib.pyplot
import numpy.testing

# This script resizes the Lena image from SciPy.
if len(sys.argv) != 3:
    print("Usage python %s yfactor xfactor" % (sys.argv[0]))
    sys.exit()

# Load the Lena image into an array
lena = scipy.misc.lena()

# Lena's dimensions
LENA_X = 512
LENA_Y = 512

# Check the shape of the Lena array
numpy.testing.assert_equal((LENA_X, LENA_Y), lena.shape)

# Get the resize factors (repeat needs integer repeat counts)
yfactor = int(sys.argv[1])
xfactor = int(sys.argv[2])

# Resize the Lena array
resized = lena.repeat(yfactor, axis=0).repeat(xfactor, axis=1)

# Check the shape of the resized array
numpy.testing.assert_equal((yfactor * LENA_X, xfactor * LENA_Y),
                           resized.shape)

# Plot the Lena array
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena)

# Plot the resized array
matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(resized)
matplotlib.pyplot.show()
```

## How it works...

The repeat function repeats arrays, which, in this case, resulted in changing the size of the original image. The Matplotlib subplot function creates a subplot. The imshow function shows images. Finally, the show function displays the end result.
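To see what repeat does without loading an image, here is a minimal sketch on a tiny array. The 2 by 2 array and the factor values are made-up stand-ins for the image and the command-line arguments, not part of the recipe:

```python
import numpy

# A tiny 2x2 stand-in for the image (hypothetical data)
image = numpy.array([[1, 2],
                     [3, 4]])

# Integer resize factors (assumed values for illustration)
yfactor, xfactor = 2, 3

# Repeat every row yfactor times, then every column xfactor times
resized = image.repeat(yfactor, axis=0).repeat(xfactor, axis=1)

print(resized.shape)
print(resized)
```

Each pixel becomes a block of yfactor by xfactor identical values, so the array grows by exactly those factors along each axis.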

## See also

• The Installing SciPy recipe

• The Installing PIL recipe

 Over 70 interesting recipes for learning the Python open source mathematical library NumPy with this book and ebook.
Published: October 2012
eBook Price: \$26.99
Book Price: \$44.99

# Creating views and copies

It is important to know when we are dealing with a shared array view, and when we have a copy of the array data. A slice, for instance, will create a view. This means that if you assign the slice to a variable and then change the underlying array, the value of this variable will change. We will create an array from the famous Lena image, copy the array, create a view, and, at the end, modify the view.

The prerequisites are the same as in the previous recipe.

## How to do it...

Let's create a copy and views of the Lena array:

1. Create a copy of the Lena array:

`acopy = lena.copy()`
2. Create a view of the array:

`aview = lena.view()`
3. Set all the values of the view to 0 with a flat iterator:

`aview.flat = 0`

The end result is that only one of the images shows the Playboy model. The other ones get censored completely:

The following is the code of this tutorial showing the behavior of array views and copies:

```import scipy.misc
import matplotlib.pyplot
lena = scipy.misc.lena()
acopy = lena.copy()
aview = lena.view()
# Plot the Lena array
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.imshow(lena)
#Plot the copy
matplotlib.pyplot.subplot(222)
matplotlib.pyplot.imshow(acopy)
#Plot the view
matplotlib.pyplot.subplot(223)
matplotlib.pyplot.imshow(aview)
# Plot the view after changes
aview.flat = 0
matplotlib.pyplot.subplot(224)
matplotlib.pyplot.imshow(aview)
matplotlib.pyplot.show()
```

## How it works...

As you can see, by changing the view at the end of the program, we changed the original Lena array. This resulted in having three blue (or black if you are looking at a black and white image) images—the copied array was unaffected. It is important to remember that views are not read-only.
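The same behavior can be checked on a small array without plotting anything. This is only a sketch: the array contents and the variable names original and aslice are illustrative assumptions, and numpy.shares_memory is used here to confirm which arrays share a buffer:

```python
import numpy

original = numpy.arange(6).reshape(2, 3)
acopy = original.copy()     # independent data
aview = original.view()     # same data, new array object
aslice = original[:, :2]    # a slice is also a view

# Writing through the view changes the shared buffer
aview.flat = 0

print(original)   # all zeros now
print(aslice)     # all zeros as well
print(acopy)      # still holds the values 0..5

print(numpy.shares_memory(original, aview))
print(numpy.shares_memory(original, acopy))
```

If you need to modify data without touching the source array, take a copy first.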

# Flipping Lena

We will be flipping the SciPy Lena image—all in the name of science, of course, or at least as a demo. In addition to flipping the image, we will slice it and apply a mask to it.

## How to do it...

The steps to follow are listed below:

1. Plot the flipped image.

Flip the Lena array around the vertical axis using the following code:

`matplotlib.pyplot.imshow(lena[:,::-1])`
2. Plot a slice of the image.

Take a slice out of the image and plot it. In this step, we will have a look at the shape of the Lena array. The shape is a tuple representing the dimensions of the array. The following code effectively selects the left-upper quadrant of the Playboy picture.

```
matplotlib.pyplot.imshow(lena[:lena.shape[0]//2,
                              :lena.shape[1]//2])
```
3. Apply a mask to the image.

Apply a mask to the image by finding all the values in the Lena array that are even (this is just arbitrary for demo purposes). Copy the array and change the even values to 0. This has the effect of putting lots of blue dots (dark spots if you are looking at a black and white image) on the image:

```
mask = lena % 2 == 0
masked_lena = lena.copy()
masked_lena[mask] = 0
```

All these efforts result in a 2 by 2 image grid, as shown in the following screenshot:

The following is the complete code for this recipe:

```
import scipy.misc
import matplotlib.pyplot

lena = scipy.misc.lena()

# Plot the Lena array
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.imshow(lena)

# Plot the flipped array
matplotlib.pyplot.subplot(222)
matplotlib.pyplot.imshow(lena[:,::-1])

# Plot a slice of the array
matplotlib.pyplot.subplot(223)
matplotlib.pyplot.imshow(lena[:lena.shape[0]//2, :lena.shape[1]//2])

# Apply a mask: set the even values of a copy to 0
mask = lena % 2 == 0
masked_lena = lena.copy()
masked_lena[mask] = 0
matplotlib.pyplot.subplot(224)
matplotlib.pyplot.imshow(masked_lena)

matplotlib.pyplot.show()
```
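The three operations of this recipe can also be tried on a small array, where the effect is easy to read off. The following is a sketch with a made-up 4 by 4 array in place of the image. It uses integer division so that the slice bounds stay integers:

```python
import numpy

# Hypothetical 4x4 stand-in for the image
image = numpy.arange(16).reshape(4, 4)

# Flip around the vertical axis: reverse the column order
flipped = image[:, ::-1]

# Upper-left quadrant: first half of the rows and columns
quadrant = image[:image.shape[0] // 2, :image.shape[1] // 2]

# Boolean mask of the even values; zero them in a copy
mask = image % 2 == 0
masked = image.copy()
masked[mask] = 0

print(flipped)
print(quadrant)
print(masked)
```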

## See also

• The Installing SciPy recipe

• The Installing PIL recipe

# Fancy indexing

In this tutorial, we will apply fancy indexing to set the diagonal values of the Lena image to 0. This will draw black lines along the diagonals, crossing the image through, not because there is something wrong with the image, but just as an exercise. Fancy indexing is indexing with lists or arrays of indices, as opposed to normal indexing, which uses only integers or slices.

## How to do it...

1. Set the values of the first diagonal to 0.

To set the diagonal values to 0, we need to define two different ranges for the x and y values:

`lena[range(xmax), range(ymax)] = 0`
2. Set the values of the other diagonal to 0.

To set the values of the other diagonal, we require a different set of ranges, but the principles stay the same:

`lena[range(xmax-1,-1,-1), range(ymax)] = 0`

At the end, we get this image with the diagonals crossed off, as shown in the following screenshot:

The following is the complete code for this recipe:

```import scipy.misc
import matplotlib.pyplot
# This script demonstrates fancy indexing by setting values
# on the diagonals to 0.
lena = scipy.misc.lena()
xmax = lena.shape[0]
ymax = lena.shape[1]
# Fancy indexing
# Set values on diagonal to 0
# x 0-xmax
# y 0-ymax
lena[range(xmax), range(ymax)] = 0
# Set values on other diagonal to 0
# x xmax-0
# y 0-ymax
lena[range(xmax-1,-1,-1), range(ymax)] = 0
# Plot Lena with diagonal lines set to 0
matplotlib.pyplot.imshow(lena)
matplotlib.pyplot.show()```

## How it works...

We defined separate ranges for the x values and y values. These ranges were used to index the Lena array. Fancy indexing is performed based on an internal NumPy iterator object. The following three steps are performed:

1. The iterator object is created.

2. The iterator object gets bound to the array.

3. Array elements are accessed via the iterator.
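From the user's side, the effect of fancy indexing is that the index sequences are paired up element by element. A minimal sketch on a made-up 4 by 4 array of ones (the names grid, rows, and cols are illustrative) shows both diagonals being set to 0:

```python
import numpy

# Hypothetical 4x4 array of ones in place of the image
grid = numpy.ones((4, 4), dtype=int)
n = grid.shape[0]

rows = numpy.arange(n)
cols = numpy.arange(n)

# Pairs (0,0), (1,1), (2,2), (3,3): the main diagonal
grid[rows, cols] = 0

# Pairs (3,0), (2,1), (1,2), (0,3): the other diagonal
grid[rows[::-1], cols] = 0

print(grid)
```

Note that the two index sequences must have the same length (or be broadcastable to a common shape), because each row index is matched with one column index.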

# Indexing with a list of locations

Let's use the ix_ function to shuffle the Lena image. This function creates a mesh from multiple sequences.

## How to do it...

We will start by randomly shuffling the array indices:

1. Shuffle array indices.

Create a random indices array with the shuffle function of the numpy.random module:

```
def shuffle_indices(size):
    arr = numpy.arange(size)
    numpy.random.shuffle(arr)
    return arr
```
2. Plot the shuffled indices:

`matplotlib.pyplot.imshow(lena[numpy.ix_(xindices, yindices)])`

What we get is a completely scrambled Lena image, as shown in the following screenshot:

The following is the complete code for the recipe:

```import scipy.misc
import matplotlib.pyplot
import numpy.random
import numpy.testing
lena = scipy.misc.lena()
xmax = lena.shape[0]
ymax = lena.shape[1]
def shuffle_indices(size):
    arr = numpy.arange(size)
    numpy.random.shuffle(arr)
    return arr
xindices = shuffle_indices(xmax)
numpy.testing.assert_equal(len(xindices), xmax)
yindices = shuffle_indices(ymax)
numpy.testing.assert_equal(len(yindices), ymax)
# Plot Lena
matplotlib.pyplot.imshow(lena[numpy.ix_(xindices, yindices)])
matplotlib.pyplot.show()```
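To make the role of ix_ more concrete, here is a small sketch that uses fixed permutations instead of random ones, so the result is reproducible. The 3 by 4 array and the names row_order and col_order are assumptions made for illustration:

```python
import numpy

# Hypothetical 3x4 stand-in for the image
image = numpy.arange(12).reshape(3, 4)

# Fixed "shuffles" of the row and column indices
row_order = numpy.array([2, 0, 1])
col_order = numpy.array([3, 2, 1, 0])

# ix_ returns one index array per axis, shaped so that they
# broadcast against each other into a full 3x4 mesh
mesh = numpy.ix_(row_order, col_order)
print(mesh[0].shape, mesh[1].shape)

shuffled = image[mesh]
print(shuffled)
```

Unlike indexing with two plain lists, which pairs the indices up, the mesh selects every combination of the given rows and columns.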

# Indexing with booleans

Boolean indexing is indexing based on a boolean array and falls into the category of fancy indexing.

## How to do it...

We will apply this indexing technique to an image:

1. Image with dots on the diagonal.

This is in some ways similar to the Fancy indexing recipe in this article. This time, we select the points on the diagonal of the image whose index is a multiple of 4:

```
def get_indices(size):
    arr = numpy.arange(size)
    return arr % 4 == 0
```

Then we just apply this selection and plot the points:

```lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
lena1[xindices, yindices] = 0
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)
```
2. Set to 0 based on value.

Select array values between quarter and three-quarters of the maximum value and set them to 0:

```
lena2 = lena.copy()
lena2[(lena > lena.max()/4) &
      (lena < 3 * lena.max()/4)] = 0
```

The plot with the two new images will look like the following screenshot:

The following is the complete code for this recipe:

```import scipy.misc
import matplotlib.pyplot
import numpy
lena = scipy.misc.lena()
def get_indices(size):
    arr = numpy.arange(size)
    return arr % 4 == 0
# Plot Lena
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
lena1[xindices, yindices] = 0
matplotlib.pyplot.subplot(211)
matplotlib.pyplot.imshow(lena1)
lena2 = lena.copy()
# Between quarter and 3 quarters of the max value
lena2[(lena > lena.max()/4) & (lena < 3 * lena.max()/4)] = 0
matplotlib.pyplot.subplot(212)
matplotlib.pyplot.imshow(lena2)
matplotlib.pyplot.show()
```

## How it works...

Because boolean indexing is a form of fancy indexing, the way it works is basically the same. This means that indexing happens with the help of a special iterator object.
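Both kinds of boolean selection used in this recipe can be tried on a small array. The sketch below uses a made-up 8 by 8 array holding the values 1 to 64; the names sel, dotted, and banded are illustrative:

```python
import numpy

# Hypothetical 8x8 stand-in for the image, values 1..64
image = numpy.arange(1, 65).reshape(8, 8)

# One boolean array per axis: True at indices 0 and 4.
# The True positions are paired up, so this selects the
# diagonal points (0, 0) and (4, 4), not a 2x2 grid.
sel = numpy.arange(8) % 4 == 0
dotted = image.copy()
dotted[sel, sel] = 0

# A boolean array with the shape of the image selects by value
banded = image.copy()
banded[(image > image.max() / 4) & (image < 3 * image.max() / 4)] = 0

print(dotted)
print(banded)
```

The parentheses around each comparison are required, because & binds more tightly than the comparison operators.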

## See also

• The Fancy Indexing recipe

# Stride tricks for Sudoku

The ndarray class has a strides field, which is a tuple indicating the number of bytes to step in each dimension when going through an array. Let's apply some stride tricks to the problem of splitting a Sudoku puzzle to the 3 by 3 squares of which it is composed.

## How to do it...

1. Define the Sudoku puzzle array

Let's define the Sudoku puzzle array. This one is filled with the contents of an actual, solved Sudoku puzzle:

```sudoku = numpy.array([
[2, 8, 7, 1, 6, 5, 9, 4, 3],
[9, 5, 4, 7, 3, 2, 1, 6, 8],
[6, 1, 3, 8, 4, 9, 7, 5, 2],
[8, 7, 9, 6, 5, 1, 2, 3, 4],
[4, 2, 1, 3, 9, 8, 6, 7, 5],
[3, 6, 5, 4, 2, 7, 8, 9, 1],
[1, 9, 8, 5, 7, 3, 4, 2, 6],
[5, 4, 2, 9, 1, 6, 3, 8, 7],
[7, 3, 6, 2, 8, 4, 5, 1, 9]
])
```
2. Calculate the strides.

The itemsize field of ndarray gives us the number of bytes in a single array element. Using the itemsize, calculate the strides:

```
# Three by three grid of squares, each square three by three
shape = (3, 3, 3, 3)
strides = sudoku.itemsize * numpy.array([27, 3, 9, 1])
```
3. Split into squares.

Now we can split the puzzle into squares with the as_strided function of the numpy.lib.stride_tricks module:

```
squares = numpy.lib.stride_tricks.as_strided(
    sudoku, shape=shape, strides=strides)
print(squares)
```

This prints separate Sudoku squares:

```
[[[[2 8 7]
   [9 5 4]
   [6 1 3]]

  [[1 6 5]
   [7 3 2]
   [8 4 9]]

  [[9 4 3]
   [1 6 8]
   [7 5 2]]]


 [[[8 7 9]
   [4 2 1]
   [3 6 5]]

  [[6 5 1]
   [3 9 8]
   [4 2 7]]

  [[2 3 4]
   [6 7 5]
   [8 9 1]]]


 [[[1 9 8]
   [5 4 2]
   [7 3 6]]

  [[5 7 3]
   [9 1 6]
   [2 8 4]]

  [[4 2 6]
   [3 8 7]
   [5 1 9]]]]
```

The following is the complete source code for this recipe:

```import numpy
sudoku = numpy.array([
[2, 8, 7, 1, 6, 5, 9, 4, 3],
[9, 5, 4, 7, 3, 2, 1, 6, 8],
[6, 1, 3, 8, 4, 9, 7, 5, 2],
[8, 7, 9, 6, 5, 1, 2, 3, 4],
[4, 2, 1, 3, 9, 8, 6, 7, 5],
[3, 6, 5, 4, 2, 7, 8, 9, 1],
[1, 9, 8, 5, 7, 3, 4, 2, 6],
[5, 4, 2, 9, 1, 6, 3, 8, 7],
[7, 3, 6, 2, 8, 4, 5, 1, 9]
])
shape = (3, 3, 3, 3)
strides = sudoku.itemsize * numpy.array([27, 3, 9, 1])
squares = numpy.lib.stride_tricks.as_strided(
    sudoku, shape=shape, strides=strides)
print(squares)```

## How it works...

We applied stride tricks to decompose a Sudoku puzzle in its constituent 3 by 3 squares. The strides tell us how many bytes we need to skip at each step when going through the Sudoku array.
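The stride numbers are easier to follow on a smaller grid. The sketch below splits a made-up 4 by 4 array into 2 by 2 blocks, using the same reasoning as for the Sudoku puzzle; the array and the names grid, item, and blocks are assumptions for illustration. The result is compared with an equivalent reshape, which is a safer way to get the same blocks:

```python
import numpy
from numpy.lib.stride_tricks import as_strided

# Hypothetical 4x4 grid, stored row by row
grid = numpy.arange(16).reshape(4, 4)
item = grid.itemsize

# Moving one row down skips 4 elements, one column right skips 1
print(grid.strides == (4 * item, item))

# Strides for the four axes, in elements:
#   next block row    -> 2 rows down  = 8
#   next block column -> 2 columns    = 2
#   next row in block -> 1 row down   = 4
#   next column in block              = 1
blocks = as_strided(grid, shape=(2, 2, 2, 2),
                    strides=(8 * item, 2 * item, 4 * item, item))
print(blocks[0, 1])

# The same decomposition without manual stride arithmetic
same = grid.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(numpy.array_equal(blocks, same))
```

as_strided does not check that the shape and strides stay inside the array's memory, so wrong numbers can read arbitrary data. For the Sudoku case, 27 elements is three rows of nine, and 9 elements is one row.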

# Broadcasting arrays

Without knowing it, you might have broadcast arrays. In a nutshell, NumPy tries to perform an operation even though the operands do not have the same shape. In this recipe, we will multiply an array and a scalar. The scalar is "extended" to the shape of the array operand and then the multiplication is performed. We will download an audio file and make a new version that is quieter.
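Before working with real audio, the idea can be sketched on a handful of made-up sample values. The array contents and the names data, scaled, quiet, row, col, and table are illustrative assumptions. The last lines go slightly beyond the scalar case to show two arrays of different shapes being combined:

```python
import numpy

# Hypothetical 8-bit samples standing in for WAV data
data = numpy.array([0, 50, 100, 200, 250], dtype=numpy.uint8)

# The scalar is broadcast to every element; the result is floating point
scaled = data * 0.2
print(scaled.dtype)

# Convert back to the original type (fractions are dropped)
quiet = scaled.astype(data.dtype)
print(quiet)

# Broadcasting also works between arrays: shape (3,) with shape (2, 1)
row = numpy.array([1, 2, 3])
col = numpy.array([[10], [20]])
table = row + col
print(table)
```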

## How to do it...

Let's start by reading a WAV file:

1. Read a WAV file.

We will use standard Python code to download an audio file of Austin Powers called "Smashing, baby". SciPy has a wavfile module, which allows you to load sound data or generate WAV files. If SciPy is installed, then we should have this module already. The read function returns the sample rate and a data array. In this example, we only care about the data:

`sample_rate, data = scipy.io.wavfile.read(WAV_FILE)`
2. Plot the original WAV data.

Plot the original WAV data with Matplotlib. Give the subplot the title Original.

```matplotlib.pyplot.subplot(2, 1, 1)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(data)```
3. Create a new array.

Now we will use NumPy to make a quieter audio sample. It's just a matter of creating a new array with smaller values by multiplying with a constant. This is where the magic of broadcasting occurs. At the end, we need to make sure that we have the same data type as in the original array, because of the WAV format:

```newdata = data * 0.2
newdata = newdata.astype(numpy.uint8)```
4. Write to a WAV file.

This new array can be written into a new WAV file as follows:

```scipy.io.wavfile.write("quiet.wav",
sample_rate, newdata)```
5. Plot the new WAV data.

Plot the new data array with Matplotlib:

```matplotlib.pyplot.subplot(2, 1, 2)
matplotlib.pyplot.title("Quiet")
matplotlib.pyplot.plot(newdata)
matplotlib.pyplot.show()```

The result is a plot of the original WAV file data and a new array with smaller values, as shown in the following screenshot:

The following is the complete code for this recipe:

```
import scipy.io.wavfile
import matplotlib.pyplot
import numpy
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

response = urlopen('http://www.thesoundarchive.com/austinpowers/smashingbaby.wav')
print(response.info())
WAV_FILE = 'smashingbaby.wav'
# Save the downloaded bytes to disk
filehandle = open(WAV_FILE, 'wb')
filehandle.write(response.read())
filehandle.close()
# Read the samples back as a NumPy array
sample_rate, data = scipy.io.wavfile.read(WAV_FILE)
print("Data type %s Shape %s" % (data.dtype, data.shape))
matplotlib.pyplot.subplot(2, 1, 1)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(data)
newdata = data * 0.2
newdata = newdata.astype(numpy.uint8)
print("Data type %s Shape %s" % (newdata.dtype, newdata.shape))
scipy.io.wavfile.write("quiet.wav",
                       sample_rate, newdata)
matplotlib.pyplot.subplot(2, 1, 2)
matplotlib.pyplot.title("Quiet")
matplotlib.pyplot.plot(newdata)
matplotlib.pyplot.show()
```

# Summary

NumPy has very efficient arrays that are easy to use due to their powerful indexing mechanism. The fame of these arrays is partly due to the ease of indexing. In this article, we demonstrated advanced indexing tricks using images.


## Ivan Idris

Ivan Idris has an MSc in Experimental Physics. His graduation thesis had a strong emphasis on Applied Computer Science. After graduating, he worked for several companies as a Java Developer, Data warehouse Developer, and QA Analyst. His main professional interests are Business Intelligence, Big Data, and Cloud Computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. Ivan Idris is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook by Packt Publishing. You can find more information and a blog with a few NumPy examples at ivanidris.net.