Creating a record data type
A record data type is a heterogeneous data type: think of it as representing a row in a spreadsheet or a database. To give an example of a record data type, we will create a record for a shop inventory. This record contains the name of an item represented by a 40-character string, the number of items in the store represented by a 32-bit integer, and finally, the price of the item represented by a 32-bit float. The following steps show how to create a record data type (see the record.py file in the Chapter02 folder of this book's code bundle):
To create a record, check the following code snippet:
To view the type of the field, check the following code snippet:
If you don't give the array() function a data type, it infers one from the input values, and it will not build a record type for us on its own. To create...
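The following is a minimal sketch of these steps. It is not the code from record.py. It assumes NumPy is imported as np and uses made-up item names and values. The 40-character string field is written as 'U40', which is one of several equivalent ways to spell it.

```python
import numpy as np

# Record type: 40-character name, 32-bit integer count, 32-bit float price
t = np.dtype([('name', 'U40'), ('numitems', np.int32), ('price', np.float32)])

# Look up the type of a single field by its name
print(t['name'])

# Build an array of records; passing dtype=t is what makes each row a record
itemz = np.array([('Widget', 42, 3.14), ('Gadget', 13, 2.72)], dtype=t)
print(itemz[1])          # one whole record
print(itemz['price'])    # one field across all records
```

Fields can be read by name across the whole array, which is what makes a record type feel like a table row.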
One-dimensional slicing and indexing
Slicing of one-dimensional NumPy arrays works just like slicing of Python lists. We can select a piece of an array from index 3 to 7, which extracts the elements 3 through 6 (see the slicing1d.py file in the Chapter02 folder of this book's code bundle), as shown in the following code snippet:
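Here is a small sketch, assuming the array holds the integers 0 to 8 (the contents of slicing1d.py may differ). It shows that the start index is included and the stop index is not:

```python
import numpy as np

a = np.arange(9)      # [0 1 2 3 4 5 6 7 8]
piece = a[3:7]        # index 3 is included, index 7 is excluded
print(piece)          # [3 4 5 6]
```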
We can select elements from index 0 to 7 with a step of two, as shown in the following lines of code:
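As a sketch on the same assumed array of 0 to 8, the third slice component is the step:

```python
import numpy as np

a = np.arange(9)
evens = a[:7:2]   # from index 0 up to (not including) 7, every second element
print(evens)      # [0 2 4 6]
```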
Just as in Python, we can use negative indices and reverse the array, as shown in the following lines of code:
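With the same assumed array, negative indices count from the end, and a step of -1 walks the array backward:

```python
import numpy as np

a = np.arange(9)
last3 = a[-3:]     # the last three elements
rev = a[::-1]      # the whole array in reverse order
print(last3)       # [6 7 8]
print(rev)         # [8 7 6 5 4 3 2 1 0]
```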
Manipulating array shapes
Another recurring task is flattening arrays. Flattening in this context means transforming a multidimensional array into a one-dimensional array. In this example, we will demonstrate a number of ways to manipulate array shapes, starting with flattening:
ravel(): We can accomplish flattening with the ravel() function (see the shapemanipulation.py file in the Chapter02 folder of this book's code bundle), as shown in the following code:
In: b
Out:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In: b.ravel()
Out:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])
flatten(): The appropriately named function flatten() does the same as ravel(), but flatten() always allocates new memory, whereas ravel() might return a view of the array. This means that changes made through the result of ravel() can show up in the original array, while the result of flatten() is always independent of it, as follows:
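The following sketch makes the difference visible. It assumes b is the 2 x 3 x 4 array shown above and uses np.shares_memory() to check whether two arrays share a buffer:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()     # a view here, because b is contiguous in memory
f = b.flatten()   # always a new copy

print(np.shares_memory(b, r))  # True
print(np.shares_memory(b, f))  # False

r[0] = 100        # writes through to b
f[1] = -1         # leaves b untouched
print(b[0, 0, 0], b[0, 0, 1])  # 100 1
```

For a non-contiguous input, such as a transposed array, ravel() has to copy as well. That is why the text says it only *might* return a view.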
Creating views and copies
In the example about the ravel() function, views were mentioned. Views should not be confused with the concept of database views. Views in the NumPy world are not read-only: writing to a view writes to the underlying data. It is important to know when we are dealing with a shared array view and when we have a copy of array data. A slice, for instance, will create a view. This means that if you assign a slice to a variable and then change the underlying array, the value of this variable will change. We will create an array from the famous Lena image, copy the array, create a view, and at the end, modify the view. The Lena image array comes from a SciPy function.
To create a copy of the Lena array, the following line of code is used:
Now, to create a view of the array, use the following line of code:
Set all the values of the view to 0 with a flat iterator, as follows:
The end result is that only one...
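Recent SciPy releases no longer include the Lena image. This sketch therefore uses a small stand-in array called img, which is an assumption and not the book's data. It runs the same copy, view, and modify sequence:

```python
import numpy as np

img = np.arange(1, 17, dtype=np.uint8).reshape(4, 4)  # stand-in for the image

acopy = img.copy()   # independent copy with its own memory
aview = img.view()   # new array object that shares img's memory

aview.flat = 0       # set every element of the view to 0 via the flat iterator

print(img.max())     # 0: the original changed through the view
print(acopy.max())   # 16: the copy is unaffected
```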
Fancy indexing
Fancy indexing is indexing with arrays of integers or Booleans, as opposed to normal indexing, which uses single integers or slices. In this section, we will apply fancy indexing to set the diagonal values of the Lena image to 0. This will draw black lines along the diagonals, crossing the image out. This is not because there is something wrong with the image; it is just an exercise. Perform the following steps for fancy indexing:
Set the values of the first diagonal to 0. To set the diagonal values to 0, we need to define two different ranges for the x and y values as follows:
Now, set the values of the other diagonal to 0. To set the values of the other diagonal, we require a different set of ranges, but the principles stay the same, as follows:
At the end we get the following image with the diagonals crossed out:
The following code for this section is without comments. The complete code for this is in the fancy.py file in the Chapter02 folder of...
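Here is a commented sketch of the two diagonal steps. It uses a hypothetical square stand-in image, img, in place of Lena. The index arrays pair row i with column i for the first diagonal; for the second diagonal, the row indices run backward:

```python
import numpy as np

img = np.full((6, 6), 255, dtype=np.uint8)   # hypothetical square stand-in image
xmax, ymax = img.shape

# First diagonal: element (i, i) for every i
img[np.arange(xmax), np.arange(ymax)] = 0

# Other diagonal: rows run backward while columns run forward
img[np.arange(xmax - 1, -1, -1), np.arange(ymax)] = 0
```

Because both index arrays have the same length, NumPy pairs them element by element rather than forming a grid.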
Indexing with a list of locations
Let's use the ix_() function to shuffle the Lena image. This function creates a mesh from multiple sequences. As arguments, we give one-dimensional sequences, and the function returns a tuple of NumPy arrays. For example, check the following code snippet:
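A short sketch of what ix_() returns, using small example sequences. The shapes are arranged so that the pieces broadcast into a mesh. Indexing with the result selects every combination of the given rows and columns:

```python
import numpy as np

rows, cols = np.ix_([0, 1], [2, 3])
print(rows.shape, cols.shape)    # (2, 1) (1, 2)

a = np.arange(16).reshape(4, 4)
sub = a[np.ix_([0, 1], [2, 3])]  # rows 0-1 crossed with columns 2-3
print(sub)
```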
To index the array with a list of locations, perform the following steps:
Shuffle the array indices. Create a random indices array with the shuffle() function of the numpy.random module, as shown in the following lines of code. Note that the function shuffles the array in place.
Now index the image with the shuffled indices through ix_() and plot the result as follows:
What we get is a completely scrambled Lena, as shown in the following image:
The following code for this section is without comments. The complete code for this can be found in the ix.py file in the...
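The following sketch covers the shuffle steps. It uses a hypothetical stand-in image, and shuffle_indices is a helper named here for illustration. np.random.shuffle() reorders its argument in place and returns None, so the helper returns the shuffled array explicitly:

```python
import numpy as np

img = np.arange(25).reshape(5, 5)     # hypothetical stand-in image
xmax, ymax = img.shape

def shuffle_indices(size):
    """Hypothetical helper: return 0..size-1 in random order."""
    arr = np.arange(size)
    np.random.shuffle(arr)            # shuffles arr in place
    return arr

xindices = shuffle_indices(xmax)
yindices = shuffle_indices(ymax)

# ix_ builds a mesh from the two index lists, so rows and columns both move
scrambled = img[np.ix_(xindices, yindices)]
```

Every pixel survives; only its position changes.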
Indexing arrays with Booleans
Boolean indexing is indexing based on a Boolean array. It falls in the category of fancy indexing, so it works in basically the same way: indexing happens with the help of a special iterator object. Perform the following steps to index an array:
First, we create an image with dots on the diagonal. This is similar in some ways to the Fancy indexing section. This time, we select every fourth point on the diagonal of the image (the points whose index is divisible by four), as shown in the following code snippet:
Then we just apply this selection and plot the points, as shown in the following code snippet:
Select array values between a quarter and three-quarters of the maximum value, and set them to 0, as shown in the...
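A sketch of both Boolean steps on a hypothetical 8 x 8 stand-in image with values 1 to 64; every_fourth is an illustrative helper name. The combined mask uses & with parentheses, because Python's and does not work element-wise on arrays:

```python
import numpy as np

img = np.arange(1, 65, dtype=np.float64).reshape(8, 8)  # hypothetical stand-in

def every_fourth(size):
    """Hypothetical helper: True at indices divisible by four."""
    return np.arange(size) % 4 == 0

# Step 1: black dots on every fourth diagonal point
dots = img.copy()
mask = every_fourth(img.shape[0])
dots[mask, mask] = 0    # two Boolean masks act like the index pairs (0, 0), (4, 4)

# Step 2: zero the values strictly between 1/4 and 3/4 of the maximum
mid = img.copy()
m = img.max()
mid[(img > m / 4) & (img < 3 * m / 4)] = 0
```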
Stride tricks for Sudoku
We can do even more fancy things with NumPy. The ndarray class has a field, strides, which is a tuple indicating the number of bytes to step in each dimension when going through an array. Sudoku is a popular puzzle originally from Japan, although it was known in a similar form in other countries before that. If you don't know Sudoku, it may be better that way, because it is highly addictive. Let's apply some stride tricks to the problem of splitting a Sudoku puzzle into the 3 x 3 squares it is composed of:
First define the Sudoku puzzle array, as shown in the following code snippet. This one is filled with the contents of the actual solved Sudoku puzzle (part of the array is omitted for brevity).
Now calculate the strides. The itemsize field of ndarray gives us the number of bytes in one array element. Using itemsize, we calculate the strides as follows:
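A sketch of the stride calculation, assuming a C-contiguous 9 x 9 grid. The array np.arange(81) stands in for the solved puzzle, which is not reproduced here. The four axes are block row, block column, row within a block, and column within a block:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

grid = np.arange(81).reshape(9, 9)   # hypothetical stand-in for the Sudoku grid

# Steps measured in elements: next block row = 27 cells, next block column = 3,
# next row inside a block = 9, next column inside a block = 1.
# Multiplying by itemsize converts elements to bytes.
strides = tuple(grid.itemsize * n for n in (27, 3, 9, 1))
squares = as_strided(grid, shape=(3, 3, 3, 3), strides=strides)

print(squares[1, 2])   # the middle-right 3 x 3 square
```

as_strided() does no bounds checking, so a wrong shape or stride silently reads the wrong memory. This is why the strides are derived from itemsize rather than hard-coded in bytes.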
Broadcasting arrays
In a nutshell, broadcasting means that NumPy tries to perform an operation even though the operands do not have the same shape. In this section, we will multiply an array and a scalar. The scalar is extended to the shape of the array operand, and then the multiplication is performed. We will download an audio file and make a new version that is quieter:
First, read the WAV file. We will use standard Python code to download an audio file of Austin Powers saying "Smashing, baby". SciPy has a wavfile module that allows you to load sound data or generate WAV files. If SciPy is installed, then we should already have this module. The read() function returns the sample rate and a data array. In this example, we only care about the data.
Plot the original WAV data with Matplotlib. Give the subplot the title Original, as shown in the following lines of code:
Now create a new array. We will use NumPy to...
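This sketch shows the broadcasting step without downloading anything. A synthetic 16-bit sine tone stands in for the WAV data, and its sample rate and amplitude are hypothetical. The book's actual file format may differ, so the cast back should match whatever dtype read() returned:

```python
import numpy as np

rate = 8000                                    # hypothetical sample rate
t = np.arange(rate) / rate                     # one second of time stamps
data = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)  # stand-in audio

# The scalar 0.2 is broadcast to data's shape; the product is float64
newdata = data * 0.2
newdata = newdata.astype(np.int16)             # back to the original sample type
```

The multiplication never builds a full array of 0.2 values; broadcasting applies the scalar to every element directly.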
Summary
We learned a lot in this chapter about the NumPy fundamentals: data types and arrays. Arrays have several attributes describing them. We learned that one of these attributes is the data type which, in NumPy, is represented by a full-fledged object.
NumPy arrays can be sliced and indexed in an efficient manner, just as in the case of Python lists. NumPy arrays have the added ability of working with multiple dimensions.
The shape of an array can be manipulated in many ways, such as stacking, resizing, reshaping, and splitting. A great number of convenience functions for shape manipulation were demonstrated in this chapter.
Having learned about the basics, it's time to move on to data analysis with commonly used functions in Chapter 3, Basic Data Analysis with NumPy. This includes the usage of basic statistical and mathematical functions.