You're reading from Learning NumPy Array

Product type: Book
Published: Jun 2014
ISBN-13: 9781783983902
Pages: 164
Edition: 1st Edition
Language: Python
Concepts: Data Science
Author: Ivan Idris


Chapter 2. NumPy Basics

After installing NumPy and getting some code to work, it's time to cover NumPy basics. This chapter introduces you to the fundamentals of NumPy and arrays. By the end of this chapter, you will have a basic understanding of NumPy arrays and their associated functions.

The topics that we shall cover in this chapter are as follows:

Data types
Array types
Type conversions
Creating arrays
Indexing
Fancy indexing
Slicing
Manipulating shapes

The NumPy array object

NumPy has a multidimensional array object called ndarray. It consists of two parts as follows:

The actual data
Some metadata describing the data

Many array operations, such as reshaping and transposing, leave the raw data untouched; the only aspect that changes is the metadata.

In the previous chapter, we learned how to create an array with the arange() function. That array was one-dimensional and contained a sequence of numbers. The ndarray object, however, can have more than one dimension.
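As a rough illustration of the split between data and metadata (a sketch, not part of the book's code bundle), reshaping and transposing an array change only its shape and strides, while the underlying buffer stays shared. The check uses np.shares_memory().

```python
import numpy as np

a = np.arange(6)       # one-dimensional array: [0 1 2 3 4 5]
b = a.reshape(2, 3)    # new shape (metadata) over the same data buffer
t = b.T                # transpose: only the shape and strides are swapped

print(a.shape, b.shape, t.shape)   # (6,) (2, 3) (3, 2)
print(b.strides, t.strides)        # the transpose reverses the strides

# Both derived arrays are views on a's raw data
print(np.shares_memory(a, b), np.shares_memory(a, t))  # True True
```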

The advantages of using NumPy arrays

A NumPy array is a general homogeneous array: the items in an array have to be of the same type (there is a special array type that is heterogeneous). The advantage is that if we know that the items in an array are of the same type, it is easy to determine the storage size required for the array. NumPy arrays support vectorized operations that work on a whole array at once. Contrast this with Python lists, where normally you have to loop through the list and perform operations...
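A minimal sketch of both points, using made-up values: the list needs an explicit loop, the array takes one vectorized expression, and because every item shares one type, the storage size is simply the number of items times the item size.

```python
import numpy as np

values = [1.5, 2.0, 3.25, 4.0]   # hypothetical sample data

# Python list: an explicit loop (here a comprehension) is needed
doubled_list = [2 * v for v in values]

# NumPy array: one vectorized expression applies to every element
arr = np.array(values)           # homogeneous float64 array
doubled_arr = 2 * arr

# Same type for every item, so total storage = size * itemsize
print(arr.dtype, arr.itemsize, arr.size, arr.nbytes)  # float64 8 4 32
```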

Creating a multidimensional array

Now that we know how to create a vector, we are ready to create a multidimensional NumPy array. After we create the matrix, we will again want to display its shape (see the arrayattributes.py file in the Chapter02 folder of this book's code bundle), as shown in the following code snippets:

To create a multidimensional array, see the following code:

```
In: m = array([arange(2), arange(2)])
In: m
Out:
array([[0, 1],
       [0, 1]])
```

To display the array shape, see the following lines of code:
```
In: m.shape
Out: (2, 2)
```

We created a 2 x 2 array with the help of the arange() function. Without any warning, the array() function also appeared on the stage.

The array() function creates an array from an object that you give to it. The object needs to be array-like, for instance, a Python list. In the preceding example, we passed in a list of arrays. The object is the only required argument of the array() function. NumPy functions tend to have a lot of optional arguments with predefined defaults.
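The following sketch shows the required object argument in two array-like forms, plus one optional argument, dtype, overriding its default. It uses the explicit np. prefix rather than the star-import style of the session above.

```python
import numpy as np

# The only required argument is an array-like object
from_lists = np.array([[1, 2], [3, 4]])                # nested Python lists
from_arrays = np.array([np.arange(2), np.arange(2)])   # list of arrays, as above

# Optional arguments have defaults; dtype is a commonly overridden one
as_float = np.array([[1, 2], [3, 4]], dtype=np.float64)

print(from_arrays.shape)  # (2, 2)
print(as_float.dtype)     # float64
```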

Selecting array elements

From time to time, we will want to select a particular element of an array. We will take a look at how to do this, but first, let's create a 2 x 2 matrix again (see the elementselection.py file in the Chapter02 folder of this book's code bundle):

```
In: a = array([[1,2],[3,4]])
In: a
Out:
array([[1, 2],
       [3, 4]])
```

The matrix was created this time by passing a list of lists to the array() function. We will now select each item of the matrix one at a time, as shown in the following code snippet. Remember, the indices are numbered starting from 0.

```
In: a[0,0]
Out: 1
In: a[0,1]
Out: 2
In: a[1,0]
Out: 3
In: a[1,1]
Out: 4
```

As you can see, selecting elements of the array is pretty simple. For the array a, we just use the notation a[m,n], where m and n are the indices of the item in the array.
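A short sketch of the a[m,n] notation next to two related forms: chained indexing and negative indices. Both extra forms are standard NumPy behavior added here for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# a[m, n]: m selects the row, n selects the column (both start at 0)
assert a[1, 0] == 3

# Chained indexing gives the same element, but a[m, n] is a single lookup,
# whereas a[m][n] first builds the row a[m] and then indexes into it
assert a[1][0] == a[1, 0]

# Negative indices count from the end, as with Python lists
assert a[-1, -1] == 4
```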

NumPy numerical types

Python has an integer type, a float type, and a complex type; however, this is not enough for scientific computing. In practice, we need even more data types with varying precision and, therefore, different memory sizes. For this reason, NumPy has many more data types. The majority of NumPy numerical types end with a number, which indicates the number of bits associated with the type. The following table (adapted from the NumPy user guide) gives an overview of NumPy numerical types:
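To see the "number in the name equals the number of bits" rule in practice, this sketch converts each type's itemsize (bytes) to bits and uses np.iinfo() to read off integer ranges.

```python
import numpy as np

# The number in a type name is its size in bits; itemsize reports bytes
for t in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64):
    dt = np.dtype(t)
    print(dt.name, dt.itemsize * 8, "bits")

# iinfo reports the representable range of an integer type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
```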

Creating a record data type

A record data type is a heterogeneous data type—think of it as representing a row in a spreadsheet or a database. To give an example of a record data type, we will create a record for a shop inventory. This record contains the name of an item represented by a 40-character string, the number of items in the store represented by a 32-bit integer, and finally, the price of the item represented by a 32-bit float. The following steps show how to create a record data type (see the record.py file in the Chapter02 folder of this book's code bundle):

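The following code snippet creates the record data type:

```
In: t = dtype([('name', str_, 40), ('numitems', int32), ('price', float32)])
In: t
Out: dtype([('name', '|S40'), ('numitems', '<i4'), ('price', '<f4')])
```

This output comes from Python 2, where str_ is a byte string type; on Python 3, str_ is a Unicode type, so the name field is shown as '<U40' instead.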

To view the type of the field, check the following code snippet:
```
In: t['name']
Out: dtype('|S40')
```

If you don't give the array() function a data type, it infers one from the input data, which will not give us the record type we want. To create...
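A minimal sketch of creating an array with this record type. The inventory rows are hypothetical example data, and np.bytes_ is used instead of str_ so that the name field stays the byte string type '|S40' shown above on Python 3.

```python
import numpy as np

# Record type from the text: 40-character name, int32 count, float32 price
t = np.dtype([('name', np.bytes_, 40), ('numitems', np.int32), ('price', np.float32)])

# Hypothetical inventory rows; the dtype must be passed explicitly
itemz = np.array([(b'Meaning of life DVD', 42, 3.14),
                  (b'Butter', 13, 2.72)], dtype=t)

print(itemz['name'])             # field access returns a whole column
print(itemz[1])                  # indexing returns a single record (row)
print(itemz['numitems'].sum())   # 55
```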

One-dimensional slicing and indexing

Slicing of one-dimensional NumPy arrays works just like slicing of Python lists. We can select a piece of an array from the index 3 to 7 that extracts the elements 3 through 6 (see the slicing1d.py file in the Chapter02 folder of this book's code bundle), as shown in the following code snippet:

```
In: a = arange(9)
In: a[3:7]
Out: array([3, 4, 5, 6])
```

We can select elements from the index 0 to 7 with a step of two, as shown in the following lines of code:

```
In: a[:7:2]
Out: array([0, 2, 4, 6])
```

Just as in Python, we can use a negative step to reverse the array, as shown in the following lines of code:

```
In: a[::-1]
Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
```
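Because NumPy follows Python's start:stop:step rules, this sketch checks the slices above against an ordinary list. The extra slices a[-3:] and a[7:2:-2] are illustrative additions.

```python
import numpy as np

a = np.arange(9)
lst = list(range(9))

# start:stop:step behaves exactly as for Python lists (stop is excluded)
for s in (slice(3, 7), slice(None, 7, 2), slice(None, None, -1), slice(-3, None)):
    assert a[s].tolist() == lst[s]

print(a[-3:])     # [6 7 8]  a negative start counts from the end
print(a[7:2:-2])  # [7 5 3]  a negative step walks backwards
```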

Manipulating array shapes

Another recurring task is flattening arrays. Flattening in this context means transforming a multidimensional array into a one-dimensional array. In this example, we will demonstrate a number of ways to manipulate array shapes, starting with flattening:

ravel(): We can accomplish flattening with the ravel() function (see the shapemanipulation.py file in the Chapter02 folder of this book's code bundle), as shown in the following code:

```
In: b
Out:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In: b.ravel()
Out:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
```

flatten(): The appropriately named function, flatten(), does the same as ravel(), but flatten() always allocates new memory and returns a copy, whereas ravel() might return a view of the original array. This means that modifying the result of flatten() never affects the original array:
```
In: b.flatten()
Out:
array([ 0,  1,  2,  3,  4,  5...
```
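A minimal sketch of the difference between the two functions, assuming b is built as np.arange(24).reshape(2, 3, 4), which matches the printed b above. For a contiguous array like this one, ravel() returns a view, so writes through it reach b.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()     # contiguous input, so ravel() can return a view
f = b.flatten()   # flatten() always returns a fresh copy

print(np.shares_memory(b, r), np.shares_memory(b, f))  # True False

f[0] = 100        # changing the copy leaves b alone
r[1] = 200        # changing the view writes through to b
print(b[0, 0, 0], b[0, 0, 1])  # 0 200
```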

Creating views and copies

In the example about the ravel() function, views were mentioned. Views should not be confused with the concept of database views: views in the NumPy world are not read-only by default, and writing to a view changes the underlying data. It is important to know when we are dealing with a shared array view and when we have a copy of the array data. A slice, for instance, creates a view. This means that if you assign a slice to a variable and then change the underlying array, the value of this variable changes too. We will create an array from the famous Lena image, copy the array, create a view, and, at the end, modify the view. The Lena image array comes from a SciPy function.

To create a copy of the Lena array, the following line of code is used:
```
acopy = lena.copy()
```
Now, to create a view of the array, use the following line of code:
```
aview = lena.view()
```
Set all the values of the view to 0 with a flat iterator, as follows:
```
aview.flat = 0
```

The end result is that only one...
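The same copy-versus-view experiment can be sketched on a small synthetic array, since scipy.misc.lena() has been removed from recent SciPy releases. The last lines show that a view can still be protected explicitly by clearing its writeable flag.

```python
import numpy as np

# Small synthetic stand-in for the Lena image
img = np.arange(16, dtype=np.uint8).reshape(4, 4)

acopy = img.copy()   # independent buffer
aview = img.view()   # new array object that shares img's buffer

aview.flat = 0       # assign 0 to every element through the flat iterator

print(img.max(), aview.max(), acopy.max())  # 0 0 15

# A view can be made read-only by clearing its writeable flag
ro = acopy.view()
ro.flags.writeable = False
try:
    ro[0, 0] = 1
except ValueError as exc:
    print("write rejected:", exc)
```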

Fancy indexing

Fancy indexing is indexing that does not involve plain integers or slices (those make up normal indexing); instead, it uses lists or arrays of indices. In this section, we will apply fancy indexing to set the diagonal values of the Lena image to 0. This draws black lines along the diagonals, crossing the image out, not because there is something wrong with the image, but just as an exercise. Perform the following steps for fancy indexing:

Set the values of the first diagonal to 0. To set the diagonal values to 0, we need to define two different ranges for the x and y values as follows:
```
lena[range(xmax), range(ymax)] = 0
```
Now, set the values of the other diagonal to 0. To set the values of the other diagonal, we require a different set of ranges, but the principles stay the same, as follows:
```
lena[range(xmax-1,-1,-1), range(ymax)] = 0
```

At the end we get the following image with the diagonals crossed out:

The following code for this section is without comments. The complete code for this is in the fancy.py file in the Chapter02 folder of...
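Outside of fancy.py, the two diagonal assignments can be tried on a small hypothetical 5 x 5 array instead of the image. Fancy indexing pairs the two index sequences element-wise, so each (row, column) pair addresses one cell. Like the book's version, this assumes a square array.

```python
import numpy as np

# Hypothetical square "image" of ones
img = np.ones((5, 5), dtype=int)
xmax, ymax = img.shape

img[range(xmax), range(ymax)] = 0               # main diagonal: (0,0) ... (4,4)
img[range(xmax - 1, -1, -1), range(ymax)] = 0   # other diagonal: (4,0) ... (0,4)

print(img)
# [[0 1 1 1 0]
#  [1 0 1 0 1]
#  [1 1 0 1 1]
#  [1 0 1 0 1]
#  [0 1 1 1 0]]
```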

Indexing with a list of locations

Let's use the ix_() function to shuffle the Lena image. This function creates a mesh from multiple sequences. As arguments, we give one-dimensional sequences, and the function returns a tuple of NumPy arrays. For example, check the following code snippet:

```
In: ix_([0,1], [2,3])
Out:
(array([[0], [1]]), array([[2, 3]]))
```

To index the array with a list of locations, perform the following steps:

Shuffle the array indices. Create a random indices array with the shuffle() function of the numpy.random module, as shown in the following lines of code. Note that shuffle() modifies the array in place.
```
import numpy as np

def shuffle_indices(size):
    # Build the indices 0..size-1 and shuffle them in place
    arr = np.arange(size)
    np.random.shuffle(arr)

    return arr
```

Now plot the image indexed with the shuffled indices, as follows:

```
plt.imshow(lena[np.ix_(xindices, yindices)])
```

What we get is a completely scrambled Lena, as shown in the following image:

The following code for this section is without comments. The complete code for this can be found in the ix.py file in the...
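The book's ix.py works on the full Lena image with np.random.shuffle(). The sketch below instead uses a small hypothetical array and a seeded np.random.default_rng() generator (an assumption, so that runs are repeatable) to show what np.ix_() does with the shuffled indices.

```python
import numpy as np

def shuffle_indices(size, rng):
    """Return a random permutation of 0..size-1."""
    arr = np.arange(size)
    rng.shuffle(arr)   # shuffles arr in place
    return arr

rng = np.random.default_rng(42)    # seeded generator for repeatable output
img = np.arange(12).reshape(3, 4)  # hypothetical small stand-in for the image

xindices = shuffle_indices(img.shape[0], rng)
yindices = shuffle_indices(img.shape[1], rng)

# ix_ turns the two 1-D index arrays into an open mesh (a column vector and
# a row vector), so scrambled[i, j] == img[xindices[i], yindices[j]]
scrambled = img[np.ix_(xindices, yindices)]
print(scrambled)
```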

Indexing arrays with Booleans

Boolean indexing is indexing based on a Boolean array, and it falls into the category of fancy indexing. Because it is a form of fancy indexing, it works in basically the same way: indexing happens with the help of a special iterator object. Perform the following steps to index an array:

First, we create an image with dots on the diagonal. This is similar to the Fancy indexing section. This time, we select every fourth point on the diagonal of the image, as shown in the following code snippet:
```
def get_indices(size):
    # True at every index divisible by 4, False elsewhere
    arr = np.arange(size)
    return arr % 4 == 0
```

Then we just apply this selection and plot the points, as shown in the following code snippet:

```
lena1 = lena.copy()
xindices = get_indices(lena.shape[0])
yindices = get_indices(lena.shape[1])
lena1[xindices, yindices] = 0
plt.subplot(211)
plt.imshow(lena1)
```

Select array values between a quarter and three-quarters of the maximum value, and set them to 0, as shown in the...
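Both Boolean-indexing steps can be sketched on a hypothetical 8 x 8 array instead of the Lena image. In the first step, each Boolean mask is converted to the positions where it is True, and those positions are paired like integer fancy indices. In the second step, a single Boolean array of the same shape selects values by a condition.

```python
import numpy as np

def get_indices(size):
    """Boolean mask that is True at every fourth position."""
    arr = np.arange(size)
    return arr % 4 == 0

img = np.arange(64).reshape(8, 8)   # hypothetical 8 x 8 stand-in image

# Step 1: masks are True at 0 and 4, so this zeroes img[0, 0] and img[4, 4]
dots = img.copy()
dots[get_indices(8), get_indices(8)] = 0

# Step 2: zero the values between a quarter and three quarters of the maximum
ranged = img.copy()
mx = ranged.max()
ranged[(ranged > mx / 4) & (ranged < 3 * mx / 4)] = 0
```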

Stride tricks for Sudoku

We can do even more fancy things with NumPy. The ndarray class has a field, strides, which is a tuple indicating the number of bytes to step in each dimension when going through an array. Sudoku is a popular puzzle originally from Japan, although it was known in a similar form in other countries before that. If you don't know about Sudoku, it may be better that way, because it is highly addictive. Let's apply some stride tricks to the problem of splitting a Sudoku puzzle into the 3 x 3 squares it is composed of:

First define the Sudoku puzzle array, as shown in the following code snippet. This one is filled with the contents of the actual solved Sudoku puzzle (part of the array is omitted for brevity).
```
sudoku = np.array([[2, 8, 7, 1, 6, 5, 9, 4, 3],[9, 5, 4, 7, 3, 2, 1, 6, 8],…[7, 3, 6, 2, 8, 4, 5, 1, 9]])
```
Now calculate the strides. The itemsize field of ndarray gives us the number of bytes used by each element of the array. We use itemsize to calculate the strides as follows:
```
strides = sudoku.itemsize ...
```
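The stride calculation above is truncated. As a sketch under stated assumptions, a hypothetical 9 x 9 grid stands in for the elided puzzle, and numpy.lib.stride_tricks.as_strided() builds a (3, 3, 3, 3) view. For a C-contiguous 9 x 9 array, moving one block down skips 27 elements, moving one block right skips 3, moving one row within a block skips 9, and moving one column skips 1.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical 9 x 9 grid standing in for the solved puzzle
grid = np.arange(81).reshape(9, 9)

# (block row, block column, row in block, column in block)
shape = (3, 3, 3, 3)
# Element steps for each axis, converted to bytes with itemsize
strides = tuple(grid.itemsize * step for step in (27, 3, 9, 1))

squares = as_strided(grid, shape=shape, strides=strides)
print(squares[1, 2])   # the square covering rows 3-5 and columns 6-8
```

as_strided() does no bounds checking, so wrong strides silently read unrelated memory. For this particular split, grid.reshape(3, 3, 3, 3).swapaxes(1, 2) produces the same blocks without that risk.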

Broadcasting arrays

In a nutshell, NumPy tries to perform an operation even when the operands do not have the same shape, as long as their shapes are compatible. In this section, we will multiply an array by a scalar. The scalar is extended to the shape of the array operand, and then the multiplication is performed. We will download an audio file and make a new version that is quieter:

First, read the WAV file. We will use standard Python code to download an audio file of Austin Powers saying "Smashing, baby". SciPy has a wavfile module that allows you to load sound data or generate WAV files. If SciPy is installed, then we should already have this module. The read() function returns the sample rate and a data array. In this example, we only care about the data.
```
sample_rate, data = scipy.io.wavfile.read(WAV_FILE)
```
Plot the original WAV data with Matplotlib. Give the subplot the title, Original, as shown in the following lines of code:
```
plt.subplot(2, 1, 1)
plt.title("Original")
plt.plot(data)
```
Now create a new array. We will use NumPy to...
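The remaining steps are truncated here. As a hedged sketch, synthetic 16-bit samples stand in for the downloaded WAV data and 0.5 is an arbitrary volume factor. The example shows the scalar being broadcast over the array, plus one array-to-array case with compatible shapes.

```python
import numpy as np

# Synthetic 16-bit samples standing in for the downloaded WAV data
data = np.array([1000, -2000, 3000, -4000], dtype=np.int16)

# The scalar 0.5 is broadcast to data's shape and multiplied element-wise;
# the product is floating point, so convert back to the sample type
quiet = (data * 0.5).astype(data.dtype)
print(quiet.tolist())   # [500, -1000, 1500, -2000]

# Broadcasting between arrays: shapes (3, 1) and (4,) stretch to (3, 4)
rows = np.arange(3).reshape(3, 1)
cols = np.arange(4)
print((rows + cols).shape)   # (3, 4)
```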

Summary

We learned a lot in this chapter about the NumPy fundamentals: data types and arrays. Arrays have several attributes describing them. We learned that one of these attributes is the data type which, in NumPy, is represented by a full-fledged object.

NumPy arrays can be sliced and indexed in an efficient manner, just as in the case of Python lists. NumPy arrays have the added ability of working with multiple dimensions.

The shape of an array can be manipulated in many ways, such as stacking, resizing, reshaping, and splitting. A great number of convenience functions for shape manipulation were demonstrated in this chapter.

Having learned about the basics, it's time to move on to data analysis with commonly used functions in Chapter 3, Basic Data Analysis with NumPy. This includes the usage of basic statistical and mathematical functions.


Authors (1)

Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. Ivan Idris is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook by Packt Publishing.


Overview of NumPy numerical types (adapted from the NumPy user guide):

| Type | Description |
|------|-------------|
| `bool` | This stores a Boolean (True or False) as a byte |
| `inti` | This is a platform integer (normally either `int32` or `int64`) |
| `int8` | This is an integer ranging from -128 to 127 |
| `int16` | This is an integer ranging from -32768 to 32767 |
| `int32` | This is an integer ranging from -2^31 to 2^31 - 1 |
| `int64` | This is an integer ranging from -2^63 to 2^63 - 1 |
| `uint8` | This is an unsigned integer ranging from 0 to 255 |
| `uint16` | This is an unsigned integer... |