You're reading from  NumPy Essentials

Product type: Book
Published in: Apr 2016
Reading level: Intermediate
ISBN-13: 9781784393670
Edition: 1st Edition
Authors (3):
Leo (Liang-Huan) Chin

Leo (Liang-Huan) Chin is a data engineer with more than 5 years of experience in the field of Python. He works for Gogoro smart scooter, Taiwan, where his job entails discovering new and interesting biking patterns. His previous work experience includes ESRI, California, USA, which focused on spatial-temporal data mining. He loves data, analytics, and the stories behind data and analytics. He received an MA degree of GIS in geography from State University of New York, Buffalo. When Leo isn't glued to a computer screen, he spends time on photography, traveling, and exploring some awesome restaurants across the world. You can reach Leo at http://chinleock.github.io/portfolio/.

Tanmay Dutta

Tanmay Dutta is a seasoned programmer with expertise in programming languages such as Python, Erlang, C++, Haskell, and F#. He has extensive experience in developing numerical libraries and frameworks for investment banking businesses. He was also instrumental in the design and development of a risk framework in Python (pandas, NumPy, and Django) for a wealth fund in Singapore. Tanmay has a master's degree in financial engineering from Nanyang Technological University, Singapore, and a certification in computational finance from Tepper Business School, Carnegie Mellon University.

Shane Holloway

http://shaneholloway.com/resume/

Chapter 2. The NumPy ndarray Object

Array-oriented computing is at the very heart of the computational sciences. It is something that most Python programmers are not accustomed to. Although lists and dictionaries (and the comprehensions that build them) resemble arrays and are sometimes used in a similar way, there is a huge difference between a list or dictionary and an array in terms of performance and manipulation. This chapter introduces the basic array object in NumPy. It covers the information that can be gleaned from the intrinsic characteristics of NumPy arrays without performing any external operations on the array.
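As a rough illustration of the difference in manipulation, the following sketch doubles a handful of values first with a list comprehension and then with a single array expression. The variable names are made up for this example, and no timing is shown, since timings vary by machine.

```python
import numpy as np

values = list(range(5))

# List version: an explicit Python-level loop (here, a comprehension).
doubled_list = [2 * v for v in values]

# Array version: one expression that is applied to every element.
arr = np.array(values)
doubled_arr = 2 * arr

print(doubled_list)           # [0, 2, 4, 6, 8]
print(doubled_arr.tolist())   # [0, 2, 4, 6, 8]

# Multiplying the list itself means something different: repetition.
repeated = 2 * values
print(repeated)               # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

The same operator therefore has element-wise meaning on an array and sequence meaning on a list, which is one reason the two are not interchangeable.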

The topics that will be covered in the chapter are as follows:

  • numpy.ndarray and how to use it: basic array-oriented computing
  • Performance of numpy.ndarray: memory access, storage, and retrieval
  • Indexing, slicing, views, and copies
  • Array data types

Getting started with numpy.ndarray


In this section, we will go over some of the internals of the NumPy ndarray, including its structure and behavior. Let's start. Type the following statements at the IPython prompt:

In [1]: import numpy as np 
 
In [2]: x = np.array([[1,2,3],[2,3,4]]) 
 
In [3]: print(x)
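The structure of this small array can be inspected through a few of its attributes. This is only a sketch: the attributes shown are standard ndarray attributes, but the choice of which ones to print is ours, not the chapter's.

```python
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4]])

print(x.ndim)    # 2       (number of axes)
print(x.shape)   # (2, 3)  (2 rows, 3 columns)
print(x.size)    # 6       (total number of elements)

# dtype and itemsize depend on the platform's default integer type,
# so no particular value is claimed here.
print(x.dtype, x.itemsize)

# The data buffer holds size * itemsize bytes.
print(x.nbytes == x.size * x.itemsize)   # True
```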

NumPy shares the names of some of its functions with functions in other modules, such as the math module in the Python standard library. For this reason, an import like the following is not recommended:

from numpy import * 

It may overwrite many functions that are already in the global namespace. This can lead to unexpected behavior from your code and may introduce very subtle bugs. It can also create conflicts within the code itself (for example, NumPy has an any function, which conflicts with Python's built-in any function) and may cause confusion when reviewing or debugging a piece of code. Therefore, it is important and recommended to always follow the import...
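To see why a silently replaced name matters, compare Python's built-in sum with NumPy's sum on the same two-dimensional array. This sketch calls both explicitly rather than performing a star import; the point is only that the two functions behave differently, so code written for one can break if the name ends up bound to the other.

```python
import builtins
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4]])

# The built-in sum iterates over the rows and adds them element by element.
builtin_total = builtins.sum(x)
print(builtin_total)   # [3 5 7]

# NumPy's sum adds up every element of the array by default.
numpy_total = np.sum(x)
print(numpy_total)     # 15
```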

Array indexing and slicing


NumPy provides powerful indexing capabilities for arrays. Indexing capabilities in NumPy became so popular that many of them were added back to Python.

Indexing NumPy arrays is, in many ways, very similar to indexing lists or tuples. There are some differences, which will become apparent as we proceed. To start with, let's create a two-dimensional array of shape 100 x 100:

In [9]: x = np.random.random((100, 100)) 

Simple integer indexing works by typing indices within a pair of square brackets and placing this next to the array variable. This is a widely used Python construct. Any object that has a __getitem__ method will respond to such indexing. Indices are zero-based, so to access the element at row index 42 and column index 87 (that is, the 43rd row and 88th column), just type this:

In [10]: y = x[42, 87] 

Like lists and other Python sequences, the use of a colon to index a range of values is also supported. The following statement will print the row of the x matrix at index k, where k is any valid integer row index that you have already defined.

In [11]: print(x[k, :]) 
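The statements above use random data, so their output differs from run to run. The following sketch repeats the same kinds of indexing on a small array with known contents; the 3 x 4 array and the value chosen for k are ours, picked only so that the results are easy to check by eye.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# x is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Integer indexing: row index 1, column index 2.
print(x[1, 2])                 # 6

# A colon selects everything along that axis.
k = 1
print(x[k, :].tolist())        # [4, 5, 6, 7]   (row k)
print(x[:, k].tolist())        # [1, 5, 9]      (column k)

# Ranges work as they do for lists: start is included, stop is excluded.
print(x[0:2, 1:3].tolist())    # [[1, 2], [5, 6]]
```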

The colon can be...

Memory layout of ndarray


A particularly interesting attribute of the ndarray object is flags. Type the following code:

In [12]: x.flags 

It should produce something like this (the exact set of flags listed depends on the NumPy version):

Out[12]: 
  C_CONTIGUOUS : True 
  F_CONTIGUOUS : False 
  OWNDATA : True 
  WRITEABLE : True 
  ALIGNED : True 
  UPDATEIFCOPY : False 

The flags attribute holds information about the memory layout of the array. The C_CONTIGUOUS field in the output indicates whether the array is a C-style array. This means that the array is laid out in memory like a C array. In the case of 2D arrays, this is also called row-major order: the elements of each row are stored next to each other, so when moving through memory the column index changes fastest, and the row index is incremented only after a whole row has been traversed. In the case of a multidimensional C-style array, the last dimension is incremented first, followed by the last but one, and so on.
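Row-major order can be checked on a small array. The sketch below uses the strides attribute, which the text above does not mention: it gives the number of bytes to step in memory to move one position along each axis. The 2 x 3 array is an arbitrary choice for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

print(a.flags['C_CONTIGUOUS'])   # True
print(a.flags['F_CONTIGUOUS'])   # False

# In a C-style array, neighbouring columns are adjacent in memory,
# while moving to the next row skips a whole row of 3 elements.
row_step, col_step = a.strides
print(col_step == a.itemsize)        # True
print(row_step == 3 * a.itemsize)    # True

# order='K' reads the elements in the order they occur in memory.
memory_order = a.ravel(order='K').tolist()
print(memory_order)                  # [0, 1, 2, 3, 4, 5]
```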

Similarly, the F_CONTIGUOUS attribute indicates whether the array is a Fortran-style array. Such an array...

Views and copies


Slicing and indexing can give you access to an array's data in two ways, called views and copies: a view accesses the elements directly in the original array, whereas a copy is a new array that contains only the accessed elements. Since a view shares its data with the original array, modifying a view modifies the original array too. This is not true for copies.

The may_share_memory function in NumPy's miscellaneous routines can be used to check whether two arrays might share memory, that is, whether one may be a view of the other. While this function does the job in most cases, it is not always reliable, since it uses heuristics: it can report that two arrays share memory when they do not. For introductory purposes, however, we shall take its result for granted.

Generally, slicing an array creates a view, while indexing it with a list or array of indices creates a copy. Let us study these differences through a few code snippets. First, let's create a random 100x10 array.

In [21]: x = np.random.rand(100,10) 
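Before working with the random array, here is a minimal sketch of the distinction on a small array with known contents. The array and the variable names are ours, chosen so that the effect of each assignment is easy to verify.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

view = x[:2, :]        # basic slice: a view onto x's data
copy = x[[0, 1], :]    # indexing with a list of rows: a new array

print(np.may_share_memory(x, view))   # True
print(np.may_share_memory(x, copy))   # False

# Writing through the view changes the original array.
view[0, 0] = 100
print(x[0, 0])         # 100

# Writing to the copy leaves the original array untouched.
copy[0, 1] = -1
print(x[0, 1])         # 1
```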

Now, let us extract...

Creating arrays


Arrays can be created in a number of ways, for instance from other data structures, by reading files on disk, or from the Web. For the purposes of this chapter, whose aim is to familiarize us with the core characteristics of a NumPy array, we will be creating arrays using lists or various NumPy functions.

Creating arrays from lists

The simplest way to create an array is to use the array function. To create a valid array object, the argument passed to the array function needs to satisfy at least one of the following conditions:

  • It has to be a valid iterable value or sequence, which may be nested
  • It must have an __array__ method that returns a valid numpy array

Consider the following snippet:

In [32]: x = np.array([1, 2, 3]) 
 
In [33]: y = np.array(['hello', 'world']) 
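As a further sketch of the two conditions listed above, the following code builds arrays from a nested list, from a tuple, and from an object that provides an __array__ method. The Readings class is hypothetical and exists only for this illustration; its __array__ signature accepts optional dtype and copy arguments as an assumption intended to work across NumPy versions.

```python
import numpy as np

# Condition 1: a (possibly nested) sequence.
nested = np.array([[1, 2, 3], [4, 5, 6]])
print(nested.shape)           # (2, 3)

from_tuple = np.array((1.5, 2.5))
print(from_tuple.tolist())    # [1.5, 2.5]


# Condition 2: an object with an __array__ method.
class Readings:
    """Hypothetical container, used only for illustration."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Return a NumPy array built from the stored values.
        return np.array(self._values, dtype=dtype)


from_object = np.array(Readings([1, 2, 3]))
print(from_object.tolist())   # [1, 2, 3]
```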

The first condition is always true for Python lists and tuples. When creating an array from lists or tuples, the input may consist of different (heterogeneous) data types. The array function, however, will normally...

Array data types


Data types are another important intrinsic aspect of a NumPy array alongside its memory layout and indexing. The data type of a NumPy array can be found by simply checking the dtype attribute of the array. Try out the following examples to check the data types of different arrays:

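In [49]: x = np.random.random((10,10)) 
 
In [50]: x.dtype 
Out[50]: dtype('float64') 
 
In [51]: x = np.array(range(10)) 
 
In [52]: x.dtype 
Out[52]: dtype('int32')   # platform dependent; dtype('int64') on most 64-bit Linux and macOS systems 
 
In [53]: x = np.array(['hello', 'world']) 
 
In [54]: x.dtype 
Out[54]: dtype('S5')      # under Python 2; Python 3 reports dtype('<U5') 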

Many array creation functions provide a default array data type. For example, the np.zeros and np.ones functions create arrays that are full of floats by default. But it is possible to make them create arrays of other data types too. Consider the following examples that demonstrate how to use the dtype argument to create arrays of arbitrary data types.

In [55]: x = np.ones((10, 10),...
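A minimal sketch of the dtype argument follows. The shapes and data types here are our own choices for illustration and are not taken from the truncated statement above.

```python
import numpy as np

# Without a dtype argument, zeros and ones produce floats.
default_zeros = np.zeros((2, 3))
print(default_zeros.dtype)    # float64

# The dtype argument selects another element type.
int_ones = np.ones((2, 3), dtype=np.int32)
print(int_ones.dtype)         # int32

complex_zeros = np.zeros(3, dtype=np.complex128)
print(complex_zeros.dtype)    # complex128

# astype returns a new array with the requested type;
# the original array keeps its own dtype.
as_float = int_ones.astype(np.float64)
print(as_float.dtype)         # float64
print(int_ones.dtype)         # int32
```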

Summary


In this chapter, we covered some basics of the NumPy ndarray object. We studied some elementary ways of creating NumPy arrays. We also took a look at the differences between copies and views of arrays and how these affect indexing and slicing. We saw the subtle differences between the memory layouts offered by NumPy. We are now equipped with the basic vocabulary of the ndarray object and can get started on the core functionality of NumPy. In the next chapter, we will explore ndarray in more detail and show you some tips and tricks (universal functions and shape manipulation) to speed up your NumPy scripts!
