In this article, Armando Fandango, author of the book Python Data Analysis - Second Edition, discusses how NumPy provides a multidimensional array object called ndarray. NumPy arrays are typed arrays of fixed size. Python lists are heterogeneous, so the elements of a list may be of any object type, while NumPy arrays are homogeneous and can contain objects of only one type. An ndarray consists of two parts, which are as follows:
- The actual data that is stored in a contiguous block of memory
- The metadata describing the actual data
Since the actual data is stored in a contiguous block of memory, loading a large dataset as an ndarray depends on the availability of a large enough contiguous block of memory. Most of the array methods and functions in NumPy leave the actual data unaffected and only modify the metadata.
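The split between data and metadata can be sketched as follows. This is a minimal illustration, assuming NumPy is imported as np; the variable names are made up for the example. Reshaping yields a new ndarray object with different metadata that still refers to the same block of memory:

```python
import numpy as np

data = np.arange(6)          # one contiguous block holding 0..5
view = data.reshape(2, 3)    # new metadata (shape, strides), same data

shares = np.shares_memory(data, view)  # do both arrays use the same buffer?
view[0, 0] = 99                        # writing through the reshaped array...
first = int(data[0])                   # ...is visible in the original array

print(data.shape, view.shape)          # (6,) (2, 3)
print(shares, first)                   # True 99
```

Because only the metadata differs, no element is copied when the array is reshaped.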
Actually, we made a one-dimensional array that held a set of numbers. The ndarray can have more than a single dimension.
Advantages of NumPy arrays
The NumPy array is, in general, homogeneous (there is a particular record array type that is heterogeneous)—the items in the array have to be of the same type. The advantage is that if we know that the items in an array are of the same type, it is easy to ascertain the storage size needed for the array. NumPy arrays can execute vectorized operations, processing a complete array, in contrast to Python lists, where you usually have to loop through the list and execute the operation on each element. NumPy arrays are indexed from 0, just like lists in Python. NumPy utilizes an optimized C API to make the array operations particularly quick.
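As a small, hedged illustration of the difference between looping over a list and a vectorized array operation (the names values, squares_list, and squares_arr are chosen only for this sketch):

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python list: loop over the elements explicitly.
squares_list = [v ** 2 for v in values]

# NumPy array: the operation is applied to the whole array at once.
arr = np.array(values)
squares_arr = arr ** 2

print(squares_list)          # [1, 4, 9, 16]
print(squares_arr.tolist())  # [1, 4, 9, 16]
```

Both approaches give the same numbers; the array version delegates the loop to NumPy's compiled code.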
We will make an array with the arange() function again. You will see snippets from Jupyter Notebook sessions where NumPy is already imported with the statement import numpy as np. Here's how to get the data type of an array:
In: a = np.arange(5)
In: a.dtype
Out: dtype('int64')
The data type of the array a is int64 (at least on my computer), but you may get int32 as the output if you are using 32-bit Python. In both cases, we are dealing with integers (64-bit or 32-bit). Besides the data type of an array, it is crucial to know its shape. A vector is commonly used in mathematics, but most of the time we need higher-dimensional objects. Let's find out the shape of the vector we produced a moment ago:
In: a
Out: array([0, 1, 2, 3, 4])
In: a.shape
Out: (5,)
As you can see, the vector has five components with values ranging from 0 to 4.
The shape property of the array is a tuple; in this instance, a tuple of 1 element, which holds the length in each dimension.
Creating a multidimensional array
Now that we know how to create a vector, we are set to create a multidimensional NumPy array. After we produce the matrix, we will again need to show its shape, as demonstrated in the following code snippets:
- Create a multidimensional array as follows:
In: m = np.array([np.arange(2), np.arange(2)])
In: m
Out:
array([[0, 1],
       [0, 1]])
- We can show the array shape as follows:
In: m.shape
Out: (2, 2)
We made a 2 x 2 array with the arange() function. The array() function creates an array from an object that you pass to it. The object has to be array-like, for example, a Python list. In the previous example, we passed a list of arrays. The object is the only required parameter of the array() function. NumPy functions tend to have many optional arguments with predefined default values.
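The following sketch shows the required object argument together with two of the optional arguments of array(), dtype and ndmin. It assumes NumPy is imported as np, and the variable names are illustrative:

```python
import numpy as np

# Only the object is required; here it is a nested Python list.
m = np.array([[0, 1], [0, 1]])

# Optional arguments: dtype picks the element type, ndmin the minimum rank.
f = np.array([1, 2, 3], dtype=np.float64)
row = np.array([1, 2, 3], ndmin=2)

print(m.shape)    # (2, 2)
print(f.dtype)    # float64
print(row.shape)  # (1, 3)
```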
Selecting NumPy array elements
From time to time, we will wish to select a specific element of an array. We will take a look at how to do this, but to kick off, let's make a 2 x 2 matrix again:
In: a = np.array([[1,2],[3,4]])
In: a
Out:
array([[1, 2],
[3, 4]])
The matrix was made this time by giving the array() function a list of lists. We will now choose each item of the matrix one at a time, as shown in the following code snippet. Recall that the index numbers begin from 0:
In: a[0,0]
Out: 1
In: a[0,1]
Out: 2
In: a[1,0]
Out: 3
In: a[1,1]
Out: 4
As you can see, choosing elements of an array is fairly simple. For the array a, we just employ the notation a[m,n], where m and n are the indices of the item in the array. Have a look at the following figure for your reference:
NumPy numerical types
Python has an integer type, a float type, and a complex type; however, this is not sufficient for scientific calculations. In practice, we need more data types with varying precisions and, consequently, different storage sizes. For this reason, NumPy has many more data types. Most of the NumPy numerical type names end with a number. This number designates the count of bits associated with the type. The following table (adapted from the NumPy user guide) presents an overview of NumPy numerical types:
| Type | Description |
| --- | --- |
| bool | Boolean (True or False) stored as a byte |
| int_ | Platform integer (normally either int32 or int64) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2 ** 31 to 2 ** 31 - 1) |
| int64 | Integer (-2 ** 63 to 2 ** 63 - 1) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 2 ** 32 - 1) |
| uint64 | Unsigned integer (0 to 2 ** 64 - 1) |
| float16 | Half precision float: sign bit, 5 bits exponent, and 10 bits mantissa |
| float32 | Single precision float: sign bit, 8 bits exponent, and 23 bits mantissa |
| float64 or float | Double precision float: sign bit, 11 bits exponent, and 52 bits mantissa |
| complex64 | Complex number, represented by two 32-bit floats (real and imaginary components) |
| complex128 or complex | Complex number, represented by two 64-bit floats (real and imaginary components) |
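The ranges and bit layouts in the table can be checked against NumPy itself. The sketch below uses np.iinfo and np.finfo, which are not mentioned in the original article, and assumes NumPy is imported as np:

```python
import numpy as np

# Integer ranges as reported by NumPy.
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Floating-point layout: total bits and mantissa (fraction) bits.
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.nmant)
```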
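For each data type, there exists a matching conversion function. The np.bool and np.float aliases found in older code were deprecated in NumPy 1.20 and are not available in every release, so the explicit np.bool_ and np.float64 types are used here:
In: np.float64(42)
Out: 42.0
In: np.int8(42.0)
Out: 42
In: np.bool_(42)
Out: True
In: np.bool_(0)
Out: False
In: np.bool_(42.0)
Out: True
In: np.float64(True)
Out: 1.0
In: np.float64(False)
Out: 0.0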
Many functions have a data type argument, which is frequently optional:
In: np.arange(7, dtype=np.uint16)
Out: array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)
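It is important to be aware that you are not allowed to convert a complex number into an integer. Attempting to do that raises a TypeError. Note that np.int, used below, was just an alias for Python's built-in int and was removed in NumPy 1.24; on current versions, call int() directly: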
In: np.int(42.0 + 1.j)
Traceback (most recent call last):
<ipython-input-24-5c1cd108488d> in <module>()
----> 1 np.int(42.0 + 1.j)
TypeError: can't convert complex to int
The same goes for conversion of a complex number into a floating-point number.
By the way, the j component is the imaginary coefficient of a complex number.
You can, however, convert a floating-point number to a complex number, for example, complex(1.0). The real and imaginary parts of a complex number can be obtained with the real and imag attributes (or the np.real() and np.imag() functions), respectively.
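A short sketch of these conversions, assuming NumPy is imported as np (the names z and w are illustrative):

```python
import numpy as np

z = np.complex128(complex(1.0))   # float -> complex: imaginary part is 0
w = np.array([1 + 2j, 3 - 4j])    # array of complex numbers

print(z.real, z.imag)             # 1.0 0.0
print(w.real.tolist())            # [1.0, 3.0]
print(np.imag(w).tolist())        # [2.0, -4.0]
```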
Data type objects
Data type objects are instances of the numpy.dtype class. Once again, arrays have a data type. To be exact, each element in a NumPy array has the same data type. The data type object can tell you the size of the data in bytes. The size in bytes is given by the itemsize property of the dtype class:
In: a.dtype.itemsize
Out: 8
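As a quick check of how itemsize relates to the bit counts in the type names, here is a sketch that assumes NumPy is imported as np; the dictionary name sizes is illustrative:

```python
import numpy as np

# itemsize is the number of bytes per element, i.e. bits / 8.
sizes = {name: np.dtype(name).itemsize
         for name in ("int8", "uint16", "float32", "float64", "complex128")}
print(sizes)
# {'int8': 1, 'uint16': 2, 'float32': 4, 'float64': 8, 'complex128': 16}
```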
Character codes
Character codes are included for backward compatibility with Numeric, the predecessor of NumPy. Their use is not recommended, but the codes are supplied here because they pop up in various locations. You should use the dtype object instead. The following table lists several different data types and the character codes related to them:
| Type | Character code |
| --- | --- |
| integer | i |
| Unsigned integer | u |
| Single precision float | f |
| Double precision float | d |
| bool | ? |
| complex | D |
| string | S |
| unicode | U |
| Void | V |
Take a look at the following code to produce an array of single precision floats:
In: np.arange(7, dtype='f')
Out: array([ 0., 1., 2., 3., 4., 5., 6.], dtype=float32)
Likewise, the following code creates an array of complex numbers:
In: np.arange(7, dtype='D')
Out: array([ 0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j])
The dtype constructors
We have a variety of means to create data types. Take the case of floating-point data (have a look at dtypeconstructors.py in this book's code bundle):
- We can use the general Python float, as shown in the following lines of code:
In: np.dtype(float)
Out: dtype('float64')
- We can specify a single precision float with a character code:
In: np.dtype('f')
Out: dtype('float32')
- We can use a double precision float with a character code:
In: np.dtype('d')
Out: dtype('float64')
- We can pass the dtype constructor a two-character code. The first character stands for the type; the second character is a number specifying the number of bytes in the type (the numbers 2, 4, and 8 correspond to floats of 16, 32, and 64 bits, respectively):
In: np.dtype('f8')
Out: dtype('float64')
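The different spellings above refer to the same data type objects, which can be verified by comparing them. This is a minimal sketch, assuming NumPy is imported as np; the names same_double and same_single are illustrative:

```python
import numpy as np

# Different spellings of the same double precision type.
same_double = (np.dtype(float) == np.dtype('d')
               == np.dtype('f8') == np.dtype(np.float64))

# Different spellings of the single precision type.
same_single = np.dtype('f') == np.dtype('f4') == np.dtype(np.float32)

print(same_double, same_single)   # True True
print(np.dtype('f2'))             # float16
```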
A (truncated) list of all the full data type codes can be found by applying sctypeDict.keys():
In: np.sctypeDict.keys()
Out: dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3,
'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8,
'long', 'l', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11,
'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15,
'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22,
'bool8', 'Bool', 'b1', 'float16', 'Float16', 'f2', 'float32', 'Float32', 'f4', 'float64',
'Float64', 'f8', 'float128', 'Float128', 'f16', 'complex64', 'Complex32', 'c8', 'complex128',
'Complex64', 'c16', 'complex256', 'Complex128', 'c32', 'object0', 'Object0', 'bytes0', 'Bytes0',
'str0', 'Str0', 'void0', 'Void0', 'datetime64', 'Datetime64', 'M8', 'timedelta64',
'Timedelta64', 'm8', 'int64', 'uint64', 'Int64', 'UInt64', 'i8', 'u8', 'int32', 'uint32',
'Int32', 'UInt32', 'i4', 'u4', 'int16', 'uint16', 'Int16', 'UInt16', 'i2', 'u2',
'int8', 'uint8', 'Int8', 'UInt8', 'i1', 'u1', 'complex_', 'int0', 'uint0', 'single',
'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat',
'longcomplex', 'bool_', 'unicode_', 'object_', 'bytes_', 'str_', 'string_',
'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a'])
The dtype attributes
The dtype class has a number of useful properties. For instance, we can get information about the character code of a data type through the properties of dtype:
In: t = np.dtype('float64')
In: t.char
Out: 'd'
The type attribute corresponds to the type of object of the array elements:
In: t.type
Out: numpy.float64
The str attribute of dtype gives a string representation of a data type. It begins with a character representing endianness, if appropriate, then a character code, followed by a number corresponding to the number of bytes that each array item needs. Endianness, here, refers to the way bytes are ordered inside a 32- or 64-bit word. In big-endian order, the most significant byte is stored first, indicated by >. In little-endian order, the least significant byte is stored first, indicated by <, as exemplified in the following lines of code:
In: t.str
Out: '<f8'
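To make the byte order concrete, the following sketch stores the same integer with both endianness prefixes and inspects the raw bytes. It assumes NumPy is imported as np; the names big and little are illustrative:

```python
import numpy as np

big = np.array([1], dtype='>i4')     # big-endian 32-bit integer
little = np.array([1], dtype='<i4')  # little-endian 32-bit integer

print(big.tobytes())         # b'\x00\x00\x00\x01'
print(little.tobytes())      # b'\x01\x00\x00\x00'
print(big[0] == little[0])   # True: same value, different byte order
```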
One-dimensional slicing and indexing
Slicing of one-dimensional NumPy arrays works just like the slicing of standard Python lists. Let's define an array containing the numbers 0, 1, 2, and so on up to and including 8. We can select a part of the array from index 3 up to, but not including, index 7, which extracts the elements 3 through 6:
In: a = np.arange(9)
In: a[3:7]
Out: array([3, 4, 5, 6])
We can choose elements from index 0 up to, but not including, index 7 with an increment of 2:
In: a[:7:2]
Out: array([0, 2, 4, 6])
Just as in Python, we can use a negative step to reverse the array:
In: a[::-1]
Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
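The slicing syntax is the same as for lists, but one difference is worth knowing: a basic slice of a NumPy array is a view of the original data, whereas a slice of a list is a copy. A minimal sketch, assuming NumPy is imported as np and using illustrative names:

```python
import numpy as np

a = np.arange(9)
lst = list(range(9))

part = a[3:7]        # view onto a
part[0] = -1         # also changes a[3]

lst_part = lst[3:7]  # independent copy of part of the list
lst_part[0] = -1     # lst is unchanged

print(a.tolist())    # [0, 1, 2, -1, 4, 5, 6, 7, 8]
print(lst)           # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

If an independent array is needed, calling copy() on the slice avoids modifying the original.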
Manipulating array shapes
We have already learned about the reshape() function. Another recurring task is the flattening of arrays. Flattening in this setting means transforming a multidimensional array into a one-dimensional array. Let us create an array b that we shall use for the examples that follow:
In: b = np.arange(24).reshape(2,3,4)
In: print(b)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
We can manipulate array shapes using the following functions:
- Ravel: We can accomplish this with the ravel() function as follows:
In: b
Out:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In: b.ravel()
Out:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
- Flatten: The appropriately named function, flatten(), does the same as ravel(). However, flatten() always allocates new memory, whereas ravel() gives back a view of the array whenever possible. This means that modifying the result of ravel() may modify the original array, while the result of flatten() is an independent copy:
In: b.flatten()
Out:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
- Setting the shape with a tuple: Besides the reshape() function, we can also define the shape directly with a tuple, as follows:
In: b.shape = (6,4)
In: b
Out:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
As you can see, the preceding code alters the array immediately. Now, we have a 6 x 4 array.
- Transpose: In linear algebra, it is common to transpose matrices. Transposing is a way to transform data. For a two-dimensional table, transposing means that rows become columns and columns become rows. We can do this too by using the following code:
In: b.transpose()
Out:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])
- Resize: The resize() method works just like the reshape() method, but modifies the array it operates on:
In: b.resize((2,12))
In: b
Out:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
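The difference between ravel() and flatten() described in the list above can be observed directly. The following is a sketch under the assumption that NumPy is imported as np; np.shares_memory is used only to make the view/copy distinction visible:

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()     # a view here, because b is contiguous in memory
f = b.flatten()   # always a copy

r[0] = 100        # visible in b
f[1] = 200        # not visible in b

print(np.shares_memory(b, r), np.shares_memory(b, f))  # True False
print(b[0, 0, 0], b[0, 0, 1])                          # 100 1

# When a view is not possible, ravel() falls back to copying.
t = b.transpose().ravel()
print(np.shares_memory(b, t))                          # False
```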
Stacking arrays
Arrays can be stacked horizontally, depth-wise, or vertically. For this purpose, we can use the vstack(), dstack(), hstack(), column_stack(), row_stack(), and concatenate() functions. To start with, let's set up some arrays:
In: a = np.arange(9).reshape(3,3)
In: a
Out:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In: b = 2 * a
In: b
Out:
array([[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]])
As mentioned previously, we can stack arrays using the following techniques:
- Horizontal stacking: Beginning with horizontal stacking, we will form a tuple of ndarrays and hand it to the hstack() function to stack the arrays. This is shown as follows:
In: np.hstack((a, b))
Out:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])
We can attain the same thing with the concatenate() function, which is shown as follows:
In: np.concatenate((a, b), axis=1)
Out:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])
The following diagram depicts horizontal stacking:
- Vertical stacking: With vertical stacking, a tuple is formed again. This time it is given to the vstack() function to stack the arrays. This can be seen as follows:
In: np.vstack((a, b))
Out:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])
The concatenate() function gives the same outcome with the axis parameter set to 0. This is the default value for the axis parameter, as portrayed in the following code:
In: np.concatenate((a, b), axis=0)
Out:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])
Refer to the following figure for vertical stacking:
- Depth stacking: In addition, there is depth-wise stacking, which uses dstack() and, again, a tuple. This entails stacking a list of arrays along the third axis (depth). For example, we could stack 2D arrays of image data on top of each other as follows:
In: np.dstack((a, b))
Out:
array([[[ 0,  0],
        [ 1,  2],
        [ 2,  4]],
       [[ 3,  6],
        [ 4,  8],
        [ 5, 10]],
       [[ 6, 12],
        [ 7, 14],
        [ 8, 16]]])
- Column stacking: The column_stack() function stacks 1D arrays column-wise. This is shown as follows:
In: oned = np.arange(2)
In: oned
Out: array([0, 1])
In: twice_oned = 2 * oned
In: twice_oned
Out: array([0, 2])
In: np.column_stack((oned, twice_oned))
Out:
array([[0, 0],
       [1, 2]])
2D arrays are stacked the way the hstack() function stacks them, as demonstrated in the following lines of code:
In: np.column_stack((a, b))
Out:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])
In: np.column_stack((a, b)) == np.hstack((a, b))
Out:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]], dtype=bool)
Yes, you guessed it right! We compared two arrays with the == operator.
- Row stacking: NumPy, naturally, also has a function that does row-wise stacking. It is named row_stack() and, for 1D arrays, it just stacks the arrays in rows into a 2D array (in NumPy 2.0 and later, row_stack() is deprecated in favor of vstack()):
In: np.row_stack((oned, twice_oned))
Out:
array([[0, 1],
       [0, 2]])
The row_stack() function results for 2D arrays are equal to the vstack() function results:
In: np.row_stack((a, b))
Out:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])
In: np.row_stack((a,b)) == np.vstack((a, b))
Out:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)
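The stacking functions in the list above differ mainly in the axis along which the arrays are joined, which shows up in the resulting shapes. The sketch below assumes NumPy is imported as np; it also uses np.stack, which is not covered in the article, to express depth stacking in terms of an explicit axis:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

h = np.hstack((a, b))   # shape (3, 6): side by side
v = np.vstack((a, b))   # shape (6, 3): one on top of the other
d = np.dstack((a, b))   # shape (3, 3, 2): along a new third axis

print(h.shape, v.shape, d.shape)

# The same results expressed with an explicit axis argument.
print(np.array_equal(h, np.concatenate((a, b), axis=1)))  # True
print(np.array_equal(v, np.concatenate((a, b), axis=0)))  # True
print(np.array_equal(d, np.stack((a, b), axis=2)))        # True
```

Using np.array_equal() gives a single True or False instead of the element-wise array produced by the == operator.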
Splitting NumPy arrays
Arrays can be split vertically, horizontally, or depth-wise. The functions involved are hsplit(), vsplit(), dsplit(), and split(). We can split arrays either into arrays of the same shape or indicate the location after which the split should happen. Let's look at each of the functions in detail:
- Horizontal splitting: The following code splits a 3 x 3 array on its horizontal axis into three parts of the same size and shape (see splitting.py in this book's code bundle):
In: a
Out:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In: np.hsplit(a, 3)
Out:
[array([[0],
       [3],
       [6]]),
 array([[1],
       [4],
       [7]]),
 array([[2],
       [5],
       [8]])]
Compare it with a call of the split() function, with an additional argument, axis=1:
In: np.split(a, 3, axis=1)
Out:
[array([[0],
       [3],
       [6]]),
 array([[1],
       [4],
       [7]]),
 array([[2],
       [5],
       [8]])]
- Vertical splitting: vsplit() splits along the vertical axis:
In: np.vsplit(a, 3)
Out: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]
The split() function, with axis=0, also splits along the vertical axis:
In: np.split(a, 3, axis=0)
Out: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]
- Depth-wise splitting: The dsplit() function, unsurprisingly, splits depth-wise. We will require an array of rank 3 to begin with:
In: c = np.arange(27).reshape(3, 3, 3)
In: c
Out:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],
       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],
       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
In: np.dsplit(c, 3)
Out:
[array([[[ 0],
        [ 3],
        [ 6]],
       [[ 9],
        [12],
        [15]],
       [[18],
        [21],
        [24]]]),
 array([[[ 1],
        [ 4],
        [ 7]],
       [[10],
        [13],
        [16]],
       [[19],
        [22],
        [25]]]),
 array([[[ 2],
        [ 5],
        [ 8]],
       [[11],
        [14],
        [17]],
       [[20],
        [23],
        [26]]])]
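The examples in the list above all split into equal parts. Splitting at chosen locations works by passing a list of indices instead of a number of sections. A minimal sketch, assuming NumPy is imported as np and using illustrative variable names:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Split at column index 1: left is a[:, :1], right is a[:, 1:].
left, right = np.hsplit(a, [1])
print(left.shape, right.shape)    # (3, 1) (3, 2)

# Several split points along axis 0: rows [0:1], [1:2], and [2:].
parts = np.split(a, [1, 2], axis=0)
print([p.tolist() for p in parts])
# [[[0, 1, 2]], [[3, 4, 5]], [[6, 7, 8]]]
```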
NumPy array attributes
Let's learn more about the NumPy array attributes with the help of an example. Let us create an array b that we shall use for the examples that follow:
In: b = np.arange(24).reshape(2, 12)
In: b
Out:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
Besides the shape and dtype attributes, ndarray has a number of other properties, as shown in the following list:
- ndim gives the number of dimensions, as shown in the following code snippet:
In: b.ndim
Out: 2
- size holds the count of elements. This is shown as follows:
In: b.size
Out: 24
- itemsize returns the count of bytes for each element in the array, as shown in the following code snippet:
In: b.itemsize
Out: 8
- If you require the full count of bytes the array needs, you can have a look at nbytes. This is just the product of the itemsize and size properties:
In: b.nbytes
Out: 192
In: b.size * b.itemsize
Out: 192
- The T property has the same result as the transpose() function, which is shown as follows:
In: b.resize(6,4)
In: b
Out:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
In: b.T
Out:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])
- If the array has a rank of less than 2, we will just get a view of the array:
In: b = np.arange(5)
In: b.ndim
Out: 1
In: b.T
Out: array([0, 1, 2, 3, 4])
- Complex numbers in NumPy are represented by j. For instance, we can produce an array with complex numbers as follows:
In: b = np.array([1.j + 1, 2.j + 3])
In: b
Out: array([ 1.+1.j, 3.+2.j])
- The real property returns to us the real part of the array, or the array itself if it only holds real numbers:
In: b.real
Out: array([ 1., 3.])
- The imag property holds the imaginary part of the array:
In: b.imag
Out: array([ 1., 2.])
- If the array holds complex numbers, then the data type will automatically be complex as well:
In: b.dtype
Out: dtype('complex128')
In: b.dtype.str
Out: '<c16'
- The flat property gives back a numpy.flatiter object. This is the only means to get a flatiter object; we do not have access to a flatiter constructor. The flat iterator enables us to loop through an array as if it were a flat array, as shown in the following code snippet:
In: b = np.arange(4).reshape(2,2)
In: b
Out:
array([[0, 1],
       [2, 3]])
In: f = b.flat
In: f
Out: <numpy.flatiter object at 0x103013e00>
In: for item in f: print(item)
Out:
0
1
2
3
It is possible to directly obtain an element with the flatiter object:
In: b.flat[2]
Out: 2
Also, you can obtain multiple elements as follows:
In: b.flat[[1,3]]
Out: array([1, 3])
The flat property can be set. Setting the value of the flat property leads to overwriting the values of the entire array:
In: b.flat = 7
In: b
Out:
array([[7, 7],
       [7, 7]])
We can also set selected elements as follows:
In: b.flat[[1,3]] = 1
In: b
Out:
array([[7, 1],
       [7, 1]])
The next diagram illustrates various properties of ndarray:
Converting arrays
We can convert a NumPy array to a Python list with the tolist() function. The following is a brief explanation:
- Convert to a list:
In: b = np.array([1.j + 1, 2.j + 3])
In: b
Out: array([ 1.+1.j, 3.+2.j])
In: b.tolist()
Out: [(1+1j), (3+2j)]
- The astype() function transforms the array to an array of the specified data type:
In: b
Out: array([ 1.+1.j, 3.+2.j])
In: b.astype(int)
/usr/local/lib/python3.5/site-packages/ipykernel/__main__.py:1: ComplexWarning: Casting complex values to real discards the imaginary part
…
Out: array([1, 3])
In: b.astype('complex')
Out: array([ 1.+1.j, 3.+2.j])
We drop the imaginary part when casting from the complex type to int, which is why NumPy issues the ComplexWarning. The astype() function also accepts the name of a data type as a string.
The second call, b.astype('complex'), does not display a warning because the target data type can represent complex values.
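One way to avoid the warning altogether is to take the real part explicitly before casting. This is a sketch of that approach rather than something the article prescribes; it assumes NumPy is imported as np, and the names as_int and as_complex64 are illustrative:

```python
import numpy as np

b = np.array([1.j + 1, 2.j + 3])

# Taking the real part first makes the loss of the imaginary part explicit,
# so the subsequent cast is an ordinary float -> int conversion.
as_int = b.real.astype(int)

# astype() accepts type objects as well as type names given as strings.
as_complex64 = b.astype('complex64')

print(as_int.tolist())       # [1, 3]
print(as_complex64.dtype)    # complex64
print(b.tolist())            # [(1+1j), (3+2j)]: astype() returns a new array
```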
Summary
In this article, we learned a lot about the NumPy basics: data types and arrays. Arrays have various properties that describe them. You learned that one of these properties is the data type, which, in NumPy, is represented by a full-fledged object.
NumPy arrays can be sliced and indexed efficiently, much like standard Python lists, and have the extra ability to work with multiple dimensions.
The shape of an array can be modified in multiple ways, such as stacking, resizing, reshaping, and splitting.