
You're reading from  NumPy Essentials

Product type: Book
Published in: Apr 2016
Reading level: Intermediate
ISBN-13: 9781784393670
Edition: 1st Edition
Authors (3):
Leo (Liang-Huan) Chin

Leo (Liang-Huan) Chin is a data engineer with more than 5 years of experience in the field of Python. He works for Gogoro smart scooter, Taiwan, where his job entails discovering new and interesting biking patterns. His previous work experience includes ESRI, California, USA, which focused on spatial-temporal data mining. He loves data, analytics, and the stories behind data and analytics. He received an MA degree of GIS in geography from State University of New York, Buffalo. When Leo isn't glued to a computer screen, he spends time on photography, traveling, and exploring some awesome restaurants across the world. You can reach Leo at http://chinleock.github.io/portfolio/.

Tanmay Dutta

Tanmay Dutta is a seasoned programmer with expertise in programming languages such as Python, Erlang, C++, Haskell, and F#. He has extensive experience in developing numerical libraries and frameworks for investment banking businesses. He was also instrumental in the design and development of a risk framework in Python (pandas, NumPy, and Django) for a wealth fund in Singapore. Tanmay has a master's degree in financial engineering from Nanyang Technological University, Singapore, and a certification in computational finance from Tepper Business School, Carnegie Mellon University.

Shane Holloway

http://shaneholloway.com/resume/

Chapter 3. Using NumPy Arrays

The beauty of NumPy Arrays is that you can use array indexing and slicing to quickly access your data or perform a computation while keeping the efficiency of C arrays. Plenty of mathematical operations are supported as well. In this chapter, we will take an in-depth look at using NumPy Arrays. After this chapter, you will feel comfortable using NumPy Arrays and the bulk of their functionality.

Here is a list of topics that will be covered in this chapter:

  • Basic operations and the attributes of NumPy Arrays
  • Universal functions (ufuncs) and helper functions
  • Broadcasting rules and shape manipulation
  • Masking NumPy Arrays

Vectorized operations


NumPy operations are vectorized: you apply an operation to the whole array instead of to each element individually. This is not just neat and handy; it also improves performance compared to using loops. In this section, we will experience the power of NumPy vectorized operations. A key idea worth keeping in mind before we start exploring this subject is to always think in terms of entire arrays rather than individual elements; this will help you enjoy learning about NumPy Arrays and their performance. Let's start by doing some simple calculations with scalars and between NumPy Arrays:

In [1]: import numpy as np 
In [2]: x = np.array([1, 2, 3, 4]) 
In [3]: x + 1 
Out[3]: array([2, 3, 4, 5]) 

The value 1 is added to every element in the array at once. This is very different from Python or most other programming languages. The elements in a NumPy Array all have the same dtype; in the preceding example, this is numpy.int (this is either...
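
To make the contrast with looping concrete, here is a minimal sketch that computes the same result twice: once with an explicit Python loop and once with a vectorized expression. The names looped, vectorized, y, and total are illustrative and do not come from the book.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Element-by-element version using an explicit Python loop
looped = np.empty_like(x)
for i in range(len(x)):
    looped[i] = x[i] + 1

# Vectorized version: the addition is applied to the whole array at once
vectorized = x + 1

# Arithmetic between two arrays of the same shape is also element-wise
y = np.array([10, 20, 30, 40])
total = x + y

print(vectorized)                          # [2 3 4 5]
print(total)                               # [11 22 33 44]
print(np.array_equal(looped, vectorized))  # True
```

Both versions give the same values; the vectorized form is shorter and leaves the looping to NumPy's compiled code.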

Universal functions (ufuncs)


NumPy has many universal functions (so-called ufuncs); use them to your advantage to eliminate as many loops as you can and optimize your code. The ufuncs provide good coverage of math, trigonometry, summary statistics, and comparison operations. For a detailed list of ufuncs, refer to the online documentation at http://docs.scipy.org/doc/numpy/reference/ufuncs.html.

Due to the large number of ufuncs in NumPy, we can hardly cover all of them in one chapter. In this section, we only aim to understand how and why NumPy ufuncs should be used.

Getting started with basic ufuncs

Most ufuncs are either unary or binary, which means that they take one or two arguments and apply the operation element-wise; this is the vectorized operation, or NumPy arithmetic operation, that we explained in previous sections. Here are some common ufuncs:

In [21]: x = np.arange(5,10) 
In [22]: np.square(x) 
Out[22]: array([25, 36, 49, 64, 81]) ...
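
As a further illustration of the unary/binary distinction, the following sketch applies a few ufuncs that exist in NumPy (np.square, np.sqrt, np.add, np.maximum, np.greater). The array y and the result names are made up for this example.

```python
import numpy as np

x = np.arange(5, 10)            # array([5, 6, 7, 8, 9])

# Unary ufuncs take one array and work element-wise
squares = np.square(x)          # [25 36 49 64 81]
roots = np.sqrt(squares)        # [5. 6. 7. 8. 9.]

# Binary ufuncs take two arrays (or an array and a scalar)
y = np.array([9, 6, 8, 7, 5])
sums = np.add(x, y)             # same as x + y -> [14 12 15 15 14]
largest = np.maximum(x, y)      # element-wise maximum -> [9 6 8 8 9]

# Comparison ufuncs return a boolean array
greater = np.greater(x, y)      # same as x > y -> [False False False True True]

print(isinstance(np.square, np.ufunc))  # True
```

Arithmetic and comparison operators such as + and > on arrays map to the corresponding ufuncs, so either spelling can be used.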

Broadcasting and shape manipulation


NumPy operations are mostly done element-wise, which normally requires the two arrays in an operation to have the same shape; however, this doesn't mean that NumPy operations can't take two differently shaped arrays (refer to the first example we looked at with scalars). NumPy provides the flexibility to broadcast a smaller array across a larger one. However, we can't broadcast an array to just any shape; it needs to follow certain constraints, which we will cover in this section. One key idea to keep in mind is that broadcasting involves performing meaningful operations over two differently shaped arrays. However, inappropriate broadcasting might lead to an inefficient use of memory that slows down computation.
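
A small sketch of what broadcasting a smaller array across a larger one looks like in practice. The arrays a, row, and col are illustrative choices, not examples taken from the book.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3): rows [0 1 2] and [3 4 5]
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

by_row = a + row    # row is reused for every row of a
by_col = a + col    # col is reused for every column of a
grid = row + col    # both operands are stretched to shape (2, 3)

print(by_row)       # rows: [10 21 32] and [13 24 35]
print(by_col)       # rows: [100 101 102] and [203 204 205]
print(grid.shape)   # (2, 3)
```

In each case, the smaller operand is reused along the missing or length-1 dimension instead of being written out by hand.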

Broadcasting rules

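The general rule for broadcasting is to determine whether the shapes of two arrays are compatible. NumPy compares the shapes dimension by dimension, starting from the trailing (rightmost) one, and a pair of dimensions is compatible when either of the following conditions is met:

  • The two dimensions are equal
  • One of them is 1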

If the preceding conditions...
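
The compatibility check can be sketched in a few lines of Python. This assumes the standard NumPy rule that shapes are compared from the trailing dimension backward and that each pair of dimensions must either match or contain a 1. The helper shapes_compatible is hypothetical and written only for illustration; it is not part of NumPy.

```python
import numpy as np

def shapes_compatible(shape_a, shape_b):
    """Hypothetical helper: return True if two shapes can be broadcast together."""
    # Walk both shapes from the trailing dimension backward; missing leading
    # dimensions are simply skipped, which is equivalent to treating them as 1.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(shapes_compatible((2, 3), (3,)))     # True
print(shapes_compatible((2, 3), (2, 1)))   # True
print(shapes_compatible((2, 3), (2,)))     # False

# NumPy itself raises a ValueError for incompatible shapes
broadcast_failed = False
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError:
    broadcast_failed = True
print(broadcast_failed)                    # True
```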

A boolean mask


Indexing and slicing are quite handy and powerful in NumPy, but with a boolean mask it gets even better! Let's start by creating a boolean array. Note that there is a special kind of array in NumPy named a masked array. That is not what we are talking about here; instead, we are going to explain how to extend indexing and slicing of NumPy Arrays with boolean arrays:

In [58]: x = np.array([1,3,-1, 5, 7, -1]) 
In [59]: mask = (x < 0) 
In [60]: mask 
Out[60]: array([False, False,  True, False, False,  True], dtype=bool) 

We can see from the preceding example that applying the < comparison operator between a NumPy Array and a scalar is still a vectorized operation. The result, which we named mask, is a boolean array of True/False values with the same shape as x, indicating which elements of x meet the criterion:

In [61]: x[mask] = 0 
In [62]: x 
Out[62]: array([1, 3, 0, 5, 7, 0]) 

Using the mask, we gain the ability to access or replace any element value in our...
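
Beyond in-place replacement, a boolean mask can also be used to select, count, or conditionally replace values. The following is a minimal sketch using the same data as above; the result names are illustrative, and np.where is used here as one possible way to replace values without modifying x.

```python
import numpy as np

x = np.array([1, 3, -1, 5, 7, -1])

# Select only the elements that satisfy a condition
negatives = x[x < 0]                 # [-1 -1]

# Combine conditions with & (and) and | (or); the parentheses are required
in_range = x[(x > 0) & (x < 6)]      # [1 3 5]

# True counts as 1, so summing a mask counts the matching elements
count = int(np.sum(x < 0))           # 2

# Build a new array with the replacements, leaving x itself unchanged
cleaned = np.where(x < 0, 0, x)      # [1 3 0 5 7 0]

print(negatives, in_range, count, cleaned)
```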

Helper functions


Besides the help() and dir() functions in Python and the online documentation, NumPy also provides a helper function, numpy.lookfor(), to help you find the function you need (numpy.lookfor() was removed in NumPy 2.0, so this applies to earlier releases). The argument is a string, and it can be a function name or anything related to it. Let's try to find out more about operations related to resize, which we took a look at in an earlier section:

In [71]: np.lookfor('resize') 
Search results for 'resize' 
--------------------------- 
numpy.ma.resize 
    Return a new masked array with the specified size and shape. 
numpy.chararray.resize 
    Change shape and size of array in-place. 
numpy.oldnumeric.ma.resize 
    The original array's total size can be any size. 
numpy.resize 
    Return a new array with the specified shape. 

Summary


In this chapter, we covered the basic operations of NumPy and its ufuncs. We took a look at the huge difference between NumPy operations and Python looping. We also took a look at how broadcasting works and what we should avoid. We tried to understand the concept of masking as well.

The best way to use NumPy Arrays is to eliminate loops as much as you can and use ufuncs in NumPy instead. Keep in mind the broadcasting rules and use them with care. Using slicing and indexing with masking makes your code more efficient. Most importantly, have fun while using it.

In the next few chapters, we will cover the core libraries of NumPy, including date/time and file I/O, to help you extend your NumPy experience.

