
You're reading from  Mastering Numerical Computing with NumPy

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788993357
Edition: 1st Edition
Authors (3):
Umit Mert Cakmak

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.

Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). As a postdoctoral researcher, Tiago worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

Mert Cuhadaroglu

Mert Cuhadaroglu is a BI Developer at EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.

Working with multidimensional arrays

This section will give you a brief understanding of multidimensional arrays by going through different matrix operations.

To perform matrix multiplication in NumPy, use dot() (or the @ operator) instead of *, because * multiplies arrays element by element. Let's see some examples:

In [66]: c = np.ones((4, 4))
         c*c
Out[66]: array([[ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.],
                [ 1., 1., 1., 1.]])
In [67]: c.dot(c)
Out[67]: array([[ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.],
                [ 4., 4., 4., 4.]])
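With an array of ones, the two results differ only in magnitude, so the distinction is easier to see with distinct values. The following is a minimal sketch; the matrices a and b are illustrative and not from the original text, and the @ operator assumes Python 3.5 or later:

```python
import numpy as np

# Illustrative 2x2 matrices (hypothetical values, not from the text)
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b      # multiplies matching entries: [[5, 12], [21, 32]]
matrix_prod = a.dot(b)   # rows of a times columns of b: [[19, 22], [43, 50]]
operator_prod = a @ b    # same result as a.dot(b) for 2D arrays

print(elementwise)
print(matrix_prod)
```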

An important topic in working with multidimensional arrays is stacking, in other words, how to merge two arrays. hstack stacks arrays horizontally (column-wise) and vstack stacks arrays vertically (row-wise). You can also split arrays with the hsplit and vsplit functions, along the same directions in which you stacked them:

In [68]: y = np.arange(15).reshape(3,5)
         x = np.arange(10).reshape(2,5)
         new_array = np.vstack((y,x))
         new_array
Out[68]: array([[ 0, 1, 2, 3, 4],
                [ 5, 6, 7, 8, 9],
                [10, 11, 12, 13, 14],
                [ 0, 1, 2, 3, 4],
                [ 5, 6, 7, 8, 9]])
In [69]: y = np.arange(15).reshape(5,3)
         x = np.arange(10).reshape(5,2)
         new_array = np.hstack((y,x))
         new_array
Out[69]: array([[ 0, 1, 2, 0, 1],
                [ 3, 4, 5, 2, 3],
                [ 6, 7, 8, 4, 5],
                [ 9, 10, 11, 6, 7],
                [12, 13, 14, 8, 9]])
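The examples above only stack. As a rough sketch of the reverse direction, the following splits stacked arrays back into their original blocks with np.hsplit and np.vsplit. The names stacked, tall, left, right, top, and bottom are illustrative, and the split index 3 is chosen to match the shapes used above:

```python
import numpy as np

y = np.arange(15).reshape(5, 3)
x = np.arange(10).reshape(5, 2)
stacked = np.hstack((y, x))            # shape (5, 5)

# Split at column index 3 to recover the two original blocks
left, right = np.hsplit(stacked, [3])

tall = np.vstack((np.arange(15).reshape(3, 5),
                  np.arange(10).reshape(2, 5)))
# Split at row index 3: the first three rows, then the remaining two
top, bottom = np.vsplit(tall, [3])

print(left.shape, right.shape)         # (5, 3) (5, 2)
print(top.shape, bottom.shape)         # (3, 5) (2, 5)
```

Passing a list of indices rather than an integer lets the pieces have unequal sizes; an integer argument would require the axis to divide evenly.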

These functions are very useful in machine learning applications, especially when creating datasets. After you stack your arrays, you can check their descriptive statistics by using scipy.stats. Imagine a case where you have 100 records, and each record has 10 features, which means you have a 2D matrix with 100 rows and 10 columns. The following example shows how you can easily get some descriptive statistics for each feature:

In [70]: from scipy import stats
         x = np.random.rand(100,10)
         n, min_max, mean, var, skew, kurt = stats.describe(x)
         new_array = np.vstack((mean,var,skew,kurt,min_max[0],min_max[1]))
         new_array.T
Out[70]: array([[ 5.46011575e-01, 8.30007104e-02, -9.72899085e-02,
                 -1.17492785e+00, 4.07031246e-04, 9.85652100e-01],
                [ 4.79292653e-01, 8.13883169e-02, 1.00411352e-01,
                 -1.15988275e+00, 1.27241020e-02, 9.85985488e-01],
                [ 4.81319367e-01, 8.34107619e-02, 5.55926602e-02,
                 -1.20006450e+00, 7.49534810e-03, 9.86671083e-01],
                [ 5.26977277e-01, 9.33829059e-02, -1.12640661e-01,
                 -1.19955646e+00, 5.74237697e-03, 9.94980830e-01],
                [ 5.42622228e-01, 8.92615897e-02, -1.79102183e-01,
                 -1.13744108e+00, 2.27821933e-03, 9.93861532e-01],
                [ 4.84397369e-01, 9.18274523e-02, 2.33663872e-01,
                 -1.36827574e+00, 1.18986562e-02, 9.96563489e-01],
                [ 4.41436165e-01, 9.54357485e-02, 3.48194314e-01,
                 -1.15588500e+00, 1.77608372e-03, 9.93865324e-01],
                [ 5.34834409e-01, 7.61735119e-02, -2.10467450e-01,
                 -1.01442389e+00, 2.44706226e-02, 9.97784091e-01],
                [ 4.90262346e-01, 9.28757119e-02, 1.02682367e-01,
                 -1.28987137e+00, 2.97705706e-03, 9.98205307e-01],
                [ 4.42767478e-01, 7.32159267e-02, 1.74375646e-01,
                 -9.58660574e-01, 5.52410464e-04, 9.95383732e-01]])
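Each row of the transposed result describes one feature. If SciPy is not available, a subset of the same per-feature statistics can be sketched with NumPy reductions alone. This is an alternative illustration rather than the approach used above; the seeded generator and the name summary are assumptions made here for reproducibility and clarity:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for reproducibility)
x = rng.random((100, 10))        # 100 records, 10 features

# axis=0 reduces over the rows, giving one value per feature (column)
summary = np.vstack((
    x.mean(axis=0),
    x.var(axis=0, ddof=1),       # ddof=1 matches the default of stats.describe
    x.min(axis=0),
    x.max(axis=0),
)).T                             # transpose: one row per feature

print(summary.shape)             # (10, 4)
```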

NumPy has a module named numpy.ma, which is used for masking array elements. It's very useful when you want to mask (ignore) some elements while doing your calculations. When an element is masked, NumPy treats it as invalid and excludes it from computations:

In [71]: import numpy.ma as ma
         x = np.arange(6)
         print(x.mean())
         masked_array = ma.masked_array(x, mask=[1,0,0,0,0,0])
         masked_array.mean()
2.5
Out[71]: 3.0
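The same mask can also be derived from a condition instead of being typed out by hand. The following sketch uses ma.masked_where on the same array, again masking only the first element; the condition x < 1 and the fill value -1 are illustrative choices:

```python
import numpy as np
import numpy.ma as ma

x = np.arange(6)

# Mask every element that satisfies the condition (here only the value 0)
masked = ma.masked_where(x < 1, x)

print(masked.mean())        # 3.0, the mean of 1, 2, 3, 4, 5
print(masked.count())       # 5 unmasked elements
print(masked.filled(-1))    # [-1  1  2  3  4  5], masked slot replaced by -1
```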

In the preceding code (In [71]), you have an array x = [0,1,2,3,4,5]. You mask the first element of the array and then calculate the mean. Wherever the mask is 1 (True), the element at the corresponding index is masked and left out of the calculation, so the mean is taken over 1 through 5. Masking is also very useful for replacing NaN values:

In [72]: x = np.arange(25, dtype = float).reshape(5,5)
         x[x<5] = np.nan
         x
Out[72]: array([[ nan, nan, nan, nan, nan],
                [ 5., 6., 7., 8., 9.],
                [ 10., 11., 12., 13., 14.],
                [ 15., 16., 17., 18., 19.],
                [ 20., 21., 22., 23., 24.]])
In [73]: np.where(np.isnan(x), ma.array(x, mask=np.isnan(x)).mean(axis=0), x)
Out[73]: array([[ 12.5, 13.5, 14.5, 15.5, 16.5],
                [ 5. , 6. , 7. , 8. , 9. ],
                [ 10. , 11. , 12. , 13. , 14. ],
                [ 15. , 16. , 17. , 18. , 19. ],
                [ 20. , 21. , 22. , 23. , 24. ]])
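An equivalent result can be obtained without a masked array by using np.nanmean, which skips NaN entries when averaging. This is offered as an alternative sketch, not as the method used above; col_means and filled are illustrative names:

```python
import numpy as np

x = np.arange(25, dtype=float).reshape(5, 5)
x[x < 5] = np.nan                        # the first row becomes NaN

col_means = np.nanmean(x, axis=0)        # per-column mean, skipping NaN
# col_means has shape (5,) and is broadcast across the rows of x
filled = np.where(np.isnan(x), col_means, x)

print(filled[0])                         # [12.5 13.5 14.5 15.5 16.5]
```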

In the preceding code (In [72]), we set the first five elements to nan by indexing with a Boolean condition: x[x<5] selects the elements whose values are less than 5, which here are 0, 1, 2, 3, and 4. Then, in In [73], we overwrite these values with the mean of each column (excluding the nan values). Many other array functions can help make your code more concise:

Method          Description
np.concatenate  Join a sequence of arrays along an existing axis
np.repeat       Repeat the elements of an array along a specified axis
np.delete       Return a new array with the specified subarrays deleted along an axis
np.insert       Insert values along a given axis before the specified indices
np.unique       Find the unique values in an array
np.tile         Construct an array by repeating a given input a given number of times
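As a quick illustration of the functions listed above, the following sketch applies each one to small arrays. The inputs a and b and the result names are hypothetical, chosen only to keep the outputs easy to check by hand:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

joined = np.concatenate((a, b), axis=0)          # [[1, 2], [3, 4], [5, 6]]
repeated = np.repeat(a, 2, axis=1)               # [[1, 1, 2, 2], [3, 3, 4, 4]]
removed = np.delete(a, 0, axis=0)                # [[3, 4]], row 0 dropped
inserted = np.insert(a, 1, [9, 9], axis=0)       # [[1, 2], [9, 9], [3, 4]]
uniques = np.unique(np.array([3, 1, 3, 2, 1]))   # [1, 2, 3], sorted
tiled = np.tile(np.array([1, 2]), 3)             # [1, 2, 1, 2, 1, 2]
```

None of these functions modify their input; each returns a new array, so a is unchanged after the calls.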