You're reading from  Mastering Numerical Computing with NumPy

Product type: Book
Published in: Jun 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788993357
Edition: 1st Edition
Authors (3):
Umit Mert Cakmak

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.

Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). Postdoctoral, Tiago has worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.

Mert Cuhadaroglu

Mert Cuhadaroglu is a BI Developer in EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.

NumPy, SciPy, Pandas, and Scikit-Learn

By now, you should be able to write small implementations with NumPy. Earlier chapters occasionally used other libraries in their examples; in this chapter, we step back and look at the libraries that surround NumPy and that you can use alongside it in your projects.

This chapter covers how other Python libraries complement NumPy. We will look at the following topics:

  • NumPy and SciPy
  • NumPy and pandas
  • SciPy and scikit-learn

NumPy and SciPy

Until now, you have seen numerous examples of NumPy usage and only a few of SciPy. NumPy provides an array data type, which allows you to perform various array operations, such as sorting and reshaping.
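
As a quick reminder of the array operations mentioned above, here is a minimal sketch of sorting and reshaping. The values are made up for illustration only.

```python
import numpy as np

# A small one-dimensional array of illustrative values.
values = np.array([7, 2, 9, 4, 1, 8])

# np.sort returns a sorted copy; the original array is left unchanged.
sorted_values = np.sort(values)

# reshape arranges the same six elements as a 2 x 3 array.
grid = sorted_values.reshape(2, 3)

print(sorted_values)  # [1 2 4 7 8 9]
print(grid.shape)     # (2, 3)
```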

NumPy also includes some numerical algorithms for tasks such as calculating norms, eigenvalues, and eigenvectors. However, if numerical algorithms are your focus, you should ideally use SciPy, as it includes a more comprehensive set of algorithms, as well as more recent versions of them. SciPy also has many useful subpackages for specific kinds of analysis.
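
The following sketch shows what this looks like in practice: a norm and an eigendecomposition computed with NumPy, followed by the same eigenvalues from scipy.linalg. The matrix is an arbitrary symmetric example, and eigh is just one of many routines in scipy.linalg, chosen here because the matrix is symmetric.

```python
import numpy as np
from scipy import linalg

# A small symmetric matrix chosen purely for illustration.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# NumPy: Frobenius norm and a general eigendecomposition.
fro_norm = np.linalg.norm(a)
np_eigenvalues, np_eigenvectors = np.linalg.eig(a)

# SciPy: eigh targets symmetric/Hermitian matrices and returns
# the eigenvalues in ascending order.
sp_eigenvalues, sp_eigenvectors = linalg.eigh(a)

print(fro_norm)                 # sqrt(4 + 1 + 1 + 4) = sqrt(10)
print(np.sort(np_eigenvalues))  # same eigenvalues as sp_eigenvalues
print(sp_eigenvalues)
```

Both libraries agree on the result here; the difference is mainly in how many routines and options each one offers.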

The following list will give you an overall idea of the subpackages:

  • cluster: This subpackage includes clustering algorithms. It has two submodules, vq and hierarchy. The vq module provides functions for k-means clustering. The hierarchy module includes functions for hierarchical clustering.
  • fftpack: This...
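
To make the cluster subpackage more concrete, here is a minimal sketch that uses both of its submodules on the same toy data. The points and the initial centroids are invented for this example; passing explicit starting centroids to kmeans is a choice made here to keep the run deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 2-D points (made-up data).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# vq module: k-means starting from two explicit initial centroids.
initial_centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
codebook, distortion = kmeans(points, initial_centroids)

# Assign every point to its nearest centroid in the codebook.
kmeans_labels, distances = vq(points, codebook)

# hierarchy module: build a Ward linkage tree, then cut it into 2 clusters.
tree = linkage(points, method="ward")
hier_labels = fcluster(tree, t=2, criterion="maxclust")

print(kmeans_labels)
print(hier_labels)
```

The two methods number their clusters differently, so compare which points share a label rather than the label values themselves.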

NumPy and pandas

When you think about it, NumPy is a fairly low-level array-manipulation library, and many other scientific Python libraries are built on top of it.

One of these libraries is pandas, a high-level data-manipulation library. When you explore a dataset, you usually perform operations such as calculating descriptive statistics, grouping by a certain characteristic, and merging. The pandas library provides many convenient functions for these operations.
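
The three operations just mentioned can be sketched as follows. The two small tables and their column names (patient_id, group, bmi, age) are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical measurements, invented for this example.
measurements = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "group": ["a", "a", "b", "b"],
    "bmi": [22.0, 26.0, 30.0, 28.0],
})
ages = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [34, 51, 47, 62],
})

# Descriptive statistics: count, mean, std, min, quartiles, and max.
summary = measurements["bmi"].describe()

# Grouping: mean BMI within each group.
mean_bmi = measurements.groupby("group")["bmi"].mean()

# Merging: join the two tables on their shared key column.
merged = measurements.merge(ages, on="patient_id")

print(summary)
print(mean_bmi)
print(merged)
```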

Let's use the diabetes dataset in this example. The diabetes dataset in sklearn.datasets is standardized: each feature column has zero mean and unit L2 norm.

The dataset contains 442 records with 10 features: age, sex, body mass index, average blood pressure, and six blood serum measurements.

The target represents the disease progression after these baseline measures are taken. You can look at the data...
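
One possible way to look at the data is to wrap it in a pandas DataFrame, as in the sketch below. This is an illustration rather than the book's own listing; the variable names df and features are chosen here, and the tolerances in the checks are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

diabetes = datasets.load_diabetes()

# Wrap the NumPy feature array in a DataFrame with named columns.
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target

print(df.shape)  # (442, 11): 10 features plus the target column
print(df.head())

# Check the scaling described above on the feature columns only.
features = df[diabetes.feature_names].to_numpy()
column_means = features.mean(axis=0)
column_norms = np.sqrt((features ** 2).sum(axis=0))

print(np.allclose(column_means, 0.0, atol=1e-6))
print(np.allclose(column_norms, 1.0, atol=1e-4))
```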

SciPy and scikit-learn

Scikit-learn is one of the SciKit libraries (add-on toolkits for SciPy), focused on machine learning, and it's built on top of SciPy. You can use it to perform regression analysis, as you've done in previous chapters. Take a look at this code:

from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset bundled with scikit-learn
diabetes = datasets.load_diabetes()

# Fit an ordinary least-squares linear regression on all of the data
linreg = linear_model.LinearRegression()
linreg.fit(diabetes.data, diabetes.target)

# Compute the predictions once and reuse them for both metrics
predictions = linreg.predict(diabetes.data)

# You can inspect the results by looking at evaluation metrics
print('Coeff.: \n', linreg.coef_)
print('MSE: {}'.format(mean_squared_error(diabetes.target, predictions)))
print('Variance Score: {}'.format(r2_score(diabetes.target, predictions)))
...
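
To illustrate how the two libraries relate, the sketch below solves the same least-squares problem directly with scipy.linalg.lstsq and compares the result with scikit-learn's fit. It makes no assumption about how LinearRegression is implemented internally; it only checks that both routes reach the same ordinary least-squares solution. The names design, coef, and intercept are chosen for this example.

```python
import numpy as np
from scipy import linalg
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Reference fit with scikit-learn.
linreg = linear_model.LinearRegression().fit(X, y)

# Same model solved directly: append a column of ones for the intercept
# and solve the least-squares problem with SciPy.
design = np.column_stack([X, np.ones(len(X))])
solution, residues, rank, singular_values = linalg.lstsq(design, y)
coef, intercept = solution[:-1], solution[-1]

print(np.allclose(coef, linreg.coef_, rtol=1e-6, atol=1e-6))
print(np.isclose(intercept, linreg.intercept_, rtol=1e-6))
```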

Summary

In this chapter, you practiced NumPy, SciPy, pandas, and scikit-learn, using various examples, mainly for machine learning tasks. When you use Python data science libraries, there is usually more than one way of performing a given task, and it usually helps to know more than one method.

You can use alternatives either for better implementations or for the sake of comparison. While trying different methods for a given task, you may find options that let you further customize the implementation, or you may simply observe some performance improvements.

The aim of this chapter was to show you these different options, and how flexible the Python language is because of its rich ecosystem of analytics libraries. In the next chapter, you will learn more about NumPy internals, such as how NumPy manages data structures and memory, code profiling, and also tips for...
