You're reading from  NumPy Essentials

Product type: Book
Published in: Apr 2016
Reading Level: Intermediate
ISBN-13: 9781784393670
Edition: 1st Edition
Authors (3):
Leo (Liang-Huan) Chin

Leo (Liang-Huan) Chin is a data engineer with more than 5 years of experience in the field of Python. He works for Gogoro smart scooter, Taiwan, where his job entails discovering new and interesting biking patterns. His previous work experience includes ESRI, California, USA, which focused on spatial-temporal data mining. He loves data, analytics, and the stories behind data and analytics. He received an MA degree of GIS in geography from State University of New York, Buffalo. When Leo isn't glued to a computer screen, he spends time on photography, traveling, and exploring some awesome restaurants across the world. You can reach Leo at http://chinleock.github.io/portfolio/.

Tanmay Dutta

Tanmay Dutta is a seasoned programmer with expertise in programming languages such as Python, Erlang, C++, Haskell, and F#. He has extensive experience in developing numerical libraries and frameworks for investment banking businesses. He was also instrumental in the design and development of a risk framework in Python (pandas, NumPy, and Django) for a wealth fund in Singapore. Tanmay has a master's degree in financial engineering from Nanyang Technological University, Singapore, and a certification in computational finance from Tepper Business School, Carnegie Mellon University.

Shane Holloway

http://shaneholloway.com/resume/

Chapter 10. Further Reading

NumPy is a powerful scientific module in Python; hopefully, the previous nine chapters have shown you enough to prove this. ndarray is the core of all other Python scientific modules. The best way to use NumPy is to treat numpy.ndarray as the basic data format and combine it with other scientific modules to preprocess, analyze, compute, and export data. In this chapter, our focus is on introducing you to a couple of modules that work with NumPy and can make your work or research more efficient.

In this chapter, we will be covering the following topics:

  • pandas
  • scikit-learn
  • netCDF4
  • SciPy

pandas


pandas is, by far, the most popular data preprocessing module in Python. The way it handles data is very similar to R. Its data frame not only gives you visually appealing printouts of tables, but also allows you to access data in a more intuitive way. If you are not familiar with R, think of working with spreadsheet software such as Microsoft Excel, or with SQL tables, but in a programmatic way. This covers a lot of what pandas does.
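To make the spreadsheet and SQL comparison concrete, here is a minimal sketch of label-based access on a data frame. The column names and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: one row per observation.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])
frame = pd.DataFrame(data, columns=["x", "y"])
frame["group"] = ["a", "b", "a", "b"]

# Column access by label, like picking a spreadsheet column.
y_column = frame["y"]

# Row filtering with a boolean condition, like a SQL WHERE clause.
large = frame[frame["x"] > 2.0]

# Aggregation per group, like a SQL GROUP BY.
group_means = frame.groupby("group")["y"].mean()

print(large)
print(group_means)
```

Each operation returns another pandas object, so selections, filters, and aggregations can be chained before the result is handed back to NumPy.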

You can download and install pandas from its official site at http://pandas.pydata.org/. A more convenient way is to use pip or to install a Python scientific distribution, such as Anaconda.

Remember how we used numpy.genfromtxt() to read the CSV data in Chapter 4, NumPy Core and Libs Submodules? Using pandas to read tables and pass the pre-processed data to ndarray is a better workflow for analytics (simply calling np.array(data_frame) converts a data frame into a multidimensional ndarray). In this section, we are going to...
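The read-then-convert workflow described above can be sketched as follows. This is a minimal illustration rather than the book's own example: the CSV content, column names, and variable names are hypothetical, and an in-memory string stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """station,temp,humidity
A,21.5,0.40
B,19.0,0.55
C,23.1,0.35
"""

# Read the table into a data frame; read_csv also accepts a file path.
data_frame = pd.read_csv(io.StringIO(csv_text))

# Pre-process with pandas: keep only the numeric columns.
numeric = data_frame[["temp", "humidity"]]

# Hand the result to NumPy as a two-dimensional ndarray.
values = np.array(numeric)

print(values.shape, values.dtype)
```

Selecting the numeric columns first keeps the resulting array in a floating-point dtype; converting a frame that mixes strings and numbers would produce an object array instead.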

scikit-learn


Scikits, short for SciPy Toolkits, are add-on packages for SciPy. They provide a wide range of analytics modules, and scikit-learn is one of them; it is by far the most comprehensive machine learning module for Python. scikit-learn provides a simple and efficient way to perform data mining and data analysis, and it has a very active user community.

You can download and install scikit-learn from its official website at http://scikit-learn.org/stable/. If you are using a Python scientific distribution, such as Anaconda, it is included there as well.

Now, it's time for some machine learning using scikit-learn. One of the advantages of scikit-learn is that it provides some sample datasets (demo datasets) for practice. Let's load the diabetes dataset first.

In [1]: from sklearn.datasets import load_diabetes 
In [2]: diabetes = load_diabetes() 
In [3]: diabetes.data 
Out[3]: 
array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226, 
         0.01990842...
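Since the listing above is cut off, here is a minimal sketch of one way to continue from the loaded dataset. The choice of a linear regression model and the 75/25 split are assumptions made for illustration; the excerpt does not show which model is used.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()

# Features and target are plain NumPy ndarrays.
X, y = diabetes.data, diabetes.target
print(type(X), X.shape, y.shape)

# Hold out a quarter of the samples for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit an ordinary least squares model (assumed model choice).
model = LinearRegression().fit(X_train, y_train)

# R^2 score on the held-out samples.
r2 = model.score(X_test, y_test)
print(r2)
```

Because scikit-learn estimators accept and return ndarrays, any array prepared with NumPy or converted from a pandas data frame can be passed to fit() in the same way.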

netCDF4


netCDF4 is the fourth version of the netCDF library. It is implemented on top of HDF5 (Hierarchical Data Format, designed to store and organize large amounts of data), which makes it possible to manage extremely large and complex multidimensional data. The greatest advantage of netCDF4 is that it is a completely portable file format with no limit on the number or size of data objects in a collection; files can be appended to and remain archivable. Many scientific research organizations use it for data storage. Python also has an interface to access and create this type of data format.

You can download and install the module from its official documentation page at http://unidata.github.io/netcdf4-python/, or clone it from its GitHub repository at https://github.com/Unidata/netcdf4-python. It's not included in the standard Python scientific distributions, but it is built on NumPy and can be built with Cython (this is recommended but not required).

For the following example, we are going...

SciPy


SciPy is a well-known Python library for scientific computing; it contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing. It builds on the NumPy array object, and NumPy is part of the whole SciPy stack (remember that we introduced the Scientific Python family in Chapter 1, An Introduction to NumPy). SciPy covers more topics than we can fit into one section, so let's look at an example of image processing (noise removal) to give you some idea of what SciPy can do:

In [1]: from scipy.misc import imread, imsave, ascent 
In [2]: import matplotlib.pyplot as plt 
In [3]: image_data = ascent() 

First, we import three functions from SciPy's miscellaneous routines: imread, imsave, and ascent. In the following example, we use the built-in image ascent, which is a 512 by 512 greyscale image. Of course, you may use your own image; simply call imread('your_image_name...
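The filtering step itself is not shown in this excerpt, so the following is only a minimal sketch of noise removal under stated assumptions. It uses a synthetic gradient image instead of ascent, and scipy.ndimage.median_filter as one common denoising choice; the window size and noise level are arbitrary. Recent SciPy releases no longer provide imread and imsave in scipy.misc, so the sketch avoids them.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Synthetic 64 by 64 greyscale image: a smooth horizontal gradient.
clean = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))

# Add Gaussian noise to simulate a noisy capture.
noisy = clean + rng.normal(scale=20.0, size=clean.shape)

# Replace each pixel with the median of its 5 by 5 neighbourhood.
denoised = ndimage.median_filter(noisy, size=5)


def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a - b) ** 2))


noisy_error = mse(noisy, clean)
denoised_error = mse(denoised, clean)
print(noisy_error, denoised_error)
```

The filter takes an ndarray and returns an ndarray of the same shape, so the result can be plotted with matplotlib or processed further with NumPy.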

Summary


NumPy is certainly the core of scientific computation in Python: many modules are based on it. Although you might sometimes find that NumPy itself has no analytics modules, it gives you a way of reaching out to a wide range of scientific modules.

We hope the last chapter of this book has given you a good idea of how to use these modules with NumPy to make your scripts more efficient (there are still many handy NumPy-based modules we can't cover in this book; just spend an afternoon on GitHub or PyPI, and you may find a handful of them). Last but not least, thank you for spending time with us going through so many functions. Have some fun with NumPy now!
