NumPy Essentials

Product type: Book
Published in: Apr 2016
ISBN-13: 9781784393670
Pages: 156
Edition: 1st Edition
Authors (3): Leo (Liang-Huan) Chin, Tanmay Dutta, Shane Holloway

Table of Contents (16) Chapters

NumPy Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
An Introduction to NumPy
The NumPy ndarray Object
Using NumPy Arrays
NumPy Core and Libs Submodules
Linear Algebra in NumPy
Fourier Analysis in NumPy
Building and Distributing NumPy Code
Speeding Up NumPy with Cython
Introduction to the NumPy C-API
Further Reading

Chapter 10. Further Reading

NumPy is a powerful scientific module in Python; hopefully, the previous nine chapters have shown you enough to prove this. ndarray is at the core of many other Python scientific modules. The best way to use NumPy is to treat numpy.ndarray as the basic data format and combine it with other scientific modules to preprocess, analyze, compute, and export data. In this chapter, we introduce a few modules that work with NumPy and can make your work or research more efficient.

In this chapter, we will be covering the following topics:

  • pandas
  • scikit-learn
  • netCDF4
  • SciPy

pandas


pandas is, by far, the most popular data preprocessing module in Python. The way it handles data is very similar to R. Its data frame not only gives you visually appealing printouts of tables, but also lets you access data in a more intuitive way. If you are not familiar with R, think of working with spreadsheet software such as Microsoft Excel, or with SQL tables, but programmatically. This covers much of what pandas does.

You can download and install pandas from its official site at http://pandas.pydata.org/. A more convenient way is to use pip or to install a Python scientific distribution, such as Anaconda.

Remember how we used numpy.genfromtxt() to read the csv data in Chapter 4, NumPy Core and Libs Submodules? In practice, using pandas to read tables and then passing the pre-processed data to an ndarray is often a better workflow for analytics (simply calling np.array(data_frame) converts a data frame into a multidimensional ndarray). In this section, we are going to...
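The workflow described above can be sketched as follows. This is only a minimal illustration: the CSV content, the column names, and the variable names are hypothetical, and an in-memory string stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """station,temperature,humidity
A,21.5,0.40
B,19.0,0.55
C,,0.61
"""

# Read the table; the empty temperature field is parsed as NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Pre-process in pandas: drop incomplete rows and keep the numeric columns.
clean = df.dropna()[["temperature", "humidity"]]

# Hand the result to NumPy as a two-dimensional ndarray.
values = np.array(clean)
print(values.shape, values.dtype)
```

Row C is dropped because its temperature is missing, so only complete rows reach the NumPy side.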

scikit-learn


Scikit is short for SciPy Toolkits, which are add-on packages for SciPy. They provide a wide range of analytics modules, and scikit-learn is one of them; it is by far the most comprehensive machine learning module for Python. scikit-learn provides a simple and efficient way to perform data mining and data analysis, and it has a very active user community.

You can download and install scikit-learn from its official website at http://scikit-learn.org/stable/. If you are using a Python scientific distribution, such as Anaconda, it is already included.

Now, it's time for some machine learning using scikit-learn. One of the advantages of scikit-learn is that it provides some sample datasets (demo datasets) for practice. Let's load the diabetes dataset first.

In [1]: from sklearn.datasets import load_diabetes 
In [2]: diabetes = load_diabetes() 
In [3]: diabetes.data 
Out[3]: 
array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226, 
         0.01990842...
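As a minimal sketch of what can be done with this dataset, the following fits an ordinary linear regression to it. The choice of LinearRegression is an assumption made for illustration and is not necessarily the model used in this chapter; the point is that the data and target attributes are plain NumPy arrays that scikit-learn estimators accept directly.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes = load_diabetes()

# Both the feature matrix and the target vector are NumPy ndarrays.
X, y = diabetes.data, diabetes.target
print(type(X), X.shape, y.shape)

# Fit an ordinary least-squares model (illustrative choice of estimator).
model = LinearRegression()
model.fit(X, y)

# Coefficient of determination (R^2) on the training data.
r2 = model.score(X, y)
print(r2)
```

Scoring on the training data only shows that the pieces fit together; a real evaluation would hold out a test set.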

netCDF4


netCDF4 is the fourth version of the netCDF library. It is implemented on top of HDF5 (Hierarchical Data Format, designed to store and organize large amounts of data), which makes it possible to manage extremely large and complex multidimensional data. The greatest advantage of netCDF4 is that it is a completely portable file format with no limit on the number or size of data objects in a collection, and its files are both appendable and archivable. Many scientific research organizations use it for data storage. Python also has an interface to access and create this type of data format.

You can download and install the module from its official documentation page at http://unidata.github.io/netcdf4-python/, or clone it from its GitHub repository at https://github.com/Unidata/netcdf4-python. It's not included in the standard Python scientific distributions, but it is built on NumPy and can be built with Cython (this is recommended but not required).

For the following example, we are going...

SciPy


SciPy is a well-known Python library focusing on scientific computing (it contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing). It builds on the NumPy array object, and NumPy is part of the whole SciPy stack (remember that we introduced the Scientific Python family in Chapter 1, An Introduction to NumPy). The SciPy module covers far more topics than we can address in one section, so let's look at one example, image processing (noise removal), to give you some idea of what SciPy can do:

In [1]: from scipy.misc import imread, imsave, ascent 
In [2]: import matplotlib.pyplot as plt 
In [3]: image_data = ascent() 

First, we import three functions from SciPy's miscellaneous routines: imread, imsave, and ascent. In the following example, we use the built-in image ascent, which is a 512 by 512 greyscale image. Of course, you may use your own image; simply call imread('your_image_name...
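The idea of noise removal can be sketched with scipy.ndimage. This is an illustration under stated assumptions rather than the chapter's own listing: a synthetic 64 by 64 ramp stands in for the ascent image so that the example needs no sample data or file I/O, and the Gaussian filter and its sigma value are illustrative choices.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Synthetic greyscale image: a smooth two-dimensional ramp.
ramp = np.linspace(0.0, 100.0, 64)
clean = np.add.outer(ramp, ramp)

# Corrupt it with additive Gaussian noise.
noisy = clean + rng.normal(scale=20.0, size=clean.shape)

# Smooth the noisy image; sigma controls the width of the Gaussian kernel.
denoised = ndimage.gaussian_filter(noisy, sigma=2)


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


mse_noisy = mse(noisy, clean)
mse_denoised = mse(denoised, clean)
print(mse_noisy, mse_denoised)
```

A larger sigma removes more noise but also blurs genuine detail; ndimage.median_filter is a common alternative when edges need to be preserved.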

Summary


NumPy is certainly at the core of scientific computing in Python: many modules are based on it. Although NumPy itself offers few analytics routines, it gives you a way of reaching out to a wide range of scientific modules.

We hope the last chapter of this book has given you a good idea of how to use these modules with NumPy to make your scripts more efficient (there are still many handy NumPy-based modules that we couldn't cover in this book; spend an afternoon on GitHub or PyPI, and you may find a handful of them). Last but not least, thank you for spending time with us going through so many functions. Have some fun with NumPy now!
