
You're reading from  Learning NumPy Array

Product type: Book
Published in: Jun 2014
Reading level: Intermediate
ISBN-13: 9781783983902
Edition: 1st Edition
Author: Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. He enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook, both published by Packt Publishing.

Chapter 7. The Scientific Python Ecosystem

SciPy is built on top of NumPy and adds functionality such as numerical integration, optimization, statistics, and special functions. Historically, NumPy was part of SciPy but was later separated so that other Python libraries could use it. Together, these libraries define the common stack for scientific and numerical analysis. Of course, the stack itself is not set in stone; however, everybody agrees that NumPy is at the center of it all. The examples in this chapter should give you some idea of the power of the scientific Python ecosystem.

In this chapter, we will cover the following topics:

  • Numerical integration

  • Interpolation

  • Using Cython with NumPy

  • Clustering with scikit-learn

  • Detecting corners

  • Comparing NumPy to Blaze

Numerical integration


Numerical integration is integration using numerical methods instead of analytical methods. SciPy has a numerical integration package, scipy.integrate, which has no equivalent in NumPy. The quad function can integrate a one-variable function between two points. These points can be at infinity.

Note

The quad function uses the tried-and-tested QUADPACK Fortran library under the hood.

The Gaussian integral is related to the error function, but has no finite limits. It evaluates to the square root of pi. Let's calculate the Gaussian integral with the quad function as shown in the following line of code:

print "Gaussian integral", np.sqrt(np.pi),integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

The quad function returns a tuple containing the value of the integral and an estimate of its absolute error:

Gaussian integral 1.77245385091 (1.7724538509055159, 1.4202636780944923e-08)
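The line above uses the Python 2 print statement and assumes that NumPy and scipy.integrate were already imported as np and integrate. The following is a self-contained Python 3 sketch of the same calculation that unpacks the tuple returned by quad. As an extra check that is not part of the original example, it also integrates over finite limits and compares the result with the error function from scipy.special, using the identity that the integral of exp(-x**2) from 0 to 1 equals sqrt(pi)/2 * erf(1).

```python
import numpy as np
from scipy import integrate, special

# quad returns (value, estimated absolute error); the limits may be infinite.
value, abs_err = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print("Gaussian integral", np.sqrt(np.pi), value, abs_err)

# Finite limits: integral of exp(-x**2) from 0 to 1 equals sqrt(pi)/2 * erf(1).
partial, _ = integrate.quad(lambda x: np.exp(-x**2), 0, 1)
expected = np.sqrt(np.pi) / 2 * special.erf(1)
print("Integral from 0 to 1", partial, expected)
```

Unpacking the tuple makes it explicit which number is the result and which is only QUADPACK's error estimate.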

Interpolation


Interpolation predicts values within a range based on observations. For instance, suppose there is a relationship between two variables x and y, and we have a set of observed x-y pairs. In this scenario, we could try to predict the y values for a range of x values that starts at the lowest observed x value and ends at the highest observed x value. The scipy.interpolate package interpolates a function based on experimental data. Its interp1d class can create a linear or cubic interpolation function. By default, a linear interpolation function is constructed; if the kind parameter is set to "cubic", a cubic interpolation function is created instead. The interp2d class works in the same way but in two dimensions.

We will create data points using a sinc function and then add some random noise to it. After that, we will do a linear and cubic interpolation and plot the results as follows:

  1. Create the data points and add noise as follows:

    x = np.linspace(-18, 18, 36...
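The steps of this example are truncated here. A minimal sketch of the same idea follows; the number of points, the x range, the noise level, and the random seed are all hypothetical choices, not the chapter's values, and plotting is left out. Note that recent SciPy releases document interp1d as legacy API, although it is still available.

```python
import numpy as np
from scipy import interpolate

# Hypothetical sample data: 30 points of sin(x)/x plus a little Gaussian noise.
# np.sinc is the normalized sinc, so np.sinc(x / np.pi) equals sin(x) / x.
rng = np.random.default_rng(42)
x = np.linspace(-10, 10, 30)
y = np.sinc(x / np.pi) + rng.normal(0.0, 0.05, x.size)

linear = interpolate.interp1d(x, y)               # kind="linear" is the default
cubic = interpolate.interp1d(x, y, kind="cubic")  # cubic spline through the points

# Predict only inside the observed range [x.min(), x.max()].
xnew = np.linspace(x.min(), x.max(), 200)
y_linear = linear(xnew)
y_cubic = cubic(xnew)
```

Both interpolants pass exactly through the observed points; they differ only in how they fill the gaps. By default, interp1d raises a ValueError for x values outside the observed range, which matches the definition of interpolation given above.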

Using Cython with NumPy


Cython is a relatively young programming language based on Python. The difference is that with Cython we can optionally declare static types for variables in the code. Cython is a compiled language that generates CPython extension modules. Besides providing performance improvements, a major use of Cython is interfacing existing C/C++ software with Python.

We can integrate Cython and NumPy code in the same way that we can integrate Cython and Python code. Let's go through an example that analyses the ratio of up days (days on which the close is higher than the previous day's close) for a stock. We will apply the formula for the binomial proportion confidence interval (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval), which indicates how significant the ratio is.

  1. Write a .pyx file.

    The .pyx files contain Cython code. Basically, Cython code is standard Python code with optional static type declarations added for variables. Let's write a .pyx file that contains a function that calculates...
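The .pyx listing is truncated here. As a hedged illustration of the calculation only, the following plain Python/NumPy sketch computes the up-day ratio and a confidence interval. It assumes the normal-approximation (Wald) interval, one of several formulas on the linked page; the chapter's code may use a different variant. The function name up_days_ratio_ci is hypothetical. Because Cython compiles ordinary Python, this code could be placed in a .pyx file as-is; the static type declarations that make Cython fast are deliberately left out.

```python
import numpy as np


def up_days_ratio_ci(close, z=1.96):
    """Hypothetical helper: up-day ratio and its Wald confidence interval.

    close: sequence of closing prices. z=1.96 gives roughly 95% coverage.
    Returns (ratio, lower_bound, upper_bound).
    """
    close = np.asarray(close, dtype=float)
    ups = np.diff(close) > 0          # True where the close beat the previous close
    n = ups.size                      # number of day-to-day changes
    p = ups.mean()                    # fraction of up days
    half_width = z * np.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Usage with a tiny hypothetical price series: 3 of the 4 changes are up days.
print(up_days_ratio_ci([1.0, 2.0, 1.0, 2.0, 3.0]))
```

With only four observations, the upper bound in this usage example exceeds 1. Such out-of-range bounds for small samples are one reason the linked page also describes alternatives such as the Wilson score interval.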

Clustering stocks with scikit-learn


Scikit-learn is an open source machine learning library. Clustering is a type of machine learning algorithm that aims to group items based on their similarity.

Note

A legion of scikits exists. These are all open source scientific Python projects. For a list of scikits, please refer to https://scikits.appspot.com/scikits.

Clustering is unsupervised, which means that you don't have to provide labeled training examples. The algorithm puts items into buckets based on some measure of distance, so that items that are close to each other end up in the same bucket. In this example, we will cluster the stocks in the Dow Jones Industrial (DJI) Index based on their log returns.

Note

A myriad of clustering algorithms exist, and since this is a rapidly evolving field, new algorithms are invented each year. Given the scope of this book, we cannot touch upon all of them. Interested readers can have a look at https://en.wikipedia.org/wiki/Cluster_analysis.

First, we...
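The rest of this example is truncated here. The following sketch shows the general idea under two stated assumptions: the prices are synthetic (two hypothetical groups of three stocks, each group driven by its own common factor) rather than real DJI quotes, and the algorithm is k-means (sklearn.cluster.KMeans), which may not be the algorithm the chapter uses.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 6 synthetic stocks over 250 trading days. Stocks 0-2
# follow one market factor, stocks 3-5 another, plus small idiosyncratic noise.
rng = np.random.default_rng(0)
n_days = 250
factor_a = rng.normal(0.0, 0.02, n_days)
factor_b = rng.normal(0.0, 0.02, n_days)
daily = np.vstack(
    [factor_a + rng.normal(0.0, 0.002, n_days) for _ in range(3)]
    + [factor_b + rng.normal(0.0, 0.002, n_days) for _ in range(3)]
)
close = 100.0 * np.exp(np.cumsum(daily, axis=1))  # one row of prices per stock

# Log returns: differences of the log prices along the time axis.
log_returns = np.diff(np.log(close), axis=1)

# Each stock is one sample; its features are its daily log returns.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(log_returns)
print(labels)  # stocks with the same label ended up in the same cluster
```

K-means needs the number of clusters up front; for real stocks that number is rarely known, which is one reason other algorithms in scikit-learn are often preferred for this task.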

Detecting corners


Corner detection is a standard technique in computer vision. The scikit-image package, which specializes in image processing, offers a Harris corner detector. This is welcome, because corner detection is fairly complicated. We could implement it ourselves from scratch, but that would violate the cardinal rule of not reinventing the wheel. We will load a sample image from scikit-learn; this is not strictly necessary for this example, and you can use any other image instead.

Note

For more information on corner detection, please refer to https://en.wikipedia.org/wiki/Corner_detection.

You might need to install jpeglib on your system to be able to load the scikit-learn sample image, which is a JPEG file. If you are on Windows, use the installer; otherwise, download the distribution, unpack it, and build it from the top folder with the following commands:

./configure
make
sudo make install

To detect corners of an image, perform the following steps:

  1. Load the sample image.

    Scikit-learn currently...
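The steps are truncated here. The chapter loads a scikit-learn sample image; to keep the following sketch self-contained, it instead assumes a synthetic image, a bright square whose four corners are known in advance. It uses corner_harris and corner_peaks from skimage.feature; the min_distance and threshold_rel values are illustrative choices, not the chapter's settings.

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

# Hypothetical test image: a bright 20x20 square on a dark 60x60 background.
image = np.zeros((60, 60))
image[20:40, 20:40] = 1.0

# Harris response: large positive values at corners, negative values along edges.
response = corner_harris(image)

# Keep local maxima at least 5 pixels apart and above 10% of the strongest response.
corners = corner_peaks(response, min_distance=5, threshold_rel=0.1)
print(corners)  # (row, column) coordinates of the detected corners
```

Setting threshold_rel explicitly matters here: without a relative threshold, flat regions of the response can also qualify as local maxima.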

Comparing NumPy to Blaze


Since we are close to the end of the book, it seems appropriate to discuss the future of NumPy. The future of NumPy is Blaze, a new open source Python numerical library. Blaze is designed to process Big Data better than NumPy ever can. Big Data can be defined in many ways. Here, we will define Big Data as data that cannot be stored in memory or even on a single machine. Usually, the data is distributed among several servers. Blaze should also be able to handle large quantities of streaming data that is never stored.

Note

Blaze can be found at http://blaze.pydata.org/.

Blaze, just like NumPy, allows scientists, analysts, and engineers to quickly write efficient code. Blaze, however, goes a step further and also takes care of the work related to distributing calculations as well as extracting and transforming data from a variety of data source types.

Blaze is centered around general multidimensional array and table abstractions. The classes in Blaze represent different...

Summary


In this chapter, we only scratched the surface of what is possible with the scientific Python ecosystem. We used some of the libraries that are considered, if not part of the common stack, then at least fundamental. We used interpolation and numerical integration provided by SciPy. Two of the dozens of algorithms in scikit-learn were demonstrated. We also saw Cython in action, which is technically a programming language in its own right. Finally, we had a look at Blaze, a library intended to generalize and extend the principles of NumPy in light of recent developments such as Big Data and cloud computing. Blaze and related projects are still in the incubation phase, but we can expect stable software to be produced in the near future. You can refer to http://continuum.io/developer-resources for some of these projects.

Unfortunately, we have come to the end of this book. Given this book's format, that is, its limited number of pages, you should have essential NumPy knowledge and...
