You're reading from Learning NumPy Array

Product type: Book
Published in: Jun 2014
Reading level: Intermediate
Publisher: Packt Publishing
ISBN-13: 9781783983902
Edition: 1st Edition
Languages: Python
Tools: NumPy
Concepts: Data Science
Author: Ivan Idris

Chapter 7. The Scientific Python Ecosystem

SciPy is built on top of NumPy. It adds functionality such as numerical integration, optimization, statistics, and special functions. Historically, NumPy was part of SciPy but was then separated so that other Python libraries could use it. Together with a handful of related libraries, NumPy and SciPy define the common stack for scientific and numerical analysis. Of course, the stack itself is not set in stone; however, NumPy is widely regarded as being at the center of it all. The examples in this chapter should give you some idea of the power of the scientific Python ecosystem.

In this chapter, we will cover the following topics:

Numerical integration
Interpolation
Using Cython with NumPy
Clustering with scikit-learn
Detecting corners
Comparing NumPy to Blaze

Numerical integration

Numerical integration is integration using numerical methods instead of analytical methods. SciPy has a numerical integration package, scipy.integrate, which has no equivalent in NumPy. The quad function can integrate a one-variable function between two points. These points can be at infinity.

Note

The quad function uses the tried and tested QUADPACK Fortran library under the hood.

The Gaussian integral is related to the error function, but has no finite limits. It evaluates to the square root of pi. Let's calculate the Gaussian integral with the quad function as shown in the following line of code:

print "Gaussian integral", np.sqrt(np.pi),integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

The return value is a tuple that contains the result of the integration and an estimate of its absolute error:

Gaussian integral 1.77245385091 (1.7724538509055159, 1.4202636780944923e-08)
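
The line above uses the Python 2 print statement and assumes that numpy and scipy.integrate have already been imported. As a minimal, self-contained sketch for Python 3 (the variable names value and abs_error are chosen here for illustration and do not come from the book), the same calculation could look like this:

```python
import numpy as np
from scipy import integrate

# Integrate exp(-x^2) over the whole real line.
# quad returns a tuple: (estimate of the integral, estimate of the absolute error).
value, abs_error = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

# Compare the numerical result with the analytical value sqrt(pi).
print("Gaussian integral", np.sqrt(np.pi), (value, abs_error))
```

Unpacking the tuple makes it easy to check that the difference between the numerical and the analytical value is smaller than the reported error estimate.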

Interpolation

Interpolation predicts values within a range based on observations. For instance, we could have a relationship between two variables x and y and a set of observed x-y pairs. In this scenario, we could try to predict the y value for any x value within a range. This range starts at the lowest x value already observed and ends at the highest x value already observed. The scipy.interpolate package interpolates a function based on experimental data. The interp1d class can create a linear or cubic interpolation function. By default, a linear interpolation function is constructed, but if the kind parameter is set to 'cubic', a cubic interpolation function is created instead. The interp2d class works in the same way but is two dimensional.

We will create data points using a sinc function and then add some random noise to it. After that, we will do a linear and cubic interpolation and plot the results as follows:

Create the data points and add noise as follows:
```
x = np.linspace(-18, 18, 36...
```
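
Since the listing above is cut off, here is a rough sketch of the workflow described in this section, not the book's original code. The number of points, the noise level, and the random seed are assumptions made for illustration, and plotting is left out:

```python
import numpy as np
from scipy import interpolate

# Assumed setup: 36 sample points of a sinc function with a little Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-18, 18, 36)
y = np.sinc(x) + rng.normal(0, 0.05, size=x.size)

# interp1d builds a callable; the default kind is 'linear'.
linear = interpolate.interp1d(x, y)
cubic = interpolate.interp1d(x, y, kind='cubic')

# Evaluate both interpolants on a finer grid inside the observed range.
x_new = np.linspace(-18, 18, 180)
y_linear = linear(x_new)
y_cubic = cubic(x_new)
```

Both interpolants pass through the observed points; they differ only in how they fill the gaps between them. Evaluating outside the observed range raises an error by default, which matches the idea that interpolation stays within the observed x values.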

Using Cython with NumPy

Cython is a relatively young programming language based on Python. The difference is that with Cython we can optionally declare static types for variables in the code. Cython is a compiled language that generates CPython extension modules. Besides providing performance enhancement, a major use of Cython is interfacing already existing C/C++ software with Python.

We can integrate Cython and NumPy code in the same way that we can integrate Cython and Python code. Let's go through an example that analyses the ratio of up days (close higher than the previous day) for a stock. We will apply the formula for the binomial proportion confidence interval (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval). This indicates how significant the ratio is.
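
Before moving to Cython, it may help to see the calculation in plain NumPy. The following is only a sketch: the function name up_day_ratio_confidence, the z parameter, and the sample prices are hypothetical, and it assumes the normal-approximation form of the interval, which may differ in detail from the function developed in the book.

```python
import numpy as np

def up_day_ratio_confidence(close, z=1.96):
    """Return the ratio of up days and the half-width of its confidence interval.

    Assumes the normal approximation p +/- z * sqrt(p * (1 - p) / n);
    z=1.96 corresponds to roughly 95 percent confidence.
    """
    close = np.asarray(close, dtype=float)
    changes = np.diff(close)             # day-over-day change in the close price
    n = len(changes)                     # number of day-to-day comparisons
    p = np.count_nonzero(changes > 0) / n
    half_width = z * np.sqrt(p * (1 - p) / n)
    return p, half_width

# Hypothetical close prices, for illustration only.
prices = [10.0, 11.0, 10.5, 12.0, 12.5, 12.0, 13.0, 13.5, 13.2]
ratio, half_width = up_day_ratio_confidence(prices)
print(ratio, half_width)
```

A narrow interval means the observed ratio of up days is unlikely to be a fluke of a small sample; with only a handful of days, as here, the interval is wide.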

Write a .pyx file.
The .pyx files contain Cython code. Basically, Cython code is standard Python code with optional static type declarations added for variables. Let's write a .pyx file that contains a function that calculates...

Clustering stocks with scikit-learn

scikit-learn is an open source machine learning library. Clustering is a type of machine learning technique that aims to group items based on similarities.

Note

A legion of scikits exists. These are all open source scientific Python projects. For a list of scikits, please refer to https://scikits.appspot.com/scikits.

Clustering is unsupervised, which means that you don't have to create learning examples. The algorithm puts items in the appropriate bucket based on some measure of distance, so that items that are close to each other end up in the same bucket. In this example, we will use the log returns of stocks in the Dow Jones Industrial (DJI) Index to cluster.
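
To make "log returns" and "measure of distance" concrete, here is a small NumPy-only sketch. The prices are made up, and the Euclidean distance is an assumption chosen for illustration; the clustering itself, which the book performs with scikit-learn, is not shown here.

```python
import numpy as np

# Hypothetical close prices: one row per stock, one column per trading day.
closes = np.array([
    [100.0, 101.0, 102.50, 101.50, 103.0],
    [50.0, 50.5, 51.25, 50.75, 51.5],    # same relative moves as the first row
    [20.0, 19.5, 19.00, 19.80, 19.2],
])

# Log return for each day: log(close_t) - log(close_{t-1}).
log_returns = np.diff(np.log(closes), axis=1)

# Pairwise Euclidean distances between the log return series of the stocks.
diff = log_returns[:, np.newaxis, :] - log_returns[np.newaxis, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))
print(distances)
```

Because log returns depend only on relative price changes, the first two stocks end up at a distance of practically zero even though their price levels differ, so a distance-based algorithm would place them in the same bucket.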

Note

A myriad of clustering algorithms exist, and since this is a rapidly evolving field, new algorithms are invented each year. Given the limited space in this book, we cannot touch upon all of them. The interested reader can have a look at https://en.wikipedia.org/wiki/Cluster_analysis.

First, we...

Detecting corners

Corner detection is a standard technique in computer vision. scikit-image (a package specialized in image processing) offers a Harris corner detector, which is convenient because implementing corner detection is fairly complicated. We could do it ourselves from scratch, but that would violate the cardinal rule of not reinventing the wheel. We will load a sample image from scikit-learn. This is not strictly necessary for this example; you can use any other image instead.

Note

For more information on corner detection, please refer to https://en.wikipedia.org/wiki/Corner_detection.
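
As a rough sketch of what Harris corner detection with scikit-image can look like, the following example uses a synthetic image (a bright square on a dark background) instead of the sample image from the book. The image size and the min_distance and threshold_rel values are assumptions chosen for illustration:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

# Synthetic test image: a bright square on a dark background.
image = np.zeros((60, 60))
image[20:40, 20:40] = 1.0

# Compute the Harris response for every pixel, then keep the strongest
# local maxima as corner candidates.
response = corner_harris(image)
coords = corner_peaks(response, min_distance=5, threshold_rel=0.1)

# Each row of coords is a (row, column) position of a detected corner.
print(coords)
```

For a real photograph, the image would first be converted to a two-dimensional grayscale array, since corner_harris expects a single-channel image.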

You might need to install jpeglib on your system to be able to load the scikit-learn image, which is a JPEG file. If you are on Windows, use the installer; otherwise, download the distribution, unpack it, and build from the top folder with the following commands:

./configure
make
sudo make install

To detect corners of an image, perform the following steps:

Load the sample image.
scikit-learn currently...

Comparing NumPy to Blaze

Since we are close to the end of the book, it seems appropriate to discuss the future of NumPy. The future of NumPy is Blaze, a new open source Python numerical library. Blaze is supposed to process Big Data better than NumPy ever can. Big Data can be defined in many ways. Here, we will define Big Data as data that cannot be stored in memory or even on a single machine. Usually, the data is distributed amongst several servers. Blaze should also be able to handle large quantities of streaming data that is never stored.

Note

Blaze can be found at http://blaze.pydata.org/.

Blaze, just like NumPy, allows scientists, analysts, and engineers to quickly write efficient code. Blaze, however, goes a step further and also takes care of the work related to distributing calculations as well as extracting and transforming data from a variety of data source types.

Blaze is centered around general multidimensional array and table abstractions. The classes in Blaze represent different...

Summary

In this chapter, we only scratched the surface of what is possible with the scientific Python ecosystem. We used some of the libraries that are considered, if not part of the common stack, then at least fundamental. We used interpolation and numerical integration provided by SciPy. Two of the dozens of algorithms in scikit-learn were demonstrated. We also saw Cython in action, which is technically a programming language in its own right. Finally, we had a look at Blaze, a library supposed to generalize and extend the principles of NumPy. This is in light of recent developments such as Big Data and Cloud Computing. Blaze and related projects are still in the incubation phase, but we can expect stable software to be produced in the near future. You can refer to http://continuum.io/developer-resources for some of these projects.

Unfortunately, we have come to the end of this book. Because of this book's format, that is the number of pages, you should have essential NumPy knowledge and...


Author (1)

Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. Ivan Idris is the author of NumPy 1.5. Beginner's Guide and NumPy Cookbook by Packt Publishing.