You're reading from Learning NumPy Array

Product type: Book
Published in: Jun 2014
Reading level: Intermediate
ISBN-13: 9781783983902
Edition: 1st Edition
Author (1)
Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. Ivan Idris enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook, both published by Packt Publishing.

Chapter 5. Signal Processing Techniques

In this chapter, we will learn about some signal-processing techniques and use them to analyze time-series data. As example data, we will use the sunspot data provided by a Belgian scientific institute. This data can be downloaded from several places on the Internet, and it is also provided as sample data by the statsmodels library. There are a number of things we can do with the data, such as:

  • Trying to determine periodic cycles within the data. This is a fairly advanced topic, so we will only get you started.

  • Smoothing the data to filter out noise.

  • Forecasting.

Introducing the Sunspot data


Sunspots are dark spots visible on the Sun's surface. Astronomers have studied this phenomenon for many centuries, and evidence has been found for periodic sunspot cycles. We can download up-to-date annual sunspot data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. This is provided by the Belgian Solar Influences Data Analysis Center. The data goes back to 1700 and contains more than 300 annual averages. In order to determine sunspot cycles, scientists have successfully used the Hilbert-Huang transform (refer to http://en.wikipedia.org/wiki/Hilbert%E2%80%93Huang_transform). A major part of this transform is the so-called Empirical Mode Decomposition (EMD) method. The entire algorithm contains many iterative steps, and we will cover only some of them here. EMD reduces data to a group of Intrinsic Mode Functions (IMF). You can compare this to the way the Fast Fourier Transform decomposes a signal into a superposition of sine and cosine terms.
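To make the Fourier comparison concrete, here is a minimal sketch of finding a dominant cycle with NumPy's FFT routines. It is not the EMD or Hilbert-Huang procedure. The series is synthetic, with a made-up 11-year period plus noise, so none of the numbers are real sunspot values.

```python
import numpy as np

# Synthetic stand-in for an annual series: 300 "years" with an 11-year cycle.
# These values are made up for illustration; they are not sunspot data.
years = np.arange(300)
rng = np.random.default_rng(42)
signal = 50 + 40 * np.sin(2 * np.pi * years / 11.0) + rng.normal(0, 5, years.size)

# Remove the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(years.size, d=1.0)  # in cycles per year

# Skip bin 0 (zero frequency) and pick the strongest remaining component.
peak = np.argmax(spectrum[1:]) + 1
dominant_period = 1.0 / freqs[peak]
print("Dominant period: %.1f years" % dominant_period)
```

The recovered period can only fall on the frequency grid of the transform, so with 300 samples it will be close to, but not exactly, 11 years.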

Extracting...

Moving averages


Moving averages are tools commonly used to analyze time-series data. A moving average defines a window of previously seen data that is averaged each time the window slides forward one period. The different types of moving average differ essentially in the weights used for averaging. The exponential moving average, for instance, uses weights that decrease exponentially with the age of a value. This means that older values have less influence than newer values, which is sometimes desirable.

We can express the exponentially decreasing weights of the exponential moving average as follows in NumPy code, where N is the size of the window:

weights = np.exp(np.linspace(-1., 0., N))
weights /= weights.sum()
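As a rough sketch of how such weights might be applied, the following example slides them over a short made-up array. It assumes that the last weight (the largest) should multiply the most recent value in each window, which is why it uses np.correlate and not np.convolve. The window size N = 5 and the helper name ema are choices made for this illustration only.

```python
import numpy as np

N = 5  # window size; an arbitrary value chosen for this example

# Weights rise from exp(-1) for the first entry to exp(0) = 1 for the last.
weights = np.exp(np.linspace(-1., 0., N))
weights /= weights.sum()  # normalize so the weights sum to 1

def ema(arr, weights):
    # np.correlate does not reverse the kernel, so weights[-1] (the largest)
    # multiplies the newest sample in each window.
    return np.correlate(arr, weights, mode='valid')

data = np.array([1., 2., 3., 4., 5., 6., 7., 8.])  # made-up values
smoothed = ema(data, weights)
print(weights)
print(smoothed)
```

np.convolve reverses the second sequence before sliding it. With asymmetric weights like these, that reversal would put the largest weight on the oldest value in the window.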

A simple moving average uses equal weights which, in code, looks as follows:

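def sma(arr, n):
    # Equal weights that sum to 1 over a window of n values
    weights = np.ones(n) / n

    # Full convolution, trimmed to the positions where the window fits entirely
    return np.convolve(weights, arr)[n-1:-n+1]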
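The slicing in sma can be checked against a plain window-by-window mean. The sketch below does this on a few made-up values and compares the result with np.convolve in 'valid' mode, which keeps the same positions without manual slicing. The names sma_full_slice and sma_valid are hypothetical and used only here.

```python
import numpy as np

def sma_full_slice(arr, n):
    # Full convolution, then trim the edges with a slice.
    weights = np.ones(n) / n
    return np.convolve(weights, arr)[n-1:-n+1]

def sma_valid(arr, n):
    # Alternative: let NumPy keep only the fully overlapping positions.
    return np.convolve(arr, np.ones(n) / n, mode='valid')

data = np.array([2., 4., 6., 8., 10., 12.])  # made-up values
n = 3

# Reference result: plain mean of every window of n consecutive values.
by_hand = np.array([data[i:i + n].mean() for i in range(len(data) - n + 1)])

print(sma_full_slice(data, n))
print(sma_valid(data, n))
print(by_hand)
```

One difference is worth knowing: for n = 1 the slice becomes [0:0] and returns an empty array, whereas the 'valid' variant returns the data unchanged.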

The following code plots the simple moving average for the 11- and 22-year sunspot cycles:

import numpy as np
import sys
import matplotlib.pyplot as plt

data = np.loadtxt(sys.argv...

Smoothing functions


Smoothing can help us get rid of noise and outliers in raw data. This, for instance, makes it easier to spot trends in the data. NumPy provides a number of window functions that can be used for smoothing, such as hanning, hamming, blackman, bartlett, and kaiser.

Note

These functions can calculate weights in a sliding window as we did in the previous example (for more background information, visit http://en.wikipedia.org/wiki/Window_function).

These functions, except the kaiser function, require only one parameter: the size of the window, which we will set to 22 for the middle cycle of the sunspot data. The kaiser function also needs a beta parameter. With this parameter, the kaiser function can mimic the other functions.

The NumPy documentation recommends a starting value of 14 for the beta parameter, so that is what we are going to use too. The code is straightforward and given as follows (the data here is limited to the last 50 years only for easier comparison in the plots):

import numpy as np
import sys
import matplotlib.pyplot as plt

def smooth(weights...
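Here is a minimal, self-contained sketch of the same idea. It runs on a synthetic noisy sine wave, not the sunspot file, and the helper name smooth_with_window is hypothetical. Each window is normalized to sum to 1 and then slid over the data.

```python
import numpy as np

def smooth_with_window(window, arr):
    # Hypothetical helper: normalize the window and slide it over the data.
    weights = window / window.sum()
    return np.convolve(arr, weights, mode='valid')

N = 22  # window size used in the text

# Synthetic noisy series standing in for real measurements.
rng = np.random.default_rng(0)
t = np.arange(200)
noisy = np.sin(2 * np.pi * t / 50.0) + rng.normal(0, 0.5, t.size)

windows = {
    'hanning': np.hanning(N),
    'hamming': np.hamming(N),
    'blackman': np.blackman(N),
    'bartlett': np.bartlett(N),
    'kaiser': np.kaiser(N, 14),  # beta = 14, the starting value from the text
}

smoothed = {name: smooth_with_window(w, noisy) for name, w in windows.items()}

for name, values in smoothed.items():
    # Spread of the step-to-step changes as a crude roughness measure.
    print(name, values.shape, np.std(np.diff(values)))
```

As a small illustration of how beta lets kaiser imitate other shapes, np.kaiser(N, 0) returns a rectangular window of ones, which is the equal-weight case of the simple moving average.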

Forecasting with an ARMA model


In the previous chapter (Chapter 4, Simple Predictive Analytics with NumPy), we learned about autoregressive models. ARMA is a generalization of these models that adds an extra component: the moving average. ARMA models, which are frequently used to predict values of a time-series, therefore combine autoregressive and moving-average models. Autoregressive models predict a value by assuming that it is a linear combination of previously encountered values. For instance, we can consider a linear combination of the previous value in the time-series and the value before that. This is called an AR(2) model, since it uses components that lag up to two periods. In our case, we would be looking at the number of sunspots one year and two years before the period we are predicting. In an ARMA model, we also try to model the residuals that we cannot explain from the previous period data (also known as unexpected components). Here, a linear combination...
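The autoregressive half of this description can be sketched with plain NumPy least squares. The example below simulates a made-up AR(2) series with arbitrary coefficients, estimates them from the lagged values, and makes a one-step forecast. It deliberately leaves out the moving-average component, so it is not a full ARMA fit.

```python
import numpy as np

# Simulate a made-up AR(2) series: x[t] = a1*x[t-1] + a2*x[t-2] + noise.
# The coefficients are arbitrary illustrative values, not fitted to sunspots.
true_a1, true_a2 = 1.3, -0.7
rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = true_a1 * x[t - 1] + true_a2 * x[t - 2] + rng.normal()

# Build the regression: each row holds the two previous values.
lagged = np.column_stack((x[1:-1], x[:-2]))  # columns: x[t-1], x[t-2]
target = x[2:]

# Least-squares estimate of the two AR coefficients.
coeffs, _, _, _ = np.linalg.lstsq(lagged, target, rcond=None)
a1, a2 = coeffs

# One-step-ahead forecast from the last two observed values.
forecast = a1 * x[-1] + a2 * x[-2]
print("estimated coefficients:", a1, a2)
print("next value forecast:", forecast)
```

With enough samples, the estimated coefficients should land close to the values used in the simulation. The gap between the forecast and the value that actually follows is the residual that the moving-average part of an ARMA model tries to capture.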

Filtering a signal


Another common signal processing technique is filtering. This is a big topic, and we could create all sorts of filters. We will only create a very basic filter here. Again, we will use the sunspot data as input.

The iirdesign function, as its name suggests, allows us to construct several types of analog and digital filters.

Designing the filter

Design the filter with the iirdesign function of the scipy.signal module.

Note

IIR stands for Infinite Impulse Response; for more information, visit http://en.wikipedia.org/wiki/Infinite_impulse_response.

We are not going to go into all the details of the iirdesign function. Have a look at the documentation if necessary at http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.iirdesign.html. In short, the following are the parameters we will set:

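  • Passband and stopband edge frequencies (wp and ws), normalized from 0 to 1, where 1 corresponds to the Nyquist frequency.

  • Maximum loss in the passband (gpass), in dB.

  • Minimum attenuation in the stopband (gstop), in dB.

  • Filter type (ftype).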

Designing the filter can be done with the following code:

b,a = scipy.signal.iirdesign(wp=0.2, ws=0.1...
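The arguments after ws in the listing above are not visible here, so the following is only a sketch with assumed values for gpass, gstop, and ftype. It designs a filter with the same band edges, applies it to a synthetic signal with scipy.signal.lfilter, and checks the gain at the two band edges with scipy.signal.freqz.

```python
import numpy as np
import scipy.signal

# Assumed specification (illustrative values only):
#   wp=0.2, ws=0.1 -> passband edge above stopband edge, i.e. a high-pass filter
#   gpass=1        -> at most 1 dB loss in the passband
#   gstop=40       -> at least 40 dB attenuation in the stopband
b, a = scipy.signal.iirdesign(wp=0.2, ws=0.1, gpass=1, gstop=40, ftype='butter')

# Synthetic input: a slow oscillation plus a fast one.
t = np.arange(500)
slow = np.sin(2 * np.pi * t / 200.0)      # normalized frequency 0.01 (stopband)
fast = 0.5 * np.sin(2 * np.pi * t / 4.0)  # normalized frequency 0.5 (passband)
filtered = scipy.signal.lfilter(b, a, slow + fast)

# Gain at the two band edges (freqz expects radians per sample here).
w, h = scipy.signal.freqz(b, a, worN=np.array([0.1, 0.2]) * np.pi)
gain_db = 20 * np.log10(np.abs(h))
print("gain at ws and wp in dB:", gain_db)
```

Because wp is larger than ws, the design is a high-pass filter: the slow component is suppressed and the fast one passes through largely unchanged.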

Demonstrating cointegration


Cointegration is similar to correlation, but it is considered by many to be a better metric to define the relatedness of two time-series. The usual way to explain the difference between cointegration and correlation is to take the example of a drunken man and his dog. Correlation tells you something about the direction in which they are going. Cointegration relates to their distance over time, which in this case is constrained by the leash of the dog. We will demonstrate cointegration using computer-generated time-series and real data. The data can be downloaded from Quandl in CSV format.
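The leash idea can be illustrated with NumPy alone, using made-up random walks. This sketch does not perform a statistical test. It only regresses one series on another without an intercept and compares how far the residuals spread. The names man, dog, stranger, and ols_residuals are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

# The "drunken man": a random walk.
man = np.cumsum(rng.normal(size=n))
# The "dog on a leash": follows the man, never straying far.
dog = man + rng.normal(size=n)
# A "stranger": an independent random walk with no leash.
stranger = np.cumsum(rng.normal(size=n))

def ols_residuals(x, y):
    # Regress x on y without an intercept and return what is left over.
    beta = np.dot(y, x) / np.dot(y, y)
    return x - beta * y

leash_spread = ols_residuals(dog, man).std()
free_spread = ols_residuals(stranger, man).std()
print("residual spread, dog vs man:", leash_spread)
print("residual spread, stranger vs man:", free_spread)
```

The residuals of the leashed pair stay within a narrow band, while those of the unrelated pair wander. A formal test for this kind of stationarity in the residuals is what turns the picture into a measurement.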

The Augmented Dickey-Fuller (ADF) test can be used to measure the cointegration of time-series. Proceed with the following steps to demonstrate cointegration:

  1. Define the following function to calculate the ADF statistic.

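    import statsmodels.api as stat  # assumed source of the stat alias
    import statsmodels.tsa.stattools as ts  # assumed source of the ts alias
    def calc_adf(x, y):
        # Regress x on y, then run the ADF test on the residuals
        result = stat.OLS(x, y).fit()
        return ts.adfuller(result.resid)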
  2. Generate a sine value and calculate the cointegration of the value...

Summary


In this chapter, we learned a number of sophisticated signal processing techniques. Most of them were applied to a dataset of sunspot data. We looked at smoothing with window functions and moving averages. We also touched upon the sifting process used by scientists to derive sunspot cycles. Finally, we demonstrated cointegration.

In the next chapter, we will focus on debugging, profiling, and testing, including assert functions and various tools.
