You're reading from Learning NumPy Array

Product type: Book
Published in: Jun 2014
Reading level: Intermediate
ISBN-13: 9781783983902
Edition: 1st Edition
Languages: Python
Tools: NumPy
Concepts: Data Science
Author: Ivan Idris

Chapter 5. Signal Processing Techniques

In this chapter, we will learn some signal-processing techniques and use them to analyze time-series data. As example data, we will use the sunspot data provided by a Belgian scientific institute. We can download this data from several places on the Internet, and it is also provided as sample data by the statsmodels library. There are a number of things we can do with the data, such as:

Trying to determine periodic cycles within the data. This is a somewhat advanced topic, so we will only get you started.
Smoothing the data to filter out noise.
Forecasting.

Introducing the Sunspot data

Sunspots are dark spots visible on the Sun's surface. Astronomers have studied this phenomenon for many centuries and have found evidence for periodic sunspot cycles. We can download up-to-date annual sunspot data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. This is provided by the Belgian Solar Influences Data Analysis Center. The data goes back to 1700 and contains more than 300 annual averages. In order to determine sunspot cycles, scientists successfully used the Hilbert-Huang transform (refer to http://en.wikipedia.org/wiki/Hilbert%E2%80%93Huang_transform). A major part of this transform is the so-called Empirical Mode Decomposition (EMD) method. The entire algorithm contains many iterative steps, and we will cover only some of them here. EMD reduces data to a group of Intrinsic Mode Functions (IMF). You can compare this to the way the Fast Fourier Transform decomposes a signal into a superposition of sine and cosine terms.
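To make the Fourier comparison concrete, here is a minimal sketch that recovers the period of a cycle with NumPy's FFT. It is not the EMD procedure. The series is synthetic: the 11-sample period, the series length, and the noise level are assumptions chosen for illustration, and no real sunspot data is used.

```python
import numpy as np

# Synthetic stand-in for an annual series: an 11-sample cycle plus noise.
# (Assumed values for illustration; this is not the sunspot data.)
rng = np.random.default_rng(42)
n = 330                      # 30 full cycles of length 11
t = np.arange(n)
series = 50 + 40 * np.sin(2 * np.pi * t / 11) + rng.normal(0, 5, n)

# Remove the mean so the zero-frequency term does not dominate.
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)   # cycles per sample

# Skip bin 0 and pick the strongest remaining frequency.
peak = np.argmax(spectrum[1:]) + 1
dominant_period = 1.0 / freqs[peak]
print(dominant_period)
```

The strongest bin gives the length of the dominant cycle in samples. Real data with cycles of varying length will produce a broader peak than this idealized series.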

Extracting...

Moving averages

Moving averages are tools commonly used to analyze time-series data. A moving average defines a window of previously seen data that is averaged each time the window slides forward one period. The different types of moving average differ essentially in the weights used for averaging. The exponential moving average, for instance, has weights that decrease exponentially with the age of the data. This means that older values have less influence than newer values, which is sometimes desirable.

We can express the exponentially decreasing weights of the exponential moving average as follows in NumPy code:

weights = np.exp(np.linspace(-1., 0., N))
weights /= weights.sum()
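As a sketch of how such weights could be applied, the following hypothetical helper `ema` (the name and the test data are assumptions, not taken from the book) slides the normalized weights over an array. Because `np.convolve` reverses its second argument, the weights are passed reversed so that the largest weight lands on the newest value in each window.

```python
import numpy as np

def ema(arr, n):
    # Exponentially increasing weights: the last entry is the largest.
    weights = np.exp(np.linspace(-1., 0., n))
    weights /= weights.sum()
    # np.convolve reverses its second argument, so pass the weights
    # reversed to pair the largest weight with the newest value.
    return np.convolve(arr, weights[::-1], mode='valid')

ramp = np.arange(10, dtype=float)   # steadily rising test data
ema_ramp = ema(ramp, 3)
print(ema_ramp)
```

On rising data, each result sits above the plain mean of its window, because the most recent values count for more.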

A simple moving average uses equal weights which, in code, looks as follows:

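import numpy as np

def sma(arr, n):
    # Equal weights that sum to 1
    weights = np.ones(n) / n
    # 'valid' keeps only the windows that fully overlap the data
    return np.convolve(arr, weights, mode='valid')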
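A short worked example may help here. The sketch below redefines `sma` so that it runs on its own, then applies it to a tiny array and to a synthetic sine wave with an assumed period of 11 samples. The input data is made up for illustration.

```python
import numpy as np

def sma(arr, n):
    # Equal weights that sum to 1, applied to every full window.
    weights = np.ones(n) / n
    return np.convolve(arr, weights, mode='valid')

# Three windows: means of (1,2,3), (2,3,4), (3,4,5).
small = sma(np.array([1., 2., 3., 4., 5.]), 3)
print(small)

# A window as long as the cycle averages the cycle away.
t = np.arange(110)
cycle = np.sin(2 * np.pi * t / 11)
flat = sma(cycle, 11)
print(len(flat), np.abs(flat).max())
```

The output has `len(arr) - n + 1` entries, one per full window. Averaging over exactly one period cancels a pure cycle, which is why a window matched to the cycle length exposes the longer-term trend.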

The following code plots the simple moving average for the 11- and 22-year sunspot cycle:

import numpy as np
import sys
import matplotlib.pyplot as plt

data = np.loadtxt(sys.argv...

Smoothing functions

Smoothing can help us get rid of noise and outliers in raw data. This, for instance, makes it easier to spot trends in the data. NumPy provides a number of smoothing functions.

Note

These functions can calculate weights in a sliding window as we did in the previous example (for more background information, visit http://en.wikipedia.org/wiki/Window_function).

These functions, except the kaiser function, require only one parameter, the size of the window, which we will set to 22 for the middle cycle of the sunspot data. The kaiser function also needs a beta parameter. With this parameter, the kaiser function can mimic the other functions.

The NumPy documentation recommends a starting value of 14 for the beta parameter, so that is what we are going to use too. The code is straightforward and given as follows (the data here is limited to the last 50 years only for easier comparison in the plots):

import numpy as np
import sys
import matplotlib.pyplot as plt

def smooth(weights...
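As a minimal sketch of the idea, the following code builds several NumPy window functions with a window size of 22 and a Kaiser beta of 14, as described in the text, and uses each as the weights of a sliding average. The helper name `apply_window` and the noisy input series are assumptions made for this illustration, not the book's listing.

```python
import numpy as np

def apply_window(weights, arr):
    # Normalize so the weights sum to 1, then slide the window over the data.
    return np.convolve(weights / weights.sum(), arr, mode='valid')

N = 22
windows = {
    'hanning': np.hanning(N),
    'hamming': np.hamming(N),
    'blackman': np.blackman(N),
    'bartlett': np.bartlett(N),
    'kaiser': np.kaiser(N, 14),
}

# Synthetic input (assumed): a constant level plus random noise.
rng = np.random.default_rng(0)
noisy = 10 + rng.normal(0, 1, 200)

smoothed = {name: apply_window(w, noisy) for name, w in windows.items()}
for name, values in smoothed.items():
    print(name, values.std())
```

Each window reduces the spread of the noise by a different amount, because the windows concentrate their weight differently around the center.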

Forecasting with an ARMA model

In the previous chapter, Chapter 4, Simple Predictive Analytics with NumPy, we learned about autoregressive models. ARMA is a generalization of these models that adds an extra component, the moving average, and is frequently used to predict values of a time series. Autoregressive models predict a value as a linear combination of previously encountered values. For instance, we can form a linear combination of the previous value in the time series and the value before that. This is called an AR(2) model, since it uses components that lag up to two periods. In our case, we would be looking at the number of sunspots one year and two years before the period we are predicting. In an ARMA model, we also try to model the residuals that we cannot explain from the previous period data (also known as unexpected components). Here, a linear combination...
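The autoregressive part can be sketched with plain NumPy. The code below generates a synthetic AR(2) series with assumed coefficients and recovers them by least squares. It covers only the AR component; the moving-average part of ARMA is not fitted here, and this is not the book's ARMA listing.

```python
import numpy as np

rng = np.random.default_rng(1)
true_a1, true_a2 = 0.6, -0.3     # assumed coefficients for the synthetic series
n = 5000
x = np.zeros(n)
noise = rng.normal(0, 1, n)
for t in range(2, n):
    x[t] = true_a1 * x[t - 1] + true_a2 * x[t - 2] + noise[t]

# Regress x[t] on its two previous values.
lagged = np.column_stack([x[1:-1], x[:-2]])   # columns: x[t-1], x[t-2]
target = x[2:]
coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)

# One-step-ahead forecast from the last two observations.
forecast = coeffs[0] * x[-1] + coeffs[1] * x[-2]

# What the AR(2) fit cannot explain; an ARMA model would model these too.
residuals = target - lagged @ coeffs
print(coeffs, forecast)
```

The fitted coefficients should land close to the assumed values, and the residuals are the unexplained components that the moving-average term of an ARMA model is meant to capture.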

Filtering a signal

Another common signal processing technique is filtering. This is a big topic, and we could create all sorts of filters. We will only create a very basic filter here. Again, we will use the sunspot data as input.

The iirdesign function, as its name suggests, allows us to construct several types of analog and digital filters.

Designing the filter

Design the filter with the iirdesign function of the scipy.signal module.

Note

IIR stands for Infinite Impulse Response; for more information, visit http://en.wikipedia.org/wiki/Infinite_impulse_response.

We are not going to go into all the details of the iirdesign function. Have a look at the documentation if necessary at http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.iirdesign.html. In short, the following are the parameters we will set:

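Passband and stopband edge frequencies (wp and ws), normalized from 0 to 1, where 1 corresponds to the Nyquist frequency.
Maximum loss in the passband (gpass), in dB.
Minimum attenuation in the stopband (gstop), in dB.
Filter type (ftype).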

Designing the filter can be done with the following code:

b,a = scipy.signal.iirdesign(wp=0.2, ws=0.1...
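The following sketch shows one complete call with these four kinds of parameters. The `wp` and `ws` values follow the listing above; `gpass`, `gstop`, and `ftype` are assumed values chosen for illustration, since the original arguments are not shown. The input signal is synthetic.

```python
import numpy as np
import scipy.signal

# wp > ws makes this a high-pass design. gpass, gstop and ftype are
# assumed values for illustration, not the book's settings.
b, a = scipy.signal.iirdesign(wp=0.2, ws=0.1, gpass=1, gstop=40, ftype='butter')

# Check the response inside the stopband and inside the passband
# (freqz takes frequencies in radians per sample, so Nyquist is pi).
w, h = scipy.signal.freqz(b, a, worN=np.array([0.05 * np.pi, 0.5 * np.pi]))
stop_gain, pass_gain = np.abs(h)
print(stop_gain, pass_gain)

# Apply the filter to a slow trend plus a faster oscillation.
t = np.arange(400)
signal = 0.01 * t + np.sin(2 * np.pi * t / 5)   # period 5 -> 0.4 of Nyquist
filtered = scipy.signal.lfilter(b, a, signal)
```

With these settings the filter suppresses the slow trend and lets the faster oscillation through, which can be checked by comparing the two gains.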

Demonstrating cointegration

Cointegration is similar to correlation, but it is considered by many to be a better metric to define the relatedness of two time-series. The usual way to explain the difference between cointegration and correlation is to take the example of a drunken man and his dog. Correlation tells you something about the direction in which they are going. Cointegration relates to their distance over time, which in this case is constrained by the leash of the dog. We will demonstrate cointegration using computer-generated time-series and real data. The data can be downloaded from Quandl in CSV format.

The Augmented Dickey-Fuller (ADF) test can be used to measure the cointegration of time-series; proceed with the following steps to demonstrate cointegration:

Define the following function to calculate the ADF statistic.

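# Aliases inferred from the names used in the function below
import statsmodels.api as stat
import statsmodels.tsa.stattools as ts

def calc_adf(x, y):
    result = stat.OLS(x, y).fit()
    return ts.adfuller(result.resid)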

Generate a sine value and calculate the cointegration of the value...

Summary

In this chapter, we learned a number of sophisticated signal processing techniques. Most of them were applied to a dataset of sunspot data. We looked at smoothing with window functions and moving averages. We also touched upon the sifting process used by scientists to derive sunspot cycles. Last but not least, a demonstration was given of cointegration.

In the next chapter, we will focus on debugging, profiling, and testing, including assert functions and various tools.

Author (1)

Ivan Idris

Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA analyst. His main professional interests are business intelligence, big data, and cloud computing. He enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy 1.5 Beginner's Guide and NumPy Cookbook by Packt Publishing.