Reader small image

You're reading from  Mastering Predictive Analytics with Python

Product typeBook
Published inAug 2016
Reading LevelIntermediate
Publisher
ISBN-139781785882715
Edition1st Edition
Languages
Right arrow
Author (1)
Joseph Babcock
Joseph Babcock
author image
Joseph Babcock

Joseph Babcock has spent more than a decade working with big data and AI in the e-commerce, digital streaming, and quantitative finance domains. Through his career he has worked on recommender systems, petabyte scale cloud data pipelines, A/B testing, causal inference, and time series analysis. He completed his PhD studies at Johns Hopkins University, applying machine learning to the field of drug discovery and genomics.
Read more about Joseph Babcock

Right arrow

Principal component analysis


One of the most commonly used methods of dimensionality reduction is Principal Component Analysis (PCA). Conceptually, PCA computes the axes along which the variation in the data is greatest. You may recall that in Chapter 3, Finding Patterns in the Noise – Clustering and Unsupervised Learning, we calculated the eigenvalues of the adjacency matrix of a dataset to perform spectral clustering. In PCA, we also want to find the eigenvalue of the dataset, but here, instead of any adjacency matrix, we will use the covariance matrix of the data, which is the relative variation within and between columns. The covariance for columns xi and xj in the data matrix X is given by:

This is the average product of the offsets from the mean column values. We saw this value before when we computed the correlation coefficient in Chapter 3, Finding Patterns in the Noise – Clustering and Unsupervised Learning, as it is the denominator of the Pearson coefficient. Let us use a simple...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Mastering Predictive Analytics with Python
Published in: Aug 2016Publisher: ISBN-13: 9781785882715

Author (1)

author image
Joseph Babcock

Joseph Babcock has spent more than a decade working with big data and AI in the e-commerce, digital streaming, and quantitative finance domains. Through his career he has worked on recommender systems, petabyte scale cloud data pipelines, A/B testing, causal inference, and time series analysis. He completed his PhD studies at Johns Hopkins University, applying machine learning to the field of drug discovery and genomics.
Read more about Joseph Babcock