Reader small image

You're reading from  Designing Machine Learning Systems with Python

Product typeBook
Published inApr 2016
Reading LevelBeginner
Publisher
ISBN-139781785882951
Edition1st Edition
Languages
Right arrow
Author (1)
David Julian
David Julian
author image
David Julian

David Julian is a freelance technology consultant and educator. He has worked as a consultant for government, private, and community organizations on a variety of projects, including using machine learning to detect insect outbreaks in controlled agricultural environments (Urban Ecological Systems Ltd., Bluesmart Farms), designing and implementing event management data systems (Sustainable Industry Expo, Lismore City Council), and designing multimedia interactive installations (Adelaide University). He has also written Designing Machine Learning Systems With Python for Packt Publishing and was a technical reviewer for Python Machine Learning and Hands-On Data Structures and Algorithms with Python - Second Edition, published by Packt.
Read more about David Julian

Right arrow

Principle component analysis


Principle Component Analysis (PCA) is the most common form of dimensionality reduction that we can apply to features. Consider the example of a dataset consisting of two features and we would like to convert this two-dimensional data into one dimension. A natural approach would be to draw a line of the closest fit and project each data point onto this line, as shown in the following graph:

PCA attempts to find a surface to project the data by minimizing the distance between the data points and the line we are attempting to project this data to. For the more general case where we have n dimensions and we want to reduce this space to k-dimensions, we find k vectors u(1),u(2), ..., u(k) onto which to project the data so as to minimize the projection error. That is we are trying to find a k-dimensional surface to project the data.

This looks superficially like linear regression however it is different in several important ways. With linear regression we are trying...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Designing Machine Learning Systems with Python
Published in: Apr 2016Publisher: ISBN-13: 9781785882951

Author (1)

author image
David Julian

David Julian is a freelance technology consultant and educator. He has worked as a consultant for government, private, and community organizations on a variety of projects, including using machine learning to detect insect outbreaks in controlled agricultural environments (Urban Ecological Systems Ltd., Bluesmart Farms), designing and implementing event management data systems (Sustainable Industry Expo, Lismore City Council), and designing multimedia interactive installations (Adelaide University). He has also written Designing Machine Learning Systems With Python for Packt Publishing and was a technical reviewer for Python Machine Learning and Hands-On Data Structures and Algorithms with Python - Second Edition, published by Packt.
Read more about David Julian