Introducing scikit-learn with PCA
In this recipe, we'll use an important data science technique, principal component analysis (PCA), to analyze the key factors in a sample breast cancer dataset.
PCA is a statistical procedure that's used to find linearly uncorrelated components that explain as much of the variation in a dataset as possible. In this way, it performs dimensionality reduction: we find a simpler, lower-dimensional representation of a more complex, higher-dimensional dataset, which gives us a handle on the key features that help explain the data. Finding such explanatory features is a key first step in machine learning.
In this recipe, we will implement PCA using the scikit-learn library. Scikit-learn is one of the fundamental Python libraries for machine learning. PCA is a form of unsupervised machine learning, meaning that we don't provide information about the class of each sample. We will discuss supervised techniques in the other recipes of this chapter...
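To make the idea concrete, here is a minimal sketch of PCA with scikit-learn. It assumes that the breast cancer dataset bundled with scikit-learn (`load_breast_cancer`) is an acceptable stand-in for the sample data, and that the features are standardized first; the choice of two components is also an assumption made purely for illustration, not a step prescribed by this recipe.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled breast cancer dataset stands in
# for the sample data (569 samples, 30 numeric features).
data = load_breast_cancer()
X = data.data

# PCA is sensitive to feature scale, so standardize each feature
# to zero mean and unit variance before fitting.
X_scaled = StandardScaler().fit_transform(X)

# Keep two components (an illustrative choice). Note that the class
# labels in data.target are never passed in: PCA is unsupervised.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Each sample is now described by 2 values instead of 30.
print(X_reduced.shape)

# Fraction of the total variance explained by each component,
# ordered from the most to the least explanatory.
print(pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute is a quick way to judge how much of the variation in the original 30 features survives in the lower-dimensional representation.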