Machine Learning with PyTorch and Scikit-Learn

Product type: Book
Published: Feb 2022
Publisher: Packt
ISBN-13: 9781801819312
Pages: 774
Edition: 1st Edition
Authors: Sebastian Raschka, Yuxi (Hayden) Liu, Vahid Mirjalili

Compressing Data via Dimensionality Reduction

In Chapter 4, Building Good Training Datasets – Data Preprocessing, you learned how to reduce the dimensionality of a dataset using various feature selection techniques. An alternative to feature selection for dimensionality reduction is feature extraction. In this chapter, you will learn about two fundamental techniques that summarize the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one. Data compression is an important topic in machine learning because it helps us store and analyze the increasing amounts of data that are produced and collected in the modern age of technology.

In this chapter, we will cover the following topics:

  • Principal component analysis for unsupervised data compression
  • Linear discriminant analysis as a supervised dimensionality reduction technique for maximizing...

Unsupervised dimensionality reduction via principal component analysis

Similar to feature selection, we can use different feature extraction techniques to reduce the number of features in a dataset. The difference between feature selection and feature extraction is that while we maintain the original features when we use feature selection algorithms, such as sequential backward selection, we use feature extraction to transform or project the data onto a new feature space.
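The distinction can be made concrete with a minimal sketch. It assumes scikit-learn's built-in Wine dataset, a k-nearest neighbors estimator, and a target of two features; these choices are illustrative assumptions and not necessarily the setup used later in the chapter. Sequential backward selection keeps a subset of the original columns unchanged, whereas PCA (one possible feature extraction technique) builds new features as linear combinations of all original columns.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Illustrative data: 178 wine samples with 13 features
X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Feature selection: sequential backward selection down to 2 original features
sbs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=2,
    direction="backward",
    cv=5,
)
X_sel = sbs.fit_transform(X_std, y)
kept = np.flatnonzero(sbs.get_support())  # indices of the surviving columns

# Feature extraction: project all 13 features onto 2 new axes
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Selected column indices:", kept)
print("Selected features are untouched copies:",
      np.array_equal(X_sel, X_std[:, kept]))
print("Each extracted feature mixes", pca.components_.shape[1],
      "original features")
```

Both results have two columns, but only the selected features can still be read as original measurements; the extracted ones live in a new feature space.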

In the context of dimensionality reduction, feature extraction can be understood as an approach to data compression with the goal of maintaining most of the relevant information. In practice, feature extraction is used not only to reduce storage requirements or improve the computational efficiency of the learning algorithm; it can also improve predictive performance by mitigating the curse of dimensionality, especially if we are working with non-regularized models.
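To see what "maintaining most of the relevant information" can mean for PCA, the following sketch measures how much of the total variance survives the compression. It uses a standard textbook formulation (eigendecomposition of the covariance matrix of standardized data) on the Wine dataset; the dataset, the variable names, and the choice of k = 2 are assumptions made for illustration, not necessarily the chapter's own walkthrough.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# PCA is unsupervised, so the class labels are discarded
X, _ = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Covariance matrix of the standardized features (13 x 13)
cov = np.cov(X_std, rowvar=False)

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance captured by each principal component
var_ratio = eigvals / eigvals.sum()
cum_ratio = np.cumsum(var_ratio)

# Compress 13 features into k by projecting onto the top-k eigenvectors
k = 2  # illustrative choice
W = eigvecs[:, :k]
X_proj = X_std @ W

print("Variance retained with", k, "components:", round(float(cum_ratio[k - 1]), 3))
```

The cumulative ratio is one way to pick k: keep adding components until the retained variance reaches a level that is acceptable for the task.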

The main steps in principal component analysis...

Supervised data compression via linear discriminant analysis

LDA can be used as a technique for feature extraction to increase computational efficiency and reduce the degree of overfitting due to the curse of dimensionality in non-regularized models. The general concept behind LDA is very similar to PCA, but whereas PCA attempts to find the orthogonal component axes of maximum variance in a dataset, the goal in LDA is to find the feature subspace that optimizes class separability. In the following sections, we will discuss the similarities between LDA and PCA in more detail and walk through the LDA approach step by step.
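As a minimal sketch of the difference in practice, the snippet below fits both transformers with scikit-learn on the Wine dataset. The dataset, the 70/30 split, and the variable names are illustrative assumptions. The key point is in the `fit_transform` calls: PCA sees only the features, while LDA also receives the class labels.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training data only, then reuse it for the test data
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# PCA: unsupervised, the labels are never passed in
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)

# LDA: supervised, the labels guide the choice of subspace
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train_std, y_train)
X_test_lda = lda.transform(X_test_std)

print(X_train_pca.shape, X_train_lda.shape, X_test_lda.shape)
```

Note that LDA can produce at most (number of classes - 1) components, so with the three Wine classes `n_components=2` is the maximum, whereas PCA could return up to 13 components here.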

Principal component analysis versus linear discriminant analysis

Both PCA and LDA are linear transformation techniques that can be used to reduce the number of dimensions in a dataset; the former is an unsupervised algorithm, whereas the latter is supervised. Thus, we might think that LDA is a superior feature extraction technique for classification tasks...

Nonlinear dimensionality reduction and visualization

In the previous section, we covered linear transformation techniques, such as PCA and LDA, for feature extraction. In this section, we will discuss why considering nonlinear dimensionality reduction techniques might be worthwhile.

One nonlinear dimensionality reduction technique that is particularly worth highlighting is t-distributed stochastic neighbor embedding (t-SNE), since it is frequently used in the literature to visualize high-dimensional datasets in two or three dimensions. We will see how we can apply t-SNE to plot images of handwritten digits in a two-dimensional feature space.
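A minimal sketch of this idea with scikit-learn's `TSNE` follows. It assumes the library's built-in Digits dataset (8×8 grayscale images, 64 features each) and uses only the first 500 images to keep the run short; the subset size, `init="pca"`, and the random seed are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 images of handwritten digits, flattened to 64 features
digits = load_digits()
X_digits = digits.data[:500]     # subset chosen only to keep the example fast
y_digits = digits.target[:500]   # labels are used for coloring a plot, not for fitting

# Embed the 64-dimensional points into 2 dimensions
tsne = TSNE(n_components=2, init="pca", random_state=123)
X_embedded = tsne.fit_transform(X_digits)

print(X_embedded.shape)  # (500, 2)
```

The two columns of `X_embedded` can then be drawn as a scatter plot colored by `y_digits`. Unlike PCA and LDA, scikit-learn's `TSNE` has no separate `transform` method, so the embedding is computed for the given points only and cannot be applied to new data afterwards, which is one reason t-SNE is mainly used for visualization.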

Why consider nonlinear dimensionality reduction?

Many machine learning algorithms make assumptions about the linear separability of the input data. You have learned that the perceptron even requires perfectly linearly separable training data to converge. Other algorithms that we have covered so far assume that the lack of perfect linear separability...

Summary

In this chapter, you learned about two fundamental dimensionality reduction techniques for feature extraction: PCA and LDA. Using PCA, we projected data onto a lower-dimensional subspace to maximize the variance along the orthogonal feature axes, while ignoring the class labels. LDA, in contrast to PCA, is a technique for supervised dimensionality reduction, which means that it considers class information in the training dataset to attempt to maximize the class separability in a linear feature space. Lastly, you also learned about t-SNE, which is a nonlinear feature extraction technique that can be used for visualizing data in two or three dimensions.

Equipped with PCA and LDA as fundamental data preprocessing techniques, you are now well prepared to learn about the best practices for efficiently incorporating different preprocessing techniques and evaluating the performance of different models in the next chapter.
