Machine Learning with PyTorch and Scikit-Learn



A Tour of Machine Learning Classifiers Using Scikit-Learn

In this chapter, we will take a tour of a selection of popular and powerful machine learning algorithms that are commonly used in academia as well as in industry. While learning about the differences between several supervised learning algorithms for classification, we will also develop an appreciation of their individual strengths and weaknesses. In addition, we will take our first steps with the scikit-learn library, which offers a user-friendly and consistent interface for using those algorithms efficiently and productively.

The topics that will be covered throughout this chapter are as follows:

  • An introduction to robust and popular algorithms for classification, such as logistic regression, support vector machines, decision trees, and k-nearest neighbors
  • Examples and explanations using the scikit-learn machine learning library, which provides a wide variety of machine learning algorithms via a user...

Choosing a classification algorithm

Choosing an appropriate classification algorithm for a particular problem requires practice and experience; each algorithm has its own quirks and is based on certain assumptions. To paraphrase the no free lunch theorem by David H. Wolpert, no single classifier works best across all possible scenarios (The Lack of A Priori Distinctions Between Learning Algorithms, Wolpert, David H., Neural Computation 8.7 (1996): 1341-1390). In practice, it is always recommended that you compare the performance of at least a handful of different learning algorithms to select the best model for the particular problem; problems may differ in the number of features or examples, the amount of noise in a dataset, and whether the classes are linearly separable.
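
As a first concrete illustration of such a comparison, the following sketch (not the book's own listing; the dataset, estimators, and hyperparameters are illustrative) evaluates a handful of scikit-learn classifiers with five-fold cross-validation:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Candidate classifiers to compare; the choices here are illustrative
candidates = {
    'Logistic regression': LogisticRegression(max_iter=1000),
    'Linear SVM': SVC(kernel='linear'),
    'Decision tree': DecisionTreeClassifier(max_depth=4, random_state=1),
    'KNN': KNeighborsClassifier(n_neighbors=5),
}

for name, clf in candidates.items():
    # Five-fold cross-validated accuracy as a rough first comparison
    scores = cross_val_score(clf, X, y, cv=5)
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')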

Ultimately, the performance of a classifier—computational performance as well as predictive power—depends heavily on the underlying data that is available for learning. The five main steps that...

First steps with scikit-learn – training a perceptron

In Chapter 2, Training Simple Machine Learning Algorithms for Classification, you learned about two related learning algorithms for classification, the perceptron rule and Adaline, which we implemented in Python and NumPy by ourselves. Now we will take a look at the scikit-learn API, which, as mentioned, combines a user-friendly and consistent interface with a highly optimized implementation of several classification algorithms. The scikit-learn library offers not only a large variety of learning algorithms, but also many convenient functions to preprocess data and to fine-tune and evaluate our models. We will discuss this in more detail, together with the underlying concepts, in Chapter 4, Building Good Training Datasets – Data Preprocessing, and Chapter 5, Compressing Data via Dimensionality Reduction.

To get started with the scikit-learn library, we will train a perceptron model similar to the one that we implemented...
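
A minimal sketch of that workflow, assuming the Iris dataset used in Chapter 2 (the feature selection and hyperparameters below are illustrative rather than the book's exact listing), might look as follows:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data[:, [2, 3]]   # petal length and petal width
y = iris.target

# Hold out 30% of the data for testing, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Standardize the features using statistics from the training set only
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

ppn = Perceptron(eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)

y_pred = ppn.predict(X_test_std)
print(f'Test accuracy: {accuracy_score(y_test, y_pred):.3f}')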

Modeling class probabilities via logistic regression

Although the perceptron rule offers a simple and approachable introduction to machine learning algorithms for classification, its biggest disadvantage is that it never converges if the classes are not perfectly linearly separable. The classification task in the previous section would be an example of such a scenario. The reason is that the weights are continuously being updated, since there is always at least one misclassified training example present in each epoch. Of course, you can change the learning rate and increase the number of epochs, but be warned that the perceptron will never converge on this dataset.

To make better use of our time, we will now take a look at another simple, yet more powerful, algorithm for linear and binary classification problems: logistic regression. Note that, despite its name, logistic regression is a model for classification, not regression.
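
To make the phrase "modeling class probabilities" concrete, the short sketch below (the dataset and regularization strength are illustrative) uses scikit-learn's LogisticRegression, whose predict_proba method returns the estimated conditional probability of each class for a given example:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

lr = LogisticRegression(C=100.0, max_iter=1000)
lr.fit(X, y)

# Each row of predict_proba sums to 1 and estimates P(class | example);
# predict simply picks the class with the highest probability
print(lr.predict_proba(X[:3]))
print(lr.predict(X[:3]))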

Logistic regression and conditional probabilities...

Maximum margin classification with support vector machines

Another powerful and widely used learning algorithm is the support vector machine (SVM), which can be considered an extension of the perceptron. Using the perceptron algorithm, we minimized misclassification errors. However, in SVMs, our optimization objective is to maximize the margin. The margin is defined as the distance between the separating hyperplane (decision boundary) and the training examples that are closest to this hyperplane, which are the so-called support vectors.

This is illustrated in Figure 3.10:

Figure 3.10: SVM maximizes the margin between the decision boundary and training data points
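
A minimal sketch of fitting such a maximum margin classifier with scikit-learn (the data and the value of the regularization parameter C are illustrative) could look like this:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# A smaller C permits a wider, "softer" margin; a larger C penalizes
# margin violations more heavily
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_std, y)

# The training examples closest to the decision boundary
print('Support vectors per class:', svm.n_support_)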

Maximum margin intuition

The rationale behind having decision boundaries with large margins is that they tend to have a lower generalization error, whereas models with small margins are more prone to overfitting.

Unfortunately, while the main intuition behind SVMs is relatively simple, the mathematics...

Solving nonlinear problems using a kernel SVM

Another reason why SVMs enjoy high popularity among machine learning practitioners is that they can be easily kernelized to solve nonlinear classification problems. Before we discuss the main concept behind the so-called kernel SVM, the most common variant of SVMs, let’s first create a synthetic dataset to see what such a nonlinear classification problem may look like.

Kernel methods for linearly inseparable data

Using the following code, we will create a simple dataset that has the form of an XOR gate using the logical_xor function from NumPy, where roughly 100 examples will be assigned the class label 1 and roughly 100 examples the class label -1:

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> np.random.seed(1)
>>> X_xor = np.random.randn(200, 2)
>>> y_xor = np.logical_xor(X_xor[:, 0] > 0,
...                        X_xor[:, 1] > 0)
>>> y_xor...
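
The listing above is truncated here; a plausible continuation, assuming the Boolean XOR outcome is mapped to the labels 1 and -1 as described and that an RBF kernel SVM is then fitted (the gamma and C values are illustrative), is sketched below:

import numpy as np
from sklearn.svm import SVC

np.random.seed(1)
X_xor = np.random.randn(200, 2)
y_xor = np.logical_xor(X_xor[:, 0] > 0,
                       X_xor[:, 1] > 0)
# Map the Boolean XOR outcome to the class labels 1 and -1
y_xor = np.where(y_xor, 1, -1)

# A linear classifier cannot separate this data; an RBF kernel SVM can
svm = SVC(kernel='rbf', gamma=0.10, C=10.0)
svm.fit(X_xor, y_xor)
print(f'Training accuracy: {svm.score(X_xor, y_xor):.3f}')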

Decision tree learning

Decision tree classifiers are attractive models if we care about interpretability. As the name “decision tree” suggests, we can think of this model as breaking down our data by making a decision based on asking a series of questions.

Let’s consider the following example in which we use a decision tree to decide upon an activity on a particular day:

Figure 3.18: An example of a decision tree

Based on the features in our training dataset, the decision tree model learns a series of questions to infer the class labels of the examples. Although Figure 3.18 illustrates the concept of a decision tree based on categorical variables, the same concept applies if our features are real numbers, like in the Iris dataset. For example, we could simply define a cut-off value along the sepal width feature axis and ask a binary question: “Is the sepal width ≥ 2.8 cm?”
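
The sketch below (Iris data, with an illustrative depth limit) shows how scikit-learn's DecisionTreeClassifier learns such threshold questions automatically, and how the learned questions can be printed:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

tree = DecisionTreeClassifier(criterion='gini', max_depth=3,
                              random_state=1)
tree.fit(iris.data, iris.target)

# Print the learned questions, e.g. "petal width (cm) <= 0.80"
print(export_text(tree, feature_names=iris.feature_names))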

Using the decision algorithm, we start...

K-nearest neighbors – a lazy learning algorithm

The last supervised learning algorithm that we want to discuss in this chapter is the k-nearest neighbor (KNN) classifier, which is particularly interesting because it is fundamentally different from the learning algorithms that we have discussed so far.

KNN is a typical example of a lazy learner. It is called “lazy” not because of its apparent simplicity, but because it doesn’t learn a discriminative function from the training data but memorizes the training dataset instead.
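
The following sketch (the number of neighbors and the distance metric are illustrative choices) shows what this looks like with scikit-learn's KNeighborsClassifier; note that "fitting" here amounts to little more than storing the standardized training data:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Euclidean distance (Minkowski with p=2) over standardized features
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
knn.fit(X_std, y)   # effectively memorizes the training dataset

print(knn.predict(X_std[:3]))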

Parametric versus non-parametric models

Machine learning algorithms can be grouped into parametric and non-parametric models. Using parametric models, we estimate parameters from the training dataset to learn a function that can classify new data points without requiring the original training dataset anymore. Typical examples of parametric models are the perceptron, logistic regression, and the linear SVM. In contrast...

Summary

In this chapter, you learned about many different machine learning algorithms that are used to tackle linear and nonlinear problems. You have seen that decision trees are particularly attractive if we care about interpretability. Logistic regression is not only a useful model for online learning via SGD, but also allows us to predict the probability of a particular event.

Although SVMs are powerful linear models that can be extended to nonlinear problems via the kernel trick, they have many parameters that have to be tuned in order to make good predictions. In contrast, ensemble methods, such as random forests, don’t require much parameter tuning and don’t overfit as easily as decision trees, which makes them attractive models for many practical problem domains. The KNN classifier offers an alternative approach to classification via lazy learning that allows us to make predictions without any model training, but with a more computationally expensive prediction...
