Machine Learning with scikit-learn Quick Start Guide

Product type Book
Published in Oct 2018
Publisher Packt
ISBN-13 9781789343700
Pages 172 pages
Edition 1st Edition
Author: Kevin Jolly

Table of Contents (10 chapters)

  • Preface
  • Introducing Machine Learning with scikit-learn
  • Predicting Categories with K-Nearest Neighbors
  • Predicting Categories with Logistic Regression
  • Predicting Categories with Naive Bayes and SVMs
  • Predicting Numeric Outcomes with Linear Regression
  • Classification and Regression with Trees
  • Clustering Data with Unsupervised Machine Learning
  • Performance Evaluation Methods
  • Other Books You May Enjoy

Predicting Categories with Naive Bayes and SVMs

In this chapter, you will learn about two popular classification algorithms: the Naive Bayes algorithm and the linear support vector machine. The Naive Bayes algorithm is a probabilistic model that predicts classes and categories, while the linear support vector machine uses a linear decision boundary to separate them.

In this chapter, you will learn about the following topics:

  • The theoretical concept behind the Naive Bayes algorithm, explained in mathematical terms
  • Implementing the Naive Bayes algorithm by using scikit-learn
  • How the linear support vector machine algorithm works under the hood
  • Graphically optimizing the hyperparameters of linear support vector machines

Technical requirements

The Naive Bayes algorithm

The Naive Bayes algorithm uses Bayes' theorem to predict classes and categories. The algorithm is called naive because it assumes that all attributes are independent of one another, given the class. In practice, this assumption rarely holds, since the attributes/features in a real dataset are usually related to one another in some way.

Despite this naive assumption, the algorithm performs well in practice. Bayes' theorem is defined as follows:

$$p(h \mid D) = \frac{p(D \mid h)\, p(h)}{p(D)}$$

We can split the preceding formula into the following components:

  • p(h|D): This is the probability of the hypothesis h being true, given the data D (the posterior probability). An example would be the probability that a transaction is fraudulent, given a dataset that consists of fraudulent and non-fraudulent transactions.
  • p(D|h): This is the probability...
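As a worked example of the formula, the following sketch applies Bayes' theorem to the fraud scenario with made-up probabilities (they do not come from any real dataset). It treats D as a single observed piece of evidence, namely that a transaction was flagged by some hypothetical rule, and computes the denominator p(D) with the law of total probability over "fraud" and "not fraud".

```python
def posterior(prior_h, likelihood_d_given_h, likelihood_d_given_not_h):
    """Bayes' theorem for a binary hypothesis: p(h|D) = p(D|h) p(h) / p(D)."""
    # Evidence p(D) via the law of total probability over h and not-h
    evidence = (likelihood_d_given_h * prior_h
                + likelihood_d_given_not_h * (1 - prior_h))
    return likelihood_d_given_h * prior_h / evidence


# Hypothetical numbers for the fraud example (illustration only)
p_fraud = 0.01              # p(h): 1% of transactions are fraudulent
p_flag_given_fraud = 0.9    # p(D|h): the rule flags 90% of fraudulent ones
p_flag_given_legit = 0.05   # p(D|not h): and 5% of legitimate ones

p = posterior(p_fraud, p_flag_given_fraud, p_flag_given_legit)
print(round(p, 4))  # 0.1538
```

Even with a fairly accurate flag, the posterior is only about 15%, because the prior p(h) is so small: most flagged transactions come from the much larger pool of legitimate ones.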

Support vector machines

In this section, you will learn about support vector machines (SVMs), or, more specifically, linear support vector machines. To understand SVMs, you first need to know what support vectors are. They are illustrated in the following diagram:

The concept of support vectors

In the preceding diagram, the following applies:

  • The linear support vector machine is a form of linear classifier. It constructs a linear decision boundary: the observations on one side of the boundary (the circles) belong to one class, while the observations on the other side (the squares) belong to the other class.
  • The support vectors are the observations marked with a triangle.
  • These are the observations that either lie very close to the linear decision boundary or have been misclassified.
  • We can define...
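The following is a minimal sketch of how you might inspect support vectors in scikit-learn, using a few made-up 2D points. It assumes `SVC(kernel="linear")` rather than `LinearSVC`, because `SVC` exposes the fitted support vectors through its `support_vectors_` attribute, whereas `LinearSVC` does not. For a linear kernel, `coef_` and `intercept_` give the weights w and bias b of the decision boundary w·x + b = 0.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable toy clusters (made-up points)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class 0 ("circles")
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])  # class 1 ("squares")
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear", C=1.0).fit(X, y)

# The observations that define the margin
print("Support vectors:\n", model.support_vectors_)
print("Support vectors per class:", model.n_support_)

# The linear decision boundary is w . x + b = 0; the sign of w . x + b
# tells you which side of the boundary a point falls on
w, b = model.coef_[0], model.intercept_[0]
print("Side of boundary:", np.sign(X @ w + b))
```

Only the support vectors determine w and b: removing any of the other points and refitting would leave the boundary unchanged.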

Summary

This chapter introduced you to two fundamental supervised machine learning algorithms: the Naive Bayes algorithm and linear support vector machines. More specifically, you learned about the following topics:

  • How Bayes' theorem is used to produce a probability that indicates whether a data point belongs to a particular class or category
  • Implementing the Naive Bayes classifier in scikit-learn
  • How linear support vector machines work under the hood
  • Implementing linear support vector machines in scikit-learn
  • Optimizing the inverse regularization strength (C), both graphically and by using GridSearchCV
  • How to scale your data for a potential improvement in performance
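To tie the last two points together, here is a minimal sketch that scales the data and tunes the inverse regularization strength C of a linear SVM. It assumes the breast cancer dataset bundled with scikit-learn, `LinearSVC` as the linear SVM, and a log-spaced grid of six C values; none of these choices are taken from the chapter. `validation_curve` produces the per-C scores you would plot for a graphical search, and `GridSearchCV` searches the same grid automatically.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Putting the scaler in the pipeline refits it on each CV training fold,
# so no statistics leak from the validation folds
pipeline = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

# Candidate values for the inverse regularization strength C
c_values = np.logspace(-3, 2, 6)

# The numbers you would plot for a graphical search: mean CV accuracy per C
train_scores, val_scores = validation_curve(
    pipeline, X_train, y_train,
    param_name="linearsvc__C", param_range=c_values, cv=5)
for c, score in zip(c_values, val_scores.mean(axis=1)):
    print(f"C={c:g}: mean validation accuracy={score:.3f}")

# Exhaustive search over the same grid with GridSearchCV
grid = GridSearchCV(pipeline, param_grid={"linearsvc__C": c_values}, cv=5)
grid.fit(X_train, y_train)
print("Best C:", grid.best_params_["linearsvc__C"])
print("Test accuracy:", grid.score(X_test, y_test))
```

The parameter name `linearsvc__C` follows scikit-learn's `<step name>__<parameter>` convention; `make_pipeline` names each step after its lowercased class name.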

In the next chapter, you will learn about the other type of supervised machine learning algorithm, which is used to predict numeric values, rather than classes and categories: linear regression...
