Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Python Machine Learning by Example
Python Machine Learning by Example

Python Machine Learning by Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn , Third Edition

Arrow left icon
Profile Icon Yuxi (Hayden) Liu
Arrow right icon
$40.99
Full star icon Full star icon Full star icon Full star icon Empty star icon 4 (20 Ratings)
Paperback Oct 2020 526 pages 3rd Edition
eBook
$25.19 $27.99
Paperback
$40.99
Paperback
$45.99
Arrow left icon
Profile Icon Yuxi (Hayden) Liu
Arrow right icon
$40.99
Full star icon Full star icon Full star icon Full star icon Empty star icon 4 (20 Ratings)
Paperback Oct 2020 526 pages 3rd Edition
eBook
$25.19 $27.99
Paperback
$40.99
Paperback
$45.99
eBook
$25.19 $27.99
Paperback
$40.99
Paperback
$45.99

What do you get with Print?

Product feature icon Instant access to your digital copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Redeem a companion digital copy on all Print orders
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Python Machine Learning by Example

Building a Movie Recommendation Engine with Naïve Bayes

As promised, in this chapter, we will kick off our supervised learning journey with machine learning classification, and specifically, binary classification. The goal of the chapter is to build a movie recommendation system. It is a good starting point to learn classification from a real-life example—movie streaming service providers are already doing this, and we can do the same. You will learn the fundamental concepts of classification, including what it does and its various types and applications, with a focus on solving a binary classification problem using a simple, yet powerful, algorithm, Naïve Bayes. Finally, the chapter will demonstrate how to fine-tune a model, which is an important skill that every data science or machine learning practitioner should learn.

We will go into detail on the following topics:

  • What is machine learning classification?
  • Types of classification
  • Applications of text classification
  • The Naïve Bayes classifier
  • The mechanics of Naïve Bayes
  • Naïve Bayes implementations
  • Building a movie recommender with Naïve Bayes
  • Classification performance evaluation
  • Cross-validation
  • Tuning a classification model

Getting started with classification

Movie recommendation can be framed as a machine learning classification problem. If it is predicted that you like a movie, for example, then it will be on your recommended list, otherwise, it won't. Let's get started by learning the important concepts of machine learning classification.

Classification is one of the main instances of supervised learning. Given a training set of data containing observations and their associated categorical outputs, the goal of classification is to learn a general rule that correctly maps the observations (also called features or predictive variables) to the target categories (also called labels or classes). Putting it another way, a trained classification model will be generated after the model learns from the features and targets of training samples, as shown in the first half of Figure 2.1. When new or unseen data comes in, the trained model will be able to determine their desired class memberships. Class information will be predicted based on the known input features using the trained classification model, as displayed in the second half of Figure 2.1:

Figure 2.1: The training and prediction stages in classification

In general, there are three types of classification based on the possibility of class output—binary, multiclass, and multi-label classification. We will cover them one by one in the next section.

Binary classification

This classifies observations into one of two possible classes. The example of spam email filtering we encounter every day is a typical use case of binary classification, which identifies email messages (input observations) as spam or not spam (output classes). Customer churn prediction is another frequently mentioned example, where a prediction system takes in customer segment data and activity data from CRM systems and identifies which customers are likely to churn.

Another application in the marketing and advertising industry is click-through prediction for online ads—that is, whether or not an ad will be clicked, given users' cookie information and browsing history. Last but not least, binary classification has also been employed in biomedical science, for example, in early cancer diagnosis, classifying patients into high or low risk groups based on MRI images.

As demonstrated in Figure 2.2, binary classification tries to find a way to separate data from two classes (denoted by dots and crosses):

Figure 2.2: Binary classification example

Don't forget that predicting whether a person likes a movie is also a binary classification problem.

Multiclass classification

This type of classification is also referred to as multinomial classification. It allows more than two possible classes, as opposed to only two in binary cases. Handwritten digit recognition is a common instance of classification and has a long history of research and development since the early 1900s. A classification system, for example, can learn to read and understand handwritten ZIP codes (digits from 0 to 9 in most countries) by which envelopes are automatically sorted.

Handwritten digit recognition has become a "Hello, World!" in the journey of studying machine learning, and the scanned document dataset constructed from the National Institute of Standards and Technology, called MNIST (Modified National Institute of Standards and Technology), is a benchmark dataset frequently used to test and evaluate multiclass classification models. Figure 2.3 shows four samples taken from the MNIST dataset:

Figure 2.3: Samples from the MNIST dataset

In Figure 2.4, the multiclass classification model tries to find segregation boundaries to separate data from the following three different classes (denoted by dots, crosses, and triangles):

Figure 2.4: Multiclass classification example

Multi-label classification

In the first two types of classification, target classes are mutually exclusive and a sample is assigned one, and only one, label. It is the opposite in multi-label classification. Increasing research attention has been drawn to multi-label classification by the nature of the omnipresence of categories in modern applications. For example, a picture that captures a sea and a sunset can simultaneously belong to both conceptual scenes, whereas it can only be an image of either a cat or dog in a binary case, or one type of fruit among oranges, apples, and bananas in a multiclass case. Similarly, adventure films are often combined with other genres, such as fantasy, science fiction, horror, and drama.

Another typical application is protein function classification, as a protein may have more than one function—storage, antibody, support, transport, and so on.

A typical approach to solving an n-label classification problem is to transform it into a set of n binary classification problems, where each binary classification problem is handled by an individual binary classifier.

Refer to Figure 2.5 to see the restructuring of a multi-label classification problem into a multiple binary classification problem:

Figure 2.5: Transforming three-label classification into three independent binary classifications

To solve these problems, researchers have developed many powerful classification algorithms, among which Naïve Bayes, support vector machine (SVM), decision tree, and logistic regression are often used. In the following sections, we will cover the mechanics of Naïve Bayes and its in-depth implementation, along with other important concepts, including classifier tuning and classification performance evaluation. Stay tuned for upcoming chapters that cover the other classification algorithms.

Exploring Naïve Bayes

The Naïve Bayes classifier belongs to the family of probabilistic classifiers. It computes the probabilities of each predictive feature (also referred to as an attribute or signal) of the data belonging to each class in order to make a prediction of probability distribution over all classes. Of course, from the resulting probability distribution, we can conclude the most likely class that the data sample is associated with. What Naïve Bayes does specifically, as its name indicates, is as follows:

  • Bayes: As in, it maps the probability of observed input features given a possible class to the probability of the class given observed pieces of evidence based on Bayes' theorem.
  • Naïve: As in, it simplifies probability computation by assuming that predictive features are mutually independent.

I will explain Bayes' theorem with examples in the next section.

Learning Bayes' theorem by example

It is important to understand Bayes' theorem before diving into the classifier. Let A and B denote two events. Events could be that it will rain tomorrowtwo kings are drawn from a deck of cards, or a person has cancer. In Bayes' theorem, P(A |B) is the probability that A occurs given that B is true. It can be computed as follows:

Here, P(B |A) is the probability of observing B given that A occurs, while P(A) and P(B) are the probability that A and B occur, respectively. Is that too abstract? Let's consider the following concrete examples:

  • Example 1: Given two coins, one is unfair, with 90% of flips getting a head and 10% getting a tail, while the other one is fair. Randomly pick one coin and flip it. What is the probability that this coin is the unfair one, if we get a head?

We can solve this by first denoting U for the event of picking the unfair coin, F for the fair coin, and H for the event of getting a head. So, the probability that the unfair coin has been picked when we get a head, P(U |H), can be calculated with the following:

As we know, P(H |U) is 90%. P(U) is 0.5 because we randomly pick a coin out of two. However, deriving the probability of getting a head, P(H), is not that straightforward, as two events can lead to the following, where U is when the unfair coin is picked, and F is when the fair coin is picked:

Now, P(U | H) becomes the following:

  • Example 2: Suppose a physician reported the following cancer screening test scenario among 10,000 people:

Cancer

No Cancer

Total

Test Positive

80

900

980

Test Negative

20

9000

9020

Total

100

9900

10000

Table 2.1: Example of a cancer screening result

This indicates that 80 out of 100 cancer patients are correctly diagnosed, while the other 20 are not; cancer is falsely detected in 900 out of 9,900 healthy people.

If the result of this screening test on a person is positive, what is the probability that they actually have cancer? Let's assign the event of having cancer and positive testing results as C and Pos, respectively. So we have P(Pos |C) = 80/100 = 0.8, P(C) = 100/1000 = 0.1, and P(Pos) = 980/1000 = 0.98.

We can apply Bayes' theorem to calculate P(C |Pos):

Given a positive screening result, the chance that the subject has cancer is 8.16%, which is significantly higher than the one under general assumption (100/10000=1%) without the subject undergoing the screening.

  • Example 3: Three machines AB, and C in a factory account for 35%, 20%, and 45% of bulb production. The fraction of defective bulbs produced by each machine is 1.5%, 1%, and 2%, respectively. A bulb produced by this factory was identified as defective, which is denoted as event D. What are the probabilities that this bulb was manufactured by machine AB, and C, respectively?

Again, we can simply follow Bayes' theorem:

Also, either way, we do not even need to calculate P(D) since we know that the following is the case:

We also know the following concept:

So, we have the following formula:

Now that you understand Bayes' theorem as the backbone of Naïve Bayes, we can easily move forward with the classifier itself.

The mechanics of Naïve Bayes

Let's start by discussing the magic behind the algorithm—how Naïve Bayes works. Given a data sample, x, with n features, x1, x2,..., xn (x represents a feature vector and x = (x1, x2,..., xn)), the goal of Naïve Bayes is to determine the probabilities that this sample belongs to each of K possible classes y1, y2,..., yK, which is P(yK |x) or P(x1, x2,..., xn), where k = 1, 2, …, K.

This looks no different from what we have just dealt with: x or x1, x2,..., xn. This is a joint event where a sample that has observed feature values x1, x2,..., xn . yK is the event that the sample belongs to class k. We can apply Bayes' theorem right away:

Let's look at each component in detail:

  • P(yK) portrays how classes are distributed, with no further knowledge of observation features. Thus, it is also called prior in Bayesian probability terminology. Prior can be either predetermined (usually in a uniform manner where each class has an equal chance of occurrence) or learned from a set of training samples.
  • P(yK |x), in contrast to prior P(yK), is the posterior, with extra knowledge of observation.
  • P(x |yK), or P(x1, x2,..., xn |yK), is the joint distribution of n features, given that the sample belongs to class yK. This is how likely the features with such values co-occur. This is named likelihood in Bayesian terminology. Obviously, the likelihood will be difficult to compute as the number of features increases. In Naïve Bayes, this is solved thanks to the feature independence assumption. The joint conditional distribution of n features can be expressed as the joint product of individual feature conditional distributions:

Each conditional distribution can be efficiently learned from a set of training samples.

  • P(x), also called evidence, solely depends on the overall distribution of features, which is not specific to certain classes and is therefore a normalization constant. As a result, posterior is proportional to prior and likelihood:

Figure 2.6 summarizes how a Naïve Bayes classification model is trained and applied to new data:

Figure 2.6: Training and prediction stages in Naïve Bayes classification

Let's see a Naïve Bayes classifier in action through a simplified example of movie recommendation before we jump to the implementations of Naïve Bayes. Given four (pseudo) users, whether they like each of three movies, m1, m2, m3 (indicated as 1 or 0), and whether they like a target movie (denoted as event Y) or not (denoted as event N), as shown in the following table, we are asked to predict how likely it is that another user will like that movie:

ID

m1

m2

m3

Whether the user likes the target movie

Training data

1

0

1

1

Y

2

0

0

1

N

3

0

0

0

Y

4

1

1

0

Y

Testing case

5

1

1

0

?

Table 2.2: Toy data example for a movie recommendation

Whether users like three movies, m1, m2, m3, are features (signals) that we can utilize to predict the target class. The training data we have are the four samples with both ratings and target information.

Now, let's first compute the prior, P(Y) and P(N). From the training set, we can easily get the following:

Alternatively, we can also impose an assumption of a uniform prior that P(Y) = 50%, for example.

For simplicity, we will denote the event that a user likes three movies or not as f1, f2, f3, respectively. To calculate posterior P(Y| x), where x = (1, 1, 0), the first step is to compute the likelihoods, P(f1 = 1| Y), P(f2 = 1 Y), and P(f3 = 0| Y), and similarly, P(f1 = 1| N), P(f2 = 1| N), and P(f3 = 0| N) based on the training set. However, you may notice that since f1 = 1 was not seen in the N class, we will get P(f1 = 1|N) = 0. Consequently, we will have , which means we will recklessly predict class = Y by any means.

To eliminate the zero-multiplication factor, the unknown likelihood, we usually assign an initial value of 1 to each feature, that is, we start counting each possible value of a feature from one. This technique is also known as Laplace smoothing. With this amendment, we now have the following:

Here, given class N, 0 + 1 means there are zero likes of m1 plus +1 smoothing; 1 + 2 means there is one data point (ID = 2) plus two (two possible values) + 1 smoothings. Given class Y, 1 + 1 means there is one like of m1 (ID = 4) plus +1 smoothing; 3 + 2 means there are three data points (ID = 1, 3, 4) plus two (two possible values) + 1 smoothings.

Similarly, we can compute the following:

Now, we can compute the ratio between two posteriors as follows:

Also, remember this:

So, finally, we have the following:

There is a 92.1% chance that the new user will like the target movie.

I hope that you now have a solid understanding of Naïve Bayes after going through the theory and a toy example. Let's get ready for its implementation in the next section.

Implementing Naïve Bayes

After calculating by hand the movie preference prediction example, as promised, we are going to code Naïve Bayes from scratch. After that, we will implement it using the scikit-learn package.

Implementing Naïve Bayes from scratch

Before we develop the model, let's define the toy dataset we just worked with:

>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])

For the model, starting with the prior, we first group the data by label and record their indices by classes:

>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}
...     """
...     from collections import defaultdict
...     label_indices = defaultdict(list)
...     for index, label in enumerate(labels):
...         label_indices[label].append(index)
...     return label_indices

Take a look at what we get:

>>> label_indices = get_label_indices(Y_train)
>>> print('label_indices:\n', label_indices)
label_indices
 defaultdict(<class 'list'>, {'Y': [0, 2, 3], 'N': [1]})

With label_indices, we calculate the prior:

>>> def get_prior(label_indices):
...     """
...     Compute prior based on training samples
...     @param label_indices: grouped sample indices by class
...     @return: dictionary, with class label as key, corresponding 
...              prior as the value
...     """
...     prior = {label: len(indices) for label, indices in
...                                      label_indices.items()}
...     total_count = sum(prior.values())
...     for label in prior:
...         prior[label] /= total_count
...     return prior

Take a look at the computed prior:

>>> prior = get_prior(label_indices)
>>> print('Prior:', prior)
 Prior: {'Y': 0.75, 'N': 0.25}

With prior calculated, we continue with likelihood, which is the conditional probability, P(feature|class):

>>> def get_likelihood(features, label_indices, smoothing=0):
...     """
...     Compute likelihood based on training samples
...     @param features: matrix of features
...     @param label_indices: grouped sample indices by class
...     @param smoothing: integer, additive smoothing parameter
...     @return: dictionary, with class as key, corresponding 
...              conditional probability P(feature|class) vector   
...              as value
...     """
...     likelihood = {}
...     for label, indices in label_indices.items():
...         likelihood[label] = features[indices, :].sum(axis=0) 
...                                + smoothing
...         total_count = len(indices)
...         likelihood[label] = likelihood[label] / 
...                                 (total_count + 2 * smoothing)
...     return likelihood

We set the smoothing value to 1 here, which can also be 0 for no smoothing, or any other positive value, as long as a higher classification performance is achieved:

>>> smoothing = 1
>>> likelihood = get_likelihood(X_train, label_indices, smoothing)
>>> print('Likelihood:\n', likelihood)
Likelihood:
 {'Y': array([0.4, 0.6, 0.4]), 'N': array([0.33333333, 0.33333333, 0.66666667])}

If you ever find any of this confusing, feel free to check Figure 2.7 to refresh your memory:

Figure 2.7: A simple example of computing prior and likelihood

With prior and likelihood ready, we can now compute the posterior for the testing/new samples:

>>> def get_posterior(X, prior, likelihood):
...     """
...     Compute posterior of testing samples, based on prior and 
...     likelihood
...     @param X: testing samples
...     @param prior: dictionary, with class label as key, 
...                   corresponding prior as the value
...     @param likelihood: dictionary, with class label as key, 
...                        corresponding conditional probability
...                            vector as value
...     @return: dictionary, with class label as key, corresponding 
...              posterior as value
...     """
...     posteriors = []
...     for x in X:
...         # posterior is proportional to prior * likelihood
...         posterior = prior.copy()
...         for label, likelihood_label in likelihood.items():
...             for index, bool_value in enumerate(x):
...                 posterior[label] *= likelihood_label[index] if 
...                   bool_value else (1 - likelihood_label[index])
...         # normalize so that all sums up to 1
...         sum_posterior = sum(posterior.values())
...         for label in posterior:
...             if posterior[label] == float('inf'):
...                 posterior[label] = 1.0
...             else:
...                 posterior[label] /= sum_posterior
...         posteriors.append(posterior.copy())
...     return posteriors

Now, let's predict the class of our one sample test set using this prediction function:

>>> posterior = get_posterior(X_test, prior, likelihood)
>>> print('Posterior:\n', posterior)
Posterior:
 [{'Y': 0.9210360075805433, 'N': 0.07896399241945673}]

This is exactly what we got previously. We have successfully developed Naïve Bayes from scratch and we can now move on to the implementation using scikit-learn.

Implementing Naïve Bayes with scikit-learn

Coding from scratch and implementing your own solutions is the best way to learn about machine learning models. Of course, you can take a shortcut by directly using the BernoulliNB module (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html) from the scikit-learn API:

>>> from sklearn.naive_bayes import BernoulliNB

Let's initialize a model with a smoothing factor (specified as alpha in scikit-learn) of 1.0, and prior learned from the training set (specified as fit_prior=True in scikit-learn):

>>> clf = BernoulliNB(alpha=1.0, fit_prior=True)

To train the Naïve Bayes classifier with the fit method, we use the following line of code:

>>> clf.fit(X_train, Y_train)

And to obtain the predicted probability results with the predict_proba method, we use the following lines of code:

>>> pred_prob = clf.predict_proba(X_test)
>>> print('[scikit-learn] Predicted probabilities:\n', pred_prob)
[scikit-learn] Predicted probabilities:
 [[0.07896399 0.92103601]]

Finally, we do the following to directly acquire the predicted class with the predict method (0.5 is the default threshold, and if the predicted probability of class Y is greater than 0.5, class Y is assigned; otherwise, N is used):

>>> pred = clf.predict(X_test)
>>> print('[scikit-learn] Prediction:', pred)
[scikit-learn] Prediction: ['Y']

The prediction results using scikit-learn are consistent with what we got using our own solution. Now that we've implemented the algorithm both from scratch and using scikit-learn, why don't we use it to solve the movie recommendation problem?

Building a movie recommender with Naïve Bayes

After the toy example, it is now time to build a movie recommender (or, more specifically, movie preference classifier) using a real dataset. We herein use a movie rating dataset (https://grouplens.org/datasets/movielens/). The movie rating data was collected by the GroupLens Research group from the MovieLens website (http://movielens.org).

For demonstration purposes, we will use the small dataset, ml-latest-small (downloaded from the following link: http://files.grouplens.org/datasets/movielens/ml-latest-small.zip of ml-latest-small.zip (size: 1 MB)) as an example. It has around 100,00 ratings, ranging from 1 to 5, given by 6,040 users on 3,706 movies (last updated September 2018).

Unzip the ml-1m.zip file and you will see the following four files:

  • movies.dat: It contains the movie information in the format of MovieID::Title::Genres.
  • ratings.dat: It contains user movie ratings in the format of UserID::MovieID::Rating::Timestamp. We will only be using data from this file in this chapter.
  • users.dat: It contains user information in the format of UserID::Gender::Age::Occupation::Zip-code.
  • README

Let's attempt to determine whether a user likes a particular movie based on how users rate other movies (again, ratings are from 1 to 5).

First, we import all the necessary modules and variables:

>>> import numpy as np
>>> from collections import defaultdict
>>> data_path = 'ml-1m/ratings.dat'
>>> n_users = 6040
>>> n_movies = 3706

We then develop the following function to load the rating data from ratings.dat:

>>> def load_rating_data(data_path, n_users, n_movies):
...     """
...     Load rating data from file and also return the number of 
...         ratings for each movie and movie_id index mapping
...     @param data_path: path of the rating data file
...     @param n_users: number of users
...     @param n_movies: number of movies that have ratings
...     @return: rating data in the numpy array of [user, movie]; 
...              movie_n_rating, {movie_id: number of ratings};
...              movie_id_mapping, {movie_id: column index in 
...                 rating data}
...     """
...     data = np.zeros([n_users, n_movies], dtype=np.float32)
...     movie_id_mapping = {}
...     movie_n_rating = defaultdict(int)
...     with open(data_path, 'r') as file:
...         for line in file.readlines()[1:]:
...             user_id, movie_id, rating, _ = line.split("::")
...             user_id = int(user_id) - 1
...             if movie_id not in movie_id_mapping:
...                 movie_id_mapping[movie_id] = 
...                              len(movie_id_mapping)
...             rating = int(rating)
...             data[user_id, movie_id_mapping[movie_id]] = rating
...             if rating > 0:
...                 movie_n_rating[movie_id] += 1
...     return data, movie_n_rating, movie_id_mapping

And then we load the data using this function:

>>> data, movie_n_rating, movie_id_mapping = 
...     load_rating_data(data_path, n_users, n_movies)

It is always recommended to analyze the data distribution. We do the following:

>>> def display_distribution(data):
...     values, counts = np.unique(data, return_counts=True)
...     for value, count in zip(values, counts):
...         print(f'Number of rating {int(value)}: {count}')
>>> display_distribution(data)
Number of rating 0: 21384032
Number of rating 1: 56174
Number of rating 2: 107557
Number of rating 3: 261197
Number of rating 4: 348971
Number of rating 5: 226309

As you can see, most ratings are unknown; for the known ones, 35% are of rating 4, followed by 26% of rating 3, and 23% of rating 5, and then 11% and 6% of ratings 2 and 1, respectively.

Since most ratings are unknown, we take the movie with the most known ratings as our target movie:

>>> movie_id_most, n_rating_most = sorted(movie_n_rating.items(),
...     key=lambda d: d[1], reverse=True)[0]
>>> print(f'Movie ID {movie_id_most} has {n_rating_most} ratings.')
Movie ID 2858 has 3428 ratings.

The movie with ID 2858 is the target movie, and ratings of the rest of the movies are signals. We construct the dataset accordingly:

>>> X_raw = np.delete(data, movie_id_mapping[movie_id_most], 
...                   axis=1)
>>> Y_raw = data[:, movie_id_mapping[movie_id_most]]

We discard samples without a rating in movie ID 2858:

>>> X = X_raw[Y_raw > 0]
>>> Y = Y_raw[Y_raw > 0]
>>> print('Shape of X:', X.shape)
Shape of X: (3428, 3705)
>>> print('Shape of Y:', Y.shape)
Shape of Y: (3428,)

Again, we take a look at the distribution of the target movie ratings:

>>> display_distribution(Y)
Number of rating 1: 83
Number of rating 2: 134
Number of rating 3: 358
Number of rating 4: 890
Number of rating 5: 1963

We can consider movies with ratings greater than 3 as being liked (being recommended):

>>> recommend = 3
>>> Y[Y <= recommend] = 0
>>> Y[Y > recommend] = 1
>>> n_pos = (Y == 1).sum()
>>> n_neg = (Y == 0).sum()
>>> print(f'{n_pos} positive samples and {n_neg} negative 
...       samples.')
2853 positive samples and 575 negative samples.

As a rule of thumb in solving classification problems, we need to always analyze the label distribution and see how balanced (or imbalanced) the dataset is.

Next, to comprehensively evaluate our classifier's performance, we can randomly split the dataset into two sets, the training and testing sets, which simulate learning data and prediction data, respectively. Generally, the proportion of the original dataset to include in the testing split can be 20%, 25%, 33.3%, or 40%. We use the train_test_split function from scikit-learn to do the random splitting and to preserve the percentage of samples for each class:

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
...     test_size=0.2, random_state=42)

It is a good practice to assign a fixed random_state (for example, 42) during experiments and exploration in order to guarantee that the same training and testing sets are generated every time the program runs. This allows us to make sure that the classifier functions and performs well on a fixed dataset before we incorporate randomness and proceed further.

We check the training and testing sizes as follows:

>>> print(len(Y_train), len(Y_test))
2742 686

Another good thing about the train_test_split function is that the resulting training and testing sets will have the same class ratio.

Next, we train a Naïve Bayes model on the training set. You may notice that the values of the input features are from 0 to 5, as opposed to 0 or 1 in our toy example. Hence, we use the MultinomialNB module (https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html) from scikit-learn instead of the BernoulliNB module, as MultinomialNB can work with integer features. We import the module, initialize a model with a smoothing factor of 1.0 and prior learned from the training set, and train this model against the training set as follows:

>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB(alpha=1.0, fit_prior=True)
>>> clf.fit(X_train, Y_train)

Then, we use the trained model to make predictions on the testing set. We get the predicted probabilities as follows:

>>> prediction_prob = clf.predict_proba(X_test)
>>> print(prediction_prob[0:10])
[[7.50487439e-23 1.00000000e+00]
 [1.01806208e-01 8.98193792e-01]
 [3.57740570e-10 1.00000000e+00]
 [1.00000000e+00 2.94095407e-16]
 [1.00000000e+00 2.49760836e-25]
 [7.62630220e-01 2.37369780e-01]
 [3.47479627e-05 9.99965252e-01]
 [2.66075292e-11 1.00000000e+00]
 [5.88493563e-10 9.99999999e-01]
 [9.71326867e-09 9.99999990e-01]]

We get the predicted class as follows:

>>> prediction = clf.predict(X_test)
>>> print(prediction[:10])
[1. 1. 1. 0. 0. 0. 1. 1. 1. 1.]

Finally, we evaluate the model's performance with classification accuracy, which is the proportion of correct predictions:

>>> accuracy = clf.score(X_test, Y_test)
>>> print(f'The accuracy is: {accuracy*100:.1f}%')
The accuracy is: 71.6%

The classification accuracy is around 72%, which means that the Naïve Bayes classifier we just developed correctly recommends movies to around 72% of users. This is not bad, given that we extracted user-movie relationships only from the movie rating data where most ratings are unknown. Ideally, we could also utilize movie genre information from the file movies.dat, and user demographics (gender, age, occupation, and zip code) information from the file users.dat. Obviously, movies in similar genres tend to attract similar users, and users of similar demographics likely have similar movie preferences.

So far, we have covered in depth the first machine learning classifier and evaluated its performance by prediction accuracy. Are there any other classification metrics? Let's see in the next section.

Evaluating classification performance

Beyond accuracy, there are several metrics we can use to gain more insight and to avoid class imbalance effects. These are as follows:

  • Confusion matrix
  • Precision
  • Recall
  • F1 score
  • Area under the curve

confusion matrix summarizes testing instances by their predicted values and true values, presented as a contingency table:

Table 2.3: Contingency table for a confusion matrix

To illustrate this, we can compute the confusion matrix of our Naïve Bayes classifier. We use the confusion_matrix function from scikit-learn to compute it, but it is very easy to code it ourselves:

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[ 60  47]
 [148 431]]

As you can see from the resulting confusion matrix, there are 47 false positive cases (where the model misinterprets a dislike as a like for a movie), and 148 false negative cases (where it fails to detect a like for a movie). Hence, classification accuracy is just the proportion of all true cases:

Precision measures the fraction of positive calls that are correct, which is ,  and  in our case.

Recall, on the other hand, measures the fraction of true positives that are correctly identified, which is  and  in our case. Recall is also called the true positive rate.

The f1 score comprehensively includes both the precision and the recall, and equates to their harmonic mean. We tend to value the f1 score above precision or recall alone.

Let's compute these three measurements using corresponding functions from scikit-learn, as follows:

>>> from sklearn.metrics import precision_score, recall_score, f1_score
>>> precision_score(Y_test, prediction, pos_label=1) 
0.9016736401673641 
>>> recall_score(Y_test, prediction, pos_label=1) 
0.7443868739205527 
>>> f1_score(Y_test, prediction, pos_label=1) 
0.815515610217597

On the other hand, the negative (dislike) class can also be viewed as positive, depending on the context. For example, assign the 0 class as pos_label and we have the following:

>>> f1_score(Y_test, prediction, pos_label=0) 
0.38095238095238093

To obtain the precision, recall, and f1 score for each class, instead of exhausting all class labels in the three function calls as shown earlier, a quicker way is to call the classification_report function:

>>> from sklearn.metrics import classification_report
>>> report = classification_report(Y_test, prediction)
>>> print(report)
              precision    recall  f1-score   support
         0.0       0.29      0.56      0.38       107
         1.0       0.90      0.74      0.82       579
   micro avg       0.72      0.72      0.72       686
   macro avg       0.60      0.65      0.60       686
weighted avg       0.81      0.72      0.75       686

Here, weighted avg is the weighted average according to the proportions of the class.

The classification report provides a comprehensive view of how the classifier performs on each class. It is, as a result, useful in imbalanced classification, where we can easily obtain a high accuracy by simply classifying every sample as the dominant class, while the precision, recall, and f1 score measurements for the minority class, however, will be significantly low.

Precision, recall, and the f1 score are also applicable to multiclass classification, where we can simply treat a class we are interested in as a positive case, and any other classes as negative cases.

During the process of tweaking a binary classifier (that is, trying out different combinations of hyperparameters, for example, the smoothing factor in our Naïve Bayes classifier), it would be perfect if there was a set of parameters in which the highest averaged and class individual f1 scores are achieved at the same time. It is, however, usually not the case. Sometimes, a model has a higher average f1 score than another model, but a significantly low f1 score for a particular class; sometimes, two models have the same average f1 scores, but one has a higher f1 score for one class and a lower score for another class. In situations such as these, how can we judge which model works better? The area under the curve (AUC) of the receiver operating characteristic (ROC) is a consolidated measurement frequently used in binary classification.

The ROC curve is a plot of the true positive rate versus the false positive rate at various probability thresholds, ranging from 0 to 1. For a testing sample, if the probability of a positive class is greater than the threshold, then a positive class is assigned; otherwise, we use a negative class. To recap, the true positive rate is equivalent to recall, and the false positive rate is the fraction of negatives that are incorrectly identified as positive. Let's code and exhibit the ROC curve (under thresholds of 0.0, 0.1, 0.2, …, 1.0) of our model:

>>> pos_prob = prediction_prob[:, 1]
>>> thresholds = np.arange(0.0, 1.1, 0.05)
>>> true_pos, false_pos = [0]*len(thresholds), [0]*len(thresholds)
>>> for pred, y in zip(pos_prob, Y_test):
...     for i, threshold in enumerate(thresholds):
...         if pred >= threshold:
...            # if truth and prediction are both 1
...             if y == 1:
...                 true_pos[i] += 1
...            # if truth is 0 while prediction is 1
...             else:
...                 false_pos[i] += 1
...         else:
...             break

Then, let's calculate the true and false positive rates for all threshold settings (remember, there are 516.0 positive testing samples and 1191 negative ones):

>>> n_pos_test = (Y_test == 1).sum()
>>> n_neg_test = (Y_test == 0).sum()
>>> true_pos_rate = [tp / n_pos_test for tp in true_pos]
>>> false_pos_rate = [fp / n_neg_test for fp in false_pos]

Now, we can plot the ROC curve with Matplotlib:

>>> import matplotlib.pyplot as plt
>>> plt.figure()
>>> lw = 2
>>> plt.plot(false_pos_rate, true_pos_rate, 
...          color='darkorange', lw=lw)
>>> plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
>>> plt.xlim([0.0, 1.0])
>>> plt.ylim([0.0, 1.05])
>>> plt.xlabel('False Positive Rate')
>>> plt.ylabel('True Positive Rate')
>>> plt.title('Receiver Operating Characteristic')
>>> plt.legend(loc="lower right")
>>> plt.show()

Refer to Figure 2.8 for the resulting ROC curve:

Figure 2.8: ROC curve

In the graph, the dashed line is the baseline representing random guessing, where the true positive rate increases linearly with the false positive rate; its AUC is 0.5. The solid line is the ROC plot of our model, and its AUC is somewhat less than 1. In a perfect case, the true positive samples have a probability of 1, so that the ROC starts at the point with 100% true positive and 0% false positive. The AUC of such a perfect curve is 1. To compute the exact AUC of our model, we can resort to the roc_auc_score function of scikit-learn:

>>> from sklearn.metrics import roc_auc_score
>>> roc_auc_score(Y_test, pos_prob)
0.6857375752586637

What AUC value leads to the conclusion that a classifier is good? Unfortunately, there is no such "magic" number. We use the following rule of thumb as general guidelines: classification models achieving an AUC of 0.7 to 0.8 are considered acceptable, 0.8 to 0.9 are great, and anything above 0.9 are superb. Again, in our case, we are only using the very sparse movie rating data. Hence, an AUC of 0.69 is actually acceptable.

You have learned several classification metrics, and we will explore how to measure them properly and how to fine-tune our models in the next section.

Tuning models with cross-validation

We can simply avoid adopting the classification results from one fixed testing set, which we did in experiments previously. Instead, we usually apply the k-fold cross-validation technique to assess how a model will generally perform in practice.

In the k-fold cross-validation setting, the original data is first randomly divided into k equal-sized subsets, in which class proportion is often preserved. Each of these k subsets is then successively retained as the testing set for evaluating the model. During each trial, the rest of the k -1 subsets (excluding the one-fold holdout) form the training set for driving the model. Finally, the average performance across all k trials is calculated to generate an overall result:

Figure 2.9: Diagram of 3-fold cross-validation

Statistically, the averaged performance of k-fold cross-validation is a better estimate of how a model performs in general. Given different sets of parameters pertaining to a machine learning model and/or data preprocessing algorithms, or even two or more different models, the goal of model tuning and/or model selection is to pick a set of parameters of a classifier so that the best averaged performance is achieved. With these concepts in mind, we can now start to tweak our Naïve Bayes classifier, incorporating cross-validation and the AUC of ROC measurements.

In k-fold cross-validation, k is usually set at 3, 5, or 10. If the training size is small, a large k (5 or 10) is recommended to ensure sufficient training samples in each fold. If the training size is large, a small value (such as 3 or 4) works fine since a higher k will lead to an even higher computational cost of training on a large dataset.

We will use the split() method from the StratifiedKFold class of scikit-learn to divide the data into chunks with preserved class distribution:

>>> from sklearn.model_selection import StratifiedKFold
>>> k = 5
>>> k_fold = StratifiedKFold(n_splits=k, random_state=42)

After initializing a 5-fold generator, we choose to explore the following values for the following parameters:

  • alpha: This represents the smoothing factor, the initial value for each feature.
  • fit_prior: This represents whether to use prior tailored to the training data.

We start with the following options:

>>> smoothing_factor_option = [1, 2, 3, 4, 5, 6]
>>> fit_prior_option = [True, False]
>>> auc_record = {}

Then, for each fold generated by the split() method of the k_fold object, we repeat the process of classifier initialization, training, and prediction with one of the aforementioned combinations of parameters, and record the resulting AUCs:

>>> for train_indices, test_indices in k_fold.split(X, Y):
...     X_train, X_test = X[train_indices], X[test_indices]
...     Y_train, Y_test = Y[train_indices], Y[test_indices]
...     for alpha in smoothing_factor_option:
...         if alpha not in auc_record:
...             auc_record[alpha] = {}
...         for fit_prior in fit_prior_option:
...             clf = MultinomialNB(alpha=alpha, 
...                                 fit_prior=fit_prior)
...             clf.fit(X_train, Y_train)
...             prediction_prob = clf.predict_proba(X_test)
...             pos_prob = prediction_prob[:, 1]
...             auc = roc_auc_score(Y_test, pos_prob)
...             auc_record[alpha][fit_prior] = auc + 
...                        auc_record[alpha].get(fit_prior, 0.0)

Finally, we present the results, as follows:

>>> for smoothing, smoothing_record in auc_record.items():
...     for fit_prior, auc in smoothing_record.items():
...         print(f'    {smoothing}        {fit_prior}    
...               {auc/k:.5f}')
smoothing  fit prior  auc
    1        True    0.65647
    1        False    0.65708
    2        True    0.65795
    2        False    0.65823
    3        True    0.65740
    3        False    0.65801
    4        True    0.65808
    4        False    0.65795
    5        True    0.65814
    5        False    0.65694
    6        True    0.65663
    6        False    0.65719

The (2False) set enables the best averaged AUC, at 0.65823.

Finally, we retrain the model with the best set of hyperparameters (2False) and compute the AUC:

>>> clf = MultinomialNB(alpha=2.0, fit_prior=False)
>>> clf.fit(X_train, Y_train)
>>> pos_prob = clf.predict_proba(X_test)[:, 1]
>>> print('AUC with the best model:', roc_auc_score(Y_test,
...       pos_prob))
AUC with the best model:  0.6862056720417091

An AUC of 0.686 is achieved with the fine-tuned model. In general, tweaking model hyperparameters using cross-validation is one of the most effective ways to boost learning performance and reduce overfitting.

Summary

In this chapter, you learned the fundamental and important concepts of machine learning classification, including types of classification, classification performance evaluation, cross-validation, and model tuning. You also learned about the simple, yet powerful, classifier Naïve Bayes. We went in depth through the mechanics and implementations of Naïve Bayes with a couple of examples, the most important one being the movie recommendation project.

Binary classification was the main talking point of this chapter, and multiclass classification will be the subject of the next chapter. Specifically, we will talk about SVMs for image classification.

Exercise

  1. As mentioned earlier, we extracted user-movie relationships only from the movie rating data where most ratings are unknown. Can you also utilize data from the files movies.dat and users.dat?
  2. Practice makes perfect—another great project to deepen your understanding could be heart disease classification. The dataset can be downloaded directly at https://www.kaggle.com/ronitf/heart-disease-uci, or from the original page at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  3. Don't forget to fine-tune the model you obtained from Exercise 2 using the techniques you learned in this chapter. What is the best AUC it achieves?

References

To acknowledge the use of the MovieLens dataset in this chapter, I would like to cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Dive into machine learning algorithms to solve the complex challenges faced by data scientists today
  • Explore cutting edge content reflecting deep learning and reinforcement learning developments
  • Use updated Python libraries such as TensorFlow, PyTorch, and scikit-learn to track machine learning projects end-to-end

Description

Python Machine Learning By Example, Third Edition serves as a comprehensive gateway into the world of machine learning (ML). With six new chapters, on topics including movie recommendation engine development with Naïve Bayes, recognizing faces with support vector machine, predicting stock prices with artificial neural networks, categorizing images of clothing with convolutional neural networks, predicting with sequences using recurring neural networks, and leveraging reinforcement learning for making decisions, the book has been considerably updated for the latest enterprise requirements. At the same time, this book provides actionable insights on the key fundamentals of ML with Python programming. Hayden applies his expertise to demonstrate implementations of algorithms in Python, both from scratch and with libraries. Each chapter walks through an industry-adopted application. With the help of realistic examples, you will gain an understanding of the mechanics of ML techniques in areas such as exploratory data analysis, feature engineering, classification, regression, clustering, and NLP. By the end of this ML Python book, you will have gained a broad picture of the ML ecosystem and will be well-versed in the best practices of applying ML techniques to solve problems.

Who is this book for?

If you’re a machine learning enthusiast, data analyst, or data engineer highly passionate about machine learning and want to begin working on machine learning assignments, this book is for you. Prior knowledge of Python coding is assumed and basic familiarity with statistical concepts will be beneficial, although this is not necessary.

What you will learn

  • Understand the important concepts in ML and data science
  • Use Python to explore the world of data mining and analytics
  • Scale up model training using varied data complexities with Apache Spark
  • Delve deep into text analysis and NLP using Python libraries such NLTK and Gensim
  • Select and build an ML model and evaluate and optimize its performance
  • Implement ML algorithms from scratch in Python, TensorFlow 2, PyTorch, and scikit-learn
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Oct 30, 2020
Length: 526 pages
Edition : 3rd
Language : English
ISBN-13 : 9781800209718
Category :
Languages :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Redeem a companion digital copy on all Print orders
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Publication date : Oct 30, 2020
Length: 526 pages
Edition : 3rd
Language : English
ISBN-13 : 9781800209718
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 153.97
Machine Learning for Algorithmic Trading
$57.99
Python Machine Learning
$54.99
Python Machine Learning by Example
$40.99
Total $ 153.97 Stars icon

Table of Contents

16 Chapters
Getting Started with Machine Learning and Python Chevron down icon Chevron up icon
Building a Movie Recommendation Engine with Naïve Bayes Chevron down icon Chevron up icon
Recognizing Faces with Support Vector Machine Chevron down icon Chevron up icon
Predicting Online Ad Click-Through with Tree-Based Algorithms Chevron down icon Chevron up icon
Predicting Online Ad Click-Through with Logistic Regression Chevron down icon Chevron up icon
Scaling Up Prediction to Terabyte Click Logs Chevron down icon Chevron up icon
Predicting Stock Prices with Regression Algorithms Chevron down icon Chevron up icon
Predicting Stock Prices with Artificial Neural Networks Chevron down icon Chevron up icon
Mining the 20 Newsgroups Dataset with Text Analysis Techniques Chevron down icon Chevron up icon
Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling Chevron down icon Chevron up icon
Machine Learning Best Practices Chevron down icon Chevron up icon
Categorizing Images of Clothing with Convolutional Neural Networks Chevron down icon Chevron up icon
Making Predictions with Sequences Using Recurrent Neural Networks Chevron down icon Chevron up icon
Making Decisions in Complex Environments with Reinforcement Learning Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
(20 Ratings)
5 star 55%
4 star 25%
3 star 0%
2 star 5%
1 star 15%
Filter icon Filter
Top Reviews

Filter reviews by




siebo Dec 15, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Excellent book, I really enjoy Yuxi's writing style. He is equally adept at explaining algos, classifiers, and code architecture as he is navigating the business cases, design workflows and the stories behind the data. Essentially, everything you would expect in a top-flight Google engineer. If you work in AdTech or Analytics-driven Marketing, the book is an obvious buy for chapters 4-6 alone on predicting and optimizing ad click-through. Yuxi manages to fit a huge amount of content into this book, while delivering each topic in concise, approachable writing. At 500+ pages, he is not cutting any corners. My favorite chapters were the ones on Facial Recognition and predicting financial market pattern, as those are the most relevant to my own work. The Best Practices section in Ch. 11 will save you from a lot of issues. I think this book would be approachable to beginners, as the author starts from a foundation level and goes up from there. For advanced who want to dive deeper, Python Machine Learning (from the same publisher) is a good follow-on, and covers some of the topics from this book in greater detail.
Amazon Verified review Amazon
Annie Jun 19, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The concepts are explained clearly and step by step. Machine learning is not an easy topic, I find that this book is really helpful
Amazon Verified review Amazon
Brandon Dec 01, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The book is a great practical resource for those interested in applying ML techniques quickly. As other reviewers have mentioned, it is a bit light on theory and the more technical aspects of ML until you get deeper into the book. Having said that, the examples and code provided are very practical and to the point. I would recommend it if you are either already familiar with basic ML theory and the math behind it or if you have a software engineering background and are simply looking to quickly implement an ML solution. The author walks you through code segments and explains each step and by the end, you will have covered most of the more popular ML models and know which ones to use given your data and your goals.The chapters on SVMs and building and predicting ad-clicks are very practical. I like how the author walks through each model type and compares them so you have a baseline and can see the differences in implementation and result. That was a helpful exercise spread across a few chapters. Also, the Best Practices chapter near the end was a good, "I need an answer quickly" kind of reference.Overall, I liked the book and think it would be helpful from a more practical perspective. I think this book paired with another more theoretical resource would really round out those seeking to learn ML.
Amazon Verified review Amazon
crystalattice Nov 23, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Python ML By Example (BE) is a good complement to Python ML Third Edition (3E). The 3E book focuses on the theory and general application of ML programming, while the BE book focuses an specific application examples.While they both tackle ML programming, their approach is different. The BE book assumes you have a reasonable, foundational background in ML and uses that basis to create specific ML-based applications.For example, whereas 3E has a simple note about Naïve Bayes classification, the BE book has a whole chapter dedicated to the algorithm, discussing the different types of classification methods, how Naïve Bayes works, and then actually implementing a Naïve Bayes application. On the flip side, the 3E book has a whole chapter dedicated just to the different classifiers and different implementations of them using scikit-learn.It's almost like the 3E book is a textbook and the BE book is its complementary workbook for practice. While you may be able to be successful with either one, combining them really maximizes your ML learning.To speak about the BE book in more detail, the topics covered include:*Introduction to Python ML, including software installation*Using Naïve Bayes algorithm to create movie recommendation application*Using SVM for facial recognition*Using tree-based algorithms to predict ad click-through*Using Apache Spark to work with large data sets*Using regression algorithms and neural networks to predict the stock market*Using text analysis and NLP to data mine newsgroups*Using unsupervised learning models to identify newsgroups topics*Using different types of neural networks for different types of analysis approaches*Using reinforcement learning for decision making*ML best practicesIt is a long book (nearly 500 pages), but the material is invaluable for anyone in the ML field, especially if you don't have a lot of experience with the different algorithms. And in conjunction with 3E, you almost have a complete ML curriculum.
Amazon Verified review Amazon
Matt M Dec 08, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
It is a fantastic, well-written book in machine learning with python. It provides enough technical background about each technique's theory, followed by beautiful graphs/tables and its Python code.It covers various topics and practical examples, such as the powerful and popular classification models in supervised learning, regression algorithms, reinforcement learning, and much more.This book prepares you for real-world machine learning problems. I highly recommended it!
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the digital copy I get with my Print order? Chevron down icon Chevron up icon

When you buy any Print edition of our Books, you can redeem (for free) the eBook edition of the Print Book you’ve purchased. This gives you instant access to your book when you make an order via PDF, EPUB or our online Reader experience.

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
Modal Close icon
Modal Close icon