
Chapter 4. Logistic Regression

In this chapter, another supervised method is introduced: classification. We will present the simplest classifier, the Logistic Regressor, which shares the same foundations as the Linear Regressor but targets classification problems.

In this chapter, you'll find:

  • A formal and mathematical definition of the classification problem, for both binary and multiclass problems

  • How to evaluate classifier performances—that is, their metrics

  • The math behind Logistic Regression

  • A revisited formula for SGD, specifically built for Logistic Regression

  • The multiclass case, with Multiclass Logistic Regression

Defining a classification problem


Although the name Logistic Regression suggests a regression operation, its goal is classification. So why, in a field as rigorous as statistics, does this technique carry such an ambiguous name? Simple: the name is not wrong at all, and it makes perfect sense; it just requires a bit of introduction and investigation. After that, you'll fully understand why it's called Logistic Regression, and you'll no longer think of it as a misnomer.

First, let's introduce what a classification problem is, what a classifier is, how it operates, and what its output is.

In the previous chapter, we presented regression as the operation of estimating a continuous value in a target variable; mathematically speaking, the predicted variable is a real number in the range (−∞, +∞). Classification, instead, predicts a class, that is, an index in a finite set of classes. The simplest case is named binary classification, and the output is typically a Boolean value ...

Defining a probability-based approach


Let's gradually introduce how logistic regression works. We said that it's a classifier, yet its name recalls a regressor. The element that joins the two pieces is the probabilistic interpretation.

In a binary classification problem, the output can be either "0" or "1". What if we check the probability of the label belonging to class "1"? More specifically, a classification problem can be seen as follows: given the feature vector, find the class (either 0 or 1) that maximizes the conditional probability:

$$\hat{y} = \operatorname{arg\,max}_{c \in \{0, 1\}} P(y = c \mid \mathbf{x})$$

Here's the connection: if we compute a probability, the classification problem looks like a regression problem. Moreover, in a binary classification problem, we only need to compute the probability of membership in class "1", which makes it a well-defined regression problem. In this regression problem, the targets are no longer the labels "1" and "0" but the real values 1.0 and 0.0 (the probability of belonging to class "1").

Let's now try fitting a multiple linear...
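Before moving on, here is a minimal sketch, not taken from the book's code, of how the logistic (sigmoid) function squashes the unbounded output of a linear model into a value in (0, 1) that can be read as the probability of class "1". The coefficients and the feature vector are made-up numbers, for illustration only.

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and feature vector, for illustration only
w = np.array([0.5, -1.2, 0.8])   # weights
b = 0.1                          # intercept
x = np.array([1.0, 0.3, -0.7])

z = np.dot(w, x) + b             # linear output, anywhere in (-inf, +inf)
p = sigmoid(z)                   # probability of class "1", in (0, 1)
print(p, int(p >= 0.5))          # probability and the corresponding class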

Revisiting gradient descent


In the previous chapter, we introduced the gradient descent technique to speed up processing. As we saw with Linear Regression, the model can be fitted in two ways: closed form or iterative form. The closed form gives the best possible solution in one step (but it's a very complex and time-demanding step); iterative algorithms, instead, reach the minimum step by step, with few calculations per update, and can be stopped at any time.

Gradient descent is a very popular choice for fitting the Logistic Regression model; however, it shares its popularity with Newton's method. Since gradient descent is the basis of iterative optimization, and we've already introduced it, we will focus on it in this section. Don't worry, there is no winner or single best algorithm: all of them can eventually reach the very same model, following different paths in the coefficient space.

First, we should compute the derivative of the loss function. Let's make it a bit...
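Since the derivation continues in the locked part of the chapter, here is a hedged sketch of where it lands in practice: batch gradient descent on the logistic (log) loss, whose gradient with respect to the coefficients is X^T(p - y)/n, where p holds the predicted probabilities. The function name, the learning rate, and the stopping rule are our own illustrative choices, not the book's.

import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iter=1000):
    # Batch gradient descent on the log-loss.
    # Assumes X already contains a column of ones for the intercept.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X.dot(w)))  # predicted probabilities
        grad = X.T.dot(p - y) / len(y)       # gradient of the log-loss
        w -= lr * grad                       # step against the gradient
    return w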

Multiclass Logistic Regression


The extension of Logistic Regression to classifying more than two classes is Multiclass Logistic Regression. Its foundation is actually a generic approach: it doesn't just work for Logistic Regressors; it also works with other binary classifiers. The base algorithm is named One-vs-rest, or One-vs-all, and it's simple to grasp and apply.

Let's describe it with an example: we have to classify three kinds of flowers and, given some features, the possible outputs are three classes: f1, f2, and f3. This is not what we've seen so far; in fact, it is not a binary classification problem. However, it is very easy to break it down into three simpler problems:

  • Problem #1: Positive examples (that is, the ones that get the label "1") are f1; negative examples are all the others

  • Problem #2: Positive examples are f2; negative examples are f1 and f3

  • Problem #3: Positive examples are f3; negative examples are f1 and f2

For all three problems, we can use a binary...
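To make the decomposition concrete, here is a small sketch of One-vs-rest built by hand on top of scikit-learn's LogisticRegression, with the three flower classes encoded as integer labels. Note that scikit-learn also ships a ready-made OneVsRestClassifier, so this manual loop is purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def one_vs_rest_fit(X, y, classes):
    # Train one binary classifier per class: that class vs all the others
    models = {}
    for c in classes:
        binary_y = (y == c).astype(int)  # "1" for class c, "0" for the rest
        models[c] = LogisticRegression().fit(X, binary_y)
    return models

def one_vs_rest_predict(models, X):
    # For each sample, pick the class whose model is the most confident
    classes = np.array(list(models.keys()))
    probs = np.column_stack([m.predict_proba(X)[:, 1]
                             for m in models.values()])
    return classes[np.argmax(probs, axis=1)]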

An example


We now look at a practical example that brings together what we've seen so far in this chapter.

Our dataset is artificially created, composed of 10,000 observations and 10 features, all of them informative (that is, no redundant ones), with labels "0" and "1" (binary classification). Having all informative features is not an unrealistic hypothesis in machine learning, since a feature selection or feature reduction step usually removes the redundant or uninformative ones beforehand.

In:
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10000, n_features=10,
                           n_informative=10, n_redundant=0,
                           random_state=101)
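As a quick sanity check (our addition, not part of the original walkthrough), you can verify the shape of the generated data and that the two classes are roughly balanced:

print(X.shape, y.shape)                 # (10000, 10) (10000,)
print((y == 0).sum(), (y == 1).sum())   # counts of the two labels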

Now, we'll show you how to use different libraries, and different modules, to perform the classification task using logistic regression. We won't focus here on how to measure performance, but on how the coefficients compose the fitted model, as we saw in the previous chapters.

As a first step, we will use Statsmodels. After having loaded the right...
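The rest of the walkthrough is locked, but as a hedged sketch of what this first step typically looks like: statsmodels exposes logistic regression through its Logit class and, unlike scikit-learn, requires you to add the intercept column explicitly.

import statsmodels.api as sm

Xc = sm.add_constant(X)         # statsmodels does not add an intercept by itself
result = sm.Logit(y, Xc).fit()  # maximum-likelihood fit of the logistic model
print(result.summary())         # coefficients, standard errors, p-values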

Summary


We've seen in this chapter how to build a binary classifier based on Linear Regression and the logistic function. It's fast, small, and very effective, and can be trained using an incremental technique based on SGD. Moreover, with very little effort (the One-vs-Rest approach), the Binary Logistic Regressor can become multiclass.

In the next chapter, we will focus on how to prepare data: to get the most out of a supervised algorithm, the input dataset must be carefully cleaned and normalized. In fact, real-world datasets can have missing data, errors, and outliers, and variables can be categorical and have very different ranges of values. Fortunately, some popular techniques deal with these problems, transforming the dataset into the best possible form for the machine learning algorithm.
