You're reading from Learning Predictive Analytics with Python

Product type: Book
Published in: Feb 2016
Reading level: Intermediate
ISBN-13: 9781783983261
Edition: 1st

Authors (2):
Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. His core areas of expertise include Natural Language Processing, IoT analytics, R Shiny product development, and ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
Chapter 6. Logistic Regression with Python

In the previous chapter, we learned about linear regression. We saw that linear regression is one of the most basic models that assumes that there is a linear relationship between a predictor variable and an output variable.

In this chapter, we will be discussing the details of logistic regression. We will be covering the following topics in this chapter:

  • Math behind logistic regression: Logistic regression relies on concepts such as conditional probability and odds ratio. In this chapter, we will understand what they mean and how they are applied. We will also see how the odds ratio is transformed to establish a linear relationship with the predictor variable. We will analyze the final logistic regression equation and understand the meaning of each term and coefficient.

  • Implementing logistic regression with Python: Similar to what we did in the last chapter, we will take a dataset and implement a logistic regression model on it to understand the...

Linear regression versus logistic regression


One thing to note about the linear regression model is that the output variable is always a continuous variable. In other words, linear regression is a good choice when one needs to predict continuous numbers. However, what if the output variable is a discrete number? What if we want to classify our records into two or more categories? Can we still extend the assumptions of a linear relationship and try to classify the records?

As it happens, there is a separate regression model that takes care of a situation where the output variable is a binary or categorical variable rather than a continuous variable. This model is called logistic regression. In other words, logistic regression is a variation of linear regression where the output variable is a binary or categorical variable. The two regressions are similar in the sense that they both assume a linear relationship between the predictor and output variables. However, as we will see soon, the output...
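The key difference can be illustrated with the logistic (sigmoid) function, which squashes the unbounded output of a linear model into the (0, 1) range so it can be read as a probability. A minimal sketch of the idea; the coefficients below are made up purely for illustration:

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a one-predictor model
b0, b1 = -1.5, 0.8

for x in [-5, 0, 5]:
    z = b0 + b1 * x   # linear predictor: can be any real number
    p = sigmoid(z)    # logistic output: always strictly between 0 and 1
    print(f"x={x:>2}  z={z:+.1f}  p={p:.3f}")
```

However extreme the linear predictor becomes, the sigmoid output stays inside (0, 1), which is exactly the property a probability needs.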

Understanding the math behind logistic regression


Imagine a situation where we have a dataset from a supermarket about the gender of each customer and whether that person bought a particular product or not. We are interested in finding the chances of a customer buying that particular product, given their gender. What comes to mind when someone poses this question to you? Probability, anyone? Odds of success?

What is the probability of a customer buying a product, given he is a male? What is the probability of a customer buying that product, given she is a female? If we know the answers to these questions, we can make a leap towards predicting the chances of a customer buying a product, given their gender.

Let us look at such a dataset. To do so, we write the following code snippet:

import pandas as pd
df = pd.read_csv('E:/Personal/Learning/Predictive Modeling Book/Book Datasets/Logistic Regression/Gender Purchase.csv')
df.head()

Fig. 6.1: Gender and Purchase dataset

The first column mentions...
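To make the idea of conditional probability and odds concrete, here is a hedged sketch of how they could be computed with pandas. Since the book's CSV is not reproduced here, the small DataFrame below is hypothetical stand-in data with the same two columns:

```python
import pandas as pd

# Hypothetical stand-in for the Gender Purchase dataset
df = pd.DataFrame({
    'Gender':   ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Purchase': ['Yes',  'No',   'Yes',    'Yes',    'Yes',  'No'],
})

# Cross-tabulate gender against purchase outcome
table = pd.crosstab(df['Gender'], df['Purchase'])
print(table)

# Conditional probability of buying, given gender:
# P(Purchase = Yes | Gender) = count of 'Yes' in the row / row total
p_buy = table['Yes'] / table.sum(axis=1)
print(p_buy)

# Odds of buying, given gender: P(buy) / P(not buy)
odds = p_buy / (1 - p_buy)
print(odds)
```

The same two quantities, computed on the real dataset, are the building blocks of the logistic regression equation developed in this section.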

Implementing logistic regression with Python


We have understood the mathematics behind the logistic regression algorithm. Now, let's take a dataset and implement a logistic regression model from scratch. The dataset we will be working with comes from the marketing department of a bank: it records whether each customer subscribed to a term deposit, together with information about the customer and about how the bank engaged and reached out to them to sell the term deposit.

Let us import the dataset and start exploring it:

import pandas as pd
bank = pd.read_csv('E:/Personal/Learning/Predictive Modeling Book/Book Datasets/Logistic Regression/bank.csv', sep=';')
bank.head()

The dataset looks as follows:

Fig. 6.6: A glimpse of the bank dataset

There are 4119 records and 21 columns. The column names are as follows:

bank.columns.values

Fig. 6.7: The columns of the bank dataset

The details of each column are mentioned in the Data Dictionary file present in the Logistic Regression folder...
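The modeling snippets later in the chapter assume a predictor matrix X and an outcome vector Y, which are constructed in a part of the chapter not shown here. As a hedged sketch of how such inputs are typically prepared from a dataset like this one; the rows and column names below are illustrative stand-ins, not necessarily the bank dataset's actual values:

```python
import pandas as pd

# Illustrative rows; the real bank dataset has 4119 records and 21 columns
bank = pd.DataFrame({
    'age': [30, 45, 52, 28],
    'job': ['admin.', 'technician', 'admin.', 'services'],
    'y':   ['no', 'yes', 'no', 'yes'],   # subscribed to a term deposit?
})

# Encode the binary outcome as 0/1
Y = (bank['y'] == 'yes').astype(int)

# One-hot encode categorical predictors so they can enter the model
X = pd.get_dummies(bank.drop(columns='y'), columns=['job'])
print(X.columns.tolist())
```

One-hot encoding is needed because logistic regression, like linear regression, operates on numeric predictors only.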

Model validation and evaluation


The preceding logistic regression model is built on the entire data. Let us now split the data into training and testing sets, build the model using the training set, and then check the accuracy using the testing set. The ultimate goal is to see whether it improves the accuracy of the prediction or not:

# train_test_split now lives in sklearn.model_selection
# (the older sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

The preceding code snippet creates training and testing sets for both the predictor and the outcome variables. Let us now build a logistic regression model over the training set:

from sklearn import linear_model
from sklearn import metrics
clf1 = linear_model.LogisticRegression()
clf1.fit(X_train, Y_train)

The preceding code snippet creates the model. If you remember the equation behind the model, you will know that the model predicts probabilities and not the classes (binary output, that is, 0 or 1). One needs to select a...
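Those probabilities are turned into class labels by cutting them at a threshold. A minimal plain-Python sketch of the idea, using made-up probabilities and the commonly used default cut-off of 0.5 (scikit-learn's predict method applies the same 0.5 rule internally):

```python
# Hypothetical predicted probabilities for five customers
probs = [0.12, 0.47, 0.50, 0.73, 0.91]

threshold = 0.5  # default cut-off; can be tuned, e.g. for class imbalance

# Classify as 1 when the probability reaches the threshold
classes = [1 if p >= threshold else 0 for p in probs]
print(classes)  # [0, 0, 1, 1, 1]
```

Moving the threshold up or down trades false positives for false negatives, which is precisely what the ROC curve in the next section visualizes.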

Model validation


Once the model has been built and evaluated, the next step is to validate it. In the case of logistic regression models, or classification models in general, we validate the model by comparing the actual classes with the predicted classes. There are various ways to do this, but the most popular and widely used is the Receiver Operating Characteristic (ROC) curve.

The ROC curve

An ROC curve is a graphical tool to understand the performance of a classification model. For a logistic regression model, a prediction can either be positive or negative. Also, this prediction can either be correct or incorrect.

There are four categories in which the predictions of a logistic regression model can fall:

Summary


Logistic regression is a versatile technique used widely in cases where the variable to be predicted is a binary (or categorical) variable. This chapter dives deep into the math behind logistic regression and the process of implementing it using the scikit-learn and statsmodels API modules. It is important to understand the math behind the algorithm so that the model is not used as a black box without knowing what is going on under the hood. To recap, the following are the main takeaways from the chapter:

  • Linear regression wouldn't be an appropriate model to predict binary variables, as its predicted values can range from -infinity to +infinity, while the binary variable can only be 0 or 1.

  • The odds of a certain event happening are the probability of that event happening divided by the probability of that event not happening. The higher the odds, the higher the chances of the event happening. Odds can range from 0 to infinity.

  • The final equation for the logistic regression...



Actual vs. predicted (each can be positive or negative):

  • True Positive (TP): a correct positive prediction; the record is actually positive and the prediction is also positive

  • True Negative (TN): a correct negative prediction; the record is actually negative and the prediction is also...
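The ROC curve discussed earlier is traced out from these four counts as the classification threshold varies: each threshold yields one (false positive rate, true positive rate) point. A hedged plain-Python sketch of computing the counts and the two ROC axes for a single threshold, using made-up labels:

```python
# Hypothetical actual labels and predicted labels at one threshold
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# ROC axes for this threshold
tpr = tp / (tp + fn)   # true positive rate (sensitivity)
fpr = fp / (fp + tn)   # false positive rate (1 - specificity)
print(tp, tn, fp, fn, tpr, fpr)
```

Sweeping the threshold from 1 down to 0 and plotting the resulting (fpr, tpr) pairs produces the full ROC curve; scikit-learn's metrics module offers a ready-made roc_curve function for this.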