You're reading from Machine Learning with scikit-learn Quick Start Guide

Product type: Book
Published in: Oct 2018
Reading Level: Intermediate
Publisher: Packt
ISBN-13: 9781789343700
Edition: 1st Edition

Author: Kevin Jolly

Kevin Jolly is a formally educated data scientist with a master's degree in data science from the prestigious King's College London. Kevin works as a statistical analyst with a digital healthcare start-up, Connido Limited, in London, where he is primarily involved in leading the data science projects that the company undertakes. He has built machine learning pipelines for small and big data, with a focus on scaling such pipelines into production for the products that the company has built. Kevin is also the author of a book titled Hands-On Data Visualization with Bokeh, published by Packt. He is the editor-in-chief of Linear, a weekly online publication on data science software and products.
Predicting Categories with Logistic Regression

The logistic regression algorithm is one of the most interpretable algorithms in the world of machine learning, and although the word "regression" implies predicting a numerical outcome, the logistic regression algorithm is used to predict categories and solve classification machine learning problems.

In this chapter, you will learn about the following:

  • How the logistic regression algorithm works mathematically
  • Implementing and evaluating your first logistic regression algorithm with scikit-learn
  • Fine-tuning the hyperparameters using GridSearchCV
  • Scaling your data for a potential improvement in accuracy
  • Interpreting the results of the model

Logistic regression has a wide range of applications, especially in the field of finance, where building interpretable machine learning models is key in convincing both investors...

Technical requirements

Understanding logistic regression mathematically

As the name implies, logistic regression is fundamentally derived from the linear regression algorithm. The linear regression algorithm will be discussed in depth in the upcoming chapters. For now, let's consider a hypothetical case in which we want to predict the probability that a particular loan will default based on the loan's interest rate. Using linear regression, the following equation can be constructed:

Default = (Interest Rate × x) + c

In the preceding equation, c is the intercept and x is the coefficient that the model learns; both have numeric values. For the purpose of this example, let's assume c is 5 and x is -0.2. The equation now becomes this:

Default = (Interest Rate × -0.2) + 5

The equation can be represented in a...
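The mapping from the linear output to a probability can be sketched as follows. This is a minimal illustration, assuming the hypothetical values c = 5 and x = -0.2 from above; the linear output is passed through the logistic (sigmoid) function, which squashes any real number into the range between 0 and 1:

```python
import numpy as np

def default_probability(interest_rate, coef=-0.2, intercept=5):
    """Map the linear output through the logistic (sigmoid)
    function to obtain a probability between 0 and 1."""
    linear_output = interest_rate * coef + intercept
    return 1 / (1 + np.exp(-linear_output))

# A loan with a 10% interest rate under the assumed coefficients:
# linear output = 10 * -0.2 + 5 = 3, sigmoid(3) ≈ 0.95
print(round(default_probability(10), 3))
```

Whatever the interest rate, the result is always a valid probability, which is exactly why the sigmoid transformation turns a regression equation into a classifier.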

Implementing logistic regression using scikit-learn

In this section, you will learn how you can implement and quickly evaluate a logistic regression model for your dataset. We will be using the same dataset that we have already cleaned and prepared for the purpose of predicting whether a particular transaction was fraudulent. In the previous chapter, we saved this dataset as fraud_prediction.csv. The first step is to load this dataset into your Jupyter Notebook. This can be done by using the following code:

import pandas as pd

# Reading in the dataset

df = pd.read_csv('fraud_prediction.csv')

Splitting the data into training and test sets

The first step to building any machine learning model with scikit-learn is to...
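The split itself can be sketched with scikit-learn's train_test_split function. The tiny DataFrame below is a stand-in for the cleaned fraud dataset (the column names, including the 'isFraud' target, are illustrative assumptions, not taken from the book's data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the cleaned dataset; in the chapter this would be
# pd.read_csv('fraud_prediction.csv') with its own target column
df = pd.DataFrame({
    'amount':        [120.0, 5000.0, 75.5, 980.0, 45.0, 310.0, 8800.0, 62.0],
    'oldbalanceOrg': [500, 9000, 100, 1500, 80, 400, 9500, 90],
    'isFraud':       [0, 1, 0, 0, 0, 0, 1, 0],
})

# Separate the features from the target labels
X = df.drop('isFraud', axis=1).values
y = df['isFraud'].values

# Hold out 25% of the rows for testing; random_state makes the
# split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)
```

The held-out test set is never shown to the model during training, so the score computed on it is an honest estimate of performance on unseen transactions.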

Fine-tuning the hyperparameters

From the output of the logistic regression model implemented in the preceding section, it is clear that the model performs slightly better than random guessing. Such a model fails to provide value to us. In order to optimize the model, we are going to optimize the hyperparameters of the logistic regression model by using the GridSearchCV algorithm that we used in the previous chapter.

The hyperparameter that is used by the logistic regression model is known as the inverse regularization strength. This is because we are implementing a type of linear regression known as l1 regression. This type of linear regression will be explained in detail in Chapter 5, Predicting Numeric Outcomes with Linear Regression.

In order to optimize the inverse regularization strength, or C as it is called in short, we use the following code:

#Building the model with L1...
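A sketch of the grid search looks like the following. The synthetic dataset and the candidate values of C here are illustrative, not from the book; the 'liblinear' solver is specified because it is one of the scikit-learn solvers that supports the L1 penalty:

```python
from sklearn import linear_model
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the fraud dataset
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search over candidate values of the inverse regularization
# strength C, using 5-fold cross-validation on the training set
grid = GridSearchCV(
    linear_model.LogisticRegression(penalty='l1', solver='liblinear'),
    param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100]},
    cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_, round(grid.best_score_, 3))
```

Smaller values of C mean stronger regularization; GridSearchCV simply tries every candidate and keeps the one with the best cross-validated accuracy.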

Scaling the data

Although the model has performed extremely well, scaling the data is still a useful step in building machine learning models with logistic regression, as it standardizes your data across the same range of values. In order to scale your data, we will use the same StandardScaler() function that we used in the previous chapter. This is done by using the following code:

from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

#Setting up the scaling pipeline

pipeline_order = [('scaler', StandardScaler()), ('logistic_reg', linear_model.LogisticRegression(C = 10, penalty = 'l1', solver = 'liblinear'))]

pipeline = Pipeline(pipeline_order)

#Fitting the classifier to the scaled dataset

logistic_regression_scaled = pipeline.fit(X_train, y_train)

#Extracting the score

logistic_regression_scaled.score(X_test, y_test)

The preceding code resulted...

Interpreting the logistic regression model

One of the key benefits of the logistic regression algorithm is that it is highly interpretable. This means that the outcome of the model can be interpreted as a function of the input variables. This allows us to understand how each variable contributes to the eventual outcome of the model.

In the first section, we understood that the logistic regression model consists of coefficients for each variable and an intercept that can be used to explain how the model works. In order to extract the coefficients for each variable in the model, we use the following code:

#Printing out the coefficients of each variable 

print(logistic_regression.coef_)

This prints an array containing one coefficient for each input variable.

The coefficients are in the order in which the variables were in the dataset that was input into the model. In order to extract...
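Pairing each coefficient with its column name makes the output far easier to read. The sketch below uses a synthetic dataset and illustrative feature names (the real names would come from the columns of the fraud DataFrame):

```python
import pandas as pd
from sklearn import linear_model
from sklearn.datasets import make_classification

# Synthetic stand-in for the fraud dataset, with assumed column names
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ['step', 'amount', 'oldbalanceOrg', 'newbalanceOrig']

model = linear_model.LogisticRegression().fit(X, y)

# coef_ has shape (1, n_features); pair each value with its column
# name so the contribution of each variable is visible at a glance
coef_table = pd.Series(model.coef_[0], index=feature_names)
print(coef_table.sort_values())
```

A positive coefficient pushes the prediction toward the positive class as that variable increases, while a negative coefficient pushes it away, which is the basis for interpreting the model.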

Summary

In this chapter, you have learned how the logistic regression model works on a mathematical level. Although simple, the model proves to be formidable in terms of interpretability, which is highly beneficial in the financial industry.

You have also learned how to build and evaluate logistic regression algorithms using scikit-learn, and looked at hyperparameter optimization using the GridSearchCV algorithm. Additionally, you have learned to verify whether the results provided to you by the GridSearchCV algorithm are accurate by plotting the accuracy scores for different values of the hyperparameter.

Finally, you have scaled your data in order to make it standardized and learned how to interpret your model on a mathematical level.

In the next chapter, you will learn how to implement tree-based algorithms, such as decision trees, random forests, and gradient-boosted trees...

