You're reading from Machine Learning with scikit-learn Quick Start Guide

Product type: Book
Published in: Oct 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789343700
Edition: 1st Edition
Languages: Python
Tools: Scikit-learn
Concepts: Machine Learning
Author: Kevin Jolly

Performance Evaluation Methods

Your method of performance evaluation will vary by the type of machine learning algorithm that you choose to implement. Classification, regression, and unsupervised machine learning algorithms each have their own metrics for measuring how well a model performs at its given task.

In this chapter, we will explore how the different performance evaluation methods can help you to better understand your model. The chapter will be split into three sections, as follows:

Performance evaluation for classification algorithms
Performance evaluation for regression algorithms
Performance evaluation for unsupervised algorithms

Technical requirements

You will need Python 3.6 or later, with pandas ≥ 0.23.4, scikit-learn ≥ 0.20.0, NumPy ≥ 1.15.1, Matplotlib ≥ 3.0.0, and scikit-plot ≥ 0.3.7 installed on your system.

The code files of this chapter can be found on GitHub:
https://github.com/PacktPublishing/Machine-Learning-with-scikit-learn-Quick-Start-Guide/blob/master/Chapter_08.ipynb

Check out the following video to see the code in action:

http://bit.ly/2EY4nJU

Why is performance evaluation critical?

Before looking at individual metrics, it helps to understand why we need to evaluate the performance of a model in the first place. Some of the reasons why performance evaluation is critical are as follows:

It helps you to detect overfitting: Overfitting occurs when your model fits the training data so closely that its predictions are specific to that one dataset. In other words, your model cannot generalize its predictions outside of the data that it was trained on.
It helps you to detect underfitting: This is the opposite of overfitting. In this case, the model is too generic to capture the patterns in the data, so it performs poorly even on the data that it was trained on.
It helps you to understand predictions: Performance evaluation methods will help you to understand, in greater detail, how your model makes predictions, along with the nature of those predictions and other useful information, such as the accuracy of your model.
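
One common way to spot overfitting is to compare a model's score on its training data with its score on data it has not seen. The following is a minimal sketch of that idea, not code from the book: the one-dimensional data, the labels, and the hand-written 1-nearest-neighbor helper `predict_1nn` are all hypothetical and chosen only to keep the example self-contained.

```python
# Toy sketch: compare accuracy on training data with accuracy on held-out data.
# The data and the 1-nearest-neighbor helper are hypothetical illustrations.

def predict_1nn(train_points, train_labels, x):
    """Return the label of the training point closest to x (1-D distance)."""
    nearest = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[nearest]

def accuracy(train_points, train_labels, points, labels):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_points, train_labels, x) == y
               for x, y in zip(points, labels))
    return hits / len(points)

# Class 1 mostly sits at larger values, but two training labels are noisy
# (the 1 at 0.9 and the 0 at 3.1).
X_train = [0.1, 0.5, 0.9, 1.4, 1.8, 2.2, 2.7, 3.1, 3.6, 4.0]
y_train = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Held-out points that follow the underlying pattern without the noise.
X_test = [0.2, 0.8, 1.5, 1.9, 2.8, 3.2, 3.9]
y_test = [0, 0, 0, 0, 1, 1, 1]

train_acc = accuracy(X_train, y_train, X_train, y_train)
test_acc = accuracy(X_train, y_train, X_test, y_test)

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Because a 1-nearest-neighbor model memorizes every training point, including the noisy ones, it scores perfectly on the training data and noticeably worse on the held-out points. A gap of this kind is the usual sign of overfitting, whereas low scores on both sets would point toward underfitting.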

...

Performance evaluation for classification algorithms

To evaluate the performance of classification models, let's consider the two classification algorithms that we have built in this book: k-nearest neighbors and logistic regression.

The first step will be to implement both of these algorithms on the fraud detection dataset. We can do this by using the following code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import linear_model

#Reading in the fraud detection dataset 

df = pd.read_csv('fraud_prediction.csv')

#Creating the features 

features = df.drop('isFraud', axis = 1).values
target = df['isFraud'].values

#Splitting the data into training and test sets 

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.3, random_state ...
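
Once a classifier has produced predictions for the test set, the confusion matrix (one of the techniques covered in this chapter) tallies how those predictions compare with the true labels. The following is a minimal sketch under stated assumptions: the `y_true` and `y_pred` lists are made-up values rather than output from the fraud detection models, and `confusion_counts` is a hypothetical helper written for illustration. In practice, scikit-learn offers `sklearn.metrics.confusion_matrix` for the same purpose.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

# Hypothetical labels: 1 marks a fraudulent transaction, 0 a legitimate one.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]

c = confusion_counts(y_true, y_pred)

# Metrics derived from the four cells of the confusion matrix.
accuracy = (c["tp"] + c["tn"]) / len(y_true)
precision = c["tp"] / (c["tp"] + c["fp"])  # of predicted frauds, how many were real
recall = c["tp"] / (c["tp"] + c["fn"])     # of real frauds, how many were caught

print(dict(c))
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On an imbalanced problem such as fraud detection, accuracy alone can look high even when the model misses fraudulent cases, which is why the individual cells of the matrix are worth inspecting.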

Performance evaluation for regression algorithms

There are three main metrics that you can use to evaluate the performance of the regression algorithm that you built, as follows:

Mean absolute error (MAE)
Mean squared error (MSE)
Root mean squared error (RMSE)

In this section, you will learn what the three metrics are, how they work, and how you can implement them using scikit-learn. The first step is to build the linear regression algorithm. We can do this by using the following code:

## Building a simple linear regression model

#Reading in the dataset

df = pd.read_csv('fraud_prediction.csv')

#Define the feature and target arrays

feature = df['oldbalanceOrg'].values
target = df['amount'].values

#Initializing a linear regression model 

linear_reg = linear_model.LinearRegression()

#Reshaping the array since we only have a single feature

feature = feature...
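
To make the three regression metrics concrete, the following sketch computes each of them by hand on a handful of made-up values. The `y_true` and `y_pred` lists and the helper names `mae`, `mse`, and `rmse` are assumptions for illustration and are not taken from the book's code. In scikit-learn, `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` compute the first two, and the RMSE is the square root of the MSE.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical transaction amounts and the values a model predicted for them.
y_true = [100.0, 250.0, 400.0, 50.0]
y_pred = [110.0, 230.0, 400.0, 80.0]

mae_value = mae(y_true, y_pred)
mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)

print(f"MAE:  {mae_value}")
print(f"MSE:  {mse_value}")
print(f"RMSE: {rmse_value:.3f}")
```

The errors here are 10, 20, 0, and 30, so the MAE is 15 and the MSE is 350. The MSE penalizes the largest error more heavily because it squares each error, and the RMSE brings that value back to the same units as the target.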

Performance evaluation for unsupervised algorithms

In this section, you will learn how to evaluate the performance of an unsupervised machine learning algorithm, such as the k-means algorithm. The first step is to build a simple k-means model. We can do so by using the following code:

#Importing the libraries used in this section

import pandas as pd
from sklearn.cluster import KMeans

#Reading in the dataset

df = pd.read_csv('fraud_prediction.csv')

#Dropping the target feature & the index

df = df.drop(['Unnamed: 0', 'isFraud'], axis = 1)

#Initializing K-means with 2 clusters

k_means = KMeans(n_clusters = 2)

Now that we have a simple k-means model with two clusters, we can proceed to evaluate the model's performance. The visual performance charts that you can use are as follows:

Elbow plot
Silhouette analysis plot

In this section, you will learn how to create and interpret each of the preceding plots.
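
Both plots are built from numbers that can be computed directly. An elbow plot commonly charts the within-cluster sum of squared distances (often called inertia) against the number of clusters, and silhouette analysis scores each point with (b - a) / max(a, b), where a is its mean distance to its own cluster and b is its mean distance to the nearest other cluster. The following is a minimal sketch of both quantities, assuming a tiny made-up dataset and two hypothetical labelings. The helper functions are written for illustration only; in scikit-learn, a fitted `KMeans` object exposes `inertia_`, and `sklearn.metrics.silhouette_score` computes the mean silhouette.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def inertia(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, l in zip(points, labels) if l == cluster]
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(distance(p, centroid) ** 2 for p in members)
    return total

def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster
        same = [distance(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == own and j != i]
        if not same:
            scores.append(0.0)  # convention for single-point clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(distance(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_inertia = inertia(points, good_labels)
bad_inertia = inertia(points, bad_labels)
good_silhouette = mean_silhouette(points, good_labels)
bad_silhouette = mean_silhouette(points, bad_labels)

print(f"inertia:    good={good_inertia:.3f} bad={bad_inertia:.3f}")
print(f"silhouette: good={good_silhouette:.3f} bad={bad_silhouette:.3f}")
```

A labeling that matches the structure of the data gives a lower inertia and a silhouette closer to 1, while a labeling that mixes the groups gives a higher inertia and a silhouette near or below 0.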

...

Summary

In this chapter, you learned how to evaluate the performance of three different types of machine learning algorithms: classification, regression, and unsupervised.

For the classification algorithms, you learned how to evaluate the performance of a model by using a series of visual techniques, such as the confusion matrix, normalized confusion matrix, area under the curve, K-S statistic plot, cumulative gains plot, lift curve, calibration plot, learning curve, and cross-validated box plot.

For the regression algorithms, you learned how to evaluate the performance of a model by using three metrics: the mean squared error, mean absolute error, and root mean squared error.

Finally, for the unsupervised machine learning algorithms, you learned how to evaluate the performance of a model by using the elbow plot and the silhouette analysis plot.

Congratulations! You have now made it to the end of your...


Author (1)

Kevin Jolly

Kevin Jolly is a formally educated data scientist with a master's degree in data science from the prestigious King's College London. Kevin works as a statistical analyst with a digital healthcare start-up, Connido Limited, in London, where he is primarily involved in leading the data science projects that the company undertakes. He has built machine learning pipelines for small and big data, with a focus on scaling such pipelines into production for the products that the company has built. Kevin is also the author of a book titled Hands-On Data Visualization with Bokeh, published by Packt. He is the editor-in-chief of Linear, a weekly online publication on data science software and products.
