
You're reading from The Applied Artificial Intelligence Workshop

Product type: Book
Published in: Jul 2020
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781800205819
Edition: 1st Edition
Authors (3):

Anthony So

Anthony So is a renowned leader in data science. He has extensive experience in solving complex business problems using advanced analytics and AI in different industries, including financial services, media, and telecommunications. He is currently the chief data officer of one of the most innovative fintech start-ups. He is also the author of several best-selling books on data science, machine learning, and deep learning. He has won multiple prizes at several hackathon competitions, such as Unearthed, GovHack, and Pepper Money. Anthony holds two master's degrees, one in computer science and the other in data science and innovation.

William So

William So is a data scientist with both a strong academic background and extensive professional experience. He is currently the Head of Data Science at Douugh and also a lecturer for the Master of Data Science and Innovation at the University of Technology Sydney. During his career, he has covered the end-to-end spectrum of data analytics, from machine learning to business intelligence, helping stakeholders derive valuable insights and achieve results that benefit the business. William is a co-author of The Applied Artificial Intelligence Workshop, published by Packt.

Zsolt Nagy

Zsolt Nagy is an engineering manager in an ad tech company heavy on data science. After acquiring his MSc in inference on ontologies, he used AI mainly for analyzing online poker strategies to aid professional poker players in decision making. After the poker boom ended, he put extra effort into building a T-shaped profile in leadership and software engineering.


3. An Introduction to Classification

Overview

This chapter introduces you to classification. You will implement various techniques, such as k-nearest neighbors and SVMs. You will use the Euclidean and Manhattan distances to work with k-nearest neighbors. You will apply these concepts to solve intriguing problems, such as predicting whether a credit card applicant is at risk of defaulting and determining whether an employee will stay with a company for more than two years. By the end of this chapter, you will be confident enough to apply classification to a dataset and draw conclusions from the results.

Introduction

In the previous chapter, you were introduced to regression models and learned how to fit a linear regression model with single or multiple variables, as well as with a higher-degree polynomial.

Unlike regression models, which learn to predict continuous numerical values (which can take on an infinite number of values), the classification models introduced in this chapter are all about splitting data into separate groups, also called classes.

For instance, a model can be trained to analyze emails and predict whether they are spam or not. In this case, the data is categorized into two possible groups (or classes). This type of classification is also called binary classification, which we will see a few examples of in this chapter. However, if there are more than two groups (or classes), you will be working on a multi-class classification (you will come across some examples of this in Chapter 4, An Introduction to Decision Trees).

But what is a...

The Fundamentals of Classification

As stated earlier, the goal of any classification problem is to separate the data into relevant groups accurately using a training set. There are a lot of applications of such projects in different industries, such as education, where a model can predict whether a student will pass or fail an exam, or healthcare, where a model can assess the level of severity of a given disease for each patient.

A classifier is a model that determines the class (label) that any given data point belongs to. For instance, suppose you have one set of observations containing credit-worthy individuals and another containing individuals who are risky in terms of their credit repayment tendencies.

Let's call the first group P and the second one Q. Here is an example of such data:

Figure 3.1: Sample dataset

With this data, you will train a classification model that will be able to correctly classify a new observation...

Data Preprocessing

Before building a classifier, we need to format our data so that the relevant data is kept in the most suitable format for classification and any data we are not interested in is removed.

The following are common ways to achieve this:

  • Replacing or dropping values:

    For instance, if there are N/A (or NA) values in the dataset, we may be better off substituting these values with a numeric value we can handle. Recall from the previous chapter that NA stands for Not Available and that it represents a missing value. We may choose to ignore rows with NA values or replace them with an outlier value.

    Note

    An outlier value is a value such as -1,000,000 that clearly stands out from regular values in the dataset.

    The fillna() method of a DataFrame does this type of replacement. The replacement of NA values with an outlier looks as follows:

    df.fillna(-1000000, inplace=True)

    The fillna() method changes all NA values into numeric values.

    This numeric value...
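As a minimal, self-contained sketch of the replacement described above (the DataFrame here is a hypothetical example, not one from the book's datasets), fillna() can be applied as follows:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset containing NA (missing) values
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "income": [50000, 62000, np.nan]
})

# Replace every NA value with an outlier that clearly stands out
# from the regular values in the dataset
df.fillna(-1000000, inplace=True)

print(df)
```

After this call, the DataFrame contains no missing values; every NA has been turned into the outlier value -1000000, which downstream models can detect or handle explicitly.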

The K-Nearest Neighbors Classifier

Now that we have our training and testing data, it is time to prepare our classifier to perform k-nearest neighbor classification. After being introduced to the k-nearest neighbor algorithm, we will use scikit-learn to perform classification.

Introducing the K-Nearest Neighbors Algorithm (KNN)

The goal of classification algorithms is to divide data so that we can determine which data points belong to which group.

Suppose that a set of classified points is given to us. Our task is to determine which class a new data point belongs to.

In order to train a k-nearest neighbor classifier (also referred to as KNN), we need to provide the corresponding class for each observation in the training set, that is, which group it belongs to. The goal of the algorithm is to find the relevant relationship or patterns between the features that will lead to this class. The k-nearest neighbors algorithm is based on a proximity measure that calculates the...
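To make the idea concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier on a small, made-up dataset (the data points and the choice of k=3 are illustrative assumptions, not values from the book's exercises):

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: two features per observation,
# with a known class label for each one
X_train = [[1, 2], [2, 3], [3, 1], [8, 9], [9, 8], [10, 10]]
y_train = [0, 0, 0, 1, 1, 1]

# k=3 neighbors; metric="euclidean" is the default,
# and metric="manhattan" is also supported
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# A new observation is assigned the most frequent class
# among its 3 nearest neighbors
print(knn.predict([[2, 2]]))  # near the first group -> class 0
print(knn.predict([[9, 9]]))  # near the second group -> class 1
```

Because the prediction is just a majority vote among the closest training points, changing the distance metric or the value of k can change which neighbors are consulted and, therefore, the predicted class.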

Classification with Support Vector Machines

We first used SVMs for regression in Chapter 2, An Introduction to Regression. In this topic, you will find out how to use SVMs for classification. As always, we will use scikit-learn to run our examples in practice.

What Are Support Vector Machine Classifiers?

The goal of an SVM is to find a surface in an n-dimensional space that separates the data points in that space into multiple classes.

In two dimensions, this surface is often a straight line. However, in three dimensions, the SVM often finds a plane. These surfaces are optimal in the sense that, based on the information available to the model, they maximize the separation between the classes in the n-dimensional space.

The optimal separator found by the SVM is called the best separating hyperplane.

An SVM is used to find one surface that separates two sets of data points. In other words, SVMs are binary classifiers. This does not mean that SVMs can only be used for binary...
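As a brief sketch of SVM classification with scikit-learn (the two-dimensional points below are invented for illustration; in 2D, the separating hyperplane found by the linear kernel is a straight line):

```python
from sklearn.svm import SVC

# Hypothetical two-class dataset in two dimensions
X_train = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 5], [6, 7]]
y_train = [0, 0, 0, 1, 1, 1]

# A linear kernel searches for the best separating hyperplane
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# New points are classified by which side of the hyperplane they fall on
print(clf.predict([[1.5, 1.5], [6.5, 6.0]]))
```

Here, the first point falls on the side of the hyperplane belonging to class 0 and the second on the side belonging to class 1.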

Summary

In this chapter, we learned about the basics of classification and the difference between classification and regression problems. Classification is about predicting a response variable with a limited set of possible values. As with any data science project, data scientists need to prepare the data before training a model. In this chapter, we learned how to standardize numerical values and replace missing values. Then, you were introduced to the famous k-nearest neighbors algorithm and discovered how it uses distance metrics to find the closest neighbors to a data point and then assigns the most frequent class among them. We also learned how to apply an SVM to a classification problem and tune some of its hyperparameters to improve the performance of the model and reduce overfitting.

In the next chapter, we will walk you through a different type of algorithm, called decision trees.

