You're reading from Building Data Science Solutions with Anaconda

Product type: Book

Published in: May 2022

Publisher: Packt

ISBN-13: 9781800568785

Edition: 1st Edition

Tools: Anaconda

Concepts: Data Science

Author: Dan Meador

Chapter 7: Choosing the Best AI Algorithm

If the field of artificial intelligence and machine learning (commonly referred to as AI/ML) is a car, then the model is the engine. While there are other parts that are critical for its operation, no other aspect gets as much focus and attention. This is for good reason. In the end, the model is the core object that determines whether your outcome is accurate or not, and is the most important artifact from that entire data science workflow.

Which modeling approach is best? That's easy: it depends. For the same reason that not all cars have the same engine, many different factors go into choosing the best approach.

Ask yourself, "What problem am I trying to solve?" In this chapter, we are going to start with that question, and from there lead you to the modeling approach that would best suit your situation. We'll take a look at the problem type with an example for each algorithm, and look at some of the most widely...

Technical requirements

All the required libraries can be installed easily with conda, which comes with the Anaconda distribution. The content in this chapter requires the following tools:

The Anaconda distribution (this includes conda and Navigator)
Python 3.8+ (this is included with the Anaconda distribution)
pandas 1.3+
Matplotlib 3.4+
Jupyter notebooks 6.4+
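
If you want to confirm that your environment meets these minimums, the following is a minimal sketch that uses only the Python standard library. The distribution names used for the lookup (`pandas`, `matplotlib`, and `notebook`) are assumptions about how the packages are registered in your environment, and the helper names are hypothetical:

```python
import re
from importlib import metadata

# Minimum (major, minor) versions taken from the list above.
# Assumption: these are the distribution names in your environment.
REQUIRED = {"pandas": (1, 3), "matplotlib": (3, 4), "notebook": (6, 4)}


def parse_major_minor(version):
    """Turn a version string such as '1.3.5' into the tuple (1, 3)."""
    numbers = re.findall(r"\d+", version)
    major = int(numbers[0]) if numbers else 0
    minor = int(numbers[1]) if len(numbers) > 1 else 0
    return major, minor


def check_requirements(required=REQUIRED):
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_major_minor(installed) >= minimum)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "check this one"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

Any package reported as missing or too old can then be installed or updated with conda.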

Now that the setup is ready, let's dive into the chapter!

Defining your problem

Many times, you'll see AI books and blogs talking about the distinct types of AI problems falling into the following categories:

Supervised
Unsupervised
Semi-supervised
Reinforcement

We did the same thing back in Chapter 1, Understanding the AI/ML Landscape, and you can find a flowchart of how to decide what category your situation falls into in Figure 7.1:

Figure 7.1 – Dataset heuristics for choosing your AI family

This is a useful framing, but when you are starting with a problem, you aren't always thinking about it in terms of the problem type. More often, you are thinking in terms of the solution you are trying to figure out.

We'll look at a few different and very common problem types in the following sections. They do not encompass every problem family that you will come across, but they will serve many of them.

Model problem types

The following are the four core problem types that we'll focus...

Understanding regression problems with examples

Figuring out the price of a stock, what your house should be worth, and the future temperature of the Earth all have one thing in common: they can all be thought of as regression problems. Regression is simply the goal of figuring out what a number would be, given a set of independent variables.

A few more examples that fall into this problem type are as follows:

Price of a car
Sales forecast for next year
Number of people who will sign up for a promotion
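
To make the idea of predicting a number from an independent variable concrete, here is a minimal sketch of ordinary least squares with a single variable, written in plain Python so that it runs without extra libraries. It is an illustration only and not the code used later in the chapter; the function names and the car data are made up:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent value for a new independent value."""
    return slope * x + intercept


# Made-up data: car age in years and price in thousands of dollars.
ages = [1, 2, 3, 4, 5]
prices = [27.0, 24.0, 21.0, 18.0, 15.0]

slope, intercept = fit_line(ages, prices)
print(f"slope={slope}, intercept={intercept}")
print(f"predicted price at age 6: {predict(6, slope, intercept)}")
```

Because the made-up prices fall exactly on a line, the fit recovers a slope of -3 and an intercept of 30. Real data will be noisier, which is where the choice of algorithm starts to matter.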

When you see a problem like this, you can try a few different models. There are many specific algorithms that you can use, each with its own pros and cons. Let's look at a few of these algorithms in the next section.

The following are a few of the most common regression algorithms you'll want to try. For each of these algorithms, we're going to take an example and create a regression model:

Linear regression
Random forest
Support...

Classification

Being able to put things into certain classes might be the most common type of ML application that you see in the world, and has been a staple of the industry for a long time.

There are two main types of classification: binary classification and multi-class classification. As the names indicate, binary classification is when the outcome only has two possible options. It's very common to have a true or false outcome in this setup.

Multi-class classification is when there are more than two possible classes. This covers a variety of scenarios, such as predicting a movie's genre. The approaches taken are very similar to those used for a binary classification problem.
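
One way to see how similar the two cases are is a small sketch in which the same code handles both. The following uses a nearest-centroid classifier, chosen here only because it fits in a few lines of plain Python; it is not one of the algorithms the chapter walks through, and the feature values and labels are made up:

```python
import math
from collections import defaultdict


def fit_centroids(points, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(values) / len(members) for values in zip(*members))
        for label, members in grouped.items()
    }


def classify(point, centroids):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Binary case (made-up features: link count, exclamation mark count).
email_features = [(8, 5), (9, 7), (1, 0), (0, 1)]
email_labels = ["spam", "spam", "not spam", "not spam"]
email_model = fit_centroids(email_features, email_labels)
print(classify((7, 6), email_model))

# Multi-class case (made-up features: petal length, petal width).
flower_features = [(1.4, 0.2), (1.5, 0.3), (4.5, 1.4), (4.7, 1.5), (6.0, 2.5), (5.8, 2.2)]
flower_labels = ["type_a", "type_a", "type_b", "type_b", "type_c", "type_c"]
flower_model = fit_centroids(flower_features, flower_labels)
print(classify((4.4, 1.3), flower_model))
```

Nothing in `fit_centroids` or `classify` depends on how many classes there are; only the labels in the training data change.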

Let's check out some examples that might help you get a better grasp on problems that fall into the classification bucket:

Whether emails are spam or not (binary)
Whether you would survive the Titanic sinking (binary)
Identifying the type of flower (multi-class)
Labeling handwritten...

Anomaly detection

If you've ever gotten a text saying that your bank has noticed some suspicious activity, chances are the bank has put anomaly detection to use. Anomaly detection is the attempt to determine whether an event, item, or object doesn't fit in with the others. "One of these things is not like the others" is a good way to think about it. Another name you might see for this is outlier detection.

You will find unsupervised, supervised, and semi-supervised approaches can all work in these scenarios. A depiction of what this looks like can be found in Figure 1.4 of Chapter 1, Understanding the AI/ML Landscape.
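
As a rough illustration of the unsupervised flavor, the following sketch flags values that sit far from the rest using the modified z-score, which is based on the median and the median absolute deviation (MAD). This is one simple statistical technique and is not necessarily the approach the chapter goes on to use; the transaction amounts are made up, and the threshold of 3.5 is a commonly quoted rule of thumb rather than a fixed rule:

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half of the values equal the median, so the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Made-up card transaction amounts with one suspicious purchase.
amounts = [12.5, 9.99, 15.0, 11.25, 14.0, 10.5, 13.75, 950.0]
print(find_outliers(amounts))
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself in a small sample.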

Many of the examples in this space handle more serious issues around security and safety. You'll find some examples in the following list:

Credit card fraud
Attempts to hack your account via random logins
Unsafe operations at a power plant
Customer buying patterns
Illegal trading activity on a stock

There are...

Clustering problems

In addition to anomaly detection, there is another class of problem that takes an unsupervised approach: grouping entities together in order to understand more about the dataset. Clustering is the process of finding elements of a dataset that share enough similar attributes that you can draw clear distinctions between groups of individual points.

There are many applications of this technique, and we'll go over the following few examples now:

Grouping segments of a customer base
Knowing which emails are promotions and which are more important

To achieve this, we can use a few different algorithms such as the following:

DBScan
K-Means clustering

While there are many more, you can be sure that these have shown promising results across various datasets and are a great place to start.
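
To make the grouping idea concrete before the algorithm walkthroughs, here is a minimal from-scratch sketch of K-Means in plain Python. It is meant only to show the assign-then-update loop; in practice you would more likely reach for a library implementation. Seeding the centroids with the first `k` points is a simplifying assumption made for repeatability, and the customer data is made up:

```python
import math


def k_means(points, k, max_iterations=100):
    """Cluster points into k groups and return (labels, centroids)."""
    # Assumption: use the first k points as starting centroids so runs repeat.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: math.dist(point, centroids[i]))
            for point in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so the result is stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # Keep the old centroid if a cluster ends up empty.
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up customers: (annual spend in thousands, store visits per month).
customers = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0), (1.2, 2.2), (9.2, 8.4)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

With these well-separated points, the low-spend customers end up in one cluster and the high-spend customers in the other. The number of clusters `k` has to be chosen up front, which is one of the practical differences from a density-based method.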

Let's look at DBScan first.

DBScan

Density-Based Spatial Clustering of Applications with Noise (or DBScan for...

Summary

In this chapter, we have discussed how starting from the problem itself is much more valuable than beginning from a technique to use. Depending on what we need to achieve, we can look at different modeling approaches that will help us solve that problem.

We learned that classification problems are useful when we want to put elements into categories, and some approaches such as linear regression and random forest allow you to create models that achieve this. We also saw how scikit-learn lets you get to a solution with very few lines of code.

We also looked at regression for predicting values, clustering to group entities into similar buckets, and anomaly detection to find elements that don't belong with others. Similar to classification, we saw how with scikit-learn, you can get going quickly. Matplotlib also comes in handy to plot out the problem in order to give you a visual representation of what the predictions look like.

All of the models built in this...


Author (1)

Dan Meador

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.