You're reading from Serverless Machine Learning with Amazon Redshift ML

Product type: Book
Published in: Aug 2023
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781804619285
Edition: 1st Edition
Languages: Python
Tools: Amazon Redshift
Concepts: Machine Learning
Authors (4): Debu Panda, Phil Bates, Bhanu Pittampally, Sumeet Joshi

Creating a Custom ML Model with XGBoost

So far, all of the supervised learning models we have explored have utilized the Amazon Redshift Auto ML feature, which uses Amazon SageMaker Autopilot behind the scenes. In this chapter, we will explore how to create custom machine learning (ML) models. Training a custom model gives you the flexibility to choose the model type and the hyperparameters to use. This chapter will provide examples of this modeling technique. By the end of this chapter, you will know how to create a custom XGBoost model and how to prepare the data to train your model using Redshift SQL.

In this chapter, we will go through the following main topics:

Introducing XGBoost
Introducing an XGBoost use case
Creating a model using XGBoost with Auto Off

Technical requirements

This chapter requires a web browser and access to the following:

An AWS account
An Amazon Redshift Serverless endpoint
Amazon Redshift Query Editor v2

You can find the code used in this chapter here:

https://github.com/PacktPublishing/Serverless-Machine-Learning-with-Amazon-Redshift/blob/main/CodeFiles/chapter10/chapter10.sql

Introducing XGBoost

XGBoost, short for eXtreme Gradient Boosting, gets its name from the gradient boosting framework on which it is built. Its tree-boosting technique provides a fast way to solve many ML problems. As you have seen in previous chapters, you can specify the model type when you create a model, which can speed up training because SageMaker Autopilot does not have to determine which model type to use.
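To make the idea of tree boosting concrete, here is a minimal sketch in plain Python. It is a toy illustration of gradient boosting with squared-error loss, not XGBoost's actual algorithm, which adds regularization, second-order gradient information, and many optimizations. Each "tree" here is a single-split decision stump, and the helper names (fit_stump, fit_boosted, predict_boosted) are invented for this example; num_round and eta are named after the XGBoost hyperparameters that play a similar role.

```python
# Toy gradient boosting for regression with squared-error loss.
# Each round fits a one-split "stump" to the residuals of the current model.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) with the lowest squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, num_round=20, eta=0.3):
    """Start from the mean of ys, then add num_round stumps scaled by eta."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(num_round):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + eta * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "eta": eta, "stumps": stumps}

def predict_boosted(model, x):
    correction = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["eta"] * correction

def mean_squared_error(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Invented toy data: the target jumps from 1 to 5 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
one_round = fit_boosted(xs, ys, num_round=1)
twenty_rounds = fit_boosted(xs, ys, num_round=20)
```

In this sketch, each additional round corrects part of the error left by the previous rounds, so the model with 20 rounds fits the toy data far more closely than the model with a single round.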

You can learn more about XGBoost here: https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html.

When you create a model with Redshift ML, you can specify XGBoost as the model type and, optionally, AUTO OFF. Specifying AUTO OFF turns off SageMaker Autopilot and gives you more control over model tuning; for example, you can specify the hyperparameters you wish to use. You will see an example of this in the Creating a binary classification model using XGBoost section.

You will have to perform the preprocessing yourself when you set AUTO to OFF. Carrying out the preprocessing helps us get the best possible model and is also necessary...

Introducing an XGBoost use case

In this section, we will be discussing a use case where we want to predict whether credit card transactions are fraudulent. We will be going through the following steps:

Defining the business problem
Uploading, analyzing, and preparing data for training
Splitting data into training and testing datasets
Preprocessing the input variables
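The last two steps can be sketched generically in Python. This is only an illustration under stated assumptions, not necessarily the approach this chapter takes in Redshift SQL: the rows, the field names (tx_date, amount, during_weekend), the cutoff date, and the choice of min-max scaling are all invented for the example.

```python
from datetime import date

# Hypothetical transactions; field names and values are invented for illustration.
transactions = [
    {"tx_date": date(2022, 4, 1), "amount": 20.0, "during_weekend": "no"},
    {"tx_date": date(2022, 4, 9), "amount": 120.0, "during_weekend": "yes"},
    {"tx_date": date(2022, 4, 20), "amount": 70.0, "during_weekend": "no"},
    {"tx_date": date(2022, 5, 7), "amount": 220.0, "during_weekend": "yes"},
]

def split_by_date(rows, cutoff):
    """Rows before the cutoff go to training; the rest are held out for testing."""
    train = [r for r in rows if r["tx_date"] < cutoff]
    test = [r for r in rows if r["tx_date"] >= cutoff]
    return train, test

def fit_min_max(rows, column):
    """Learn the scaling range from the given rows only."""
    values = [r[column] for r in rows]
    return min(values), max(values)

def preprocess(row, amount_range):
    """Scale the numeric amount and encode the categorical flag as 0 or 1."""
    low, high = amount_range
    span = (high - low) or 1.0  # avoid dividing by zero when all amounts are equal
    return {
        "amount_scaled": (row["amount"] - low) / span,
        "during_weekend": 1 if row["during_weekend"] == "yes" else 0,
    }

train, test = split_by_date(transactions, cutoff=date(2022, 5, 1))
amount_range = fit_min_max(train, "amount")  # fitted on training rows only
train_features = [preprocess(r, amount_range) for r in train]
test_features = [preprocess(r, amount_range) for r in test]
```

The scaling range is learned from the training rows alone and then reused for the test rows, so no information from the held-out data leaks into training. One consequence is that test values can fall outside the 0 to 1 range, as the last row does here.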

Defining the business problem

In this section, we will use a credit card payment transaction dataset to build a binary classification model using XGBoost in Redshift ML. This dataset contains customer and terminal information along with the date and amount related to the transaction. This dataset also has some derived fields based on recency, frequency, and monetary numeric features, along with a few categorical variables, such as whether a transaction occurred during the weekend or at night. Our goal is to identify whether a transaction is fraudulent or non-fraudulent. This use case is taken...

Creating a model using XGBoost with Auto Off

In this exercise, we are going to create a custom binary classification model using the XGBoost algorithm. You can achieve this by setting AUTO to OFF. Here are the parameters that are available:

AUTO OFF
MODEL_TYPE
OBJECTIVE
HYPERPARAMETERS

For the complete list of hyperparameter values that are available and their defaults, please read the documentation found here:

https://docs.aws.amazon.com/redshift/latest/dg/r_create_model_use_cases.html#r_auto_off_create_model
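As a rough sketch of how the four parameters listed above might come together, the following Python helper assembles a CREATE MODEL statement as text. The helper itself (build_create_model_sql) and every table, column, function, and bucket name passed to it are hypothetical placeholders, and the hyperparameter values are arbitrary. The clause order follows the general shape of the AUTO OFF examples in the Redshift documentation linked above; check that page for the exact syntax and for which settings your environment requires.

```python
def build_create_model_sql(model_name, table_name, target, function_name,
                           s3_bucket, objective="binary:logistic",
                           hyperparameters=None):
    """Assemble a Redshift ML CREATE MODEL statement that uses AUTO OFF.

    Values are interpolated directly into the text, so this is for
    illustration with trusted constants only, never with user input.
    """
    hyperparameters = hyperparameters or {"NUM_ROUND": "100"}
    overrides = ", ".join(
        f"{name} '{value}'" for name, value in hyperparameters.items()
    )
    return "\n".join([
        f"CREATE MODEL {model_name}",
        f"FROM {table_name}",
        f"TARGET {target}",
        f"FUNCTION {function_name}",
        "IAM_ROLE default",
        "AUTO OFF",                        # turn off SageMaker Autopilot
        "MODEL_TYPE XGBOOST",              # choose the algorithm explicitly
        f"OBJECTIVE '{objective}'",        # binary classification objective
        "PREPROCESSORS 'none'",            # preprocessing is done by you
        f"HYPERPARAMETERS DEFAULT EXCEPT ({overrides})",
        f"SETTINGS (S3_BUCKET '{s3_bucket}');",
    ])

# All names below are hypothetical placeholders.
sql = build_create_model_sql(
    model_name="fraud_xgboost_model",
    table_name="fraud_training_data",
    target="is_fraud",
    function_name="fn_predict_fraud",
    s3_bucket="your-s3-bucket",
    hyperparameters={"NUM_ROUND": "100", "MAX_DEPTH": "5"},
)
print(sql)
```

Printing the statement lets you review it before pasting it into Query Editor v2. HYPERPARAMETERS DEFAULT EXCEPT keeps the default value of every hyperparameter that is not listed explicitly.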

Now that you have a basic understanding of the parameters available with XGBoost, you can create the model.

Creating a binary classification model using XGBoost

Let’s create a model to predict whether a transaction is fraudulent or non-fraudulent. As you learned in the previous chapters, you create a model with Amazon Redshift ML simply by running a SQL command, which also creates a prediction function. As inputs (or features), you will be using...

Summary

In this chapter, you learned what XGBoost is and how to apply it to a business problem. You learned how to specify your own hyperparameters when using the Auto Off option and how to specify the objective for a binary classification problem. Additionally, you learned how to do your own data preprocessing and calculate the F1 score to validate the model performance.
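As a reminder of how the F1 score is defined, here is a minimal Python sketch that computes it from actual and predicted labels. The labels are invented for illustration, with 1 standing for a fraudulent transaction; in practice the predicted labels would come from the model's prediction function.

```python
def f1_score(actual, predicted):
    """F1 is the harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented labels: 2 true positives, 1 false positive, 2 false negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
score = f1_score(actual, predicted)  # precision 2/3, recall 1/2, F1 = 4/7
```

Because fraudulent transactions are usually rare, a model can reach high accuracy by predicting "not fraud" for everything; F1 penalizes that behavior because recall drops to zero.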

In the next chapter, you will learn how to bring your own models from Amazon SageMaker for in-database or remote inference.

Authors (4)

Debu Panda

Debu Panda, a Senior Manager, Product Management at AWS, is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases and has presented at multiple conferences such as re:Invent, Oracle Open World, and Java One. He is the lead author of EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt, 2009).

Phil Bates

Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS. He has more than 25 years of experience implementing large-scale data warehouse solutions. He is passionate about helping customers through their cloud journey and leveraging the power of ML within their data warehouse.

Bhanu Pittampally

Bhanu Pittampally is an Analytics Specialist Solutions Architect at Amazon Web Services. His background is in data and analytics, and he has been in the field for over 16 years. He currently lives in Frisco, TX with his wife Kavitha and daughters Vibha and Medha.

Sumeet Joshi

Sumeet Joshi is an Analytics Specialist Solutions Architect based out of New York. He specializes in building large-scale data warehousing solutions. He has over 17 years of experience in the data warehousing and analytical space.
