Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Serverless Machine Learning with Amazon Redshift ML
Serverless Machine Learning with Amazon Redshift ML

Serverless Machine Learning with Amazon Redshift ML: Create, train, and deploy machine learning models using familiar SQL commands

By Debu Panda , Phil Bates , Bhanu Pittampally , Sumeet Joshi
€29.99 €20.98
Book Aug 2023 290 pages 1st Edition
eBook
€29.99 €20.98
Print
€37.99
Subscription
€14.99 Monthly
eBook
€29.99 €20.98
Print
€37.99
Subscription
€14.99 Monthly

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Aug 30, 2023
Length 290 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781804619285
Category :
Table of content icon View table of contents Preview book icon Preview Book

Serverless Machine Learning with Amazon Redshift ML

Introduction to Amazon Redshift Serverless

“Hey, what’s a data warehouse?” John Doe, CEO and co-founder of Red.wines, a fictional specialty wine e-commerce company, asked Tathya Vishleshak*, the company’s CTO. John, who owned a boutique winery, had teamed up with Tathya for the project. The company’s success surged during the pandemic, driven by social media and the stay-at-home trend. John wanted detailed data analysis to align inventory and customer outreach. However, there was a problem – producing this analysis was slowing down their online transaction processing (OLTP) database.

“A data warehouse is like a big database where we store different data for a long time to find insights and make decisions,” Tathya explained.

John had a concern, “Sounds expensive; we’re already paying for unused warehouse space. Can we afford it?”

Tathya reassured him, “You’re right, but there are cloud data warehouses such as Amazon Redshift Serverless that let you pay as you use.”

Expanding on this, this chapter introduces data warehousing and Amazon Redshift. We’ll cover Amazon Redshift Serverless basics, such as namespaces and workgroups, and guide you in creating a data warehouse. Amazon Redshift can gather data from various sources, mainly Amazon Simple Storage Service (S3).

As we go through this chapter, you’ll learn about a crucial aspect of this, the AWS Identity and Access Management (IAM) role, needed for loading data from S3. This role connects to your Serverless namespace for smooth data transfer. You’ll also learn how to load sample data and run queries using Amazon Redshift query editor. Our goal is to make it simple and actionable, so you’re confident in navigating this journey.

Tathya Vishleshak

The phrase 'Tathya Vishleshak' can be loosely interpreted to reflect the concept of a data analyst in Sanskrit/Hindi. However, it's important to note that this is not a precise or established translation, but rather an attempt to convey a similar meaning based on the individual meanings of the words 'Tathya' and 'Vishleshak' in Sanskrit.

Additionally, Amazon Redshift is used to analyze structured and unstructured data in data warehouses, operational databases, and data lakes. It’s employed for traditional data warehousing, business intelligence, real-time analytics, and machine learning/predictive analytics. Data analysts and developers use Redshift data with machine learning (ML) models for tasks such as predicting customer behavior. Amazon Redshift ML streamlines this process using familiar SQL commands.

The book delves into ML, explaining supervised and unsupervised training. You’ll learn about problem-solving with binary classification, multi-class classification, and regression using real-world examples. You’ll also discover how to create deep learning models and custom models with XGBoost, as well as use time series forecasting. The book also covers in-database and remote inferences using existing models, applying ML for predictive analytics, and operationalizing machine learning models.

The following topics will be covered in this chapter:

  • What is Amazon Redshift?
  • Getting started with Amazon Redshift Serverless
  • Connecting to your data warehouse

This chapter requires a web browser and access to an AWS account.

What is Amazon Redshift?

Organizations churn out vast troves of customer data along with insights into these customers’ interactions with the business. This data gets funneled into various applications and stashed away in disconnected systems. A conundrum arises when attempting to decipher these data silos – a formidable challenge that hampers the derivation of meaningful insights essential for organizational clarity. Adding to this complexity, security and performance considerations typically muzzle business analysts from accessing data within OLTP systems.

The hiccup is that intricate analytical queries weigh down OLTP databases, casting a shadow over their core operations. Here, the solution is the data warehouse, which is a central hub of curated data, used by business analysts and data scientists to make informed decisions by employing the business intelligence and machine learning tools at their disposal. These users make use of Structured Query Language (SQL) to derive insights from this data trove. From operational systems, application logs, and social media streams to the influx of IoT device-generated data, customers channel structured and semi-structured data into organizations’ data warehouses, as depicted in Figure 1.1, showcasing the classic architecture of a conventional data warehouse.

Figure 1.1 – Data warehouse

Figure 1.1 – Data warehouse

Here’s where Amazon Redshift Serverless comes in. It’s a key option within Amazon Redshift, a well-managed cloud data warehouse offered by Amazon Web Services (AWS). With cloud-based ease, Amazon Redshift Serverless lets you set up your data storage without infrastructure hassles or cost worries. You pay based on what you use for compute and storage.

Amazon Redshift Serverless goes beyond convenience, propelling modern data applications that seamlessly connect to the data lake. Enter the data lake – a structure that gathers all data strands under one roof, providing limitless space to store data at any scale, cost-effectively. Alongside other data repositories such as data warehouses, data lakes redefine how organizations handle data. And this is where it all comes together – the following diagram shows how Amazon Redshift Serverless injects SQL-powered queries into the data lake, driving a dynamic data flow:

Figure 1.2 – Data lake and data warehouse

Figure 1.2 – Data lake and data warehouse

So, let’s get started on creating our first data warehouse in the cloud!

Getting started with Amazon Redshift Serverless

You can create your data warehouse with Amazon Redshift Serverless using the AWS Command-Line Interface (CLI), the API, AWS CloudFormation templates, or the AWS console. We are going to use the AWS console to create a Redshift Serverless data warehouse. Log in to your AWS console and search for Redshift in the top bar, as shown in Figure 1.3:

Figure 1.3 – AWS console page showing services filtered by our search for Redshift

Figure 1.3 – AWS console page showing services filtered by our search for Redshift

Click on Amazon Redshift, which will take you to the home page for the Amazon Redshift console, as shown in Figure 1.4. To help get you started, Amazon provides free credit for first-time Redshift Serverless customers. So, let’s start creating your trial data warehouse by clicking on Try Amazon Redshift Serverless. If you or your organization has tried Amazon Redshift Serverless before, you will have to pay for the service based on your usage:

Figure 1.4 – Amazon Redshift service page in the AWS console

Figure 1.4 – Amazon Redshift service page in the AWS console

If you have free credit available, it will be indicated at the top of your screen, as in Figure 1.5:

Figure 1.5 – AWS console showing the Redshift Serverless Get started page

Figure 1.5 – AWS console showing the Redshift Serverless Get started page

You can either choose the defaults or use the customized settings to create your data warehouse. The customized settings give you more control, allowing you to specify many additional parameters for your compute configuration including the workgroup, data-related settings such as the namespace, and advanced security settings. We will use the customized settings, which will help us customize the namespace settings for our Serverless data warehouse. A namespace combined with a workgroup is what makes a data warehouse with Redshift Serverless, as we will now see in more detail.

What is a namespace?

Amazon Redshift Serverless provides a separation of storage and compute for a data warehouse. A namespace is a collection of all your data stored in the database such as your tables, views, database users, and their privileges. You are separately charged for storage based on the size of the data stored in your data warehouse. For compute, you are charged for the capacity used over a given duration in Redshift processing hours (RPU) on a per second-basis. The storage capacity is billed as Redshift managed storage (RMS) and is billed by GB/month. You can view https://aws.amazon.com/redshift/pricing/ for detailed pricing for your AWS Region.

As a data warehouse admin, you can change the name of your data warehouse namespace while creating the namespace. You can also change your encryption settings, audit logging, and AWS IAM permissions, as shown in Figure 1.6. The primary reason we are going to use customized settings is to associate an IAM role with the namespace:

Figure 1.6 – Namespace configuration

Figure 1.6 – Namespace configuration

AWS IAM allows you to specify which users or services can access other services and resources in AWS. We will use that role for loading data from S3 and training a machine learning model with Redshift ML that accesses Amazon SageMaker.

If you have already created an IAM role earlier, you can associate with that IAM role. If you have not created an IAM role, do so now by selecting the Manage IAM roles option, as shown in Figure 1.7:

Figure 1.7 – Creating an IAM role and associating it via the AWS console

Figure 1.7 – Creating an IAM role and associating it via the AWS console

Then, select the Create IAM role option, as shown in Figure 1.8:

Figure 1.8 – Selecting the “Create IAM role” option

Figure 1.8 – Selecting the “Create IAM role” option

You can then create a default IAM role and provide appropriate permissions to the IAM role to allow it to access S3 buckets, as shown in Figure 1.9:

Figure 1.9 – Granting S3 permissions to the IAM role

Figure 1.9 – Granting S3 permissions to the IAM role

As shown in the preceding figure, select Any S3 bucket to enable Redshift to read data from and write data to all S3 buckets you have created. Then, select Create IAM role as default to create the role and set it as the default IAM role.

Figure 1.10 – An IAM role was created but is not yet applied

Figure 1.10 – An IAM role was created but is not yet applied

As shown in Figure 1.10, we created the IAM role and associated it with the namespace as a default role. Let’s next proceed to create a workgroup, wherein we will set up the compute configuration for the data warehouse.

What is a workgroup?

As we discussed earlier, a namespace combined with a workgroup is what makes a Redshift Serverless data warehouse. A workgroup provides the compute resources required to process your data. It also provides the endpoint for you to connect to the warehouse. As an admin, you need to configure the compute settings such as the network and security configuration for the workgroup.

We will not do any customization at this time and simply select the default settings instead, including the VPC and associated subnets for the workgroup, as shown in the following screenshot:

Figure 1.11 – Default settings and associated subnets for the workgroup

Figure 1.11 – Default settings and associated subnets for the workgroup

Click on the Save configuration button to create your Redshift Serverless instance, and your first data warehouse will be ready in a few minutes:

Figure 1.12 – Redshift Serverless creation progress

Figure 1.12 – Redshift Serverless creation progress

Once your data warehouse is ready, you will be redirected to your Serverless dashboard, as shown in Figure 1.13:

Figure 1.13 – Serverless dashboard showing your namespace and workgroup

Figure 1.13 – Serverless dashboard showing your namespace and workgroup

Now that we have created our data warehouse, we will connect to the data warehouse, load some sample data, and run some queries.

Connecting to your data warehouse

Your data warehouse with Redshift Serverless is now ready. You can connect to your data warehouse using third-party tools via JDBC/ODBC/Python drivers. Other options include the Data API or the embedded Redshift query editor v2.

Using Amazon Redshift query editor v2

Now that your data warehouse is ready; let’s navigate to the query editor to load some sample data and run some queries. Select the Query data option from your dashboard, as shown in Figure 1.13, and you will be navigated to the query editor, as shown in Figure 1.14.

Figure 1.14 – Query editor

Figure 1.14 – Query editor

In the Redshift query editor v2 console, on the left pane, you will see the data warehouses, such as the Serverless:default workgroup, that you have access to. Click on the workgroup (Serverless:default) to connect to the data warehouse.

Figure 1.15 – Creating a connection to your workgroup

Figure 1.15 – Creating a connection to your workgroup

As shown in the preceding screenshot, select Federated user if you did not specify any database credentials while creating the namespace, and then click Create connection. You can leave the database name as dev. You will be prompted to create a connection only when connecting to the data warehouse for the first time. If you have created the connection, you will be connected automatically when you click on the workgroup. Once you are connected, you will see the databases in the navigator, as shown in Figure 1.16:

Figure 1.16 – List of databases

Figure 1.16 – List of databases

Since we just created our data warehouse for the first time, there is no data present in it, so let’s load some sample data into the data warehouse now.

Loading sample data

On the left pane, click on the sample_data_dev database to expand the available database:

Figure 1.17 – The Redshift query editor v2 navigator that shows the sample data available

Figure 1.17 – The Redshift query editor v2 navigator that shows the sample data available

As you can see from the preceding screenshot, three sample datasets are available for you to load into your data warehouse. Click on the icon showing the folder with an arrow located to the right of your chosen sample data notebook to load and open it, as shown in Figure 1.18:

Figure 1.18 – List of sample databases

Figure 1.18 – List of sample databases

You will be prompted to create your sample database. Click on Create to get started, as shown in Figure 1.19:

Figure 1.19 – Creating a sample database

Figure 1.19 – Creating a sample database

The sample data will be loaded in a few seconds and presented in a notebook with SQL queries for the dataset that you can explore, as shown in Figure 1.20:

Figure 1.20 – Notebook with sample queries for the tickit database

Figure 1.20 – Notebook with sample queries for the tickit database

You can expand the navigation tree on the left side of the query editor to view schemas and database objects, such as tables and views in your schema, as shown in Figure 1.21.

Figure 1.21 – Expanding the navigation tree to view schemas and database objects

Figure 1.21 – Expanding the navigation tree to view schemas and database objects

You can click on a table to view the table definitions, as shown in Figure 1.22:

Figure 1.22 – Table definitions

Figure 1.22 – Table definitions

Right-clicking on a table provides additional Select table, Show table definition, and Delete options, as shown in Figure 1.23:

Figure 1.23 – Right-clicking on a table to view more options

Figure 1.23 – Right-clicking on a table to view more options

You can click Run all, as shown in Figure 1.24, to run all the queries in the sample notebook. The query editor provides a notebook interface to add annotation, and SQL cells organize your queries in a single document. You can use annotations for documentation purposes.

Figure 1.24 – The “Run all” option

Figure 1.24 – The “Run all” option

You will see the results of your queries for each cell. You can download the results as JSON or CSV files to your desktop, as shown in Figure 1.25:

Figure 1.25 – Options to download query results

Figure 1.25 – Options to download query results

Let’s author our first query.

Running your first query

We want to find out the top 10 events by sales in the tickit database. We will run the following SQL statement in the data warehouse:

SELECT eventname, total_price
FROM  (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) as percentile
       FROM (SELECT eventid, sum(pricepaid) total_price
             FROM   tickit.sales
             GROUP BY eventid)) Q, tickit.event E
       WHERE Q.eventid = E.eventid
       AND percentile = 1
ORDER BY total_price desc
limit 10;

In the query editor, add a new query by clicking on the + sign and selecting Editor from the menu that appears. If you wanted to create a new notebook, you could click on Notebook instead, as shown in Figure 1.26:

Figure 1.26 – Creating a new query

Figure 1.26 – Creating a new query

Now, type the preceding SQL query in the editor and then click on Run. You will get the results as shown in the following screenshot:

Figure 1.27 – Query with results

Figure 1.27 – Query with results

As the saying goes, “A picture is worth a thousand words,” and query editor allows you to visualize the results to gain faster insight. You can create a chart easily by clicking on the Chart option and then selecting the chart you want. Let’s select a scatter plot, as shown in Figure 1.28:

Figure 1.28 – Using charts in Redshift query editor v2

Figure 1.28 – Using charts in Redshift query editor v2

You can add a chart name and notations for the X and Y axes and export the chart as PNG or JPG to put in your presentation or to share with your business partners:

Figure 1.29 – Charting options in query editor v2

Figure 1.29 – Charting options in query editor v2

As you have now seen, you can use Redshift query editor v2 to create your own database, create tables, load data, and run and author queries and notebooks. You can share your queries and notebooks with your team members.

Summary

In this chapter, you learned about cloud data warehouses and Amazon Redshift Serverless. You created your first data warehouse powered by Redshift Serverless and loaded some sample data using the query editor. You also learned how to use the query editor to run queries and visualize data to produce insights.

In Chapter 2, you will learn the best techniques for loading data and performing analytics in your Amazon Redshift Serverless data warehouse.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Leverage supervised learning to build binary classification, multi-class classification, and regression models
  • Learn to use unsupervised learning using the K-means clustering method
  • Master the art of time series forecasting using Redshift ML
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Amazon Redshift Serverless enables organizations to run petabyte-scale cloud data warehouses quickly and in a cost-effective way, enabling data science professionals to efficiently deploy cloud data warehouses and leverage easy-to-use tools to train models and run predictions. This practical guide will help developers and data professionals working with Amazon Redshift data warehouses to put their SQL knowledge to work for training and deploying machine learning models. The book begins by helping you to explore the inner workings of Redshift Serverless as well as the foundations of data analytics and types of data machine learning. With the help of step-by-step explanations of essential concepts and practical examples, you’ll then learn to build your own classification and regression models. As you advance, you’ll find out how to deploy various types of machine learning projects using familiar SQL code, before delving into Redshift ML. In the concluding chapters, you’ll discover best practices for implementing serverless architecture with Redshift. By the end of this book, you’ll be able to configure and deploy Amazon Redshift Serverless, train and deploy machine learning models using Amazon Redshift ML, and run inference queries at scale.

What you will learn

Utilize Redshift Serverless for data ingestion, data analysis, and machine learning Create supervised and unsupervised models and learn how to supply your own custom parameters Discover how to use time series forecasting in your data warehouse Create a SageMaker endpoint and use that to build a Redshift ML model for remote inference Find out how to operationalize machine learning in your data warehouse Use model explainability and calculate probabilities with Amazon Redshift ML

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now

Product Details


Publication date : Aug 30, 2023
Length 290 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781804619285
Category :

Table of Contents

19 Chapters
Preface Chevron down icon Chevron up icon
Part 1:Redshift Overview: Getting Started with Redshift Serverless and an Introduction to Machine Learning Chevron down icon Chevron up icon
Chapter 1: Introduction to Amazon Redshift Serverless Chevron down icon Chevron up icon
Chapter 2: Data Loading and Analytics on Redshift Serverless Chevron down icon Chevron up icon
Chapter 3: Applying Machine Learning in Your Data Warehouse Chevron down icon Chevron up icon
Part 2:Getting Started with Redshift ML Chevron down icon Chevron up icon
Chapter 4: Leveraging Amazon Redshift ML Chevron down icon Chevron up icon
Chapter 5: Building Your First Machine Learning Model Chevron down icon Chevron up icon
Chapter 6: Building Classification Models Chevron down icon Chevron up icon
Chapter 7: Building Regression Models Chevron down icon Chevron up icon
Chapter 8: Building Unsupervised Models with K-Means Clustering Chevron down icon Chevron up icon
Part 3:Deploying Models with Redshift ML Chevron down icon Chevron up icon
Chapter 9: Deep Learning with Redshift ML Chevron down icon Chevron up icon
Chapter 10: Creating a Custom ML Model with XGBoost Chevron down icon Chevron up icon
Chapter 11: Bringing Your Own Models for Database Inference Chevron down icon Chevron up icon
Chapter 12: Time-Series Forecasting in Your Data Warehouse Chevron down icon Chevron up icon
Chapter 13: Operationalizing and Optimizing Amazon Redshift ML Models Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Filter icon Filter
Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%

Filter reviews by


No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.