
You're reading from Amazon Redshift Cookbook

Product type: Book
Published in: Jul 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800569683
Edition: 1st
Authors (3):

Shruti Worlikar

Shruti Worlikar is a cloud professional with technical expertise in data lakes and analytics across cloud platforms. Her background has led her to become an expert in on-premises-to-cloud migrations and building cloud-based scalable analytics applications. Shruti earned her bachelor's degree in electronics and telecommunications from Mumbai University in 2009 and later earned her master's degree in telecommunications and network management from Syracuse University in 2011. She has worked at J.P. Morgan Chase, MicroStrategy, and Amazon Web Services (AWS). She is currently working as Manager, Analytics Specialist SA at AWS, helping customers to solve real-world analytics business challenges with cloud solutions and working with service teams to deliver real value. Shruti is the DC Chapter Director for the non-profit Women in Big Data (WiBD) and engages with chapter members to build technical and business skills to support their career advancements. Originally from Mumbai, India, Shruti currently resides in Aldie, VA, with her husband and two kids.

Thiyagarajan Arumugam

Thiyagarajan Arumugam (Thiyagu) is a principal big data solution architect at AWS, architecting and building solutions at scale using big data to enable data-driven decisions. Prior to AWS, Thiyagu built big data solutions as a data engineer at Amazon, operating and migrating some of its largest data warehouses. He has worked on automated data pipelines and built data lake-based platforms to manage data at scale for the customers of his data science and business analyst teams. Thiyagu is a certified AWS Solution Architect (Professional), earned his master's degree in mechanical engineering at the Indian Institute of Technology, Delhi, and is the author of several AWS blog posts on big data. Thiyagu enjoys everything outdoors – running, cycling, ultimate frisbee – and is currently learning to play the mrudangam, an Indian classical drum. Thiyagu currently resides in Austin, TX, with his wife and two kids.

Harshida Patel

Harshida Patel is a senior analytics specialist solution architect at AWS, enabling customers to build scalable data lake and data warehousing applications using AWS analytical services. She has presented Amazon Redshift deep-dive sessions at re:Invent. Harshida has a bachelor's degree in electronics engineering and a master's in electrical and telecommunication engineering. She has over 15 years of experience architecting and building end-to-end data pipelines in the data management space. In the past, Harshida has worked in the insurance and telecommunication industries. She enjoys traveling and spending quality time with friends and family, and she lives in Virginia with her husband and son.


Chapter 1: Getting Started with Amazon Redshift

Amazon Redshift is a fully managed data warehouse service in Amazon Web Services (AWS). You can query all your data, which can scale from gigabytes to petabytes, using SQL. Amazon Redshift integrates into the data lake solution through the lake house architecture, allowing you to access all the structured and semi-structured data in one place. Each Amazon Redshift data warehouse is hosted as a cluster (a group of servers or nodes) that consists of one leader node and a collection of one or more compute nodes. Each cluster is a single-tenant environment (which can be scaled to a multi-tenant architecture using data sharing), and every node has its own dedicated CPU, memory, and attached disk storage, which vary based on the node type.

This chapter will walk you through the process of creating a sample Amazon Redshift cluster and connecting to it from different clients.

The following recipes will be discussed in this chapter:

...

Technical requirements

The following are the technical requirements for this chapter:

Creating an Amazon Redshift cluster using the AWS Console

The AWS Management Console allows you to interactively create an Amazon Redshift cluster via a browser-based user interface. It also recommends the right cluster configuration based on the size of your workload. Once the cluster has been created, you can use the Console to monitor the health of the cluster and diagnose query performance issues from a unified dashboard.

Getting ready

To complete this recipe, you will need the following:

  • A new or existing AWS account. If you need to create a new AWS account, go to https://portal.aws.amazon.com/billing/signup, enter the necessary information, and follow the steps on the site.
  • An IAM user with access to Amazon Redshift.

How to do it…

Follow these steps to create a cluster with minimal parameters:

  1. Navigate to the AWS Management Console and select Amazon Redshift: https://console.aws.amazon.com/redshiftv2/.
  2. Choose the AWS region (eu-west...

Creating an Amazon Redshift cluster using the AWS CLI

The AWS command-line interface (CLI) is a unified tool for managing your AWS services. You can use it from a command-line terminal to create an Amazon Redshift cluster.

The command-line tool also makes it easy to automate cluster creation and modification. For example, you can write a shell script that creates manual point-in-time snapshots of the cluster.
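The snapshot automation mentioned above can be sketched in Python, which simply assembles the same AWS CLI invocation a shell script would run. The cluster name `myredshiftcluster` and the snapshot naming scheme are illustrative assumptions, not values from this book:

```python
from datetime import datetime, timezone

def snapshot_command(cluster_id):
    """Build the AWS CLI command that takes a manual snapshot of a cluster."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M")
    return [
        "aws", "redshift", "create-cluster-snapshot",
        "--cluster-identifier", cluster_id,
        "--snapshot-identifier", "{}-manual-{}".format(cluster_id, stamp),
    ]

cmd = snapshot_command("myredshiftcluster")
print(" ".join(cmd))
```

In a real script you would hand this list to `subprocess.run` (with AWS credentials configured) or schedule it with cron.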

Getting ready

To complete this recipe, you will need to do the following:

$ aws configure list
Name   ...

Creating an Amazon Redshift cluster using an AWS CloudFormation template

With an AWS CloudFormation template, you treat your infrastructure as code, which enables you to create an Amazon Redshift cluster using a JSON or YAML file. The declarative code in the file contains the steps to create the AWS resources, and it also enables easy automation and distribution. A template allows you to standardize Amazon Redshift cluster creation to meet your organization's infrastructure and security standards. Furthermore, you can distribute templates to different teams within your organization using the AWS Service Catalog for easy setup.
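As a sketch of what such a template might look like, the following Python snippet assembles a minimal JSON template with a single `AWS::Redshift::Cluster` resource. The property values (node type, database name, user name) are illustrative assumptions, not a production configuration:

```python
import json

# Minimal CloudFormation template with one Amazon Redshift cluster resource.
# The master password is taken from a NoEcho parameter rather than hardcoded.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: single-node Amazon Redshift cluster",
    "Parameters": {
        "MasterUserPassword": {"Type": "String", "NoEcho": True},
    },
    "Resources": {
        "RedshiftCluster": {
            "Type": "AWS::Redshift::Cluster",
            "Properties": {
                "ClusterType": "single-node",
                "NodeType": "dc2.large",
                "DBName": "dev",
                "MasterUsername": "awsuser",
                "MasterUserPassword": {"Ref": "MasterUserPassword"},
            },
        }
    },
}

body = json.dumps(template, indent=2)
print(body[:80])
```

The resulting JSON string is what you would save to a file and pass to CloudFormation when creating the stack.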

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to AWS CloudFormation, Amazon EC2, and Amazon Redshift.

How to do it…

We will author the Amazon Redshift cluster infrastructure as code using a JSON-based CloudFormation template. Follow these steps...

Connecting to an Amazon Redshift cluster using the Query Editor

The Query Editor is a thin-client, browser-based interface available on the AWS Management Console for running SQL queries directly on Amazon Redshift clusters. Once you have created the cluster, you can use the Query Editor to jumpstart querying the cluster without needing to set up a JDBC/ODBC driver. This recipe will show you how to get started with the Query Editor so that you can access your Amazon Redshift clusters.

The Query Editor allows you to do the following:

  • Explore the schema
  • Run multiple DDL and DML SQL commands
  • Run single/multiple select statements
  • View query execution details
  • Save a query
  • Download a query result set of up to 100 MB as a .csv, text, or HTML file

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to Amazon Redshift and AWS Secrets Manager.
  • Store the database credentials in Amazon...

Connecting to an Amazon Redshift cluster using the SQL Workbench/J client

There are multiple ways to connect to an Amazon Redshift cluster, but one of the most popular options is a UI-based tool. SQL Workbench/J is a free, cross-platform SQL query tool that you can install on your local machine to connect to Amazon Redshift.

Getting ready

To complete this recipe, you will need to do the following:

  • Create an Amazon Redshift cluster and the necessary login credentials (username and password).
  • Install SQL Workbench/J (https://www.sql-workbench.eu/manual/install.html).
  • Download the Amazon Redshift JDBC driver. Please check out Configuring a JDBC connection to download the latest driver version.
  • Modify the security group attached to the Amazon Redshift cluster to allow a connection from a local client.
  • Navigate to Amazon Redshift | Clusters | myredshiftcluster | General information to find the JDBC/ODBC URL for connecting to the Amazon Redshift cluster.

How to do...

Connecting to an Amazon Redshift Cluster using a Jupyter Notebook

Jupyter Notebook is an interactive web application that enables you to perform data analysis interactively. Jupyter notebooks are widely used by business analysts, data scientists, and others for data wrangling and exploration. Using a Jupyter notebook, you can access all the historical data available in Amazon Redshift and combine it with data available in other sources, such as an Amazon S3-based data lake. For example, you might want to build a forecasting model based on the historical sales data in Amazon Redshift, combined with the clickstream data available in the data lake. Jupyter notebooks are the tool of choice here due to the versatility they provide for exploration tasks and the strong support from the open source community.
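As a toy illustration of the kind of wrangling a notebook cell might do, the following sketch joins a few hypothetical rows of Redshift sales history with hypothetical clickstream records using plain Python. In a real notebook, both inputs would come from queries against Amazon Redshift and the data lake rather than inline literals:

```python
# Hypothetical sales rows (as they might be fetched from Amazon Redshift).
sales = [
    {"product_id": 1, "units_sold": 120},
    {"product_id": 2, "units_sold": 45},
]
# Hypothetical clickstream rows (as they might be read from an S3 data lake).
clicks = [
    {"product_id": 1, "page_views": 900},
    {"product_id": 2, "page_views": 310},
]

# Index the clickstream by product, then enrich each sales row with it.
views = {row["product_id"]: row["page_views"] for row in clicks}
combined = [
    {**row, "page_views": views.get(row["product_id"], 0)} for row in sales
]
print(combined[0])
```

A combined table like this is the kind of input a forecasting model would train on.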

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access...

Connecting to an Amazon Redshift cluster using Python

Python is widely used for data analytics due to its simplicity and ease of use. In this recipe, we will use Python to connect to Amazon Redshift through the Amazon Redshift Data API.

The Data API allows you to access Amazon Redshift without needing the JDBC or ODBC drivers. You can execute SQL commands on an Amazon Redshift cluster by invoking a secure API endpoint provided by the Data API. SQL queries are submitted asynchronously, so you can monitor the status of a query and retrieve its results later. The Data API is supported by the AWS SDKs for all major programming languages, including Python, Go, Java, Node.js, PHP, Ruby, and C++.
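A minimal sketch of the asynchronous flow with boto3 might look like the following. The cluster name, database, and secret ARN are illustrative placeholders; the request is only assembled here, not sent, so the boto3 calls appear as comments:

```python
# Sketch of the Data API submit/poll/fetch flow, using boto3 parameter names.
# All values below are hypothetical placeholders, not a real cluster or secret.
params = {
    "ClusterIdentifier": "myredshiftcluster",
    "Database": "dev",
    "SecretArn": "arn:aws:secretsmanager:...",  # credentials stored in Secrets Manager
    "Sql": "SELECT current_date",
}

# With AWS credentials configured, the flow would be:
#   client = boto3.client("redshift-data")
#   resp = client.execute_statement(**params)     # submit asynchronously
#   client.describe_statement(Id=resp["Id"])      # poll the query status
#   client.get_statement_result(Id=resp["Id"])    # fetch results once finished
print(sorted(params))
```

The key point is that `execute_statement` returns immediately with a statement ID; the results are retrieved in a separate call once the query finishes.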

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to Amazon Redshift, AWS Secrets Manager, and Amazon EC2.
  • Store the database credentials in AWS Secrets Manager using Recipe...

Connecting to an Amazon Redshift cluster programmatically using Java

Java has been used for decades to build and orchestrate data pipeline tasks, ranging from cleaning and processing to data analysis. Java can programmatically access Amazon Redshift to build automated applications. In this recipe, we will use an AWS-provided Redshift JDBC driver in Java to connect to an Amazon Redshift cluster.

Getting ready

To complete this recipe, you will need to do the following:

  • Create an Amazon Redshift cluster and login credentials.
  • Install Java 8 and have an IDE to develop and run the code in. Alternatively, you can use AWS Cloud9. The AWS Cloud9 IDE offers a rich code editing experience and a runtime debugger with support for several programming languages. It also provides a built-in terminal. You can set up AWS Cloud9 for Java using the instructions provided at https://docs.aws.amazon.com/cloud9/latest/user-guide/sample-java.html.
  • Modify the security group that's...

Connecting to an Amazon Redshift cluster programmatically using .NET

.NET can connect to Amazon Redshift programmatically to build data-enabled applications such as business intelligence portals, share data through an application interface, and more. In this recipe, we will install an AWS-provided Amazon Redshift ODBC driver and connect to the database using .NET.

Getting ready

To complete this recipe, you will need to do the following:

Connecting to an Amazon Redshift cluster using the command line

psql is a command-line front end to PostgreSQL that lets you query data interactively. In this recipe, we will learn how to install psql and run interactive queries against Amazon Redshift.
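Once psql is installed, connecting is a single command; the following sketch assembles that invocation in Python so the parts are explicit. The host and user are illustrative placeholders, and 5439 is Redshift's default port:

```python
def psql_command(host, user, database="dev"):
    """Build the psql invocation for an Amazon Redshift cluster."""
    return ["psql", "-h", host, "-p", "5439", "-U", user, "-d", database]

# Hypothetical endpoint and user for illustration only.
cmd = psql_command(
    "myredshiftcluster.abc123xyz.eu-west-1.redshift.amazonaws.com", "awsuser"
)
print(" ".join(cmd))
```

Running the printed command in a terminal prompts for the password and drops you into an interactive SQL session.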

Getting ready

To complete this recipe, you will need to do the following:

  • On Windows, set the client encoding before connecting: set PGCLIENTENCODING=UTF8
  • Capture your Amazon Redshift cluster and login credentials.
  • Modify the security group attached to the Amazon Redshift cluster to allow connections from...