
You're reading from Amazon Redshift Cookbook

Product type: Book
Published in: Jul 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800569683
Edition: 1st
Authors (3):

Shruti Worlikar

Shruti Worlikar is a cloud professional with technical expertise in data lakes and analytics across cloud platforms. Her background has led her to become an expert in on-premises-to-cloud migrations and building cloud-based scalable analytics applications. Shruti earned her bachelor's degree in electronics and telecommunications from Mumbai University in 2009 and later earned her master's degree in telecommunications and network management from Syracuse University in 2011. She has worked at J.P. Morgan Chase, MicroStrategy, and Amazon Web Services (AWS). She is currently working as Manager, Analytics Specialist SA at AWS, helping customers to solve real-world analytics business challenges with cloud solutions and working with service teams to deliver real value. Shruti is the DC Chapter Director for the non-profit Women in Big Data (WiBD) and engages with chapter members to build technical and business skills to support their career advancements. Originally from Mumbai, India, Shruti currently resides in Aldie, VA, with her husband and two kids.

Thiyagarajan Arumugam

Thiyagarajan Arumugam (Thiyagu) is a principal big data solution architect at AWS, architecting and building solutions at scale using big data to enable data-driven decisions. Prior to AWS, Thiyagu built big data solutions as a data engineer at Amazon, operating and migrating some of its largest data warehouses. He has worked on automated data pipelines and built data lake-based platforms to manage data at scale for the customers of his data science and business analyst teams. Thiyagu is a certified AWS Solution Architect (Professional), earned his master's degree in mechanical engineering at the Indian Institute of Technology, Delhi, and is the author of several AWS blog posts on big data. Thiyagu enjoys everything outdoors – running, cycling, ultimate frisbee – and is currently learning to play the mrudangam, an Indian classical drum. Thiyagu currently resides in Austin, TX, with his wife and two kids.

Harshida Patel

Harshida Patel is a senior analytics specialist solution architect at AWS, enabling customers to build scalable data lake and data warehousing applications using AWS analytical services. She has presented Amazon Redshift deep-dive sessions at re:Invent. Harshida has a bachelor's degree in electronics engineering and a master's in electrical and telecommunication engineering. She has over 15 years of experience architecting and building end-to-end data pipelines in the data management space. In the past, Harshida has worked in the insurance and telecommunication industries. She enjoys traveling and spending quality time with friends and family, and she lives in Virginia with her husband and son.


Chapter 1: Getting Started with Amazon Redshift

Amazon Redshift is a fully managed data warehouse service in Amazon Web Services (AWS). You can query all your data, which can scale from gigabytes to petabytes, using SQL. Amazon Redshift integrates into the data lake solution through the lake house architecture, allowing you to access all the structured and semi-structured data in one place. Each Amazon Redshift data warehouse is hosted as a cluster (a group of servers or nodes) that consists of one leader node and a collection of one or more compute nodes. Each cluster is a single-tenant environment (which can be scaled to a multi-tenant architecture using data sharing), and every node has its own dedicated CPU, memory, and attached disk storage, which vary based on the node type.

This chapter will walk you through the process of creating a sample Amazon Redshift cluster and connecting to it from different clients.

The following recipes will be discussed in this chapter:

...

Technical requirements

The following are the technical requirements for this chapter:

Creating an Amazon Redshift cluster using the AWS Console

The AWS Management Console allows you to interactively create an Amazon Redshift cluster via a browser-based user interface. It also recommends the right cluster configuration based on the size of your workload. Once the cluster has been created, you can use the Console to monitor the health of the cluster and diagnose query performance issues from a unified dashboard.

Getting ready

To complete this recipe, you will need the following:

  • A new or existing AWS account. If you need to create a new AWS account, go to https://portal.aws.amazon.com/billing/signup, enter the necessary information, and follow the steps on the site.
  • An IAM user with access to Amazon Redshift.

How to do it…

Follow these steps to create a cluster with minimal parameters:

  1. Navigate to the AWS Management Console and select Amazon Redshift: https://console.aws.amazon.com/redshiftv2/.
  2. Choose the AWS region (eu-west...

Creating an Amazon Redshift cluster using the AWS CLI

The AWS command-line interface (CLI) is a unified tool for managing your AWS services. You can use it from a command-line terminal to create an Amazon Redshift cluster.

The command-line tool also makes it easy to automate cluster creation and modification. For example, you can write a shell script that creates manual point-in-time snapshots of the cluster.
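The snapshot automation mentioned above can be sketched in Python, which simply assembles the same AWS CLI invocation a shell script would run. The cluster name `myredshiftcluster` and the snapshot naming scheme are illustrative assumptions, not values from this book:

```python
from datetime import datetime, timezone

def snapshot_command(cluster_id):
    """Build the AWS CLI command that takes a manual snapshot of a cluster."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M")
    return [
        "aws", "redshift", "create-cluster-snapshot",
        "--cluster-identifier", cluster_id,
        "--snapshot-identifier", "{}-manual-{}".format(cluster_id, stamp),
    ]

cmd = snapshot_command("myredshiftcluster")
print(" ".join(cmd))
```

In a real script you would hand this list to `subprocess.run` (with AWS credentials configured) or schedule it with cron.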

Getting ready

To complete this recipe, you will need to do the following:

$ aws configure list
Name   ...

Creating an Amazon Redshift cluster using an AWS CloudFormation template

With an AWS CloudFormation template, you treat your infrastructure as code, which enables you to create an Amazon Redshift cluster using a JSON or YAML file. The declarative code in the file contains the steps to create the AWS resources, and it also enables easy automation and distribution. A template allows you to standardize Amazon Redshift cluster creation to meet your organization's infrastructure and security standards. Furthermore, you can distribute templates to different teams within your organization using the AWS Service Catalog for easy setup.
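As a sketch of what such a template might look like, the following Python snippet assembles a minimal JSON template with a single `AWS::Redshift::Cluster` resource. The property values (node type, database name, user name) are illustrative assumptions, not a production configuration:

```python
import json

# Minimal CloudFormation template with one Amazon Redshift cluster resource.
# The master password is taken from a NoEcho parameter rather than hardcoded.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: single-node Amazon Redshift cluster",
    "Parameters": {
        "MasterUserPassword": {"Type": "String", "NoEcho": True},
    },
    "Resources": {
        "RedshiftCluster": {
            "Type": "AWS::Redshift::Cluster",
            "Properties": {
                "ClusterType": "single-node",
                "NodeType": "dc2.large",
                "DBName": "dev",
                "MasterUsername": "awsuser",
                "MasterUserPassword": {"Ref": "MasterUserPassword"},
            },
        }
    },
}

body = json.dumps(template, indent=2)
print(body[:80])
```

The resulting JSON string is what you would save to a file and pass to CloudFormation when creating the stack.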

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to AWS CloudFormation, Amazon EC2, and Amazon Redshift.

How to do it…

We will author the Amazon Redshift cluster infrastructure as code using a JSON-based CloudFormation template. Follow these steps...

Connecting to an Amazon Redshift cluster using the Query Editor

The Query Editor is a thin-client, browser-based interface available on the AWS Management Console for running SQL queries directly on Amazon Redshift clusters. Once you have created the cluster, you can use the Query Editor to jumpstart querying the cluster without needing to set up a JDBC/ODBC driver. This recipe will show you how to get started with the Query Editor so that you can access your Amazon Redshift clusters.

The Query Editor allows you to do the following:

  • Explore the schema
  • Run multiple DDL and DML SQL commands
  • Run single/multiple select statements
  • View query execution details
  • Save a query
  • Download a query result set of up to 100 MB as a .csv, text, or HTML file

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to Amazon Redshift and AWS Secrets Manager.
  • Store the database credentials in Amazon...

Connecting to an Amazon Redshift cluster using the SQL Workbench/J client

There are multiple ways to connect to an Amazon Redshift cluster, but one of the most popular options is a UI-based tool. SQL Workbench/J is a free, cross-platform SQL query tool that you can install on your local machine to connect to Amazon Redshift.

Getting ready

To complete this recipe, you will need to do the following:

  • Create an Amazon Redshift cluster and the necessary login credentials (username and password).
  • Install SQL Workbench/J (https://www.sql-workbench.eu/manual/install.html).
  • Download the Amazon Redshift JDBC driver. Please check out Configuring a JDBC connection to download the latest driver version.
  • Modify the security group attached to the Amazon Redshift cluster to allow a connection from a local client.
  • Navigate to Amazon Redshift | Clusters | myredshiftcluster | General information to find the JDBC/ODBC URL for connecting to the Amazon Redshift cluster.

How to do...

Connecting to an Amazon Redshift Cluster using a Jupyter Notebook

Jupyter Notebook is an interactive web application that enables you to perform data analysis interactively. Jupyter notebooks are widely used by business analysts, data scientists, and others for data wrangling and exploration. Using a Jupyter notebook, you can access all the historical data available in Amazon Redshift and combine it with data available in other sources, such as an Amazon S3-based data lake. For example, you might want to build a forecasting model based on the historical sales data in Amazon Redshift, combined with the clickstream data available in the data lake. Jupyter notebooks are the tool of choice here due to the versatility they provide for exploration tasks and the strong support from the open source community.
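As a toy illustration of the kind of wrangling a notebook cell might do, the following sketch joins a few hypothetical rows of Redshift sales history with hypothetical clickstream records using plain Python. In a real notebook, both inputs would come from queries against Amazon Redshift and the data lake rather than inline literals:

```python
# Hypothetical sales rows (as they might be fetched from Amazon Redshift).
sales = [
    {"product_id": 1, "units_sold": 120},
    {"product_id": 2, "units_sold": 45},
]
# Hypothetical clickstream rows (as they might be read from an S3 data lake).
clicks = [
    {"product_id": 1, "page_views": 900},
    {"product_id": 2, "page_views": 310},
]

# Index the clickstream by product, then enrich each sales row with it.
views = {row["product_id"]: row["page_views"] for row in clicks}
combined = [
    {**row, "page_views": views.get(row["product_id"], 0)} for row in sales
]
print(combined[0])
```

A combined table like this is the kind of input a forecasting model would train on.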

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access...

Connecting to an Amazon Redshift cluster using Python

Python is widely used for data analytics due to its simplicity and ease of use. In this recipe, we will use Python to connect to Amazon Redshift through the Amazon Redshift Data API.

The Data API allows you to access Amazon Redshift without needing the JDBC or ODBC drivers. You can execute SQL commands on an Amazon Redshift cluster by invoking a secure API endpoint provided by the Data API. SQL queries are submitted asynchronously, so you can monitor the status of a query and retrieve its results later. The Data API is supported by the AWS SDKs for all major programming languages, including Python, Go, Java, Node.js, PHP, Ruby, and C++.
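A minimal sketch of the asynchronous flow with boto3 might look like the following. The cluster name, database, and secret ARN are illustrative placeholders; the request is only assembled here, not sent, so the boto3 calls appear as comments:

```python
# Sketch of the Data API submit/poll/fetch flow, using boto3 parameter names.
# All values below are hypothetical placeholders, not a real cluster or secret.
params = {
    "ClusterIdentifier": "myredshiftcluster",
    "Database": "dev",
    "SecretArn": "arn:aws:secretsmanager:...",  # credentials stored in Secrets Manager
    "Sql": "SELECT current_date",
}

# With AWS credentials configured, the flow would be:
#   client = boto3.client("redshift-data")
#   resp = client.execute_statement(**params)     # submit asynchronously
#   client.describe_statement(Id=resp["Id"])      # poll the query status
#   client.get_statement_result(Id=resp["Id"])    # fetch results once finished
print(sorted(params))
```

The key point is that `execute_statement` returns immediately with a statement ID; the results are retrieved in a separate call once the query finishes.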

Getting ready

To complete this recipe, you will need to do the following:

  • Create an IAM user with access to Amazon Redshift, AWS Secrets Manager, and Amazon EC2.
  • Store the database credentials in AWS Secrets Manager using Recipe...

Connecting to an Amazon Redshift cluster programmatically using Java

Java has been used for decades to build and orchestrate data pipeline tasks, ranging from cleaning and processing to data analysis. Java can programmatically access Amazon Redshift to build automated applications. In this recipe, we will use an AWS-provided Redshift JDBC driver in Java to connect to an Amazon Redshift cluster.

Getting ready

To complete this recipe, you will need to do the following:

  • Create an Amazon Redshift cluster and login credentials.
  • Install Java 8 and have an IDE to develop and run the code in. Alternatively, you can use AWS Cloud9. The AWS Cloud9 IDE offers a rich code editing experience and a runtime debugger with support for several programming languages. It also provides a built-in terminal. You can set up AWS Cloud9 for Java using the instructions provided at https://docs.aws.amazon.com/cloud9/latest/user-guide/sample-java.html.
  • Modify the security group that's...

Connecting to an Amazon Redshift cluster programmatically using .NET

.NET can connect to Amazon Redshift programmatically to build data-enabled applications such as business intelligence portals, share data through an application interface, and more. In this recipe, we will install an AWS-provided Amazon Redshift ODBC driver and connect to the database using .NET.

Getting ready

To complete this recipe, you will need to do the following:

Connecting to an Amazon Redshift cluster using the command line

psql is a command-line front end to PostgreSQL that lets you query data interactively. In this recipe, we will learn how to install psql and run interactive queries against Amazon Redshift.
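Once psql is installed, connecting is a single command; the following sketch assembles that invocation in Python so the parts are explicit. The host and user are illustrative placeholders, and 5439 is Redshift's default port:

```python
def psql_command(host, user, database="dev"):
    """Build the psql invocation for an Amazon Redshift cluster."""
    return ["psql", "-h", host, "-p", "5439", "-U", user, "-d", database]

# Hypothetical endpoint and user for illustration only.
cmd = psql_command(
    "myredshiftcluster.abc123xyz.eu-west-1.redshift.amazonaws.com", "awsuser"
)
print(" ".join(cmd))
```

Running the printed command in a terminal prompts for the password and drops you into an interactive SQL session.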

Getting ready

To complete this recipe, you will need to do the following:

  • On Windows, set the client encoding before connecting: set PGCLIENTENCODING=UTF8
  • Capture your Amazon Redshift cluster and login credentials.
  • Modify the security group attached to the Amazon Redshift cluster to allow connections from...