Amazon Redshift Cookbook

You're reading from Amazon Redshift Cookbook: Recipes for building modern data warehousing solutions

Product type: Paperback
Published in: Apr 2025
Publisher: Packt
ISBN-13: 9781836206910
Length: 468 pages
Edition: 2nd Edition

Authors (3): Shruti Worlikar, Harshida Patel, Anusha Challa
Table of Contents (16)

Preface
1. Getting Started with Amazon Redshift
2. Data Management
3. Loading and Unloading Data
4. Zero-ETL Ingestions
5. Scalable Data Orchestration for Automation
6. Platform Authorization and Security
7. Data Authorization and Security
8. Performance Optimization
9. Cost Optimization
10. Lakehouse Architecture
11. Data Sharing with Amazon Redshift
12. Generative AI and ML with Amazon Redshift
13. Other Books You May Enjoy
14. Index
Appendix

Connecting to Amazon Redshift using Jupyter Notebook

Jupyter Notebook is an interactive web application for analyzing data. It is widely used by business analysts and data scientists to perform data wrangling and exploration. Using Jupyter Notebook, you can access all the historical data available in an Amazon Redshift data warehouse (serverless or a provisioned cluster) and combine it with data from many other sources, such as an Amazon S3 data lake. For example, you might want to build a forecasting model based on historical sales data in Amazon Redshift combined with clickstream data available in the data lake. Jupyter Notebook is the tool of choice because of the versatility it provides for exploration tasks and the strong support it enjoys from the open source community. This recipe covers the steps to connect to an Amazon Redshift data warehouse using Jupyter Notebook.

Getting ready

To complete this recipe, you will need:

  * An Amazon Redshift data warehouse (serverless or a provisioned cluster)
  * An Amazon SageMaker notebook instance with network access to the Amazon Redshift data warehouse
  * The Amazon Redshift credentials (host, port, username, and password) stored as a secret in AWS Secrets Manager

How to do it…

The following steps will help you connect to an Amazon Redshift cluster using an Amazon SageMaker notebook:

  1. Open the AWS Console and navigate to the Amazon SageMaker service.
  2. Navigate to your notebook instance and open JupyterLab: find the notebook instance that was launched and click on the Open JupyterLab link, as shown in the following screenshot:
Figure 1.12 – Navigating to JupyterLab using the AWS Console
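If you prefer to open JupyterLab programmatically rather than through the console, a presigned URL can also be generated with boto3. This is a minimal sketch, run from an environment with SageMaker permissions; the notebook instance name is a hypothetical placeholder:

    import boto3

    # Generate a presigned URL that opens JupyterLab for the notebook instance
    # (the instance name below is a placeholder - use your own)
    sagemaker = boto3.client("sagemaker")
    response = sagemaker.create_presigned_notebook_instance_url(
        NotebookInstanceName="redshift-cookbook-notebook"
    )
    print(response["AuthorizedUrl"])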

  3. Now, let’s install the Python driver libraries needed to connect to Amazon Redshift by running the following code in the Jupyter Notebook. Set the kernel to conda_python3:
    !pip install psycopg2-binary
    ### boto3 is optional, but recommended to leverage AWS Secrets Manager for storing the credentials
    !pip install boto3
    

    Important Note

    You can connect to an Amazon Redshift cluster from the notebook using Python libraries such as Psycopg (https://pypi.org/project/psycopg2-binary/) or pg (https://www.postgresql.org/docs/7.3/pygresql.html). Alternatively, you can use a JDBC driver, but for ease of scripting with Python, the following recipes will use one of the preceding libraries.
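
    For reference, a minimal direct psycopg2 connection looks like the following sketch; the endpoint, database name, and credentials are placeholders, and in practice you would retrieve them from AWS Secrets Manager as shown later in this recipe:

    import psycopg2

    # Placeholder values - replace with your own Amazon Redshift endpoint and credentials
    con = psycopg2.connect(
        host="my-cluster.abc123xyz.eu-west-1.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439,            # default Amazon Redshift port
        dbname="dev",
        user="awsuser",
        password="my_password"
    )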

  4. Grant the Amazon SageMaker notebook instance permission to read the stored secret. On the AWS Secrets Manager console, click on your secret and note its Secret ARN. Then, in the following JSON policy, replace the ARN in the Resource section with your secret's ARN:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetResourcePolicy",
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret",
            "secretsmanager:ListSecretVersionIds"
          ],
          "Resource": [
            "arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
          ]
        }
      ]
    }
    
  5. Now, attach this policy as an inline policy to the execution role for your SageMaker notebook instance (a scripted alternative is sketched after these steps). To do this, follow these steps:
    1. Navigate to the Amazon SageMaker (https://us-west-2.console.aws.amazon.com/sagemaker/) console.
    2. Select Notebook Instances.
    3. Click on your notebook instance (the one running this notebook, most likely).
    4. Under Permissions and Encryption, click on the IAM role link.
    5. You should now be on the IAM console page for that role, which allows you to Add inline policy. Click on that link.
    6. On the Create Policy page that opens, click JSON and replace the JSON lines that appear with the preceding code block.
    7. Click Review Policy.
    8. On the next page, select a descriptive name for the policy and click Create policy.
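
    If you prefer to script this step rather than use the console, the same inline policy can be attached with boto3's put_role_policy. This is a minimal sketch, run from an environment with IAM permissions (not from the notebook itself, whose role typically cannot modify IAM); the role and policy names are hypothetical examples:

    import json
    import boto3

    # The policy document from the previous step, with your secret's ARN in Resource
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": ["arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"]
        }]
    }

    # Attach the policy inline to the notebook's execution role
    # (role and policy names below are placeholders - use your own)
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName="AmazonSageMaker-ExecutionRole-example",
        PolicyName="RedshiftSecretAccess",
        PolicyDocument=json.dumps(policy)
    )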
  6. Finally, paste the ARN for your secret into the following code block in the Jupyter Notebook to connect to the Amazon Redshift cluster:
    # Put the ARN of your AWS Secrets Manager secret for your redshift cluster here:
    secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:aes128-1a2b3c"
    # This will get the secret from AWS Secrets Manager.
    import boto3
    import json
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager'
    )
    get_secret_value_response = client.get_secret_value(
        SecretId=secret_arn
    )
    if 'SecretString' in get_secret_value_response:
        connection_info = json.loads(get_secret_value_response['SecretString'])
    else:
        print("ERROR: no secret data found")
    # Sanity check for the credential keys stored in the secret
    expected_keys = set(['username', 'password', 'host', 'port'])
    if not expected_keys.issubset(connection_info.keys()):
        print("Expected values for ",expected_keys)
        print("Received values for ",set(connection_info.keys()))
        print("Please adjust query or assignment as required!")
    # Connect to Amazon Redshift using psycopg2
    import psycopg2
    database = "dev"
    con=psycopg2.connect(
        dbname   = database,
        host     = connection_info["host"],
        port     = connection_info["port"],
        user     = connection_info["username"],
        password = connection_info["password"]
    )
    
  7. Run basic queries against the database. These queries make use of the cursor class to execute a basic query in Amazon Redshift:
    cur = con.cursor()
    cur.execute("SELECT sysdate")
    res = cur.fetchall()
    print(res)
    cur.close()
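
    For interactive exploration in the notebook, you may also want to load query results straight into a DataFrame. A minimal sketch using pandas, assuming it is installed in the conda_python3 kernel and reusing the con connection from the previous step:

    import pandas as pd

    # Load the result of a query into a pandas DataFrame for exploration
    # (replace the sample query with one against your own tables)
    df = pd.read_sql("SELECT current_database(), current_user, sysdate", con)
    print(df.head())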
    
  8. Optionally, you can refer to the complete notebook for connecting to Amazon Redshift using an Amazon SageMaker notebook here: https://github.com/PacktPublishing/Amazon-Redshift-Cookbook-2E/blob/main/Chapter01/Connecting_to_AmazonRedshift_using_JupyterNotebook.ipynb.