Cataloging and ingesting data using AWS Glue
Data that is staged in Amazon S3 can be cataloged using the AWS Glue service. Cataloging attaches metadata to the data and populates the AWS Glue Data Catalog. Once cataloged, the raw data can be queried as tables by many of the AWS analytical services, such as Amazon Redshift or Amazon EMR, for analytical processing. AWS Glue crawlers simplify this data discovery because they create and update the metadata automatically.
In this recipe, we will catalog the data and enable its ingestion into an Amazon Redshift data warehouse (serverless or provisioned cluster).
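As a rough illustration of how a crawler can populate the Data Catalog, the following Python sketch uses the boto3 Glue client to define a crawler over an S3 path and start it. It is a minimal sketch under stated assumptions, not the recipe's own procedure: the crawler name, database name, S3 path, and IAM role ARN are hypothetical placeholders, and the role is assumed to already have access to AWS Glue and the S3 location.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update existing table definitions when the schema changes,
        # and only log (do not delete) tables whose data has disappeared.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(glue_client, request):
    """Create the crawler described by `request`, then start a run."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


def run_example():
    """Run against a real account; every value below is a placeholder."""
    import boto3  # imported here so the helpers above work without boto3

    glue = boto3.client("glue", region_name="eu-west-1")
    request = build_crawler_request(
        name="example-s3-crawler",                                  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
        database="example_catalog_db",                              # hypothetical
        s3_path="s3://example-bucket/staged-data/",                 # hypothetical
    )
    create_and_start_crawler(glue, request)
```

Calling run_example() with valid AWS credentials and real values in place of the placeholders would create the crawler and start a run. The request-building step is kept separate from the API calls so the parameters can be inspected before anything is created.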
Getting ready
To complete this recipe, you will need:
- An Amazon Redshift data warehouse deployed in the eu-west-1 AWS Region.
- Amazon Redshift data warehouse admin user credentials.
- An IAM user with access to Amazon Redshift, Amazon S3, and AWS Glue.
- An IAM role attached to Amazon Redshift data...