You're reading from Simplify Big Data Analytics with Amazon EMR

Product type: Book
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781801071079
Edition: 1st Edition
Author (1)
Sakti Mishra

Sakti Mishra is an engineer, architect, author, and technology leader with over 16 years of experience in the IT industry. He is currently working as a senior data lab architect at Amazon Web Services (AWS). He is passionate about technology and has expertise in big data, analytics, machine learning, artificial intelligence, graph networks, web/mobile applications, and cloud technologies such as AWS and Google Cloud Platform. Sakti has a bachelor’s degree in engineering and a master’s degree in business administration. He holds several certifications in Hadoop, Spark, AWS, and Google Cloud. He is also the author of multiple technology blogs, workshops, and white papers, and is a public speaker who represents AWS in various domains and events.

Validating the output using Amazon Athena

The Parquet data is already available in Amazon S3, partitioned by year and month. To make it easier for data analysts and data scientists to consume, we can expose it as a database table and query it with SQL.

To set up that integration, we can follow a two-step approach:

  1. We can run the Glue crawler to create a Glue Data Catalog table on top of the S3 data.
  2. We can run a query in Athena to validate the output.
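Once the catalog table exists, the validation in step 2 can also be scripted rather than run by hand. The following is a minimal sketch, assuming the partition columns are named `year` and `month` and that the caller passes in a boto3 Athena client; the table name, database name, and S3 output location are hypothetical placeholders, not values from the book.

```python
import time


def build_validation_query(table: str) -> str:
    """Count rows per partition so that empty or missing months stand out.

    Assumes the partition columns are called `year` and `month`.
    """
    return (
        f'SELECT year, month, COUNT(*) AS row_count '
        f'FROM "{table}" '
        f'GROUP BY year, month ORDER BY year, month'
    )


def run_athena_query(athena, sql, database, output_location, poll_seconds=2.0):
    """Submit a query, wait for a terminal state, and return the first page of rows.

    `athena` is expected to behave like boto3.client("athena").
    Each row is returned as a list of strings; for SELECT queries the
    first row holds the column names.
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {status['State']}: {reason}")

    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With boto3 installed and AWS credentials configured, a call might look like `run_athena_query(boto3.client("athena"), build_validation_query("my_table"), "my_database", "s3://my-bucket/athena-results/")`. Passing the client in as an argument keeps the function easy to exercise against a stub before pointing it at a real account.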

Let's see how you can integrate that.

Defining a virtual Glue Data Catalog table on top of Amazon S3 data

You can follow these steps to create and run the Glue crawler, which will create a Glue Data Catalog table:

  1. Navigate to the Crawlers page of the AWS Glue console at https://console.aws.amazon.com/glue/home?region=us-east-1#catalog:tab=crawlers.
  2. Click Add crawler to open the crawler configuration form.
  3. Configure the crawler...
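As an alternative to the console form, a crawler can also be created and run through the Glue API. The sketch below is not the book's procedure; it is a minimal programmatic counterpart, assuming a boto3 Glue client is passed in and that an IAM role with access to the S3 path already exists. The crawler name, role ARN, database name, and S3 path are all hypothetical.

```python
import time


def crawler_config(name, role_arn, database, s3_path):
    """Build the keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(glue, config, poll_seconds=15.0):
    """Create a crawler, start it, and wait until the run has finished.

    `glue` is expected to behave like boto3.client("glue").
    Returns the status of the last crawl if Glue reports one, else None.
    """
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])

    # A crawler goes back to the READY state once its run is complete.
    while True:
        crawler = glue.get_crawler(Name=config["Name"])["Crawler"]
        if crawler["State"] == "READY":
            break
        time.sleep(poll_seconds)

    return crawler.get("LastCrawl", {}).get("Status")
```

With boto3 installed and credentials configured, this could be invoked as `create_and_run_crawler(boto3.client("glue"), crawler_config("my-crawler", "arn:aws:iam::123456789012:role/MyGlueRole", "my_database", "s3://my-bucket/parquet-output/"))`, where every argument is a placeholder to be replaced with your own values.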