Reader small image

You're reading from  Serverless Analytics with Amazon Athena

Product typeBook
Published inNov 2021
Reading LevelBeginner
PublisherPackt
ISBN-139781800562349
Edition1st Edition
Languages
Right arrow
Authors (3):
Anthony Virtuoso
Anthony Virtuoso
author image
Anthony Virtuoso

Anthony Virtuoso works as a Principal Engineer at Amazon and holds multiple patents in distributed systems, software defined networks, and security. In his eight years at Amazon, he has helped launch several Amazon Web Services, the most recent of which was Amazon Managed Blockchain. As one of the original authors of Athena Query Federation, you'll often find him lurking on the Athena Federation GitHub repository answering questions and shipping bug fixes. When not at work, Anthony obsesses over a different set of customers, namely his wife and two little boys, aged 2 and 5. His kids enjoy doing science experiments with dad, like 3D printing toys, building with Lego, or searching the local pond for tardigrades.
Read more about Anthony Virtuoso

Mert Turkay Hocanin
Mert Turkay Hocanin
author image
Mert Turkay Hocanin

Mert Turkay Hocanin is a Principal Big Data Architect at Amazon Web Services within the AWS Glue and AWS Lake Formation services and has previously worked for several other services including Amazon Athena, Amazon EMR, Amazon Managed Blockchain. During his time at AWS, he worked with several Fortune 500 companies on some of the largest data lakes in the world and was involved with the launching of three Amazon Web Services. Prior to being a Big Data Architect, he was a Senior Software Developer within Amazon's retail systems organization building one of the earliest data lakes in the company in 2013. When he is not helping customers build data lakes, he enjoys spending time with his wife-Subrina, son-Tristan, and exploring New York City.
Read more about Mert Turkay Hocanin

Aaron Wishnick
Aaron Wishnick
author image
Aaron Wishnick

Aaron Wishnick works as a Senior Software Engineer at Amazon, where he has been for 7 years. During that time he has worked on Amazon's payment systems, financial intelligence systems, as well as working for AWS on Athena and AWS Proton. When not at work, Aaron and his fiance, Alyssa, are on a quest to determine just how much dog fur is too much, with their husky and malamute, Mina and Wally.
Read more about Aaron Wishnick

View More author details
Right arrow

Chapter 6: AWS Glue and AWS Lake Formation

Although this book focuses on Athena and its rich functionality, you should be aware of AWS Glue and AWS Lake Formation. These services can be used with Athena to implement use cases that Athena cannot alone. AWS Lake Formation was created to help customers simplify creating data lakes by providing tools to help ingest data, secure data, and reduce the time it takes to get a functional data lake. Lake Formation is a layer that exists on top of AWS Glue and uses Glue's components as building blocks.

One of the main features that Lake Formation brings is fine-grained access controls and auditing to several AWS services, including Athena. Lake Formation augments AWS IAM to help secure the data lake. IAM provides authentication of the user, while Lake Formation provides authorization based on the principle that is requesting data. Every authorization request that goes through Lake Formation generates audit events in CloudTrail that are...

Technical requirements

For this chapter, if you wish to follow some of the walkthroughs, you will need the following:

  • Internet access to GitHub, S3, and the AWS Console.
  • A computer with either Chrome, Safari, or Microsoft Edge installed on it.
  • An AWS account and accompanying IAM user (or role) with sufficient privileges to complete this chapter's activities. For simplicity, you can always run through these exercises with a user that has full access. However, we recommend using scoped-down IAM policies to avoid making costly mistakes and learn how to best use IAM to secure your applications and data. You can find a minimally scoped IAM policy for this chapter in this book's accompanying GitHub repository, which is listed as chapter_6/iam_policy_chapter_6.json. This policy includes the following:
    • Permissions to create and list IAM roles and policies:
      • We will be creating a service role for an AWS Glue Crawler to assume.
    • Permissions to read, list, and write access...

Summary

In this chapter, you learned what AWS Glue and AWS Lake Formation provide when building and maintaining data lakes on AWS. We then focused on Lake Formation's ability to provide fine-grained access controls and the benefits and limitations of this. We also went through a sample process of enabling Lake Formation access controls for a new database and how it works within Athena. Lastly, we touched on Lake Formation governed tables, what they are, and how they can solve many issues with storing datasets on a distributed filesystem. There are more advanced features of Lake Formation, and we will dive deeper into governed tables in Chapter 14, Lake Formation – Advanced Topics.

In the next part of this book, we will get our hands dirty by using Amazon Athena in various settings ranging from ad hoc data analysis, using Athena to build ETL pipelines, and building applications that use Athena. We'll also take some time to cover how you can troubleshoot and tune...

Further reading

To learn more about the topics that were covered in this chapter, take a look at the following resources:

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Serverless Analytics with Amazon Athena
Published in: Nov 2021Publisher: PacktISBN-13: 9781800562349
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (3)

author image
Anthony Virtuoso

Anthony Virtuoso works as a Principal Engineer at Amazon and holds multiple patents in distributed systems, software defined networks, and security. In his eight years at Amazon, he has helped launch several Amazon Web Services, the most recent of which was Amazon Managed Blockchain. As one of the original authors of Athena Query Federation, you'll often find him lurking on the Athena Federation GitHub repository answering questions and shipping bug fixes. When not at work, Anthony obsesses over a different set of customers, namely his wife and two little boys, aged 2 and 5. His kids enjoy doing science experiments with dad, like 3D printing toys, building with Lego, or searching the local pond for tardigrades.
Read more about Anthony Virtuoso

author image
Mert Turkay Hocanin

Mert Turkay Hocanin is a Principal Big Data Architect at Amazon Web Services within the AWS Glue and AWS Lake Formation services and has previously worked for several other services including Amazon Athena, Amazon EMR, Amazon Managed Blockchain. During his time at AWS, he worked with several Fortune 500 companies on some of the largest data lakes in the world and was involved with the launching of three Amazon Web Services. Prior to being a Big Data Architect, he was a Senior Software Developer within Amazon's retail systems organization building one of the earliest data lakes in the company in 2013. When he is not helping customers build data lakes, he enjoys spending time with his wife-Subrina, son-Tristan, and exploring New York City.
Read more about Mert Turkay Hocanin

author image
Aaron Wishnick

Aaron Wishnick works as a Senior Software Engineer at Amazon, where he has been for 7 years. During that time he has worked on Amazon's payment systems, financial intelligence systems, as well as working for AWS on Athena and AWS Proton. When not at work, Aaron and his fiance, Alyssa, are on a quest to determine just how much dog fur is too much, with their husky and malamute, Mina and Wally.
Read more about Aaron Wishnick