
You're reading from Serverless Analytics with Amazon Athena

Product type: Book
Published in: Nov 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800562349
Edition: 1st Edition
Authors (3):
Anthony Virtuoso

Anthony Virtuoso works as a Principal Engineer at Amazon and holds multiple patents in distributed systems, software-defined networks, and security. In his eight years at Amazon, he has helped launch several Amazon Web Services, the most recent of which was Amazon Managed Blockchain. As one of the original authors of Athena Query Federation, you'll often find him lurking on the Athena Federation GitHub repository answering questions and shipping bug fixes. When not at work, Anthony obsesses over a different set of customers, namely his wife and two little boys, aged 2 and 5. His kids enjoy doing science experiments with dad, like 3D printing toys, building with Lego, or searching the local pond for tardigrades.

Mert Turkay Hocanin

Mert Turkay Hocanin is a Principal Big Data Architect at Amazon Web Services within the AWS Glue and AWS Lake Formation services and has previously worked for several other services, including Amazon Athena, Amazon EMR, and Amazon Managed Blockchain. During his time at AWS, he worked with several Fortune 500 companies on some of the largest data lakes in the world and was involved in launching three Amazon Web Services. Prior to being a Big Data Architect, he was a Senior Software Developer within Amazon's retail systems organization, building one of the earliest data lakes in the company in 2013. When he is not helping customers build data lakes, he enjoys spending time with his wife, Subrina, and son, Tristan, and exploring New York City.

Aaron Wishnick

Aaron Wishnick works as a Senior Software Engineer at Amazon, where he has been for 7 years. During that time, he has worked on Amazon's payment systems and financial intelligence systems, as well as on Athena and AWS Proton at AWS. When not at work, Aaron and his fiancée, Alyssa, are on a quest to determine just how much dog fur is too much, with their husky and malamute, Mina and Wally.

Chapter 7: Ad Hoc Analytics

Welcome to Part 3 of Serverless Analytics with Amazon Athena! In the preceding chapters, you learned how to run basic Athena queries and established an understanding of key Athena concepts. You then connected to a data lake that you built and secured. Along the way, you've been learning how to organize and model your data for use by Athena. Now that you have much of the prerequisite knowledge for using Athena, we once again shift our focus. The next few chapters will revisit many of the concepts you've already learned as you work through four of the most common use cases that lead customers to choose Athena for their business.

We begin right here, in this chapter, by unraveling both what it means to run ad hoc analytics queries as well as why the industry seems to have an insatiable appetite for running such queries. We'll also go through building a template for how you can adopt Athena and its related tooling within your organization...

Technical requirements

Wherever possible, we will provide samples or instructions to guide you through the setup. However, to complete the activities in this chapter, you will need to ensure you have the following prerequisites available. Our command-line examples will be executed using Ubuntu, but most types of Linux should work without modification, including Ubuntu on Windows Subsystem for Linux.

You will need an internet connection to access GitHub, S3, and the AWS console.

You will also require a computer with the following:

  • A Chrome, Safari, or Microsoft Edge browser installed
  • The AWS CLI installed
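As a quick sanity check for the command-line prerequisite above, a minimal sketch like the following could confirm that the `aws` executable is reachable on your PATH. The helper name `missing_tools` is our own illustration and is not part of the AWS CLI or the book's sample code.

```python
import shutil


def missing_tools(required=("aws",)):
    """Return the names in `required` that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing command-line tools:", ", ".join(missing))
    else:
        print("All required command-line tools were found on PATH.")
```

This only verifies that the executable exists. It does not check that credentials are configured or that your IAM user or role has the permissions the exercises need.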

This chapter also requires you to have an AWS account and an accompanying IAM user (or role) with sufficient privileges to complete this chapter's activities. Throughout this book, we will provide detailed IAM policies that attempt to honor the age-old best practice of "least privilege." For simplicity, you can always run through these exercises...

Understanding the ad hoc analytics hype

If you are lucky, you may not be aware of the buzzword levels of hype surrounding ad hoc analytics. Fortunately, there are strong fundamentals behind the increasing level of interest and importance placed on having good tooling for ad hoc analytics. In a moment, we'll attempt to form a proper definition of ad hoc analytics, but not before we run a time travel query of our own to set the stage for what we now know as ad hoc analytics.

As a society, we've been collecting data since the advent of commerce. In the era before modern big data technologies, the business intelligence landscape was a very different place. Most data capture and entry was a manual affair, frequently driven by government accounting and auditing requirements. Particularly savvy companies were tracking their own, non-accounting-related Key Performance Indicators (KPIs), but these exercises were often short-lived and targeted at achieving specific outcomes. It...

Building an ad hoc analytics strategy

As we've seen in our examples, by putting the information in the hands of subject-matter experts, you can make better, faster decisions. Thus, it should be a focal point of any ad hoc analytics strategy to improve the accessibility of data, putting it in the hands of the individuals best suited to interpret the insights it contains. Our first step in forming such a strategy is to remember that while this book will present solutions based on the Athena ecosystem, it is rarely a good idea to lock yourself into any single product or analytics engine. The underlying technologies, pricing models, and supporting tooling will make trade-offs that necessarily favor one use case over others. If something sounds too good to be true, such as a product claiming to be the only analytics system you need, it's probably mediocre at a wide range of things and unlikely to be the best in class for anything. This is part of the philosophy behind AWS'...

Using QuickSight with Athena

Amazon QuickSight is a data analysis and visualization tool that offers out-of-the-box integrations with popular AWS analytics tools and databases such as Athena, Redshift, MySQL, and others. QuickSight has its own analytics engine called SPICE. SPICE is capable of low-latency aggregations, searches, and other common analytics operations. When combined with a large-scale analytics engine such as Athena, QuickSight can be used for a combination of data exploration, reporting, and dashboarding tasks. This section will briefly introduce you to QuickSight and use it to visualize both our earthquake and Yellow Taxi ride datasets. Since QuickSight itself is a WYSIWYG (What You See Is What You Get) authoring experience with lots of built-in guidance, we won't spend much time walking you through each step in this section. Instead, we will focus on the broad strokes and let you explore QuickSight yourself. Regardless of this simplification, our QuickSight exercise...

Using Jupyter Notebooks with Athena

Depending on the proficiency level in querying data, some individuals may consider QuickSight to be more of a dashboarding tool that populates results based on pre-set parameters. Individuals looking for a more fluid and interactive experience may feel their needs are better satisfied by a tool designed for authoring and sharing investigations. You're already familiar with the Athena console's basic ability to write queries and display tabular results. Jupyter Notebooks is a powerful companion to analytics engines such as Athena.

In this section, we'll walk through setting up a Jupyter notebook, connecting it to Amazon Athena, and running advanced ad hoc analytics over the NYC Yellow Taxi ride dataset. If you are unfamiliar with SageMaker or Jupyter Notebooks, don't worry. We will walk you through every step of the process so you can add this new tool to your shelf. For the uninitiated, AWS describes SageMaker as the most...
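To give a feel for what running an Athena query from a notebook cell can look like, here is a minimal sketch, not the book's own walkthrough. It assumes you pass in a client that behaves like `boto3.client("athena")`. The function name `run_athena_query` and its parameters are our own illustration, and it reads only the first page of results.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` on Athena and return result rows as lists of strings.

    `client` is assumed to behave like boto3.client("athena").
    NULL cells come back as None.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena executes queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended in state {state}: {reason}")

    # Only the first page of results is fetched in this sketch. For SELECT
    # queries, the first returned row holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

In a notebook with boto3 installed and AWS credentials available, you might create the client with `boto3.client("athena")` and then call `run_athena_query(client, sql, database, output_location)`, supplying your own database name and an S3 output location you control. Larger result sets would need pagination, which this sketch leaves out.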

Summary

In this chapter, you got hands-on with the first of Athena's four most common usages – ad hoc analytics. We did this by looking at the history of business intelligence and learning about the OODA loop. Ad hoc analytics shortens the OODA loop by making it easier to use data to observe and orient yourself to the situation. The increased accessibility of data ultimately leads to the heightened situational awareness required for making sound decisions. With clarity of data behind your decisions, your organization will be less likely to waste time before acting on those choices. A short OODA loop also helps you react to poor decisions or calculated risks such as A/B tests.

The OODA loop isn't a new concept, and it's not the catalyst of the rising importance of ad hoc analytics. Instead, the proliferation of data has made it necessary for every decision maker in your organization to have access to critical business metrics at a moment's notice. We saw...

