Data Wrangling on AWS

Product type: Book
Published: Jul 2023
Publisher: Packt
ISBN-13: 9781801810906
Pages: 420
Edition: 1st Edition
Authors (3): Navnit Shukla, Sankar M, Sampat Palani

Table of Contents (19 chapters)

Preface
1. Part 1: Unleashing Data Wrangling with AWS
2. Chapter 1: Getting Started with Data Wrangling
3. Part 2: Data Wrangling with AWS Tools
4. Chapter 2: Introduction to AWS Glue DataBrew
5. Chapter 3: Introducing AWS SDK for pandas
6. Chapter 4: Introduction to SageMaker Data Wrangler
7. Part 3: AWS Data Management and Analysis
8. Chapter 5: Working with Amazon S3
9. Chapter 6: Working with AWS Glue
10. Chapter 7: Working with Athena
11. Chapter 8: Working with QuickSight
12. Part 4: Advanced Data Manipulation and ML Data Optimization
13. Chapter 9: Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas
14. Chapter 10: Data Processing for Machine Learning with SageMaker Data Wrangler
15. Part 5: Ensuring Data Lake Security and Monitoring
16. Chapter 11: Data Lake Security and Monitoring
17. Index
18. Other Books You May Enjoy

Enriching data from multiple sources using Athena

In this section, we will explore how to enrich data with Athena SQL, and how to use an Athena federation setup to bring in data from other supported data sources.

Enriching data using Athena SQL joins

In the previous section, we saw several ways to explore data in Amazon Athena. Now, we will focus on enriching that data with additional information through Athena queries.

In this phase, we can enrich the raw data further by joining it with other data sources. We will continue to use the same data source that we used in earlier sections. Let us assume a scenario where we want to identify the maximum temperature recorded in a specific US state (Connecticut) in a specific year of this century, that is, after the year 2000 (here, 2022). We will get the readings from the Parquet table (noaa_data_parquet) that was created using a CTAS statement in the previous section.
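The scenario above boils down to a join between the readings table and a lookup table that carries the state of each weather station, followed by a filter and an aggregate. The following is a minimal local sketch of that join shape, not the book's actual query: it uses Python's built-in sqlite3 module as a stand-in for Athena, and the stations table, every column name (station_id, obs_date, element, data_value, state), and the sample rows are assumptions made up for illustration, since the real schema of noaa_data_parquet is not shown in this section.

```python
import sqlite3

# Local stand-in for Athena: an in-memory SQLite database.
# ASSUMPTION: table layout, column names, and rows are hypothetical toy data;
# the real noaa_data_parquet schema is not shown in this section.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE noaa_data_parquet (
        station_id TEXT, obs_date TEXT, element TEXT, data_value REAL
    );
    CREATE TABLE stations (station_id TEXT, state TEXT);

    INSERT INTO stations VALUES ('US1', 'CT'), ('US2', 'NY');
    INSERT INTO noaa_data_parquet VALUES
        ('US1', '2022-07-21', 'TMAX', 35.6),
        ('US1', '2022-01-15', 'TMAX', 2.0),
        ('US1', '2021-07-01', 'TMAX', 37.0),
        ('US2', '2022-07-21', 'TMAX', 38.1),
        ('US1', '2022-07-21', 'TMIN', 22.0);
""")

# Enrich each reading with the state of its station, then keep only
# maximum-temperature readings for the requested state and year.
ENRICH_QUERY = """
    SELECT MAX(r.data_value)
    FROM noaa_data_parquet AS r
    JOIN stations AS s
      ON r.station_id = s.station_id
    WHERE r.element = 'TMAX'
      AND s.state = ?
      AND substr(r.obs_date, 1, 4) = ?
"""


def max_temperature(connection, state, year):
    """Return the highest TMAX value for a state and year, or None if no rows match."""
    row = connection.execute(ENRICH_QUERY, (state, str(year))).fetchone()
    return row[0]


print(max_temperature(conn, "CT", 2022))
```

The readings table alone has no notion of a state, which is why the join is needed: the state filter can only be applied once each reading has been matched to its station. In Athena, the same SELECT ... JOIN ... WHERE structure would run against the catalog tables, with the column names and date handling adjusted to the actual schema.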

We can filter and get records from Country...
