Modern Data Architecture on AWS

Product type Book
Published in Aug 2023
Publisher Packt
ISBN-13 9781801813396
Pages 420 pages
Edition 1st Edition
Author: Behram Irani

Table of Contents

Preface
Part 1: Foundational Data Lake
Prologue: The Data and Analytics Journey So Far
Chapter 1: Modern Data Architecture on AWS
Chapter 2: Scalable Data Lakes
Part 2: Purpose-Built Services And Unified Data Access
Chapter 3: Batch Data Ingestion
Chapter 4: Streaming Data Ingestion
Chapter 5: Data Processing
Chapter 6: Interactive Analytics
Chapter 7: Data Warehousing
Chapter 8: Data Sharing
Chapter 9: Data Federation
Chapter 10: Predictive Analytics
Chapter 11: Generative AI
Chapter 12: Operational Analytics
Chapter 13: Business Intelligence
Part 3: Govern, Scale, Optimize And Operationalize
Chapter 14: Data Governance
Chapter 15: Data Mesh
Chapter 16: Performant and Cost-Effective Data Platform
Chapter 17: Automate, Operationalize, and Monetize
Index
Other Books You May Enjoy

Data Federation

In the previous chapter, we explored different use cases for sharing data, both within the organization and with external parties. Data sharing is a critical aspect of any data platform: data stored in an Amazon S3-based data lake and in an Amazon Redshift data warehouse can be shared seamlessly, without creating duplicate copies. Every data platform has distinct components for data storage and for data computation. In the data sharing model, we focused on sharing data between similar systems, for example, using Amazon Athena to share data stored in an S3 data lake and using Amazon Redshift to share data with other Redshift clusters.

Data doesn’t always get stored, processed, and shared within homogeneous systems. Often, data is captured in heterogeneous systems, and those systems may not even reside inside the AWS ecosystem. This brings us to the question: how do we seamlessly and transparently query datasets from a...

Data federation using Amazon Athena

Amazon Athena is primarily used to query data from S3 data lakes. However, to query data across heterogeneous sources, Athena provides a feature called Federated Query. This feature enables different personas, such as data analysts, data engineers, and data scientists, to execute queries across disparate data sources from Athena itself. The single biggest differentiator for Federated Query is that the execution of such queries happens inside the systems that store the data.

Athena executes these federated queries using connectors, and it provides many connectors to a variety of source systems. Using these connectors, Athena can push down the portions of the query that need to be executed in the source system. This execution is assisted by AWS Lambda functions, which optimize the query’s execution and gather the data received from the underlying systems. Since Lambda functions are serverless and scalable, this allows Athena to query larger datasets...
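To make the connector idea concrete, here is a minimal Python sketch of how a federated join might be composed and submitted. It assumes a connector Lambda function named `mysql_connector` and the databases and tables `sales_lake.orders` and `crm.customers`, all of which are hypothetical names chosen for illustration. Athena lets a query address a connector either through a registered data source name or through the `"lambda:<function_name>"` catalog prefix used here. The `submit_query` helper expects a boto3 Athena client to be passed in, so the sketch itself does not need AWS credentials to import.

```python
def build_federated_join(lake_table: str, federated_table: str,
                         join_key: str, limit: int = 100) -> str:
    """Return SQL that joins an S3 data lake table with a federated table.

    Table names are interpolated as-is, so this is for illustration only
    and must not be fed untrusted input.
    """
    return (
        f"SELECT o.*, c.*\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {federated_table} AS c\n"
        f"  ON o.{join_key} = c.{join_key}\n"
        f"LIMIT {limit}"
    )


def submit_query(athena_client, sql: str, output_location: str) -> str:
    """Start the query through a boto3 Athena client and return its ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical names: a Glue Data Catalog table in the data lake and a
# MySQL table reached through a connector Lambda called "mysql_connector".
sql = build_federated_join(
    lake_table='"awsdatacatalog"."sales_lake"."orders"',
    federated_table='"lambda:mysql_connector"."crm"."customers"',
    join_key="customer_id",
)
print(sql)
```

With real credentials, the query could be submitted with something like `submit_query(boto3.client("athena"), sql, "s3://<your-results-bucket>/")`, where the results bucket is a placeholder you would supply.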

Data federation using Amazon Redshift

Federated queries can also be executed from within Redshift, allowing Redshift data to be joined with data from relational data sources such as PostgreSQL and MySQL, either on Amazon RDS or on Amazon Aurora. For certain use cases, it does not make sense to spend time creating an ETL pipeline to load data into Redshift. Instead, Redshift can connect to these sources and push part of the query execution down to the data source itself to improve performance.
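As a rough illustration of how such a source is exposed to Redshift, the sketch below generates a `CREATE EXTERNAL SCHEMA` statement for a PostgreSQL or MySQL source. The schema name, host, database, and ARNs are placeholders invented for this example, and the helper function is hypothetical. The statement shape follows Redshift's federated query DDL, where credentials come from an AWS Secrets Manager secret and the PostgreSQL form additionally names a source schema.

```python
from typing import Optional


def external_schema_ddl(schema: str, engine: str, database: str, host: str,
                        port: int, iam_role_arn: str, secret_arn: str,
                        source_schema: Optional[str] = None) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement for a federated source.

    Values are interpolated as-is; this is illustrative and not safe for
    untrusted input.
    """
    engine = engine.upper()
    if engine not in {"POSTGRES", "MYSQL"}:
        raise ValueError(f"unsupported engine for this sketch: {engine}")

    database_clause = f"DATABASE '{database}'"
    # Only the PostgreSQL form takes a source schema in this sketch.
    if engine == "POSTGRES" and source_schema:
        database_clause += f" SCHEMA '{source_schema}'"

    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM {engine}\n"
        f"{database_clause}\n"
        f"URI '{host}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Placeholder values, for illustration only.
ddl = external_schema_ddl(
    schema="ods_pg",
    engine="postgres",
    database="orders_db",
    host="example-cluster.example.us-east-1.rds.amazonaws.com",
    port=5432,
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
    source_schema="public",
)
print(ddl)
```

Once the external schema exists, its tables can be joined with local Redshift tables in ordinary SQL, for example `SELECT ... FROM local_sales s JOIN ods_pg.orders o ON s.order_id = o.order_id`, where the table names are again hypothetical.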

The following figure highlights the data sources that Redshift federated queries currently work with. With the federated architecture in place inside Redshift, more source connectors may be added in the future to expand the ecosystem and broaden the use cases that this architecture pattern can solve:

Figure 9.7 – Redshift federated queries

Amazon Redshift federated queries use case

To understand this better, let’s consider a use case...

Summary

In this chapter, we looked at how data federation helps organizations quickly fetch data using a single pane of glass from multiple heterogeneous source systems.

We looked at how different connectors in Amazon Athena allow for a quick and easy way to join datasets from other sources. Athena’s connectors make it a seamless and transparent user experience where reports can be created just by writing SQL statements inside Athena, to join datasets from the underlying data stores.

We also looked at how Amazon Redshift supports federated queries by fetching data stored in operational data stores (ODSs) such as MySQL and PostgreSQL. A use case typically solved by this mechanism is querying live operational data that is constantly being updated in the ODS.

The next chapter is critical in our modern data platform journey as we will discuss everything about predictive analytics and how it helps organizations think big with their data.
