
You're reading from Modern Data Architecture on AWS

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781801813396
Edition: 1st Edition
Author (1)
Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS), specializing in data, analytics, and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. In his last 6 years at AWS, Behram has been a thought leader in the data, analytics, and AI/ML space, publishing multiple papers and leading digital transformation efforts for many organizations across the globe. Behram holds a Bachelor of Engineering in Computer Science from the University of Pune and an MBA from the University of Florida.

Data Federation

In the previous chapter, we explored different use cases for sharing data, both within the organization and with external parties. Data sharing is a critical aspect of any data platform: data stored in an Amazon S3-based data lake or in an Amazon Redshift data warehouse can be shared seamlessly, without creating duplicate copies. Every data platform has distinct components for data storage and for data computation. In the data sharing model, we focused on sharing data between similar systems; for example, using Amazon Athena to share data stored in an S3 data lake and using Amazon Redshift to share data with other Redshift clusters.

Data isn’t always stored, processed, and shared within homogeneous systems. Often, data is captured in heterogeneous systems, some of which may not even reside inside the AWS ecosystem. This brings us to the question: how do we seamlessly and transparently query datasets from a...

Data federation using Amazon Athena

Amazon Athena is primarily used to query data from S3 data lakes. However, to query data across heterogeneous sources, Athena provides a feature called Federated Query. This feature enables different personas, such as data analysts, data engineers, and data scientists, to execute queries across disparate data sources from Athena itself. The single biggest differentiator for Federated Query is that the execution of such queries happens inside the systems that store the data.

Athena executes these federated queries using connectors, and it provides many of them for a variety of source systems. Using a connector, Athena can pass down the portions of a query that need to be executed in the source system. This execution is assisted by AWS Lambda functions, which optimize the query’s execution and gather the data returned by the underlying systems. Since Lambda functions are serverless and scalable, this allows Athena to query larger datasets...
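To make the connector idea concrete, here is a minimal sketch in Python of what a federated join might look like from the analyst's side. It only builds the SQL text and a request payload; it does not call AWS. The catalog, database, table, and column names (`mysql_orders`, `sales_lake`, `shop`, `customers`, `orders`, `customer_id`, and so on) and the S3 results path are hypothetical, and the sketch assumes a connector has already been registered in Athena as a data source named `mysql_orders`. The payload keys follow the shape of Athena's StartQueryExecution request.

```python
# Hypothetical names throughout: "mysql_orders" is assumed to be a data source
# registered in Athena through a connector; "awsdatacatalog" is the default
# catalog that fronts tables in the S3 data lake.


def build_federated_join(lake_table: str, source_table: str, join_key: str) -> str:
    """Return SQL that joins a data lake table with a table behind a connector."""
    return (
        f"SELECT l.{join_key}, l.customer_name, s.order_total\n"
        f"FROM {lake_table} AS l\n"
        f"JOIN {source_table} AS s ON l.{join_key} = s.{join_key}"
    )


def build_query_request(sql: str, output_location: str) -> dict:
    """Build keyword arguments shaped like Athena's StartQueryExecution request."""
    return {
        "QueryString": sql,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


sql = build_federated_join(
    lake_table='"awsdatacatalog"."sales_lake"."customers"',
    source_table='"mysql_orders"."shop"."orders"',
    join_key="customer_id",
)
request = build_query_request(sql, "s3://example-athena-results/federated/")
print(request["QueryString"])
```

With credentials and a registered connector in place, a dictionary like `request` could be passed to `boto3.client("athena").start_query_execution(**request)`. The point of the sketch is that the federated table is addressed like any other table, just with a different catalog prefix.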

Data federation using Amazon Redshift

Federated queries can also be executed from inside Redshift, allowing Redshift data to be joined with data from relational data sources such as PostgreSQL and MySQL, either on Amazon RDS or on Amazon Aurora. For certain use cases, it does not make sense to spend time creating an ETL pipeline to load data into Redshift. Instead, Redshift can connect to these sources and push portions of such queries down to the data source itself to improve performance.
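As a rough illustration of how this is typically set up, the sketch below uses Python to generate the Redshift SQL involved: a `CREATE EXTERNAL SCHEMA` statement that maps a remote PostgreSQL or MySQL database into Redshift, followed by a join against a local table. Every identifier here (`ods_live`, `analytics.customer_summary`, the endpoint, and both ARNs) is a hypothetical placeholder, and the helper `external_schema_ddl` is an illustration only, not part of any AWS library.

```python
from typing import Optional


def external_schema_ddl(
    local_schema: str,
    engine: str,
    database: str,
    uri: str,
    iam_role: str,
    secret_arn: str,
    remote_schema: Optional[str] = None,
) -> str:
    """Return a CREATE EXTERNAL SCHEMA statement for a federated source."""
    engine = engine.upper()
    if engine not in ("POSTGRES", "MYSQL"):
        raise ValueError(f"unsupported engine: {engine}")
    database_clause = f"DATABASE '{database}'"
    # Only the PostgreSQL form takes a remote SCHEMA clause.
    if engine == "POSTGRES" and remote_schema:
        database_clause += f" SCHEMA '{remote_schema}'"
    lines = [
        f"CREATE EXTERNAL SCHEMA {local_schema}",
        f"FROM {engine}",
        database_clause,
        f"URI '{uri}'",
        f"IAM_ROLE '{iam_role}'",
        f"SECRET_ARN '{secret_arn}'",
    ]
    return "\n".join(lines) + ";"


# Placeholder endpoint and ARNs; substitute real values before running in Redshift.
ddl = external_schema_ddl(
    local_schema="ods_live",
    engine="postgres",
    database="orders_db",
    remote_schema="public",
    uri="example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    iam_role="arn:aws:iam::123456789012:role/example-redshift-federation",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-ods",
)

# Join a local Redshift table with the live operational table, with no ETL step.
join_sql = (
    "SELECT w.customer_id, w.lifetime_value, o.status\n"
    "FROM analytics.customer_summary AS w\n"
    "JOIN ods_live.orders AS o ON w.customer_id = o.customer_id\n"
    "WHERE o.status = 'OPEN';"
)

print(ddl)
print(join_sql)
```

Once the external schema exists, `ods_live.orders` is queried like any local table. A filter such as the one on `o.status` is the kind of predicate Redshift may be able to evaluate at the source rather than after transferring the rows.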

The following figure highlights the current data sources that Redshift federated queries can work with. With the federated architecture in place inside Redshift, more source connectors may get added in the future, to expand the ecosystem and broaden the use cases that can be solved with this architecture pattern:

Figure 9.7 – Redshift federated queries


Amazon Redshift federated queries use case

To understand this better, let’s consider a use case...

Summary

In this chapter, we looked at how data federation helps organizations quickly fetch data from multiple heterogeneous source systems through a single pane of glass.

We looked at how the different connectors in Amazon Athena provide a quick and easy way to join datasets from other sources. These connectors make the experience seamless and transparent: reports can be created just by writing SQL statements inside Athena that join datasets from the underlying data stores.

We also looked at how Amazon Redshift supports federated queries by fetching data stored in operational data stores (ODSs) such as MySQL and PostgreSQL. A use case typically solved by this mechanism is querying live operational data that is constantly being updated in the ODS.

The next chapter is critical in our modern data platform journey as we will discuss everything about predictive analytics and how it helps organizations think big with their data.

