You're reading from Modern Data Architecture on AWS

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781801813396
Edition: 1st Edition
Author (1)
Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS) specializing in data, analytics, and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. In the last 6 years working at AWS, Behram has been a thought leader in the data, analytics, and AI/ML space, publishing multiple papers and leading the digital transformation efforts for many organizations across the globe. Behram completed his Bachelor of Engineering in Computer Science at the University of Pune and has an MBA degree from the University of Florida.

Data Sharing

In the previous chapter, we looked at how the data stored in Amazon Redshift can be consumed. But imagine that, in a large company such as our GreatFin example, every line of business (LOB) produces and consumes its own data gathered from multiple channels. For a company to be truly data-driven, these data silos need to be broken down, and there needs to be an easy way to share data across all LOBs without physically moving it around as duplicate copies.

First, we will look at how you can share data inside your organization, both from a data lake on S3 and from the data warehouse we built on Redshift. We will then look at how data can be shared outside the organization.

In this chapter, we will look at the following key topics:

  • Internal data sharing
  • External data sharing

Internal data sharing

Organizations have many internal LOBs, and each LOB has many personas that interact with the data produced by their department. Different LOBs often want access to portions of data from other departments for many reasons, including cross-sell, up-sell, fraud detection, and other critical insights about their customers. First, let’s look at a use case for how each LOB can share data that it has curated inside its S3 data lake.

Data sharing using Amazon Athena

Previously, we covered how you can create a data lake on Amazon S3 and then interactively query it using Amazon Athena. In a simple scenario, the data produced by one LOB is only consumed by the personas inside the same LOB. But to unlock the true value of data, organizations prefer that each LOB shares relevant sets of data with other LOBs. When organizations prefer to create a centralized enterprise data lake, the question becomes, how can each LOB access the datasets that belong to them...
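As the chapter summary notes, one way to share an S3 data lake internally is to give another account cross-account access through a shared Glue Data Catalog that Athena can read. The following is a minimal sketch of what a read-only Glue resource policy for that might look like, not the book's own configuration. The function name, account IDs, region, database, and table names are all hypothetical placeholders, and the list of Glue actions is an assumption about the minimum needed for read-only queries.

```python
import json


def build_glue_catalog_policy(owner_account, consumer_account, region,
                              database, tables=None):
    """Build a Glue resource policy (as a dict) that lets another account
    read one database from the owner's Data Catalog.

    All arguments are illustrative placeholders; nothing here calls AWS.
    """
    prefix = f"arn:aws:glue:{region}:{owner_account}"
    # Share every table in the database unless specific names are given.
    table_names = tables if tables else ["*"]
    resources = [f"{prefix}:catalog", f"{prefix}:database/{database}"]
    resources += [f"{prefix}:table/{database}/{name}" for name in table_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The consuming LOB's account (hypothetical ID).
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account}:root"},
                # Assumed read-only metadata actions.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    policy = build_glue_catalog_policy(
        "111111111111", "222222222222", "us-east-1", "lob_sales", ["orders"]
    )
    print(json.dumps(policy, indent=2))
```

A policy like this only covers catalog metadata. The consuming account would typically also need read access to the underlying S3 objects and its own IAM permissions before Athena queries succeed, so treat the sketch as one piece of the setup rather than the whole of it.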

External data sharing

Every organization produces and collects a lot of data. Often, data that’s produced is consumed for internal operations, but there are many cases where some data that’s collected can be monetized by offering it to other companies that can use this data to enrich their analytical insights. As you may recall from our data lake chapter, we created an enriched layer for data that could use a combination of internal data and external data to produce datasets that help derive precision insights.

Creating a vision for sharing data externally to make money is easy; however, the real challenge is around setting up all the mechanisms to do this in a scalable, secure, and cost-effective manner. Creating a secure and optimal technical handshake between the data providers and data consumers is not easy. Producers and subscribers both want a secure and easy-to-use cloud-native platform that can seamlessly enable data sharing by providing self-service options...

Summary

In this chapter, we looked at how organizations can share data internally as well as externally for monetization. Internal data sharing can be as easy as sharing the data in the S3 data lake by providing cross-account access to Amazon Athena. Athena can read data from a shared Glue Data Catalog, making it easy to share different objects from the catalog. We also looked at how Redshift’s data sharing feature helps in sharing data that’s stored in one Redshift cluster with many other clusters in the organization. Once a producer cluster is set up and the necessary grants are provided, the consumer cluster can easily access the objects shared with it.
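The producer and consumer flow for Redshift data sharing described above can be sketched as a short sequence of SQL statements. The Python helpers below only assemble those statements as strings; they do not connect to a cluster. The share name, schema, table names, database name, and namespace values are hypothetical, and the SQL reflects Redshift's datashare commands as commonly documented, so it should be checked against the current Redshift documentation before use.

```python
def producer_statements(share, schema, tables, consumer_namespace):
    """SQL a producer cluster would run to create a datashare and grant
    usage on it to a consumer namespace (all names are placeholders)."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Add each table to the share individually.
    stmts += [
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};"
        for table in tables
    ]
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share} "
        f"TO NAMESPACE '{consumer_namespace}';"
    )
    return stmts


def consumer_statements(share, local_db, producer_namespace):
    """SQL a consumer cluster would run to expose the shared objects as a
    local database it can query."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    ]


if __name__ == "__main__":
    # Hypothetical names and namespace identifiers, for illustration only.
    for sql in producer_statements(
        "sales_share", "public", ["orders", "customers"], "consumer-ns-id"
    ):
        print(sql)
    for sql in consumer_statements(
        "sales_share", "sales_shared_db", "producer-ns-id"
    ):
        print(sql)
```

The helpers interpolate identifiers directly into SQL, which is acceptable for a trusted, hand-written configuration but not for untrusted input. No data is copied in this flow: the consumer queries the producer's objects through the database created from the share.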

Finally, we looked at patterns for sharing data external to the organization by leveraging AWS Data Exchange. Data Exchange helps us share datasets via various modes, such as files, S3, Redshift, Lake Formation, and APIs. Without data sharing features, complex ETL pipelines would have to be built to move...
