
You're reading from  Modern Data Architecture on AWS

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781801813396
Edition: 1st Edition
Author: Behram Irani

Behram Irani is currently a technology leader with Amazon Web Services (AWS) specializing in data, analytics, and AI/ML. He has spent over 18 years in the tech industry helping organizations, from start-ups to large-scale enterprises, modernize their data platforms. In the last 6 years working at AWS, Behram has been a thought leader in the data, analytics, and AI/ML space, publishing multiple papers and leading digital transformation efforts for many organizations across the globe. Behram holds a Bachelor of Engineering in Computer Science from the University of Pune and an MBA from the University of Florida.

Modern Data Architecture on AWS

Before we dive deep into the actual data and analytics use cases and how to design and build them, let’s address the elephant in the room—what is a modern data architecture, and why build it on Amazon Web Services (AWS)?

One of the fundamental tenets of a modern data architecture on AWS is to seamlessly integrate your data lake, data warehouse, and purpose-built data stores. In the prologue, we looked at what a data warehouse is and what it does. We also looked at the data tier of a three-tier architecture, which is typically a relational database management system (RDBMS) and is considered a type of purpose-built data store. What we haven’t explored in much detail yet is the data lake. The next chapter is dedicated entirely to data lakes, but before we go any further in this chapter, it is important to understand why data lakes are needed in the first place.

In this chapter, we will...

Data lakes

Simply put, a data lake is a centralized repository for storing all kinds of data. Data can be structured (such as relational database data in tabular format), semi-structured (such as JSON), or unstructured (such as images, PDFs, and so on). Data from all the heterogeneous source systems is collected and processed in this single repository and consumed from it. In the early days of data lakes, Apache Hadoop became the go-to platform for setting them up. The Hadoop framework provided a storage layer called the Hadoop Distributed File System (HDFS) and a data processing layer called MapReduce. Organizations started using this data lake as a central place for storing and processing all kinds of data, and it provided a great way to store and process data outside relational databases and data warehouses. But soon, running data lakes on on-premises infrastructure became a nightmare. We will look at those challenges as we progress through this chapter.
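To make the MapReduce processing model more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the distributed parts (splitting input into HDFS blocks and running tasks across many nodes) are only simulated in memory.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    # In Hadoop, map tasks would run in parallel on many nodes; here they run sequentially.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    records = ["the lake stores data", "the warehouse stores data too"]
    print(word_count(records))
    # {'the': 2, 'lake': 1, 'stores': 2, 'data': 2, 'warehouse': 1, 'too': 1}
```

The point of the split is that map and reduce functions hold no shared state, which is what let Hadoop spread them across a cluster. It is also why on-premises clusters had to be sized, and paid for, up front.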

The following diagram shows...

The role of a modern data architecture

A modern data architecture removes the rigid boundaries between data systems and seamlessly integrates the data lake, data warehouse, and purpose-built data stores. It recognizes that a one-size-fits-all approach leads to compromises in the data and analytics platform. Nor is it only about integrating data systems; a modern data architecture must also provide unified governance and easy movement of data between those systems.
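The idea of unified governance can be illustrated with a toy model: datasets living in different stores are registered in one catalog, and every access goes through the same permission check. The sketch below is hypothetical throughout. `Dataset`, `UnifiedCatalog`, the store labels, and the `s3://example-bucket/...` location are illustrative names, not an AWS API, and a real platform would delegate this to a managed governance service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str
    store: str      # hypothetical label: "data_lake", "data_warehouse", or "purpose_built"
    location: str   # hypothetical address of the data inside that store


@dataclass
class UnifiedCatalog:
    """Hypothetical central catalog: one registry and one permission model for every store."""
    datasets: dict[str, Dataset] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (principal, dataset name)

    def register(self, dataset: Dataset) -> None:
        self.datasets[dataset.name] = dataset

    def grant(self, principal: str, dataset_name: str) -> None:
        if dataset_name not in self.datasets:
            raise KeyError(f"unknown dataset: {dataset_name}")
        self.grants.add((principal, dataset_name))

    def resolve(self, principal: str, dataset_name: str) -> Dataset:
        """Return where a dataset lives, but only if the principal was granted access.

        The same check applies whether the data sits in the lake, the warehouse,
        or a purpose-built store.
        """
        if (principal, dataset_name) not in self.grants:
            raise PermissionError(f"{principal} may not read {dataset_name}")
        return self.datasets[dataset_name]


if __name__ == "__main__":
    catalog = UnifiedCatalog()
    catalog.register(Dataset("clickstream", "data_lake", "s3://example-bucket/clickstream/"))
    catalog.register(Dataset("sales", "data_warehouse", "analytics.sales"))
    catalog.grant("analyst", "sales")
    print(catalog.resolve("analyst", "sales").store)  # data_warehouse
```

Keeping grants in one place, rather than in each store's own access-control system, is what lets data move between stores without its permissions drifting apart.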

A modern data architecture is a direct response to all the challenges we have seen so far, including exponential data growth, performance and scalability issues, security and data governance nightmares, data silos, and, of course, painfully high costs.

The following diagram shows a modern data architecture at a high level:

Figure 1.2 – Modern data architecture

All the data an organization collects plays a huge role in reinventing...

Modern data architecture on AWS

AWS has been a pioneer in cloud computing; it provides a broad and deep platform to help organizations build sophisticated, scalable, and secure applications and data platforms.

Here’s a quick recap of why millions of customers choose AWS:

  • Agility: Allows organizations to experiment and innovate quickly and frequently
  • Elasticity: Takes away the guesswork around hardware capacity provisioning, letting capacity scale up and down with demand
  • Faster innovation: This is possible because organizations can now focus on implementing things that matter to their businesses and not worry about IT infrastructure
  • Cost savings: These are significant due to the economies of scale of cloud computing, coupled with pay-as-you-go pricing models
  • Global reach: This is now possible in minutes due to AWS's extensive, reliable, and secure global cloud infrastructure
  • Service breadth and depth: With over 200 fully featured services to support...
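The elasticity and cost-saving points can be made concrete with a little arithmetic. The sketch below compares two hypothetical strategies: provisioning enough fixed capacity for the peak hour and paying for it every hour, versus resizing capacity each hour to match demand and paying only for what runs. The demand figures, per-instance capacity, and hourly price are made-up inputs, not AWS pricing, and real autoscaling has lag and minimum-capacity limits that this sketch ignores.

```python
import math


def instances_needed(demand: float, capacity_per_instance: float) -> int:
    """Smallest number of instances that covers the given demand."""
    return math.ceil(demand / capacity_per_instance)


def compare_costs(hourly_demand: list[float], capacity_per_instance: float,
                  price_per_instance_hour: float) -> tuple[float, float]:
    """Return (fixed_cost, elastic_cost) over the hours in hourly_demand.

    fixed: capacity sized for the peak hour and paid for in every hour.
    elastic: capacity re-sized each hour to match that hour's demand.
    """
    peak = instances_needed(max(hourly_demand), capacity_per_instance)
    fixed_cost = peak * len(hourly_demand) * price_per_instance_hour
    elastic_cost = sum(
        instances_needed(d, capacity_per_instance) for d in hourly_demand
    ) * price_per_instance_hour
    return fixed_cost, elastic_cost


if __name__ == "__main__":
    # Hypothetical demand (requests per hour), capacity, and price; not real AWS figures.
    demand = [100, 150, 900, 1000, 400, 120]
    fixed, elastic = compare_costs(demand, capacity_per_instance=250, price_per_instance_hour=0.5)
    print(f"fixed: ${fixed:.2f}, elastic: ${elastic:.2f}")  # fixed: $12.00, elastic: $6.50
```

With these inputs the peak-sized fleet needs 4 instances for all 6 hours (24 instance-hours), while the elastic fleet uses 13 instance-hours. The spikier the demand, the wider that gap becomes.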

Pillars of a modern data architecture

A modern data architecture is required to break down data silos so that analytics, both descriptive and predictive (using artificial intelligence/machine learning, or AI/ML), can be performed on all the data aggregated in a central location. To meet the business need of deriving value from data quickly and cost-effectively, the architecture requires the following pillars to be in place:

  • Scalable data lakes
  • Purpose-built analytics services
  • Unified data access, including seamless data movement
  • Unified governance
  • Performance and cost-effectiveness

The following diagram illustrates these pillars:

Figure 1.6 – Pillars of a modern data architecture on AWS

Let’s explore each of the pillars in more detail.

Scalable data lakes

A data lake is the foundation of a strong modern data platform. Data lakes get pretty big in a short...

Summary

In this chapter, we looked at what data lakes are, why they are important, and what some of the challenges of on-premises data lakes are. That gave us enough context to pivot toward what a modern data architecture looks like and why it’s important to build data platforms using this architecture pattern. And, as the climax was building up, AWS made a grand entry. We looked at the pillars of a modern data architecture on AWS. The stage is now set to get into the details of each of these pillars, and the rest of this book is organized around them.

With this chapter, our rollercoaster has just reached the top at cruising speed. Now, in the subsequent chapters, hang tight for all the thrills of the actual use cases and how the whole modern data platform slowly starts to take shape.
