You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author: Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.

Chapter 2: The Data Lakehouse Architecture Overview

A well-thought-out architecture is the cornerstone of any robust information technology (IT) system, and a data lakehouse is no exception. The previous chapter explained the need for a modern data analytics platform and traced the evolution of the data lakehouse. This chapter focuses on the critical elements of a data lakehouse.

The chapter will begin by describing the system context of a data lakehouse. Then, it will investigate the actors and systems that interact with a data lakehouse.

We will then discuss the logical architecture of a data lakehouse that consists of seven layers. The chapter will then deep-dive into various components of a data lakehouse architecture and elaborate on each element. The last section of this chapter will focus on five sacrosanct architecture principles that provide a framework for implementing a data lakehouse.

To summarize, the chapter covers the following topics:

...

Developing a system context for a data lakehouse

A system context diagram shows different entities that interact with a system. In the following case, the system is a data lakehouse:

Figure 2.1 – Data lakehouse system context diagram

The preceding diagram shows key entities (systems or actors) that interact with the data lakehouse. The interaction with the data lakehouse has two parts, as outlined here:

  • Data providers: Systems or actors that provide data to the data lakehouse
  • Data consumers: Systems or actors that consume data from the data lakehouse
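The two roles above can be sketched as a small data model. This is only an illustrative sketch, not something taken from the book: the class names, the `add` helper, and the "reporting dashboard" consumer are hypothetical, and each entity is assumed to play a single role.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """The two ways an entity interacts with the data lakehouse."""
    PROVIDER = "provides data to the data lakehouse"
    CONSUMER = "consumes data from the data lakehouse"


@dataclass(frozen=True)
class Entity:
    """A system or actor in the system context (names are hypothetical)."""
    name: str
    role: Role


@dataclass
class SystemContext:
    """The system under discussion plus the entities that interact with it."""
    system: str
    entities: list = field(default_factory=list)

    def add(self, name: str, role: Role) -> None:
        self.entities.append(Entity(name, role))

    def names_with(self, role: Role) -> list:
        # Keep insertion order so the listing mirrors the diagram's reading order.
        return [e.name for e in self.entities if e.role is role]


context = SystemContext("data lakehouse")
context.add("operational system", Role.PROVIDER)   # named in the text as a typical provider
context.add("reporting dashboard", Role.CONSUMER)  # hypothetical consumer, for illustration

print("Providers:", context.names_with(Role.PROVIDER))
print("Consumers:", context.names_with(Role.CONSUMER))
```

Keeping the role as an explicit attribute makes it easy to list everything on either side of the lakehouse when reviewing the context diagram.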

Let's examine these entities in detail.

Data providers

A data provider is any system or actor that ingests data into the data lakehouse. Any system that generates data is a potential data provider. A few typical data providers are listed here:

  • Operational systems: Any system that generates data is a potential data provider. Typically, online transaction processing ...

Developing a logical data lakehouse architecture

We have discussed the system context of a data lakehouse. Let's now develop a logical data lakehouse architecture. A logical architecture focuses on components that integrate to satisfy specific functional requirements (FRs) and non-functional requirements (NFRs). It is abstracted to a level that is technology-agnostic and focuses on component functionality. The two kinds of requirements are as follows:

  • An FR is a requirement that fulfills a specific business or domain-driven behavior. These kinds of requirements are driven by the tasks and the needs of a particular business function.
  • An NFR is a requirement that specifies criteria that need to be fulfilled for the system to be useful in that specific context. Typical NFRs include the time within which a particular query is expected to complete, a requirement for data encryption, and so on.
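To make the NFR examples above concrete, the following minimal sketch expresses them as checks against an observed measurement. It is illustrative only: the `Observation` and `NFR` types, the `unmet` helper, and the 5-second threshold are assumptions introduced here, not values from the book.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical measurements taken from a running system."""
    query_seconds: float
    encrypted_at_rest: bool


@dataclass(frozen=True)
class NFR:
    """A non-functional requirement: a description plus a pass/fail criterion."""
    description: str
    is_met: Callable[[Observation], bool]


# Assumed criteria for illustration; real thresholds come from the business context.
NFRS = [
    NFR("query completes within 5 seconds", lambda o: o.query_seconds <= 5.0),
    NFR("data is encrypted", lambda o: o.encrypted_at_rest),
]


def unmet(observation: Observation, nfrs: List[NFR] = NFRS) -> List[str]:
    """Return the descriptions of the NFRs the observation fails to satisfy."""
    return [n.description for n in nfrs if not n.is_met(observation)]


print(unmet(Observation(query_seconds=7.5, encrypted_at_rest=True)))
```

An FR, by contrast, would describe what the system does for a business function rather than a measurable criterion like these.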

A well-architected system ensures that...

Developing architecture principles

As seen in the preceding section, many components make up a data lakehouse architecture. This architecture needs to be governed by a set of architecture principles that ensure the data lakehouse can meet its goal of being a flexible platform for AI and BI that is agile enough to cater to ever-changing requirements.

Architecture principles govern any architectural construct and define the underlying general rules and guidelines for use. We can tailor these principles as per the organization's requirements. However, five principles are sacrosanct. These are represented in the following diagram:

Figure 2.12 – Data lakehouse architecture principles

Disciplined at the core, flexible at the edges

The purpose of creating a new architecture paradigm is to be agile and innovative, yet it needs to be governed pragmatically. This balance is a fine line to tread. The first sacrosanct principle embodies...

Summary

This chapter gave a 30,000-foot overview of a data lakehouse architecture. It started with the system context, which established the critical systems and people that generate data for a data lakehouse and consume data from it. Next, we discussed the motivations and use cases for different types of data. With the system context clarified, the chapter introduced the logical architecture of a data lakehouse and provided a brief overview of its seven layers and their components. Finally, the chapter elaborated on five sacrosanct architecture principles that are core to a robust data lakehouse architecture. The chapter lays the architectural foundation for a modern data analytics platform. Now that the stage is set, the subsequent chapters will go deeper into each of the layers and discuss design patterns for each layer.

In the next chapter, we will cover storing data in a data lakehouse.
