
You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author: Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.


Chapter 9: Scaling the Data Lakehouse Architecture

In the journey so far, we have covered all seven layers of the data lakehouse architecture. However, for large organizations that are complex and spread globally, a single data lakehouse won't suffice. They will need multiple platforms to fulfill their analytical requirements, along with a structured process for sharing data elements between those platforms. Therefore, the need arises to develop a macro-architecture pattern that meets the organization's overall analytical requirements without accruing architectural debt.

This chapter will discuss concepts that you can use to develop those macro-architecture patterns. First, we will define the need for a macro-architecture pattern for a data lakehouse and the organizational drivers that dictate the requirements for such a pattern. Next, we will cover two types of macro-architecture patterns, namely hub-spoke and data mesh. This section will...

The need for a macro-architectural pattern for analytics

Organizations are complex and need to evolve in their analytics journey, especially when they become too big to be managed centrally. Typically, a complex organization has two groups: the central unit and the sub-units.

  • Central unit: This is the unit at the organizational level. It may prescribe guidance that the sub-units are expected to follow. It may hold budgets that it distributes across the sub-units for various initiatives. It may also have platforms that fulfill group-level requirements.
  • Sub-units: It is not uncommon for organizations to have many sub-units. The sub-units may have differing levels of independence from the central unit, and this degree of autonomy depends on the organizational structure and culture. Typically, these sub-units fall into three main categories:
    • The first category comprises a different organization (entity) within a group organization in the same or another...

Implementing a data lakehouse in a macro-architectural pattern

The building block for both these patterns is a node. The following figure depicts the formation of a node:

Figure 9.1 – A node – the building block for the macro-architecture pattern

A node is the data lakehouse architectural component implemented in the organization's sub-units and the central unit. The hub-spoke and the data mesh patterns differ in how the data is discovered and shared between the nodes. Let's discuss these patterns in detail.
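To make the distinction concrete, the following Python sketch models a node as one data lakehouse owned by either the central unit or a sub-unit, and contrasts two simplified ways of discovering data across nodes. The `Node` class, the `discover_hub_spoke` and `discover_mesh` functions, and the node and dataset names are hypothetical illustrations, not the book's implementation. The mesh variant is deliberately simplified: it assumes each node advertises its own datasets and consumers query peers directly, whereas real data mesh deployments often still rely on a shared, federated catalog.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node: one data lakehouse owned by the central unit or a sub-unit."""
    name: str
    owner: str  # e.g. "central" or the name of a sub-unit
    datasets: set[str] = field(default_factory=set)


def discover_hub_spoke(hub_catalog: dict[str, str], dataset: str) -> Optional[str]:
    """Hub-spoke (simplified): every node registers its datasets in one catalog
    held by the hub, so discovery is a single lookup against that catalog."""
    return hub_catalog.get(dataset)


def discover_mesh(nodes: list[Node], dataset: str) -> Optional[str]:
    """Data mesh (simplified assumption): no central catalog; each node advertises
    its own datasets and the consumer asks the peers one by one."""
    for node in nodes:
        if dataset in node.datasets:
            return node.name
    return None


# Usage: two sub-unit nodes and a hub catalog built from what they publish.
sales = Node("sales-lakehouse", "sales", {"orders"})
hr = Node("hr-lakehouse", "hr", {"headcount"})
hub_catalog = {d: n.name for n in (sales, hr) for d in n.datasets}
```

With these two nodes, `discover_hub_spoke(hub_catalog, "orders")` returns `"sales-lakehouse"` after one lookup against the hub's catalog, while `discover_mesh([sales, hr], "headcount")` walks the peers and returns `"hr-lakehouse"`. Both return `None` for a dataset no node publishes.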

The hub-spoke pattern

The first pattern that we want to discuss is the hub-spoke pattern. The following figure depicts the conceptual architecture of the hub-spoke pattern:

Figure 9.2 – The conceptual architecture of the hub-spoke pattern

In the hub-spoke pattern, a central node acts as the hub, and many edge nodes act as the spokes. The hub is the central node that orchestrates and governs the...
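One way to picture the hub's orchestration and governance role is the minimal sketch below. It assumes that the hub keeps a central catalog, a list of access grants, and an audit log, and that every share request between spokes is routed through the hub. The `Hub` class, its methods, and the spoke and dataset names are hypothetical; treat this as one possible interpretation under these assumptions, not the author's design.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical central node: holds the catalog, the sharing grants,
    and an audit log of every exchange between spokes."""
    catalog: dict[str, str] = field(default_factory=dict)       # dataset -> owning spoke
    allowed: set[tuple[str, str]] = field(default_factory=set)  # (dataset, consumer) grants
    audit_log: list[str] = field(default_factory=list)

    def register(self, spoke: str, dataset: str) -> None:
        # A spoke publishes a dataset by registering it with the hub.
        self.catalog[dataset] = spoke

    def grant(self, dataset: str, consumer: str) -> None:
        # Governance decision recorded centrally by the hub.
        self.allowed.add((dataset, consumer))

    def request(self, consumer: str, dataset: str) -> str:
        """Route a share request through the hub; return the producing spoke."""
        producer = self.catalog.get(dataset)
        if producer is None:
            raise KeyError(f"{dataset!r} is not registered with the hub")
        if (dataset, consumer) not in self.allowed:
            self.audit_log.append(f"DENY {consumer} <- {producer}:{dataset}")
            raise PermissionError(f"{consumer} may not read {dataset}")
        self.audit_log.append(f"ALLOW {consumer} <- {producer}:{dataset}")
        return producer
```

For example, after `hub.register("sales", "orders")` and `hub.grant("orders", "finance")`, the call `hub.request("finance", "orders")` returns `"sales"` and logs an `ALLOW` entry, while a request from an ungranted spoke raises `PermissionError` and logs a `DENY` entry. The hub only resolves and authorizes the request; the physical data transfer is left out of the sketch.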

Summary

Large organizations evolve, and their analytics journeys differ. A single data lakehouse will not be able to cater to all the analytical requirements of the organization. Therefore, the data lakehouse architecture needs to be scaled in a governed manner to address the ever-changing analytical requirements. In addition, the data in the data lakehouse needs to be democratized, with structured data sharing enabled between the different units of the organization. This chapter covered the methods for scaling the data lakehouse architecture pattern.

The chapter started by emphasizing the need for macro patterns. We defined the two categories of units that make up a large organization and the five key considerations that influence the analytical requirements of these units. The next section of the chapter focused on implementing the two general macro-architecture patterns. The hub-spoke pattern was the first pattern that we discussed. The section covered the key components that develop...

