
You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author: Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.

Introducing the data lakehouse paradigm

In 2006, Clive Humby, a British mathematician, coined the now-famous phrase "Data is the new oil." It was akin to peering into a crystal ball and glimpsing the future. Data is the lifeblood of organizations, and an organization's competitive advantage is increasingly defined by how it uses data. In this age of digital transformation, data management is paramount: more and more organizations are embracing digital transformation programs, and data is at the core of these transformations.

As discussed earlier, the EDW and data lake paradigms were each well suited to their times. Each had its benefits and its challenges. A new paradigm needed to emerge: one that was disciplined at its core and flexible at its edges.

Figure 1.9 – Data lakehouse paradigm

The new data architectural paradigm is called the data lakehouse. It strives to combine the advantages of both the data lake and the EDW paradigms while minimizing their challenges.

A well-architected data lakehouse delivers four key benefits:

Figure 1.10 – Benefits of the data lakehouse

  1. It derives insights from both structured and unstructured data: The data lakehouse architecture should be able to store, transform, and integrate structured and unstructured data, fusing them to enable the extraction of valuable insights.
  2. It caters to different personas of the organization: Data is a dish with different tastes for different personas, and the data lakehouse should meet each persona's requirements for insights. A data scientist should have a playground for testing hypotheses, an analyst should be able to analyze data using their tools of choice, and business users should receive accurate reports on time. In this way, the data lakehouse democratizes data for analytics.
  3. It facilitates the adoption of a robust governance framework: The primary challenge with the data lake architecture pattern was the lack of a strong governance framework, which made it easy for a data lake to become a data swamp. In contrast, an EDW architecture was stymied by too much governance for too little content. The data lakehouse architecture strives to strike a balance: the right level of governance for each type of data, with access granted to the right stakeholders.
  4. It leverages cloud computing: The data lakehouse architecture needs to be agile and innovative. The pattern must adapt to changing organizational requirements and reduce the time it takes to turn data into insight. To achieve this agility, adopting cloud computing technology is imperative. Cloud computing platforms offer the required pace of innovation and provide a technology stack with the scalability and flexibility that a modern data analytics platform demands.
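To make benefits 2 and 3 more concrete, here is a minimal Python sketch of persona-based access control over governed data. It is an illustration under stated assumptions, not the book's design or any product's API: the zone names (`RAW`, `CURATED`, `REPORTING`), the `contains_pii` flag, the `can_read` function, and both policy tables are hypothetical. The idea is that each persona sees the data appropriate to its role, and sensitive data carries an additional restriction.

```python
from dataclasses import dataclass
from enum import Enum


class Persona(Enum):
    DATA_SCIENTIST = "data_scientist"
    ANALYST = "analyst"
    BUSINESS_USER = "business_user"


class Zone(Enum):
    # Hypothetical storage zones; real platforms name and split these differently.
    RAW = "raw"              # ingested as-is: structured and unstructured data
    CURATED = "curated"      # cleansed and conformed, under stricter governance
    REPORTING = "reporting"  # aggregated, report-ready data


@dataclass(frozen=True)
class Dataset:
    name: str
    zone: Zone
    contains_pii: bool  # hypothetical sensitivity flag


# Hypothetical policy: which zones each persona may read.
ZONE_ACCESS: dict[Persona, set[Zone]] = {
    Persona.DATA_SCIENTIST: {Zone.RAW, Zone.CURATED},  # exploratory playground
    Persona.ANALYST: {Zone.CURATED, Zone.REPORTING},
    Persona.BUSINESS_USER: {Zone.REPORTING},
}

# Hypothetical policy: personas cleared to read personally identifiable data.
PII_CLEARED: set[Persona] = {Persona.ANALYST}


def can_read(persona: Persona, dataset: Dataset) -> bool:
    """Return True if the hypothetical policy lets this persona read the dataset."""
    if dataset.zone not in ZONE_ACCESS[persona]:
        return False
    if dataset.contains_pii and persona not in PII_CLEARED:
        return False
    return True


if __name__ == "__main__":
    datasets = [
        Dataset("clickstream_logs", Zone.RAW, contains_pii=False),
        Dataset("customer_master", Zone.CURATED, contains_pii=True),
        Dataset("monthly_sales_report", Zone.REPORTING, contains_pii=False),
    ]
    # Show which datasets each persona is allowed to read.
    for persona in Persona:
        readable = [d.name for d in datasets if can_read(persona, d)]
        print(f"{persona.value}: {readable}")
```

Keeping the policy in data (the two lookup tables) rather than in branching logic makes it easier to review and adjust as personas and datasets change, which is one way to aim for governance that is neither too lax nor too rigid.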

The data lakehouse paradigm addresses the challenges faced by the EDW and data lake paradigms. Yet it has its own set of challenges that need to be managed. A few of them are as follows:

  • Architectural complexity: Because the data lakehouse pattern amalgamates the EDW and data lake patterns, it inevitably carries its fair share of architectural complexity. This complexity manifests in the multiple components required to realize the pattern. Every architectural pattern involves a quid pro quo, so it is vital to weigh architectural complexity against the potential business benefit. A data lakehouse architecture needs to tread that path carefully.
  • Need for holistic data governance: The challenges of the data lake paradigm do not magically go away with the data lakehouse paradigm. The biggest challenge of a data lake was its tendency to become a data swamp. As a data lakehouse grows in scope and complexity, the lack of a holistic governance framework is a sure way to turn it into a swamp.
  • Balancing flexibility with discipline: The data lakehouse paradigm strives to be flexible and to adapt to changing business requirements with agility. Its guiding ethos is to have discipline at the core and flexibility at the edges. Achieving this objective is a careful balancing act that requires clearly defining the limits of flexibility and the strictness of discipline. Data lakehouse stewards play an essential role in maintaining this balance.
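The ethos of discipline at the core and flexibility at the edges can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical edge zone that accepts any record as-is and a hypothetical core table that enforces a declared schema before data is admitted. The names `EdgeZone`, `CoreTable`, `CORE_SCHEMA`, and `promote` are invented for this example and do not correspond to a specific product.

```python
from typing import Any

# Hypothetical schema for a governed core table: column name -> expected type.
CORE_SCHEMA: dict[str, type] = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
}


class EdgeZone:
    """Flexible edge: accepts any record as-is (schema-on-read)."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def write(self, record: dict[str, Any]) -> None:
        self.records.append(record)


class CoreTable:
    """Disciplined core: enforces a declared schema on write (schema-on-write)."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return a list of schema violations; an empty list means the record conforms."""
        errors = []
        for column, expected in self.schema.items():
            if column not in record:
                errors.append(f"missing column: {column}")
            elif not isinstance(record[column], expected):
                actual = type(record[column]).__name__
                errors.append(f"{column}: expected {expected.__name__}, got {actual}")
        for column in sorted(set(record) - set(self.schema)):
            errors.append(f"unexpected column: {column}")
        return errors

    def write(self, record: dict[str, Any]) -> None:
        errors = self.validate(record)
        if errors:
            raise ValueError("; ".join(errors))
        self.rows.append(record)


def promote(edge: EdgeZone, core: CoreTable) -> list[dict[str, Any]]:
    """Try to promote every edge record into the core; return the rejected records."""
    rejected = []
    for record in edge.records:
        try:
            core.write(record)
        except ValueError:
            rejected.append(record)
    return rejected


if __name__ == "__main__":
    edge = EdgeZone()
    edge.write({"order_id": 1, "customer_id": 42, "amount": 19.99})
    edge.write({"order_id": "2", "customer_id": 43, "amount": 5.0})  # wrong type
    edge.write({"note": "free-form text from a support ticket"})     # no schema at all
    core = CoreTable(CORE_SCHEMA)
    rejected = promote(edge, core)
    print(f"edge: {len(edge.records)}, core: {len(core.rows)}, rejected: {len(rejected)}")
```

Records that fail validation remain available at the edge for exploration, while only conforming records reach the governed core. In practice, lakehouse table formats provide schema enforcement of this kind; the sketch only illustrates the principle.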

Let's recap what we've discussed in this chapter.
