You're reading from Data Lakehouse in Action

Product typeBook

Published inMar 2022

Reading LevelBeginner

PublisherPackt

ISBN-139781801815932

Edition1st Edition

Languages

Python

Tools

AWS IoT

Concepts

Big Data

Author (1)

Pradeep Menon

Introducing the data lakehouse paradigm

In 2006, Clive Humbly, a British mathematician, coined the now-famous phrase, "Data is the new oil." It was akin to peering through a crystal ball and peeking into the future. Data is the lifeblood of organizations. The competitive advantage is defined by how an organization uses data. Data management is paramount in this age of digital transformation. More and more organizations are embracing digital transformation programs, and data is at the core of these transformations.

As discussed earlier, the paradigms of the EDW and data lakes were opportune for their times. They had their benefits and their challenges. A new paradigm needed to emerge that was disciplined at its core and flexible at its edges.

Figure 1.9 – Data lakehouse paradigm

The new data architectural paradigm is called the data lakehouse. It strives to combine the advantages of both the data lake and the EDW paradigms while minimizing their challenges.

An adequately architected data lakehouse delivers four key benefits.

Figure 1.10 – Benefits of the data lakehouse

It derives insights from both structured and unstructured data: The data lakehouse architecture should be able to store, transform, and integrate structured and unstructured data. It should be able to fuse them together and enable the extraction of valuable insights from the data.
It caters to different personas of the organizations: Data is a dish with different tastes for different personas. The data lakehouse should be able to cater to the needs of these personas. The data lakehouse caters to a range of organizational personas and fulfills their requirements for insights. A data scientist should get their playground for testing their hypothesis. An analyst should be able to analyze data using their tools of choice, and business users should be able to get their reports accurately and on time. It democratizes data for analytics.
It facilitates the adoption of a robust governance framework: The primary challenge with the data lake architecture pattern was the lack of a strong governance framework. It was easy for a data lake to become a data swamp. In contrast, an EDW architecture was stymied by too much governance for too little content. The data lakehouse architecture strives to hit the governance balance. It seeks to achieve the proper governance for the correct data type with access to the right stakeholder.
It leverages cloud computing: Data lakehouse architecture needs to be agile and innovative. The pattern needs to adapt to the changing organizational requirements and reduce the data to insight turnover time. To achieve this agility, it is imperative to adopt cloud computing technology. The cloud computing platforms offer the innovativeness required. It provides the appropriate technology stack with scalability and flexibility, and fulfills the demands of a modern data analytics platform.

The data lakehouse paradigm addresses the challenges faced by the EDW and the data lake paradigm. Yet, it does have its own set of challenges that needs to be managed. A few of those challenges are as follows:

Architectural complexity: Given that the data lakehouse pattern amalgamates the EDW and the data lake pattern, it is inevitable that it will have its fair share of architectural complexity. The complexity manifests in the form of multiple components required to fruition the pattern. Architectural patterns are quid pro quo; it is vital to carefully trade off architectural complexity with the potential business benefit. The data lakehouse architecture needs to tread that path carefully.
Required holistic data governance: The challenges pertinent to the data lake paradigm do not magically go away with the data lakehouse paradigm. The biggest challenge of a data lake was that it was prone to becoming a data swamp. As the data lakehouse grows in its scope and complexity, the lack of a holistic governance framework is a sure-shot way of creating a swamp out of a data lakehouse.
Balancing flexibility with discipline: The data lakehouse paradigm strives to be flexible and to adapt to changing business requirements with agility. The ethos under which it operates is to have discipline at the core and flexibility at the edges. Achieving this objective is a careful balancing act that clearly defines the limits of flexibility and the strictness of discipline. The data lakehouse stewards play an essential role in ensuring this balance.

Let's recap what we've discussed in this chapter.

You have been reading a chapter from

Data Lakehouse in Action

Published in: Mar 2022Publisher: PacktISBN-13: 9781801815932

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.
Read more about Pradeep Menon

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages