
You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author (1)
Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.

Chapter 6: Applying Data Governance in the Data Lakehouse

The journey so far has covered five layers of the Data Lakehouse. This chapter focuses on the final two layers, that is, the data governance and data security layers. Data governance and security are essential aspects of a modern data analytics platform. We will start this chapter by discussing the need for a data governance framework and how the 3-3-3 framework outlines the three components of data governance. The next section discusses implementing the three data governance components in the Data Lakehouse, covering the key constructs of each component and practical methods to implement them.

We will cover the following topics in the chapter:

  • The 3-3-3 framework for data governance
  • The three components of data governance

The 3-3-3 framework for data governance

The data landscape of any organization changes rapidly. As more and more organizations embark on the digital transformation journey, their data footprints keep growing.

Efficiently collecting, managing, and harnessing this data footprint is pivotal for an organization's success. Therefore, data needs to be treated as a strategic asset. However, according to a survey conducted by McKinsey in 2019, an employee spends, on average, around 29 percent of their time on non-value-added tasks due to poor data quality and availability. Both poor data quality and poor data availability are functions of data governance. As more and more data becomes available for analysis, the principle of garbage in, garbage out begins to manifest.

The 3-3-3 framework provides a structure that is a great starting point for data governance:

Figure 6.1 – 3-3-3 framework for data governance

This framework focuses on the...

The three components of the data governance layer

Recall that in Chapter 2, Data Lakehouse Architecture Overview, we briefly discussed the three components of the data governance layer. The following figure provides a recap of the three components of data governance:

Figure 6.4 – The three components of the data governance layer

The three components of data governance are as follows:

  • Data Governance Policy Management: The first component is not a technology component; it is a set of data policies and standards. The data policy is a set of statements describing the rules that control the standards, security, integrity, quality, and usage of data in the Data Lakehouse.
  • Data Curation and Cataloging Service: Data cataloging is the process of organizing an inventory of data so that it can be easily identified. This service ensures that all the source data, the data in the data lake, the data in the data warehouse, the data processing pipelines...
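The idea of a catalog as a searchable inventory of data assets can be sketched in a few lines of Python. This is a minimal illustration only, not the book's implementation or any product's API: the `CatalogEntry` and `DataCatalog` classes, their field names, the zone labels, and the sample assets are all hypothetical and chosen to mirror the kinds of assets named above (data in the data lake, data in the data warehouse, and data processing pipelines).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One inventoried data asset (all field names are illustrative)."""
    name: str
    zone: str   # hypothetical labels: "data_lake", "data_warehouse", "pipeline"
    owner: str
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """A tiny in-memory inventory that makes assets easy to identify."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Reject duplicates so each asset name identifies exactly one entry.
        if entry.name in self._entries:
            raise ValueError(f"asset already cataloged: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_tag(self, tag):
        # Return the names of all assets carrying the given tag, sorted.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def find_by_zone(self, zone):
        # Return the names of all assets located in the given zone, sorted.
        return sorted(e.name for e in self._entries.values() if e.zone == zone)


# Hypothetical sample assets, one per kind of asset.
catalog = DataCatalog()
catalog.register(CatalogEntry("customers_raw", "data_lake", "sales-data-team",
                              frozenset({"pii", "customer"})))
catalog.register(CatalogEntry("dim_customer", "data_warehouse", "bi-team",
                              frozenset({"pii", "customer", "curated"})))
catalog.register(CatalogEntry("load_customers", "pipeline", "data-eng-team",
                              frozenset({"customer"})))

print(catalog.find_by_tag("pii"))
print(catalog.find_by_zone("pipeline"))
```

A real cataloging service would typically persist this metadata and harvest it automatically; the sketch only shows why tagging and zoning assets makes them easy to find.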

Summary

We have covered a lot of ground in this chapter. The data governance layer, one of the vital layers in the Data Lakehouse, was covered in depth. We started by emphasizing the importance of data governance. The 3-3-3 framework provides a holistic, structured way of looking at data governance. As part of the 3-3-3 framework, we covered the must-have objectives for data governance. The section also covered the key roles that enable data governance, the critical aspects of data governance policies, and the characteristics of the technologies that enable it.

The next section of the chapter focused on the three components of the data governance layer, that is, data governance policy management, the data cataloging service, and the data quality service. The section then drilled down into what each component means and how the parts come to fruition in the Data Lakehouse architecture.

The next chapter will focus on the final layer of the Data...
