You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author: Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.

Chapter 7: Applying Data Security in a Data Lakehouse

Six layers of the data lakehouse have been covered so far. This chapter covers the last layer: data security. It is the most crucial layer because it ensures that data is secured across all the other layers of a data lakehouse, and this chapter covers the ways to secure the data lakehouse. We will start by formulating a framework for data security that lays out the key dimensions you need to consider. The rest of the chapter then focuses on the four components of the data security layer that help secure the lakehouse and provide the right access.

In summary, this chapter will cover the following:

  • Realizing the data security components in a data lakehouse
  • Using an identity and access management service in a data lakehouse
  • Methods of data encryption in a data lakehouse
  • Methods of data masking in a data lakehouse
  • Methods of implementing network security in a data lakehouse

Let's...

Realizing the data security components in a data lakehouse

We briefly covered the elements of data security in Chapter 2, The Data Lakehouse Architecture Overview, where we discussed the four key components of the data security layer. The following figure summarizes these four components:

Figure 7.1 – Data security components

These four components work together to ensure that data is well secured and that access to it is controlled. The following figure depicts how they are orchestrated to protect data:

Figure 7.2 – Orchestration of various data security components in the data lakehouse

Any interaction with the data lakehouse layers must go through the network security service, which filters the traffic to the data lakehouse layers. The network traffic to and from the data...

Using IAM in a data lakehouse

The first component of the data security layer is identity and access management (IAM), which ensures that the right principal gets access to the right component with the correct authorization level. A principal can be any identity, such as a person, a device, or an application, that can request an action or operation on a data lakehouse component. In short, the IAM component determines who gets access to what, and how.

IAM employs a zero-trust architecture. Zero trust means that an organization should not trust anything or anyone by default when they request access to its resources. Under zero trust, a breach is assumed and every user and device is treated as a potential threat, so each one's access level must be verified before access is granted. The principles of least-privilege access and identity-based security policies are the cornerstones of a zero-trust architecture.
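The deny-by-default logic behind least privilege can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation or any cloud provider's IAM API: the Grant and AccessRequest types, the principal names, and the zone names are all hypothetical. A request is rejected outright if the caller has not been authenticated, and it is allowed only when an explicit grant matches the principal, resource, and action exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One explicit permission: a principal may perform an action on a resource."""
    principal: str
    resource: str
    action: str


@dataclass(frozen=True)
class AccessRequest:
    """A hypothetical request made by a person, device, or application."""
    principal: str
    authenticated: bool
    resource: str
    action: str


# Hypothetical grants: each principal receives only what its job requires.
GRANTS = frozenset({
    Grant("ingestion-pipeline", "raw-zone", "write"),
    Grant("processing-job", "raw-zone", "read"),
    Grant("processing-job", "curated-zone", "write"),
    Grant("analyst-alice", "curated-zone", "read"),
})


def is_authorized(request: AccessRequest, grants: frozenset = GRANTS) -> bool:
    # Zero trust: an identity that has not been verified is never trusted.
    if not request.authenticated:
        return False
    # Least privilege: allow only an exact match on an explicit grant;
    # anything that was not granted is denied by default.
    wanted = Grant(request.principal, request.resource, request.action)
    return wanted in grants


print(is_authorized(AccessRequest("analyst-alice", True, "curated-zone", "read")))  # True
print(is_authorized(AccessRequest("analyst-alice", True, "raw-zone", "read")))      # False
```

Because the check is a set lookup, giving a principal a new permission means adding a Grant explicitly; nothing is inherited implicitly, which mirrors the least-privilege principle.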

The following figure shows that an organization should have a holistic IAM implementation strategy with at least five elements...

Methods of data encryption in a data lakehouse

The second data security component is the data encryption service. Encryption is the most common way of protecting data. Encrypted data cannot be deciphered without the right key, even if someone gains unauthorized access to it, so encryption acts as a second line of defense. However, attackers have their methods as well, and they are getting more and more sophisticated at breaching data. A standard way of attacking encryption is brute force, which tries one key after another until the right one is found. Encryption schemes counter this by using strong algorithms whose key spaces are too large to search exhaustively and by managing their keys carefully.
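Why a large key space defeats brute force comes down to arithmetic. The following sketch estimates the expected time to find a key by exhaustive search, assuming, purely for illustration, an attacker who can test one trillion (10^12) keys per second; on average, the key is found after searching half of the key space. The function name and the guess rate are assumptions, not figures from the book.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def expected_brute_force_years(key_bits: int, guesses_per_second: float) -> float:
    """Expected years to find a key of key_bits bits by trying keys one by one."""
    # On average, the right key turns up after searching half of the 2**key_bits keys.
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


# Assumed attacker speed: 10**12 key guesses per second.
for bits in (40, 56, 128, 256):
    years = expected_brute_force_years(bits, 1e12)
    print(f"{bits}-bit key: about {years:.3g} years")
```

Under this assumption, a 56-bit key space falls in roughly ten hours, while a 128-bit key space takes on the order of 10^18 years. Each extra bit doubles the work, which is why modern ciphers use keys of 128 bits or more.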

The following figure depicts the data encryption process:

Figure 7.5 – The high-level process of data encryption

Data encryption is the process of converting data from a human-readable format (plaintext) into a format that is unreadable by humans (ciphertext...
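The plaintext-to-ciphertext round trip can be illustrated with Python's standard library alone. This sketch deliberately swaps in a one-time pad (XOR with a random key as long as the message, used only once) because it needs no third-party package; it is a teaching illustration only. Production lakehouse storage typically relies on vetted ciphers such as AES, provided by the platform or a maintained cryptography library, together with a key management service. The function names are hypothetical.

```python
import secrets


def encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (ciphertext, key) using a one-time pad."""
    # The key must be truly random, as long as the message, and never reused.
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Recover the plaintext; XOR with the same key undoes the encryption."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must be the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


record = "card=4111111111111111".encode("utf-8")
ciphertext, key = encrypt(record)
print(ciphertext.hex())                          # unreadable without the key
print(decrypt(ciphertext, key).decode("utf-8"))  # card=4111111111111111
```

Without the key, every plaintext of the same length is an equally plausible decryption, so the ciphertext reveals nothing beyond its length. The practical cost is that the key is as large as the data, which is one reason real systems use block ciphers with short keys instead.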

Methods of data masking in a data lakehouse

A data lakehouse can contain a lot of sensitive data that needs protection from unauthorized access. This could include Personally Identifiable Information (PII), such as social security numbers, email addresses, or phone numbers, or other sensitive information, such as credit card or bank account numbers. Not everyone needs access to this sensitive data. A data masking service adds a layer of protection to ensure that sensitive data is accessed only by privileged users who need it. Data masking creates an artificial but usable version of the data, protecting sensitive values without sacrificing the functionality that the data offers. There are several reasons why data masking is vital for an organization:

  • Data masking mitigates external and internal threats, such as data exfiltration, insider threats or account compromise, and insecure interfaces with third-party systems.
  • ...
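A masking service can be sketched as a function that returns masked values unless the caller holds a privileged role. The masking rules below (keep the first character of an email's local part, keep only the last four digits of a card number) and the role names are illustrative assumptions, not rules taken from the book; many platforms instead let administrators define such rules declaratively as masking policies.

```python
import re

PRIVILEGED_ROLES = {"fraud-investigator"}  # hypothetical role that needs raw values


def mask_email(email: str) -> str:
    local, at, domain = email.partition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {email!r}")
    # Keep the first character and the domain so the value stays recognizable.
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_card_number(card: str) -> str:
    digits = re.sub(r"\D", "", card)  # drop spaces and dashes
    if len(digits) < 4:
        raise ValueError("card number too short to mask")
    # Keep only the last four digits, a common convention on receipts.
    return "*" * (len(digits) - 4) + digits[-4:]


def read_customer(record: dict, role: str) -> dict:
    """Return the record unmasked for privileged roles, masked for everyone else."""
    if role in PRIVILEGED_ROLES:
        return dict(record)
    masked = dict(record)  # copy, so the stored record is never modified
    masked["email"] = mask_email(record["email"])
    masked["card_number"] = mask_card_number(record["card_number"])
    return masked


customer = {"customer_id": 1042, "email": "jane.doe@example.com",
            "card_number": "4111 1111 1111 1234"}
print(read_customer(customer, role="analyst"))
```

Because the masked values keep their shape (a valid-looking email, a card number ending in its real last four digits), reports and joins on the non-sensitive columns keep working for non-privileged users.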

Methods of implementing network security in a data lakehouse

The network is the lifeline of the data lakehouse: data flows in and out of the lakehouse through it, and it also acts as a conduit to other systems. Therefore, it is paramount to ensure that the network is well protected. All the other data security components work in tandem with the network security service. Network security aims to protect the usability and integrity of your network by managing access to it and by mitigating threats, stopping them from entering or spreading across the network.
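One basic building block of network security is filtering traffic by source address. The following sketch admits a connection only if its source IP address falls inside an allowed range and denies everything else by default. It uses Python's standard ipaddress module; the address ranges are hypothetical, and in a real deployment this filtering is usually enforced by the cloud platform's firewall rules or network security groups rather than by application code.

```python
import ipaddress

# Hypothetical ranges for the lakehouse's virtual network and an on-premises site.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def is_allowed(source_ip: str) -> bool:
    """Admit traffic only from allowed networks; deny everything else."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # malformed addresses are rejected, never trusted
    # An IPv6 address is never "in" an IPv4 network, so it is denied here too.
    return any(address in network for network in ALLOWED_NETWORKS)


for ip in ("10.20.7.15", "192.168.50.200", "203.0.113.9", "not-an-ip"):
    print(ip, "allowed" if is_allowed(ip) else "denied")
```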

Earlier in this chapter, we covered the network security service at a high level. Let's now drill down into the network security service and discuss what it entails. The following figure dives deeper into its subcomponents:

Figure 7.9 – The subcomponents of network security services

Let's break down this figure...

Summary

This chapter covered the components of the data security layer, one of the essential layers of a data lakehouse. Data security is paramount in any system, and its importance is accentuated when it comes to protecting data access, usage, and storage. This chapter covered how to secure data and ensure that the right access is provided at the right layer to the right stakeholder. The chapter began with an overview of the four components of the data security layer and then discussed how these four components interact with each other. The following sections delved deeper into each component. The first component discussed was IAM, which ensures that the right user gets access to the right component with the correct authorization level. We also discussed the principles of zero-trust architecture. The next section discussed the ways data can be encrypted in a data lakehouse using a data encryption service. When the data is encrypted, it cannot be deciphered, even if...

Further reading
