You're reading from Data Lakehouse in Action

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801815932
Edition: 1st Edition
Author: Pradeep Menon

Pradeep Menon is a seasoned data analytics professional with more than 18 years of experience in data and AI. Pradeep can balance business and technical aspects of any engagement and cross-pollinate complex concepts across many industries and scenarios. Currently, Pradeep works as a data and AI strategist at Microsoft. In this role, he is responsible for driving big data and AI adoption for Microsoft’s strategic customers across Asia. Pradeep is also a distinguished speaker and blogger and has given numerous keynotes on cloud technologies, data, and AI.

Chapter 8: Implementing a Data Lakehouse on Microsoft Azure

We have come a long way in our journey exploring modern data analytics architecture through the concept of the data lakehouse. All seven layers of the data lakehouse were covered in detail in the preceding chapters. Although you can employ the same architecture on any cloud service or even on-premises, I will implement a data lakehouse on the Microsoft Azure platform.

This chapter will start by refreshing the concepts covered in Chapter 1, Introducing the Evolution of Data Analytics Patterns, and Chapter 2, The Data Lakehouse Architecture Overview, and establishing the advantages of using cloud computing. The following three sections of the chapter will focus on the cloud services used to bring to fruition the data lakehouse architecture. Finally, the chapter will explain the services in Microsoft Azure that you can use to realize the data lakehouse architecture. We will discuss the key features of each of these services...

Why is cloud computing apt for implementing a data lakehouse?

Previous chapters discussed why cloud computing is apt for implementing a data lakehouse. This section recaps and consolidates the top three reasons for adopting cloud computing for a data lakehouse.

The rapid advancements in cloud computing facilitate data analytics

Recall that in Chapter 1, Introducing the Evolution of Data Analytics Patterns, we discussed the five factors that have caused the perfect data storm. The following figure recaps these five factors:

Figure 8.1 – Ingredients of the perfect data storm

One of the five factors was The Advancement of Cloud Computing. As discussed extensively in Chapter 1, Introducing the Evolution of Data Analytics Patterns, cloud computing adoption has risen steadily since 2010. Worldwide spending on the public cloud started at around $77 billion in 2010 and reached around $441 billion in 2020. Cloud computing eliminates...
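To put the two spending figures quoted above in perspective, the following minimal sketch derives the growth multiple and the implied compound annual growth rate. Only the two dollar amounts and the two years come from the text; the `cagr` helper and the choice of a compound-growth model are assumptions added here for illustration, not a calculation made in the book.

```python
# Figures quoted in the text, in billions of US dollars.
SPEND_2010 = 77.0
SPEND_2020 = 441.0
YEARS = 2020 - 2010


def cagr(start, end, years):
    """Compound annual growth rate between a start and an end value.

    Hypothetical helper for illustration; it assumes smooth compounding,
    which real year-by-year spending will not follow exactly.
    """
    return (end / start) ** (1 / years) - 1


growth_multiple = SPEND_2020 / SPEND_2010
annual_rate = cagr(SPEND_2010, SPEND_2020, YEARS)

print(f"Growth multiple: {growth_multiple:.1f}x")
print(f"Implied compound annual growth: {annual_rate:.1%}")
```

Under these assumptions, spending grew a little under sixfold over the decade, which corresponds to roughly 19% compound growth per year.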

Implementing a data lakehouse on Microsoft Azure

The following figure maps the key cloud services available in Microsoft Azure for each of the seven layers of a data lakehouse:

Figure 8.4 – Azure services that realize data lakehouse layers

Each of these services has a plethora of features, and a deep dive into each one is beyond the scope of this book. However, we will explore each service in brief. It is also prudent to note the following about Azure data services:

A single service may be designed to fulfill multiple functions. Here, we will discuss specific features of the most convenient services frequently used to realize a particular component of the data lakehouse architecture.
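As a rough illustration of the point about one service covering several functions, the services that this chapter's summary names for the first two layers can be arranged as a simple lookup. This is only a sketch: the dictionary covers just the ingestion and processing layers mentioned in this excerpt, and the `LAYER_SERVICES` and `layers_using` names are hypothetical, not part of any Azure SDK.

```python
# Services named in this chapter's summary for the first two layers only.
# The remaining layers are not listed in this excerpt, so they are omitted.
LAYER_SERVICES = {
    "data ingestion": [
        "Azure Data Factory",
        "Event Hubs",
    ],
    "data processing": [
        "Azure Databricks",
        "Azure Data Factory (data flows)",
        "Azure Data Explorer",
        "HDInsight",
    ],
}


def layers_using(service_name):
    """Return the layers whose service list mentions the given service name."""
    return [
        layer
        for layer, services in LAYER_SERVICES.items()
        if any(service_name in service for service in services)
    ]


print(layers_using("Azure Data Factory"))
```

Azure Data Factory shows up under both layers here, once for ingestion and once through its data flows for processing, which is the kind of overlap the note above describes.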

The data ingestion layer on Microsoft Azure

The first layer is the data ingestion layer. Recall that in Chapter 3, Ingesting and Processing Data in a Data Lakehouse, we covered the architectural considerations of the data ingestion layer. This...

Summary

This chapter gave a flavor of how the concept of the data lakehouse is implemented on a cloud computing platform. We started by asking why cloud computing is apt for implementing a data lakehouse. Then, we revisited the factors that make cloud computing the optimal platform for implementing the data lakehouse architecture. The next section of the chapter focused on implementing the data lakehouse architecture on Microsoft Azure. We peeled back layer after layer and discussed the Azure services that you can use to realize each specific component.

We started with the data ingestion layer and discussed services such as Azure Data Factory and Event Hubs that enable batch and stream data ingestion. Next, we moved on to the data processing layer. We explored services such as Azure Databricks, ADF's data flows, Azure Data Explorer, and HDInsight that can be used to process batch and streaming data. Next, we focused on the data lake...
