
You're reading from Learn Microsoft Fabric

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781835082287
Edition: 1st
Authors (2):
Arshad Ali

Arshad Ali is a principal product manager at Microsoft, working on the Microsoft Fabric product team in Redmond, WA. He focuses on Spark Runtime, which empowers both data engineering and data science experiences. In his previous role, he helped strategic customers and partners adopt Azure Synapse and Microsoft Fabric. Arshad has more than 20 years of industry experience and has been with Microsoft for over 16 years. He is the co-author of the book Big Data Analytics with Azure HDInsight and the author of over 200 technical articles and blogs on data and analytics. Arshad holds an MBA from the Foster School of Business at the University of Washington and an MCA from India.
Read more about Arshad Ali

Bradley Schacht

Bradley Schacht is a principal program manager on the Microsoft Fabric product team based in Saint Augustine, Florida. Bradley is a former consultant and trainer and has co-authored five books on SQL Server and Power BI. As a member of the Microsoft Fabric product team, Bradley works directly with customers to solve some of their most complex data problems and helps shape the future of Microsoft Fabric. Bradley gives back to the community by speaking at events, such as the PASS Summit, SQL Saturday, Code Camp, and user groups across the country, including locally at the Jacksonville SQL Server User Group (JSSUG). He is a contributor on SQLServerCentral and blogs on his personal site, BradleySchacht.
Read more about Bradley Schacht


Building an End-to-End Analytics System – Lakehouse

Traditionally, for their analytics needs, companies have struggled to manage two different systems: a relational data warehouse to manage and process primarily structured data, and a data lake for big data processing of primarily unstructured data. This has not only created data silos and redundancy across multiple systems but has also increased development and management effort and the total cost of ownership. Microsoft Fabric bridges this gap by unifying these data stores (data warehouses and data lakes), standardizing storage on the Delta Lake format in OneLake for lakehouses.

In this chapter, we are going to take an example of a retail organization and build its end-to-end analytics system based on a lakehouse from start to finish—all the way from data ingestion and transformation to reporting and visualization. The key stages are as follows:

  • Creating a lakehouse using...

Technical requirements

This chapter assumes that you have followed the instructions mentioned in the Getting started with Microsoft Fabric section in the previous chapter to create/enable Fabric in your tenant and have created a Fabric workspace to work in.

The code files for this chapter are available on GitHub: https://github.com/PacktPublishing/Learn-Microsoft-Fabric/tree/main/ch3.

Once you arrive at this link, you can open an individual notebook and then click on the Download raw file icon at the top right of the preview pane to download that notebook file.

You can also go to https://github.com/PacktPublishing/Learn-Microsoft-Fabric/ and click on Download ZIP under the Code button near the top of the page to download all the notebook files in one go.

Understanding end-to-end scenarios

A lakehouse in Microsoft Fabric is a data storage layer that allows organizations to store and manage virtually any type of data (structured, semi-structured, and unstructured) in a single location, while allowing various tools and frameworks to process and analyze that data according to organizational needs and individual preference.

A lakehouse combines the best aspects of a data lake and a data warehouse, removing the duplication of data and the friction of ingesting, transforming, and sharing organizational data, all in the open Delta Lake format. Ingested data flows into the lakehouse in the Delta Lake format (https://delta.io/) by default, and tables are automatically discovered and registered in the metastore on behalf of users, so they are seamlessly available to all the engines within Fabric.

A data analytics system based on a lakehouse typically follows Medallion architecture (https://learn.microsoft.com/en-us...

Storage

In this section, you will create three lakehouses (one for each zone of the Medallion architecture) by following these steps:

  1. Once logged into your Fabric tenant, select the Workspaces flyout on the left-hand side.
  2. Search for the workspace that you created in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, by typing its name in the search textbox at the top and clicking on your workspace to open it. You can also pin it so that it always appears at the top of the list.
  3. From the workload switcher located at the bottom left of the screen, select Data Engineering.
  4. In the Data Engineering experience, select Lakehouse under + New to create a lakehouse.
  5. Enter wwi_bronze in the Name box and click on Create. The new lakehouse will be created and automatically opened.

Repeat steps 4–5 to create two more lakehouses named wwi_silver and wwi_gold. When you switch to the workspace again, you...
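Each lakehouse created above is backed by storage in OneLake, addressable via an ABFS URI of the form `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...`. The following sketch builds those paths for the three Medallion zones; the workspace name `learn_fabric` is a placeholder for whatever workspace you created in Chapter 2.

```python
# Illustrative helper: build OneLake ABFS URIs for the three Medallion-zone
# lakehouses. "learn_fabric" is a placeholder workspace name - substitute
# your own workspace from Chapter 2.

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def lakehouse_path(workspace: str, lakehouse: str,
                   section: str = "Files", subpath: str = "") -> str:
    """Return the ABFS URI for a lakehouse's Files or Tables section."""
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{section}"
    return f"{base}/{subpath}" if subpath else base

ZONES = ["wwi_bronze", "wwi_silver", "wwi_gold"]
paths = {zone: lakehouse_path("learn_fabric", zone) for zone in ZONES}
```

These URIs are what you would pass to Spark readers and writers in later sections when addressing a zone directly rather than through the default lakehouse mount.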

Ingestion

In this section, you will use a Data Factory pipeline to ingest sample data from a source (an Azure storage account) into the Files section of the Bronze zone (wwi_bronze) of the Medallion architecture:

  1. Choose the workspace that you created in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, from the Workspaces flyout on the left-hand side and open it. Create a Data pipeline from the +New button on the workspace page. If you don’t see an option for Data pipeline, click on the Show All menu item at the bottom and then select Data pipeline under Data Factory.
Figure 3.4 – Creating a new data pipeline

  2. In the New pipeline dialog, enter IngestDataFromSourceToBronze as the name and click on Create. This will create a new data factory pipeline and open its canvas on the screen to work on.
  3. On the newly created data factory pipeline, click on Add pipeline activity to add an activity to...

Transformation

Now that you have ingested the raw data from the source into the Files section of the wwi_bronze lakehouse, the next step is to transform and prepare this data to create Delta Lake tables in the wwi_silver lakehouse.
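In the book, this Bronze-to-Silver step is carried out by the Spark notebooks you import below. As a plain-Python illustration of the kind of cleaning those notebooks perform (casting raw string fields to typed values and dropping duplicate rows), consider this sketch; the field names (SaleKey, SaleDate, Quantity, TotalPrice) are hypothetical stand-ins, not the actual schema from the sample data.

```python
# Plain-Python sketch of typical Bronze -> Silver cleaning logic:
# cast raw string fields to typed values and deduplicate on a key.
# Field names here are hypothetical, not the book's actual schema.
from datetime import date

def to_silver(raw_rows):
    """Cast types and keep only the first occurrence of each SaleKey."""
    seen, silver = set(), []
    for row in raw_rows:
        key = row["SaleKey"]
        if key in seen:  # drop duplicate rows
            continue
        seen.add(key)
        silver.append({
            "SaleKey": int(key),
            "SaleDate": date.fromisoformat(row["SaleDate"]),
            "Quantity": int(row["Quantity"]),
            "TotalPrice": float(row["TotalPrice"]),
        })
    return silver

raw = [
    {"SaleKey": "1", "SaleDate": "2024-02-01", "Quantity": "3", "TotalPrice": "29.97"},
    {"SaleKey": "1", "SaleDate": "2024-02-01", "Quantity": "3", "TotalPrice": "29.97"},  # duplicate
    {"SaleKey": "2", "SaleDate": "2024-02-02", "Quantity": "1", "TotalPrice": "9.99"},
]
clean = to_silver(raw)
```

In the actual notebooks, the same intent is expressed declaratively with Spark DataFrame operations (casts plus `dropDuplicates`), and the result is written out as a Delta table in wwi_silver.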

Importing notebooks

The first step is to import notebooks using the following steps:

  1. Download the notebooks found in the ch3 folder of this chapter’s GitHub repo (https://github.com/PacktPublishing/Learn-Microsoft-Fabric/tree/main/ch3) to your local machine. If required, unzip or uncompress them.
  2. From the workload switcher located at the bottom left of the screen, select Data Engineering. Select Import notebook from the New section at the top of the Data Engineering experience landing page.
Figure 3.12 – The option to import notebooks

  3. Select Upload from the Import status pane that opens on the right-hand side of the screen. Select all three notebooks that were...

Analyze

Now that we have data integrated into the lakehouse and prepared for reporting, we’ll analyze this data to get insights. We will look at two methods: first, we will use Power BI to create visualizations (reports and dashboards), and then we will use the SQL endpoint to connect to the lakehouse and run analytical queries.

Power BI

Power BI is natively integrated within the whole Fabric experience; this native integration brings a unique mode of accessing the data (called Direct Lake, which we discussed in earlier chapters) from the lakehouse to provide the most performant query and reporting experience. Let’s create a report based on the data from the Gold zone:

  1. Open the wwi_gold lakehouse and click on SQL endpoint under the mode selection at the top right of the screen to switch the selected lakehouse to SQL endpoint mode.
Figure 3.23 – Switching to SQL endpoint mode

  2. Once you are in SQL endpoint...
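The SQL endpoint exposes the lakehouse’s Delta tables to standard T-SQL clients. To show the shape of an analytical query without a live Fabric connection, the sketch below runs an equivalent aggregate against an in-memory SQLite database; the table and column names (fact_sale, CityKey, TotalPrice) are hypothetical stand-ins for Gold-zone tables.

```python
# Stand-in for an analytical query against the wwi_gold SQL endpoint.
# SQLite is used here only so the example is self-contained; table and
# column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sale (CityKey INTEGER, TotalPrice REAL)")
conn.executemany(
    "INSERT INTO fact_sale VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# Revenue per city, highest first - the kind of aggregate you would run
# in the SQL endpoint's query editor or from any T-SQL client.
rows = conn.execute(
    "SELECT CityKey, SUM(TotalPrice) AS Revenue "
    "FROM fact_sale GROUP BY CityKey ORDER BY Revenue DESC"
).fetchall()
```

Against the real endpoint, the same statement would be issued over the endpoint’s SQL connection string rather than SQLite, but the query text itself is standard SQL.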

Orchestrate data ingestion and transformation flow and schedule notebooks and pipelines

Fabric provides flexibility in how you schedule your jobs. For example, you can schedule a notebook by clicking on the settings (cogwheel) icon at the top under the Home menu tab when the notebook is open, or by clicking on the ellipsis (…) next to the name of the notebook in the workspace item view and then clicking on the Setting menu.

On the Setting page, click on the Schedule tab and define the schedule for this notebook to be executed.

Figure 3.36 – Schedule a notebook

Furthermore, if you have multiple notebooks/jobs, some of which you would like to be executed in parallel while others in sequence, then you can create a data pipeline and define a schedule for when and how frequently this pipeline should be executed. Figure 3.37 shows an example pipeline that has three activities being executed in sequence (this is just one example; you might have...
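The parallel-then-sequential pattern a pipeline expresses can be sketched in plain Python: two independent jobs run concurrently, and a third runs only after both finish. The job functions here are stand-ins for Fabric notebook or pipeline activities, not real Fabric APIs.

```python
# Sketch of pipeline orchestration: two independent "notebook" jobs run in
# parallel, then a downstream job runs only after both complete. The job
# functions are placeholders for Fabric activities.
from concurrent.futures import ThreadPoolExecutor

log = []

def load_customers():
    log.append("customers")

def load_sales():
    log.append("sales")

def build_gold():
    log.append("gold")

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(load_customers), pool.submit(load_sales)]
    for f in futures:
        f.result()   # block until both parallel activities finish
build_gold()         # downstream activity runs only after the parallel stage
```

In Fabric itself, you express the same dependencies visually by wiring activities together on the pipeline canvas and attaching a schedule to the pipeline rather than to each notebook individually.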

Data meshes in Fabric – a primer

A data mesh is a federated data architecture that emphasizes decentralizing data across business functions or domains such as marketing, sales, human resources, and more. It organizes and manages data logically, enabling more targeted and efficient use and governance of data across the organization. This gives more ownership to the producers of a given dataset, encouraging a shift away from giant, monolithic enterprise-wide data architectures.

Important note

The term data mesh was coined by Zhamak Dehghani (https://martinfowler.com/articles/data-mesh-principles.html) and is founded on four principles: “domain-driven ownership of data”, “data as a product”, “self-serve data infrastructure platform”, and “federated governance”. A detailed discussion about data meshes is out of the scope of this chapter; however, you can learn more about it at https...

Summary

Since a lakehouse based on the Medallion architecture combines the best of data lakes and data warehouses by breaking down silos and removing data duplication, it is emerging as the de facto standard for building data platform architectures. Microsoft Fabric, with its native capabilities, makes it easy to build data analytics systems based on lakehouses.

In this chapter, you learned about creating an end-to-end lakehouse-based data analytics system. You learned about the different components in this architecture pattern and how to implement them quickly to derive business value. Further, you learned about ingesting data from a data source into your lakehouse using pipelines, transforming this data with notebooks/Spark, and then using Power BI—with its new Direct Lake mode—to create reports and dashboards. You also learned about the capabilities that Fabric provides to build a decentralized data architecture with data meshes.

In the next chapter, you...
