

Building an End-to-End Analytics System – Data Warehouse

While the world of data and analytics shifts toward the lake-centric approach introduced in the previous chapter, there are still many scenarios that are well served by a traditional data warehouse. Development teams often have deep, longstanding SQL skills that can be applied directly to building a data warehouse. In other cases, an enterprise data lake or lakehouse feeds downstream data warehouses that are purpose-built by various business units (BUs) or departments and serve as the gold layer in a medallion architecture. And in some smaller organizations, where data is centralized from all (or mostly) relational sources, there is simply no need for the added complexity of a lakehouse.

In this chapter, you will explore the core functionality of the Fabric data warehouse by building an end-to-end solution that includes ingestion, transformation, and reporting. Many of the concepts...

Understanding the end-to-end scenario

A warehouse in Microsoft Fabric is a relational data store that organizes data into schemas and tables. There are a few different ways for developers to interact with the data warehouse, which are outlined in Table 4.1. It is important to note that Spark-based operations against the data warehouse, even those leveraging Spark SQL, are read-only, while T-SQL commands issued from the SQL endpoint are read/write:

    Method                                  Read-capable on warehouse   Write-capable on warehouse
    T-SQL                                   Yes                         Yes
    Table access via lakehouse shortcut     Yes                         No
    Power BI                                Yes                         No
    Data Factory pipelines                  Yes                         Yes

Table 4.1 – Ways of interacting with a Fabric data warehouse
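As a quick illustration of the read/write T-SQL path, both of the following statements succeed when issued against the warehouse through its SQL endpoint, whereas only the SELECT would be possible through a lakehouse shortcut or Power BI. This is a minimal sketch only; the table and column names are assumed from the WideWorldImportersDW sample used in this chapter, and the UPDATE is purely illustrative:

    -- Read: every access method in Table 4.1 can do this
    SELECT TOP 10 * FROM dbo.DimCity;

    -- Write: only T-SQL issued against the warehouse endpoint can modify data;
    -- Spark (via a lakehouse shortcut) and Power BI are read-only here
    UPDATE dbo.DimCity SET City = 'Seattle' WHERE CityKey = 1;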

Creating a data warehouse

In this section, you will create a data warehouse that will store the data used throughout this chapter. To get started, follow these steps:

  1. Open your web browser, navigate to Microsoft Fabric (https://app.fabric.microsoft.com), and sign in.
  2. From the list of experiences, select Synapse Data Warehouse:
Figure 4.2 – Microsoft Fabric landing page and experience list

  3. Select Workspaces from the left-hand navigation bar. Find the Learn Microsoft Fabric workspace created in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, by typing its name into the search box at the top, then click the workspace to open it. You can also pin it so that it always appears at the top of the list.
  4. Select New | Warehouse in the top-left corner of the screen:
Figure 4.3 – Menu of new items found on the Fabric workspace view

  5. On the...

Loading data

As previously discussed, there are several ways to load data into a warehouse. In this section, we will explore two methods: the Data Factory copy activity and T-SQL. The loading method you choose will largely depend on two factors: developer skillset and data source connectivity.

Data warehouse professionals are very likely to be proficient in writing T-SQL and may, therefore, gravitate toward a code-based approach that uses the COPY command. Analysts, who may not have as deep a T-SQL skill set, may opt for the GUI-based approach with Data Factory. Even when an individual prefers a code-based approach, the fact that the COPY command can only read from a narrow set of locations may make the copy activity a better fit. There are other factors to consider, such as enterprise standards and ETL frameworks, that may also inform the choice of tool.
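To make the code-based option concrete, here is a minimal, hedged sketch of the T-SQL COPY command; the storage URL is a placeholder rather than the book's actual sample path, and the target table is assumed to be a stage table like those used later in this chapter:

    -- Illustrative sketch: the storage URL below is a placeholder, not the
    -- book's actual sample location
    COPY INTO stage.DimDate
    FROM 'https://<storage-account>.blob.core.windows.net/<container>/dimension_date/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET'  -- CSV is also supported, with delimiter options
    );

Note that COPY reads only from Azure storage locations (hence the narrow set of locations mentioned above), while the Data Factory copy activity supports a much broader range of source connectors.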

Note

In a real-world scenario, it is unlikely that you would load data from the same source using different...

Data transformation using T-SQL

With the data now loaded, it is time to transform it into an aggregate reporting table. In this section, you will create and execute a stored procedure that builds an aggregate table, which will be used later in this chapter as the base for a Power BI report.

Let’s look at transforming data using T-SQL:

  1. While still in the query editor of the WideWorldImportersDW data warehouse, click on New SQL query from the ribbon.
  2. In the query window, enter the following code:

Note

The following code should not be used as an example of best practices. In a real-world scenario, loading a data warehouse will involve generating a surrogate key (warehouse key), updating dimension data based on type 1 and type 2 attributes, handling NULL dimension attributes, looking up surrogate key values when loading a fact table, and more. The following code represents a simplified approach to loading dimension and fact tables to show the general...
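Since the book's listing is truncated here, the following is a minimal illustrative sketch of the kind of stored procedure this step might create: an aggregate table built with CREATE TABLE AS SELECT (CTAS). This is not the book's actual code; the procedure and aggregate table names are hypothetical, and the source table and column names are assumed from the WideWorldImportersDW sample:

    CREATE PROCEDURE dbo.LoadAggregateSales
    AS
    BEGIN
        -- Full refresh: drop and rebuild the aggregate on every run (simplified)
        DROP TABLE IF EXISTS dbo.AggSalesByCity;

        -- CTAS: summarize sales by city and invoice year for the Power BI report
        CREATE TABLE dbo.AggSalesByCity
        AS
        SELECT
            c.City,
            YEAR(f.InvoiceDateKey)   AS InvoiceYear,
            SUM(f.TotalIncludingTax) AS TotalSalesAmount
        FROM dbo.FactSale AS f
        INNER JOIN dbo.DimCity AS c
            ON f.CityKey = c.CityKey
        GROUP BY c.City, YEAR(f.InvoiceDateKey);
    END

Once created, the procedure can be executed from the same query window with EXEC dbo.LoadAggregateSales;.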

Orchestrating ETL operations with Data Factory pipelines

Data Factory pipelines are an extremely useful tool in analytics projects. Earlier in this chapter, you created a data pipeline that runs a Copy data activity to load the stage.DimCity table. This data was later used in a stored procedure to load a dimensional model table called dbo.DimCity. It is now time to extend the pipeline to orchestrate the entire ETL process, which will include the following:

  1. Dropping and recreating all stage schema tables so that data is not duplicated from prior runs (a T-SQL sketch of this step follows the list)
  2. Loading the stage.DimCity table using the Copy data activity
  3. Loading the stage.DimDate and stage.FactSale tables using the T-SQL COPY command by executing a stored procedure
  4. Incrementally loading the dimensional model using a stored procedure
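As a hedged sketch of step 1, a stored procedure along the following lines could reset the stage schema before each run, with the pipeline invoking it through a Stored procedure activity. The procedure name and the abbreviated column lists are illustrative, not the book's exact DDL:

    CREATE PROCEDURE stage.ResetStageTables
    AS
    BEGIN
        -- Drop the stage tables so a rerun never duplicates previously loaded rows
        DROP TABLE IF EXISTS stage.DimCity;
        DROP TABLE IF EXISTS stage.DimDate;
        DROP TABLE IF EXISTS stage.FactSale;

        -- Recreate them empty (columns abbreviated for illustration)
        CREATE TABLE stage.DimCity  (CityKey INT, City VARCHAR(50), StateProvince VARCHAR(50));
        CREATE TABLE stage.DimDate  (DateKey DATE, CalendarYear SMALLINT, CalendarMonth VARCHAR(20));
        CREATE TABLE stage.FactSale (CityKey INT, InvoiceDateKey DATE, TotalIncludingTax DECIMAL(18, 2));
    END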

Let’s extend the pipeline created earlier in the chapter:

  1. Return to the pipeline created in the Loading data section earlier in the chapter by navigating to...

Analyzing data with Power BI

The value derived from data is only truly realized when it can be used to make business decisions. To this point, we have explored creating a warehouse, loading tables, and transforming data within a Fabric data warehouse. Now, it is time to refine the data model for reporting. To begin, we will define relationships between the tables in the data model, followed by creating visuals in a Power BI report.

Let’s look at setting up a Power BI report:

  1. Return to the WWI data warehouse by navigating to your workspace and then selecting WideWorldImportersDW with a type of Warehouse from the list of workspace items.
  2. In the bottom-left corner of the screen, navigate to the Model view:
Figure 4.28 – Options for different views available in the data warehouse

  3. Create a relationship between the dbo.FactSale and dbo.DimDate tables by dragging the InvoiceDateKey field from dbo.FactSale and dropping it...

Summary

As lakehouse implementations grow in popularity, data warehouses still play a key role in data-driven organizations. Many of the patterns that were explored in the prior lakehouse chapter can also be applied to a data warehouse. The major difference lies in the developer skill set: lakehouses are Spark-centric, while data warehouses are T-SQL-centric. For the foreseeable future, the two will live and work in tandem, and Fabric provides a seamless experience for combining lakehouses and warehouses through OneCopy, cross-database querying, and standardization on the Delta format.
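As a brief illustration of cross-database querying, T-SQL run against a warehouse can reference a lakehouse's SQL analytics endpoint in the same workspace using three-part naming. The lakehouse and table names below are hypothetical:

    -- Hypothetical names: SalesLakehouse stands in for a lakehouse SQL analytics
    -- endpoint in the same workspace as the WideWorldImportersDW warehouse
    SELECT c.City, o.OrderTotal
    FROM WideWorldImportersDW.dbo.DimCity AS c
    INNER JOIN SalesLakehouse.dbo.Orders AS o
        ON c.CityKey = o.CityKey;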

In this chapter, you learned how to build out an end-to-end data warehouse analytics system. You learned how to create a data warehouse, how to load warehouse tables using both code-first and no-code approaches, how to transform data using T-SQL, and how to curate a data model that can then be used in a Power BI report. These are by no means the full extent of the data warehouse functionality but instead represent...

