

Building an End-to-End Analytics System – Data Warehouse

While the world of data and analytics shifts toward the lake-centric approach introduced in the previous chapter, there are still many scenarios that are well served by a traditional data warehouse. Development teams often have deep, longstanding SQL skills that can be applied directly to building a data warehouse. In other cases, an enterprise data lake or lakehouse feeds downstream data warehouses that are purpose-built by various business units (BUs) or departments and serve as the gold layer in a medallion architecture. And in some smaller organizations, where data is centralized from all (or mostly) relational sources, there is simply no need for the added complexity of a lakehouse.

In this chapter, you will explore the core functionality of the Fabric data warehouse by building an end-to-end solution that includes ingestion, transformation, and reporting. Many of the concepts...

Understanding the end-to-end scenario

A warehouse in Microsoft Fabric is a relational data store that organizes data into schemas and tables. There are a few different ways for developers to interact with the data warehouse, which are outlined in Table 4.1. It is important to note that Spark-based operations against the data warehouse, even those leveraging Spark SQL, are read-only, while T-SQL commands issued from the SQL endpoint are read/write:

    Method                                  Read-capable on warehouse   Write-capable on warehouse
    T-SQL                                   Yes                         Yes
    Table access via lakehouse shortcut     Yes                         No
    Power BI                                Yes                         No
    Data Factory pipelines                  Yes                         Yes

Table 4.1 – Ways of interacting with a Fabric data warehouse
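As a quick illustration of the read/write T-SQL path, both of the following statements succeed when issued against the warehouse through its SQL endpoint, whereas only the SELECT would be possible through a lakehouse shortcut or Power BI. This is a minimal sketch only; the table and column names are assumed from the WideWorldImportersDW sample used in this chapter, and the UPDATE is purely illustrative:

    -- Read: every access method in Table 4.1 can do this
    SELECT TOP 10 * FROM dbo.DimCity;

    -- Write: only T-SQL issued against the warehouse endpoint can modify data;
    -- Spark (via a lakehouse shortcut) and Power BI are read-only here
    UPDATE dbo.DimCity SET City = 'Seattle' WHERE CityKey = 1;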

Creating a data warehouse

In this section, you will create a data warehouse that will store the data used throughout this chapter. To get started, follow these steps:

  1. Open your web browser, navigate to Microsoft Fabric (https://app.fabric.microsoft.com), and sign in.
  2. From the list of experiences, select Synapse Data Warehouse:
Figure 4.2 – Microsoft Fabric landing page and experience list

  3. Select Workspaces from the left-hand navigation bar. Find the Learn Microsoft Fabric workspace created in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, by typing its name into the search box at the top, then click the workspace to open it. You can also pin it so that it always appears at the top of the list.
  4. Select New | Warehouse in the top-left corner of the screen:
Figure 4.3 – Menu of new items found on the Fabric workspace view

  5. On the...

Loading data

As previously discussed, there are several ways to load data into a warehouse. In this section, we will explore two methods: the Data Factory copy activity and T-SQL. The loading method you choose will largely depend on two factors: developer skillset and data source connectivity.

Data warehouse professionals are very likely to be proficient in writing T-SQL and may, therefore, gravitate toward a code-based approach that uses the COPY command. Analysts, who may not have as deep a T-SQL skill set, may opt for the GUI-based approach with Data Factory. Even when an individual prefers a code-based approach, the fact that the COPY command can only read from a narrow set of locations may make the copy activity a better fit. There are other factors to consider, such as enterprise standards and ETL frameworks, that may also inform the choice of tool.
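To make the code-based option concrete, here is a minimal, hedged sketch of the T-SQL COPY command; the storage URL is a placeholder rather than the book's actual sample path, and the target table is assumed to be a stage table like those used later in this chapter:

    -- Illustrative sketch: the storage URL below is a placeholder, not the
    -- book's actual sample location
    COPY INTO stage.DimDate
    FROM 'https://<storage-account>.blob.core.windows.net/<container>/dimension_date/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET'  -- CSV is also supported, with delimiter options
    );

Note that COPY reads only from Azure storage locations (hence the narrow set of locations mentioned above), while the Data Factory copy activity supports a much broader range of source connectors.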

Note

In a real-world scenario, it is unlikely that you would load data from the same source using different...

Data transformation using T-SQL

With the data now loaded, it is time to transform it into an aggregate reporting table. In this section, you will create and execute a stored procedure that builds an aggregate table, which will be used later in this chapter as the base for a Power BI report.

Let’s look at transforming data using T-SQL:

  1. While still in the query editor of the WideWorldImportersDW data warehouse, click on New SQL query from the ribbon.
  2. In the query window, enter the following code:

Note

The following code should not be used as an example of best practices. In a real-world scenario, loading a data warehouse will involve generating a surrogate key (warehouse key), updating dimension data based on type 1 and type 2 attributes, handling NULL dimension attributes, looking up surrogate key values when loading a fact table, and more. The following code represents a simplified approach to loading dimension and fact tables to show the general...
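Since the book's listing is truncated here, the following is a minimal illustrative sketch of the kind of stored procedure this step might create: an aggregate table built with CREATE TABLE AS SELECT (CTAS). This is not the book's actual code; the procedure and aggregate table names are hypothetical, and the source table and column names are assumed from the WideWorldImportersDW sample:

    CREATE PROCEDURE dbo.LoadAggregateSales
    AS
    BEGIN
        -- Full refresh: drop and rebuild the aggregate on every run (simplified)
        DROP TABLE IF EXISTS dbo.AggSalesByCity;

        -- CTAS: summarize sales by city and invoice year for the Power BI report
        CREATE TABLE dbo.AggSalesByCity
        AS
        SELECT
            c.City,
            YEAR(f.InvoiceDateKey)   AS InvoiceYear,
            SUM(f.TotalIncludingTax) AS TotalSalesAmount
        FROM dbo.FactSale AS f
        INNER JOIN dbo.DimCity AS c
            ON f.CityKey = c.CityKey
        GROUP BY c.City, YEAR(f.InvoiceDateKey);
    END

Once created, the procedure can be executed from the same query window with EXEC dbo.LoadAggregateSales;.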

Orchestrating ETL operations with Data Factory pipelines

Data Factory pipelines are an extremely useful tool in analytics projects. Earlier in this chapter, you created a data pipeline that runs a Copy data activity to load the stage.DimCity table. This data was later used in a stored procedure to load a dimensional model table called dbo.DimCity. It is now time to extend the pipeline to orchestrate the entire ETL process, which will include the following:

  1. Dropping and recreating all stage schema tables so that data is not duplicated from prior runs (a T-SQL sketch of this step follows the list)
  2. Loading the stage.DimCity table using the Copy data activity
  3. Loading the stage.DimDate and stage.FactSale tables using the T-SQL COPY command by executing a stored procedure
  4. Incrementally loading the dimensional model using a stored procedure
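As a hedged sketch of step 1, a stored procedure along the following lines could reset the stage schema before each run, with the pipeline invoking it through a Stored procedure activity. The procedure name and the abbreviated column lists are illustrative, not the book's exact DDL:

    CREATE PROCEDURE stage.ResetStageTables
    AS
    BEGIN
        -- Drop the stage tables so a rerun never duplicates previously loaded rows
        DROP TABLE IF EXISTS stage.DimCity;
        DROP TABLE IF EXISTS stage.DimDate;
        DROP TABLE IF EXISTS stage.FactSale;

        -- Recreate them empty (columns abbreviated for illustration)
        CREATE TABLE stage.DimCity  (CityKey INT, City VARCHAR(50), StateProvince VARCHAR(50));
        CREATE TABLE stage.DimDate  (DateKey DATE, CalendarYear SMALLINT, CalendarMonth VARCHAR(20));
        CREATE TABLE stage.FactSale (CityKey INT, InvoiceDateKey DATE, TotalIncludingTax DECIMAL(18, 2));
    END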

Let’s extend the pipeline created earlier in the chapter:

  1. Return to the pipeline created in the Loading data section earlier in the chapter by navigating to...

Analyzing data with Power BI

The value derived from data is only truly realized when it can be used to make business decisions. To this point, we have explored creating a warehouse, loading tables, and transforming data within a Fabric data warehouse. Now, it is time to refine the data model for reporting. To begin, we will define relationships between the tables in the data model, followed by creating visuals in a Power BI report.

Let’s look at setting up a Power BI report:

  1. Return to the WWI data warehouse by navigating to your workspace and then selecting WideWorldImportersDW with a type of Warehouse from the list of workspace items.
  2. In the bottom-left corner of the screen, navigate to the Model view:
Figure 4.28 – Options for different views available in the data warehouse

  3. Create a relationship between the dbo.FactSale and dbo.DimDate tables by dragging the InvoiceDateKey field from dbo.FactSale and dropping it...

Summary

As lakehouse implementations grow in popularity, data warehouses still play a key role in data-driven organizations. Many of the patterns that were explored in the prior lakehouse chapter can also be applied to a data warehouse. The major difference lies in the developer skill set: lakehouses are Spark-centric, while data warehouses are T-SQL-centric. For the foreseeable future, the two will live and work in tandem, and Fabric provides a seamless experience for combining lakehouses and warehouses through OneCopy, cross-database querying, and standardization on the Delta format.
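As a brief illustration of cross-database querying, T-SQL run against a warehouse can reference a lakehouse's SQL analytics endpoint in the same workspace using three-part naming. The lakehouse and table names below are hypothetical:

    -- Hypothetical names: SalesLakehouse stands in for a lakehouse SQL analytics
    -- endpoint in the same workspace as the WideWorldImportersDW warehouse
    SELECT c.City, o.OrderTotal
    FROM WideWorldImportersDW.dbo.DimCity AS c
    INNER JOIN SalesLakehouse.dbo.Orders AS o
        ON c.CityKey = o.CityKey;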

In this chapter, you learned how to build out an end-to-end data warehouse analytics system. You learned how to create a data warehouse, how to load warehouse tables using both code-first and no-code approaches, how to transform data using T-SQL, and how to curate a data model that can then be used in a Power BI report. These are by no means the full extent of the data warehouse functionality but instead represent...

