Reader small image

You're reading from  Expert Data Modeling with Power BI - Second Edition

Product typeBook
Published inApr 2023
PublisherPackt
ISBN-139781803246246
Edition2nd Edition
Right arrow
Author (1)
Soheil Bakhshi
Soheil Bakhshi
author image
Soheil Bakhshi

Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil's experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.
Read more about Soheil Bakhshi

Right arrow

Star Schema Preparation in Power Query Editor

In the previous chapter, we learned about some common data preparation steps, including data type conversions, split columns, merge columns, adding a custom column, and filtering rows. We also learned how to create summary tables using the Group By feature, appending data, merging queries, and pivoting tables.

This chapter uses many topics discussed in the past few chapters and helps you learn how to prepare a Star Schema in Power Query Editor. Data modeling in Power BI starts with preparing a Star Schema. In this chapter, we use the Chapter 6, Sales Data.xlsx file, which contains flattened data. You can download the file from here: https://github.com/PacktPublishing/Expert-Data-Modeling-with-Power-BI-Second-Edition/blob/28e2af1762336ab5236a3b3961c41e9020de8200/Samples/Chapter%2006/Chapter%206,%20Sales%20Data.xlsx.

A common daily scenario is that we get a set of files containing data exported from a source system, and we need to...

Identifying dimensions and facts

When talking about a Star Schema, we automatically discuss dimensions and facts. In a Star Schema model, we usually keep all the numeric values in fact tables and put all the descriptive data in the dimension tables. But not all numeric values fall into fact tables. A typical example is Product Unit Price. If we need to do some calculations regarding the Product Unit Price, it is likely to be a part of our fact table; but if it filters or groups the data, it is a part of a dimension.

Designing a data model in a Star Schema is not possible unless we have a concrete understanding of the business requirements and a good level of understanding of the data.

Understanding business requirements

In this section, we will try to find the dimensions and facts based on the business requirements. We also discuss whether to create separate tables for them or not and why. In the following sections, we look at this in more detail and implement the required...

Creating Dimension tables

We should already be connected to the Chapter 6, Sales Data.xlsx file from Power Query Editor. In this section, we look at the necessity of creating the potential dimensions identified in the previous section. We first evaluate each dimension from a business requirement perspective. If we are convinced that the dimension is required, we create it.

Geography

The identified business requirements show that we must have a dimension keeping geographical data. When we look at the data, we can see geography-related columns in the Sales table. We can create a separate Geography dimension derived from the Sales table. However, this might not cover all business requirements.

As the following image shows, there are some geography-related columns in the Customer table. We must find commonalities in the data and think about the possibility of combining the data from both tables into a single Geography dimension. Using Column Distribution shows that the CustomerKey...

Creating fact tables

Now that we have created all the dimensions, it is time to create a fact table that contains numeric values and the primary keys of the dimensions as foreign keys. The Sales Base and Customer Base data show that the Sales Base table holds many numeric values. Therefore, a fact table can be derived from the Sales Base table, which we name Sales, by following these steps:

  1. Reference the Sales Base table and then rename the new table Sales.

We want to get ProductKey from the Product table. To do so, we can merge the Sales table with the Product table.

  1. Click Merge Queries.
  2. Select the ProductCategory, ProductSubcategory, and Product columns, respectively.
  3. Select the Product table from the drop-down list.
  4. Again, select the ProductCategory, ProductSubcategory, and Product columns.
  5. Select Left outer (all from first, matching from second) from the Join Kind drop-down list.
  6. Click OK:

Figure 6.37:...

Summary

In this chapter, we prepared the data in a Star Schema, which has been optimized for data analysis and reporting purposes on top of a flat data structure. We identified potential dimensions and discussed the reasons for creating or not creating separate dimension tables. We then went through the transformation steps to create the justified dimension tables. Finally, we added all the dimension key columns to the fact table and removed all the unnecessary columns, which gave us a tidy fact table that only contained all the necessary columns.

The next chapter covers an exciting and rather important topic: Data preparation common best practices. By following these best practices, we can avoid a lot of reworks and maintenance costs.

Join us on Discord!

Join The Big Data and Analytics Community on the Packt Discord Server!

Hang out with 558 other members and enjoy free voice and text chat.

https://packt.link/ips2H

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Expert Data Modeling with Power BI - Second Edition
Published in: Apr 2023Publisher: PacktISBN-13: 9781803246246
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Soheil Bakhshi

Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil's experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.
Read more about Soheil Bakhshi