Expert Data Modeling with Power BI

By Soheil Bakhshi
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Chapter 1: Introduction to Data Modeling in Power BI

About this book

Microsoft Power BI is one of the most popular business intelligence tools available on the market for desktop and the cloud. This book will be your guide to understanding the ins and outs of data modeling and how to create data models using Power BI confidently. You'll learn how to connect data from multiple sources, understand data, define and manage relationships between data, and shape data models.

In this book, you'll explore how to use data modeling and navigation techniques to define relationships and create a data model before defining new metrics and performing custom calculations using modeling features. As you advance through the chapters, the book will demonstrate how to create full-fledged data models, enabling you to create efficient data models and simpler DAX code with new data modeling features. With the help of examples, you'll discover how you can solve business challenges by building optimal data models and changing your existing data models to meet evolving business requirements. Finally, you'll learn how to use some new and advanced modeling features to enhance your data models to carry out a wide variety of complex tasks.

By the end of this Power BI book, you'll have gained the skills you need to structure data coming from multiple sources in different ways to create optimized data models that support reporting and data analytics.

Publication date:
June 2021
Publisher
Packt
Pages
612
ISBN
9781800205697

 

Chapter 1: Introduction to Data Modeling in Power BI

Power BI is not just a reporting tool that someone uses to build sophisticated reports; it is a platform supplying a wide range of features from data preparation to data modeling and data visualization. It is also a very well-designed ecosystem, giving a variety of users the ability to contribute to their organization's data analysis journey in many ways, from sharing datasets, reports, and dashboards to using their mobile phones to add some comments to a report, ask questions, and circulate it back to relevant people. All of this is only possible if we take the correct steps in building our Power BI ecosystem. A very eye-catching and beautiful report is worth nothing if it shows incorrect business figures or if the report is too slow to render so the user does not really have the appetite to use it.

One of the most important aspects of building a good Power BI ecosystem is getting the data right. In real-world scenarios, you normally get data from various data sources. Getting data from the data sources and mashing it up is just the beginning. Then you need to come up with a well-designed data model that guarantees you always represent the right figures supporting the business logic so the report performs well.

In this chapter, we'll start by learning about the different Power BI layers and how data flows between the different layers to be able to fix any potential issues more efficiently. Then, we'll study one of the most important aspects of Power BI implementation, that is, data modeling. You'll learn more about data modeling limitations and availabilities under different Power BI licensing plans. Finally, we'll discuss the iterative data modeling approach and its different phases.

In this chapter, we'll cover the following main sections:

  • Power BI Desktop layers
  • What data modeling means in Power BI
  • Power BI licensing considerations for data modeling
  • The iterative data modeling approach
 

Understanding the Power BI layers

As stated before, Power BI is not just a reporting tool. As the focus of this book is data modeling, we would rather not explain a lot about the tool itself, but there are some concepts that should be pointed out. When we talk about data modeling in Power BI, we are indeed referring to Power BI Desktop as our development tool. You can think of Power BI Desktop like Visual Studio when developing an SQL Server Analysis Services (SSAS) Tabular model. Power BI Desktop is a free tool offering from Microsoft that can be downloaded from https://powerbi.microsoft.com/en-us/downloads/. So, in this book, we're referring to Power BI Desktop when we say Power BI unless stated otherwise.

The following illustration shows a very simple process we normally go through while building a report in Power BI Desktop:

Figure 1.1 – Building a new report process in Power BI

Figure 1.1 – Building a new report process in Power BI

To go through the preceding processes, we use different conceptual layers of Power BI. You can see those layers in Power BI Desktop as follows:

Figure 1.2 – Power BI layers

Figure 1.2 – Power BI layers

Download the Microsoft Contoso Sales sample for Power BI Desktop from https://www.microsoft.com/en-us/download/confirmation.aspx?id=46801.

Let's discuss each point in detail:

  • The Power Query (data preparation) layer
  • The data model layer
  • The data visualization layer

The data preparation layer (Power Query)

In this layer, you get data from various data sources, transform and cleanse that data, and make it available for other layers. This is the very first layer that touches your data, so it is a very important part of your data journey in Power BI. In the Power Query layer, you decide which queries load data into your data model and which ones will take care of data transformation and data cleansing without loading the data into the data model:

Figure 1.3 – Power Query

Figure 1.3 – Power Query

The data model layer

This layer has two views, the Data view and the Model view. In the Data view, you can see the data, and in the Model view, you can see the data models.

The Data view

After we are done with our data preparation in the Power Query layer, we load the data into the data model layer. Using the Data view, we can see the underlying data in our data model layer after it has been transformed in the data preparation layer. Depending on the connection mode, this view may or may not be accessible. While we can see the output of the data preparation, in this view we also take some other actions, such as creating analytical objects such as calculated tables, calculated columns, and measures, or copying data from tables.

Note

All objects we create in DAX are a part of our data model.

The following screenshot shows the Data view in Power BI Desktop when the storage mode of the table is set to Import:

Figure 1.4 – Data view; storage mode: Import

Figure 1.4 – Data view; storage mode: Import

The Data view tab does not show the underlying data if the table only shows the data when the storage mode is set to Import. If the storage mode is set to DirectQuery, the data will not be shown in the Data view:

Figure 1.5 – Data view; storage mode: DirectQuery

Figure 1.5 – Data view; storage mode: DirectQuery

The Model view

As its names implies, the Model view is where we stitch all the pieces together. Not only can we visually see how the tables are related in the model section, but also, we can create new relationships, format fields and synonyms, show/hide fields, and so on:

Figure 1.6 – Model view

Figure 1.6 – Model view

The data visualization layer

In this layer, we bring the data to life by making meaningful and professional-looking data visualizations. This layer is accessible from the Report view, which is the default view in Power BI Desktop.

The Report view

In the Report view, we can build storytelling visualizations to help businesses make data-driven decisions on top of their data. For more convenience, we also create analytical calculations with DAX, such as calculated tables, calculated columns, and measures from the Fields pane in the Report view, but this doesn't mean those calculation objects are a part of the data visualization layer. Indeed, those calculations are a part of the data model layer:

Figure 1.7 – The Report view

Figure 1.7 – The Report view

Download the Sales & Returns sample.pbix file from https://docs.microsoft.com/en-us/power-bi/create-reports/sample-datasets#sales--returns-sample-pbix-file.

How data flows in Power BI

Understanding how data flows during its journey in Power BI is important from a maintenance perspective. For instance, when you see an issue with some calculations in a report, you'll know how to do a root cause analysis and trace the issue back to an actionable point. So, if you find an issue with a figure in a line chart and that line chart is using a measure that is dependent on a calculated column, you quickly know that you won't find that calculated column in Power Query as the objects created in the data model are not accessible in Power Query. So, in that sense, you will never look for a measure in the Power Query layer or vice versa, as you do not expect to be able to use user-defined functions in the data model layer. We will discuss custom functions in Chapter 3, Data Preparation in Power Query Editor, Custom Functions:

Figure 1.8 – The flow of data in Power BI

Figure 1.8 – The flow of data in Power BI

To understand this better, let's go through a scenario.

In a Power BI report, the developer has defined a query parameter. The parameter has a list of capital letters, E, O, and P. There is also a Product query in Power Query holding descriptive information about the product. The Product Name column is filtered by the parameters list. So, when the developer selects E from the parameter, the Product query filters the results showing only the products whose name starts with E.

You put a table visual on the report canvas with the Product Name column. Can you add a slicer to the report canvas showing the parameters' values so that the end user changes the values in the slicer and can see the changes in the table visual?

This is a real-world question you may get from time to time from Power BI developers. To answer the question, you need to think about Power BI layers. Let's do some analysis:

  • Query parameters are defined in the data preparation layer in Power Query.
  • Filtering a query is also a transformation step in Power Query, which changes the result sets of the query. Therefore, when we import the data into the data model, the result sets will not change unless we go back to Power Query and change the parameters' values, which consequently changes the result sets of the Product query and imports the new result sets to the data model.
  • By default, query parameters' values are not loaded into the data model unless the developer sets Enable load. Setting Enable load only loads the selected values from the parameters list and not the whole list.
  • A slicer is a visual. So, now we are talking about the data visualization layer. This means the slicer can only get values available in the data model.

So, the answer is no. After importing the result sets of a query to the data model, that data will be accessible to the data visualization layer.

 

What data modeling means in Power BI

Data modeling is undoubtedly one of the most important parts of Power BI development. The purpose of data modeling in Power BI is different from data models in transactional systems. In a transactional system, the goal is to have a model that is optimized for recording transactional data. Nevertheless, a well-designed data model in Power BI must be optimized for querying the data and reducing the dataset size by aggregating that data.

While not everyone has the luxury of having a prebuilt data warehouse, you end up creating a data model in Power BI in many cases. It is very tempting to get the whole data as is from various data sources and import it to Power BI. Answering business questions can quickly translate to complex queries that take a long time to process, which is not ideal. So, it is highly advised to resist the temptation to import everything from the data sources into Power BI and solve the problems later. Instead, it is wise to try to get your data model right so that it is capable of precisely answering business-driven questions in the most performant way. When modeling data in Power BI, you need to build a data model based on the business logic. Having said that, you may need to join different tables and aggregate your data to a certain level that answers all business-driven questions. It can get worse if you have data from various data sources of different grains representing the same logic.

Therefore, you need to transform and reshape that data before loading it to the data model in Power BI. Power Query is the right tool to take care of all that. After we cut all the noise from our data, we have a clean, easy-to-understand, and easy-to-work-with data model.

Semantic model

Power BI inherits its blood from Power Pivot and SSAS Tabular models. All of them use the xVelocity engine, an updated version of the VertiPaq engine designed for in-memory data analysis and consisting of semantic model objects such as tables, relationships, hierarchies, and measures, which are stored in memory, leveraging column store indexing. All of this means that you would expect to get tremendous performance gains over highly compressed data, right? Well, it depends. If you efficiently transformed the data to support business logic, then you can expect to have fast and responsive reports. After you import your data into the data model in Power BI, you have built a semantic model when it comes to the concepts. A semantic model is a unified data model that provides business contexts to data. The semantic model can be accessed from various tools to visualize data without needing it to be transformed again. In that sense, when you publish your reports to the Power BI service, you can analyze the dataset in Excel or use third-party tools such as Tableau to connect to a Power BI dataset backed by a Premium capacity and visualize your data.

Building an efficient data model in Power BI

An efficient data model can quickly answer all business questions that you are supposed to answer; it is easy to understand and easy to maintain. Let's analyze the preceding sentence. Your model must do the following:

  • Perform well (quickly)
  • Be business-driven
  • Decrease the level of complexity (be easy to understand)
  • Be maintainable with low costs

Let's look at the preceding points with a scenario.

You are tasked to create a report on top of three separate data sources, as follows:

  • An OData data source with 15 tables. The tables have between 50 and 250 columns.
  • An Excel file with 20 sheets that are interdependent with many formulas.
  • A data warehouse hosted in SQL Server. You need to get data from five dimensions and two fact tables:

    - Of those five dimensions, one is a Date dimension and the other is a Time dimension. The grain of the Time dimension is hour, minute.

    - Each of the fact tables has between 50 and 200 million rows. The grain of both fact tables from a date and time perspective is day, hour, minute.

    - Your organization has a Power BI Pro license.

There are already a lot of important points in the preceding scenario that you must consider before starting to get the data from the data sources. There are also a lot of points that are not clear at this stage. I have pointed out some of them:

  • OData: OData is an online data source, so it could be slow to load the data from the source system.

    The tables are very wide so it can potentially impact the performance.

    Our organization has a Power BI Pro license, so we are limited to 1 GB file size.

    The following questions should be answered by the business. Without knowing the answers, we may end up building a report that is unnecessarily large with poor performance. This can potentially lead to the customer's dissatisfaction:

    (a) Do we really need to import all the columns from those 15 tables?

    (b) Do we also need to import all data or is just a portion of the data enough? In other words, if there is 10-years worth of data kept in the source system, does the business need to analyze all the data, or does just 1 or 2 years of data fit the purpose?

  • Excel: Generally, Excel workbooks with many formulas can be quite hard to maintain. So, we need to ask the following questions of the business:

    (a) How many of those 20 sheets of data are going to be needed by the business? You may be able to exclude some of those sheets.

    (b) How often are the formulas in the Excel file edited? This is a critical point as modifying the formulas can easily break your data processing in Power Query and generate errors. So, you would need to be prepared to replicate a lot of formulas in Power BI if needed.

  • Data warehouse in SQL Server: It is beneficial to have a data warehouse as a source system as data warehouses normally have a much better structure from an analytical viewpoint. But in our scenario, the finest grain of both fact tables is down to a minute. This can potentially turn into an issue pretty quickly. Remember, we have a Power BI Pro license, so we are limited to a 1 GB file size only. Therefore, it is wise to clarify some questions with the business before we start building the model:

    (a) Does the business need to analyze all the metrics down to the minute or is day-level enough?

    (b) Do we need to load all the data into Power BI, or is a portion of the data enough?

    We now know the questions to ask, but what if the business needs to analyze the whole history of the data? In that case, we may consider using composite models with aggregations.

    Having said all that, there is another point to consider. We already have five dimensions in the data warehouse. Those dimensions can potentially be reused in our data model. So, it is wise to look at the other data sources and find commonalities in the data patterns.

You may come up with some more legitimate points and questions. However, you can quickly see in the previously mentioned points and questions that you need to talk to the business and ask questions before starting the job. It is a big mistake to start getting data from the source systems before framing your questions around business processes, requirements, and technology limitations. There are also some other points that you need to think about from a project management perspective that are beyond the scope of this book.

The initial points to take into account for building an efficient data model are as follows:

  • We need to ask questions of the business to avoid any confusions and potential reworks in the future.
  • We need to understand the technology limitations and come up with solutions.
  • We have to have a good understanding of data modeling, so we can look for common data patterns to prevent overlaps.

At this point, you may think, "OK, but how we can get there?" My answer is that you have already taken the first step, which is reading this book. All the preceding points and more are covered in this book. The rest is all about you and how you apply your learning to your day-to-day challenges.

Star schema (dimensional modeling) and snowflaking

First things first, star schema and dimensional modeling are the same things. In Power BI data modeling, the term star schema is more commonly used. So, in this book, we will use the term star schema. The intention of this book is not to teach you about dimensional modeling. Nevertheless, we'll focus on how to model data using star schema data modeling techniques. We will remind you of some star schema concepts.

Transactional modeling versus star schema modeling

In transactional systems, the main goal is to improve the system's performance in creating new records and updating/deleting existing records. So, it is essential to go through the normalization process to decrease data redundancy and increase data entry performance when designing transactional systems. In a straightforward form of normalization, we break tables down into master-detail tables.

But the goal of a business analysis system is very different. In business analysis, we need a data model optimized for querying in the most performant way.

Let's continue with a scenario.

Say we have a transactional retail system for international retail stores. We have hundreds of transactions every second from different parts of the world. The company owner wants to see the total sales amount in the past 6 months.

This calculation sounds easy. It is just a simple SUM of sales. But wait, we have hundreds of transactions every second, right? If we have only 100 transactions per second, then we have 8,640,000 transactions a day. So, for 6 months of data, we have more than 1.5 billion rows. Therefore, a simple SUM of sales will take a reasonable amount of time to process.

The scenario continues: Now, a new query comes from the business. The company owner now wants to see the total sales amount in the past 6 months by country and city. He simply wants to know what the best-selling cities are.

We need to add another condition to our simple SUM calculation, which translates to a join to the geography table. For those coming from a relational database design background, it will be obvious that joins are relatively expensive operations. This scenario can go on and on. So, you can imagine how quickly you will face a severe issue.

In the star schema, however, we already joined all those tables based on business entities. We aggregated and loaded the data into denormalized tables. In the preceding scenario, the business is not interested in seeing every transaction at the second level. We can summarize the data at the day level, which decreases the number of rows from 1.5 billion to a couple of thousands of rows for the whole 6 months. Now you can imagine how fast the summation would be running over thousands of rows instead of 1.5 billion rows.

The idea of the star schema is to keep all numeric values in separate tables called fact tables and put all descriptive information into other tables called dimension tables. Usually, the fact tables are surrounded by dimensions that explain those facts. When you look at a data model with a fact table in the middle surrounded by dimensions, you see that it looks like a star. Therefore, it is called a star schema. In this book, we generally use the Adventure Works DW data, a renowned Microsoft sample dataset, unless stated otherwise. Adventure Works is an international bike shop that sells products both online and in their retail shops. The following figure shows Internet Sales in a star schema shape:

Figure 1.9 – Adventure Works DW, Internet Sales star schema

Figure 1.9 – Adventure Works DW, Internet Sales star schema

Snowflaking

Snowflaking is when you do not have a perfect star schema when dimension tables surround your fact tables. In some cases, you have some levels of descriptions stored in different tables. Therefore, some dimensions of your model are linked to some other tables that describe the dimensions in a greater level of detail. Snowflaking is normalizing the dimension tables. In some cases, snowflaking is inevitable; nevertheless, the general rule of thumb in data modeling in Power BI (when following a star schema) is to avoid snowflaking as much as possible. The following figure shows snowflaking in Adventure Works Internet Sales:

Figure 1.10 – Adventure Works, Internet Sales snowflakes

Figure 1.10 – Adventure Works, Internet Sales snowflakes

Understanding denormalization

In real-world scenarios, not everyone has the luxury of having a pre-built data warehouse designed in a star schema. In reality, snowflaking in data warehouse design is inevitable. So, you may build your data model on top of various data sources, including transactional database systems and non-transactional data sources such as Excel files and CSV files. So, you almost always need to denormalize your model to a certain degree. Depending on the business requirements, you may end up having some level of normalization along with some denormalization. The reality is that there is no specific rule for the level of normalization and denormalization. The general rule of thumb is to denormalize your model so that a dimension can describe all the details as much as possible.

In the preceding example from Adventure Works DW, we have snowflakes of Product Category and Product Subcategory that can be simply denormalized into the Product dimension.

Let's go through a hands-on exercise.

Go through the following steps to denormalize Product Category and Product Subcategory into the Product dimension.

Note

You can download the Adventure Works, Internet Sales.pbix file from here:

https://github.com/PacktPublishing/Expert-Data-Modeling-with-Power-BI/blob/master/Adventure%20Works%20DW.pbix

Open the Adventure Works, Internet Sales.pbix file and follow these steps:

  1. Click Transform data on the Home tab in the Queries section.
  2. Click the Product Query.
  3. Click Merge Queries on the Home tab in the Combine section.
  4. Select Product Subcategory from the drop-down list.
  5. Click ProductSubcategoryKey on the Product table.
  6. Click ProductSubcategoryKey on the Product Subcategory table.
  7. Select Left Outer (all from first matching from the second) from the Join Kind dropdown.
  8. Click OK:

Figure 1.11 – Merging Product and Product Subcategory

Figure 1.11 – Merging Product and Product Subcategory

This adds a new step named Merged Queries. As you see, the values of this column are all Table. This type of column is called Structured Column. The merging Product and Product Subcategory step creates a new structured column named Product Subcategory:

You will learn more about structured columns in Chapter 3, Data Preparation in Power Query Editor.

Figure 1.12 – Merging the Product and Product Subcategory tables

Figure 1.12 – Merging the Product and Product Subcategory tables

Now let's look at how to expand a structured column in the query editor:

  1. Click the Expand button to expand the Product Subcategory column.
  2. Select ProductCategoryKey.
  3. Select the EnglishProductSubcategoryName columns and unselect the rest.
  4. Unselect Use original column names as prefix.
  5. Click OK:

Figure 1.13 – Expanding Structured Column in the Query Editor

Figure 1.13 – Expanding Structured Column in the Query Editor

So far, we have added the EnglishProductSubcategoryName and ProductCategoryKey columns from the Product Subcategory query to the Product query. The next step is to add EnglishProductCategoryName from the Product Category query. To do so, we need to merge the Product query with Product Category:

  1. Click Merge Queries again.
  2. Select Product Category from the drop-down list.
  3. Select ProductCategoryKey from the Product table.
  4. Select ProductCategoryKey from the Product Category table.
  5. Select Left Outer (all from first matching from second).
  6. Click OK:

Figure 1.14 – Merging Product and Product Category

Figure 1.14 – Merging Product and Product Category

This adds another step and a new structured column named Product Category. We now need to do the following:

  1. Expand the new column.
  2. Pick EnglishProductCategoryName from the list.
  3. Unselect Use original column name as prefix.
  4. Click OK:
Figure 1.15 – Merging Product and Product Category

Figure 1.15 – Merging Product and Product Category

The next step is to remove the ProductCategoryKey column as we do not need it anymore. To do so, do the following:

  1. Click on the ProductCategoryKey column.
  2. Click the Remove Columns button in the Managed Column section of the Home tab:
Figure 1.16 – Removing a column in the Query Editor

Figure 1.16 – Removing a column in the Query Editor

Now we have merged the Product Category and Product Subcategory snowflakes with the Product query. So, you have denormalized the snowflakes.

The very last step is to unload both the Product Category and Product Subcategory queries:

  1. Right-click on each query.
  2. Untick Enable load from the menu.
  3. Click Continue on the Possible Data Loss Warning pop-up message:
Figure 1.17 – Unloading queries in the Query Editor

Figure 1.17 – Unloading queries in the Query Editor

Now we need to import the data into the data model by clicking Close & Apply:

Figure 1.18 – Importing data into the data model

Figure 1.18 – Importing data into the data model

We have now achieved what we were after: we denormalized the Product Category and Product Subcategory tables, therefore rather than loading those two tables, we now have EnglishProductCategoryName and EnglishProductSubcategoryName represented as new columns in the Product table.

Job done!

 

Power BI licensing considerations

At this point, you may be wondering how Power BI licensing affects data modeling. It does, as each licensing tier comes with a set of features that can potentially affect the data modeling. Nevertheless, regardless of the licensing tier you are using, Power BI Desktop is free of charge. In this section, we'll quickly look at some licensing considerations related to data modeling.

The following table is a simplified version of the Power BI feature comparisons published on the Microsoft website separately based on different licenses:

Figure 1.19 – A simplified version of Power BI feature comparisons

Figure 1.19 – A simplified version of Power BI feature comparisons

Maximum size of individual dataset

As the table illustrates, we are limited to 1 GB for each dataset published to the Power BI service under Free or Professional licensing. Therefore, managing the file size is quite important. There are several ways to keep the file size just below the limit, as follows:

  • Import the necessary columns only.
  • Import just a portion of data when possible. Explain the technology limitation to the business and ask whether you can filter out some data. For instance, the business may not need to analyze 10 years of data, so filter older data in Power Query.
  • Use aggregations. In many cases, you may have the data stored in the source at a very low granularity. However, the business requires data analysis on a higher grain. Therefore, you can aggregate the data to a higher granularity, then import it into the data model. For instance, you may have data stored at a minute level. At the same time, the business only needs to analyze that data at the day level.
  • Consider disabling auto date/time settings in Power BI Desktop.
  • Consider optimizing data types.

We will cover all the preceding points in the upcoming chapters.

Incremental data load

One of the coolest features available in Power BI is the ability to set up an incremental data load. Incremental data loading in Power BI is inherited from SSAS to work with large models. When it is set up correctly, Power BI does not truncate the dataset and re-import all the data from scratch. Instead, it only imports the data that has been changed since the last data refresh. Therefore, incremental data load can significantly improve the data refresh performance and decrease the amount of processing load on your tenant. Incremental data load is available in both Professional and Premium licenses.

Calculation groups

Calculation groups are like calculated members in MultiDimensional eXpressions (MDX). Calculation groups were initially introduced in SSAS 2019 Tabular models. They are also available in Azure Analysis Services and all Power BI licensing tiers.

It is a common scenario that you create (or already have) some base measures in your Power BI model and then create many time intelligence measures on top of those base measures. In our sample file, we have three measures, as follows:

  • Product cost: SUM('Internet Sales'[TotalProductCost])
  • Order quantity: SUM('Internet Sales'[OrderQuantity])
  • Internet sales: SUM('Internet Sales'[SalesAmount])

The business requires the following time intelligence calculations on top of all the preceding measures:

  • Year to date
  • Quarter to date
  • Month to date
  • Last year to date
  • Last quarter to date
  • Last month to date
  • Year over year
  • Quarter over quarter
  • Month over month

We have nine calculations to be built on top of every single measure we have in our model. Hence, we end up having 9 x 3 = 27 measures to build in our model. You can imagine how quickly the number of measures can rise in the model, so you should not be surprised if someone tells you that they have hundreds of measures in their Power BI model.

Another common scenario is when we have multiple currencies. Without calculation groups, you need to convert the values into strings to show the figures and use a relevant currency symbol using the FORMAT() function in DAX. Now, if you think about the latter point, combined with time intelligence functions, you can see how the issue can get bigger and bigger.

Calculation groups solve those sorts of problems. We cover calculation groups in Chapter 10, Advanced Data Modeling Techniques.

Shared datasets

As the name implies, a shared dataset is a dataset used across various reports in a modern workspace (a new workspace experience) within the Power BI service. Therefore, it is only available in the Power BI Professional and Power BI Premium licensing plans. This feature is quite crucial to data modelers. It provides more flexibility in creating a more generic dataset, covering more business entities in a single dataset instead of having several datasets that may share many commonalities.

Power BI Dataflows

Dataflows, also referred to as Power Query Online, provide a centralized data preparation mechanism in the Power BI service that other people across the organization can take advantage of. Like using Power Query in Power BI Desktop for data preparation, we can prepare, clean, and transform the data in dataflows. Unlike Power Query queries, which are isolated within a dataset, when created in Power BI Desktop and then published to the Power BI service, you can share all data preparations, data cleansing, and data transformation processes across the organization with dataflows.

You can create Power BI dataflows inside a workspace, so it is only available to Professional and Premium users. We will also cover Power BI dataflows in future chapters.

 

The iterative data modeling approach

Like many other software development approaches, data modeling is an ongoing process. You start talking to the business, then apply the business logic to your model. You carry on with the rest of your Power BI development. In many cases, you build your data visualizations and then find out that you will get better results if you make some changes in your model. In many other cases, the business logic applied to the model is not what the business needs. This is a typical comment that many of us will get from the business after the first few iterations:

This looks really nice, but unfortunately, it is not what we want.

So, taking advantage of an agile approach would be genuinely beneficial for Power BI development. Here is the iterative approach you can follow in your Power BI development:

Figure 1.20 – The iterative data modeling approach

Figure 1.20 – The iterative data modeling approach

Information gathering from the business

Like all other software development processes, a Power BI development process starts with gathering information from the business to get a better understanding of the business requirements. A business analyst may take care of this step in the real world but wait, a lot of Power BI users are business analysts. Regardless of your role, whether you are a business analyst or a business analyst takes care of this step and you are a data modeler, you need to analyze the information you get from the business. You have to ask relevant questions and come up with a list of design possibilities. You have to identify potential risks and discuss them with the customer. You also need to be aware of technology limitations and discuss them with the customer as well. After you get answers to your questions and have a list of design possibilities, risks, and technology limitations, you can move on to the next step more confidently.

Data preparation based on the business logic

You now have a lot on your plate. You need to get the data from various data sources and go through the data preparation steps. Now that you know a lot about business logic, you can take the proper steps in your data preparation. For instance, if the business requires you to connect to an OData data source and get a list of the columns required by the business, you can prepare your data more efficiently with all the design risks and technology limitations in mind. After you have consciously prepared your data, you will go on to the next step, which is data modeling.

Data modeling

If you took the proper actions in the previous steps, your data model will be much tidier, so you can build your model more efficiently. Now you need to think about the analytical side of things. Simultaneously, you still have all the business requirements, design possibilities, risks, and technology limitations in mind. For instance, if the business cannot tolerate data latency longer than 5 minutes, you may need to think about using DirectQuery. Using DirectQuery comes with some limitations and performance risks. So, you need to think about the design approach that satisfies the business requirements the most. We cover DirectQuery in Chapter 4, Getting Data from Various Sources in the Dataset storage modes section.

Testing the logic

This is one of the most trivial and yet most important steps in data modeling: testing all the business logic you implement to meet the requirements. Not only do you need to test the figures to make sure the results are accurate, but you also need to test the solution from a performance and user experience perspective. Be prepared for tons of mixed feedback, and sometimes strong criticism from the end users, especially when you think everything is OK.

Demonstrating the business logic in a basic data visualization

As we are modeling the data, we do not need to be worried about the data visualization part. The fastest way to make sure all the business logic is right is to confirm with the business. The fastest way to do that is to demonstrate the logic in the simplest possible way, such as using table and matrix visuals and some slicers on the page. Remember, this is only to confirm the logic with the business, not the actual product delivery. There will be a lot of new information and surprises that come up during the demonstration in the real world, which means you'll then need to start the second iteration and gather more information from the business.

As you go through all the preceding steps several times, you'll gradually become a professional data modeler. In the next section, we'll quickly cover how professional data modelers think.

Note

This book also follows an iterative approach, so we'll go back and forth between different chapters to cover some scenarios.

Thinking like a professional data modeler

Back in the day, in the late 90s, I was working on transactional database systems. Back then, it was essential to know how to normalize your data model to at least the third normal form. In some cases, we were normalizing to the Boyce-Codd normal form. I carried out many projects facing a lot of different issues and I made many mistakes, but I learned from those mistakes. Gradually, I was experienced enough to visualize the data model to the second or sometimes even to the third normal form in my head while I was in a requirements gathering session with the customer. All data modeling approaches that I had a chance to work with, or read about, were based on relational models regardless of their usage, such as transactional models, star schema, Inmon, and data vault. They are all based on relational data modeling. Data modeling in Power BI is no different. Professional data modelers can visualize the data model in their minds from the first information-gathering sessions they have with the customer. But as mentioned, this capability comes with experience.

Once you have enough experience in data modeling, you'll be able to ask more relevant questions from the business. You already know of some common scenarios and pitfalls, so you can quickly recognize other similar situations. Therefore, you can avoid many future changes by asking more relevant questions. Moreover, you can also give your customer some new ideas to solve other problems down the road. In many cases, the customer's requirements will change during the project lifetime. So, you will not be surprised when those changes happen.

 

Summary

In this chapter, we discussed the different layers of Power BI and what is accessible in which layer. Therefore, when we face an issue, we know exactly where we should look to fix the problem. Then we discussed how when we build a data model, we are indeed making a semantic layer in Power BI. We also covered some star schema and snowflaking concepts, which are essential to model our data more efficiently. We then covered different Power BI licensing considerations and how they can potentially affect our data modeling. Lastly, we looked at the data modeling iterative approach to deliver a more precise and more reliable data model that solves many problems that the report writers may face down the road.

In the next chapter, we will look at DAX and data modeling. We will discuss a somewhat confusing topic, virtual tables, and we will walk you through some common time intelligence scenarios to help you with your future data modeling tasks.

About the Author

  • Soheil Bakhshi

    Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil’s experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.

    Browse publications by this author
Expert Data Modeling with Power BI
Unlock this book and the full library for $5 a month*
Start now