Reader small image

You're reading from  Expert Data Modeling with Power BI - Second Edition

Product typeBook
Published inApr 2023
PublisherPackt
ISBN-139781803246246
Edition2nd Edition
Right arrow
Author (1)
Soheil Bakhshi
Soheil Bakhshi
author image
Soheil Bakhshi

Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil's experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.
Read more about Soheil Bakhshi

Right arrow

Data Preparation Common Best Practices

In the previous chapter, we connected to a flat data source, prepared the data with the star schema approach, and identified the dimensions and facts. In this chapter, we look at common data preparation best practices that help to avoid common pitfalls, having better-performing queries that are well organized and are cheaper to maintain by going through some general techniques and considerations in Power Query. We look at query folding and discuss its related best practices. We emphasize the importance of data conversion to avoid potential issues caused by inappropriate data conversions in the data model. We also discuss some query optimization techniques. Last but not least, we look at potential naming conventions essential for code consistency.

In this chapter, we discuss the following best practices:

  • Consider loading a proportion of data
  • Appreciate case sensitivity in Power Query
  • Be mindful of query folding and its...

Consider loading a proportion of data

A general rule of thumb with all kinds of data sources is only to keep relevant columns during data preparation. We need to pay more attention to it when dealing with API-based data sources, as the available resources are usually pretty limited. For instance, when connecting to a data source stored in a SharePoint Online folder, we get throttled when we hit the 25 requests per second limit. Another common data source that we have to be more careful with is Open Data Protocol (OData). Microsoft invented OData, a commonly accepted method for creating and consuming REST APIs. Many Enterprise Resource Planning (ERP) systems are accessible via OData. When loading data via an OData connection into Power BI, it is essential to pay extra attention to the amount of data being loaded. In many cases, the underlying data model has wide tables with many columns containing metadata that is not necessarily needed.

Power BI reports brought production systems...

Appreciate case sensitivity in Power Query

As explained before, Power Query is case-sensitive. Case sensitivity is not just about the Power Query syntax; string data types in opposite cases are not equal in Power Query. So it is essential to pay attention to character cases when working with string data. When we mash up data from different sources, we often have GUIDs as key columns (either a primary or a foreign key). We get incorrect results if we compare the GUID values with different cases. For instance, in Power Query, the following values are not equal:

  • C54FF8C6-4E51-E711-80D4-00155D38270C
  • c54ff8c6-4e51-e711-80d4-00155d38270c

Therefore, if we merge two tables joining the key columns, we get weird results. It is also the case if we load the data into the data model and create a relationship between two tables with key columns in different character cases. The solution is simple; we only need to keep both key columns in the same character case using either...

Be mindful of query folding and its impact on data refresh

Data modelers need to pay extra attention to query folding. Not only can query folding affect the performance of a data refresh but it can also hit resource utilization during the data refresh. Query folding is essential for the very same reason an incremental data refresh is, so if the refresh takes too long due to the queries not being folded, the incremental data refresh never happens. It is also crucial for the models in either DirectQuery or Dual storage mode as each transformation step must be folded. As query folding is a vital topic, it is good to spend some time to see what it is all about.

Understanding query folding

Query folding is simply the Power Query engine’s capability to translate the transformation steps into the native query language. Therefore, based on the Power Query engine’s capability, a query in the Power Query Editor may be fully folded or partially folded. For instance, we...

Organize queries in the Power Query Editor

One of the aspects of a good development model in the software development world is to keep the code and objects organized, and Power BI development is not an exception. While this best practice is not directly relevant to data modeling, it is good to organize the queries from a maintenance perspective. Organizing queries is simple. Just follow these steps:

  1. Select multiple queries from the Queries pane.
  2. Right-click then hover over Move to Group, then click New Group...
  3. Enter a name for the group.
  4. Enter a relevant description for the group.
  5. Click OK.

The following screenshot shows the preceding steps:

Figure 7.15: Grouping queries in the Power Query Editor

After grouping all queries, we have organized the Queries pane. This is handy, especially with larger models with many queries, as the following screenshot shows:

Figure 7.16: Organized queries in the Power Query Editor

...

Follow data type conversion best practices

We previously discussed different Power Query types in Chapter 3, Data Preparation in Power Query Editor, in the Introduction to Power Query (M) section. In Chapter 5, Common Data Preparation Steps, we also discussed that data type conversion is one of the most common data preparation steps. In both chapters, we looked at different data types available in Power Query. So, as a data modeler, it is crucial to understand the importance of data type conversion. This section looks at some best practices for data conversion and how they can affect our data modeling.

Data type conversion can affect data modeling

As mentioned, we already discussed the data types in Power Query in Chapter 3, Data Preparation in Power Query Editor, and Chapter 5, Common Data Preparation Steps. However, briefly recalling some points in this section is worthwhile to emphasize the importance of understanding data types in Power Query. In Power Query, we have only...

Optimize query size

This section discusses other data preparation best practices to improve our model. Optimizing queries’ sizes can reduce the data refresh time. A model with an optimized size performs better after we import the data into the data model. In the following subsections, we look at some techniques that help us optimize queries.

Remove unnecessary columns and rows

In real-world scenarios, we might deal with large tables with hundreds of millions of rows and hundreds of columns. Some Power BI developers import all columns and rows from all data sources, resulting in poor-performing reports. As stated before, Power BI uses the xVelocity engine, which uses in-memory data processing for data analytics based on column cardinality. Therefore, fewer columns directly translate to less memory consumption and, as a result, a more performant data model. In many real-world scenarios, we need the business’s approval to remove unwanted columns from tables.

...

Use query parameters

As we discussed in Chapter 3, Data Preparation in Power Query Editor, in the Understanding query parameters section, query parameters are one of the most valuable features of Power Query that can help us in many scenarios. The following sections explain some of them.

Parameterizing connections

It is best to parameterize all data source connections, especially if we have different environments. The parameterized connections become handy when configuring the deployment pipelines.

Restricting the row counts in development for large tables

Sometimes, we deal with large tables with several million rows, which can be time-consuming in development. Imagine a scenario in which we transform the data in Power Query Editor and then load it into the data model. If the table in the source system is large, it takes a long time to load the data into the data model, decreasing our productivity. In most cases, we do not need to load all the data into the data model...

Define key columns in queries

One of the most important aspects of a relational database is defining key columns. The key columns are the columns used to define a relationship between two tables. The key column(s) in a table guarantees the uniqueness of each row of data. The key column guaranteeing the uniqueness of each row within a table is the primary key. The primary key of a table appearing in another table is called a foreign key. In many cases, a single column does not guarantee the uniqueness of rows, but the so-called Composite Key does by considering multiple columns as key columns. The xVelocity engine in the data model layer, as we discuss in more detail in Chapter 8, Data Modeling Components, does not support composite keys, but the Power Query engine does. By defining the key columns in the Power Query Editor, we get some performance gains in refreshing the data, especially when merging two tables. We discussed merging tables in Chapter 5, Common Data Preparation Steps...

Use naming conventions

Having naming conventions for Power BI developers and data modelers is essential. It helps with solutions’ consistency and makes the code more readable and more understandable for support specialists. It also sets common ground that everyone across the organization interacting with our Power BI solutions can benefit from.

Data sources do not necessarily have the most user-friendly object names. So, following a predefined naming convention during development is essential in helping the contributors understand the data model more efficiently and create new reports more confidently. The following naming conventions are suggested:

  • Use camel case for object names, including table names, column names, and parameter names.
  • Replace underscores, dashes, hyphens, or dots between the words with spaces.
  • Remove prefixes and suffixes from table names (such as DimDate becoming Date or FactSales becoming Sales).
  • Use the shortest and most...

Summary

This chapter discussed some of the most critical best practices for data preparation. We cannot cover all the best practices in a chapter or two, but we tried to cover some of the most important ones in this chapter. We learned how to optimize query sizes in Power Query and discussed how case sensitivity in Power Query could affect our data model. Going ahead, we learned the importance of query folding and how we can identify whether a query is folded. We then looked at some data type conversion best practices and how to reduce the number of steps by avoiding unnecessary steps and using query parameters. We then discussed the importance of having naming conventions.

In the next chapter, we will discuss data modeling components and building a star schema.

Join us on Discord!

Join The Big Data and Analytics Community on the Packt Discord Server!

Hang out with 558 other members and enjoy free voice and text chat.

https://packt.link/ips2H

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Expert Data Modeling with Power BI - Second Edition
Published in: Apr 2023Publisher: PacktISBN-13: 9781803246246
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Soheil Bakhshi

Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil's experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.
Read more about Soheil Bakhshi