You're reading from Expert Data Modeling with Power BI - Second Edition

Product typeBook

Published inApr 2023

PublisherPackt

ISBN-139781803246246

Edition2nd Edition

Concepts

Business Intelligence

Author (1)

Soheil Bakhshi

Data Preparation Common Best Practices

In the previous chapter, we connected to a flat data source, prepared the data with the star schema approach, and identified the dimensions and facts. In this chapter, we look at common data preparation best practices that help to avoid common pitfalls, having better-performing queries that are well organized and are cheaper to maintain by going through some general techniques and considerations in Power Query. We look at query folding and discuss its related best practices. We emphasize the importance of data conversion to avoid potential issues caused by inappropriate data conversions in the data model. We also discuss some query optimization techniques. Last but not least, we look at potential naming conventions essential for code consistency.

In this chapter, we discuss the following best practices:

Consider loading a proportion of data
Appreciate case sensitivity in Power Query
Be mindful of query folding and its...

Consider loading a proportion of data

A general rule of thumb with all kinds of data sources is only to keep relevant columns during data preparation. We need to pay more attention to it when dealing with API-based data sources, as the available resources are usually pretty limited. For instance, when connecting to a data source stored in a SharePoint Online folder, we get throttled when we hit the 25 requests per second limit. Another common data source that we have to be more careful with is Open Data Protocol (OData). Microsoft invented OData, a commonly accepted method for creating and consuming REST APIs. Many Enterprise Resource Planning (ERP) systems are accessible via OData. When loading data via an OData connection into Power BI, it is essential to pay extra attention to the amount of data being loaded. In many cases, the underlying data model has wide tables with many columns containing metadata that is not necessarily needed.

Power BI reports brought production systems...

Appreciate case sensitivity in Power Query

As explained before, Power Query is case-sensitive. Case sensitivity is not just about the Power Query syntax; string data types in opposite cases are not equal in Power Query. So it is essential to pay attention to character cases when working with string data. When we mash up data from different sources, we often have GUIDs as key columns (either a primary or a foreign key). We get incorrect results if we compare the GUID values with different cases. For instance, in Power Query, the following values are not equal:

C54FF8C6-4E51-E711-80D4-00155D38270C
c54ff8c6-4e51-e711-80d4-00155d38270c

Therefore, if we merge two tables joining the key columns, we get weird results. It is also the case if we load the data into the data model and create a relationship between two tables with key columns in different character cases. The solution is simple; we only need to keep both key columns in the same character case using either...

Be mindful of query folding and its impact on data refresh

Data modelers need to pay extra attention to query folding. Not only can query folding affect the performance of a data refresh but it can also hit resource utilization during the data refresh. Query folding is essential for the very same reason an incremental data refresh is, so if the refresh takes too long due to the queries not being folded, the incremental data refresh never happens. It is also crucial for the models in either DirectQuery or Dual storage mode as each transformation step must be folded. As query folding is a vital topic, it is good to spend some time to see what it is all about.

Understanding query folding

Query folding is simply the Power Query engine’s capability to translate the transformation steps into the native query language. Therefore, based on the Power Query engine’s capability, a query in the Power Query Editor may be fully folded or partially folded. For instance, we...

Organize queries in the Power Query Editor

One of the aspects of a good development model in the software development world is to keep the code and objects organized, and Power BI development is not an exception. While this best practice is not directly relevant to data modeling, it is good to organize the queries from a maintenance perspective. Organizing queries is simple. Just follow these steps:

Select multiple queries from the Queries pane.
Right-click then hover over Move to Group, then click New Group...
Enter a name for the group.
Enter a relevant description for the group.
Click OK.

The following screenshot shows the preceding steps:

Figure 7.15: Grouping queries in the Power Query Editor

After grouping all queries, we have organized the Queries pane. This is handy, especially with larger models with many queries, as the following screenshot shows:

Figure 7.16: Organized queries in the Power Query Editor

...

Follow data type conversion best practices

We previously discussed different Power Query types in Chapter 3, Data Preparation in Power Query Editor, in the Introduction to Power Query (M) section. In Chapter 5, Common Data Preparation Steps, we also discussed that data type conversion is one of the most common data preparation steps. In both chapters, we looked at different data types available in Power Query. So, as a data modeler, it is crucial to understand the importance of data type conversion. This section looks at some best practices for data conversion and how they can affect our data modeling.

Data type conversion can affect data modeling

As mentioned, we already discussed the data types in Power Query in Chapter 3, Data Preparation in Power Query Editor, and Chapter 5, Common Data Preparation Steps. However, briefly recalling some points in this section is worthwhile to emphasize the importance of understanding data types in Power Query. In Power Query, we have only...

Optimize query size

This section discusses other data preparation best practices to improve our model. Optimizing queries’ sizes can reduce the data refresh time. A model with an optimized size performs better after we import the data into the data model. In the following subsections, we look at some techniques that help us optimize queries.

Remove unnecessary columns and rows

In real-world scenarios, we might deal with large tables with hundreds of millions of rows and hundreds of columns. Some Power BI developers import all columns and rows from all data sources, resulting in poor-performing reports. As stated before, Power BI uses the xVelocity engine, which uses in-memory data processing for data analytics based on column cardinality. Therefore, fewer columns directly translate to less memory consumption and, as a result, a more performant data model. In many real-world scenarios, we need the business’s approval to remove unwanted columns from tables.

...

Use query parameters

As we discussed in Chapter 3, Data Preparation in Power Query Editor, in the Understanding query parameters section, query parameters are one of the most valuable features of Power Query that can help us in many scenarios. The following sections explain some of them.

Parameterizing connections

It is best to parameterize all data source connections, especially if we have different environments. The parameterized connections become handy when configuring the deployment pipelines.

Restricting the row counts in development for large tables

Sometimes, we deal with large tables with several million rows, which can be time-consuming in development. Imagine a scenario in which we transform the data in Power Query Editor and then load it into the data model. If the table in the source system is large, it takes a long time to load the data into the data model, decreasing our productivity. In most cases, we do not need to load all the data into the data model...

Define key columns in queries

One of the most important aspects of a relational database is defining key columns. The key columns are the columns used to define a relationship between two tables. The key column(s) in a table guarantees the uniqueness of each row of data. The key column guaranteeing the uniqueness of each row within a table is the primary key. The primary key of a table appearing in another table is called a foreign key. In many cases, a single column does not guarantee the uniqueness of rows, but the so-called Composite Key does by considering multiple columns as key columns. The xVelocity engine in the data model layer, as we discuss in more detail in Chapter 8, Data Modeling Components, does not support composite keys, but the Power Query engine does. By defining the key columns in the Power Query Editor, we get some performance gains in refreshing the data, especially when merging two tables. We discussed merging tables in Chapter 5, Common Data Preparation Steps...

Use naming conventions

Having naming conventions for Power BI developers and data modelers is essential. It helps with solutions’ consistency and makes the code more readable and more understandable for support specialists. It also sets common ground that everyone across the organization interacting with our Power BI solutions can benefit from.

Data sources do not necessarily have the most user-friendly object names. So, following a predefined naming convention during development is essential in helping the contributors understand the data model more efficiently and create new reports more confidently. The following naming conventions are suggested:

Use camel case for object names, including table names, column names, and parameter names.
Replace underscores, dashes, hyphens, or dots between the words with spaces.
Remove prefixes and suffixes from table names (such as DimDate becoming Date or FactSales becoming Sales).
Use the shortest and most...

Summary

This chapter discussed some of the most critical best practices for data preparation. We cannot cover all the best practices in a chapter or two, but we tried to cover some of the most important ones in this chapter. We learned how to optimize query sizes in Power Query and discussed how case sensitivity in Power Query could affect our data model. Going ahead, we learned the importance of query folding and how we can identify whether a query is folded. We then looked at some data type conversion best practices and how to reduce the number of steps by avoiding unnecessary steps and using query parameters. We then discussed the importance of having naming conventions.

In the next chapter, we will discuss data modeling components and building a star schema.

Join us on Discord!

Join The Big Data and Analytics Community on the Packt Discord Server!

Hang out with 558 other members and enjoy free voice and text chat.

https://packt.link/ips2H

The rest of the chapter is locked

You have been reading a chapter from

Expert Data Modeling with Power BI - Second Edition

Published in: Apr 2023Publisher: PacktISBN-13: 9781803246246

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Soheil Bakhshi

Soheil Bakhshi is the founder of Data Vizioner and is a sought after BI consultant. Working in data and analytics for more than 20 years, Soheil's experience lies in Microsoft BI, Data Warehousing, and Power BI platform. He possesses MSCE, MCSA certifications and is a Microsoft MVP (Most Valuable Professional). He has a passion for sharing knowledge via his website and speaking at conferences and Power BI community events locally and globally. In following his desire for simplicity and efficiency, he is behind Power BI community tools and commercial products such as Power BI Exporter and Power BI Documenter.
Read more about Soheil Bakhshi

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages