M Query Optimization

In the previous chapter, you were introduced to custom functions, how to plan them appropriately, and how to use the power of parameters within them. Across Power Query, there are many opportunities to express solutions in M code.

As with any coding language, there are different ways in which you can write your code, meaning there are often strategies and techniques that can help you achieve the optimum performance from your queries. In this chapter, you will dive deep into four key tips for optimizing your M queries:

  • Filtering and reducing the data you’re working on to improve performance, such as removing unnecessary data early in the query
  • Recognizing when to use functions optimized for specific tasks, reducing the need for custom code
  • Understanding how M’s lazy evaluation interacts with functions such as Table.Buffer and Table.Distinct to optimize memory usage and processing
  • Using parallel query execution, with functions such as Table.Split, to divide large tables into smaller partitions and speed up processing

Technical requirements

To follow the instructions in this chapter, you will need to connect to the data using the following URL: https://raw.githubusercontent.com/PacktPublishing/Data-Cleaning-with-Power-BI/main/Retail%20Store%20Sales%20Data.csv.

Creating custom functions

As we learned in the previous chapter, while native functions are powerful, there are scenarios where custom functions are necessary. Sometimes it might not even make sense to add the custom function through the Queries tab, as it might only be used once. With that in mind, it often makes more sense to build that custom function into the existing code within the advanced editor. In this section, we’ll delve deeper into the process of creating efficient custom functions in M.

As an example, start by building a table from a blank query (a sketch of the finished query follows these steps):

  1. Open Power BI Desktop.
  2. Select Transform data from the Report, Table, or Model View to open the Power Query editor.
  3. Select New source and then select Blank query from the dropdown. Then, select Advanced editor from the Home ribbon.
  4. Enter the following code into the Advanced editor:
    let
    Scorecard = Table.FromList(
       &...
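
To make the pattern concrete, here is a minimal sketch of a scorecard query with an inline custom function; the sample values and the ToGrade helper are assumed for illustration, not taken from the original data:

let
    // Build a small table from a list of delimited text values
    Scorecard = Table.FromList(
        {"Product A,90", "Product B,75", "Product C,60"},
        Splitter.SplitTextByDelimiter(","),
        {"Product", "Score"}
    ),
    // An inline custom function, defined directly in the advanced
    // editor because it is only used once in this query
    ToGrade = (score as number) as text =>
        if score >= 80 then "High"
        else if score >= 70 then "Medium"
        else "Low",
    Graded = Table.AddColumn(Scorecard, "Grade", each ToGrade(Number.From([Score])))
in
    Graded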

Filtering and reducing data

Efficiently handling large datasets is a common challenge in data transformation. Filtering and reducing data early in your query can significantly boost performance.

Let’s walk through an example where we filter out unnecessary data. Consider a dataset with sales information for the past five years. If your analysis only requires data from the last year, filtering out the older records early in the query can save processing time and memory usage (a sketch of the resulting M code follows these steps):

  1. Open Power BI Desktop. We’re going to be connecting to a new dataset.
  2. Click on Get Data | Text/CSV. You will then enter the following URL to access the file for our example: https://raw.githubusercontent.com/PacktPublishing/Data-Cleaning-with-Power-BI/main/Retail%20Store%20Sales%20Data.csv
  3. Select Transform Data instead of Load so that you can explore the data before loading it all into memory.
  4. Rename the query Retail Store Sales Data - Problem Statement 1 using the Properties...
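
While the remaining steps are performed through the Power Query UI, the same early-filtering idea can be expressed directly in M. Here is a rough sketch, assuming the file has an Order Date column (adjust the column name to match the actual dataset):

let
    Source = Csv.Document(
        Web.Contents("https://raw.githubusercontent.com/PacktPublishing/Data-Cleaning-with-Power-BI/main/Retail%20Store%20Sales%20Data.csv"),
        [Delimiter = ",", Encoding = 65001]
    ),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Type the date column, then filter early so that downstream
    // steps only ever see the rows you actually need
    Typed = Table.TransformColumnTypes(Promoted, {{"Order Date", type date}}),
    LastYearOnly = Table.SelectRows(
        Typed,
        each [Order Date] >= Date.AddYears(Date.From(DateTime.LocalNow()), -1)
    )
in
    LastYearOnly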

Using native M functions

As highlighted earlier in this book, M, the language behind Power Query, offers a rich library of native functions designed for data transformation. Leveraging these functions can often be more efficient than custom code.

For instance, let’s say you need to standardize product names by converting them to title case. Instead of writing custom code, you can utilize the native Text.Proper function (M’s title-case function), making your query more concise and performant.

Here is an example of doing just this:

let
    Source = ..., // Your data source
    StandardizedData = Table.TransformColumns(Source, {{"ProductName", Text.Proper}})
in
    StandardizedData

In this code, we use the Table.TransformColumns function along with the Text.Proper function to standardize product names. Native functions are highly optimized for their specific tasks, resulting in more efficient and faster queries.
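
To see this end to end, here is a self-contained version of the same transform; the sample rows are assumed for illustration:

let
    // Assumed sample rows for illustration
    Source = Table.FromRecords({
        [ProductName = "wireless mouse"],
        [ProductName = "USB-C CABLE"]
    }),
    // Text.Proper title-cases each word in the text
    StandardizedData = Table.TransformColumns(Source, {{"ProductName", Text.Proper}})
in
    StandardizedData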

The...

Optimizing memory usage

Managing memory usage is vital for query optimization. Let’s consider an example where you’re dealing with a large dataset with repeated data. Instead of creating multiple copies of the same data, we can explore the Table.Buffer function.

This function loads a table into memory once, so that subsequent steps reuse the buffered copy instead of re-evaluating the source. This reduces duplicated work and can lead to improved query speed, especially for large datasets that are referenced repeatedly, resulting in more efficient use of system resources and better overall performance during data transformation and analysis tasks.
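
A minimal sketch of the pattern, with assumed sample data and a Sales column, might look like this:

let
    // Assumed sample data; in practice, Source would be a large query
    Source = Table.FromRecords({
        [Product = "A", Sales = 1500],
        [Product = "B", Sales = 400],
        [Product = "C", Sales = 2200]
    }),
    // Buffer once so the two downstream steps reuse the in-memory
    // copy rather than re-evaluating the source twice
    Buffered = Table.Buffer(Source),
    HighValue = Table.SelectRows(Buffered, each [Sales] > 1000),
    LowValue = Table.SelectRows(Buffered, each [Sales] <= 1000),
    Combined = Table.Combine({HighValue, LowValue})
in
    Combined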

There are, however, some potential drawbacks to be aware of. One significant downside is that using Table.Buffer can actually slow down performance in certain scenarios.

One reason for this is that it loads the entire table into memory at once. For very large datasets, this can consume a significant amount of memory resources, potentially leading to memory pressure and slower...

Parallel query execution

Parallel query execution is a game-changer for performance. Suppose you have a massive dataset and you want to process it more quickly. By splitting the table into smaller parts using Table.Split, you can lay the groundwork for parallel processing and significantly reduce query execution times.

This can be done using code such as the following in your M query:

let
    Source = ..., // Your data source
    // Table.Split takes a page size (rows per partition), not a
    // partition count, so derive the size that yields 4 partitions
    PageSize = Number.RoundUp(Table.RowCount(Source) / 4),
    SplitTable = Table.Split(Source, PageSize) // A list of 4 tables
in
    SplitTable

In this code, we use the Table.Split function to divide a large table into a list of smaller partitions. Note that Table.Split’s second argument is the number of rows per partition rather than the number of partitions, which is why the page size is derived from the row count. Each partition can then be processed independently, which can significantly reduce query execution times. It must be said, though, that using Table.Split on its own won’t inherently reduce query execution times; rather, it’s a step towards enabling parallel processing...
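
As a rough sketch of the full pattern, the partitions can be transformed independently and then reassembled; the sample data and the SalesWithTax transform are assumed for illustration:

let
    // Assumed sample data; in practice, Source would be a large query
    Source = Table.FromRecords({
        [Id = 1, Sales = 100], [Id = 2, Sales = 200],
        [Id = 3, Sales = 300], [Id = 4, Sales = 400]
    }),
    // Two rows per page, so this sample yields two partitions
    Partitions = Table.Split(Source, 2),
    // Transform each partition independently, then recombine
    Transformed = List.Transform(
        Partitions,
        (t) => Table.AddColumn(t, "SalesWithTax", each [Sales] * 1.2)
    ),
    Combined = Table.Combine(Transformed)
in
    Combined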

Summary

In this chapter, we’ve covered several crucial techniques to enhance the performance of your Power Query workflows.

We began by emphasizing the importance of efficient data filtering and reduction, encouraging you to remove unnecessary data early in your query. We explored the use of native M functions, highlighting their efficiency compared to custom code for specific tasks. Custom functions were the next focus, where we learned to write efficient calculations for tasks not covered by native functions. The chapter also touched on the significance of optimizing memory usage, introducing Table.Buffer and other memory-efficient coding practices.

We then delved into the game-changing concept of parallel query execution, showcasing how functions such as Table.Split can drastically reduce query execution times by dividing large tables into smaller partitions and enabling parallel processing. These techniques will empower you to tackle complex data transformation tasks in Power BI...

Questions

  1. What were the four key tips for optimizing M queries discussed in this chapter?
    1. Sorting, filtering, grouping, summarizing
    2. Filtering and reducing data, using native M functions, creating custom functions, optimizing memory usage
    3. Conditional formatting, joins, aggregations, pivoting
    4. Concatenation, union, splitting, transforming
  2. In the custom function for calculating the weighted average, what are the parameters of the function, and how is the weighted average calculated?
    1. Parameters: table, weights, values; the weighted average is calculated by summing the weighted values and dividing by the total weight
    2. Parameters: table, columns; the weighted average is calculated by multiplying values by weights
    3. Parameters: values, weights, total; the weighted average is calculated by summing the values and dividing by the weights
    4. Parameters: rows, weights, values; the weighted average is calculated by summing the values and multiplying by the weights
  3. How can Table.Buffer optimize the...