Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Microsoft Power BI Cookbook. - Second Edition

You're reading from  Microsoft Power BI Cookbook. - Second Edition

Product type Book
Published in Sep 2021
Publisher Packt
ISBN-13 9781801813044
Pages 656 pages
Edition 2nd Edition
Languages
Authors (2):
Gregory Deckler Gregory Deckler
Profile icon Gregory Deckler
Brett Powell Brett Powell
Profile icon Brett Powell
View More author details

Table of Contents (16) Chapters

Preface 1. Configuring Power BI Tools 2. Accessing and Retrieving Data 3. Building a Power BI Data Model 4. Authoring Power BI Reports 5. Working in the Service 6. Getting Serious with Date Intelligence 7. Parameterizing Power BI Solutions 8. Implementing Dynamic User-Based Visibility in Power BI 9. Applying Advanced Analytics and Custom Visuals 10. Administering and Monitoring Power BI 11. Enhancing and Optimizing Existing Power BI Solutions 12. Deploying and Distributing Power BI Content 13. Integrating Power BI with Other Applications 14. Other Book You May Enjoy
15. Index

Combining and Merging Queries

The full power of Power BI's querying capabilities is in the integration of distinct queries representing different data sources via its merge and append transformations. Retrieval processes that consolidate files from multiple network locations or integrate data from multiple data sources can be developed efficiently and securely. Additionally, the same join types and data transformation patterns SQL and ETL developers are familiar with can be achieved with the M language. This recipe provides examples of combining sources into a single query and leveraging the table join functions of M to support common transformation scenarios.

Getting ready

To follow along with this recipe, you can use the Merge Queries and Append Queries icons on the Home tab of the Power Query Editor to generate the join expressions used in this recipe. However, as joining queries is fundamental to the retrieval process, it is recommended to learn how to use the Table.Join, Table.NestedJoin, and Table.Combine functions.

To get ready for this recipe, import the DimCustomer and FactCallCenter tables from the AdventureWorksDW2019 database by doing the following:

  1. Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
  2. Create an Import mode data source query called AdWorksDW. This query should be similar to the following:
    let
        Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019")
    in
        Source
    
  3. Isolate this query in a query group called Data Sources.
  4. Right-click AdWorksDW and choose Reference, select the DimEmployee table in the data preview area, and rename this query DimEmployee.
  5. Repeat step 4 but choose the FactInternetSales table and name this query Sales2011. Filter the OrderDate column to be between January 1, 2011 and December 31, 2011.
    Let
        Source = AdWorksDW,
        dbo_FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
        #"Filtered Rows" = 
            Table.SelectRows(
                dbo_FactInternetSales, each 
                    [OrderDate] >= #datetime(2011, 1, 1, 0, 0, 0) and 
                    [OrderDate] <= #datetime(2011, 12, 31, 0, 0, 0))
    in
        #"Filtered Rows"
    
  6. Right-click the Sales2011 query and choose Duplicate. Name this query Sales2012. Edit the Table.SelectRows expression to filter dates between January 1, 2012 and December 31, 2012.
  7. Repeat step 6, naming this new query Sales2013 and filtering for dates between January 1, 2013 and December 31, 2013.

For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.

How to Combine Queries

The goal of this example is to produce an integrated table based on three "Sales" queries. While in this example the three queries come from the same data source, it is important to realize that the three queries could point to completely different data sources, such as three text files or even a text file, an Excel spreadsheet, and a SQL database. What is important is that the three queries have the same column names and number of columns. To implement this recipe, perform the following steps:

  1. Move the Sales2011, Sales2012, and Sales2013 queries to a new query group called Staging Queries.
  2. Disable the load for the Sales2011, Sales2012, and Sales2013 queries.
  3. Create a new Blank query in the Other Queries group and call this query FactInternetSales2011to2012.
  4. Open the FactInternetSales2011to2012 query in Advanced Editor.
  5. Use the Table.Combine (or Append Queries feature in the ribbon) function to return a single table based on the rows of the Sales2011, Sales2012, and Sales2013 queries.
    Let
        Source = Table.Combine( {Sales2011, Sales2012, Sales2013} )
    in
        Source
    
  6. Move the DimEmployees query to the Staging Queries group and disable loading.
  7. Duplicate the DimEmployees query and call this new query Managers.
  8. Disable loading for the Managers query.
  9. Open the Managers query in the Advanced Editor.
  10. Add a Manager Name column using the Table.AddColumn function.
        ManagerName = 
            Table.AddColumn(
                dbo_DimEmployee,
                "Manager Name", each [FirstName] & " " & [LastName]
            )
    
  11. Select only the EmployeeKey, Manager Name, and Title columns using the Table.SelectColumns function.
        SelectCols = 
            Table.SelectColumns(
                ManagerName, 
                { "EmployeeKey", "Manager Name", "Title" }
            )
    
  12. Create a new Blank query in the Other Queries group and call this query Employees.
  13. Open the Employees query in Advanced Editor.
  14. Join the Managers query and DimEmployee query using the Table.NestedJoin function or the Merge Queries feature in the ribbon.
    Let
        Source = 
            Table.NestedJoin(
                DimEmployee, "ParentEmployeeKey", Managers, "EmployeeKey",
                "ManagerColumn", JoinKind.LeftOuter
            )
    in
        Source
    
  15. Use the Table.ExpandTableColumn function to add the Manager Name and Manager Title columns.
        ManagerColumns = 
            Table.ExpandTableColumn(
                Source, "ManagerColumn",
                { "Manager Name", "Title" },
                { "Manager Name", "Manager Title" }
            )
    
  16. Add an Employee Name column.
        EmployeeName = 
            Table.AddColumn(
                ManagerColumns,
                "Employee Name", each [FirstName] & " " & [LastName]
            )
    
  17. Rename the Title column to Employee Title.
        RenameTitle = 
            Table.RenameColumns(
                EmployeeName,
                { "Title", "Employee Title" }
            )
    
  18. Select the EmployeeKey, Employee Name, Employee Title, Manager Name, and Manager Title columns.
        SelectCols =
            Table.SelectColumns(
                RenameTitle,
                { 
                    "EmployeeKey", "Employee Name", "Employee Title",
                    "Manager Name", "Manager Title"
                }
            )
    

How it works

For the FactInternetSales2011to2013 query, only a single expression is required using the Table.Combine function. No other expressions are necessary in this example given that the staging queries have identical column names and the same number of columns. The Table.Combine function performs an append operation and does not remove duplicate rows similar to a SQL UNION statement.

Any columns which are unique to one of the input tables in a Table.Combine function will be added to the result set with null values for the rows from the other tables. Depending on the scenario, the developer could apply the Table.Distinct function to avoid any duplicate rows from reaching the data model.

The Employees query references the DimEmployees query as the left table in a Table.NestedJoin function, and is joined to the Managers query via a left outer join. The left join is required to retain all employee rows in this scenario, as the DimEmployees table includes one employee that does not have a parent employee key, the Chief Executive Officer.

The join is performed on the ParentEmployeeKey column in the DimEmployees query to the EmployeeKey column in the Managers query. After this step, all of the rows from the matching Managers table are stored within the row as a Table object in the column ManagerColumn. When expanding the ManagerColumn column using the Table.ExpandTableColumn function, the Manager Name column can retain the same name, but the Title column is renamed to Manager Title in order to avoid conflicting with the Title column in the DimEmployees query.

In implementing the table joins, you can choose to use the Table.Join and Table.NestedJoin functions. All six join types—inner, left outer, right outer, full outer, left anti, and right anti—are supported by both functions. The Table.NestedJoin function enters the results of the join (the right or second table) into a new column of table values and will use local resources to execute the join operation, unless the Table.ExpandTableColumn function is used to replace this new column with columns from the right table. A left outer join type is the default if the JoinKind parameter is not specified. For performance reasons, Table.NestedJoin should not be used without a Table.ExpandTableColumn function removing the column of tables.

Conversely, the Table.Join function automatically expands the left table with the columns from the right table input (a flat join) and defaults to an inner join if the JoinKind parameter is not specified. The Table.Join function gets folded to the source without any additional functions but requires that there are no matching column names between the joined tables for a JoinKind other than inner join. For inner joins, the matching column names from both tables must be specified in the join key parameters. A Table.SelectColumns function is required to exclude any columns from the right table added with the join.

Whether implemented via Table.NestedJoin or Table.Join, developers should look to use inner joins if the source tables have referential integrity, such as with foreign key constraints and whether this meets requirements. For joins against larger tables, developers should confirm that query folding is occurring and can evaluate the different query plans generated by alternative retrieval designs in terms of performance.

Note that the two rows for Rob Walters are due to a Slowly Changing Dimension (SCD) Type 2 process applied in the source database. For more information on SCDs, refer to this Wikipedia article: https://bit.ly/3yIQeI5.

There's more...

Rather than creating separate lookup/join staging queries, it is possible to consolidate these expressions into a single let…in M expression. For example, the following single query returns the exact same results as the Sales2011, Sales2012, Sales2013, and FactInternetSales2011to2013 queries:

let
    Source = AdWorksDW,
    Sales = Source{[Sche"a=""bo",It"m="FactInternetSa"es"]}[Data],
    Sales2011Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2011, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2011, 12, 31, 0, 0, 0)
        ),
    Sales2012Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2012, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2012, 12, 31, 0, 0, 0)
        ),
    Sales2013Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2013, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2013, 12, 31, 0, 0, 0)
        ),
    Append = Table.Combine( {Sales2011Rows, Sales2012Rows, Sales2013Rows} )
in
    Append

Inline query approaches are helpful in limiting the volume of queries, but you lose the management benefits provided by group folders and the Query Dependencies view. The graphical support makes it easier to explain and quickly troubleshoot a data retrieval process over a single but complex M expression. Staging queries are recommended for all but the most trivial projects and retrieval processes. Staging queries should generally never be loaded to the data model, as staging tables could both confuse the user and would require the data model to use additional resources to process and store the additional data.

Similarly, merge queries can also be combined into a single query. The following table breaks down the six different join types that can be specified in both the Table.NestedJoin and Table.Join functions. Both the Parameter and Parameter Value can be used, though the recipes in this book use Parameter as this makes the expressions easier to follow.

Join type

Parameter

Parameter value

Inner

JoinKind.Inner

0

Left Outer

JoinKind.LeftOuter

1

Right Outer

JoinKind.RightOuter

2

Full Outer

JoinKind.FullOuter

3

Left Anti

JoinKind.LeftAnti

4

Right Anti

JoinKind.RightAnti

5

Table 2.3: Power Query (M) join types, parameters, and parameter values

One final note is that, for data source files with the same structure stored in the same network directory folder, Power BI offers the Combine Binaries transformation, which can be used with text, CSV, Excel, JSON, and other file formats. This feature can be used when creating a Folder query. The Combine Binaries feature automatically creates an example query and a function linked to this query, such that any required modification to the source files can be applied to all files, and the source location of the files can be easily revised.

See also

You have been reading a chapter from
Microsoft Power BI Cookbook. - Second Edition
Published in: Sep 2021 Publisher: Packt ISBN-13: 9781801813044
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €14.99/month. Cancel anytime}