Microsoft Tabular Modeling Cookbook — Save 50%
Over 50 tips and tricks for analytical modeling using Business Intelligence Semantic Models with SQL Server 2012 and PowerPivot with this book and ebook
This article written by Paul te Braak, the author of the book Microsoft Tabular Modeling Cookbook, focuses on the complex modeling concepts such as:
- Grouping by binning and sorting with ranks
- Defining many-to-many relationships
- Using the last non-empty function for stock data
(For more resources related to this topic, see here.)
This article examines the foundations of tabular modeling (at least from the modelers point of view). In order to provide a familiar model which can be used later, this article will progressively build the model. Each recipe is intended to demonstrate a particular technique; however, they need to be followed in order so that the final model is completed.
Grouping by binning and sorting with ranks
Often, we want to provide descriptive information in our data, based on values that are derived from downstream activities. For example, this could arise if we wish to include a field in the Customer table that shows the value of the previous sales for that customer or a bin grouping that defines the customer into banded sales. This value can then be used to rank each customer according to their relative importance in relation to other customers.
This type of value adding activity is usually troublesome and time intensive in a traditional data mart design as the customer data can be fully updated only once the sales data has been loaded. While this process may seem relatively straightforward, it is a recursive problem as the customer data must be loaded before the sales data (since customers must exist before the sales can be assigned to them), but the full view of the customer is reliant on loading the sales data. In a standard dimensional (star schema) modeling approach, including this type of information for dimensions requires a three-step process:
- The dimension (reseller customer data) is updated for known fields and attributes. This load excludes information that is derived (such as sales).
- Then, the sales data (referred to as fact data) is loaded in data warehouse. This ensures that the data mart is in a current state and all the sales transaction data is up-to-date. Information relating to any new and changed stores can be loaded correctly.
- The dimension data which relies on other fact data is updated based on the current state of the data mart.
Since the tabular model is less confined by the traditional star schema requirements of the fact and dimension tables (in fact, the tabular model does not explicitly identify facts or dimensions), the inclusion and processing of these descriptive attributes can be built directly into the model.
The calculation of a simple measure such as a historic sales value may be included in OLAP modeling through calculated columns in the data source view. However, this is restrictive and limited to simple calculations (such as total sales or n period sales). Other manipulations (such as ranking and binning) are a lot more flexible in tabular modeling (as we will see).
This recipe examines how to manipulate a dimensional table in order to provide a richer end user experience. Specifically, we will do the following:
- Introduce a calculated field to calculate the historical sales for the customer
- Determine the rank of the customer based on that field
- Create a discretization bin for the customer based on their sales
- Create an ordered hierarchy based on their discretization bins
Continuing with the scenario that was discussed in the Introduction section of the article, the purpose of this article is to identify each reseller's (customer's) historic sales and then rank them accordingly. We then discretize the Resellers table (customer) based on this. This problem is further complicated by the consideration that a sale occurs in the country of origin (the sales data in the Reseller Sales table will appear in any currency). In order to provide a concise recipe, we break the entire process into two distinct steps:
- Conversion of sales (which manages the ranking of Resellers based on a unified sales value)
- Classification of sales (which manages the manipulation of sales values based on discretized bins to format those bins)
How to do it…
Firstly, we need to provide a common measure to compare the sales value of Resellers. Convert sales to a uniform currency using the following steps:
- Open a new workbook and launch the PowerPivot Window.
- Import the text files Reseller Sales.txt, Currency Conversion.txt, and Resellers.txt.
The source data folder for this article includes the base schema.ini file that correctly transforms all data upon import. When importing the data, you should be prompted that the schema.ini file exists and will override the import settings. If this does not occur, ensure that the schema.ini file exists in the same directory as your source data. The prompt should look like the following screenshot:
Although it is not mandatory, it is recommended that connection managers are labeled according to a standard. In this model, I have used the convention type_table_name where type refers to the connection type (.txt) and table_name refers to the name of the table. Connections can be edited using the Existing Connections button in the Design tab.
- Create a relationship between the Customer ID field in the Resellers table and the Customer ID field in the Reseller Sales table.
- Add a new field (also called a calculated column) in the Resellers Sales table to show the gross value of the sale amount in USD. Add usd_gross_sales and hide it from client tools using the following code:
= [Quantity Ordered] *[Price Per Unit] *LOOKUPVALUE ( 'Currency Conversion'[AVG Rate] ,'Currency Conversion'[Date] ,[Order dt] ,'Currency Conversion'[Currency ID] ,[Currency ID] )
- Add a new measure in the Resellers Sales table to show sales (in USD). Add USD Gross Sales as:
USD Gross Sales := SUM ( [usd_gross_sales] )
- Add a new calculated column to the Resellers table to show the USD Sales Total value. The formula for the field should be:
= 'Reseller Sales' [USD Gross Sales]
- Add a Sales Rank field in the Resellers table to show the order for each resellers USD Sales Total. The formula for Sales Rank is:
=RANKX(Resellers, [USD Sales Total])
- Hide the USD Sales Total and Sales Rank fields from client tools.
Now that all the entries in the Resellers table show their sales value in a uniform currency, we can determine a grouping for the Reseller table. In this case, we are going to group them into 100,000 dollar bands.
- Add a new field to show each value in the USD Sales Total column of Resellers rounded down to the nearest 100,000 dollars. Hide it from client tools. Now, add Round Down Amount as:
=ROUNDDOWN([USD Sales Total],-5)
- Add a new field to show the rank of Round Down Amount in descending order and hide it from client tools. Add Round Down Order as:
=RANKX(Resellers,[Round Down Amount],,FALSE(), DENSE)
- Add a new field in the Resellers table to show the 100,000 dollars group that the reseller belongs to. Since we know what the lower bound of the sales bin is, we can also infer that the upper bin is the rounded up 100,000 dollars sales group. Add Sales Group as follows:
=IF([Round Down Amount]=0 || ISBLANK([Round Down Amount]) , "Sales Under 100K" , FORMAT([Round Down Amount], "$#,K") & " - " & FORMAT(ROUNDUP([USD Sales Total],-5),"$#,K") )
- Set the Sort By Column of the Sales Group field to the Round Down Order column. Note that the Round Down Order column should display in a descending order (that is, entries in the Resellers table with high sales values should appear first).
- Create a hierarchy on the Resellers table which shows the Sales Group field and the Customer Name column as levels. Title the hierarchy as Customer By Sales Group.
- Add a new measure titled Number of Resellers to the Resellers table:
Number of Resellers:=COUNTROWS(Resellers)
- Create a pivot table that shows the Customer By Sales Group hierarchy on the rows and Number of Resellers as values. If you created the usd_gross_sales field in the Reseller Sales table, it can also be added as an implicit measure to verify values. Expand the first bin of Sales Group. The pivot should look like the following screenshot:
How it works…
This recipe has included various steps, which add descriptive information to the Resellers table. This includes obtaining data from a separate table (the USD sales) and then manipulating that field within the Resellers table. In order to provide a clearer definition of how this process works, we will break the explanation into several subsections. This includes the sales data retrieval, the use of rank functions, and finally, the discretization of sales.
The next section deals with the process of converting sales data to a single currency.
The starting point for this recipe is to determine a common currency sales value for each reseller (or customer). While the inclusion of the calculated column USD Sales Total in the Resellers table should be relatively straightforward, it is complicated by the consideration that the sales data is stored in multiple currencies. Therefore, the first step needs to include a currency conversion to determine the USD sales value for each line. This is simply the local value multiplied by the daily exchange rate. The LOOKUPVALUE function is used to return the row-by-row exchange rate.
Now that we have the usd_gross_sales value for each sales line, we define a measure that calculates its sum in whatever filter context it is applied in. Including it in the Reseller Sales table makes sense (since, it relates to sales data), but what is interesting is how the filter context is applied when it is used as a field in the Resellers table. Here, the row filter context that exists in the Resellers table (after all, each row refers to a reseller) applies a restriction to the sales data. This shows the sales value for each reseller.
For this recipe to work correctly, it is not necessary to include the calculated field usd_gross_sales in Reseller Sales. We simply need to define a calculation, which shows the gross sales value in USD and then use the row filter context in the Resellers table to restrict sales to the reseller in question (that is, the reseller in the row).
It is obvious that the exchange rate should be applied on a daily basis because the value can change every day. We could use an X function in the USD Gross Sales measure to achieve exactly the same outcome. Our formula will be:
SUMX ( 'Reseller Sales' , 'Reseller Sales'[Quantity Ordered] * 'Reseller Sales'[Price Per Unit] * LOOKUPVALUE ( 'Currency Conversion'[AVG Rate] , 'Currency Conversion'[Date] ,'Reseller Sales'[Order dt] ,'Currency Conversion'[Currency ID] ,'Reseller Sales'[Currency ID] ) )
Furthermore, if we wanted to, we could completely eliminate the USD Gross Sales measure from the model. To do this, we could wrap the entire formula (the previous definition on USD Gross Sales) into the CALCULATE statement in the Resellers table's field definition of USD Gross Sales. This forces the calculation to occur at the current row context.
Why have we included the additional fields and measures in Reseller Sales? This is a modeling choice. It makes the model easier to understand because it is more modular. This would otherwise require two calculations (one into a default currency and the second into a selected currency) and the field usd_gross_sales is used in that calculation.
Now that sales are converted to a uniform currency, we can determine the importance by rank. RANKX is used to rank the rows in the Resellers table based on the USD Gross Sales field. The simplest implementation of RANKX is demonstrated within the Sales Rank field. Here, the function simply returns a rank based on the value according to the supplied measure (which is of course USD Gross Sales).
However, the RANKX function provides a lot of versatility and follows the syntax:
RANKX(<table> , <expression>[, <value>[, <order>[, <ties>]]] )
After the initial implementation of RANKX in its simplest form, the arguments of particular interest are the <order> and <ties> arguments. These can be used to specify the sort order (whether the rank is to be applied from highest to lowest or lowest to highest) and the function behavior when duplicate values are encountered. This may be best demonstrated with an example. To do this, we will examine the operation of rank in relation to Round Down Amount.
When a simple RANKX function is applied, the function sorts the columns in an ascending order and returns the position of a row based on the sorted order of the value and the number of prior rows within the table. This includes rows attributable to duplicate values. This is shown in the following screenshot where the Simple column is defined as RANKX(Resellers,[Round Down Amount]). Note, the data is sorted by Round Down Amount and the first four tied values have a RANKX value of 1. This is the behavior we expect since all rows have the same value. For the next value (700000), RANKX returns 5 because this is the fifth element in the sequence.
When the DENSE argument is specified, the value returned after a tie is the next sequential number in the list. In the preceding screenshot, this is shown through the DENSE column. The formula for the field DENSE is:
RANKX(Resellers, [Round Down Amount],,,DENSE))
Finally, we can specify the sort order that is used by the function (the default is ascending) with the help of <order> argument of the function. If we wish to sort (and rank) from lowest to highest, we could use the formula as shown in the INVERSE DENSE column. The INVERSE DENSE column uses the following calculation:
RANKX(Resellers, [Round Down Amount],,TRUE,DENSE)
After having specified the Sales Group field sort by column as Round Down Order, we may ask why we did not also sort the Customer Name column by their respective values in the Sales Rank column? Trying to define a sort by column in this way would cause an error as it is not a one-to-one relationship between these two fields. That is, each customer does not have a unique value for the sales rank.
Let's have a look at this in more detail. If we filter the Resellers table to show the blank USD Sales Total rows (click on the drop-down arrow in the USD Sales Total column and check the BLANKS checkbox), we see that the values of the Sales Rank column for all the rows is the same. In the following screenshot, we can see the value 636 repeated for all the rows:
Allowing the client tool visibility to the USD Sales Total and Sales Rank fields will not provide an intuitive browsing attribute for most client tools. For this reason, it is not recommended to expose these attributes to users. Hiding them will still allow the data to be queried directly
By discretizing the Resellers table, we firstly make a decision to group each reseller into bands of 100,000 intervals. Since, we have already calculated the USD Gross Sales value for each customer, our problem is reduced by determining which bin each customer belongs to.
This is very easily achieved as we can derive the lower and upper bound for the Resellers table. That is, the lower bound will be a rounded down amount of their sales and the upper bound will be the rounded up value (that is rounded nearest to the 100,000 interval). Finally, we must ensure that the ordering of the bins is correct so that the bins appear from the highest value resellers to the lowest.
For convenience, these steps are broken down through the creation of additional columns but they need not be—we could incorporate the steps into a single formula (mind you, it would be hard to read). Additionally, we have provided a unique name for the first bin by testing for 0 sales. This may not be required.
The rounding is done with the ROUNDDOWN and ROUNDUP functions. These functions simply return the number moved by the number of digits offset. The following is the syntax for ROUNDDOWN:
Since we are interested only in the INTEGER values (that is, values to the left of the decimal place), we must specify <num_digits> as -5.
The display value of the bin is controlled through the FORMAT function, which returns the text equivalent of a value according to the provided format string. The syntax for FORMAT is:
In presenting a USD Gross Sales value for the Resellers table, we may not be interested in all the historic data. A typical variation on this theme is to determine the current worth by showing the recent history (or summing recent sales). This requirement can be easily implemented into the preceding method by swapping USD Gross Sales with recent sales. To determine this amount, we need to filter the data used in the SUM function. For example, to determine the last 30 days' sales for a reseller, we will use the following code:
SUMX( FILTER('Reseller Sales' , 'Reseller Sales'[Order dt]> (MAX('Reseller Sales'[Order dt])-30) ) , USD SALES EXPRESSION )
|Over 50 tips and tricks for analytical modeling using Business Intelligence Semantic Models with SQL Server 2012 and PowerPivot with this book and ebook|
eBook Price: $32.99
Book Price: $54.99
Defining many-to-many relationships
Many-to-many relationships appear in a number of modeling situations. Perhaps, the most common scenario and explanation is that of a bank account and their owners. In this situation, a bank account can have many owners who are equally responsible for the balance. However, the owners can have different accounts. The challenge faced in tabular modeling is that the relationship includes a direction that does not actively filter the fact data. The use of the bridge table breaks the standard downstream filtering that is usually applied in a star schema.
This recipe demonstrates how to create the many-to-many relationship by extending the previous recipe to include store ownership. In this example, a reseller can be owned by one or more owners. Furthermore, each owner has an ownership interest (as a percentage interest in a reseller).
The following are the standard types of questions that the model is expected to answer:
- How many stores does an owner have?
- What is the owner's interest (ownership proportion) of the store(s)?
- What is the total value (both apportioned and unapportioned) of sales for owners' stores?
This recipe extends the prior recipe (Grouping by binning and sorting with ranks) by adding additional data to the model. The new data is a list of owners and the table shows the ownership interest.
The Owners.txt file identifies the owner (employee by ID and name), whereas the ownership interest table, Store Owners.txt, identifies the store, owner, and their interest with a percentage stake.
How to do it…
In building the many-to-many relationship, we first show what the implications of not correctly defining measures are. We then show how to correct measure outcomes using DAX. Let's start with the workbook that has been developed so far:
- Open the model developed in the previous recipe.
- Import the text files Owners.txt and Store Owners.txt into the model.
- Create a relationship between Customer ID in the Resellers table and Customer ID in the Store Owners table.
- Create a relationship between the Employee ID column in the Store Owners table and the Employee ID column in the Owners table. The relationship between the Reseller Sales table and the Owners model should look like the following screenshot:
Note that the direction of the relationship between Store Owners and Resellers breaks the unidirectional filtering path that will otherwise be created, as shown in the following diagram:
- Create a pivot table that shows Store Owners on rows and the measure USD Gross Sales. The pivot will look like the following screenshot:
- Create a measure to show the number of stores owned by an owner. Add the following code to the Store Owners table and format it as a (separated by a comma) number with no decimal spaces:
Stores Owned := DISTINCTCOUNT('Store Owners'[Customer ID])
- Add the measure Stores Owned to the pivot. Filter the pivot so that the first two owners are shown. Then, add the Customer ID column from the Resellers table as a secondary member on rows—all stores will be shown because the measure USD Gross Sales (Raw) appears for every store. Hide the stores that do not have an owner by filtering the pivot to show only the stores that have active owners. Select the Customer ID cell in the pivot and then apply a value filter so that Stores Owned does not equal to 0. The pivot will now look like the following screenshot:
- Add a measure to the Store Owners table to show average ownership (interest) using the following code. And then, add the Ownership % (AVG) measure to the pivot table.
Onwership % (AVG):=AVERAGE([Ownership Percent])
We can easily verify that the ownership interest (and stores owned) is correct by selecting an employee (in this case, employee ID of A. Scott Wright is 685233686) and filtering the Store Owners table within the model. If we do this, the table is reduced to only stores owned by Scott. Have a look at the following screenshot:
Note that the filter indicator also shows that the column is filtered.
- Create a measure to show the USD Gross Sales value for stores that are owned. Add Gross Sales (All Owners) to the Resellers Sales table:
CALCULATE([USD Gross Sales], 'Store Owners')
- Create a measure to show the USD Gross Sales value for the owners based on their interest percentage in the store. Add Gross Sales (Ownership) to the Resellers Sales table as:
CALCULATE( SUMX('Store Owners' , 'Store Owners'[Ownership Percent]/100 *[USD Gross Sales]) , 'Store Owners' )
- Format both measures as numeric values, separated by a comma, with no decimal places.
- Hide the unwanted columns from client tools. Hide all columns in the Store Owners and Resellers Sales tables.
- To verify the results, alter the existing pivot table. Ensure that the USD Gross Sales, Ownership % (AVG), Gross Sales (All Owners), and Gross Sales (Ownership) measures are added to the pivot. Remove Customer ID from the rows. The pivot will look like the following screenshot:
- Create a new pivot that shows sales information for A. Scott Wright. Add Customer ID from the Resellers table on the rows and the measures USD Gross Sales (Raw), Gross Sales (All Owners), and Gross Sales (Ownership) on the columns. Add a filter to the pivot (Employee ID from the Owners table) and then filter to Scott. Create a value filter for rows, so that the measure Gross Sales (All Owners) does not equal to 0. The pivot will look like the following screenshot:
- Note that Scott's sales Gross Sales (All Owners) match the USD Gross Sales and that Gross Sales (Ownership) are equal to the all owners' sales multiplied by the Ownership % (AVG). Furthermore, note that the totals for Scott match the prior pivot's line items.
How it works…
We have previously seen that the calculation of a measure is a calculation that is performed on data that is restricted by the query context (the pivot table creates this context by the intersections of the pivot table rows and columns). So far, all models have implemented relationships in a unidirectional manner, so that the relationships within the model filter the data that is being used in the calculation. However, the creation of a relationship between the Store Owners table and the Resellers table does not follow this outward pattern and hence the data in the Resellers Sales table is not filtered when fields from the Store Owners table or Owners table are used in the pivot table.
This is the reason why the first pivot table shows duplicate values for the USD Gross Sales (Raw) measure. Note that the direction of the relationship between Resellers and Store Owners is different from the others, as shown in the following screenshot:
This does not affect the calculations that are performed between tables where the relationship and filter context automatically restrict the rows that the calculation is performed on. Therefore, the measures for Stores Owned and Ownership % can be automatically derived through any filter that is applied to the Store Owners table, since the relationship dictates that the Store Owners table is automatically filtered by Owners. That is, the number of stores owned will always be derived by the (query) filters which occur between the Owners and Resellers tables.
If there is no filter applied (forcing all customer IDs to be shown), we only want to count individual stores (that are applicable based on the Owners filter) and so, we need to apply the DISTINCTCOUNT function to the Customer ID column in the Store Owners table. The DISTINCTCOUNT function performs, as the name suggests, and returns the number of unique values of the provided column. This gives us what we want—the number of unique stores under the current filter context.
While it is possible to use the COUNTROWS function for the Store Owners table, the value returned will only be correct if there is a filter context on the employee. There is no filter for a total level and, therefore, relying on COUNTROWS would give the total number of owners and store combinations, rather than the number of stores.
Now, let's consider the measure Gross Sales (All Owners). The purpose of the measure Gross Sales (All Owners) is to show the USD Gross Sales measure whenever a valid reseller or owner combination occurs. The value at a dimensions aggregate level (say for example, all owners) should be the full amount without adding the individual detail items.
This is achieved by specifying the bridge table in the calculation for Gross Sales (All Owners). Revisiting the formula, we can see that the filter is applied on the basis of the Store Owners table. Thus, the Boolean predicate (the part of the CALCULATE function that filters the table) is only applied if a row exists in the current queries filter context. In this way, we only show the measure USD Gross Sales (Raw) when a row in the queries filter context leaves a row in the Store Owners table.
Gross Sales (All Owners):= CALCULATE([USD Gross Sales], 'Store Owners')
While the measure Gross Sales (All Owners) shows the value of total sales for stores that are owned (by an employee), it does not show how much of those sales the owner is entitled to. In order to determine each owner's ownership interest in sales, we extend the concept that was applied to Gross Sales (All Owners) at their ownership interest as defined in the Store Owners table. Stated another way, we wish to sum each row's ownership interest of the sales amount multiplied by the Ownership Percent field.
We have already seen how SUMX can be used to iterate over rows, which is exactly the same principle that is applied in this formula. That is, iterate over each row in the Store Ownership table that has been filtered by the query context to and multiply the amount that is shown by the rows ownership proportion (that is, by the Ownership Percent field).
Using the last non-empty function for stock data
Most OLTP (Operational) systems contain summary tables to record the on hand quantity of stock. This is recorded at the level of detail appropriate to the system (for example, product item and location).
Of course, the current value of any item or location(s) should be determined by aggregating the net movement of all the prior transactions; however, the performance of such a calculation is unacceptably slow in traditional relational environments. Therefore, in order to retain balance history, the snapshots of the quantities are taken at key dates (say for example, the month end).
A similar type of recording process occurs in data marts and data warehouse environments. In these situations, the fact data (table) is commonly referred to as a periodic snapshot because the snapshot of the data is taken at periodic intervals.
For these types of tables, Stock on Hand quantities cannot be aggregated across time dimensions because the aggregated value would not give the correct result. Consider a situation where the daily balance of stock is held in a table (that is, there is a record of the balance for every day). While we can easily determine daily balances, we cannot derive a monthly balance by adding up all the daily balances. Instead, we must return the values on the last day of the month. This adjustment is only applicable to aggregations across a date dimension. All other dimensions (for example, stores and products) should aggregate according to the hierarchy that is selected.
SQL Server Analysis Services (SSAS) multidimensional modeling includes a special aggregation function (last non-empty) to retrieve the latest balance when dates were aggregated. However, there is no such function in tabular modeling and the result has to be determined with DAX.
This recipe builds on the tabular model that was developed in the Defining many-to-many relationships recipe. Continuing with the example developed in this recipe, we assume that our resellers have implemented a just-in-time inventory system. When a product is sold, it is ordered from the head office and delivered on the same day (this is a stock movement in for the Resellers table). Then, on the sales shipping date, when the product is shipped from the reseller, the stock is transferred out of the store's holdings.
Examine a sample of stock movements (filtered on customer AW00000438 and product BK-R79Y-48). Here, we can trace that the daily balance (QTY_BAL) is incremental and equal to the prior day's balance, and the net effect of QTY_IN and QTY_OUT.
How to do it…
As with the previous recipes, we start by adding some additional data to our model.
- Launch the PowerPivot Window.
- Import the text files Inventory Balances.txt, Products.txt, and Dates.txt.
- Create a relationship between the Customer ID column in the Inventory Balances table and the Customer ID column in the Resellers table.
- The Dates table will act as a date dimension and can also be used to filter the Reseller Sales data (this was not included in the previous recipe). Create a relationship between the Day field in the Dates table and the Movement Dt field in the Inventory Balances table and a relationship between the Day field in the Dates table and the Order dt in the Resellers table.
- Create a relationship between the Product ID field in the Products table and the Product ID field in the Reseller Sales table. Then, create another relationship between the Product ID field in the Products table and the Product ID field in the Inventory Balances table.
- Hide all fields in the Inventory Balances table.
- Set the sort column from Month Name to Month Number in the Dates table and hide Month Number from client tools. Then, create a hierarchy titled Date by Year that has the levels: Year, Month Name, and Day.
- Hide all fields in the Resellers Sales table from client tools.
- The model should appear, as shown in the following screenshot (note that we are excluding tables that are not necessary for inventory balances):
- Add a measure to the Inventory Balances table to calculate the net effect of stock movements. Add Stock Movement as follows:
Stock Movement := SUM([QTY_IN]) + SUM([QTY_OUT])
- Add a measure to the Inventory Balances table to aggregate the value of Stock on Hand. Add QTY BAL as follows:
QTY BAL := SUM([QTY_BAL])
- Add a measure to the Inventory Balances table, which determines the value of Stock on Hand (the stock balance) based on the addition of all historical transactions. Add Stock Balance (Trans) as follows:
Stock Balance (Trans):= CALCULATE ( [Stock Movement] , FILTER(ALL(Dates) , Dates[Day]<=MAX(Dates[Day]) ) )
- Add a measure in the Inventory Balances table to show what the last date was, which was associated with any stock movements that is within the current (Date) filter context. Add Stock Date and format the date accordingly using the following code:
Stock Date:= LASTNONBLANK('Inventory Balances'[Movement Dt] , 'Inventory Balances'[QTY BAL] )
- Add a measure to the Inventory Balances table to show the total stock balance at Stock Date. This is the last non-empty (or stock balance) aggregation. Add Stock on Hand using the following code:
Stock On Hand:= CALCULATE ( [QTY BAL] , LASTNONBLANK('Inventory Balances'[Movement Dt] , 'Inventory Balances'[QTY BAL] ) )
- Create a pivot table (which we will refer to as Pivot 1) that shows the Stock on Hand value for the customer AW00000438 and the product BK-R79Y-48 (this was the sample extract shown earlier in the article). The pivot should have filters for Customer ID and Product ID, the Date by Year hierarchy on rows, and the Stock Date and Stock on Hand measures on the columns. Expand the June 2008 value to look like the following screenshot:
- Create another pivot table (which will be referred to as Pivot 2), which shows all the entries in the Resellers and Products tables. Include the measures Stock Date, Stock on Hand, and Stock Balance (Trans). The pivot should look like the following screenshot:
How it works…
The Stock Balance (Trans) measure in Pivot 2 shows how the Stock Balance measure can be calculated by aggregating transactional movements from prior periods. However, using this technique, one may question whether the balance from the year 2008 should be continued into 2009 because the balance does not relate to the future years—or does it? This question is often argued in reporting teams!
The traditional approach to the last non-empty problem is to return all data on the last date within the current period that is specified by the filter context.
We reiterate that the concept of a current period is a logical condition imposed by the query (or the date reference imposed by the user). Unlike SSAS multidimensional, tabular modeling does not support the current member navigation within a time dimension.
The measures in Pivot 2 demonstrate how this works through the use of Stock Date. The purpose of the measure is to return the last date within the Inventory Balances table, within the current filter context, that is, the last non-empty date. Note that this is not the same value as the month end (which is the value that would be returned had a MAX(Dates[Date]) function been used).
In order to determine the last date of the activity, we rely on the LASTNONBLANK function. This simply returns the last value in the <column> parameter where the <expression> parameter is not blank. Since we are interested in stock balances, we would naturally use the QTY BAL measure (since it is the sum of the balance field).
Once this date is determined, we can use the value as the last non-empty date and aggregate the values from the Inventory Balances table. This is done by specifying the value as the filter within the CALCULATE command. Remember, that the syntax for CALCULATE is:
By specifying the LASTNONBLANK value as a <filter> argument, we automatically apply the filter without the need to completely redefine the queries filter context.
In summary, these formulas work by effectively filtering Stock on Hand records to the last date based on the current filter context.
One may question, why the recipe does not manipulate the Date table filter context, for example, trying to filter stock records based on the maximum date in the query filter context. Trying to determine a stock balance in this way creates and reapplies many filters to the Date table and can cause undesired results. Consider the logic being applied—I redefine the date filter to determine the last date which had stock, then I try to use this date by removing the queries filter context on the dates table. However, removing the filter changes the filter that is applied when we try to get the last date.
In this article we learned about some advanced modeling functionality and how the model can be used to manipulate its own data thus presenting new information. For example, we looked at the dynamic generation of bins (that is, the grouping of data), many-to-many relationships, and stock calculations over time.
Resources for Article:
- The Spotfire Architecture Overview [Article]
- Preparation Analysis of Data Source [Article]
- Easily Writing SQL Queries with Spring Python [Article]
|Over 50 tips and tricks for analytical modeling using Business Intelligence Semantic Models with SQL Server 2012 and PowerPivot with this book and ebook|
eBook Price: $32.99
Book Price: $54.99
About the Author :
Paul te Braak (ptebraak@abaXdata.com.au) is a leading Business Intelligence Consultant based in Australia. He has been involved in Information Management for over 15 years, with the past 9 years focusing on the Microsoft Business Intelligence stack. His areas of interest include data modeling, data mining, and visualization. He is an active participant in the SQL Server community, speaks at various local and international events, and organizes a regional SQL Server Saturday. His blog can be found at www.paultebraak.wordpress.com.