# How to Perform Iteration on Sets in MDX

August 2011

## MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook

 More than 80 recipes for enriching your Business Intelligence solutions with high-performance MDX calculations and flexible MDX queries in this book and eBook

(For more resources on Microsoft SQL Server, see here.)

Iteration is a very natural way of thinking for us humans. We set a starting point, we step into a loop, and we end when a condition is met. While we're looping, we can do whatever we want: check, take, leave, and modify items in that set. Being able to break down the problems in steps makes us feel that we have things under control. However, by breaking down the problem, the query performance often breaks down as well. Therefore, we have to be extra careful with iterations when data is concerned.

If there's a way to manipulate the collection of members as one item, one set, without cutting that set into small pieces and iterating on individual members, we should use it. It's not always easy to find that way, but we should at least try.

# Iterating on a set in order to reduce it

Start a new query in SSMS and check that you're working on the right database. Then write the following query:

`SELECT   { [Measures].[Customer Count],      [Measures].[Growth in Customer Base] } ON 0,   NON EMPTY   { [Date].[Fiscal].[Month].MEMBERS } ON 1FROM   [Adventure Works]WHERE   ( [Product].[Product Categories].[Subcategory].&[1] )`

The query returns fiscal months on rows and two measures: a count of customers and their growth compared to the previous month. Mountain bikes are in slicer.

Now let's see how we can get the number of days the growth was positive for each period.

## How to do it...

Follow these steps to reduce the initial set:

1. Create a new calculated measure in the query and name it Positive growth days.
2. Specify that you need descendants of current member on leaves.
3. Wrap around the FILTER() function and specify the condition which says that the growth measure should be greater than zero.
4. Apply the COUNT() function on a complete expression to get count of days.
5. The new calculated member's definition should look as follows, verify that it does.
`WITHMEMBER [Measures].[Positive growth days] AS  FILTER(    DESCENDANTS([Date].[Fiscal].CurrentMember, , leaves),    [Measures].[Growth in Customer Base] > 0       ).COUNT`
6. Add the measure on columns.
7. Run the query and observe if the results match the following image:

## How it works...

The task says we need to count days for each time period and use only positive ones. Therefore, it might seem appropriate to perform iteration, which, in this case, can be performed using the FILTER() function.

But, there's a potential problem. We cannot expect to have days on rows, so we must use the DESCENDANTS() function to get all dates in the current context.

Finally, in order to get the number of items that came up upon filtering, we use the COUNT function.

## There's more...

Filter function is an iterative function which doesn't run in block mode, hence it will slow down the query. In the introduction, we said that it's always wise to search for an alternative if available. Let's see if something can be done here. A keen eye will notice a "count of filtered items" pattern in this expression. That pattern suggests the use of a set-based approach in the form of SUM-IF combination. The trick is to provide 1 for the True part of the condition taken from the FILTER() statement and null for the False part. The sum of one will be equivalent to the count of filtered items.

In other words, once rewritten, that same calculated member would look like this:

`MEMBER [Measures].[Positive growth days] AS  SUM(    Descendants([Date].[Fiscal].CurrentMember, , leaves),    IIF( [Measures].[Growth in Customer Base] > 0, 1, null)      )`

Execute the query using the new definition. Both the SUM() and the IIF() functions are optimized to run in the block mode, especially when one of the branches in IIF() is null. In this particular example, the impact on performance was not noticeable because the set of rows was relatively small. Applying this technique on large sets will result in drastic performance improvement as compared to the FILTER-COUNT approach. Be sure to remember that in future.

http://tinyurl.com/SumIIF

### Hints for query improvements

There are several ways you can avoid the FILTER() function in order to improve performance.

• When you need to filter by non-numeric values (i.e. properties or other metadata), you should consider creating an attribute hierarchy for often-searched items and then do one of the following:
• Use a tuple when you need to get a value sliced by that new member
• Use the EXCEPT() function when you need to negate that member on its own hierarchy (NOT or <>)
• Use the EXISTS() function when you need to limit other hierarchies of the same dimension by that member
• Use the NONEMPTY() function when you need to operate on other dimensions, that is, subcubes created with that new member
• Use the 3-argument EXISTS() function instead of the NONEMPTY() function if you also want to get combinations with nulls in the corresponding measure group (nulls are available only when the NullProcessing property for a measure is set to Preserve)
• When you need to filter by values and then count a member in that set, you should consider aggregate functions like SUM() with IIF() part in its expression, as described earlier.

(For more resources on Microsoft SQL Server, see here.)

# Iterating on a set in order to create a new one

There are situations when we don't want to eliminate certain members from a set, but instead execute a for-each type of loop. This is done using the GENERATE() function. In this recipe we'll show you how to create a new set of members from the existing one.

Start a new query in SSMS and check that you're working on the right database. Then write the following query:

`SELECT  NON EMPTY  { [Date].[Calendar].[Calendar Year].MEMBERS *     [Measures].[Sales Amount] } ON 0,  NON EMPTY  { [Sales Territory].[Sales Territory Country].MEMBERS }  ON 1FROM  [Adventure Works]`

The query returns 4 years on columns and 6 countries on rows. This recipe shows how to get a set of best months, one for each year.

## How to do it...

Follow these steps to create a new set from the initial one:

1. Cut the years from columns and define a named set using them.
2. Name that set Best month per year.
3. Wrap that set in the Generate() function so that the set of years becomes its first argument.
4. The second argument should be the TopCount() function which uses the descendants of each year on the Month level and finds the best month according to the value of the measure Sales Amount.
5. Put the name of the new set on columns.
6. The final query should look as follows:
`WITHSET [Best month per year] AS  Generate( [Date].[Calendar].[Calendar Year].MEMBERS,        TopCount(        Descendants( [Date].[Calendar].CurrentMember,              [Date].[Calendar].[Month],              SELF ),        1,        [Measures].[Sales Amount] )      )SELECT  NON EMPTY  { [Best month per year] *       [Measures].[Sales Amount] } ON 0,  NON EMPTY  { [Sales Territory].[Sales Territory Country].MEMBERS } ON 1FROM  [Adventure Works]`
7. Execute the query. Notice that each year is replaced with a single month, the month with the best sales result in that year:

## How it works...

The Generate() function can be thought of as a for-each loop. This means that we will iterate through each member of the initial set and assign another set instead of each member. That new set can have zero, one, or many members and this can vary during the iteration. In our example we're assigning a set with one member only, the best month in each year. That member is obtained using the TopCount() function where the first argument is months of the current year in iteration, the second argument is 1 (only one member to be returned), and the third argument is the Sales Amount measure — the criterion for deciding which month is the best. Months are obtained the standard way, using the Descendants() function.

## There's more...

The CURRENTORDINAL function is a special MDX function valid only in iterations. It returns the position of the current member (or tuple, to be precise) in the set in iteration (from 0 to N, where N is the total number of members in a set). In addition to that, there's also the CURRENT function. The CURRENT function returns the current tuple in a set being iterated. Again, it's only applicable during iterations.

Both of these functions can be used to detect the current tuple and to create various calculations with the current tuple and other tuples in that set. Reversing any initial set is one example of these manipulations. Comparing the value of the current tuple with the value of the previous tuple in the set (or any one before or after) in order to isolate certain tuples is another example.

Here's how you could reverse the set of months from the previous example.

`SET [Best month per year reversed] AS  Generate( [Date].[Calendar].[Calendar Year].MEMBERS       AS MySetAlias,       TopCount(           Descendants(              MySetAlias.Item( MySetAlias.Count -                     MySetAlias.CurrentOrdinal                     - 1 ).Item(0),              [Date].[Calendar].[Month],              SELF ),       1,       [Measures].[Sales Amount] )   )`

A set alias (MySetAlias in this example) is defined for the initial set. That set alias is later used for navigation. The combination of Count and CurrentOrdinal gives us members from the end of the set to its beginning, progressively, while the Item() function serves as a pointer on members in that set.

Yes, the same operation could be done simply by sorting the months by their member key, in descending order. Nevertheless, the idea of that example was to show you the principle which can be applied on any set, especially those that can't be reversed easily.

The other example mentioned above uses the Filter() function, not the Generate() function. There, tuples can be compared to each other progressively in order to see which one has the value higher than both of its neighboring members, which would signal that the current member is a relative peak. Or the opposite, whatever is more interesting in a particular case. However, the Filter() function doesn't add new members, it only limits its initial set and for that reason it is out of the scope of this recipe.

To summarize, Current() and CurrentOrdinal() are powerful functions that allow us to perform self-joining type of operations in MDX or make use of the existing relations between dimensions and measure groups. These functions are useful not only in the Generate() function, but in other iterating functions as well, namely, the Filter() function.

### Did you know

In MDX, there's no concept of the FOR loop. Iterations cannot be based on numbers (as in other languages or on other systems). They must always be based on a set. If we need to loop exactly N times, there are two basic ways we can achieve this. One is with the existing cube structure, the other is by expanding a cube with a utility dimension. The former means that we can use date dimension and take N members from its start. Or it could be some other dimension, as long as it has enough members to loop on. The latter, using the utility dimension.

(For more resources on Microsoft SQL Server, see here.)

# Iterating on a set using recursion

Recursion is sometimes the best way to iterate a collection. Why? Because iterations using set functions (including the GENERATE() function) require that we loop through the whole set. But what if that set is big and we only need to find something in it? Wouldn't it be great to be able to stop the process when we've found what we wanted? Recursion enables just that – to stop when we're done.

In this recipe we're going to see how to calculate the average of an average using recursion.

To get started, start a new query in SSMS and check that you're working in the right database. Then write the following query:

`SELECT   { [Measures].[Order Count] } ON 0,   NON EMPTY   { Descendants( [Date].[Fiscal Weeks].[All Periods],          1 , SELF_AND_BEFORE) } ON 1FROM   [Adventure Works]`

It returns 4 fiscal years and their total on top for the Order Count measure. Now let's see how to calculate the average daily value on the week level and the average weekly level on the year level, but based on the week level, not on the date level. In other words, each level will have the average value of members on the level immediately below.

## How to do it...

Follow these steps to perform recursion over a set:

1. Define a new calculated measure and name it Average of an average.
2. Use the IIF() function and specify its True parameter as the initial measure (Order Count).
3. The value should be returned for the leaf level, so the condition in IIF() should test exactly that using the ISLEAF() function.
4. In the False parameter we should provide the calculation we want to repeat recursively. In this case it is the AVG() function used on children of the current member.
5. The measure expression inside AVG() function should be the measure being defined.
6. Check if the measure is defined as follows:
`WITHMEMBER [Measures].[Average of an average] AS   iif( IsLeaf( [Date].[Fiscal Weeks].CurrentMember ),       [Measures].[Order Count],          Avg( [Date].[Fiscal Weeks].CurrentMember.Children,              [Measures].[Average of an average] )      )  , FORMAT_STRING = '#,#'`
7. Don't forget to include that measure as the second measure on columns.
8. Run the query. The results will look as follows. The first row, the one with the All Periods member, will have the average yearly value as result, that is (56+84+435+195)/4=193. In turn, every year will have the average weekly value. The weekly values are not visible in this screenshot, but we can divide the Order Count values by 53, that is, by the number of weeks per year. That should give us the values for the Average of an average measure shown in the second column.

## How it works...

Recursions are the most difficult iteration concept to apply. Their logic is very condensed. However, once you conquer them, you'll appreciate their power and efficiency. Let's see how that solution worked.

In order to start the recursive process, we have to specify an expression that uses the same calculated measure we're defining, thereby providing a different input parameter than the one which was being used in the current pass of recursive process. In order to stop the process, we must have a branch without the reference to that measure. On top of all that, we must perform some operation to collect values on the way. Complicated? Let's analyze our query.

Fiscal years on rows are not the leaf level of the Fiscal Weeks user hierarchy. Therefore, the expression inside the IIF() statement evaluates as False. This leads us to the part where we have to calculate the average value for each child of the current member. With a small detail, the calculation should be performed using the same measure we're evaluating!

The evaluation for the current year member cannot be completed and is therefore delayed until the calculation for all its child members (weeks in this case) is performed. One by one, each week of the year in context is passed inside the definition of this measure and evaluated.

In case of a leaf member, the Order Count measure would be evaluated and returned to the outer evaluation context. Otherwise, another turn of the child member's evaluation would occur. And so on until we would finally hit leaf-level members.

In this example, weeks are the leaf level of the hierarchy being used in the query. They would be evaluated using the True part of the condition. The True parameter is without reference to the measure we're calculating, which means the recursive path would be over. The value of the Order Count measure starting from the Week 1 of FY 2006 would be collected and saved in a temporary buffer. The same process would be repeated for all weeks of that year. Only then the average of them would be calculated and returned as a value for FY 2006. After which the process would repeat for subsequent years on rows.

Let's also mention that the value for the root member (All years) is calculated with the recursion depth of 2, meaning each year it is first evaluated as an average of its weeks and then the average of its years is calculated and returned as the final result.

## There's more...

You might be wondering how does one recognize when to use recursion and when to use other types of iteration? Look for some of these pointers: relative positions, relative granulation for calculation and stop logic. If there's a mention of going back or forth from the current member in a set, but there's no fixed span, then that might be a good lead to use recursion. If there's a relative stopping point, that's another sign. Finally, if there's no explicit requirement to loop through the whole set, but moreover a requirement to stop at some point in the process, that's a definite sign to try to apply recursion as a solution to the problem.

In case no such signs exist, it's perhaps better and easier to use simple types of iterations we covered in previous recipes. The other case when you should consider straightforward iteration is when the recursion would span over more than half of the members on a particular hierarchy, that pushes the SSAS engine into the slow cell-by-cell mode.

### Earlier versions of SSAS

SSAS 2008 and later have better support for recursion than previous versions of SSAS. Optimizations have been added to the code in form of unlimited recursion depth. Versions prior to that may suffer from memory limitations in some extreme cases.

# Summary

In this article we took a look at iterations and ways to perform them.

Further resources on this subject:

You've been reading an excerpt of: