In this chapter, we will cover:
Putting data on x and y axes
Skipping axes
Using a WHERE clause to filter the data returned
Optimizing MDX queries using the NONEMPTY() function
Using the PROPERTIES() function to retrieve data from attribute relationships
Basic sorting and ranking
Handling division by zero errors
Setting a default member of a hierarchy in the MDX script
MDX is an elegant and powerful language, but it also has a steep learning curve.
The goal of this chapter is to use some simple examples to demonstrate the fundamental MDX concepts, features and techniques that are the foundations for further explorations of the MDX language.
The chapter begins with several basic techniques: putting multi-dimensional data onto query axes, cube space restriction, empty cell removal, and the important concept of unique names for members, tuples, and sets. From there, we shall turn our attention to a few more advanced features, such as using the MDX functions, creating calculations in the cube space, manipulating strings, writing parameterized queries, and conditionally formatting cell properties. This will form the basis for the rest of the chapters in this book.
SSAS 2012 provides a sample Analysis Services database, the Multidimensional Adventure Works DW. All the MDX queries and scripts in this book have been updated for Analysis Services 2012, and verified against the 2012 Enterprise Edition of the Adventure Works DW Analysis Services database. The majority of the MDX queries and scripts have also been tested in SSAS 2008 R2 and should run there.
The Query Editor in SQL Server Management Studio (SSMS) is our tool of choice for writing and testing MDX queries. SQL Server 2012 comes with a free tool for cube developers, SQL Server Data Tools (SSDT). Just as the Business Intelligence Development Studio (BIDS) was the tool that we used for cube design and MDX scripting in SSAS 2008, SSDT is the tool we will use in this cookbook for cube design and MDX scripting for SSAS 2012.
Cube space in SSAS is multi-dimensional. MDX allows you to display results on up to 128 axes, numbered from 0. The first five axes have aliases: COLUMNS, ROWS, PAGES, SECTIONS, and CHAPTERS. However, frontend tools such as SQL Server Management Studio (SSMS), or any other application that you can use for writing and executing MDX queries, have only two axes: the x and y axes, or COLUMNS and ROWS.
As a result, we have two tasks to do when trying to fit the multi-dimensional data onto the limited axes in our frontend tool:
We must always explicitly specify a display axis for all elements in the SELECT list. We can use aliases for the first five axes: COLUMNS, ROWS, PAGES, SECTIONS, and CHAPTERS. We are also allowed to use integers, 0, 1, 2, 3, and so on. But we are not allowed to skip axes. For example, the first axis must be COLUMNS (or 0). ROWS (or 1) cannot be specified unless COLUMNS (or 0) has been specified first.
Since we only have two display axes to show our data, we must be able to "combine" multiple hierarchies into one query axis. In MDX and other query language terms, we call it "cross join".
It's fair to say that your job of writing the MDX queries is mostly trying to figure out how to project multi-dimensional data onto only two axes, namely, x and y. We will start by putting only one hierarchy on COLUMNS, and one on ROWS. Then we will use the CROSSJOIN function to "combine" more than one hierarchy into COLUMNS and ROWS.
Making the two by eight table below in a spreadsheet is quite simple. Writing an MDX query to do that can also be very simple. Putting data on the x and y axes is a matter of finding the right expressions for each axis.
| | Internet Sales Amount |
|---|---|
| Australia | $9,061,000.58 |
| Canada | $1,977,844.86 |
| France | $2,644,017.71 |
| Germany | $2,894,312.34 |
| NA | (null) |
| United Kingdom | $3,391,712.21 |
| United States | $9,389,789.51 |
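All we need are three things from our cube:
An expression for the sales amount
An expression for the sales territory countries
The name of the cube
Once we have the preceding three things, we are ready to plug them into the following MDX query, and the cube will give us back the two by eight table: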
SELECT
  [The Sales Expression] ON COLUMNS,
  [The Territory Expression] ON ROWS
FROM [The Cube Name]
The MDX engine will understand it perfectly if we replace COLUMNS with 0 and ROWS with 1. Throughout this book, we will use the number 0 for columns, that is, the x axis, and 1 for rows, that is, the y axis.
We are going to use the Enterprise Edition of the Adventure Works 2012 Multidimensional Analysis Services database in our cookbook. If you open the Adventure Works cube, and hover your cursor over the measure Internet Sales Amount, you will see the fully qualified expression, [Measures].[Internet Sales Amount]. This is a long expression. Drag-and-drop in SQL Server Management Studio works perfectly for us in this situation.
Tip
Long expressions are a fact of life in MDX. Although case does not matter, correct spelling is required, and fully qualified, unique expressions are recommended for MDX queries to work properly.
Follow these two steps to open the Query Editor in SSMS:
Follow these steps to save the time spent for typing the long expressions:
Put your cursor on the measure Internet Sales Amount, and drag-and-drop it onto AXIS(0).
To get the proper expression for the sales territory, put your cursor over Sales Territory Country under Sales Territory | Sales Territory Country. Again, this is a long expression. Drag-and-drop it onto AXIS(1).
For the name of the cube, drag-and-drop should work too. Just point your cursor to the cube name, and drag-and-drop it into your FROM clause.
This should be your final query:
SELECT
  [Measures].[Internet Sales Amount] ON 0,
  [Sales Territory].[Sales Territory Country].[Sales Territory Country] ON 1
FROM [Adventure Works]
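As a small illustration of the axis aliases mentioned earlier, the same query can be sketched with COLUMNS and ROWS in place of 0 and 1. This only restates the query above with no new names, and it is expected to return the same table.

```mdx
-- Same query, using the axis aliases instead of the axis numbers
SELECT
  [Measures].[Internet Sales Amount] ON COLUMNS,
  [Sales Territory].[Sales Territory Country].[Sales Territory Country] ON ROWS
FROM [Adventure Works]
```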
When you execute the query, you should get a two by eight table, same as the following screenshot:
We have chosen to put Internet Sales Amount on Axis(0), and all members of Sales Territory Country on Axis(1). We have fully qualified the measure with the special dimension [Measures], and the sales territory members with the dimension [Sales Territory] and the hierarchy [Sales Territory Country].
You might have expected an aggregate function such as SUM somewhere in the query. We do not need any aggregate function here because the cube understands that when we ask for the sales amount for Canada, we expect the sales amount to come from all the provinces and territories in Canada.
SSAS cubes are perfectly capable of storing data in more than two dimensions. In MDX, we can use the technique called "cross join" to "combine" multiple hierarchies into one query axis.
In an MDX query, we specify how the multiple dimensions of our SSAS cube are laid out on just two axes, x and y. In both SQL and MDX, cross joining gives us every possible combination of two lists.
We wish to write an MDX query to produce the following table. On the columns axis, we want to see both Internet Sales Amount and Internet Gross Profit. On the rows axis, we want to see all the sales territory countries, and all the products sold in each country.
| | | Internet Sales Amount | Internet Gross Profit |
|---|---|---|---|
| Australia | Accessories | $138,690.63 | $86,820.10 |
| Australia | Bikes | $8,852,050.00 | $3,572,267.29 |
| Australia | Clothing | $70,259.95 | $26,767.68 |
| Australia | Components | (null) | (null) |
| Canada | Accessories | $103,377.85 | $64,714.37 |
| Canada | Bikes | $1,821,302.39 | $741,451.22 |
| Canada | Clothing | $53,164.62 | $23,755.91 |
| Canada | Components | (null) | (null) |
This query lays two measures on columns (from the same dimension and hierarchy [Measures]), and two different hierarchies, [Sales Territory Country] and [Product Categories], on rows.
SELECT
  { [Measures].[Internet Sales Amount],
    [Measures].[Internet Gross Profit] } ON 0,
  { [Sales Territory].[Sales Territory Country].[Sales Territory Country] *
    [Product].[Product Categories].[Category] } ON 1
FROM [Adventure Works]
To return the cross product of two sets, we can use either of the following two syntaxes:
Standard syntax: Crossjoin(Set_Expression1, Set_Expression2)
Alternate syntax: Set_Expression1 * Set_Expression2
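For comparison, here is a sketch of the same query written with the standard syntax. The explicit .MEMBERS calls are an addition made here for clarity and are not part of the original query; the result is expected to be the same.

```mdx
-- Rows axis built with the Crossjoin() function instead of the * operator
SELECT
  { [Measures].[Internet Sales Amount],
    [Measures].[Internet Gross Profit] } ON 0,
  Crossjoin(
    [Sales Territory].[Sales Territory Country].[Sales Territory Country].MEMBERS,
    [Product].[Product Categories].[Category].MEMBERS
  ) ON 1
FROM [Adventure Works]
```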
We have chosen to use the alternate syntax for its convenience. The result from the previous query is shown as follows:
There are situations when we want to display just a list of members with no data associated with them. Naturally, we expect to get that list in rows, so that we can scroll through them vertically instead of horizontally. However, the rules of MDX say that we can't skip axes. If we want something on rows (which is AXIS(1), by the way), we must use all previous axes as well (columns in this case, which is also known as AXIS(0)).
The reason why we want the list to appear on axis 1 and not axis 0 is that a horizontal list is not as easy to read as a vertical one.
Is there a way to display those members on rows and have nothing on columns? Sure! This recipe shows how.
The notation for an empty set is {}. So for axis 0, we would simply do this:
{ } ON 0
Follow these steps to open the Query Editor in SQL Server Management Studio (SSMS):
Follow these steps to get a one-dimensional query result with members on rows:
Put an empty set on columns (AXIS(0)). The notation for the empty set is {}.
Put some hierarchy on rows (AXIS(1)). In this case we used the largest hierarchy available in this cube, the Customer hierarchy of the dimension of the same name.
Run the following query:
SELECT
  { } ON 0,
  { [Customer].[Customer].[Customer].MEMBERS } ON 1
FROM [Adventure Works]
Although we can't skip axes, we are allowed to provide an empty set on them. This trick allows us to get what we need – nothing on columns and a set of members on rows.
Skipping Axis(0) is a common technique for creating a list for report parameters. If we want a list of only those customers whose name contains the phrase John, we can modify the preceding base query to use two functions, Filter() and InStr():
SELECT
  { } ON 0,
  { Filter(
      [Customer].[Customer].[Customer].MEMBERS,
      InStr( [Customer].[Customer].CurrentMember.Name, 'John' ) > 0
    ) } ON 1
FROM [Adventure Works]
In the final result, you will notice the "John" phrase in various positions in member names:
Instead of skipping Axis(0), if you put a cube measure or a calculated measure with a non-constant expression on axis 0, you'll slow down the query. The slowdown can be noticeable if there are a large number of members in the specified hierarchy. For example, if you put the Sales Amount measure on axis 0, the Sales Amount will have to be evaluated for each member on rows. Do we need the Sales Amount? No, we don't. The only thing we need is a list of members; hence we've used an empty set {} on axis 0. That way, the SSAS engine doesn't have to go into cube space to evaluate the sales amount for every customer. The SSAS engine stays in dimension space, which is much smaller, and the query is therefore more efficient.
Some client applications might have issues with an MDX statement that skips axes because they expect something on columns, and will not work with an empty set on axis 0. In this case, we can define a constant measure (a measure returning null, 0, 1, or any other constant) and place it on columns. In MDX terms, this constant measure is a calculated measure. It will act as a dummy column. It might not be as efficient as an empty set, but it is a much better solution than the one with a regular (non-constant) cube measure like the Sales Amount measure.
This query creates a dummy value NULL on columns:
WITH MEMBER [Measures].[Dummy] AS NULL
SELECT
  { [Measures].[Dummy] } ON 0,
  { [Customer].[Customer].[Customer].MEMBERS } ON 1
FROM [Adventure Works]
A WHERE clause in MDX works much as it does in other query languages. It acts as a filter and restricts the data returned in the result set.
However, the WHERE clause in MDX does more than just restrict the result set. It also establishes the "query context".
The MDX WHERE clause points to a specific intersection of cube space. We use tuple expressions to represent cells in cube space. Each tuple is made of one member, and only one member, from each hierarchy.
The following tuple points to one year, 2008, and one measure, the [Internet Sales Amount]:
( [Measures].[Internet Sales Amount], [Date].[Calendar Year].&[2008] )
Using a tuple in an MDX WHERE clause is called "slicing" the cube. This feature gives the WHERE clause another name, the slicer. If we put the previous tuple in the WHERE clause, in MDX terms, we are saying, "show me some data from the cube sliced by sales and the year 2008".
That is what we are going to do next.
Open the Query Editor in SSMS, and then follow these steps to write a query with a slicer and test it:
Copy this initial query into the Query Editor and run the query. You will see the following result:
SELECT
  { [Customer].[Customer Geography].[Country] } ON 0,
  { [Product].[Product Categories].[Category] } ON 1
FROM [Adventure Works]
At this point, we should ask the question, "What are the cell values?" The cell values are actually the [Measures].[Reseller Sales Amount], which is the default member on the Measures dimension.
Add the previous tuple to the query as a slicer. Here is the final query:
SELECT
  { [Customer].[Customer Geography].[Country] } ON 0,
  { [Product].[Product Categories].[Category] } ON 1
FROM [Adventure Works]
WHERE ( [Measures].[Internet Sales Amount], [Date].[Calendar Year].&[2008] )
Ask the question again, "What are the cell values?" The cell values are now the [Measures].[Internet Sales Amount], and no longer the default measure.
We can slice the data by pointing to a specific intersection of cube space. We can achieve this by putting a tuple in the WHERE clause.
In the preceding example, the cube space is sliced by sales and year 2008. The cell values are the Internet Sales Amount for each country and each product category, sliced by year 2008.
Notice that the data returned on the query axes can be completely different from the tuple in the WHERE clause. The tuples in the slicer only affect the cell values at the intersections of rows and columns, not what is on the columns or rows axes.
If you need to display sales and year 2008 on the query axes, you need to move them from the WHERE clause to the query axes.
This query has moved the sales to the columns axis, and the year 2008 to the rows axis. They are both "crossjoined" to the original hierarchies on the two query axes:
SELECT
  { [Measures].[Internet Sales Amount] *
    [Customer].[Customer Geography].[Country] } ON 0,
  { [Date].[Calendar Year].&[2008] *
    [Product].[Product Categories].[Category] } ON 1
FROM [Adventure Works]
Run the query and you will get the following result. The cell values are the same as before, but now we have the year 2008 on the rows axis, and the Internet Sales Amount on the columns axis.
The NonEmpty() function is a very powerful MDX function. It is primarily used to improve query performance by reducing sets before the result is returned.
Both the Customer and Date dimensions are relatively large in the Adventure Works DW 2012 database. Putting the cross product of these two dimensions on the query axis can take a long time. In this recipe, we'll show how the NonEmpty() function can be used on the Customer and Date dimensions to improve the query performance.
Start a new query in SSMS and make sure that you're working on the Adventure Works DW 2012 database. Then write the following query and execute it:
SELECT
  { [Measures].[Internet Sales Amount] } ON 0,
  NON EMPTY
  Filter(
    { [Customer].[Customer].[Customer].MEMBERS } *
    { [Date].[Date].[Date].MEMBERS },
    [Measures].[Internet Sales Amount] > 1000
  ) ON 1
FROM [Adventure Works]
The query shows the sales per customer and dates of their purchases, and isolates only those combinations where the purchase was over 1000 USD.
On a typical server, it will take more than a minute before the query will return the results.
Now let's see how to improve the execution time by using the NonEmpty() function.
Follow these steps to improve the query performance by adding the NonEmpty() function:
Wrap NonEmpty() around the cross join of customers and dates so that it becomes the first argument of that function.
Use the measure on columns as the second argument of that function.
This is what the MDX query should look like:
SELECT
  { [Measures].[Internet Sales Amount] } ON 0,
  NON EMPTY
  Filter(
    NonEmpty(
      { [Customer].[Customer].[Customer].MEMBERS } *
      { [Date].[Date].[Date].MEMBERS },
      { [Measures].[Internet Sales Amount] }
    ),
    [Measures].[Internet Sales Amount] > 1000
  ) ON 1
FROM [Adventure Works]
Execute that query and observe the results as well as the time required for execution. The query returned the same results, only much faster, right?
Both the Customer and Date dimensions are medium-sized dimensions. The cross product of these two dimensions contains several million combinations. We know that typically, the cube space is sparse; therefore, many of these combinations are indeed empty. The Filter() operation is not optimized to work in block mode, which means a lot of calculations will have to be performed by the engine to evaluate the set on rows, whether the combinations are empty or not.
Fortunately, the NonEmpty() function exists. This function can be used to reduce any set, especially multidimensional sets that are the result of a cross join operation. It removes the empty combinations of the two sets before the engine starts to evaluate the sets on rows. A reduced set has fewer cells to be calculated, and therefore the query runs much faster.
Regardless of the benefits that were shown in this recipe, NonEmpty() should be used with caution. Here are some good practices regarding the NonEmpty() function:
Use it with sets, such as named sets and axes.
Use it in functions that are not optimized to work in block mode, such as the Filter() function.
Avoid using it in other MDX set functions that are optimized to work in block mode. The use of NonEmpty() inside optimized functions will prevent them from evaluating the set in block mode. This is because the set will not be compact once it passes the NonEmpty() function. The function will break it into many small non-empty chunks, and each of these chunks will have to be evaluated separately. This will inevitably increase the duration of the query. In such cases, it is better to leave the original set intact, no matter its size. The engine will know how to run over it in optimized mode.
Both the NonEmpty() function and the NON EMPTY keyword can reduce sets, but they do it in different ways.
The NON EMPTY keyword removes empty rows, columns, or both, depending on the axis on which that keyword is used in the query. Therefore, the NON EMPTY operator tries to push the evaluation of cells to an early stage whenever possible. This way the set on the axis is already reduced and the final result is returned faster.
Take a look at the initial query in this recipe, remove the Filter() function, run the query, and notice how quickly the results come back, although the multidimensional set again counts millions of tuples. The trick is that the NON EMPTY operator uses the set on the opposite axis, the columns, to reduce the set on rows. Therefore, it can be said that NON EMPTY is highly dependent on members on axes and their values in columns and rows.
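The experiment described above might look like the following sketch: the initial query of this recipe with Filter() removed, leaving only the NON EMPTY keyword on rows. Note that without the filter it returns every non-empty customer and date combination, not only the purchases over 1000 USD.

```mdx
-- Initial query of this recipe without Filter();
-- NON EMPTY uses the measure on columns to reduce the set on rows
SELECT
  { [Measures].[Internet Sales Amount] } ON 0,
  NON EMPTY
    { [Customer].[Customer].[Customer].MEMBERS } *
    { [Date].[Date].[Date].MEMBERS } ON 1
FROM [Adventure Works]
```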
Contrary to the NON EMPTY operator, found only on axes, the NonEmpty() function can be used anywhere in the query.
The NonEmpty() function removes all the members from its first set where the value of one or more measures in the second set is empty. If no measure is specified, the function is evaluated in the context of the current member.
In other words, the NonEmpty() function is highly dependent on members in the second set, the slicer, or the current coordinate, in general.
If a second set in the NonEmpty() function is not provided, the expression is evaluated in the context of the current measure at the moment of evaluation, and the current members of attribute hierarchies, also at the time of evaluation. In other words, if you're defining a calculated measure and you forget to include a measure in the second set, the expression is evaluated for that same measure, which leads to null, the default initial value of every measure. If you're simply evaluating the set on the axis, it will be evaluated in the context of the current measure, the default measure in the cube or the one provided in the slicer. Again, this is perhaps not something you expected. In order to prevent these problems, always include a measure in the second set.
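As an illustrative sketch of that advice, the following calculated measure counts customers with Internet sales and names the measure explicitly in the second set of NonEmpty(). The measure name [Customers With Sales] and the choice of calendar years on rows are assumptions made for this example only.

```mdx
-- Hypothetical calculated measure: customers with Internet sales per year.
-- The second set names the measure explicitly, so the result does not
-- depend on whichever measure happens to be current during evaluation.
WITH MEMBER [Measures].[Customers With Sales] AS
  Count(
    NonEmpty(
      [Customer].[Customer].[Customer].MEMBERS,
      { [Measures].[Internet Sales Amount] }
    )
  )
SELECT
  { [Measures].[Customers With Sales] } ON 0,
  { [Date].[Calendar Year].[Calendar Year].MEMBERS } ON 1
FROM [Adventure Works]
```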
NonEmpty() reduces sets, just like a few other functions, namely Filter() and Existing(), do. But what's special about NonEmpty() is that it reduces sets extremely efficiently and quickly. Because of that, there are some rules about where to position NonEmpty() in calculations made by the composition of MDX functions (one function wrapping the other). If we're trying to detect multi-select, that is, multiple members in the slicer, NonEmpty() should go inside with the EXISTING function/keyword outside. The reason is that although they both shrink sets efficiently, NonEmpty() works great if the set is intact. EXISTING is not affected by the order of members or compactness of the set. Therefore, NonEmpty() should be applied earlier.
You may get System.OutOfMemory errors if you use the CrossJoin() operation on many large hierarchies because the cross join generates a Cartesian product of those hierarchies. In that case, consider using NonEmpty() to reduce the space to a smaller subcube. Also, don't forget to group the hierarchies by their dimension inside the cross join.
Attribute relationships define hierarchical dependencies between attributes. A good example is the relationship between attribute City and attribute State. If we know the current city is Phoenix, we know the state must be Arizona. This knowledge of the relationship, City | State, can be used by the Analysis Services engine to optimize performance.
Analysis Services provides the Properties() function to allow us to retrieve data based on attribute relationships.
We will start from a classic Top 10 query that shows the Top 10 Customers. Then we will use the Properties() function to retrieve each top 10 customer's yearly income.
This table shows what our query result should be like:
| | Internet Sales Amount | Yearly Income |
|---|---|---|
| Nichole Nara | $13,295.38 | 100000 - 120000 |
| Kaitlyn J. Henderson | $13,294.27 | 100000 - 120000 |
| Margaret He | $13,269.27 | 100000 - 120000 |
| Randall M. Dominguez | $13,265.99 | 80000 - 90000 |
| Adriana L. Gonzalez | $13,242.70 | 80000 - 90000 |
| Rosa K. Hu | $13,215.65 | 40000 - 70000 |
| Brandi D. Gill | $13,195.64 | 100000 - 120000 |
| Brad She | $13,173.19 | 80000 - 90000 |
| Francisco A. Sara | $13,164.64 | 40000 - 70000 |
| Maurice M. Shan | $12,909.67 | 80000 - 90000 |
Once we get only the top 10 customers, it's easy enough to place the customer on the rows, and the Internet sales amount on the columns. What about each customer's yearly income?
Customer Geography is a user-defined hierarchy in the Customer dimension. In SSMS, if you start a new query against the Adventure Works DW 2012 database, and navigate to Customer | Customer Geography | Customer | Member Properties, you will see that the yearly income is one of the member properties for the attribute Customer. This is good news, because now we can get the Yearly Income for each top 10 customer using the PROPERTIES() function:
In SSMS, let us write the following query in a new Query Editor against the Adventure Works DW 2012 database:
This query uses the TopCount() function, which takes three parameters. The first parameter, [Customer].[Customer Geography].[Customer].MEMBERS, provides the members that will be evaluated for the "top count"; the second, the integer 10, tells it to return only 10 members; and the third parameter, [Measures].[Internet Sales Amount], provides a numeric measure as the evaluation criterion.
-- Properties(): Initial
SELECT
  [Measures].[Internet Sales Amount] ON 0,
  TopCount(
    [Customer].[Customer Geography].[Customer].MEMBERS,
    10,
    [Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works]
Execute the preceding query, and we should get only 10 customers back with their Internet sales amount. Also notice that the result is sorted in the descending order of the numeric measure. Now let's add a calculated measure, like:
[Customer].[Customer Geography].currentmember.Properties("Yearly Income")
To make the calculated measure "dynamic", we must use the member function .CurrentMember, so we do not need to hardcode any specific member name on the customer dimension. The Properties() function is also a member function, and it takes the name of another attribute as a parameter. We've provided "Yearly Income" as the name of the attribute we are interested in.
Now place the preceding expression in the WITH clause, and give it the name [Measures].[Yearly Income]. This new calculated measure is now ready to be placed on the columns axis, along with the Internet sales amount. Here is the final query:
WITH MEMBER [Measures].[Yearly Income] AS
  [Customer].[Customer Geography].currentmember.Properties("Yearly Income")
SELECT
  { [Measures].[Internet Sales Amount],
    [Measures].[Yearly Income] } ON 0,
  TopCount(
    [Customer].[Customer Geography].[Customer].MEMBERS,
    10,
    [Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works]
Executing the query, we should get the yearly income for each top 10 customer. The result should be exactly the same as the table shown at the beginning of our recipe.
Attributes correspond to columns in the dimension tables in our data warehouse. Although we don't normally define the relationships between them in the relational database, we do so in the multidimensional space. This knowledge of attribute relationships can be used by the Analysis Services engine to optimize performance. MDX provides the Properties() function to allow us to get from members of one attribute to members of another attribute.
In this recipe, we only focus on one type of member property, the user-defined member property. Member properties can also be defined by Analysis Services itself, such as NAME, ID, KEY, or CAPTION; these are the intrinsic member properties.
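As a rough sketch of how an intrinsic member property might be retrieved, the following variation of the recipe's query asks for the KEY property instead of a user-defined one. The measure name [Customer Key] is made up for this example, and the query otherwise reuses names from this recipe.

```mdx
-- Hypothetical measure returning the intrinsic KEY property
-- of the current customer member
WITH MEMBER [Measures].[Customer Key] AS
  [Customer].[Customer Geography].CurrentMember.Properties("KEY")
SELECT
  { [Measures].[Internet Sales Amount],
    [Measures].[Customer Key] } ON 0,
  TopCount(
    [Customer].[Customer Geography].[Customer].MEMBERS,
    10,
    [Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works]
```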
The Properties() function can take another optional parameter, the TYPED flag. When the TYPED flag is used, the return value has the original type of the member.
The preceding example does not use the TYPED flag. Without the TYPED flag, the return value is always a string.
In many business analyses, we perform arithmetic on numeric values. In the next example, we will include the TYPED flag in the Properties() function to make sure that the [Total Children] values for the top 10 customers are numeric.
WITH
  MEMBER [Measures].[Yearly Income] AS
    [Customer].[Customer Geography].currentmember.Properties("Yearly Income")
  MEMBER [Measures].[Total Children] AS
    [Customer].[Customer Geography].currentmember.Properties("Total Children", TYPED)
  MEMBER [Measures].[Is Numeric] AS
    IIF( IsNumeric([Measures].[Total Children]), 1, NULL )
SELECT
  { [Measures].[Internet Sales Amount],
    [Measures].[Yearly Income],
    [Measures].[Total Children],
    [Measures].[Is Numeric] } ON 0,
  TopCount(
    [Customer].[Customer Geography].[Customer].MEMBERS,
    10,
    [Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works]
When an attribute is enabled as an attribute hierarchy, it can simply be referenced as that hierarchy.
In SSAS, there is one situation where the attribute relationship can be explored only by using the PROPERTIES() function: when the attribute's AttributeHierarchyEnabled property is set to False.
In the Employee dimension in the Adventure Works cube, the employees' SSNs are not enabled as an attribute hierarchy. The attribute's AttributeHierarchyEnabled property is set to False. We can only reference the SSN in the PROPERTIES() function of another attribute that has been enabled as an attribute hierarchy, such as the Employee attribute.
Sorting and ranking are very common requirements in most business analysis, and MDX provides several functions for this purpose. They are:
TopCount and BottomCount
TopPercent and BottomPercent
TopSum and BottomSum
Order
Hierarchize
Rank
All of these functions operate on sets of tuples, not just on one-dimensional sets of members. They all, in some way, involve a numeric expression, which is used to evaluate the sorting and the ranking.
We will start with the classic Top 5 (or Top-n) example using the TopCount() function. We will then examine how the result is already pre-sorted, followed by using the Order() function to sort the result explicitly. Finally, we will see how we can add a ranking number by using the Rank() function.
Here is the classic Top 5 example using the TopCount() function:
TopCount (
  [Product].[Subcategory].children,
  5,
  [Measures].[Internet Sales Amount]
)
It operates on a tuple ([Product].[Subcategory].children, [Measures].[Internet Sales Amount]).
The result is the five [Subcategory] members that have the highest [Internet Sales Amount].
The five subcategory members will be returned in order from the largest [Internet Sales Amount] to the smallest.
In SSMS, let us write the following query in a new Query Editor, against the Adventure Works DW 2012 database. Follow these steps to first get the top-n members:
We simply place the earlier TopCount() expression on the rows axis.
On the columns axis, we show the actual sales amount for each product subcategory.
In the slicer, we use a tuple to slice the result for the first quarter of 2008 and the Southwest region only.
The final query should look like the following query:
SELECT
  [Measures].[Internet Sales Amount] ON 0,
  TopCount (
    [Product].[Subcategory].children,
    5,
    [Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works]
WHERE
  ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
    [Sales Territory].[Sales Territory Region].[Southwest] )
Run the query. The following screenshot shows the Top-n result:
Notice that the returned members are in order from the largest numeric measure to the smallest.
Next, in SSMS, follow these steps to explicitly sort the result:
This time, we will put the TopCount() expression in the WITH clause, creating it as a named set. We will name it [Top 5 Subcategory].
On the rows axis, we will use the Order() function, which takes two parameters: which members we want to return and what value we want to sort by. The named set [Top 5 Subcategory] is what we want to return, so we will pass it to the Order() function as the first parameter. The .MemberValue function gives us the product subcategory name, so we will pass it to the Order() function as the second parameter. Here is the Order() function expression we would use:
ORDER ( [Top 5 Subcategory], [Product].[Subcategory].MEMBERVALUE )
Here is the final query for sorting the result:
-- Order members with MemberValue
WITH SET [Top 5 Subcategory] AS
  TopCount (
    [Product].[Subcategory].CHILDREN,
    5,
    [Measures].[Internet Sales Amount]
  )
SELECT
  [Measures].[Internet Sales Amount] ON 0,
  ORDER (
    [Top 5 Subcategory],
    [Product].[Subcategory].MEMBERVALUE
  ) ON 1
FROM [Adventure Works]
WHERE
  ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
    [Sales Territory].[Sales Territory Region].[Southwest] )
Executing the preceding query, we get the sorted result as the screenshot shows:
Finally, in SSMS follow these steps to add ranking numbers to the Top-n result:
We will create a new calculated measure [Subcategory Rank] using the Rank() function, which simply returns the one-based ordinal position of each tuple in the set [Top 5 Subcategory]. Since the set is already ordered, the ordinal position of the tuple will give us the correct ranking. Here is the expression for the Rank() function:
RANK ( [Product].[Subcategory].CurrentMember, [Top 5 Subcategory] )
The following query is the final query. It is built on top of the first query in this recipe. We've added the earlier Rank() function and created a calculated measure [Measures].[Subcategory Rank], which is placed on the columns axis along with the Internet Sales Amount.
WITH
SET [Top 5 Subcategory] AS
    TopCount (
        [Product].[Subcategory].children,
        5,
        [Measures].[Internet Sales Amount]
    )
MEMBER [Measures].[Subcategory Rank] AS
    RANK (
        [Product].[Subcategory].CurrentMember,
        [Top 5 Subcategory]
    )
SELECT
    { [Measures].[Internet Sales Amount],
      [Measures].[Subcategory Rank] } ON 0,
    [Top 5 Subcategory] ON 1
FROM
    [Adventure Works]
WHERE
    ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
      [Sales Territory].[Sales Territory Region].[Southwest] )
Run the preceding query. The ranking result is shown in the following screenshot:
Sorting functions, such as TopCount(), TopPercent(), and TopSum(), operate on sets of tuples. These tuples are evaluated on a numeric expression and returned pre-sorted in descending order of that expression.
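As a rough illustration of the same family of functions, the following sketch swaps TopCount() for TopPercent() in the first query of this recipe. The 80 percent threshold is an arbitrary value picked for this example, not something the recipe prescribes.

```mdx
-- Sketch: subcategories that together account for the top 80 percent
-- of Internet Sales Amount (the value 80 is an assumption for illustration)
SELECT
    [Measures].[Internet Sales Amount] ON 0,
    TopPercent (
        [Product].[Subcategory].children,
        80,
        [Measures].[Internet Sales Amount]
    ) ON 1
FROM
    [Adventure Works]
WHERE
    ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
      [Sales Territory].[Sales Territory Region].[Southwest] )
```

Unlike TopCount(), the number of rows returned here is not fixed; it depends on how the sales are distributed across the subcategories.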
Using the Order() function, we can sort members from dimensions explicitly using the .MemberValue function.
When a numeric expression is not specified, the Rank() function can simply be used to display the one-based ordinal position of tuples in a set.
Like the other MDX sorting functions, however, the Rank() function can also operate on a numeric expression. If a numeric expression is specified, the Rank() function assigns the same rank to tuples with duplicate values in the set.
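The difference can be sketched by adding a numeric expression as the third argument of Rank(). The measure name [Subcategory Rank By Value] is a hypothetical name introduced only for this example.

```mdx
WITH
SET [Top 5 Subcategory] AS
    TopCount (
        [Product].[Subcategory].children,
        5,
        [Measures].[Internet Sales Amount]
    )
-- Hypothetical measure: ranks by value, so tuples with equal
-- Internet Sales Amount receive the same rank
MEMBER [Measures].[Subcategory Rank By Value] AS
    RANK (
        [Product].[Subcategory].CurrentMember,
        [Top 5 Subcategory],
        [Measures].[Internet Sales Amount]
    )
SELECT
    { [Measures].[Internet Sales Amount],
      [Measures].[Subcategory Rank By Value] } ON 0,
    [Top 5 Subcategory] ON 1
FROM
    [Adventure Works]
WHERE
    ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
      [Sales Territory].[Sales Territory Region].[Southwest] )
```

With distinct sales amounts, this should give the same numbers as the two-argument form; the two forms only diverge when values tie.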
It is also important to understand that the Rank() function does not order the set. Because of this fact, we tend to do the ordering and ranking at the same time. However, in the last query of this recipe, the named set had already been ordered by the TopCount() function before the Rank() function was applied to it. This way, the sorting is done only once and then followed by a linear scan, before being presented in sorted order.
As a good practice, we recommend using the Order() function to first order the set and then ranking the tuples that are already sorted.
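A minimal sketch of that practice follows, assuming we want all subcategories ranked by Internet Sales Amount rather than just the top five. The set name [Ordered Subcategories] is a hypothetical name, and the BDESC flag is used so that the largest value comes first.

```mdx
WITH
-- Order the set once, in descending order of the measure
SET [Ordered Subcategories] AS
    Order (
        [Product].[Subcategory].children,
        [Measures].[Internet Sales Amount],
        BDESC
    )
-- Rank by ordinal position in the already ordered set
MEMBER [Measures].[Subcategory Rank] AS
    RANK (
        [Product].[Subcategory].CurrentMember,
        [Ordered Subcategories]
    )
SELECT
    { [Measures].[Internet Sales Amount],
      [Measures].[Subcategory Rank] } ON 0,
    [Ordered Subcategories] ON 1
FROM
    [Adventure Works]
WHERE
    ( [Date].[Calendar].[Calendar Quarter].&[2008]&[1],
      [Sales Territory].[Sales Territory Region].[Southwest] )
```

Because the named set is evaluated once, Rank() only has to look up each member's position in it.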
Handling errors is a common task, especially the handling of division by zero type errors. This recipe offers a common practice to handle them.
Start a new query in SQL Server Management Studio and check that you're working on the Adventure Works database. Then write and execute this query:
WITH
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005 Bad] AS
    [Date].[Calendar Year].[Calendar Year].&[2006] /
    [Date].[Calendar Year].[Calendar Year].&[2005],
    FORMAT_STRING = 'Percent'
SELECT
    { [Date].[Calendar Year].[Calendar Year].&[2005],
      [Date].[Calendar Year].[Calendar Year].&[2006],
      [Date].[Calendar Year].[CY 2006 vs 2005 Bad] } *
    [Measures].[Reseller Sales Amount] ON 0,
    { [Sales Territory].[Sales Territory].[Country].MEMBERS } ON 1
FROM
    [Adventure Works]
This query returns six countries on the rows axis, and two years and a ratio on the columns axis.
The problem is that we get 1.#INF on some ratio cells. 1.#INF is the formatted value of infinity, and it appears whenever the denominator CY 2005 is null and the numerator CY 2006 is not null.
We will need help from the IIF() function, which takes three arguments: IIF(<condition>, <then branch>, <else branch>). The IIF() function is a Visual Basic for Applications (VBA) function and has a native implementation in MDX. The IIF() function will allow us to evaluate the condition of CY 2005, then decide what the ratio calculation formula should be.
Follow these steps to handle division by zero errors:
Copy the calculated member and paste it as another calculated member. During that, replace the term Bad with Good in its name, just to differentiate between those two members.
Copy the denominator.
Wrap the expression in an outer IIF() statement.
Paste the denominator in the condition part of the IIF() statement and compare it against 0.
Provide null value for the true part.
Your initial expression should be in the false part.
Don't forget to include the new member on columns and execute the query:
WITH
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005 Bad] AS
    [Date].[Calendar Year].[Calendar Year].&[2006] /
    [Date].[Calendar Year].[Calendar Year].&[2005],
    FORMAT_STRING = 'Percent'
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005 Good] AS
    IIF (
        [Date].[Calendar Year].[Calendar Year].&[2005] = 0,
        null,
        [Date].[Calendar Year].[Calendar Year].&[2006] /
        [Date].[Calendar Year].[Calendar Year].&[2005]
    ),
    FORMAT_STRING = 'Percent'
SELECT
    { [Date].[Calendar Year].[Calendar Year].&[2005],
      [Date].[Calendar Year].[Calendar Year].&[2006],
      [Date].[Calendar Year].[CY 2006 vs 2005 Bad],
      [Date].[Calendar Year].[CY 2006 vs 2005 Good] } *
    [Measures].[Reseller Sales Amount] ON 0,
    { [Sales Territory].[Sales Territory].[Country].MEMBERS } ON 1
FROM
    [Adventure Works]
The result shows that the new calculated member has corrected the problem. The last column, [CY 2006 vs 2005 Good], now correctly shows (null) when the denominator CY 2005 is null and the numerator CY 2006 is not null.
A division by zero error occurs when the denominator is null or zero and the numerator is not null. In order to prevent this error, we must test the denominator before the division and handle the two scenarios in the two branches using the IIF() statement.
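The same guard applies when the ratio is built from two measures rather than two year members. The following is a sketch only: the calculated measure name [Reseller Avg Unit Amount] is hypothetical, and it assumes the [Reseller Order Quantity] measure of the Adventure Works cube as the denominator.

```mdx
WITH
-- Hypothetical calculated measure: test the denominator first,
-- return null when it is zero or empty, otherwise divide
MEMBER [Measures].[Reseller Avg Unit Amount] AS
    IIF (
        [Measures].[Reseller Order Quantity] = 0,
        null,
        [Measures].[Reseller Sales Amount] /
        [Measures].[Reseller Order Quantity]
    ),
    FORMAT_STRING = 'Currency'
SELECT
    { [Measures].[Reseller Sales Amount],
      [Measures].[Reseller Order Quantity],
      [Measures].[Reseller Avg Unit Amount] } ON 0,
    { [Sales Territory].[Sales Territory].[Country].MEMBERS } ON 1
FROM
    [Adventure Works]
```

Returning null in the true branch keeps the cell empty instead of showing 1.#INF.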
In the condition part of the IIF() statement, we compare against the scalar number zero to determine whether [Measures].[Reseller Sales Amount] for the following year member is zero or not. If it is zero, the condition is true and the calculated member will be NULL:
[Date].[Calendar Year].[Calendar Year].&[2005] = 0
What about the NULL condition? It turns out that for a numerical value, we do not need to test the NULL condition specifically. It is enough to test just for zero because null = 0 returns true. However, we could test for the NULL condition if we want to, by using the IsEmpty() function.
For the calculated member [CY 2006 vs 2005 Good], we could wrap the denominator with the IsEmpty() function. The result will be the same, because the problem cells in this query have an empty denominator rather than a zero one:
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005 Good] AS
    IIF (
        IsEmpty([Date].[Calendar Year].[Calendar Year].&[2005]),
        null,
        [Date].[Calendar Year].[Calendar Year].&[2006] /
        [Date].[Calendar Year].[Calendar Year].&[2005]
    ),
    FORMAT_STRING = 'Percent'
SQLCAT's SQL Server 2008 Analysis Services Performance Guide has a lot of interesting details regarding the IIF() function, found at http://tinyurl.com/PerfGuide2008R2.
Additionally, you may find the blog article MDX and DAX topics by Jeffrey Wang explaining the details of the IIF() function, found at http://tinyurl.com/IIFJeffrey.
If you're using a version of SSAS prior to 2008 (that is, 2005), the performance of the IIF() function will not be as good. See Mosha Pasumansky's article for more information: http://tinyurl.com/IIFMosha.
Setting a default member is a tempting option which looks like it can be used on any dimension we would like to use it on. The truth is far from that. Default members should be used as exceptions and not as a general rule when designing dimensions.
The reason for that is not so obvious. The feature looks self-explanatory, and it's hard to anticipate what could go wrong. If we're not careful enough, our calculations can become unpredictable, especially on complex dimensions with many relationships among attributes.
Default members can be defined in three places. The easy-to-find option is the dimension itself, using the DefaultMember property found on every attribute. The second option is the role, on the Dimension Data tab. Finally, default members can be defined in the MDX script. One of the main benefits of this place is easy maintenance of all default members in the cube, because everything is in one place and in the form of easy-to-read text. That is also the only way to define the default member of a role-playing dimension.
In this recipe we'll show the most common option, the last one: how to set a default member of a hierarchy in the MDX script. More information on setting the DefaultMember is available at http://tinyurl.com/DefaultMember2012.
Follow these steps to set up the environment for this recipe:
Start SSMS and connect to your SSAS 2012 instance.
Click on the New Query button and check that the target database is Adventure Works DW 2012. Then execute the following query:
WITH
MEMBER [Measures].[Default account] AS
    [Account].[Accounts].DefaultMember.Name
SELECT
    { [Measures].[Amount],
      [Measures].[Default account] } ON 0
FROM
    [Adventure Works]
The results will show that the default member is the Net Income account and that its value in this context is a bit more than 12.6 million USD.
Next, open the Adventure Works DW 2012 solution in SSDT.
Double-click on the Adventure Works cube and go to the Calculations tab. Choose Script View.
Position the cursor at the beginning of the script, just beneath the CALCULATE command.
Follow these steps to set a new default member:
Enter the following expression to set a new default account:
ALTER CUBE CurrentCube
UPDATE DIMENSION [Account].[Accounts],
Default_Member = [Account].[Accounts].&[48]; //Operating Profit
Save and deploy (or just press the Deploy MDX Script icon if you're using BIDS Helper 2012).
Run the previous query again. Notice that the result has changed. The new default account is Operating Profit, the one we specified in the MDX script using the ALTER CUBE command. The value changed as well; now it's above 16.7 million USD.
The ALTER CUBE statement changes the default member of a hierarchy specified in the UPDATE DIMENSION part of the statement. The third part is where we specify which member should be the default member of that hierarchy.
Don't mind that it says UPDATE DIMENSION. SSAS 2005 and later versions interpret that as a hierarchy.
Setting the default member on a dimension with multiple hierarchies can lead to unexpected results. Due to attribute relations, related attributes are implicitly set to corresponding members, while the non-related attributes remain on their default members, that is, the All member (also known as the root member). Certain combinations of members from all available hierarchies can result in a nonexisting coordinate. In that case, the query will return no data. Other times, the intersection will only be partial. In that case, the query will return the data, but the values will not be correct, which might be even worse than no data at all.
Enter the following expression in the MDX script, deploy it, and then analyze the result in the Cube Browser tab:
ALTER CUBE CurrentCube
UPDATE DIMENSION [Date].[Calendar],
Default_Member = [Date].[Calendar].[Calendar Year].&[2007]; -- "current" year on the user hierarchy
The expression sets the year 2007 as the default member of the [Date].[Calendar] user-defined hierarchy.
The analysis of the Sales Amount measure in the Cube Browser shows good results in almost all cases. The exceptions are fiscal hierarchies that have the fiscal year level in them; they return empty or incomplete results when used in a slicer. They are empty because the intersection between the fiscal year 2006 and the calendar year 2007 (the latter being the default member in the calendar hierarchy) is a nonexisting combination. Remember, the calendar year 2007 doesn't get overwritten by the fiscal year 2006. It gets combined (open the Date dimension in SSDT and observe the relationships in the corresponding tab). Moreover, when you put the fiscal year 2007 into the slicer, you only get a portion of the data, the portion which matches the intersection of the calendar and the fiscal year. That's only one half of the fiscal year, right? In short, you have a potential problem with this approach.
Can we fix the result? Yes, we can. The correct results will be there when we explicitly select the All member from the Date.Calendar hierarchy in the slicer. Only then will we get good results using fiscal hierarchies. The question is – will the end users remember that every time?
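In a query, that workaround might look like the following sketch. It assumes that the All member of the Date.Calendar hierarchy is named [All Periods] and that the fiscal user hierarchy is [Date].[Fiscal] with a [Fiscal Year] level; check these names against your own copy of the Adventure Works cube.

```mdx
-- Sketch: explicitly select the All member of Date.Calendar so that the
-- default member (calendar year 2007) does not restrict the fiscal years.
-- [All Periods] is an assumed member name.
SELECT
    [Measures].[Sales Amount] ON 0,
    [Date].[Fiscal].[Fiscal Year].MEMBERS ON 1
FROM
    [Adventure Works]
WHERE
    ( [Date].[Calendar].[All Periods] )
```

Without the WHERE clause, the same query would be evaluated against the default member of the calendar hierarchy.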
The situation is similar when the default member is defined on an attribute hierarchy, for example, on the Date.Calendar Year hierarchy. By now, you should be able to modify the previous expression so that it sets the year 2007 as the default member on the [Date].[Calendar Year] hierarchy. Test this to see it for yourself.
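One possible form of that modified expression is sketched here. It assumes the member key &[2007] on the attribute hierarchy, following the key format used for [Date].[Calendar Year] members in the previous recipe.

```mdx
-- Sketch: default member on the attribute hierarchy instead of the
-- user hierarchy (the key &[2007] is an assumption)
ALTER CUBE CurrentCube
UPDATE DIMENSION [Date].[Calendar Year],
Default_Member = [Date].[Calendar Year].&[2007]; -- "current" year on the attribute hierarchy
```

As with the user hierarchy, deploy the script and then test fiscal hierarchies in the slicer to see how the two years combine.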
Another scenario could be that you want to put the current date as the default member on the Date.Date hierarchy. Try that too, and see that when you use the year 2006 from the Date.Calendar Year hierarchy in the slicer, you get an empty result. Again, the intersection formed a nonexisting coordinate.
To conclude, you should avoid defining default members on complex dimensions. Define them where it is appropriate: on dimensions with a single non-aggregatable attribute (that is, when you set the IsAggregatable property of an attribute to False) or on dimensions with one or more user hierarchies where that non-aggregatable attribute is the top level on each user hierarchy, and where all relationships are well defined.
The Account dimension used in this example is not such a dimension. In order to correct it, two visible attributes should be hidden because they can cause empty results when used in a slicer. Experimenting with a scope might help too, but that adds to the complexity of the solution and hence the initial advice of keeping things simple when using default members should prevail.
Take a look at other dimensions in the Adventure Works DW 2012 database. There you will find good examples of using default members.
When you're defining the default members in an MDX script, do it at the beginning of the script. This way the calculations that follow can reference them.
In addition, provide a comment explaining which member was chosen to be the default member, and perhaps why. Look back at the code in this recipe to see how it was done.