Wednesday, August 17, 2011

Interaction between MDX Subselect and Calculation

As Mosha Pasumansky has described in his blog AS2008MDX: subselects and CREATE SUBCUBE in non-visual mode, an MDX subselect performs two functions:

  1. Limits each axis set through dimension autoexists.
  2. Applies visualtotals to cell values if there are no coordinate overwrites.

The second point deserves more elaboration especially when there are MDX calculations involved. In this blog post, I will explain how subselect visualtotals works in MDX formula engine and how the presence of MDX calculations can disable visualtotals by overwriting subselect attributes. Let’s start by looking at a series of questions recently raised by an AS customer. Since the original questions were based on customer’s database, I have adapted them to using AdventureWorks.
1. A customer puzzle

First run the following MDX query to find out the original values of USA, CA, and WA.

                                              
Since the value of CA is $5,714,257.69, if we query USA while subselecting by CA, the visual totaled value of USA is $5,714,257.69 as well.

Case 1
Now if you insert a seemingly innocuous calculation into the cube script that simply says USA = USA,

the value of USA suddenly goes back to its original value.

Case 2
What the user really wanted was to define a calculation that calculates USA as the sum of CA and WA.



But he was not able to get visualtotals when he subselected CA. The value of USA is the sum of the values of CA and WA instead of just the value of CA.

Case 3
The user further experimented by assigning a constant value to WA. When he put the WA assignment before the USA assignment,


USA value was still the sum of the values of CA and WA.


Case 4
When the user moved the WA assignment after the USA assignment,



he suddenly got visualtotals on USA!

The above results confused quite a few MDX users.

2. How does subselect visualtotals work anyway?

2.1. Subselect filters without calculation

You can rest assured that there is a consistent logic underneath the above perplexing results. Before we study the interaction between subselect and calculation, let’s first go back to the basics to see how subselect visualtotals is attained when there is no calculation at all.

I’ll start with a quick review of some fundamentals of the formula engine. Every MDX query is split into one or more single-granularity queries. A single-granularity query is represented by a cube subspace, also known as subcube. The formula engine constructs evaluation nodes from those single-granularity subspaces. For every evaluation node, the formula engine determines which calculations apply to its subspace and generates a list of calculation items, one for each applicable calculation. In the future, you will see that calculation item is also called evaluation node item. But I will stick with calculation item in this post. Every calculation is associated with a subspace at the granularities where the calculation is defined. For example, the left side of the trivial assignment USA = USA defines a subspace with granularity at Country level. The formula engine then constructs a subspace for each calculation item by combining the evaluation node subspace with the calculation subspace. The process of constructing an item subspace is too complex to be covered here, but the resulting subspace is always at or below the granularities of the evaluation node subspace. That’s because a calculation simply doesn’t apply to an evaluation node if the subspace of the former is not strictly at or below the granularities of the subspace of the latter. For example, an assignment at the Country level does not affect a query at the City level. The diagram below illustrates an evaluation node and its calculation items.

In the simplest case when there is no calculation like calculated member, scope assignment, unary operator, etc., the formula engine creates a single, special calculation item, called DetailData, which fetches data from the storage engine. You can imagine that the subspace associated with any DetailData calculation item is always at the lowest granularities for the measures to be fetched. Any subselect filters are then used to filter out unwanted members from the DetailData subspace. Afterwards, when the formula engine aggregates measure values from the DetailData subspace into the evaluation node subspace, you get visualtotals since data has been filtered at the lowest level. In practice, the formula engine does not construct DetailData subspace at the lowest granularities since the storage engine can do more than just retrieving and filtering data at leaf levels, it can also aggregate physical measure values at higher granularities. So the DetailData subcubes sent to the storage engine, as shown in SQL Profiler’s Query Subcube trace events, are typically at higher granularities but include any subselect filters as subcube slices which will filter data at leaf levels.

2.2. Subselect filters with calculations.

As we just said, when there is no applicable calculation, visualtotals is achieved by applying filters to the DetailData subspace and then aggregating measure values into a higher granularity subspace. The same logic extends naturally to cases when calculations come into the picture. Unlike DetailData items, the subspace of an arbitrary calculation is not necessarily at the lowest granularities any more. When building the subspace of the calculation item for a calculation, the formula engine examines each subselect attribute one by one, applies a subselect filter only if the attribute is at or above the granularities of the subspace. So if a subspace is at quarter level, subselect filters at quarter or year levels will be applied to the subspace but subselect filters at month or day levels are skipped. The diagram below illustrates how a subselect partially filters a subspace.



In practice, every time the formula engine constructs a new subspace, be it for an evaluation node or a calculation item, it always applies the subset of subselect filters which are at or above the granularities of the new subspace.

2.3. Attributes overwritten by calculations

An MDX calculation is not only defined by the subspace to which it applies, but also by its MDX expression. When the formula engine evaluates an MDX expression, it often transforms one subspace into another by overwriting some of the attributes. For example, MDX tuple expression ([Retailer Sales Amount], [Date].[Fiscal Year].[FY 2011]) changes one subspace to another by overwriting the [Measures] attribute and the [Fiscal Year] attribute and its related attributes. More complex MDX calculation expressions are parsed into formula trees. Subexpressions within a formula tree like tuple expressions or set expressions overwrite attributes in the original subspace to produce new subspaces.

While the formula engine navigates from one subspace to another, through evaluation nodes and calculation items, it keeps track of all attributes that have been overwritten by calculation expressions so far. When it is time to apply subselect filters to a newly constructed subspace, the formula engine skips all overwritten attributes. This rule sometimes leads to unintuitive results, like the ones we saw earlier in section 1. In case 1, the dummy calculation USA = USA overwrites attributes in the [Customer Geography] hierarchy, later on the DetailData subspace is no longer filtered by the California slice on the [State-Province] attribute, the final value for USA ends up being the sum of values of all states.

The rationale behind this rule is that if a user defines a simple calculated measure like ([Internet Sales Amount], [Fiscal Year].[FY 2009]), they really want to see the sales in 2009 when they add this measure to a report even though the report may have a filter on fiscal years 2010 and 2011. If the formula engine had applied the subselect filter after the calculated measure has changed the subspace to fiscal year 2009, user would have got empty result back since the intersection between {2009} and {2010, 2011} yields empty slice on the [Fiscal Year] attribute.
2.4. Answers to puzzle questions

Since calculation USA = CA + WA in case 2 and case 3 overwrites attributes in the [Customer Geography] hierarchy as well, the same rule kicks in to prevent the subselect filter on [State-Province] from being applied, hence no visualtotals for USA. The diagram below illustrates that in cases 1 through 3, overwritten attributes prevents subselect visualtotals.

But what about case 4 when WA = 1000000 is moved after USA = CA + WA? Why did we get visualtotals in that case? This is actually nothing more than a trick question to test how well you know solving orders in MDX. Now that calculation WA = 1000000 has a higher calculation pass value than calculation USA = CA + WA, the latter calculation is not used at all. The diagram below shows that since calculation does not overwrite any attributes, subselect filters still apply.

2.5. Caveats

Note that the rule discussed in 2.3 only applies to subselect visualtotals. MDX VisualTotals function is not affected by overwritten attributes in the same way. The discrepancy between the two styles of visualtotals is due to historical reasons where VisualTotals function had to maintain backward compatibility with SQL Server 2000 behavior.

Also note that the current implementation has complications when an MDX query mixes subselect with set in the where clause. So it’s generally not a good idea to use both features together in the same query.

3. What if a subselect is arbitrarily shaped?

So far we have been applying subselect filters one attribute at a time. We have learned that for a given subspace, some subselect filters can be applied if they are at or above granularity and not overwritten, others cannot be applied if they are below granularity or have been overwritten. This is not an issue when each subselect attribute can be applied independently. What if there is a correlation between two subselect attributes? In AS jargon, what if a subselect is arbitrarily shaped but only a subset of the attributes can be applied to a subspace? In this case, the formula engine projects the subselect onto the applicable filter attributes and then restricts the subspace with the projected set. But if a partial projection is caused by some of the filter attributes overwritten by calculations, the formula engine raises an error instead. For example, insert a dummy calculation WA = WA into the cube script that overwrites the [State-Province] attribute. Issue a query Select USA From (Select {(CA, 2009), (WA, 2008)} From AW) which contains an arbitrary shape subselect and you will get the error Expression cannot be resolved in the context of an arbitrary shape. That’s because the [Fiscal Year] attribute of the arbitrary shape set applies to the subspace but the [State-Province] attribute of the arbitrary shape set does not apply as it has been overwritten by the dummy calculation.

4. Summary

Today we have learned how MDX formula engine implements subselect visualtotals. In general, subselect filters newly constructed subspaces for attributes at or above granularities. The DetailData calculation item can be thought of as working at the leaf levels therefore all subselect filters apply. Other calculation items may work at higher levels hence only a subset of subselect filters are used to restrict their subspaces. Attributes overwritten by calculations are not filtered by subselect.

7 comments:

  1. thanks for the posting. Can you explain why this query produces an error - and what one could do to correct it. The key here is that tuple removal must happen inside the sub-query in order to facilitate a visualtotal of the remaining elements in the outer query.

    select

    {measures.members} on 0,
    { [Product Categories].[Category].members } on 1

    from
    (
    select
    except({measures.members}, {[Measures].[Customer Count]}) on 0

    from [Adventure Works]
    )select

    { [Customer].[Customer Geography].[Country].members} *{measures.members} on 0,
    { [Product Categories].[Category].members } on 1

    from
    (
    select
    except(
    { [Customer].[Customer Geography].[Country].members} *{measures.members},
    {[Customer].[Customer Geography].[Country].&[United States]} * {[Measures].[Customer Count]}
    ) on 0

    from [Adventure Works]
    )

    ReplyDelete
    Replies
    1. The subselect in the 2nd query produces a slice which is an arbitrary shape set of two hierarchies, with one of the hierarchies being the [Measures] hierarchy. MDX Engine does not support such slices.

      Delete
  2. oops - I put in 2 queries. one above that works, and the one below that does not. the measures in a tuple seem to screw things up.

    ReplyDelete
  3. "The rationale behind this rule is that if a user defines a simple calculated measure like ([Internet Sales Amount], [Fiscal Year].[FY 2009]), they really want to see the sales in 2009 when they add this measure to a report even though the report may have a filter on fiscal years 2010 and 2011."

    I don't understand this logic. IMO it is counterintuitive to show any data for measure "Sales in 2009" when filtering by 2010. I would expect a null. I expect to see data when filtering by year 2009 or when no filtering by year.

    Is there any more reasons for this behaviour to exist?

    ReplyDelete
    Replies
    1. What you described is intersection semantics: finding the intersection between a slice in calculation and a slice from query. MDX formula engine implements overwrite semantics: a slice in the calculation overwrites a slice from query. The intersection semantics doesn't work for basic calculations. For example, a natural way to write year over year growth is ([Internet Sales Amount], [Date].[Calendar Year].CurrentMember) - ([Internet Sales Amount], [Date].[Calendar Year].PrevMember), this formula won't work if you implement intersection semantics since the calculation moves to the previous year and if you intersect the previous year with the current slice from query (the current year), you get empty result.

      Delete
    2. Thanks a lot! Right to the point.

      Delete
  4. Thanks for the article. Really helpful. I have another scenario:

    I need to create a resultset with 2 records from same dimension. However, on the second record I need to apply filter from another dimension.

    How can I union following resultsets in one single query?


    SELECT [Dim Account].[Account].&[acc1] ON 0
    FROM
    (
    SELECT ([Dim PC].&[PC1]) ON 0
    FROM [Fin1]
    )
    ;


    SELECT [Dim Account].[Main Account].&[acc2] ON 0
    FROM [Fin1]
    ;

    Thank you.

    ReplyDelete