Monday, March 21, 2011

The Logic behind the Magic of DAX Cross Table Filtering

Automatic cross filtering between columns of the same table or related tables is a very powerful feature of DAX. It allows a measure to evaluate to different values for different cells in a pivot table even though the DAX expression for the measure does not change. Filter context is the underlying mechanism that enables this magic behavior. But it is also a very tricky concept that even befuddles some DAX experts. Marco Russo and Alberto Ferrari have introduced DAX filter context in Chapter 6 of their book Microsoft PowerPivot for Excel 2010. Marco has also blogged about how Calculate function works. Recently I have run into many questions from advanced DAX users which tell me that people are still confused about how filter context works exactly. And this will be the subject of today’s post.
This post assumes that you already have basic knowledge about measures, row context, filter context, and DAX functions Calculate, Values, All, etc.
A level 200 pop quiz on DAX
If you think you already know how filter context works, let me ask you a couple of level 200 questions on DAX to see if you can explain the nuances of some DAX expressions. If you don’t feel like being challenged now, it is still beneficial to read the questions so you have some examples to better understand the following sections. The questions are based on the data model inside the publicly available sample PowerPivot workbook Contoso Samples DAX Formulas.xlsx. You can download the sample workbook to try out the formulas yourself if you want to, but it is not required to answer the questions.
Question #1.
People have heard that fact tables are automatically filtered by slices on dimension tables, but not the other way around, or in more general terms, if there is a relationship from table A to table B, A is automatically filtered by any slices on columns of B but B is not automatically filtered by any slices on columns of A. So if you select
DimProductSubcategory[ProductSubcategoryName] = “Air Conditioners”
on a pivot table slicer, measure
CountRows(DimProduct)
returns 62 as DimProduct is limited to air conditioners. On the other hand, if you select
DimProduct[ProductLabel] = “0101001”,
measure
CountRows(DimProductSubcategory)
returns 44 instead of just 1 although only a single product is selected. To filter DimProductSubcategory by the selected product label, you can define a measure as
Calculate(CountRows(DimProductSubcategory), DimProduct)
which returns 1. So it seems like when you explicitly add DimProduct as a setfilter argument of Calculate, DimProductSubcategory will be filtered by DimProduct. But if I define a measure as
Calculate(CountRows(DimProductSubcategory), Values(DimProduct[ProductLabel]))  
to explicitly add the column that I know having a slice from the pivot table to the Calculate function , the measure formula returns 44 again. So what makes setfilter expression DimProduct work but Values(DimProduct[ProductLabel]) not work even though the filter only comes from [ProductLabel] column? If you think you have to add foreign key DimProduct[ProductSubcategoryKey] to the filter context in order for DimProductSubcategory to be filtered by DimProduct, you can try
Calculate(CountRows(DimProductSubcategory), Values(DimProduct[ProductSubcategoryKey]))
but it still returns 44. If you have enough patience, you can use Values function to explicitly add all 33 columns in DimProduct one by one as setfilter arguments to Calculate function and you still will get 44 back. So what is the difference between table expression DimProduct and the enumeration of all 33 columns in that table?
Question #2.
There are 2556 records in DimDate table, therefore if you add a measure with expression
CountRows(DimDate)
to a pivot table without any filters, the measure value would be 2556. Now if you add a second measure with expression
Calculate(CountRows(DimDate), FactSales)
to the same pivot table, the measure value would be 1096 since DimDate table is filtered by FactSales table and only dates with sales records are included. But if you add a third measure with expression
Calculate(CountRows(DimDate), All(FactSales))
to the pivot table, the measure value becomes 2556 again. Since this pivot table has no filters anywhere, shouldn’t FactSales and All(FactSales) return the same table? Now add a fourth measure with expression
Calculate(CountRows(DimDate), Filter(All(FactSales), true))
to the pivot table, the measure value becomes 1096 again. All three setfilter arguments return exactly the same table, why would we get back different results?
With these questions in mind, let’s examine the logic foundation upon which the magic world of DAX is built. At the end of the post, you will be able to find a logical explanation to all these seemingly inconsistent results.
The expanded view of a DAX base table
The best way to understand DAX cross table filtering is to think of each base table as extended by its related tables. When a relationship is created from table A to table B, the new A, which is really A left outer join B, includes both columns of A and columns of B. So in DAX, a table reference FactSales really refers to
FastSales
LOJ DimProduct LOJ DimProductSubcategory LOJ DimProductCategory
LOJ DimStore LOJ DimGeography LOJ DimDate LOJ DimChannel LOJ DimPromotion,
where LOJ means left outer join. This interpretation makes it easy to understand some other DAX syntax. For example, in DAX expression
Filter(FactSales, Related(DimProduct[ProductLabel])  = “0101001”),
Related(DimProduct[ProductLabel])  refers to the value of column DimProduct[ProductLabel]  in the extended FactSales table. As a second example, DAX expression
AllExcept(FactSales, DimProduct[ProductLabel])
returns a table with all columns of extended FactSales table except for column DimProduct[ProductLabel].


Build initial filter context
DAX filter context is a stack of tables. At the beginning, the stack is empty. Given a pivot table, a filter context is initially populated by adding slicers and page filters. For each cell in a pivot table, current members of row labels and column labels also add filters to filter context. Other pivot table operations like visual totals add to initial filter context as well but I will keep things simple here. At this point, we have set up an initial filter context in which the measure expression of the current cell is to be evaluated.
Measure invocation
If SumOfSales is the name of a measure and Sum(Sales[Amount]) is its DAX formula, DAX expression
[SumOfSales]
is equivalent to
Calculate(Sum(Sales[Amount]))
 and DAX expression
[SumOfSales](Date[Year] = 2001, Store[Country] = “USA”)
is equivalent to
Calculate(Sum(Sales[Amount]), Date[Year] = 2001, Store[Country] = “USA”).
So the syntax sugar which makes a measure name look like a function name is just a clever way to add tables to filter context before evaluating the expression associated with the measure. Since invoking a measure implicitly calls Calculate, from now on I’ll just focus on Calculate function as the same rules apply equally to measures.
Add tables to filter context
Calculate function performs the following operations:
1.       Create a new filter context by cloning the existing one.
2.       Move current rows in the row context to the new filter context one by one and apply blocking semantics against all previous tables.
3.       Evaluate each setfilter argument in the old filter context and then add setfilter tables to the new filter context one by one and apply blocking semantics against all tables that exist in the new filter context before the first setfilter table is added.
4.       Evaluate the first argument in the newly constructed filter context.
If a new table is added to filter context and it has blocking semantics against some tables already in the filter context, the affected tables are checked one by one, all common columns with the new table are marked as blocked on the existing table.
Let’s look at an example. Assume the current filter context has two filters: one filter is Date[Year] = 2011, the other filter is Store[Country] = “Canada”. We want to evaluate the following expression in the context
AverageX(Distinct(Date[Month]), Calculate(Sum(Sales[Amount]), Store[Country] = “USA”)).
The first argument of AverageX sets a month in row context.  When it comes to Calculate, it first removes the month from row context and adds it to filter context, it does not block anything since there is no [Month] column in existing filters. Next Calculate adds Store[Country] = “USA” to filter context which blocks existing filter Store[Country] = “Canada”. When Sum(Sales[Amount]) is evaluated, Sales table is filtered by the current month in 2011 and stores in USA.

Targets of filter context
After so much effort populating and modifying a filter context, when will the filters be applied? In DAX, the filters in a filter context apply to following DAX table expressions:
1.       A table expression that is simply a table reference, such as FactSales.
2.       Values(Table[Column]).
3.       Distinct(Table[Column]).
In cases of 2 and 3, the Table is filtered by filter context and then distinct values of [Column] are extracted from the filtered table.
So if your expression is
Calculate(SumX(Filter(FactSales, [SalesQuantity] > 1000), [SalesAmount]), Date[Year] = 2011),
the filter context only restricts FactSales and has no effect whatsoever on other parts of the formula. If you image every DAX formula is represented as a tree of parent and child function calls, a filter context is built at the top or in the middle of the tree but takes effect at leaf level table nodes.


Note that DAX function Sum(T[C]) is just a shorthand for SumX(T, [C]), the same is true for other aggregation functions which take a single column reference as argument. Therefore the table in those aggregation functions is filtered by filter context.
Apply filters to a target table
Finally we have identified a target table and are ready to apply filters from filter context. For each filter table in the filter context, we check to see if there are any common columns between the target table and the unblocked columns of the filter table. If there is at least one common column, the target table is semi-joined with the filter table, or in SQL-like terms
SELECT *
FROM TargetTable AS t
WHERE EXISTS
(
SELECT *
FROM FilterTable AS f
WHERE t.CommonColumns = f.CommonColumns
)
Each filter table is applied to the target table independently, so the target table is filtered by all relevant filters.
All, AllExcept, AllNoBlankRow
So far I have said that each setfilter argument of Calculate function returns a table which is added to filter context. Well, that is true as long as the setfilter is not one of the All functions. The All functions should really be renamed as BlockColumns when they are used as setfilter arguments. If one of the All functions is used as the top-level function of setfilter, it only blocks common columns of earlier tables but does not add itself to filter context.
In all other places, including as a sub-expression of a setfilter but not at the top level, All functions behave like any other DAX table expressions and always return a table. One special feature of All functions is that the Table argument inside All(Table), All(Table[Column]), AllExcept(Table, …), AllNoBlankRow(Table), etc. is not filtered by the current filter context.
Pop quiz answers
Answer to question #1.
When the initial filter context contains column DimProduct[ProductLabel], table DimProductSubcategory is not filtered as it does not have that column.
Now look at the next formula
Calculate(CountRows(DimProductSubcategory), DimProduct).
The setfilter argument DimProduct is filtered by [ProductLabel], and then table DimProductSubcategory is filtered by table DimProduct since they both share the columns from table DimProductSubcategory and table DimProductCategory.
Move onto the next two formulas
Calculate(CountRows(DimProductSubcategory), Values(DimProduct[ProductLabel]))
Calculate(CountRows(DimProductSubcategory), Values(DimProduct[ProductSubcategoryKey]))
Both setfilter arguments are a single column table and the column comes from table DimProduct. Since table DimProductSubcategory does not have any column from DimProduct, it is not filtered by filter context. For the same reason, you can add any columns from DimProduct to the filter context and none of them would impact DimProductSubcategory.
Answer to question #2.
In the first formula
Calculate(CountRows(DimDate), FactSales)
Both table DimDate and table FactSales share columns from DimDate, so DimDate is filtered by FactSales.
In the second formula
Calculate(CountRows(DimDate), All(FactSales))
All(FactSales) blocks any columns from FactSales, but since the filter context is empty, it has no effect. When DimDate is evaluated, filter context is still empty.
In the third formula
Calculate(CountRows(DimDate), Filter(All(FactSales), true))
The All function is not at the top level of setfilter argument, table Filter(All(FactSales), true) is added to filter context, table DimDate is filtered by filter context for the same reason as in the first formula.