Before finishing this section, I also need to mention that no join, compression algorithm, or any other feature that SQL Server offers can help you if you write inefficient queries. A good example of a typical DW query is one that involves running totals. You can use non-equi self joins for such queries, which is a very good example of an inefficient query. The following code calculates the running total for the profit ordered over the sale key with a self join. The code also measures the IO and time needed to execute the query. Note that the query uses a CTE first to select 12,000 rows from the fact table. A non-equi self join is a quadratic algorithm; with double the number of the rows, the time needed increases by a factor of four. You can play with different number of rows to prove that:
SET STATISTICS IO ON; SET STATISTICS TIME ON; WITH SalesCTE AS ( SELECT [Sale Key] AS SaleKey, Profit FROM Fact.Sale WHERE [Sale Key] <= 12000 ) SELECT S1.SaleKey, MIN(S1.Profit) AS CurrentProfit, SUM(S2.Profit) AS RunningTotal FROM SalesCTE AS S1 INNER JOIN SalesCTE AS S2 ON S1.SaleKey >= S2.SaleKey GROUP BY S1.SaleKey ORDER BY S1.SaleKey; SET STATISTICS IO OFF; SET STATISTICS TIME OFF;
With 12,000 rows, the query needed 817,584 logical reads in a worktable, which is a temporary representation of the test fact table on the right side of the self join, and on the top of this, more than 3,000 logical reads for the left representation of the fact table. On my computer, it took more than 12 seconds (elapsed time) to execute this query, with more than 72 seconds of CPU time, as the query was executed with a parallel execution plan. With 6,000 rows, the query would need approximately four times less IO and time.
You can calculate running totals very efficiently with window aggregate functions. The following example shows the query rewritten. The new query uses the window aggregate functions:
SET STATISTICS IO ON; SET STATISTICS TIME ON; WITH SalesCTE AS ( SELECT [Sale Key] AS SaleKey, Profit FROM Fact.Sale WHERE [Sale Key] <= 12000 ) SELECT SaleKey, Profit AS CurrentProfit, SUM(Profit) OVER(ORDER BY SaleKey ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM SalesCTE ORDER BY SaleKey; SET STATISTICS IO OFF; SET STATISTICS TIME OFF;
This time, the query used 331 reads in the fact table, 0 (zero) reads in the worktable, 0.15 second elapsed time, and 0.02 second CPU time. SQL Server didn't even bother to find a parallel plan.