Calculated Measures from Multiple Data Sources

Another reason to employ a DAX calculation is in the creation of a new measure from multiple data sources. From the example download, Chatper04DAX Measures.xlsx is a sample workbook containing a trivial amount of data from a fictional Internet retailer. The workbook contains data for both product sales (SalesFacts) and page views from the company web site (PageViewFacts).

The Business Problem

While the example workbook contains data for both sales and web activity, as is often the case in real work, both datasets are needed for the calculation of a value not stored in either source. For example, one of the management team for a fictitious web retailer believes understanding how many page views occur for every dollar of sales, by product, will aid the decision-making process. You have been asked to combine information from the sales system and the web activity logs to generate just such an analysis, highlighting page views per sales dollar, by product. Neither of the source system has this value, because neither contains all of the required data for the calculation. However, using PowerPivot to combine the measures for sales and web page views, we can easily satisfy management’s request.

The two tables containing sales and page view activity are known in business intelligence vernacular as fact tables. Fact tables as a general rule contain data directly related to a business activity under analysis. A Fact Table will usually contain metrics, known as measures, which can be operated upon by an aggregating function (average, sum, count, etc.) to create a meaningful assessment of activity.

In addition to the fact tables, the example workbook contains linked tables for products and dates. DimProduct contains the unique identifier and names of the five products under analysis. Furthermore, DimDate contains the calendar for the current year with columns for portions of a date that may be of interest for sorting, grouping or filtering activity. In the language of business intelligence, these are dimension tables. Dimension tables contain data relating to the description of a fact, usually with a historical perspective. For our purposes, no meaningful gauge of business activity is possible using the dimension tables (DimProduct and DimDate) alone. Instead, the dimension tables, through a relationship with the fact tables, serve to describe in greater detail the business activity recorded in the fact table.

Separation of Facts and Dimensions

If dimension tables only serve to describe activity in the fact tables, why not have one big table containing columns from both the fact and dimension tables? My initial response will be familiar to any parent, “Because I said so”. Actually, the better answer lies in research and methodologies developed long before I began working in this field. However, I may be able to illustrate the best reason to strive for this design using the PowerPivot solution at hand.

The example workbook contains a set of four tables, two fact tables and two dimension tables. The tables have been related using PowerPivot relationships in the following manner:

  • SalesFacts is related to both Product and DimDate
  • VisitorFacts is related to both Product and DimDate.

These relationships are depicted in the lines connecting each table’s representation in Figure 4-16.

images

Figure 4-16. Example fact and dimension relationships

If we create a very simple pivot table containing the measures for each of the fact tables, Sum of Sales and Sum of PageViews, a table like Figure 4-17 is produced.

.

images

Figure 4-17. Facts from separate tables

However, if we then attempt to include a Product column from either of the fact tables, PowerPivot will respond with both an odd result, and an error in the PowerPivot Field List suggesting a relationship may be needed. The “Relationship may be needed: warning is illustrated in Figure 4-18.

images

Figure 4-18. Relationship warning message

If the Product column from VisitorFacts is used as a row label, PowerPivot uses the grand total of all SalesFacts as the Sum of Sales for each Product. This behavior is illustrated in Figure 4-19. Conversely, if the Product column from SalesFacts is used as a row label, PowerPivot uses the grand total of all VisitorFacts as the Sum of PageViews for each Product. This behavior is illustrated in Figure 4-20. If you recall the earlier relationship diagram for our four-table solution, the reason for the behavior will be evident.

images

Figure 4-19. VistorFacts product as a row label

images

Figure 4-20. SalesFacts product as a row label

The reason for this seeming inconsistency is in Figure 4-16. There are no relationships between each of the fact tables, SalesFacts and VistorFacts. They are only related to each other via their separate relationships with the DimProduct and DimDate tables. PowerPivot has no means by which to resolve the request to assign Product as a row label to a PivotTable containing measures from two separate fact tables when the row label is a column in only one of the fact tables. However, each of our fact tables has a relationship with the Product column in the DimProduct table (DimProduct[Product]). Further complicating the issue is the values being computed into the PivotTable by PowerPivot. But the explanation is the same. Because there is no relationship between SalesFacts and PageViewFacts, PowerPivot has no choice but to use an implied value to constrain the query that supplies the Sum from the unrelated fact table. This is why the values for each Product row label are consistently the grand total for the measure in the unrelated table. SQL Server Analysis Server users recognize PowerPivot is supplying the default member, in the absence of a relationship.

If we substitute the column from the dimension table for the row label sourced from a fact table, will that solve the problem at hand? Absolutely, but wait there is more.

  1. First, remove the value from the Row Label area of the PowerPivot Field List.
  2. Drag the Product column from the DimProduct table to the Row Labels area of the PowerPivot Field List.
  3. Ensure the Values area of the PowerPivot Field List contains the measures from the two fact tables: Sales and PageViews. The resulting PivotTable should resemble Figure 4-21.
images

Figure 4-21. Using Dimension Table Columns

Because the PivotTable now uses a column related to both fact tables, the row label, product-level totals for each measure are accurately reported.

New Measures

Begin adding a new measure with DAX to our example PowerPivot solution by opening the Excel window. From the PowerPivot Field List, right-click the VisitorFacts table, and select Add New Measure from the context menu. Alternatively, the same dialog can be reached via the PowerPivot Ribbon New Measure selection.

Using Figure 4-22 as an example, enter data in the Measure Name, Custom Name, and Formula fields. Notice we are using the SUMX function to ensure we calculate the sum of PageViews for all rows in the table. SUMX is also used to evaluate sales for all rows in the table. This will ensure accurate calculations if new fields from the PowerPivot data tables are added to the PivotTable.

images

Figure 4-22. The Measure Settings dialog

PowerPivot will add the new measure as a selection in the Values area of the PowerPivot Field List. When completed, your worksheet should resemble Figure 4-23.

images

Figure 4-23. New Measure Added to PivotTable

images Note Unlike their cousin SQL Server Analysis Services’s concept of a calculated measure, PowerPivot calculated measures must be related to a table in the PowerPivot solution.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset