Practices

The habits and actions that are executed routinely as part of every project are the practices of the method described in this chapter. Practices are the activities of execution that produce working, reusable PowerPivot solutions. The practices that follow include activities that are part of all modern software development, such as using revision control. The majority of these practices, however, pertain specifically to PowerPivot development, such as object naming and basics of dimensional modeling.

Take Advantage of Revision Control

Usually, when I am using a revision control system on my projects, it is Team Foundation Server. However, because of the compact unit of distribution, the Microsoft Excel .xlsx file, for distributing PowerPivot for Excel solutions your revision control system need not be so complex. In writing this book, I worked on multiple devices (two laptops, a desktop computer, an Atom-based netbook, an iPhone, and an Android phone), and I used DropBox (www.dropbox.com) to maintain a single version of my manuscripts, images, examples and notes. I can attest to DropBox's ability to maintain a history of revisions for a given .xlsx file, even across multiple devices.

You could also accomplish the same thing with Microsoft's LiveMesh. However, at the time of this writing, the LiveMesh Beta was ending, and Windows Live Essentials 2011 was not yet released.

Finally, and saving the best for last, if your company uses SharePoint, save your PowerPivot for Excel work there. Even if PowerPivot for SharePoint is not available on your company SharePoint install, your .xlsx files can be stored (and with document library revision history set on) and history maintained.

Contrast any of these free or low-cost solutions for maintaining your PowerPivot for Excel solutions with losing your work. The integration with Windows for all three is nearly seamless, and it just doesn't make sense to leave preservation of your efforts to chance.

Rename Early to Keep DAX Formulas Sane

PowerPivot for Excel performs a dizzying array of functions on behalf of the solution developer. Management of table and column names is one area where you can either save extra work by taking time to name (or rename) structures early. As shown in previous examples, when a dataset is added to PowerPivot by means of the Linked Table option, by default PowerPivot assigns a name in the form of TableN, where N is the number of previously created linked tables plus one. So the first Linked Table would be Table1, the second Table2, and so on. Hardly a meaningful way of naming the data, but PowerPivot will then work with DAX formulas in the form Table1[ColumnName]. As soon as you decide to give Table1 a more meaningful name, say DimDate, DimProduct, or SalesFacts, your existing formulas referring to Table1 now return #ERROR. However, PowerPivot Relationships are handled differently.

Creating a Relationship

As an example, begin with a clean Excel worksheet. In the first four rows of the first sheet, enter the data as depicted in Figure 5-1. Create a PowerPivot linked table from this data; PowerPivot should name the new table Table1. Leave the name as the default.

images

Figure 5-1. Our fact table

In the next tab of the current sheet, enter the data shown in Figure 5-2. This will be the basis of a trivial product dimension table for our example on the effects of renaming on PowerPivot relationships. Create a linked table from this data; PowerPivot should name this recent table Table2. Do not rename the new PowerPivot table.

images

Figure 5-2. Product dimension data

Create a relationship from Table 1 to Table 2, as depicted in Figure 5-3.

images

Figure 5-3. Creating an initial relationship

We now have two PowerPivot tables related to each other by a single column. If PowerPivot relationships behave like formulas, a rename of a table or column, on either side of the relationship should break the relationship.

Renaming an Object in a Relationship

From the PowerPivot window, select Table2 (the product data) and rename the PowerPivot table to DimProduct. From the table name tab, right-click and select the Rename option, as illustrated in Figure 5-4.

images

Figure 5-4. Renaming a table in PowerPivot

Examining the Result

From the PowerPivot window, select the Design ribbon and then the Manage Relationships item, as pictured in Figure 5-5.

images

Figure 5-5 Design and manage relationships

From the ensuing dialog, you will notice a list of relationships. Because of the trivial nature of our PowerPivot solution to demonstrate the resiliency of PowerPivot relationships, there is only one relationship, as shown in Figure 5-6. However, PowerPivot has propagated the table name change to the relationship on our behalf. The relationship that existed between Table1[Product] and Table2[Product] now exists between Table1[Product] and DimProduct[Product].

images

Figure 5-6. Relationship changed

You can take this example a step further and rename a column. PowerPivot will maintain the reference to the newly renamed object in the relationship. Take advantage of PowerPivot's work here on your behalf, and don't be afraid to rename tables and columns as a way of making your formulas more readable.

Choose Online Datasources When Possible

As you will soon find in the PowerPivot for SharePoint chapters, not all datasources are treated equally within PowerPivot. Your primary use of PowerPivot may be to combine data that exists only in a spreadsheet or text file with data stored in an online data store. Whenever possible, choose an online datasource over data that only exists on a single machine. You should prefer online data sources because data that exists as only a copy-and-paste from a source into a Microsoft Excel table has no connection information by which to regularly refresh the data. Even a text file, once it is made available as a PowerPivot for Excel data source, can be regularly refreshed.

For example, the data for the airline on-time performance example in Chapter 4 is readily available via the Internet, and you can download the data for the most recent month at http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time in comma separated variable (CSV) format. Import the data into PowerPivot for Excel, using the location of to which you have saved the data from the BTS web site. While PowerPivot for Excel imports the data, note the record count. After a successful import, open the original data file in a text editor, deleting either the first or last data row (or any nonheader rows) noting the record count before and after the row deletions. Save the data file (with fewer records) to the same location originally specified to PowerPivot. Open the PowerPivot for Excel window, and use the Refresh option from the PowerPivot ribbon's Get External Data section, as pictured in Figure 5-7.

images

Figure 5-7. Refreshing your connections

The record count for the refreshed table should contain the same record count as the data file from which rows were deleted. In other words, PowerPivot for Excel can maintain a connection to a text file. This can be a valuable tool to refresh PowerPivot for Excel from a regularly updated, yet nonrelational source of data.

The large difference here is in having to manipulate data from the clipboard and into an Excel worksheet, as we may have done to incorporate text data into a PowerPivot for Excel solution. First, the data file does not have to be opened by a user in order to refresh PowerPivot with the new data. This is not the case with copy and paste data. Second, PowerPivot for Excel can actually make use of more rows (or records) of data using a text file than with a copy and paste data source. This is because a copy and paste (or linked table) datasource is limited by Microsoft Excel's maximum rows in a worksheet. These limits don't apply with text file datasources, because the data is read from the inbound file and stored in the SSAS (Vertipaq mode) in-memory runtime on the PC.

Relate Fact Sets Through a Dimension Table

In Chapter 4, I discussed the problems with relating two fact sets directly, without a related dimension table. That example outlined the issues with relating a fact set (sales) with another (page views) without a dimension table. If nothing else, strive to involve a date dimension between fact tables. Take a second look at your design should you find the need to join two fact tables together.

Create a Separate Date Dimension

Rules were made to be broken, but if you work with PowerPivot for Excel to develop analytic solutions, you will soon see the value in creating a date dimension separate from your business fact (that is, measure) tables. The two principle uses of a date dimension center around relating facts sets to each other and ensuring a common set of to-date calendar periods are observed. But first, how do you quickly create a rich date dimension?

Shortcut to a Rich Date Dimension

The first choice for populating a date dimension table for your PowerPivot for Excel solutions will usually be the time (or date or calendar) tables used in an existing data source. Data mart and data warehouse databases usually contain a single version of the enterprise calendar accounting for year-to-year variances such as holidays. In case using such a table is not an option, I will share with you a few techniques that I have learned to employ to assist in creating a rich date dimension table.

First, in the workbook hosting your PowerPivot for Excel solution, create a sheet named DimDate. This is purely a convention carried over from my days creating dimensional models in relational databases. Essentially, tables beginning with “Dim” are intended to act as dimensional tables. Table beginning with “Fact” contain business metrics (measures) that are organized by dimension tables.

In the new DimDate sheet, label consecutive single columns DateKey and MonthNbr. Again, this is purely from the habit of naming the natural and surrogate keys required for dimensional update processing. Under the DateKey label, add a row beginning with January 1, 2009. Under the MonthNbr label, enter the formula =MONTH(A2), substituting the column to the left containing the date for the address A2. I recommend, for the sake of a more compelling demonstration, add at least two years of dates to both columns. From this table, create a linked PowerPivot table via PowerPivot's Create Linked Table ribbon selection.

You will notice that PowerPivot has not named the new linked table DimDate, but instead Table1 (assuming this is the first linked table). Rename the PowerPivot table now, as the balance of this technique uses DAX formulas.

Return to the Excel Worksheet window, and add a new sheet named Calendar Lookups. In this worksheet, you will enter the calendar specifics for your organization, similar to Figure 5-8.

images

Figure 5-8. Calendar Lookups linked table

In this table, enter both the long form and abbreviated names for each month. Microsoft Excel will handle this series for you after you enter and then select January and the Jan cells and drag the series selection to the final row. The final two columns relate to which quarter a month is associated. For our example, we have a both a calendar quarter and a fiscal quarter to represent a fiscal new year of October 1 of each year. After the data entry is complete, create a linked table from this data named CalendarLookup. Ensure you rename the PowerPivot table; otherwise, you will have “Table2” propogated in your DAX formulas.

From the PowerPivot window, create a relationship between DimDate[MonthNbr] and CalendarLookup[MonthNbr]. This is the basis for the rest of the solution. Next, add a column named Year, populated by the following DAX formula: =YEAR(DimDate[DateKey]). This will ensure we know the year for slicer purposes. Next, add a column to the DimDate table named Quarter. In the formula area, use the following DAX to populate Quarter with the correct quarter number entered from Figure 5-7, =RELATED(CalendarLookup[Quarter]). Repeat this operation for a column named FiscalQuarter that returns the FiscalQtr column from CalendarLookup. At this point, your DimDate table should resemble Figure 5-9.

images

Figure 5-9. The DimDate table

At this point, you may be asking wondering why we did not just copy everything in Microsoft Excel? Creating a single DimDate linked table in Microsoft Excel is certainly an option. However, one of the greatest advantages to this technique is being able to quickly create DAX formulas for date attributes, such as Quarter Number, plus being able to quickly and reliably change them. While it is not a frequent occurrence to have the Fiscal Year change, if the Fiscal Year end was moved to November 1, we could quickly implement by changing the Calendar Lookups table and refreshing the PowerPivot linked table. Additionally, this technique requires very little data to actually be stored in Excel. Most of the date attributes are implemented via DAX and stored in the PowerPivot SSAS instance.

Handling Moving Holidays

What about special dates? How could the previous technique accommodate ‘moving' holidays such as Thanksgiving in the United States? One way to accommodate these yearly moving holidays is via a lookup table. Creating a table similar to below would allow for quick identification within PowerPivot for Excel as a holiday. Simply create the linked table, relate Holiday[Date] to DimDate[Date], and use Related() to populate an IsHoliday_YN column in DimDate.

images Note When populating a date dimension, ensure you have rows in the date table for each day of the year. PowerPivot time intelligence functions rely on a fully populated date dimension when building sets of dates for year-to-date and similar calculations.

Leverage Built-In Measures

PowerPivot for Excel includes out-of-the-box features for a vast set of calculations available to report, based on measures within a PivotTable or PivotChart. The underlying SSAS database (the in-memory data store) that is part of PowerPivot for Excel establishes a platform from which percent-of-whole and other variations of composition analysis can be quickly calculated. Taking advantage of these calculations in your solutions can give a new insight to your existing data.

Revisiting the NCAA Football roster example from Chapter 4, we can illustrate several ways of reporting different measures using the same data. Using a PivotChart with a column label of SortYear and row labels of Conference and Team, as illustrated in Figure 5-10, choose Count of Player ID as the sole Values element.

images

Figure 5-10. Count of players by year (and team)

PowerPivot for Excel provides a group of ratios that can be calculated based on the filter context of the PivotTable. Access the alternative calculations by right-clicking a measure within the PivotTable and selecting Show Values As from the context menu. The context menu will look similar to Figure 5-11.

images

Figure 5-11. Show Values As menu

No Calculation

This option renders the measure as placed in the Values section of the PowerPivot Field List. No aggregate comparison calculation is applied.

% of Grand Total

This option renders the selected measure as a percentage of the grand total value for the measure. In the NCAA Class Composition example, there are 1,278 players reported in the grand total. The University of Alabama reported 34 freshmen in the dataset. When the % of Grand Total option is applied, the value for the Alabama and '01-FR intersection is 2.66 percent (34 divided by 1,278 players).

% of Column Total

This option renders the selected measure as a percentage of the column total for the selected measure. In the NCAA Class Composition example, there are 467 players reported across all selected (Southeastern Conference) institutions. To maintain example consistency where possible, we will use the 34 freshman year players from the % of Grand Total example. However, because the measure is now evaluating the total of all freshman (467) as the denominator, the percentage value used is 7.28 percent (34 / 467 = 7.28).

% of Row Total

Similar to % of Column Total, this option renders the selected measure as a percentage of the row total for the selected measure. In the NCAA Class Composition example, there are 105 players reported on the roster of the University of Alabama. The 34 freshman would then be compared to the row total (105) to render the percentage of 32.38 percent (34 / 105 = 32.28.

% Of

This option allows the PowerPivot for Excel user to baseline a measure organized by multiple dimensional values (Team) by the values for a single dimension (in our case, the University of Alabama). Selecting this option for a measure produces the dialog box depicted in Figure 5-12.

images

Figure 5-12. % O f baseline dialog

The dialog allows the selection of a base field and an item within the field by which to gauge the values of all other elements in the chosen field. In our case, we are comparing the other eleven teams in the Southeastern Conference with the University of Alabama. Pressing OK in the Show Values As dialog will render a PivotTable similar to Figure 5-13.

images

Figure 5-13. The percentage of a baseline

As Figure 5-13 illustrates, the values for the baseline instance are all 100.00 percent, because the numerator and denominator are the same value: the baseline is the value. However, the real use in this out-of-the-box calculation is in baselining data to a known dimensional value.

% of Parent Row Total

To create a meaningful illustration of the % of Parent Row Total, a new subtotaling level will have to be added to the example. Dragging Division into the Row Labels area of the PowerPivot Field List, so that it appears between Conference and Team, will create a new subtotal at the Division level. The results should be similar to Figure 5-14.

images

Figure 5-14. Subtotal by division

Choosing % of Parent Row Total will then render the selected measures evaluated over the parent level (in this case subtotaled by division). To continue with the University of Alabama freshmen, the 34 members of this class would be evaluated as a percentage of the 260 freshmen in the entirety of the Western division (34/260=13.08 percent). However, the % of Parent Row continues. The 260 Western Division freshmen are evaluated against to total of all freshmen in the conference (467). The Western Division freshmen total is reported as 55.67 percent (260 / 467 = 55.67).

This can be a very powerful tool for decomposing measures across and up a hierarchy.

% of Parent Column Total

This selection renders the selected measure in a manner identical to the % of Parent Row, except by levels used in the Column Labels area of the PowerPivot Field List. For example, in the example PivotTable, add Position to the Column Labels area of the PowerPivot field list. This should result in a PivotTable similar to below.

Adding the breakdown shows three of Alabama's 34 freshman (the parent column total) play the defensive back position. Therefore the reported % of Parent Column Total is 8.82 % (3 / 34 = 8.82).

% of Parent Total

This selection renders the selected measure similar to % of Parent Row and % of Parent Column but allows the solution author to choose the base field for comparison.

Difference From

This selection renders the selected measure, as compared to a baseline selected by the PowerPivot solution author. To continue with the current example, after selecting Difference From on the Show Measures As context menu, a dialog similar to Figure 5-15 will be produced.

images

Figure 5-15. Show Values as a difference

This dialog allows the solution author to select which base field and which item in the field as the basis for comparison of the other values. To continue with our example, selecting the options as depicted in Figure 5-12 will produce a PivotTable comparing all other teams in the SEC to the University of Alabama.

The end result of the comparison will resemble Figure 5-16. As you will note the values for Alabama are all blank, indicating this is the baseline row. Additionally, each of the other eleven teams in the conference has been reduced to the difference between their values for each class year and those reported by the University of Alabama (the baseline). For example, Alabama reported 34 freshmen and the University of Arkansas reported 44. The difference of 10 is rendered by PowerPivot given the baseline selection.

images

Figure 5-16. Difference from baseline

% Difference From

This selection works similarly to the Difference From selection, except the reported value is the percentage difference from the selected baseline. If the Difference From example in the previous section is extended to use instead % Difference From, the value reported for the 44 Arkansas freshmen, relative to the 34 Alabama freshmen in the baseline would be 29.41 percent ( (44-34) / 34 = 29.41).

Running Total In

Occasionally, it is necessary to create an analysis of measures that includes the running total of a measure. For example, which product classes constitute 80 percent of revenue? The Running Total In option renders the selected measure as a sum, over a set of column or row values. To further the current example, selecting SortYear as the base field for this option yields a PivotTable reporting the running total of players, by their SortYear value, as shown in Figure 5-17.

images

Figure 5-17. Running Total In (SortYear)

Conversely, we could apply the Team as the base field, which would have caused the running totals to run down the table (following the columns).

% Running Total In

Similar to the Running Total In option, the % Running Total In selection creates a percentage based on a running total of a measure value over a base field. Continuing with the existing example, first, reverse the order of SortYear from 04-SR to 01-FR (instead of 01-FR to 04-SR). Second, apply the % Running Total In option using SortYear as the base field yields a PivotTable similar to Figure 5-18.

images

Figure 5-18. % Running Total In by SortYear

This PivotTable shows the percentage of the Roster for Alabama, at the junior (03-JR) or senior (04-SR) level as 41.35 percent.

“Rank Smallest to Largest” and “Rank Largest to Smallest”

Both of these options rank the values for the selected measures including ties. In other words, the example data includes four institutions that report the same number of freshmen class members (Auburn and Mississippi State at 42, and Arkansas and Vanderbilt at 44). Therefore, the ranks for the twelve teams are from one to ten, with the previously mentioned ties being ranked seventh and eighth respectively.

Conversely, if the Rank Largest to Smallest option is selected, these ties are reported as Arkansas and Vanderbilt tied at number three, and Auburn and Mississippi State tied in the number four position.

Index

The Index option of the Show Values As context menu essentially evaluates as (Value In cell * Grand Total of Grand Totals) / (Grand Row Total * Grand Column Total). This would mean our Alabama Freshmen value would be calculated as (34 * 1262 )/ (104 * 467) = 0.883462362.

Customize Your PivotCharts via PivotTables

Beneath every PowerPivot PivotChart is a PivotTable. You may have noticed by now PowerPivot creating an underlying PivotTable for each PivotChart being expressly added by you, the PowerPivot for Excel author. An interesting side-effect to this PowerPivot for Excel feature is the ability to greatly customize a PivotChart, without touching the chart itself. By manipulating the PivotTable using filters, for example, the PivotChart will change to reflect the filtered data in the PivotTable.

For example, Figure 5-19 contains a PivotChart that was inserted into Sheet5 of the current Excel file. This is simply a summary of flights, by carrier, from a given airport. Notice that immediately preceding Sheet5 is an Excel sheet labeled Data for Sheet5 Chart1. This is a sheet inserted by PowerPivot at the time Chart1 was added to Sheet5 to contain the data behind the new PivotChart in Sheet5. If we had inserted additional PivotCharts into Sheet5, PowerPivot would create an additional sheet for each, naming them in the same form Data for Sheet5 ChartN.

images

Figure 5-19. A PivotChart and related data sheet

Also notice the Carrier button just below the origin of the chart. Within PowerPivot for Excel, this button is called an axis field button, and it allows rapid access to the sort and filter context menu, for a particular field, in this case Carrier. Access and expansion of the sort and filter context menu, from the carrier axis button, is illustrated in Figure 5-20.

images

Figure 5-20. The Carrier axis sort and filter menu

For the purposes of interactive and ad hoc analytics, the button allows a PowerPivot for Excel developer easy access to features that are also available by selecting a value on the Carrier axis of Chart1. As illustrated in Figure 5-21, right-clicking an axis value will also render the sort and filter menu for values of Carrier.

images

Figure 5-21. Carrier Axis Sort and Filter Menu

Perhaps this technique of customizing PivotCharts via the underlying PivotTable is more of a preference than a practice. But in my experience, clients have found the axis buttons confusing and feel that the buttons clutter the chart space. We can remove them altogether and still customize the appearance and data content of the PivotChart. Removal of the axis buttons from a PivotChart is a simple matter of right-clicking the axis field button and selecting Hide Axis Field Buttons on Chart, as illustrated in Figure 5-22. The result is a PivotChart without the Carrier axis field button and more room to display data. Similarly, the Hide All Field Buttons on Chart option removes both the Carrier axis button and the Sum of Flights field button, yielding even more area for the chart data within the chart zone.

images

Figure 5-22. Hiding axis field buttons

With the axis field buttons removed, Chart1 can still be manipulated using the underlying data contained in the PivotTable within the Data for Sheet5 Chart1 sheet. For example, if we wanted to report on the percentage of flights serviced by each carrier, we can quickly alter the chart without the creation of a new measure. Instead, navigate to the sheet containing the chart data (“Data for Sheet5 Chart1”). Select a measure value, in this case Sum of Flights, and right-click the cell. As illustrated in Figure 5-23, this will produce a context menu, from which we choose Show Values As and then select % of Grand Total. Notice the default selection is No Calculation, but with the highlighted selection, we will show each of the measures as their percentage of the grand total.

images

Figure 5-23. Show Values As menu

Immediately after applying the % of Grand Total selection, each of the values in the PivotTable are changed to percentages of the grand total, summing to be 100 percent of flights as illustrated in Figure 5-24. However, the valuable effect of this is seen in the original PivotChart. The y axis values are now formatted as percentages, associated with the original data bars illustrated in Figure 5-25.

images

Figure 5-24. Show Values As % of Grand Total

images

Figure 5-25. PivotChart using Show Values As % of Grand Total

Similarly, the Carriers in the PivotChart can be ranked by their percentage of volume by applying a variation of this technique. From the sheet containing the underlying data for the PivotChart (“Data for Sheet5 Chart1”), select a cell a measure value. Our illustration uses the Sum of Flights measure value. From the right-click context menu, select the Sort option and then Sort Largest to Smallest. This will immediately sort the PivotTable and predictably, Carriers in Chart1 will be sorted in order by flight volume, as illustrated in Figure 5-26.

images

Figure 5-26. Final PivotChart Sorted by Percentage

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset