C H A P T E R  4

images

Data Analysis Expressions

If you have built castles in the air, your work need not be lost; that is where they should be. Now put foundations under them.

—Henry David Thoreau

PowerPivot is uniquely suited as a business intelligence development tool in the ability to combine large volumes of disparate yet related data. To this point, our examples have steered clear of an issue that will occur in all but the most trivial business intelligence solutions: the need to create custom measures, or calculated columns, based on the source data. Recognizing this, Microsoft has included a set of functions, known as Data Analysis Expressions (DAX), capable of creating calculations that may not be present in the original source data.

The first case for utilizing DAX is to create a calculated column within a PowerPivot table. When PowerPivot data is retrieved from a relational database, a view or query can be used with PowerPivot for Excel to add a calculated column within the source database. However, PowerPivot for Excel can utilize many nonrelational data sources where creating a query or view workaround for a calculation would not be possible or practical. A trivial example of these row-level calculations would be deriving a month or day calculated column from a source column containing dates.

The other principal use of DAX is to create a measure from two (or more) separate data sources for the purpose of analyzing a business process. Because PowerPivot for Excel is serving as the datastore for separate but related datasets, neither of the inbound sets is capable of performing the required calculation.

As a real example that you can use in your own PowerPivot training efforts, consider data currently available from the Bureau of Transportation Statistics (BTS). BTS is a component of the United States Department of Transportation and maintains an exhaustive web site containing data for multiple modes of transport into and throughout the United States. For a frequent airline passenger for business, the air transport measurements available include two separate datasets of interest. The first is a set of data that details the number of passenger seats available between any two airports. If you want to know the number of seats that were historically available on Mondays in a given month, the bureau’s site has a table with the source data. The second dataset represents the on-time performance for every airline representing more than 1 percent of all airline passenger fares for a given reporting month. As the examples in this chapter will show, this can be an interesting dataset to understand on-time performance among cities and airlines. More to the point of using DAX to create new measures from related datasets, there is no way to represent from either of these datasets how many seats were delayed for a given time period. Additionally, neither dataset in isolation of the other has the capability to answer questions around the load factors along a given route. This concept is illustrated in Figure 4-1 below.

images

Figure 4-1. Using multiple datasets for a calculation

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset