Chapter 9. PowerPivot for Excel

One of the more exciting recent additions to the Microsoft business intelligence (BI) suite is a technology known as PowerPivot. I use the word technology because PowerPivot is not a new product so to speak, but rather it is a new capability for very powerful data analysis, and it impacts Excel, SharePoint, and SQL Server. PowerPivot is available as a standalone add-in to Excel, and is also available as a SharePoint service. This chapter explores PowerPivot for Excel.

Overview of PowerPivot

Business users, and in particular business analysts, need to be able to process data to do their job. Much of that data processing responsibility goes through the IT department, but for a number of reasons, business users also need to be able to access and analyze data on their own. As a result, Microsoft Excel has become an indispensable tool for business.

However, although Excel is a powerful data tool, standard Excel has some limitations when it comes to large datasets. Also, when you want to combine different datasets in Excel, it’s possible, but difficult and not very efficient. Wouldn’t it be great to have a tool that could process literally millions of rows of data, do so extremely quickly, enable you to create relationships between different sets of data, and do all this in a familiar user interface? Well, that is PowerPivot!

So, how does PowerPivot do all this? By leveraging some of the technology from Microsoft Analysis Services. In fact, the engine behind PowerPivot is an in-memory, column-based data store known as the VertiPaq data engine, and it actually creates memory-resident analytic cubes. This is what enables the ultra-high performance of PowerPivot. Millions of rows of data can be grouped, sorted, analyzed, and so on within seconds. What’s more, the VertiPaq engine allows for relationships to be created between different tables in much the same way that relationships are created between database tables. This makes it easy to combine disparate datasets to perform some powerful analysis, all from the desktop, all within the familiar Excel interface.

Additionally, to distribute and manage these PowerPivot applications, Microsoft provides the PowerPivot service in SharePoint 2010, which is discussed in the next chapter.

Installing PowerPivot

PowerPivot for Excel is available as a free download from Microsoft and is installed as an add-in to Excel 2010. Excel 2010 is required to experience the complete capabilities of PowerPivot and to create or modify a PowerPivot workbook. You can use Excel 2007 or Excel 2010 without the PowerPivot add-in to view PowerPivot workbooks, but the workbook will be static and read-only.

PowerPivot is available in both 32-bit (x86) and 64-bit (x64/amd64) versions. You need the version that matches the bit level of Microsoft Excel 2010 that is installed (x86 or x64). If you are unsure as to whether you have 32-bit or 64-bit Excel 2010, you can find out by doing the following:

  1. Launch Excel 2010.
  2. From the File menu, select Help.
  3. In the information listed in the right portion of the screen, under About Microsoft Excel, you will see the version number along with the bit level in parentheses.

To install PowerPivot for Excel, perform the following steps:

  1. Go to Microsoft.com and enter download powerpivot for Excel in the search box. Follow the results until you are able to download either the 32-bit or 64-bit MSI files.
  2. Download the version of PowerPivot you will use and click Run when prompted.
  3. Accept the license agreement, and then click Next.
  4. Enter your name, and then click Next.
  5. Click Install.
  6. After the installation is complete, click Finish.

The first time you launch Excel after installing the PowerPivot add-in, a dialog box will open asking if you want to install this customization. Click Install to complete the installation.

Using PowerPivot

The best way to think about how to use Excel with PowerPivot is to realize that there are really two steps that everyone takes when working with any kind of data. The first step is to get the data and prepare it (that is, import, clean, create calculations, and so on). The second step is to present that data (that is, create a pivot table, create a chart, and so forth). With that in mind, PowerPivot introduces the PowerPivot window as the design surface for getting and preparing the data, while the standard Excel worksheet is the design surface for presenting pivot tables and charts based on the data.

After installing the PowerPivot add-in to Excel, you will see a new tab menu named PowerPivot. Select that tab and you will notice that the first icon on the toolbar is for the PowerPivot window. This will open the PowerPivot window and still leave the Excel worksheet as is. You can move back and forth between the two separate design surfaces.

Figure 9.1. PowerPivot window.

image

To illustrate how PowerPivot can be used, let’s assume the following scenario:

You are a marketing analyst for Contoso and are wondering whether the company should modify its marketing strategy based on income level. To explore this, you decide to see if a correlation exists between the average size of a sale and average income level. To perform this analysis, you have Product Sales data available from the company database. You also have gathered median income data, which is publicly available. (Note: For simplicity in this demonstration, the median income data is at a state level.)

Getting the Data

  1. Open Excel 2010.
  2. Display the PowerPivot tab, and then open the PowerPivot window.
  3. In the Get External Data section of the toolbar, select From Other Sources, as shown in Figure 9.2, (Note: Depending on screen resolution, the text may not display.)

    Figure 9.2. Get external data from other sources.

    image

PowerPivot provides a large selection of connection managers right out of the box, as shown in Figure 9.3.

Figure 9.3. Data sources.

image

Importing from a Database

PowerPivot can easily connect to corporate databases and import data from them. The following example uses the Contoso sample BI database available from Microsoft’s website.

  1. The Contoso database is a SQL Server database, so select Microsoft SQL Server from the list.
  2. After establishing a connection to the database, the Import Wizard prompts whether you want to select tables from a list or write a SQL query that will specify the data to import. Choose Select from a List of Tables.
  3. The Import Wizard presents a list of all the tables in the database. For this analysis, you are concerned only with sales in the United States. So, check the DimGeography check box and then click the Preview & Filter button (see Figure 9.4).

    Figure 9.4. Table Import Wizard.

    image

    A preview of the data in the Geography dimension is displayed. In addition, PowerPivot makes it easy to refine the data that will be imported, both at a column level and a data value level. Scroll to the right, locate the RegionCountryName column, and click the drop-down to the right of the column name. Clear all countries except United States as shown in Figure 9.5 and click OK.

    Figure 9.5. Filtering prior to import.

    image

    The ETLLoadID, LoadDate, and UpdateDate columns are fields that don’t provide analytic value. Clear the check box in front of these fields so that they aren’t imported. Click OK to apply the filters to the DimGeography table.

  4. Back on the Select Tables and Views dialog, notice that DimGeography now lists Applied Filters in the Filter Details column. This makes it easy to see which dimensions have filtering in place. You can also click the Preview and Filter button again to further modify the filtering options if necessary.
  5. Select the check box next to the following tables to include them in the data model:

    • DimDate

    • DimProduct

    • DimProductCategory

    DimProductSubCategory

    • DimStore

    • FactSales

    Then click Finish.

    The data import begins and a status box displays the status of each of the tables, as shown in Figure 9.6. Notice that PowerPivot can work with large datasets; more than three million records were imported from the fact table, and done quickly!

    Figure 9.6. Import results.

    image

The final step in the import process is data preparation, and this is where PowerPivot builds the necessary relationships to tie the tables together (see Figure 9.6). How does it know how to do this? Well, if the data source is a relational database, PowerPivot analyzes the schema of the source database and looks for foreign key relationships in the schema. If the relationships area not in the source data, PowerPivot provides ways to create them, as discussed later.

Importing from a Flat File

  1. As stated in the scenario, you want to compare the Contoso sales data with average income data that you received from a third party. This data lists the average annual income by state and is in a CSV file. To import this text file into PowerPivot, select From Text from the Home tab of the PowerPivot window toolbar.
  2. Enter a friendly name for this connection and the file path and name for the file (or browse and locate the file). At this point, you will see data in the preview window.
  3. Select whether the first row contains the column headers, select the columns that are to be imported into PowerPivot, and clear the check box for columns that should not be imported (see Figure 9.7). When done, click Finish. The new data appears in PowerPivot as another table.

    Figure 9.7. Flat file import.

    image

Preparing the Data

Now that the data has been imported, we need to do some additional work to get it ready for use.

Creating Relationships Between the Tables

A relationship needs to be created between the imported Income data and the Sales data. Because this is Median Income by State, we relate it to the DimGeography table joined on the state name:

  1. Click the Design tab above the toolbar. This is where we can modify the data that has been imported into PowerPivot.

    Select Manage Relationships (as shown in Figure 9.8), and a dialog box appears displaying the existing relationships. Click the Create button to launch the Create Relationship dialog.

    Figure 9.8. Manage relationships.

    image

  2. Select State Median Income for the Table, and States for the Column. Then select DimGeography for the Related Lookup Table and the StateProvinceName for the Related Lookup Column as shown in Figure 9.9.

    Figure 9.9. Create a relationship.

    image

  3. You will notice that PowerPivot immediately puts an information bullet next to the Related Lookup column. This is because relationships need to be defined in a one-to-many direction, and currently this is defined in the reverse direction. However, PowerPivot automatically corrects this issue when the Create button is clicked. Click the Create button to create this relationship. The new relationship is now listed along with the others in the Manage Relationships dialog.

Adding Calculated Fields

We could use the Median Income values as they are, but it is more useful from an analysis standpoint to have these grouped into income categories. PowerPivot includes a rich expression language known as DAX, short for Data Analysis Expressions, which are similar to Excel functions and can be used to add calculated fields to the data model. To provide income groupings, we use a series of IF statements.

  1. Select the State Median Income tab to select that table.
  2. Click the Add button on the toolbar as shown in Figure 9.10.

    Figure 9.10. Adding a calculated column.

    image

  3. Enter the following DAX expression into the Function box (fx):

    =IF([2008 MedianIncome] < 40000, "< 40k",
    IF([2008 MedianIncome] < 45000, "40-45k"
    IF([2008 MedianIncome] < 50000, "45-50k",
        IF([2008 MedianIncome] < 55000, "50-55k", "55k +"))))

  4. After you enter the expression, a new column named CalculatedColumn1 is added to the table. Right-click the column name and rename it to IncomeGroup.

    You might notice that the expression doesn’t appear to be working correctly, because all rows are shown with a value of '55k+'. That is because our calculated column is a numeric calculation, but the [2008 MedianIncome] field was imported as a text field. By default, PowerPivot imports flat file fields as strings, but provides the ability to change the data type for any field.

  5. Select the column heading for the 2008 MedianIncome.
  6. Select the Home tab on the toolbar and select Data Type in the Formatting portion of the toolbar. From the drop-down list of options, select Whole Number as shown in Figure 9.11.

    Figure 9.11. Changing data types.

    image

    The Calculated IncomeGroup automatically recalculates and now looks as expected.

  7. To have this table match the naming convention of the others, right-click the tab for the table and change the name from State Median Income to DimIncomeLevel.

Hiding Unwanted Fields

During the import process, PowerPivot provides the opportunity to select which fields will be imported. This is useful for eliminating fields that have no use for analysis, such as the ETL tracking fields.

However, some fields, such as Primary Key fields, are necessary for the table relationships and need to be imported, but ideally these shouldn’t be exposed to the end users. PowerPivot provides an opportunity to hide those fields and any other unnecessary fields, as well:

  1. From the PowerPivot window, select the FactSales table.
  2. Select the Design toolbar, and then select Hide and Unhide from the toolbar.
  3. PowerPivot provides the option to hide or show each field in the PowerPivot window as well as in any PivotTables or PivotCharts that would be created (see Figure 9.12). In general, hide internal-only fields from both PowerPivot and PivotTables, and display Key fields in PowerPivot window for design purposes but hide them from PivotTables so that they are not exposed to end users.

    Figure 9.12. Hiding and unhiding columns.

    image

  4. Move through each of the tables and hide/unhide columns as necessary.
  5. At this point, the data has been imported and prepared. Save the file.

Presenting the Data

As mentioned earlier, the PowerPivot window can be thought of as a data design surface, with the original Excel worksheet then as the presentation design surface. PowerPivot provides several layout options for creating PivotTables, PivotCharts, or combinations of the two:

  1. From the PowerPivot window, select PivotTable from the toolbar and PivotTable from the drop-down list as shown in Figure 9.13.

    Figure 9.13. Creating a PivotTable from the PowerPivot window.

    image

  2. Select New Worksheet. If you are familiar with creating PivotTables, this process will be familiar; however some significant differences exist, including the following (see Figure 9.14):

    • The field list on the right side of the screen is not the standard PivotTable field list. This is actually a PowerPivot field list.

    • All the tables are listed the same way. There is no distinction in PowerPivot between fact tables and dimension tables as there is when using Excel against an Analysis Services OLAP cube.

    • PowerPivot introduces the idea of slicers, which provide enhanced options for filtering and interacting with the data.

    Figure 9.14. PowerPivot field list.

    image

  3. Expand DimIncomeLevel and drag IncomeGroup to the Row Labels pane.
  4. Expand FactSales and drag SalesAmount to the Values pane. You now have a basic PivotTable of Sales by IncomeGroup as shown in Figure 9.15.

    Figure 9.15. Initial PivotTable.

    image

    From the PivotTable, select the drop-down next to Row Labels and clear the check mark next to the blank Income Group and click OK. Notice that the drop-down icon has now changed from an arrow to a funnel to indicate that a filter is in effect.

  5. Expand FactSales, and again drag SalesAmount to the Values pane. SalesAmount will now be reflected twice in the pane. Right-click the second SumOfSalesAmount entry, select Summarize By, and change the aggregation from Sum to Average. Now there is both the Total Sales and Average Sales by income group in the PivotTable. Highlight the cells under Sum of Sales Amount and Average of Sales Amount and format as currency.
  6. Expand DimDate and drag CalendarYear to the Slicers Vertical pane. A slicer pane is now added to the worksheet. Slicers provide an intuitive way for users to filter the results; a slicer shows values that have data in black text and values that don’t have data in gray text. At first glance, it might seem as though the values in the slicer are not ordered, but they are. However, the values with data come first, followed by the values without data.
  7. Expand DimProductCategory and drag ProductCategoryName to the Slicers Vertical pane (see Figure 9.16). Click through the various product categories and notice that the average sale amount is definitely higher for the higher income levels.

    Figure 9.16. PivotTable with slicers.

    image

We now have a tool for analyzing sales by income level or any of the other attributes in our data model.

Summary

PowerPivot is one of the more exciting additions to the Microsoft BI suite. It is a tool that enables users to process literally millions of rows of data extremely quickly, allows creation of relationships between different sets of data, and leverages the familiar user interface of Microsoft Excel.

In this chapter, we used PowerPivot to combine more than three million records of sales data with third-party Median Income data, and produced a flexible and user-friendly analysis tool to analyze average sales by income level.

Best Practices

The following are best practices from this chapter:

• Filter off fields that will not have any analytic value during import (ETL fields, for example).

• Include fields that might have analytic value during import, and then hide them if they are not immediately necessary, because it is easy to just un-hide them if the need arises.

When trying to lighten the data volume, do not bother filtering small dimension tables; they likely have only a few hundred records. Instead, try to identify ways to filter the fact data if possible. For example, if the analysis will only be for 2008-2009, but the data source provides data from 2005 forward, filter off the fact data prior to 2008 to significantly minimize the data volume that PowerPivot needs to manage.

• PowerPivot is great for a variety of end-user situations, but if Parent/Child dimensions, Role-Playing dimensions, or drill-down hierarchies are required, Analysis Services is the better option. In these cases, PowerPivot might be useful for business analysts as a rapid prototyping tool that could aid in the design of a full-fledged Analysis Services cube.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset