Recognizing the importance of the business intelligence (BI) revolution and the place Excel holds within it, Microsoft has made substantial investments in improving Excel's BI capabilities, specifically focusing on Excel's self-service BI capabilities and its ability to manage and analyze information better from the increasing number of available data sources.
The key product of this endeavor is Power Pivot. With Power Pivot comes the ability to set up relationships between large, disparate data sources and to merge data sources with hundreds of thousands of rows into one analytical engine within Excel.
Starting with the release of Excel 2013, Microsoft has incorporated Power Pivot directly into Excel. This means the powerful capabilities of Power Pivot are available to you right out of the box!
In this chapter, you'll get an overview of those capabilities, exploring the key features, benefits, and capabilities of Power Pivot.
At its core, Power Pivot is essentially a SQL Server Analysis Services engine made available through an in-memory process that runs directly within Excel. The technical name for this engine is the xVelocity analytics engine. However, in Excel, it's referred to as the internal data model.
Every Excel workbook contains an internal data model, which is a single instance of the Power Pivot in-memory engine. This section outlines how to leverage the Power Pivot internal data model to import and integrate disparate data sources.
The Power Pivot Ribbon interface is available only when you activate it. It's important to note that the Power Pivot add-in does not install with every edition of Office. For example, if you have the Office Home edition, you will not be able to see or activate the Power Pivot add-in; thus, you will not have access to the Power Pivot Ribbon interface.
As of this writing, the Power Pivot Ribbon interface is available to you only if you have one of the following editions of Office or Excel:
Office365.com
If you have any of these editions, you can activate the Power Pivot Ribbon by following these steps:
The first step in using Power Pivot is to fill it with data. You can either import data from external data sources or link to Excel tables in your current workbook. For now, let's start this walk-through by linking three Excel tables to Power Pivot.
Most of the examples in this chapter are available on this book's website at |
In this scenario, we have three data sets in three different worksheets (see Figure 35.2): Customers, Invoice Header, and Invoice Details.
The Customers data set contains basic information like Customer ID, Customer Name, Address, and so forth. The Invoice Header data set contains data that points specific invoices to specific customers. The Invoice Details data set contains the specifics of each invoice.
We want to analyze revenue by customer and month. It's clear that we'll somehow need to join these three tables together before we can do our analysis. In the past, we would have to go through a series of gyrations involving VLOOKUP
s or other clever formulas. But with Power Pivot, we can build these relationships in just a few clicks.
When linking Excel data to Power Pivot, it's a best practice first to convert your Excel data to explicitly named tables. Although not technically necessary, giving your tables friendly names helps track and manage your data in the Power Pivot data model. If you don't convert your data to tables first, Excel will do it for you and give your tables useless names like Table1, Table2, and so on.
Follow these steps to convert each data set into an Excel table:
Once you've converted your data to Excel tables, you're ready to add them to the Power Pivot data model. Follow these steps to add your newly created Excel tables to the data model using the Power Pivot tab:
Although the Power Pivot window looks like Excel, it's actually a separate program altogether. You'll notice that the grid for the Customers table has row numbers but no column references. You'll also notice that you can't edit the data within the table. This data is simply a snapshot of the actual Excel table you imported.
Additionally, if you look at your Windows taskbar at the bottom of your screen, you'll see that Power Pivot has a separate window from Excel. You can switch between Excel and the Power Pivot window by clicking each respective program in the taskbar.
Repeat steps 1 and 2 for your other Excel tables: Invoice Header and Invoice Details. Once you have imported all your Excel tables into the data model, your Power Pivot window will show each data set on its own tab, as shown in Figure 35.6.
At this point in our walk-through, Power Pivot knows that we have three tables in the data model, but it has no idea how these three tables relate to one another. We'll need to connect these tables by defining relationships between the Customers, Invoice Details, and Invoice Header tables. We can do so directly within the Power Pivot window.
You can move the tables in the Diagram view around simply by clicking and dragging their title bars. The idea is to identify the primary index keys in each table and connect them. In this scenario, the Customers table and the Invoice Header table can be connected using the CustomerID field. The Invoice Header and Invoice Details tables can be connected using the InvoiceNumber field.
At this point, your diagram will look similar to Figure 35.9. Notice that Power Pivot shows a line between the tables that you just connected. In database speak, these are referred to as joins.
The joins in Power Pivot are one-to-many joins. This means that when a table is joined to another, one of the tables has unique records with unique index numbers, while the other can have many records where index numbers are duplicated.
Notice that the join lines have arrows pointing from a table to another table. The arrows in these join lines will always point to the table that has the nonduplicated unique index.
In this case, the Customers table contains a unique list of customers, each having its own unique identifier. No CustomerID in that table is duplicated. The Invoice header table has many rows for each CustomerID; each customer can have many invoices.
If you need to go back and edit or delete a relationship between two tables in your data model, you can do so by following these steps:
Once you define the relationships in your Power Pivot data model, it's essentially ready for action. In terms of Power Pivot, action basically means analysis with a PivotTable. In fact, all Power Pivot data is presented through the framework of PivotTables (PivotTables are covered in Chapter 29, “Introducing PivotTables,” and Chapter 30, “Analyzing Data with PivotTables”).
The PivotTable shown in Figure 35.12 contains all the tables in the Power Pivot data model. With this configuration, we essentially have a powerful cross-table analytical engine in the form of a familiar PivotTable. Here you can see that we are calculating the average unit price by customer.
In the days before Power Pivot, this analysis would have been tricky. You would have had to build VLOOKUP
formulas to get from Customers to Invoice Header and then another set of VLOOKUP
s to get from Invoice Header to Invoice Details. After all that formula building, you still would have had to find a way to aggregate the data to average the unit price per customer.
As you'll discover in this section, you're not limited to using only the data that already exists in your Excel workbook. Power Pivot has the ability to reach outside the workbook and import data found in external data sources. Indeed, what makes Power Pivot so powerful is its ability to consolidate data from disparate data sources and build relationships between them. This means you can theoretically create a Power Pivot data model that contains some data from a SQL Server table, some data from a Microsoft Access database, and even data from a one-off text file.
One of the more common data sources used by Excel analysts are relational databases. It's not difficult to find an analyst who frequently uses data from Microsoft Access, SQL Server, or Oracle databases. In this section, we'll walk through the steps for loading data from external database systems.
SQL Server databases are some of the most commonly used for storing enterprise-level data. Most SQL Server databases are managed and maintained by the IT department. To connect to a SQL Server database, you'll have to work with your IT department to obtain read access to the database from which you're trying to pull data.
Once you have access to the database, open Excel and activate the Power Pivot window by selecting Power Pivot ⇨ Manage.
Once activated, select the From Other Sources command button on the Home tab. This will activate the Table Import Wizard dialog box shown in Figure 35.13. Here, select the Microsoft SQL Server option and then click the Next button.
The Table Import Wizard will now ask for all the information it needs to connect to your chosen data source. This includes things such as server address, login credentials, and any other database name. The wizard will show you different fields based on the type of data source you select. The more common fields you'll see when connecting to an external data source are as follows:
Server Name This is the name of the server that contains the database to which you are trying to connect. You will get this from your IT department when they give you access.
Log on to the Server These are your login credentials. Depending on how your IT department gives you access, you will select either Use Windows Authentication or Use SQL Server Authentication. The Use Windows Authentication option essentially means that the server will recognize you by your Windows login. The Use SQL Server Authentication option means that the IT department created a distinct username and password for you. If you select Use SQL Server Authentication, you will need to provide a username and password. Note that you will need at least READ
privileges for the target database to pull the needed data.
Database Name Every SQL Server can contain multiple databases. Enter the name of the database to which you are connecting. You will get this from your IT department when they give you access.
Once you enter all the pertinent information, click the Next button to get to the subsequent screen, as shown in Figure 35.14. Here you have the choice of selecting from a list of tables and views or writing your own custom query using SQL syntax. The latter involves writing your own SQL scripts. In most cases, you will choose the option to select from a list of tables.
The Table Import Wizard reads the database and shows you a list of all available tables and views (see Figure 35.15). The tables will have an icon that looks like a grid, while views will have an icon that looks like a box on top of another box.
The idea is to place a check next to the tables and views that you want to import. The Friendly Name column allows you to enter a new name that will be used to reference the table in Power Pivot.
It's important to remember that importing a table imports all of the columns and records for that table. This can have an impact on the size and performance of your Power Pivot data model. You will often find that you need only a handful of the columns from the tables you import. In these cases, you can use the Preview & Filter button.
Click the table name to highlight it in blue (as shown in Figure 35.15) and then click the Preview & Filter button. The Table Import Wizard will activate the preview screen illustrated in Figure 35.16. In this screen, you will see all the columns available in the table, with a sampling of rows.
Each column header has a check box next to it, indicating that the column will be imported with the table. Removing the check mark tells Power Pivot not to include that column in the data model.
You also have the option of filtering out certain records. Clicking the drop-down arrow illustrated in Figure 35.16 activates a filter menu that allows you to specify criteria to filter out unwanted records. This works just like the standard filtering in Excel.
Once you're done selecting your data and applying any needed filters, you can click the Finish button on the Table Import Wizard to start the import process. The import log shown in Figure 35.17 displays the progress of the import and summarizes the import actions taken after completion.
The final step in loading data from SQL Server is to review and create any needed relationships. Activate the Power Pivot window and click the Diagram View button on the Home tab. Power Pivot will display the diagram screen where you can view and edit relationships as needed (see “Creating relationships between your Power Pivot tables” earlier in this chapter).
Whether your data lives in Microsoft Access, Oracle, dBase, or MySQL, you can load data from virtually any relational database system. As long as you have the appropriate database drivers installed, you have a way to connect Power Pivot to your data.
Activate the Power Pivot window, and click the From Other Sources button on the Home tab. This will activate the same Table Import Wizard as shown previously in Figure 35.13.
Select the appropriate relational database system from the plethora of options. If you need to import data from Oracle, select Oracle. If you need to import data from Sybase, select Sybase.
Connecting to any of these relational systems takes you through roughly the same steps you saw when importing SQL Server data earlier in this chapter. You may see some alternate dialog boxes, however, based on the needs of the database system you select.
Understandably, Microsoft cannot possibly create a named connection option for every database system out there. So, you may not find your database system listed. In this case, simply select the option called Others (OLEDB/ODBC). Selecting this option opens the Table Import Wizard starting with a screen asking you to enter the connection string for your database system.
The term flat file refers to a file that contains some form of tabular data without any sort of structural hierarchy or relationship between records. The most common types of flat files are Excel files and text files. A ton of important data is maintained in flat files. In this section, you'll discover how to import these flat file data sources into the Power Pivot data model.
Earlier in this chapter, you created linked tables by loading Power Pivot with the data contained within the same workbook. Linked tables have a distinct advantage over other types of imported data in that they immediately respond to changes in the source data within the workbook. If you change the data in one of the tables in the workbook, the linked table within the Power Pivot data model automatically changes. The real-time interactivity you get with linked tables is really nice to have.
The drawback to linked tables is that the source data must be kept in the same workbook as the Power Pivot data model. This isn't always possible. You'll encounter plenty of scenarios where you'll need to incorporate Excel data into your analysis but that data lives in another workbook. In these cases, you can use Power Pivot's Table Import Wizard to connect to external Excel files.
Activate the Power Pivot window and click the From Other Sources button on the Home tab. This will activate the Table Import Wizard dialog box. Scroll down and select the Excel File option in Figure 35.18 and then click the Next button.
The Table Import Wizard will now ask for all the information it needs to connect to your target workbook. In this screen, you'll need to provide the following:
Excel File Path Enter the full path of your target Excel workbook. You can use the Browse button to search for and select the workbook from which you want to pull this information.
Use First Row as Column Headers In most cases, your Excel data will have column headers. Be sure to select the check box next to Use First Row as Column Headers to make sure that your column headers are recognized as headers when imported.
Once you enter all the pertinent information, click the Next button to get to the next screen, as shown in Figure 35.19. Here you'll see a list of all the worksheets in the chosen Excel workbook. Place a check mark next to the worksheets you want to import. The Friendly Name column allows you to enter a new name that will be used to reference the table in Power Pivot.
As discussed earlier in this chapter, you can use the Preview & Filter button to filter out unwanted columns and records if needed. Otherwise, continue with the Table Import Wizard to complete the import process.
As always, be sure to review and create relationships to any other tables you've loaded into the Power Pivot data model.
Text files are another type of flat file used to distribute data. These files are commonly outputs from legacy systems and websites. Excel has always been able to import text files. With Power Pivot, you can go further and integrate them with other data sources.
Activate the Power Pivot window, and click the From Other Sources button on the Home tab. This will activate the same Table Import Wizard dialog box illustrated in Figure 35.18. Select the Text File option and then click the Next button.
The Table Import Wizard will ask for all the information it needs to connect to the target text file. In this screen, you'll need to provide the following:
File Path Enter the full path of your target text file. You can use the Browse button to search for and select the file from which you want to pull this information.
Column Separator Select the character used to separate the columns in the text file. Before you can do this, you'll need to know how the columns in your text file are delimited. For instance, a comma-delimited file will have commas separating the columns. A tab-delimited file will have tabs separating the columns. The drop-down list in the Table Import Wizard includes choices for the more common delimiters: Tab, Comma, Semicolon, Space, Colon, and Vertical Bar.
Use First Row as Column Headers If your text file contains header rows, be sure to select the check box next to Use First Row as Column Headers. This ensures that the column headers are recognized as headers when imported.
You'll get an immediate preview of the data in the text file. As with other data sources we've covered here, you can filter out any unwanted columns simply by removing the check mark next to the column names. You can also use the drop-down arrows next to each column to apply any record filters.
Clicking the Finish button will immediately start the import process. Upon completion, the data from your text file will be part of the Power Pivot data model. As always, be sure to review and create relationships to any other tables that you've loaded into Power Pivot.
Power Pivot includes an interesting option for loading data straight from the clipboard, that is, pasting data you've copied from some other place. This option is meant to be used as a one-off technique to get useful information into the Power Pivot data model quickly.
As you consider this option, keep in mind that there is no real data source. It's just you manually copying and pasting. There is no way to refresh the data, and there is no way to trace back where you actually copied the data from.
Imagine that you received a Word document showing a list of branches in a table. You'd like to include this static list of branches in your Power Pivot data model, as shown in Figure 35.20.
You can copy the table and then go to the Power Pivot window and click the Paste command on the Home tab. This activates the Paste Preview dialog box illustrated in Figure 35.21, where you can review what exactly will be pasted.
There aren't that many options here. You can specify the name that will be used to reference the table in Power Pivot, and you can specify whether the first row is a header.
Clicking the OK button will import the pasted data into Power Pivot without a lot of fanfare. At this point, you can adjust the data formatting and create the needed relationships.
When you load data from an external data source into Power Pivot, you essentially create a static snapshot of that data source at the time of creation. Power Pivot uses that static snapshot in its internal data model.
As time goes by, the external data source may change and grow with newly added records. However, Power Pivot is still using its snapshot, so it can't incorporate any of the changes in your data source until you take another snapshot.
The action of updating the Power Pivot data model by taking another snapshot of your data source is called refreshing your data. You can refresh manually, or you can set up an automatic refresh.
On the Home tab of the Power Pivot window, you will see the Refresh command. Clicking the drop-down arrow below it will display two options: Refresh and Refresh All.
Use the Refresh option to refresh the Power Pivot table that's currently active. That is, if you are on the Customers tab in Power Pivot, clicking Refresh will reach out to the external data source and request an update for just the Customers table. This works nicely when you need to refresh strategically only certain data sources.
Use the Refresh All option to refresh all the tables in the Power Pivot data model.
You can configure your data sources to pull the latest data and refresh Power Pivot automatically. Go to the Data tab in the Excel Ribbon, and select the Queries & Connections command. This will activate the Queries & Connections task pane illustrated in Figure 35.22.
Click Connections at the top of the task pane and then double-click the connection with which you want to work.
With the Connection Properties dialog box open (see Figure 35.23), select the Usage tab. Here you'll find the following options:
Refresh data when opening the file Placing a check mark next to this option tells Excel to refresh the chosen data connection automatically upon opening the workbook. This will refresh all tables associated with that connection as soon as the workbook is opened.
Refresh this connection on Refresh All Earlier in this section, you discovered that you can refresh all the connections that feed Power Pivot by using the Refresh All command found on the Power Pivot Home tab. If the data connection in question imports millions of lines of data from an external data source, you may not want to slow down your machine each time you trigger a Refresh All. You can remove the check mark next to the Refresh This Connection on Refresh All option. This essentially tells the connection to ignore the Refresh All command.
There may be instances where you need to edit your source data connection after you've already created it. Unlike refreshing, where you simply take another snapshot of the same data source, editing the source data connection allows you to go back and reconfigure the connection. Here are a few reasons that you'll need to edit a data connection:
In the Power Pivot window, go to the Home tab and click the Existing Connections button. The Existing Connections dialog box shown in Figure 35.24 will open. Your Power Pivot connections will be under the Power Pivot Data Connections subheading. Choose the data connection that needs editing.
Once your target data connection is selected, look to the Edit and Open buttons. The button you click depends on what you need to change.
The Open button Lets you import a new table from the existing connection. This is handy when you inadvertently missed a table during the initial loading of data.