Chapter 8

Introducing Power Query

In This Chapter

arrow Installing and activating Power Query

arrow Spelling out the Power Query basics

arrow Understanding Query steps

arrow Managing existing queries

arrow Overviewing query actions

In information management, the term ETL (Extract, Transform, Load) refers to the three separate functions typically required to integrate disparate data sources: extract, transform, and load. The extraction function refers to the reading of data from a specified source and extracting a desired subset of data. The transformation function refers to the cleaning, shaping, and aggregating of data to convert it to the desired structure. The loading function refers to the actual importing or writing of the resulting data to a target location.

Excel analysts have been manually performing ETL processes for years — although they rarely call it ETL. Every day, millions of Excel users manually pull data from a source location, manipulate that data, and integrate it into their reporting. This process requires lots of manual effort.

Power Query enhances the ETL experience by offering an intuitive mechanism to extract data from a wide variety of sources, perform complex transformations on that data, and then load the data into a workbook or the Internal Data Model.

In this chapter, you explore the basics of the Power Query Add-in. You also get a glimpse of how it can help you save time and automate the steps needed to ensure that clean data is imported into your reporting models.

Installing and Activating a Power Query Add-In

In Excel 2016, Power Query isn’t an add-in — it’s a native feature of Excel, just like charts and pivot tables are native features. If you’re working with Excel 2016, you don’t have to install any additional components. You’ll find Power Query in Excel 2016 hidden on the Data tab, in the Get & Transform group (see Figure 8-1).

image

Figure 8-1: In Excel 2016, the Power Query commands are found in the Get & Transform group on the Data tab.

If you’re working with Excel 2010 or Excel 2013, you need to explicitly download and install the Power Query add-in. As of this writing, the Power Query add-in is available to you only if you have one of these editions of Office or Excel:

  • Office 2010 Professional Plus: Available for purchase through any retailer
  • Office 2013 Professional Plus: Available through volume licensing only
  • Office 365 Pro Plus: Available with an ongoing subscription to Office365.com
  • Excel 2013 Stand-alone Edition: Available for purchase through any retailer

If you have any of these editions, you can install and activate the Power Query add-in. Simply enter the search term Excel Power Query add-in into your favorite search engine to find the free installation package.

Note that Microsoft offers Power Query for both Excel 2010 and Excel 2013 in both 32- and 64-bit platforms. Be sure to download the version that matches your version of Excel as well as the platform on which your PC is running.

After the add-in is installed, activate it by following these steps:

  1. Open Excel and look for the Power Query command on the Insert tab (see Figure 8-2).

    If you see it, the Power Query add-in is already activated. You can skip the remaining steps.

  2. From the Excel Ribbon, choose File ⇒ Options.
  3. Choose the Add-Ins option on the left, and then look for the Manage drop-down list at the bottom of the dialog box. Select COM Add-Ins and then click Go.
  4. Look for Power Query for Excel in the list of available COM add-ins. Select the check box next to each one of these options and click OK.
  5. Close and restart Excel.
image

Figure 8-2: In Excel 2010 and 2013, the Power Query add-in is exposed via its own tab on the Ribbon.

A successful install results in a new Power Query tab on the Excel Ribbon.

Power Query Basics

In this section, I walk you through a simple example of using Power Query. Imagine that you need to import Microsoft Corporation stock prices from the past 30 days by using Yahoo! Finance. For this scenario, you need to perform a web query to pull the data you need from Yahoo! Finance.

Starting the query

To start the query, follow these steps:

  1. If you have Excel 2016, select the New Query command on the Data tab, and then select From Other Sources ⇒ From Web (see Figure 8-3).

    If you’re working with Excel 2010 or Excel 2013, click the Power Query tab and select the From Web command.

  2. In the dialog box that appears, enter the URL for the data you need, as shown in Figure 8-4.

    In this example, you type http://finance.yahoo.com/q/hp?s=MSFT.

    After a bit of gyrating, the Navigator pane shown in Figure 8-5 appears. You can select the data source that you want to extract. Click on each table to see a preview of the data.

  3. In this case, Table 4 holds the historical stock data you need, so click Table 4 in the list box on the left and then click the Edit button.

    You may have noticed that the Navigator pane, shown in Figure 8-5, offers a Load button (next to the Edit button). You can use this button to skip any editing and import your targeted data as is. If you’re sure that you won’t need to transform or shape your data in any way, click the Load button to import the data directly into the data model or a spreadsheet in your workbook.

    warning Excel has another From Web command button, on the Data tab in the Get External Data group. This unfortunate duplicate command is the legacy web-scraping capability found in all Excel versions since Excel 2000.

    The Power Query version of the From Web command (choose New Query ⇒ From Other Sources ⇒ From Web) goes beyond simple web scraping. Power Query can pull data from advanced web pages and then manipulate it. Make sure you’re using the correct feature when pulling data from the web.

    When you click the Edit button, Power Query activates a new Query Editor window, which contains its own Ribbon and a preview pane that shows a preview of the data (see Figure 8-6). You can apply certain actions to shape, clean, and transform the data before importing.

    The idea is to work with each column shown in the Query Editor, applying the necessary actions that will give you the data and structure you need. You can dive deeper into column actions later in this chapter. For now, continue toward the goal of getting the last 30 days of stock prices for Microsoft Corporation.

  4. Right-click the Date column to see the available column actions, as shown in Figure 8-7. Select Change Type and then Date to ensure that the Date field is formatted as a proper date.
  5. Remove all unnecessary columns by right-clicking each one and selecting Remove.

    (Besides the Date field, the only other columns you need are the High, Low, and Close fields.)

    Alternatively, you can hold down the Ctrl key on the keyboard, select the columns you want to keep, right-click any selected column, and then choose Remove Other Columns (see Figure 8-8).

  6. To ensure that the High, Low, and Close fields are formatted as proper numbers, hold down the Ctrl key on the keyboard, select the three columns, and right-click and choose Change Type ⇒ Decimal Number.

    After you do this, you may notice that some of the rows show the word Error. These are rows that contain text values that could not be converted.

  7. Remove the error rows by selecting Remove Errors from the Table Actions list (next to the Date field), as shown in Figure 8-9.
  8. After all errors are removed, add a Week Of field that displays which week each date in the table belongs to.

    Here’s how to do this:

    1. Right-click the Date field and select the Duplicate Column option.

      A new column is added to the preview.

    2. Right-click the newly added column, select the Rename option, and then rename the column Week Of.
  9. Select the Transform tab on the Power Query Ribbon, look to the right to find the Date & Time Column group, and then choose Date ⇒ Week ⇒ Start of the Week, as shown in Figure 8-10.

    Excel transforms the date to display the start of the week for a given date.

  10. When you’ve finished configuring your Power Query feed, save and output the results.

    To do this, click the Close & Load drop-down menu on the Home tab of the Power Query Ribbon to reveal the two options shown in Figure 8-11:

    • Close & Load: Saves your query and outputs the results as an Excel table to a new worksheet in your workbook.
    • Close & Load To: Opens the Load To dialog box, where you can choose to output the results to a specific worksheet or to the Data Model. Alternatively, you can choose to save the query only as a query connection, and then you can use the query in various in-memory processes without needing to output the results.
image

Figure 8-3: Starting a Power Query web query.

image

Figure 8-4: Enter the target URL containing the data you need.

image

Figure 8-5: Select the correct data source and then click the Edit button.

image

Figure 8-6: The Query Editor window allows you to shape, clean, and transform data.

image

Figure 8-7: Right-click the Date column and choose to change the data type to a date format.

image

Figure 8-8: Select unneeded columns, and then select Remove Other Columns to get rid of them.

image

Figure 8-9: You can click the Table Actions icon to select actions (such as Remove Errors) that you want applied to the entire data table.

image

Figure 8-10: The Power Query Ribbon can be used to apply transformation actions such as displaying the start of the week for a given date.

image

Figure 8-11: The Load To dialog box gives you more control over how the results of queries are used.

At this point, you have a table similar to the one shown in Figure 8-12, which can be used to produce the pivot table you need.

image

Figure 8-12: Your final query pulled from the Internet: transformed, put into an Excel table, and ready to use in a pivot table.

Take a moment to appreciate what Power Query allowed you to do just now. With a few clicks, you searched the Internet, found some base data, shaped the data to keep only the columns you needed, and even manipulated that data to add an extra Week Of dimension to the base data. This is what Power Query is about: enabling you to easily extract, filter, and reshape data without the need for any programmatic coding skills.

Understanding query steps

Power Query uses its own formula language (known as the “M” language) to codify your queries. As with macro recording, each action you take when working with Power Query results in a line of code being written into a query step. Query steps are embedded M code that allow your actions to be repeated each time you refresh your Power Query data.

You can see the query steps for your queries by activating the Query Settings pane. While in the Query Editor window, you choose View ⇒ Query Settings. You can also place a check mark in the Formula Bar option to enhance your analysis of each step with a formula bar that displays the syntax for the given step.

The Query Settings pane appears to the right of the preview pane, as shown in Figure 8-13. The formula bar is located directly above the preview pane.

image

Figure 8-13: You can view and manage query steps in the Applied Steps section of the Query Settings pane.

Each query step represents an action you took to get to a data table. You can click on any step to see the underlying M code in the Power Query formula bar. For example, clicking the step called Removed Errors reveals the code for that step in the formula bar.

remember When you click on a query step, the data shown in the preview pane shows you what the data looked like up to and including the step you clicked. For example, in Figure 8-13, clicking the step before the Removed Other Columns step lets you see what the data looked like before you removed the non-essential columns.

You can right-click on any step to see a menu of options for managing your query steps. Figure 8-14 illustrates the following options:

  • Edit Settings: Edit the arguments or parameters that defines the selected step.
  • Rename: Give the selected step a meaningful name.
  • Delete: Remove the selected step. Be aware that removing a step can cause errors if subsequent steps depend on the deleted step.
  • Delete Until End: Remove the selected step and all following steps.
  • Move Up: Move the selected step up in the order of steps.
  • Move Down: Move the selected step down in the order of steps.
  • Extract Previous: Create a new query using the steps prior to the selected step. This feature is covered in Chapter 11.
image

Figure 8-14: Right-click on any query step to edit, rename, delete, or move the step.

Refreshing Power Query data

Power Query data is in no way connected to the source data used to extract it. A Power Query data table is merely a snapshot. In other words, as the source data changes, Power Query doesn’t automatically keep up with the changes; you need to intentionally refresh your query.

If you chose to load your Power Query results to an Excel table in the existing workbook, you can manually refresh by right-clicking on the table and selecting the Refresh option.

If you chose to load your Power Query data to the Internal Data Model, you need to open the Power Pivot window, select your Power Query data, and then click the Refresh command on the Home tab of the Power Query window.

To get a bit more automated with the refreshing of queries, you can configure your data sources to automatically refresh the Power Query data. To do so, follow these steps:

  1. From the Data tab on the Excel Ribbon, select the Connections command.

    The Workbook Connections dialog box appears.

  2. Select the Power Query data connection you want to refresh and then click the Properties button.

    The Properties dialog box opens.

  3. Select the Usage tab.
  4. Set the options to refresh the chosen data connection:
    • Refresh Every X Minutes: Tells Excel to automatically refresh the chosen data every specified number of minutes. Excel refreshes all tables associated with that connection.
    • Refresh Data When Opening the File: Tells Excel to automatically refresh the chosen data connection after opening the workbook. Excel refreshes all tables associated with that connection as soon as the workbook is opened.

These refresh options are useful when you want to ensure that your customers are working with the latest data. Of course, setting these options does not preclude the ability the manually refresh the data using the Refresh command on the Home tab.

Managing existing queries

As you add various queries to a workbook, you need a way to manage them. Excel accommodates this need by offering the Workbook Queries pane, which enables you to edit, duplicate, refresh, and generally manage all existing queries in the workbook. Open the Workbook Queries pane by selecting the Show Queries command on the Data tab of the Excel ribbon.

You need to find the query you want to work with and then right-click it to take any one of the actions described in the following list (see Figure 8-15):

  • Edit: Open the Query Editor, where you can modify the query steps.
  • Delete: Delete the selected query.
  • Refresh: Refresh the data in the selected query.
  • Load To: Activate the Load To dialog box, where you can redefine where the selected query’s results are used.
  • Duplicate: Create a copy of the query.
  • Reference: Create a new query that references the output of the original query.
  • Merge: Merge the selected query with another query in the workbook by matching specified columns.
  • Append: Append the results of another query in the workbook to the selected query.
  • Send to Data Catalog: Publish and share the selected query via a Microsoft Power BI server that your IT department sets up and manages.
  • Move to Group: Move the selected query into a logical group that you create for better organization.
  • Move Up: Move the selected query up in the Workbook Queries pane.
  • Move Down: Move the selected query down in the Workbook Queries pane.
  • Show the Peek: Show a preview of the query results for the selected query.
  • Properties: Rename the query and add a friendly description.
image

Figure 8-15: Right-click any query in the Workbook Queries pane to see the available management options.

The Workbook Queries pane is especially useful when your workbook contains several queries. Think of it as a kind of table of contents that allows you to easily find and interact with the queries in your workbook.

Understanding Column-Level Actions

Right-clicking a column in the Query Editor opens a context menu that shows a full list of the actions you can take. You can also apply certain actions to multiple columns at one time by selecting two or more columns before right-clicking. Figure 8-16 shows the available column-level actions, and Table 8-1 describes their purpose, as well as a few other actions that are available only on the Query Editor Ribbon.

image

Figure 8-16: Right-click any column to see the column-level actions you can use to transform the data.

Table 8-1 Column-Level Actions

Action

Purpose

Available with Multiple Columns?

Remove

Remove the selected column from the Power Query data.

Yes

Remove Other Columns

Remove all non-selected columns from the Power Query data.

Yes

Duplicate Column

Create a duplicate of the selected column as a new column placed on the far right end of the table. The name given to the new column is Copy of X, where X is the name of the original column.

No

Remove Duplicates

Remove all rows from the selected column where the values duplicate earlier values. The row with the first occurrence of a value isn’t removed.

Yes

Remove Errors

Remove rows containing errors in the selected column.

Yes

Change Type

Change the data type of the selected column to any of these types: Binary, Date, Date/Time, Date/Time/Timezone, Duration, Logical, Number, Text, Time, or Using Locale (which localizes data types to the country you specify).

Yes

Transform

Change the way values in the column are rendered. You can choose from the following options: Lowercase, Uppercase, Capitalize Each Word, Trim, Clean, Length, JSON, and XML. If the values in the column are date/time values, the options are Date, Time, Day, Month, Year, or Day of Week. If the values in the column are number values, the options are Round, Absolute Value, Factorial, Base-10 Logarithm, Natural Logarithm, Power, and Square Root.

Yes

Replace Values

Replace one value in the selected column with another specified value.

Yes

Replace Errors

Replace unsightly error values with your own, friendlier text.

Yes

Group By

Aggregate data by row values. For example, you can group by state and either count the number of cities in each state or sum the population of each state.

Yes

Fill

Fill empty cells in the column with the value of the first non-empty cell. You have the option to fill up or fill down.

Yes

Unpivot Columns

Transpose the selected columns from column-oriented to row-oriented or vice versa.

Yes

Rename

Rename the selected column to a name you specify.

No

Move

Move the selected column to a different location in the table. You have these choices for moving the column: Left, Right, To Beginning, and To End.

Yes

Drill Down

Navigate to the contents of the column. This option is used with tables that contain metadata representing embedded information.

No

Add as New Query

Create a new query with the content of the column, by referencing the original query in the new one. The name of the new query is the same as the column header of the selected column.

No

Split Column (Ribbon only)

Split the value of a single column into two or more columns, based on a number of characters or a given delimiter, such as a comma, semicolon, or tab.

No

Merge Column (Ribbon only)

Merge the values of two or more columns into a single column that contains a specified delimiter, such as a comma, semicolon, or tab.

Yes

tip All column-level actions available in Power Query are also available on the Query Editor Ribbon, so you can either choose the convenience of right-clicking to quickly select an action or use the more visual Ribbon menu. A few useful column-level actions are found only on the Ribbon, as described in Table 8-1.

Understanding Table Actions

While you’re in the Query Editor, Power Query lets you apply certain actions to an entire data table. You can see the available table-level actions by clicking the Table Actions icon, shown in Figure 8-17.

image

Figure 8-17: Click the Table Actions icon in the upper-left corner of the Query Editor Preview pane to see the table-level actions you can use to transform the data.

Table 8-2 lists the table-level actions and describes the primary purpose of each one.

Table 8-2 Table-Level Actions

Action

Purpose

Use First Row as Headers

Replace each table header name with the values in the first row of each column.

Add Custom Column

Insert a new column after the last column of the table. The values in the new column are determined by the value or formula you define.

Add Index Column

Insert a new column containing a sequential list of numbers starting from 1, 0, or another specified value you define.

Choose Columns

Choose the columns you want to keep in the query results.

Remove Duplicates

Remove all rows where the values in the selected columns duplicate earlier values. The row with the first occurrence of a value set isn’t removed.

Keep Top Rows

Remove all but the top N number of rows. You specify the number threshold.

Keep Bottom Rows

Remove all but the bottom N number of rows. You specify the number threshold.

Keep Range of Rows

Remove all rows except the ones that fall within a range you specify.

Remove Top Rows

Remove the top N rows from the table.

Remove Bottom Rows

Remove the bottom N rows from the table.

Remove Alternate Rows

Remove alternate rows from the table, starting at the first row to remove and specifying the number of rows to remove and the number of rows to keep.

Remove Errors

Remove rows containing errors in the selected columns.

Merge Queries

Create a new query that merges the current table with another query in the workbook by matching specified columns.

Append Queries

Create a new query that appends the results of another query in the workbook to the current table.

tip All table-level actions available in Power Query are also available on the Query Editor Ribbon, so you can either choose the convenience of right-clicking to quickly select an action or use the more visual Ribbon menu.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset