Chapter 12
In This Chapter
Making a custom function
Using custom functions in other queries
Creating a parameter query
In Chapter 8, you see that Power Query records all actions using its own formula language (known as the M language). When you connect to a data source and apply transformations to that data, Power Query diligently saves your actions as M code behind the scenes in query steps. The transformation steps can then be repeated when you refresh the data in your query.
That backstage coding is relatively transparent, and can, for the most part, be ignored for most data processing activities. In this chapter, I show you how to leverage the M language to extend the capabilities of Power Query to create your own custom functions and perform truly heroic data processing.
When building a custom function for Power Query, you’re essentially doing nothing more than creating a query and manipulating its M code to return a desired result. That result can be an array, a data table, or a single value.
To help you gain a sense of the general steps taken to create a custom function, I show you now how to build a basic mathematical function that calculates profit. This function should be able to take a revenue amount and a cost amount and output a profit amount using this basic mathematical operation:
Revenue – Cost = Profit
For basic functions such as this one, you can start with a blank query and simply enter the needed M code from scratch. Follow these steps:
Click the Data tab in Excel and select New Query ⇒ From Other Data Sources ⇒ Blank Query.
This step activates the Query Editor window.
let Profit = (Revenue, Cost)=>
Revenue-Cost
in Profit
Figure 12-1 illustrates what the code looks like in the Advanced Editor window.
At this point, you can select the Home tab of the Query Editor and click the Close & Load button.
As you can see in Figure 12-2, Power Query adds the query to the Workbook Queries pane as a connection-only query. Queries recognized as functions are automatically saved as connection-only.
You can now use this function in other queries that contain revenue and cost fields. For example, Figure 12-3 illustrates the contents of the Chapter 12 Sample text file, which you can find in the download files for this book.
This text file contains a table of invoices with the fields Qty, UnitCost, and UnitPrice. Your newly created function can be used to calculate profit using these fields.
To create a new query from this text file, follow these steps:
Click the Data tab in Excel and select New Query ⇒ From File ⇒ From Text.
This step opens the Import Data dialog box.
Browse for, and select, the Chapter_12 Sample text file.
Power Query opens the text file and opens the Query Editor.
While in the Query Editor, click the Add Column tab and then click the Add Custom Column button, as shown in Figure 12-4.
The Add Custom Column dialog box opens, and you can call the custom function and pass it the needed parameters.
= FunctionProfit([UnitPrice], [UnitCost])*[Quantity]
This syntax calls the FunctionProfit custom function and passes the UnitPrice and UnitCost fields as the required parameters. The results are then multiplied by the Quantity field. The Add Custom Column dialog box should look similar to the one shown in Figure 12-5.
Click the OK button to apply the custom column.
When you confirm the changes, Power Query triggers the function for each row in the data table.
Although this example is quite basic, it demonstrates that you can define a function that requires parameters and then use the function in other queries. This simple technique is the foundation for creating more useful functions.
When building a basic function, such as the profit function you create in the earlier section “Creating and Using a Basic Custom Function,” it’s no big deal to start from a blank query and enter all the code from scratch. But for more complex functions, it’s generally smarter to build a starter query via Query Editor and then manipulate the M code to accomplish what you need.
Imagine that you have a set of Excel files in a folder (see Figure 12-6). These files all contain a worksheet named MySheet that holds a table of data. The tables in each file have the same structure, but need to be combined into one file. This is a common task/nightmare that most Excel analysts have faced at one time or another. If you don’t have a solid knowledge of Excel VBA programming, this task typically entails opening each file, copying the data on the MySheet tab, and then pasting the data into a single workbook.
Power Query has the ability to make short work of this task, but it requires a bit of direction via a custom function. Now, it would be difficult for most anyone to start from a blank query and type out the M code for the relatively complex function needed for this endeavor. Instead, you could build a starter query via Query Editor and then wrap the query in a function.
To help you understand this concept, I present the following steps:
Use the Query Editor to apply some basic transformation actions.
For example, in the Applied Steps shown in Figure 12-8, you see that First Row step was promoted to column headers and a few unneeded columns were removed.
When all the needed transformations are complete, open the Advanced Editor window by clicking the View tab and selecting the Advanced Editor command.
Figure 12-9 demonstrates that as you build out the starter template, Power Query diligently creates the bulk of the code for your function. Note in the portion of the code that’s highlighted in gray (for illustration), Power Query has hard-coded the file path and filename of the Excel file that was originally selected. The idea is to wrap this starter code in a function that passes a dynamic file path and filename.
Wrap the entire block of code with function tags, specifying that this function requires two parameters: FilePath and FileName. Also replace the hard-coded file path and filename with each respective parameter.
Here’s the syntax shown in Figure 12-10:
let GetMyFiles=(FilePath, FileName) =>
let
Source = Excel.Workbook(File.Contents(FilePath&FileName)),
MySheet1 = Source{[Name="MySheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(MySheet1),
#"Removed Columns" =
Table.RemoveColumns(#"Promoted Headers",{"Branch_Number", "Effective_Date"})
in
#"Removed Columns"
in GetMyFiles"
Click the Home tab of the Query Editor and click the Close & Load button.
At this point, the custom function is ready to be used on all files in the target folder.
In the From Folder dialog box, provide Power Query with the file path of the target folder.
The Query Editor window activates to show you a table similar to the one shown in Figure 12-11. This table contains a record for each file in the chosen folder. The columns you’re interested in are Folder Path and Name, which provide the function with the needed FilePath and FileName parameters.
Click the Add Column tab, and then click the Add Custom Column command.
The Add Custom Column dialog box opens.
Invoke the function and pass the Folder Path and Name fields as parameters separated by commas (see Figure 12-12).
When you confirm your changes, Power Query triggers the function for each row in the data table. The function itself grabs the data from each file and returns a table array. Figure 12-13 shows the newly created custom column with a returned table array for each file (specified by a green Table hyperlink).
Click the Expand icon for your new custom column.
You see a list of fields included in each table array, as shown in Figure 12-14.
Choose which fields in the table array to show, click the Expand radio button, and then click the OK button.
With each table array expanded, Power Query exposes the columns pulled from each Excel file and adds the detailed records to the data preview. Figure 12-15 illustrates the data preview for the final combined table.
As you look at the final combined view, don’t lose track of the fact that this relatively complex task was facilitated by a simple custom function. For all the steps required to accomplish this task, you expend very little effort on creating the code for the function. Power Query writes the code for the core functionality, and you simply wrap that code into a function.
The takeaway here is that you don’t have to be an expert on Power Query’s M language to pull together effective and useful custom functions. You can leverage the Query Editor to create some base code and then adjust from there.
A parameter query is a kind of query that relies on one or more parameters to run. Although that sounds suspiciously like the custom functions covered earlier in this chapter (after all, they ran on parameters), there is a subtle difference.
A parameter query is one where you provide the parameters. So rather than have the parameters come from a predefined query, you enter the parameters. This comes in handy when creating interactive reporting for others to consume.
In this section, I walk you through creating your first parameter query.
To create a proper parameter query, you first have to understand the parameters necessary to make your reporting interactive. The best way to gain this understanding is to explore the target data source.
In this scenario, I’ll tell you how to build an interactive view of the top-grossing films for any given year and month. To accomplish this task, leverage the Box Office Mojo website. Box Office Mojo provides an array of box office reporting tools, including a monthly index of top-grossing films.
The URL for the monthly index includes a yr parameter and a month parameter. Enter this URL into any browser and you see a list of the top-grossing films of December 2015:
www.boxofficemojo.com/monthly/?yr=2015&month=12
A look at the website (shown in Figure 12-16) confirms that the URL opens a web page that contains the table you would expect to see: an index of movies for December 2015 box office. The parameters in the URL are working as expected.
Now that you know the year and month number are the parameters, you can get started.
The best place to start is to create the base query. The base query is essentially the one that will pull the data you’re working toward. In this scenario, you create a query that pulls the table shown in Figure 12-16 from the Box Office Mojo website.
Follow these steps:
Use the Query Editor to rename columns and apply any transformations that are needed to clean the web data.
Figure 12-17 illustrates a clean table that makes up the base query.
Wrap the entire block of code with function tags, specifying that this function requires two parameters: YearNum and MonthNum. Also replace the hard-coded year and month in the URL with each respective parameter.
Here’s the final syntax shown in Figure 12-18:
let TopMovies=(YearNum, MonthNum) =>
let
Source = Web.Page(Web.Contents(
"http://www.boxofficemojo.com/monthly/?yr=" & Number.ToText(YearNum) & "&month=" & Number.ToText(MonthNum))),
Data0 = Source{0}[Data],
#"Renamed Columns" =
Table.RenameColumns(Data0,{
{"Movie Title (click to view)", "Movie Title"},
{"Total Gross /Theaters", "Total Gross"},
{"Total Gross /Theaters2", "Theaters"}}),
#"Removed Columns" =
Table.RemoveColumns(#"Renamed Columns",
{"Opening /Theaters",
"Opening /Theaters2", "Open", "Close"})
in
#"Removed Columns"
in
TopMovies
You now have a fnGetTopMovies function, which can be used to pull web data from a custom function, and it’s ready to be used on all files in the target folder.
The final step is to create the parameter query. To do so, you need a simple table that will serve as the feeder for your dynamic parameters.
Staying in the same workbook where you created fnGetTopMovies, create a table similar to the one shown in Figure 12-19.
From here, follow these steps:
Place the cursor in the parameter table, and then select Data ⇒ New Query ⇒ From Table.
The Create Table dialog box opens.
Click OK to continue.
The Query Editor opens with the parameter table.
In the Add Custom Column dialog box, invoke the fnGetTopMovies function, passing the year and month fields as parameters (see Figure 12-20).
Because you’re mixing data from the web with data from Excel (though the parameter table can hardly be considered data), Power Query initiates a few data-privacy precautionary measures.
Click Continue.
The Privacy Levels dialog box opens, as shown in Figure 12-21.
Select Public for both the Current Workbook option and the website. Click the Save button to confirm and save the privacy levels.
Power Query, at this point, imports data from the website based on the year and month in the parameter table.
The data imports as a table array, so click the green Table hyperlink.
Alternatively, you can click the Expand icon.
Now that you’re basically done, it’s time to think about where the query should be loaded. If you simply click the Close & Load button, Power Query outputs the final parameter query in its own worksheet. However, it would be more practical to have the parameter table and query results on the same worksheet. This way, you can edit the parameters and see the results without having to flip between worksheets.
Figure 12-24 illustrates the final parameter query. Take a moment to think about what’s happening here. With this parameter query, you enter a year and a month and click Refresh (or press Ctrl+Alt+F5). Power Query then dynamically imports data back from the Internet based on the parameters you entered — all without your having to enter more than three lines of M language syntax. Truly amazing.