Chapter 9

Power Query Connection Types

In This Chapter

arrow Extracting data from files

arrow Getting data from external databases

arrow Importing from other nonstandard data systems

arrow Changing data source settings

Microsoft has invested a great deal of time and resources in ensuring that Power Query has the ability to connect to a wide array of data sources. Whether you need to pull data from an external website, a text file, a database system, Facebook, or a web service, Power Query can accommodate most, if not all, of your source data needs.

You can see all available connection types by clicking on the New Query drop-down arrow on the Data tab. As Figure 9-1 illustrates, Power Query offers the ability to pull from a wide array of data sources, as described in this list:

  • From File: Pulls data from specified Excel files, text files, CSV files, XML files, or folders
  • From Database: Pulls data from a database such as Microsoft Access, SQL Server, or SQL Server Analysis Services
  • From Azure: Pulls data from Microsoft’s Azure Cloud service
  • From Other Sources: Pulls data from a wide array of Internet, cloud, and other ODBC data sources
image

Figure 9-1: Power Query has the ability to connect to a wide array of text, database, and Internet data sources.

In this chapter, I help you explore the various connection types that can be leveraged to import external data.

Importing Data from Files

Organizational data is often stored in files such as text files, CSV files, and even other Excel workbooks. It’s not uncommon to use these kinds of files as data sources for data analysis. Power Query offers several connection types that enable the importing of data from external files.

remember The files you import don’t necessarily have to be on your own PC. You can import files on network drives as well as in cloud repositories such as Google Drive and Microsoft OneDrive.

Getting data from Excel workbooks

You can import data from other Excel workbooks by selecting Data ⇒ New Query ⇒ From File ⇒ From Workbook from the Excel Ribbon.

Excel opens the Import Data dialog box, shown in Figure 9-2. Use this dialog box to browse for the Excel file you want to work with. Note that you can import any kind of Excel file, including macro-enabled workbooks and template workbooks.

image

Figure 9-2: Browse for the Excel file that contains the data you want imported.

remember Power Query won’t bring in charts, pivot tables, shapes, VBA code, or any other objects that may exist within a workbook. Power Query simply imports the data found in the used cell ranges of the workbook.

After you’ve selected a file, the Navigator pane activates (see Figure 9-3), showing you all the data sources available in the workbook.

image

Figure 9-3: Select the data sources you want to work with, and then click the Load button.

The idea here is to select the data source you want and then either load or edit the data using the buttons at the bottom of the Navigator pane. Click the Load button to skip any editing and import your targeted data as is. Click the Edit button if you want to transform or shape before completing the import.

In terms of Excel workbooks, a data source is either a worksheet or a defined named range. The icons next to each data source let you distinguish which sources are worksheets and which are named ranges. In Figure 9-3, the source named MyNamedRange is a defined named range, and the source named National Parks is a worksheet.

You can import multiple sources at a time by selecting the Select Multiple Items check box and then placing a check mark next to each worksheet and named range that you want imported.

Getting data from CSV and text files

Text files are commonly used to store and distribute data because of their inherent ability to hold many thousands of bytes of data without having an inflated file size. Text files can do this by foregoing all the fancy formatting, leaving only the text.

A comma-separated value (CSV) file is a kind of text file that contains commas to delimit (separate) values into columns of data.

Text files

To import a text file, select Data ⇒ New Query ⇒ From File ⇒ From Text on the Excel Ribbon. Excel opens the Import Data dialog box, where you can browse for, and select, a text file.

warning Excel has another From Text command button on the Data tab, under the Get External Data group. This duplicate command is actually the legacy import capability found in all Excel versions. The Power Query version is much more powerful, allowing you to shape and transform text data before importing. Be sure to use the correct Power Query version of the From Text feature.

Power Query opens the Query Editor to show you the contents of the text file you just imported. As you can see in Figure 9-4, text files are imported as a table with one column containing a row for each line in the text file.

image

Figure 9-4: Text files are brought into the Query Editor, where you can apply your edits and then click the Close & Load command to complete the import.

The idea here is to apply any changes you want to make to the data and then click the Close & Load command on the Home tab to complete the import.

remember Some text files are structured as tab-delimited files. Similar to comma-separated (CSV) files, tab-delimited text files contain tab characters that separate text values into columns of data. Power Query recognizes tab-delimited text files and imports these files into a table that contains a separate column for each tab delimiter.

CSV files

To import a CSV file, go to the Excel Ribbon and select Data ⇒ New Query ⇒ From File ⇒ From CSV. Excel opens the Import Data dialog box, where you can browse for and select your target CSV file.

Power Query opens the Query Editor to show you the contents of the CSV file you just imported. Power Query excels at recognizing the correct delimiters in CSV files and typically does a good job of importing the data correctly.

For example, Row 5 in the sample CSV file illustrated in Figure 9-5 contains the value Johnson, Kimberly. Power Query contains the intelligence to know that the comma in that value is not an actual delimiter. So all the columns are separated correctly.

image

Figure 9-5: CSV files are brought into the Query Editor, where you can apply your edits and then click the Close & Load command to complete the import.

You can use the Query Editor to apply any edits you may need and then click the Close & Load command on the Home tab to import the CSV data.

Getting data from XML files

XML files are a family of text files that contain data wrapped in markup (tags that denote structure and meaning). These tags essentially make XML files machine-readable, which essentially means that any application or web-based solution designed to read XML files can discern the structure and content of the data within.

The markup tags found in XML files are quite robust and make importing XML files a bit tricky. Without getting too geeky, some XML files are attribute-based: They contain simple markup defining a table of columns and rows. Other XML files are element-based: They contain a wide array of complex markup defining intricate hierarchical tables with many levels of data.

Power Query has the built-in intelligence to handle any kinds of XML file just fine, but you often need to dig for the data you want imported.

Depending on the markup within, Power Query starts you off with either the Navigator pane or the Query Editor. Although this can be a bit jarring at first, you can quickly get the gist of drilling into the data you need.

You can start importing an XML file by going to the Excel Ribbon and selecting Data ⇒ New Query ⇒ From File ⇒ From XML. Excel opens the Import Data dialog box, where you can browse for, and select, the target XML file.

tip You can import an XML file from the web by simply entering the URL of the file into the Import Data dialog box. For example, you can enter the following line to get to the FinalXMLOutput file found on datapigtechnologies.com:

www.datapigtechnologies.com/FinalXMLOutput.xml

Attribute-based XML files

If the XML file you’ve pointed to is attribute-based, Power Query opens the Query Editor and shows the highest-level content it finds. You often see a single row of high-level attributes, similar to Figure 9-6.

image

Figure 9-6: Attribute-based XML files often start with one line of high-level values.

You can get to the data within the XML file in one of two ways:

  • Click on the Table hyperlink to drill into the next level of data.
  • Click on the Expand icon to view select the fields found in the next level of data (see Figure 9-7). Simply select the fields you want to see, and then click the OK button.
image

Figure 9-7: Click the Expand icon to view and drill into the next layer of data.

remember Every XML file is different, so don’t anticipate drilling down (expanding to the next level of data, as shown in Figure 9-7) any specific number of times. For some files, you may have to drill down only one level. Other files may require you to drill into several layers before getting to the data you need.

After you drill into the data you need, you can click the Close & Load command on the Home tab to complete the import.

Element-based XML files

If the XML file you’ve pointed to is element-based, Power Query opens the Navigator pane. As you can see in Figure 9-8, you need to drill into a varying number of elements to get to the element that contains the data you’re looking for.

image

Figure 9-8: Element-based XML files are initially displayed in the Navigator pane.

After you find the correct data layer, you can select the data source you want and then use the buttons at the bottom of the Navigator pane to either load or edit the data.

Getting data from folders

Power Query has the ability to use the Windows file system as a data source, enabling you to import a list of folder contents for a specified directory. This comes in handy when you need to create a list of all the files in a particular folder.

From the Excel Ribbon, select Data ⇒ New Query ⇒ From File ⇒ From Folder. The dialog box shown in Figure 9-9 opens, asking you to enter or browse for the folder (directory) you want to use.

image

Figure 9-9: Browse for the target folder.

Power Query opens the Query Editor with a new table containing the contents of the specified folder. As Figure 9-10 illustrates, this new table details the key attributes for each file, such as filename, file extension, date created, and date modified. You can even click the Expand icon in the Attributes field and choose to display some of the more advanced attributes for each file.

image

Figure 9-10: Power Query creates a useful list of files.

After you have all the attributes you need, you can click the Close & Load command on the Home tab to complete the import.

tip The files that are listed include all files contained in subfolders inside the folder you specified. Unfortunately, the resulting output is not hyperlinked back to the actual folder contents. In other words, you can’t open the individual files from the query table.

Importing Data from Database Systems

In smart organizations, the task of data management is not performed by Excel; rather, it’s performed primarily by database systems such as Microsoft Access and SQL Server. Databases like these not only store millions of rows of data, but also ensure data integrity and allow for the rapid search and retrieval of data by way of queries and views.

A connection for every database type

Power Query offers options to connect to a wide array of database types. Microsoft has been keen to add connection types for as many commonly used databases as it can.

Relational and OLAP databases

Choose Data ⇒ New Query ⇒ From Database and you see the list of databases shown in Figure 9-11. Power Query has the ability to connect to virtually any database commonly used today: SQL Server, Microsoft Access, Oracle, MySQL, etc.

image

Figure 9-11: Power Query offers connection types for many of the popular database systems now in use.

Azure databases

If your organization has a Microsoft Azure cloud database or a subscription to Microsoft Azure Marketplace, an entire set of connection types is designed to import data from Azure databases (see Figure 9-12). You can get to these connection types by choosing Data ⇒ New Query ⇒ From Azure.

image

Figure 9-12: Tools for connection to Microsoft Azure cloud database services.

ODBC connections to nonstandard databases

If you’re using a unique, nonstandard database system that isn’t listed under From Database (refer to Figure 9-11) or From Azure (refer to Figure 9-12), not to worry: As long as your database system can be connected to via an ODBC connection string, Power Query can connect to it.

Choose Data ⇒ New Query ⇒ From Other Data Sources to see a list of other connection types. Click the From ODBC option shown in Figure 9-13 to start a connection to your unique database via an ODBC connection string.

image

Figure 9-13: Starting an ODBC connection.

Getting data from other data systems

In addition to ODBC, Figure 9-13 illustrates other kinds of data systems that can be leveraged by Power Query.

Some of these data systems (SharePoint, Dynamics CRM, Salesforce, and Microsoft Exchange) are popular systems that are used in many organizations to store data, track sales opportunities, and manage emails. Other systems, such as OData Feeds and Hadoop, are less-common services used to work with very large volumes of data. These are often mentioned in conversations about big data. And of course, the From Web option (demonstrated in Chapter 8) is an integral connection type for any analyst who leverages data from the internet.

Clicking any of these connections opens a set of dialog boxes customized for the selected connection. These dialog boxes ask for the basic parameters that Power Query needs in order to connect to the specified data source; parameters such as file path, URL, server name, and credentials.

Each connection type requires its own, unique set of parameters, so each of their dialog boxes is different. Luckily, Power Query rarely needs more than a handful of parameters to connect to any single data source, so the dialog boxes are relatively intuitive and hassle-free.

Walk-through: Getting data from a database

It would be redundant to walk through the process of connection to every type of database available. However, it would be useful to walk through the basic steps of connecting a database.

Here are the steps for connecting to one of the more ubiquitous database systems — Microsoft Access:

  1. Choose Data ⇒ New Query ⇒ From Database ⇒ From Microsoft Access Database.
  2. Browse for your target database. You can use the Facility Services.accdb database, found in the sample files for this book.

    After Power Query connects to the database, the Navigator pane, shown in Figure 9-14, activates. There, you see all database objects available to you, including tables and views (or queries, in Access lingo).

  3. Click the Sales_By_Employee view.

    The Navigator pane displays a preview of the Sales_By_Employee data. If you want to transform or shape this data, click the Edit button. In this case, the data looks fine as is.

  4. Click the Load button to complete the import.

    After a bit of processing, Power Query loads the data to a new Excel worksheet and adds the new query to the Workbook Queries pane, as shown in Figure 9-15.

image

Figure 9-14: Select the view you want imported, and then click the Load button.

image

Figure 9-15: The final imported database data.

remember You can select multiple tables and views by selecting the Select Multiple Items check box and then placing a check mark next to each database object you want imported.

remember The icon next to each database object distinguishes whether that object is a table or a view. Views have an icon that looks like two overlapping grids. See the icon for the Sales_By_Employee view, shown in Figure 9-14, to get the idea.

It’s a best practice to use views whenever possible. Views are often cleaner data sets because they’re already optimized to include only the columns and data that are necessary. (This improves query performance and helps minimize the workbook’s file size.) In addition, you don’t need to have an intimate knowledge of the database architecture. Someone with that knowledge has already done the work for you — joined the correct tables, applied the appropriate business rules, and optimized output, for example.

Managing Data Source Settings

Every time you connect to any web-based data source or data source that requires some level of credentials, Power Query caches (stores) the settings for that data source.

Suppose that you connect to a SQL Server database, enter all your credentials, and import the data you need. At the moment of successful connection, Power Query caches information about that connection in a file located on your local PC. It includes the connection string, username, password, and privacy settings, for example.

The purpose of all this caching is so that you don’t have to reenter credentials every time you need to refresh your queries. That’s nifty, but what happens when your credentials are changed? Well, the short answer is those queries will fail until the data source settings are updated.

You can edit data source settings by activating the Data Source Settings dialog box. To do so, choose Data ⇒ New Query ⇒ Data Source Settings, as demonstrated in Figure 9-16.

image

Figure 9-16: Activating the Data Source Settings dialog box.

The Data Source Settings dialog box, shown in Figure 9-17, contains a list of all credentials-based data sources previously used in queries. Select the data source you need to change, and then click the Edit button.

image

Figure 9-17: Edit a data source by selecting it and clicking the Edit button.

Another dialog box opens — this time, specific to the data source you selected (see Figure 9-18). This dialog box enables you to edit credentials as well as other data privacy settings.

image

Figure 9-18: The credentials editing screen for your selected data source.

Click the Edit button to make changes to the credentials for the data source. The credentials editing screen will differ based on the data source you’re working with, but again, the input dialog boxes are relatively intuitive and easy to update.

Power Query caches data source settings in a file located on your local PC. Even though you may have deleted a particular query, the data source setting is retained for possible future use. This can lead to a cluttered list of old and current data sources. You can clean out old items by selecting the data source in the Data Source Settings dialog box and clicking the Delete button.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset