Connecting to data sources

In the previous recipe, we inserted data into the Tableau workbook by simply copying and pasting. In the real world, however, we need to be able to connect to different data sources that may contain large amounts of data.

We will now look at connecting to multiple data sources at a time. This is a useful way of enriching our data. We have access to multiple data sources. We can open up Tableau and connect numerous data sources.

First, we will see how we can connect to the Windows Azure Datamarket cloud data source, and then continue to connect to the local Excel file. Windows Azure Marketplace is an online market to buy and sell finished Software as a Service (SaaS) applications and premium data. Some data on Windows Azure Datamarket is free. We will be using one of the free data samples, which will give us a lot of information about individual countries, such as the country code, population, size, and so on. In data warehousing terminology, this data can be considered as a dimension, which is another way of describing data. In this definition, it is a field that can be considered an independent variable, regardless of the datatype. Tableau has a more specific definition of a dimension. Tableau treats any field containing qualitative, categorical information as a dimension, such as a date or a text field.

To connect the online data and local data, we will connect to Windows Azure Datamarket using OData, which is a standardized protocol to provide Create, Read, Update, Delete (CRUD) access to a data source via a website. It is the data API for Microsoft Azure, but other organizations use it as well, such as eBay, SAP, and IBM.

Getting ready

Before you start, you need to create a folder where you can download data to run through the examples. You should pick a folder name that is meaningful for you. Also, be sure to select a location that has plenty of space. In this example, we will use the following location to store data: D:DataTableauCookbook. For the example in this chapter, we will create a folder called Chapter 1.

How to do it…

  1. To connect to Windows Azure Datamarket, please sign up for a free account using a Windows Live ID. To do this, please visit https://datamarket.azure.com/ and follow the instructions. This may involve activating your account via a link, so please follow the instructions carefully.
  2. Sign in to Windows Azure Datamarket and navigate to the following URL:

    https://datamarket.azure.com/dataset/oh22is/countrycodes#schema

  3. About half way down the page, look for the Sign Up button and click on it.
  4. This will take you to a terms and conditions page. After you've read the terms and conditions, and, if you agree with them, tick the box to specify that you agree and click on Sign Up.
  5. This will take you to a Thank You page. Look for the EXPLORE THIS DATASET link on this page and click on it, as shown in the following screenshot:
    How to do it…
  6. When you click on EXPLORE THIS DATASET, you will be able to see the data appear in the browser, which you can slice and dice. Here is an example screenshot:
    How to do it…
  7. In this example, we will load the data in Tableau rather than in the Data Explorer URL. To do this, we need the primary account key. In Windows Azure Datamarket, this is easy to obtain. From the previous example, we can see a feature called Primary Account Key. If you click on the Show link next to Primary Account Key, then your primary account key will appear.
  8. Copy the primary account key to your clipboard by selecting it and pressing the CTRL + C keys. You will need the primary account key to access the data using Tableau.
  9. You will also need to get the OData feed for the Country Codes data of the Windows Azure Datamarket Country Codes store. To get the OData feed, you can see it under the sentence URL for current expressed query, and you should copy this information.
  10. Before you proceed, you should note the OData URL and the primary account key. Select them and press the CTRL + C keys simultaneously. The following table shows an example of how your data might look:

    OData URL

    https://api.datamarket.azure.com/oh22is/CountryCodes/v1/CountryCodes

    Primary account key

    Aaa0aaAa0aAa00AAaAAA0aaA0AaaOa0aAaeAaA1AAA

  11. To connect to Windows Azure Datamarket, let's open up Tableau and open the Chapter 1 Demo workbook that we started in the Getting ready section of the Showing the power of data visualization recipe.
  12. Go to the Data menu item and choose Connect to Data….
  13. This action takes you to the Connect to Data window, and you can see that there are a variety of data sources for you to choose from! A sample of the list can be seen in the next screenshot:
    How to do it…
  14. In this example, we are interested in connecting to Windows Azure Datamarket. Here, we will use the information that we saved earlier in this section. You will need the OData connection link. The connection panel only needs a few items in order to connect to the Country Codes data in Windows Azure, and an example can be seen in the next screenshot:
    How to do it…
  15. Insert the OData URL into the textbox labeled Step 1: Select or enter a URL.
  16. Next, take a look at the step labeled Step 2: Enter authentication information, select the radio button next to the option Use an Account key for Windows Azure Marketplace DataMarket, and insert the account key into the textbox. Then, click on the Connect button.
  17. All being well, the data connection will be successful and we can save the Tableau workbook before proceeding to connect to the Excel data source.
  18. We will download the GNI data from the World Bank. The URL is as follows:

    http://data.worldbank.org/indicator/NY.GNP.PCAP.CD?page=1

  19. To do this, open an Internet browser and navigate to the URL. You can see the web page in the following screenshot:
    How to do it…
  20. You will see a button called DOWNLOAD DATA, which is on the right-hand side.
  21. Click on this button and you will be presented with two options: EXCEL and XML. We will download all of the data in Excel format.
  22. Before accessing the data source, let's save the file into the directory that you created earlier.
  23. Once the file is saved, open it in Excel and take a look. If you don't see any data, don't be alarmed.

You will see that there are three sheets and the workbook may open on Sheet 2. This will only provide metadata about the data held in the worksheet, and we need to look at Sheet 1. Then, we'll perform the following steps:

  1. Let's rename Sheet 1 to something more meaningful. Right-click on the sheet tab name and rename it as GNI.
  2. Remove the first two rows of the file. They will only add noise to the import.
  3. Once you've done this, save the workbook. Now, you can exit Excel. We will go back to Tableau to connect to the data.
  4. To connect to the Excel file, go to the Data menu item. Select Connect to Data… and a browser will appear.
  5. Navigate to the location where the files are stored.
  6. Select the worksheet to analyze and double-check whether you are looking at the correct file.
  7. We are given the option of selecting a single file, multiple files, or a custom SQL.
  8. At step 3, labelled Does the data include field names in the first row?, select the option No.
  9. At step 4, labelled Give the connection a name for use in Tableau, type in GNI and click on OK.
  10. We can now decide whether to import the data or to connect live. We will connect live to the Excel workbook.
  11. Now, we can see the Tableau workbook in the following screenshot. In the Data view at the top, we can see two connections: our Windows Azure Datamarket connection and our Excel file connection.
    How to do it…
  12. If we want to flip between each data source, we can click on each connection and see that the dimensions and measures change in response.

How it works…

Tableau connects to each data source and talks to it using drivers that are specific to each datatype. For example, Tableau has some connectors to popular programs, such as R, Google Analytics, and Salesforce.

You can find more information about drivers on the Tableau website at the following link:

http://www.tableausoftware.com/support/drivers

There's more…

Tableau will connect to each data source independently. Even though they are different types of data sources, they appear to look the same in Tableau. From the user perspective, this is very useful since they should not be distracted by the differences in the underlying data source technologies. This means that the user can focus on the data rather than trying to put the data into one data source. Further more, it means that the sources of data can be refreshed easily because the Tableau visualization designer is able to connect directly to the source, which means that the data visualization will always be up to date.

See also

  • Tableau can import data into its own in-memory engine. We will look at this in a later section.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset