Connecting to the data in Tableau Public

Tableau Public has a graphical user interface (GUI) that was designed to enable users to load data sources without having to write code. Since the only place to save Tableau Public documents is in Tableau's Cloud, data sources are automatically extracted and packaged with the workbook. (The ability to save extracts as separate documents or open extracts and share them with different users is a feature of Tableau Desktop Professional).

Connecting to data from a local file, that is, an access, excel, or text file saved on your computer, takes several steps that have little variability by data source, which will be illustrated as follows with detailed screenshots:

  1. Click on the Connect to Data Link option from the Data menu.
  2. Select the data source type.
  3. Select the file or website to which you want to connect.
  4. For a Microsoft Access, Microsoft Excel, or a text file, determine whether the connection is to one table or multiple tables or it requires a custom SQL connection:
    • If the connection is to one table, select the table.
    • If the connection is to multiple tables, select the option for the connection to multiple tables and identify the join conditions. We will discuss this in detail in the next section.
    • Alternatively, you can type or paste a custom SQL.
  5. When all the selections have been made, click on Ok.

Now that you have learned what data sources look like and how they are structured, we will give you a couple of examples of data connections.

In the first exercise, we will connect to the World Bank's environment indicators. You can download this data, which is formatted either for Microsoft Excel or as a text file, at www.worldbank.org.

This data source is not formatted properly, which is shown in the following screenshot. It has a spacer row between the top of the worksheet and the headers' rows, and the years in which the measurements were taken were distributed as columns rather than individual dimensional values in a row.

Once we connect to the file, we will use Tableau Public's new data interpreter to clean up and pivot the rows:

Connecting to the data in Tableau Public

The data source user interface

Before loading data, it's important to know what the different parts of the data connection user interface are. The Tableau Public 9.x user interface that is used for the connection to data is shown in the following screenshot. Don't forget that the Start button remains in the upper-left corner of the UI. You can click on it from either the data source window or a worksheet to get back to the Start menu:

The data source user interface

The parts of the user interface and their descriptions are as follows. The parts have been given in alphabetical references:

References

Description

A

This is the data source name, which will be modified in subsequent exercises

B

These are the data source filters, which can be used to limit the data that you load

C

This is the workspace, where you can add and join tables

D

These are individual tables

E

This is the Data Interpreter, which is available for Microsoft Excel files; we will learn how to turn it on and use it in subsequent exercises

F

Edit data source display by showing/hiding fields

G

This is the data

H

This is a link to sheets; you can click on this to go back to your worksheets

I

This is the Data Source button, which can be clicked on from any worksheet to get back to the data source

J

These are the tables within the data source, which can be dragged to the workspace to join to other workspace

K

This is the pivot or view grid of the data, which will be used in subsequent exercises

L

This is the data source, which can be changed by clicking on the orange link and then browsing to a new file

To load this file into Tableau Public, we will start with a new Tableau Public workbook. You can download the Tableau Public workbook that we used for this chapter by visiting https://public.tableau.com/profile/tableau.data.stories#!/. The following steps will guide you through how to connect a file to Tableau Public:

  1. Open a new instance of Tableau Public.
  2. From the Connect pane, click on the data file type to which you'd like to connect. In this case, we are using an excel file.
  3. Browse to the file to which you would like to connect.
  4. Drag a table from the list of tables, which is a list of different worksheets in this case, along with the workbook onto the workspace.
  5. Note that the values in the data source are now populating the space below the workspace, but at least with this data set, there is no complete set of field headers. We will edit the data source by using the data interpreter in the next exercise.

The name of the data source is showing a concatenation of the name of the workbook and the table name. Click on it (in the previous screenshot, it's A) to give the data source a good name. Remember that anything that you publish on Tableau Public is available for other people to download, and since they aren't able to see the actual origin of your data source, it's a good idea to give it an explicit name so that there are no errors of attribution.

Using the data interpreter

Tableau Public 9.x has a new feature that is designed to reduce the amount of transformation that you need to do to your data sources. The data interpreter automatically detects where the first row of headers or data is in a Microsoft Excel data file, and if there are empty or semi-structured rows before the data, it can remove them. (The data interpreter does not work with text files.)

The following data source has the following major errors in it:

  • There are two rows of mostly blank, non-data values before the first row of the valid data; we will use the data interpreter to fix this
  • The years in which the measurements were taken should be going down a column rather than across the columns; we will pivot the data in the next exercise to fix this

We will use the data interpreter to fix the first problem. In the following screenshot, you will see that Tableau Public has suggested that we use the data interpreter. The steps are as follows:

  1. Under the workspace, note that Tableau Public has recognized that the data might not be formatted properly and has suggested using the data interpreter. Click on the button that says Turn on:
    Using the data interpreter
  2. Now that the data interpreter is on, you can review the results in the following screenshot and see how the data was transformed:
    Using the data interpreter

The two rows of garbage are now gone, and the field headers are populating properly. We still need to resolve the issue of the date dimension going across the columns rather than down a column. We can resolve this by pivoting the data.

Pivoting data

Pivoting data is a capability designed to help you resolve issues within data sources, like in the previous example, where the date dimension is not formatted properly.

By highlighting the headers, you can pivot them from columns into rows by performing the following steps:

  1. Highlight the field headers that you need to pivot. In this case, we click on 1960 and scroll all the way to the right, holding down the Shift key as we select columns.
  2. Right-click on a selected header and choose Pivot.
  3. The pivoted fields now have transformed into two new columns—the headers that you selected appear as values in a new column called pivot field names, and the measures now appear in a field called pivot field values.
  4. Right-click on the headers for each of these fields and rename them. We renamed Pivot field names to Year and Pivot field values to Measure.
  5. Check out the following modified data source. It is now formatted properly, but there is one issue—the numerous rows with null values. We will edit those in the next exercise:
    Pivoting data

Filtering data sources

It's reasonable to expect that you won't need to load all the data in the data source. It is important to load only what you need because the more the data in the data source, the slower it will be. In the current example that we are using, there are many rows with null values. The reason that they have null values is that for the selected measure, no measurement was taken for certain time periods.

Tip

The null values are different from measurements of zero. Values of zero mean that a measurement was taken and the value was zero. Null means that no measurement was taken.

We have no reason to load rows with null values. Therefore, we can filter them as follows:

  1. In the upper-right corner under Filters, click on Add.
  2. Click on Add again.
  3. Select the field that you wish to filter. (We filtered on Measure).
  4. Since Measure is a measure and not a dimension, we see a continuous spectrum of values. But we want to include everything except null values.
  5. Click on the Special button on the upper-right side.
  6. Click on Non-null values.
  7. Click on OK.
  8. Click on OK again.

Note

The data source shows values of zero, but not null. The data source is almost complete. The only item that is remaining before we can start using filter is joining it with another table in the same data source.

Joining tables

In this exercise, we will join the fact table with a dimension of the countries so that we can group the countries by region. A join is a logic statement in which you tell Tableau's data engine how two tables are related to each other. There are two parts to it, the join types (the left join, inner join, right join, or outer join) and the join conditions.

  • The left join: This keeps all the records from the left (or first) table and the corresponding records from the right (or second) table.
  • The inner join: This keeps only the records from both the tables that match the join condition.
  • The right join: It is the opposite of the left join; it keeps all the records from the right table and only the corresponding records from the left table. Outer joins keep all the fields from all the tables.

    Tip

    The availability of join types depends on your data source. For this data source, we can create an inner join or a left join.

A join condition is where you tell Tableau Public's data engine, which is functioning as a database management system in this case, how the two tables are related. In order to join tables, you need to have at least one field whose contents occur in both tables. In the following example, we will join our tables by the country name so that we can see the corresponding region for each country.

Tip

Tableau Public will automatically join your tables on the first fields that occur alphabetically in both the tables and have the exact same field name and field type.

In order to add new tables, you need to drag them from the list of tables on the left into the workspace next to the existing tables:

  1. Drag the Country Metadata table into the workspace and drop it next to the Environmental Fact Table.
  2. Tableau Public automatically detects the field that occurs first in both data sources alphabetically and has the following properties:
    • The exact same field name, including capitalization and punctuation
    • The same field type
  3. You can view and edit the Join details by clicking on the Venn diagram icon between the tables, as shown in the following screenshot. In this case, our data sources are joined by Country Code, which is correct. In order to select different fields, click on the name of the joined field and replace it with someone more appropriate:
    Joining tables
  4. When you are satisfied that the join condition is correct, click on the Venn diagram icon again.

    Note

    The additional fields, with their source table name appended in parentheses, are included in the data set below the workspace.

  5. Just because a field is included in a join condition, it does not mean that you need to load it in the workbook. You also do not need duplicates of existing fields. For that reason, click on the Country Name (Country Metadata) and Country Code (Country Metadata) fields, which already occur in the fact table, and from their context menus, select Hide, as shown in the following screenshot:
    Joining tables
  6. If you would like to see the fields that you have hidden, click on the checkbox next to the Show Hidden Fields text above the data source.

The data source is now ready to be used in a visualization. To create a visualization, click on a sheet number or name in the ribbon at the bottom. If you'd like to get back to the data source, you can click on the data source icon from any worksheet.

When you load a new data source, the following are some of the several items that you should check before you can use it:

  • Confirm that all the dimensions, are in fact in the Dimensions pane, rather than in the Measures pane
  • Confirm that the data source types of all the fields are correct. For instance, in this data source, Year is formatted as a string, but it really should be a number
  • We can change the data type by right-clicking on the field, selecting Change Data Type, and choosing Number (Whole)
    Joining tables

Connecting to web-based data sources

The steps required to connect to OData are different from the steps required to connect to the previously mentioned sources because they involve web servers and network security. These steps are a subset of the steps in Desktop Professional that are used to connect to a server:

  1. Enter the URL of the website.
  2. Select the authentication method.
  3. Establish the connection.
  4. Name the data source.

Another big difference between using local sources and online sources is that while the local sources can be refreshed with just a right-click on the data source name, online sources must connect to the website, which needs to be refreshed.

In order to refresh a web-based data source, perform the following steps:

  1. Right-click on the data source name in the data pane.
  2. Click on Edit Connection.
  3. In the previous dialog box, which will be populated with the connection parameters, click on the Connect button in step 3 of the preceding list. It isn't necessary to repopulate the connection parameters or create a new connection to refresh the data.

Check out the visualization in the following screenshot, where we used this data source to graph the average CO2 emissions per capita by region since 1980:

Connecting to web-based data sources
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset