Tableau Public has a graphical user interface (GUI) that was designed to enable users to load data sources without having to write code. Since the only place to save Tableau Public documents is in Tableau's Cloud, data sources are automatically extracted and packaged with the workbook. (The ability to save extracts as separate documents or open extracts and share them with different users is a feature of Tableau Desktop Professional).
Connecting to data from a local file, that is, an access, excel, or text file saved on your computer, takes several steps that have little variability by data source, which will be illustrated as follows with detailed screenshots:
Now that you have learned what data sources look like and how they are structured, we will give you a couple of examples of data connections.
In the first exercise, we will connect to the World Bank's environment indicators. You can download this data, which is formatted either for Microsoft Excel or as a text file, at www.worldbank.org.
This data source is not formatted properly, which is shown in the following screenshot. It has a spacer row between the top of the worksheet and the headers' rows, and the years in which the measurements were taken were distributed as columns rather than individual dimensional values in a row.
Once we connect to the file, we will use Tableau Public's new data interpreter to clean up and pivot the rows:
Before loading data, it's important to know what the different parts of the data connection user interface are. The Tableau Public 9.x user interface that is used for the connection to data is shown in the following screenshot. Don't forget that the Start button remains in the upper-left corner of the UI. You can click on it from either the data source window or a worksheet to get back to the Start menu:
The parts of the user interface and their descriptions are as follows. The parts have been given in alphabetical references:
To load this file into Tableau Public, we will start with a new Tableau Public workbook. You can download the Tableau Public workbook that we used for this chapter by visiting https://public.tableau.com/profile/tableau.data.stories#!/. The following steps will guide you through how to connect a file to Tableau Public:
The name of the data source is showing a concatenation of the name of the workbook and the table name. Click on it (in the previous screenshot, it's A) to give the data source a good name. Remember that anything that you publish on Tableau Public is available for other people to download, and since they aren't able to see the actual origin of your data source, it's a good idea to give it an explicit name so that there are no errors of attribution.
Tableau Public 9.x has a new feature that is designed to reduce the amount of transformation that you need to do to your data sources. The data interpreter automatically detects where the first row of headers or data is in a Microsoft Excel data file, and if there are empty or semi-structured rows before the data, it can remove them. (The data interpreter does not work with text files.)
The following data source has the following major errors in it:
We will use the data interpreter to fix the first problem. In the following screenshot, you will see that Tableau Public has suggested that we use the data interpreter. The steps are as follows:
The two rows of garbage are now gone, and the field headers are populating properly. We still need to resolve the issue of the date dimension going across the columns rather than down a column. We can resolve this by pivoting the data.
Pivoting data is a capability designed to help you resolve issues within data sources, like in the previous example, where the date dimension is not formatted properly.
By highlighting the headers, you can pivot them from columns into rows by performing the following steps:
pivot field names
, and the measures now appear in a field called pivot field values
.Pivot field names
to Year
and Pivot field values
to Measure
.It's reasonable to expect that you won't need to load all the data in the data source. It is important to load only what you need because the more the data in the data source, the slower it will be. In the current example that we are using, there are many rows with null values. The reason that they have null values is that for the selected measure, no measurement was taken for certain time periods.
We have no reason to load rows with null values. Therefore, we can filter them as follows:
In this exercise, we will join the fact table with a dimension of the countries so that we can group the countries by region. A join is a logic statement in which you tell Tableau's data engine how two tables are related to each other. There are two parts to it, the join types (the left join, inner join, right join, or outer join) and the join conditions.
A join condition is where you tell Tableau Public's data engine, which is functioning as a database management system in this case, how the two tables are related. In order to join tables, you need to have at least one field whose contents occur in both tables. In the following example, we will join our tables by the country name so that we can see the corresponding region for each country.
In order to add new tables, you need to drag them from the list of tables on the left into the workspace next to the existing tables:
The data source is now ready to be used in a visualization. To create a visualization, click on a sheet number or name in the ribbon at the bottom. If you'd like to get back to the data source, you can click on the data source icon from any worksheet.
When you load a new data source, the following are some of the several items that you should check before you can use it:
Year
is formatted as a string, but it really should be a numberThe steps required to connect to OData are different from the steps required to connect to the previously mentioned sources because they involve web servers and network security. These steps are a subset of the steps in Desktop Professional that are used to connect to a server:
Another big difference between using local sources and online sources is that while the local sources can be refreshed with just a right-click on the data source name, online sources must connect to the website, which needs to be refreshed.
In order to refresh a web-based data source, perform the following steps:
Check out the visualization in the following screenshot, where we used this data source to graph the average CO2 emissions per capita by region since 1980: