The databases, tables, dimensions, facts, field formats and conventions

Data retrieved from different sources will invariably have different structures, and some sources need more formatting than others before they can be turned into clean, usable tables.

As previously mentioned, a table can be as simple as a single digit in a text file. As long as users know what that digit represents, they can assign a qualitative or quantitative meaning to it. Imagine that you are collecting rainfall measurements: entering each amount of rainfall as a new line of text in a file creates a table.

The amount of rainfall is a measure; it is a quantitative fact. A dimension, by contrast, is a field that contains qualitative data. In this case, both the time of day and the location of each measurement are dimensions. Dimensions are typically formatted as date, string, or character fields, while measures are formatted as numbers. Field formats are metadata: plain text files do not store them, but Microsoft Excel workbooks and Microsoft Access databases do.
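To make the distinction concrete, here is a minimal Python sketch of the rainfall table as typed records. The field names (`measured_at`, `location`, `rainfall_mm`), the sample readings, and the `classify_fields` helper are all hypothetical. The classification rule (numeric fields are measures, everything else is a dimension) is a simplification of the convention described above, not how Tableau Public actually assigns field roles.

```python
from dataclasses import dataclass, fields
from datetime import datetime


# Hypothetical schema for the rainfall example: two dimensions, one measure.
@dataclass
class RainfallReading:
    measured_at: datetime  # dimension: when the measurement was taken
    location: str          # dimension: where it was taken
    rainfall_mm: float     # measure: the quantitative fact


def classify_fields(record_type):
    """Split a dataclass's fields into dimensions and measures.

    Simplified rule (an assumption): numeric fields are measures;
    everything else (dates, strings) is a dimension.
    """
    dimensions, measures = [], []
    for f in fields(record_type):
        if f.type in (int, float):
            measures.append(f.name)
        else:
            dimensions.append(f.name)
    return dimensions, measures


# Hypothetical readings; each instance is one row of the table.
readings = [
    RainfallReading(datetime(2024, 5, 1, 9, 0), "North gauge", 2.5),
    RainfallReading(datetime(2024, 5, 1, 15, 0), "North gauge", 0.0),
]

print(classify_fields(RainfallReading))
# (['measured_at', 'location'], ['rainfall_mm'])
```

Unlike a plain text file, the dataclass carries the field formats (`datetime`, `str`, `float`) as metadata, which is what makes the classification possible at all.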

Note

It's important to make sure that the formats for fields of the same type (such as dates or primary/foreign keys) are consistent between worksheets in a workbook or tables in a Microsoft Access database, because Tableau Public automatically joins only fields that have the same format and exactly the same name (including capitalization). If fields that should be joined have different names, you can join them manually by specifying the join condition.
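The automatic-join rule in the note can be modeled in a few lines. This Python sketch only illustrates the rule as stated (identical, case-sensitive name plus identical format); the two schemas and the helpers `auto_join_fields` and `near_misses` are hypothetical, and this is not Tableau Public's actual implementation.

```python
# Each hypothetical schema maps a field name to its format.
orders = {"CustomerID": "integer", "OrderDate": "date", "Amount": "number"}
customers = {"CustomerID": "integer", "customerName": "string", "orderdate": "date"}


def auto_join_fields(left, right):
    """Fields that share an exact (case-sensitive) name and the same format."""
    return sorted(
        name for name, fmt in left.items()
        if name in right and right[name] == fmt
    )


def near_misses(left, right):
    """Fields whose names match only when case is ignored, or whose formats differ.

    These would not be joined automatically and are candidates
    for a manual join condition.
    """
    right_by_lower = {name.lower(): name for name in right}
    misses = []
    for name, fmt in left.items():
        other = right_by_lower.get(name.lower())
        if other is not None and (other != name or right[other] != fmt):
            misses.append((name, other))
    return misses


print(auto_join_fields(orders, customers))  # ['CustomerID']
print(near_misses(orders, customers))       # [('OrderDate', 'orderdate')]
```

Here `OrderDate` and `orderdate` hold the same kind of data, but the difference in capitalization alone is enough to keep them from being matched automatically.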

Another common dimension is a unique identifier, which assigns a non-repeating value to each object in a set. A phone number is a unique identifier because it belongs to only one phone at a time, and the same is true of a social security number. Within a data set, a person's name can serve as a unique identifier only if no two people share it; if names repeat, a different unique identifier is needed to distinguish individual people. For this reason, it's common to use numerical fields as primary keys for individuals, and these numerical fields can be used across multiple tables and across different dimensions.
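Whether a field can act as a unique identifier comes down to one test: no value may appear more than once. The Python sketch below applies that test to a list of records; the helper names, the `person_id` and `name` fields, and the rows are hypothetical.

```python
from collections import Counter


def duplicate_values(rows, field):
    """Return the values of `field` that occur more than once."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def is_unique_identifier(rows, field):
    """A field is a unique identifier if none of its values repeat."""
    return not duplicate_values(rows, field)


# Hypothetical people: two share a name, so the name cannot be the key,
# but the numeric person_id can.
people = [
    {"person_id": 1, "name": "Alex Smith"},
    {"person_id": 2, "name": "Jordan Lee"},
    {"person_id": 3, "name": "Alex Smith"},
]

print(is_unique_identifier(people, "name"))       # False
print(duplicate_values(people, "name"))           # ['Alex Smith']
print(is_unique_identifier(people, "person_id"))  # True
```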

Tables need to be structured so that the field names (dimensions and facts) go across the top and the rows of data (the dimension values and measure facts) go down the table. Some databases store data column by column because their query engines are optimized to scan columns rather than rows, but most DBMSes are not columnar, and Tableau expects each field to be a column and each record to be a row.

The following table, which contains freely available NFL performance data from 2012, is a good example of a properly structured table:

[Table: 2012 NFL performance data with the fields Player, Team, Receptions, Yards, and Average]

Each column is a field; the dimensions are Player and Team, and the measures are Receptions, Yards, and Average. The primary key is Player, so there is only one row for each player. Team is a foreign key, because it may be the primary key in other tables, such as a table of aggregations by team.
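The primary-key and foreign-key relationships described above can be checked mechanically. In this Python sketch the player rows are placeholders (not the real 2012 figures), and the helpers `aggregate_by_team`, `check_primary_key`, and `orphan_foreign_keys` are hypothetical names chosen for illustration. The team table is built by aggregating the player table, so Team becomes its primary key.

```python
from collections import defaultdict

# Placeholder rows shaped like the player table; names and numbers are illustrative.
players = [
    {"Player": "Player A", "Team": "Team X", "Receptions": 10, "Yards": 150, "Average": 15.0},
    {"Player": "Player B", "Team": "Team Y", "Receptions": 8, "Yards": 96, "Average": 12.0},
    {"Player": "Player C", "Team": "Team X", "Receptions": 5, "Yards": 40, "Average": 8.0},
]


def aggregate_by_team(rows):
    """Build a team-level table: one row per Team, with summed measures."""
    totals = defaultdict(lambda: {"Receptions": 0, "Yards": 0})
    for row in rows:
        totals[row["Team"]]["Receptions"] += row["Receptions"]
        totals[row["Team"]]["Yards"] += row["Yards"]
    return [{"Team": team, **t} for team, t in sorted(totals.items())]


def check_primary_key(rows, key):
    """True if every row has a distinct value for `key` (one row per entity)."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def orphan_foreign_keys(child_rows, parent_rows, key):
    """Foreign-key values in the child table with no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


teams = aggregate_by_team(players)
print(check_primary_key(players, "Player"))         # True
print(check_primary_key(players, "Team"))           # False: Team repeats here
print(check_primary_key(teams, "Team"))             # True: Team is the key here
print(orphan_foreign_keys(players, teams, "Team"))  # []
```

The same field, Team, fails the primary-key test in the player table but passes it in the aggregated table. That is exactly what makes it a foreign key in the player table.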

Conversely, knowing what not to do is as instructive as knowing what to do. The following table, which shows the population (in millions) by country, is a good example of what not to do:

[Table: population (in millions) by country, with a separate column for each year]

The problem with this table is that the years, which are actually qualitative descriptions of when each population measurement was made, are used as separate columns, even though the year is a dimension and should run down the page. (Dates are dimensions, not facts, because they describe something qualitative.) If we loaded this table in Tableau Public, we would see a separate measure field for each year, because Tableau Public recognizes each column as a distinct field. (In one of the following examples, we will use Tableau Public's new data interpreter to structure the data source properly.)

The correct structure for this table will have three columns, namely [Country], [Year], and [Population], with a separate row for each combination of country and year.
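Reshaping the wide table into that three-column layout is usually called unpivoting. The Python sketch below shows the idea on placeholder data; the country names, years, and population figures are hypothetical, and the `unpivot` helper assumes every non-country column header is a year that can be parsed as an integer.

```python
# Hypothetical wide table in the "what not to do" layout: one column per year.
# Country names and population figures are placeholders, not real data.
wide_header = ["Country", "2000", "2005", "2010"]
wide_rows = [
    ["Country A", 10.0, 11.0, 12.5],
    ["Country B", 3.2, 3.4, 3.5],
]


def unpivot(header, rows, id_field="Country", var_name="Year", value_name="Population"):
    """Turn one-column-per-year rows into one row per (Country, Year) pair."""
    id_index = header.index(id_field)
    long_rows = []
    for row in rows:
        for col_index, column in enumerate(header):
            if col_index == id_index:
                continue
            long_rows.append({
                id_field: row[id_index],
                var_name: int(column),  # assumption: year headers are integers
                value_name: row[col_index],
            })
    return long_rows


for record in unpivot(wide_header, wide_rows):
    print(record)
```

Each year stops being a field name and becomes a value in the [Year] dimension, and every population figure lands in the single [Population] measure. If you work in pandas, `DataFrame.melt(id_vars="Country", var_name="Year", value_name="Population")` performs the same reshaping.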
