Fixing data issues

Often, data is not entirely clean. That is, it has problems that need to be corrected before meaningful analysis can be done. For example, dates may be incorrectly formatted, or fields may contain a mix of numeric values and character codes that need to be separated into multiple fields. We'll look in depth at many ways of working with messy data in Chapter 9, Making Data Work for You. Here, we'll consider how calculated fields can often be used to fix these kinds of issues.

We'll continue working with the Apartment Rentals data. You'll recall that the start and end dates looked something like this:

Start Date    End Date
May 01        Dec 31
Aug 01        Dec 2
Feb 16        Mar 02
...           ...

Without the year, Tableau does not recognize the Start Date or End Date fields as dates. Instead, Tableau recognizes them as strings. Using the drop-down menu on the fields to change the data type to Date results in Tableau incorrectly parsing the string value (because it uses the day value as a year). This is a case where we'll need to use a calculation to fix the issue.

Assuming you are confident that the year should be 2016 in each case, you might create a calculated field named Start Date (fixed) with the code:

DATE([Start Date] + ", 2016")

and another field named End Date (fixed) with the code:

DATE([End Date] + ", 2016")

These calculated fields concatenate the year onto the existing string and then use the DATE() function to convert the result into a date value. Tableau recognizes the resulting fields as dates, with all the features of a date field, such as built-in hierarchies. A quick check in Tableau confirms the expected results.
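For readers who want to see the same concatenate-then-parse logic outside Tableau, here is a minimal Python sketch. It is an illustration only and not part of the Tableau workflow: the fix_date helper and the sample rows are hypothetical names introduced here, the year 2016 is the same assumption made above, and the month abbreviations are assumed to be English.

```python
from datetime import date, datetime

ASSUMED_YEAR = 2016  # assumption carried over from the text, not read from the data


def fix_date(month_day: str, year: int = ASSUMED_YEAR) -> date:
    """Append the assumed year to a 'Mon DD' string and parse it as a date."""
    # Mirrors the calculated field: concatenate ", <year>", then convert to a date.
    return datetime.strptime(f"{month_day}, {year}", "%b %d, %Y").date()


# Sample rows copied from the table above (start, end).
rows = [("May 01", "Dec 31"), ("Aug 01", "Dec 2"), ("Feb 16", "Mar 02")]

fixed = [(fix_date(start), fix_date(end)) for start, end in rows]

for start, end in fixed:
    print(start.isoformat(), end.isoformat())
```

In this sketch, a string that does not match the expected pattern raises a ValueError, which gives a quick way to find rows that do not fit the "month day" assumption before you rely on the fixed fields.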
