Chapter 3. Connecting to Data

A visualization is only as good as the data behind it. However aesthetically pleasing it is, a visualization may be misleading or even wrong unless its data has been properly formatted, aggregated, and represented.

This chapter discusses what you need to know about finding, cleansing, understanding, formatting, and aggregating data in order to produce accurate visualizations that tell compelling stories. It covers the following topics:

  • Where you can get publicly available data and how to use it
  • What tables and databases are
  • The data formats that Tableau Public connects to
  • Databases, tables, dimensions, facts, and field formats and conventions
  • Preparing data to load it into Tableau
  • Connecting to the data from Tableau Public
  • Using the data interpreter
  • Pivoting fields

Public data

Data sets that are publicly available, or that you have compiled on your own, are ideal for Tableau Public. Since all users will be able to download this data and create their own visualizations once you have published your workbook, your data set should not contain sensitive information, which is anything that can be used to identify a private individual or that reveals confidential corporate information or intellectual property.

Public data is readily available online. Tableau Public maintains a catalog of publicly available data, much of it produced by governments, economic groups, and sports fans. The catalog includes a link to, and a rating for, each source. It is updated monthly, and it is a great introduction to using publicly available data. You can find it at http://public.tableau.com/s/resources.

The Google Public Data Explorer has a large collection of public data, including economic forecasts and global public health data. This tool is unique because it allows users to make simple visualizations from all the original data sources without having to investigate the source data, though most of that source data can be reached through the links provided.

There are also several tools for scraping data from public sites, such as ScraperWiki, import.io, and IFTTT. These tools, and the industry around them, evolve rapidly, so we will not discuss any specific tool. Social media applications, such as Twitter, have made it possible for individuals and companies to connect to their data streams through application programming interfaces (APIs). This is useful for nonprogrammers because it is often free of cost, and it takes only a minimal amount of coding to collect data about specific topics, hashtags, or users.

Not all data is public data, and it's very important to determine whether a data set is public before using it. None of us wants to end up sued, with a ruined reputation, or as a victim. If your source data set contains identifying characteristics, such as first and last names, addresses, or financial, geolocation, medical, or federally or state-protected data, that information should be removed or de-identified, and the cleaned data saved separately, before you save the visualization to Tableau Public. Otherwise, the data set should not be used at all. Each state has guidelines on what is considered protected information, and it's a good idea to check the restrictions if there's even the slightest chance that your data set has sensitive information in it.
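As a rough illustration of removing identifying fields before publishing, the following Python sketch drops a set of columns from CSV data and returns the cleaned copy. The column names in SENSITIVE_COLUMNS and the sample rows are hypothetical, chosen only for this example. Dropping columns is just one simple approach, and it does not by itself guarantee that the remaining data cannot identify someone.

```python
import csv
import io

# Hypothetical column names; adjust them to match your own data set.
SENSITIVE_COLUMNS = {"first_name", "last_name", "address", "latitude", "longitude"}


def deidentify_csv(source_text, sensitive_columns=SENSITIVE_COLUMNS):
    """Return CSV text with the sensitive columns removed."""
    reader = csv.DictReader(io.StringIO(source_text))
    # Keep only the columns whose (lowercased) name is not flagged as sensitive.
    kept = [name for name in reader.fieldnames
            if name.lower() not in sensitive_columns]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data for illustration only.
sample = (
    "first_name,last_name,state,amount\n"
    "Ada,Lovelace,NY,120\n"
    "Alan,Turing,CA,95\n"
)
cleaned = deidentify_csv(sample)
print(cleaned)
```

In practice, you would write the cleaned text to a new file and connect Tableau Public to that file, keeping the original data set stored separately.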

Additionally, data from a corporation should never be used unless the corporation has given permission to use it. (Did we mention lawsuits? Being fired also isn't any fun.)
