Putting data into context

A critical mistake is to treat all data the same and to apply the same processes to consume and visualize it. Even similar data, such as sales information, can vary widely depending on who collected it, what the data is about, when it was collected, why it was collected, and how it was collected:

  • Who: In our stadium example, the sales information provided by the stadium owner's organization may be quite different from that solicited from independent sellers. In addition to who collected the data, who the data is about is also important. If the sales are reported on higher-dollar merchandise versus low-dollar items, the count may be based on a sample rather than a physical count.
  • What: Ultimately, you want to know what your data is about, but before you can do that, you should know what surrounds the numbers, or what the data represents in the world. In our example, does our data represent one week's worth of sales or one season's?
  • When: Most data is linked to time in some way, in that it might be a time series or a snapshot from a specific period. In both cases, you have to know when the data was collected. In our example, sales from 5 years ago, when the team had a winning season, may not be reflected in, or comparable to, the current season, when the team is losing games (and perhaps fans).
  • Why: It's also important to know the reason the data was collected, mostly as a sanity check for bias. Sometimes, data is collected, or even fabricated, to serve an agenda, and you should be wary of these cases. Again, in our example, the sales reported by a particular product supplier (team hats) may be intended to influence the stadium owner to order more products from them.
  • How: The how is often skipped because it tends to be complex and aimed at a technical audience, but it's important to know how the data of interest was collected. Data scanned at registers as sales occur might be more enlightening than data manually collected from shelf stockers at the end of a day.
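
One way to keep these five questions attached to a dataset is to record them alongside the data itself. The following is only a minimal sketch in Python: the DataContext class, its field names, and the sample values (including the dates) are hypothetical and chosen purely to illustrate the stadium example, not taken from any real dataset or library.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class DataContext:
    """Hypothetical record of the who/what/when/why/how of one dataset."""
    who_collected: str   # Who gathered the data
    who_about: str       # Who (or what) the data describes
    what: str            # What the numbers represent in the world
    period_start: date   # When: start of the collection period
    period_end: date     # When: end of the collection period
    why: str             # Why the data was collected
    how: str             # How the data was collected

    def missing_fields(self) -> List[str]:
        """Return the names of text fields that were left blank."""
        return [
            name
            for name, value in vars(self).items()
            if isinstance(value, str) and not value.strip()
        ]


# Illustrative values only; the dates are made-up placeholders.
register_sales = DataContext(
    who_collected="stadium owner's organization",
    who_about="stadium merchandise buyers",
    what="one season of merchandise sales",
    period_start=date(2023, 4, 1),
    period_end=date(2023, 10, 1),
    why="",  # Not yet known, so it is flagged below
    how="scanned at registers as sales occur",
)

print(register_sales.missing_fields())  # ['why']
```

A blank field is a prompt to go back and ask the question before analyzing or visualizing the data.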

A final thought: learn all that you can about your data before anything else, and your analysis and visualization will be better.

Importance of data context

Without context, data can easily be misinterpreted and therefore be unusable. If the data is unusable, then any report or visualization based on it will also be unusable. As always, bad data is worse than no data.
