APPENDIX B: Special Instructions for Automated Measurement, Connected Devices, and the Internet of Things


Many people intuitively think that data created “by computer” or by some automated measurement device is far superior to data created by humans. I’ve not done a careful study, but in my experience this is not the case. Consider the clock on my office file cabinet. This clock synchronizes itself at 1:00 AM every day using a time signal broadcast by the National Institute of Standards and Technology from Fort Collins, Colorado, and it has an advertised precision in the fraction-of-a-second range. You’d think I could trust it.

Not so. I began to suspect it wasn’t perfect one Friday morning when it read 9:32. That meant I was already two minutes late for a meeting that always starts promptly, and I still had to complete some critical preparation before joining. Stressed, I completed the prep and joined the call about ten minutes late. But no one else was on. A bit of investigation revealed the clock was running twenty minutes ahead!

So I started a log of clock errors. After some two years, I’ve noted fifteen separate instances. In one instance the clock was off by three hours. I can rationalize that one—perhaps the clock thought I wanted Pacific time, not Eastern time. But in eight instances, both the time and date were incorrect. In one case it had to be off by about three years. I cannot figure those out. Nor can I figure out the twenty-minute error cited above. Interestingly, this smallest error is the only one that caused any real grief.

A rough estimate of the error rate of my clock is 2.4%,45 well within the range of the error rate for a typical attribute.

The story provides an apt warning of the dangers in connected devices. The Internet of Things (IoT) is bringing billions of new connected devices into our lives. From Fitbit activity trackers to Nest thermostats, to devices embedded in engines and factory machines, excitement is high and the potential is enormous. But as the clock example suggests, it will be much more difficult to achieve that potential if too much IoT data is bad.

Note that the DQ measurements made in Chapter 4 covered entire data records, perhaps 15-25 attributes, with an overall error rate of 30%. My clock is a single attribute only, so its roughly 2.4% error rate is toward the high end of the “what’s typical” range.

Further, bear in mind that clocks are remarkably simple compared to other devices such as accelerometers, locators, and chemical assays. Similar data quality issues can impact all—the Fitbit doesn’t count steps on the treadmill (at least in my experience); a device in the electric grid quits working (then mysteriously starts again); the gears in a weather vane fill with sand and compromise the measurement. If I move the scale in my bathroom just a few inches, my measured weight can change by three pounds. Frankly, I find all measurement devices quirky. Don’t trust them until you really understand them and they’ve proven themselves.

Issues associated with data definitions also impact such devices. Individually, most such issues are pretty mundane—a measurement made in yards (English units) but interpreted in meters (metric units) and a misinterpreted relationship between “steps” and “distance walked” (the Fitbit) are good examples. But there are quite literally thousands of such issues, any one of which can trip you up.

Of course, the problem is not just the devices per se. The whole idea of connected devices is that they work together to do things they can’t do alone. And here problems grow more acute. For a single device, it is good enough to know whether the units are English or metric. For multiple devices all the units of measure must align to perform even the simplest tasks. It just won’t do if your Nest device measures your house temperature in Fahrenheit and transmits it to a utility company that records temperature in Celsius. This means that people and organizations have to agree on how they will measure things.
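
To make the point concrete, here is a minimal sketch in Python (the device name and field names are mine, purely for illustration): each reading carries its unit of measure explicitly, and the receiving system converts rather than assumes.

    from dataclasses import dataclass

    @dataclass
    class Reading:
        device_id: str   # hypothetical identifier, e.g., "nest-livingroom"
        quantity: str    # what was measured, e.g., "temperature"
        value: float
        unit: str        # the unit the device actually used: "F" or "C"

    def to_celsius(reading: Reading) -> float:
        """Convert a temperature reading to Celsius; refuse to guess the unit."""
        if reading.quantity != "temperature":
            raise ValueError(f"not a temperature reading: {reading.quantity}")
        if reading.unit == "F":
            return (reading.value - 32.0) * 5.0 / 9.0
        if reading.unit == "C":
            return reading.value
        raise ValueError(f"unknown temperature unit: {reading.unit}")

    # The thermostat reports Fahrenheit; the utility stores Celsius.
    print(round(to_celsius(Reading("nest-livingroom", "temperature", 72.0, "F")), 1))  # 22.2

The conversion itself is trivial; the point is that the unit travels with the measurement, so no downstream system has to guess whether it is looking at Fahrenheit or Celsius, yards or meters.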

Standards are the obvious answer, but they take a devilishly long time and much effort to develop. As my sometime co-author Tom Davenport points out, the development of the EPCglobal standard (the electronic product code used with radio-frequency identification) took about 15 years. The development of the ANSI X12 standard for electronic data interchange took about 14 years. We don’t want to wait that long for anything these days, and standards development could really slow the penetration of IoT devices.

So how do the instructions of this book apply to the IoT and other connected devices? I’ve already noted that “bad data is like a virus. There is no telling where it will end up or the damage it will cause.” With viruses the basic idea is to try to prevent them in the first place and do all you can to contain them after that.

For data quality and the Internet of Things, preventing the virus starts with excellent design, manufacturing, and installation of the IoT device. Since such devices are typically purchased from another company (a semiconductor manufacturer, for example), you must view yourself as a customer for all the measurements the device will ever make. You must insist that the device actually measures what it purports to measure. This implies both specification of the intended measurement and rigorous testing, under both laboratory and real-world conditions, to ensure this actually occurs. Make sure you know what a “step” is and that the device actually counts them properly.
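
As a rough illustration of what such testing might look like (the step counts, conditions, and the three-percent tolerance below are all invented for the example), the idea is simply to walk a known number of steps under realistic conditions and compare the device’s count against that ground truth:

    def within_tolerance(measured: int, actual: int, tolerance: float = 0.03) -> bool:
        """True if the device's count is within the agreed tolerance of ground truth."""
        return abs(measured - actual) <= tolerance * actual

    # Hypothetical trial results: (device count, manually counted steps, condition)
    trials = [
        (1004, 1000, "sidewalk"),
        (612,  1000, "treadmill"),       # the treadmill case noted earlier
        (987,  1000, "office hallway"),
    ]

    for measured, actual, condition in trials:
        status = "PASS" if within_tolerance(measured, actual) else "FAIL"
        print(f"{condition:15s} measured={measured:5d} actual={actual:5d} {status}")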

The specification should spell out operating conditions. Recall that the Fitbit doesn’t work so well on the treadmill. The specification should spell out everything you’ll need to use the device successfully in practice: what you need to do to install and test it, its expected lifetime, how you’ll know when it is time to maintain or replace the device, and so forth.

Insist on two levels of calibration from your device supplier. First, there should be rigorous calibration before the device leaves the factory, and an “on-installation” calibration to ensure that the device works as expected. Second, ongoing calibration is required to make sure the device continues to work properly. Ideally, the on-installation and ongoing calibration routines should be built-in and automated.
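
Here is a sketch of what an automated ongoing-calibration check might amount to, assuming the device can periodically be compared against a trusted reference and that you and the supplier have agreed on a maximum allowable drift (the numbers below are made up):

    def calibration_ok(device_reading: float, reference_reading: float, max_drift: float) -> bool:
        """True if the device still agrees with the reference within the allowed drift."""
        return abs(device_reading - reference_reading) <= max_drift

    # Hypothetical example: a temperature sensor checked against a reference probe,
    # with an agreed maximum drift of 0.5 degrees.
    if not calibration_ok(device_reading=21.9, reference_reading=21.2, max_drift=0.5):
        print("Drift exceeds tolerance: recalibrate or take the device out of service")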

To contain bad data, devices should come equipped with so-called “I’m not working right now” and “I’m broken and must be replaced” features, which do exactly what their names suggest.
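
One way to picture those features, sketched with made-up status names: the device reports an explicit health state alongside every reading, so downstream systems can quarantine suspect data instead of silently ingesting it.

    from enum import Enum
    from typing import Optional

    class DeviceHealth(Enum):
        OK = "ok"
        NOT_WORKING_RIGHT_NOW = "not working right now"    # temporary; data is suspect
        BROKEN_REPLACE_ME = "broken and must be replaced"  # permanent; stop using the data

    def accept_reading(health: DeviceHealth, value: float) -> Optional[float]:
        """Pass a reading through only when the device says it is healthy."""
        if health is DeviceHealth.OK:
            return value
        print(f"Quarantining reading: device reports '{health.value}'")
        return None

    accept_reading(DeviceHealth.NOT_WORKING_RIGHT_NOW, 42.0)  # quarantined, returns None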

Finally, you should not expect perfection, particularly with new devices. But you must insist on rapid improvement. So it is critical that either you, or the manufacturer on your behalf, aggregate and analyze the results of all these steps, looking for patterns that suggest improvements. Seek answers to basic questions such as: can the devices really be trusted? Are they lasting as long as expected? What is causing them to fail?
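
A minimal sketch of that kind of aggregation, assuming a hypothetical log of installations and failures; the questions above reduce to simple summaries such as the failure rate and the average lifetime of failed units.

    from datetime import date

    # Hypothetical fleet log: (device_id, installed, failed_on or None if still running)
    fleet = [
        ("dev-001", date(2023, 1, 10), None),
        ("dev-002", date(2023, 1, 10), date(2023, 9, 2)),
        ("dev-003", date(2023, 2, 1),  date(2023, 3, 15)),
        ("dev-004", date(2023, 2, 1),  None),
    ]

    failed = [(installed, failed_on) for _, installed, failed_on in fleet if failed_on is not None]
    failure_rate = len(failed) / len(fleet)
    avg_lifetime_days = sum((f - i).days for i, f in failed) / len(failed)

    print(f"Failure rate: {failure_rate:.0%}")
    print(f"Average lifetime of failed units: {avg_lifetime_days:.0f} days")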

To carry out the above, clarify who in your organization is responsible for the performance of these devices. Ideally, embed the devices, the data they create, and the responsible person within a process.

To conclude, I wish to emphasize that measurement has always been difficult. Not surprisingly, quality has always been a huge issue. The attention scientists have devoted to making trusted measurements underscores the importance of high-quality data to analytics and data science. We should expect no less in business.
