INTRODUCTION

It has been eight years since my last book and over 15 since I devoted one wholly to data quality. A lot has happened since then – all things data, including big data; the “internet of things”; data-driven cultures; advanced analytics; and chief data officers are penetrating every nook and cranny of every industry, company, department, and job. It’s an exciting time!

Much of my work continues to address data quality, perhaps the least sexy topic in the data space. But it is, in my humble opinion, the most important for two reasons.

First, improving data quality presents an immediate opportunity to free up cash and people for longer-term investments. Companies waste enormous amounts of money, most of it hidden, dealing with bad data. While estimates vary enormously, take 20 percent of revenue or 50 percent of day-in, day-out costs as a starting point. Companies that make diligent efforts to improve data can reduce such costs by up to 80 percent.

Continuing in this vein, leaders and managers can better run their companies and departments when they have data they can trust. Managing a company or department is difficult under the best of circumstances and, quite frankly, I don’t know how many executives do it today without trustworthy, complete, and comprehensive data.

I suspect that such savings are available to government agencies, nonprofits, scientific communities, and others, but I’ve had less experience with them. Similarly, better data keeps us safer, advances equality, leads to better health care at lower cost, and on and on.

Second, bad data stands in the way of building a better future in data. Just as dirty fuel slows, even grounds, a supersonic jet, bad data stymies its full use and thereby prevents companies from gaining a competitive advantage. The disruption Uber brought to the taxi business, simply by connecting two pieces of data (“I’m looking for a ride” and “I’m looking for a fare”), should serve as a loud, clear message that data sparks whole new businesses. But many organizations find it difficult, even impossible, to take the right steps to improve data quality. One of the biggest obstacles is a persistent belief that responsibility for quality should reside with the tech department: “If it’s in the computer, it must be IT’s responsibility.” Misconceptions like this one make a company unfit for good data.

Companies must first address data quality if they are to find their futures. Hence, my title, Getting in Front on Data.

I help companies become fit for data by getting the right people, structures, and cultures in place. More specifically, I help leaders and provocateurs (a term I’ll explore more fully below) see the benefits, gain experience, and build the capabilities they need. Those experiences formed the basis of an article, “Data’s Credibility Problem,” that I wrote for Harvard Business Review in 2013. This book expands on that piece, resynthesizing my ideas and pulling together my work with clients (including some truly transformative successes and more than a few failures) over the past 30 years.

Improving data quality means getting in front on the issues that cause bad data. I hope that sounds obvious—this approach is my second motivation for my title. Obvious as it may appear, too many companies haven’t done what it takes to make it happen. This is too bad. After all, eliminating a single root cause may prevent tens of thousands of errors. Get the right people in the right roles and this happens quickly!

I’ve written this book with three specific audiences in mind:

  • Senior executives. I hope to turn them on to data quality, guide them on building needed organizational capabilities, and help them understand the specific actions they can take. There are enormous gains to be had, but these gains have to be seized!
  • Those charged with leading the data quality effort. My goal is to help these leaders orient their efforts more incisively.
  • Anyone who touches data in his or her job. That is just about everyone. After all, get the data you need and all goes well. Don’t get what you need and the job is far tougher! We have to make corrections, look for other sources, and sometimes even guess. In this respect, data quality is personal. We should all treat it as such!

Frankly, most of us are way too tolerant of bad data. This may be the most important reason that a solvable problem persists. It is time to demand better, to step up to our roles as data customers, clarify what we need most, and take steps to get it. It is also time to recognize that the data we create impacts the next person in line – our customers. We need to do a better job for them as well.

Thus, these roles as data customers and data creators, which we all play every day, are essential. I hope a light bulb goes off and we say to ourselves, “Yes of course I’m a data customer and yes of course I am a data creator!”

While these roles are at once obvious, they are also revolutionary. Getting started is a challenge, requiring provocateurs, who provoke the organization to get in front on data. My fondest hope is that this book unleashes the provocateur in all of us.

But of course, it is not enough that a few people get it. These roles must be driven across the entire work team, department, and company. That takes leaders, data quality teams, embedded data managers, people I call data maestros, with broad and deep expertise, and a specialist chief data architect. It also requires outstanding tech support.

I’ve studied the organic nature of data in organizations for a good long while. With the right people in the right spots, quality improves quickly. Done properly, getting in front on data is beautiful, even elegant. I hope more people come to appreciate this.

At the same time, I’m fully aware that people have less time to study. So I’ve crafted Getting in Front on Data as an instruction book and directed chapters toward specific roles. No matter who you are, you’ll find explicit instructions in here.

I’ve also kept it as concise as possible. Data quality issues come in dozens of forms, but I’ll focus on two that bedevil almost everyone. “The data is wrong” is the most common. “I don’t understand what the data means,” the second issue, is a bit more subtle. A classic example involves NASA losing a Mars orbiter because engineers confused English and metric units.1 Closer to home, almost all companies complain that their “systems don’t talk to each other,” a direct result of the lack of clarity in data definition.

My plan for this book is as follows:

Chapter 1 synthesizes the why, who, how, and when.

Chapter 2 is an unabashed “What’s in it for me?” First, it describes some simple tools to help develop a “case for improvement” at the work team, department, and company levels. Second, it describes the almost irresistible dynamic that causes people and companies to approach data quality the wrong way.

Then Chapters 3 through 8 consider the various data quality roles in turn. Chapter 3 focuses on customers, most of whom, as I’ve noted, have been way too tolerant of bad data for way too long.

Chapter 4, for data creators, consists of two parts. The first part covers “must-dos” for all creators; the second views data creation in the context of process management, an extremely powerful framework and organizing paradigm that more companies should employ.

Chapters 5 through 8 focus on provocateurs, data quality teams, senior leaders, and technologists, respectively.

Along the way, I’ve provided organization charts to help leaders understand the roles and people they need to put in place.

Chapter 9 presents two case studies in great depth. One features AT&T’s work to address the “the data is wrong” issue and the other examines Aera Energy’s efforts on the “I don’t understand what the data means” issue.

All of this may appear overwhelming. So Chapter 10, “Advancing Data Quality,” calls out a few important instructions for everyone. It also urges readers and their companies to get on with it.

This book will not cover every situation. In keeping the focus narrow, I leave it to readers to extend the getting in front approach to other situations. That said, two circumstances demand attention:

  • Those faced by decision makers and data scientists when demand for quality is high and time is short.
  • Automated measurement, connected devices, and the Internet of Things (IoT).

Appendices aim squarely at these.

Further, details and nuance abound. A more senior provocateur has wider latitude than a lower-level one. You’ll emphasize some things when your primary customer is internal and others when it is a paying customer. Finally, good data quality programs mirror their corporate cultures. For example, some companies have the discipline to tackle big, complicated improvement opportunities; others need to break them down into a series of smaller efforts.

So see yourself in these roles. Then dive in. Interpret the instructions in ways that best suit your circumstances and strengths. Have fun, and good luck!
