CHAPTER 8: Great Data Quality Programs Need Great Tech

Tech departments are in tight spots when it comes to data quality. Many people automatically assume Tech is responsible for all things data, a sort of “if it’s in the computer, it must be Tech” logic. But of course Tech is neither an important data creator nor an important data customer, so it is singularly ill-positioned to do much about quality. Still, when the data doesn’t meet their needs, people blame Tech.

Similarly, companies want to increase capacity and decrease costs by automating their business processes. But to paraphrase Dr. W. Edwards Deming: “Don’t automate processes that produce junk. You’ll just produce more junk faster.”37 While Dr. Deming was talking about factory automation, his words are no less true for computers and data. Automating a broken business process just produces more bad data, faster. Companies make this mistake over and over. Then they blame Tech.

This dynamic may take place with hidden data factories as well. A department or company recognizes it has an ineffective data factory and asks Tech to automate it. Results do not satisfy. And Tech gets blamed.

At the same time, companies want to do more with their data and they need technology to do so. They know they need better data, but they think that the latest technology, be it an enterprise system, a vendor-hosted application, a data warehouse, software as a service, a data lake, a master data management or data governance tool, or cloud-based hosting, will somehow correct erroneous data already in place and ensure quality going forward. It just doesn’t happen this way. Even worse, a mistaken confidence in new technologies distracts companies from placing responsibility for data where it belongs. Companies make this mistake over and over. And blame Tech.

Indeed, new technologies can exacerbate data quality issues. Obviously a new application that helps solve a nagging business problem is all to the good. But today most purchased systems and applications come fully equipped with their own data structures and data definitions. This is exactly counter to the getting-in-front approach, which demands that the business, not Tech and certainly not an external vendor, assume responsibility for defining data. The result is that the data don’t quite line up, new systems to old, exacerbating the “systems don’t talk” issue discussed in Chapter 4.

The obvious next step is that Tech sets up a data factory that translates the data back and forth as needed. This is hard work, fraught with error even under the best of circumstances. More systems mean a larger, more complex, and more expensive factory, inevitably creating more errors and leading to more misinterpretations by customers. Not surprisingly, Tech catches the blame. All the while, the sheer growth in data volumes and the pace of technological change continue to explode. Still, companies ask their Tech departments to do more with less.

Tech is not blameless in any of this. It continues to automate broken processes, including other departments’ data factories, the history of dismal failures notwithstanding. Tech continues to purchase systems without a moment’s thought to how they will fit into a broader data architecture, already an ill-defined and ill-tamed rat’s nest. And it continues to build larger, more complex data factories to compensate for its failure in this regard.

I sometimes find the Tech department a bit schizophrenic when it comes to data quality. Most readily admit that the business must own the data, that they’ve been unfairly tasked with too many data efforts they cannot do well, and that their reputations have suffered as a result. Still, they can’t give data up, almost like moths drawn to a flame.

At the same time, great data quality programs need great Tech support: easy-to-navigate data dictionaries, automated measurements and controls, human-to-machine and machine-to-machine interfaces that promote data quality, and so forth. So Tech departments must walk a fine line. They must simultaneously provide solid support (far better than most do currently) and convince their business counterparts that the business, not Tech, must take principal responsibility for data quality. Tech must ensure that data translation is done well, insisting that the business deliver clear data definitions. Over the long term, Tech must get in front by simplifying the data architecture. Finally, Tech must have the courage to “just say no” if business counterparts don’t hold up their end.

Instructions:

  1. Recognize and deal with the organizational and political realities you face in data space. I’ve discussed these realities above.
  2. Take responsibility for storing data safely, securely, and efficiently, as it is created; moving it where it is needed; and delivering it to data customers. Put in place the technological means to do so.
  3. Automate well-defined processes to translate data as it moves between data creators, data customers, organizations, applications, and systems.
  4. Contribute to the data quality effort in other ways. Automate well-defined processes, controls, measurements, and data dictionaries. Help with tool selection and configure those tools. Design interfaces that promote data quality.
  5. Don’t take responsibility for data quality. Resist the temptation to automate poorly-defined processes. To the degree you can, help others assume their responsibilities.
  6. Use all definitions, especially the common ones, in new systems development going forward. Insist that the chief data architect deliver solid data definitions and clear standards. Do not accept vendors’ data definitions unless they align with your own.
  7. Then, and only then, take steps to simplify the data architecture.
  8. Build the organizational capabilities needed to implement these instructions.

Store, Move, and Deliver Data Safely and Securely

The most important goal of this book is to clarify managerial responsibilities for data quality. I’ve stressed the moments of data creation and data use. Get those wrong and quality suffers!

But of course there is more. Upon creation, data does not magically store itself safely away, nor does it announce itself, on cue, at the moment of use. It does not secure itself, distinguishing “good guys” with rightful access from “bad guys,” nor does it protect itself from those who wish to steal it. If data creators and data customers bear responsibility for data at select moments, Tech bears responsibility at all other times. This is the blocking and tackling of data management: demanding, unappreciated work. Tech must take care of these things, for both the data per se and the associated data definitions.

Automate Data Translation as Data Moves From System to System

It bears repeating that Tech must not take responsibility for data creation. At the same time, one of Tech’s roles is to automate well-defined processes, including translating data from the language of one system or application into that of another as it moves between them. To do so, Tech needs data definitions and any difficult translation logic. In this respect Tech is a customer like any other, and it should follow the instructions of Chapter 3.

Tech does, of course, create and use some data and data definitions for its own purposes. Clearly it must assume the roles of data creator and data customer in those situations.

If Tech fails to follow those instructions, it may be left with poor definitions and no translation logic—a recipe for disaster. I recommend that Tech be circumspect here. Tech may be capable of developing translation logic if it has clear data definitions, but I don’t see how it can proceed without them.

Examples of such translation include converting money from one currency in a local system to another in a global system; converting a “contact” in a Marketing system to a “prospect” in a Sales system; and converting a “drawing” in a CAD system to “specs” in a Manufacturing system.
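To make the second of these concrete, here is a minimal sketch of what documented translation logic might look like, assuming hypothetical Marketing and Sales record formats (the field names and the qualification threshold are illustrative, not drawn from any particular system):

```typescript
// Hypothetical record formats. The real definitions must come from the
// business via the chief data architect; Tech should not invent them.
interface MarketingContact {
  fullName: string;
  email: string;
  leadScore: number;      // Marketing's 0-100 interest score
}

interface SalesProspect {
  name: string;
  email: string;
  qualified: boolean;     // Sales cares only whether the lead is qualified
}

// Documented translation logic: a Marketing "contact" becomes a Sales
// "prospect." The threshold is business logic that the data creator and
// data customer must agree on and review; Tech merely automates it.
const QUALIFICATION_THRESHOLD = 60;

function contactToProspect(contact: MarketingContact): SalesProspect {
  return {
    name: contact.fullName,
    email: contact.email,
    qualified: contact.leadScore >= QUALIFICATION_THRESHOLD,
  };
}
```

The point is not the code itself, but that every mapping and threshold is explicit, reviewable, and traceable to an agreed-upon business definition.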

The more detailed instructions under these circumstances are:

  1. Help the chief data architect provide you the definitions you need.
  2. If there is no chief data architect, work with systems providers, integrators, and others to obtain them. However you obtain them, make sure you understand them.
  3. Document the translation logic and review it with the chief data architect. Alternatively, review the logic with those on both ends of the translation, i.e., the original data creator and customer.
  4. With these documents in hand, consider yourself an “enabled data creator.” So follow the instructions of Chapter 4, including those on measurement and control (a sketch of one such control follows this list).
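As one example of such a control, a translation step can reconcile what left the source system against what arrived in the target, quarantining records that fail translation for review rather than dropping them silently. A minimal sketch, with invented names:

```typescript
// Records that fail translation are quarantined for human review,
// never dropped silently.
function translateBatch<TIn, TOut>(
  records: TIn[],
  translate: (r: TIn) => TOut,  // may throw on a record it cannot handle
): { translated: TOut[]; quarantined: TIn[] } {
  const translated: TOut[] = [];
  const quarantined: TIn[] = [];
  for (const record of records) {
    try {
      translated.push(translate(record));
    } catch {
      quarantined.push(record);  // hold for review
    }
  }
  return { translated, quarantined };
}

// The control proper: after loading, counts in the target plus counts in
// quarantine must equal counts from the source.
async function reconcile(
  sourceCount: number,
  quarantinedCount: number,
  countInTarget: () => Promise<number>,  // e.g., a row count in the target system
): Promise<void> {
  const targetCount = await countInTarget();
  if (targetCount + quarantinedCount !== sourceCount) {
    throw new Error(
      `Reconciliation failed: ${sourceCount} in, ${targetCount} loaded, ${quarantinedCount} quarantined`,
    );
  }
}
```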

Contribute to the DQ Effort

Tech should treat DQ teams, embedded data managers, and data creators as customers. These people have enormous (and growing) needs for technology to make measurements, automate controls, promulgate data definitions, and lock in process improvements. They need help in selecting the best tools and of course Tech must install, configure, and maintain those tools.

I’ll cite two other examples of the sorts of contributions Tech can make. Not too long ago, you filled in a web-based form and hit “send.” Sometime later, you might get an email advising that “we still need your address,” because evidently you hadn’t entered it. The email message is an attempt at a control, one that most recipients ignored.

A better control, which most websites now employ, involves checking the form as people enter the data. If something is missing, the site stops them from going on to the next part of the form and highlights the missing data. A far better control!
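A minimal sketch of such an inline control, assuming a hypothetical sign-up form (the element IDs and CSS class are illustrative):

```typescript
// Inline control: check required fields as the user works through the
// form, and block progress until the missing data is supplied.
function validateStep(form: HTMLFormElement): boolean {
  let ok = true;
  const required = form.querySelectorAll<HTMLInputElement>("input[required]");
  for (const field of Array.from(required)) {
    if (field.value.trim() === "") {
      field.classList.add("missing");      // highlight the missing data
      ok = false;
    } else {
      field.classList.remove("missing");
    }
  }
  return ok;
}

document.getElementById("next-button")?.addEventListener("click", (event) => {
  const form = document.getElementById("signup-form") as HTMLFormElement;
  if (!validateStep(form)) {
    event.preventDefault();  // stop the user from moving to the next part
  }
});
```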

A second example involves a human-to-computer interface that makes it easier for people to enter and interpret data. One great feature is what I call the “instant data dictionary”: when a data customer positions his or her mouse over a term, its definition pops up in a little window. No searching around!
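A minimal sketch of such an instant dictionary, assuming terms are marked up with a data-term attribute and that the definitions come from the business’s agreed data dictionary (the sample entry is invented):

```typescript
// Instant data dictionary: hovering over a marked-up term pops up its
// definition. This code only displays definitions; their content must
// come from the business's data dictionary.
const definitions: Record<string, string> = {
  prospect: "A contact qualified by Sales as a potential buyer.",
  // ...loaded from the business's data dictionary in practice
};

document.querySelectorAll<HTMLElement>("[data-term]").forEach((el) => {
  el.addEventListener("mouseenter", () => {
    const definition = definitions[el.dataset.term ?? ""];
    if (!definition) return;
    const tip = document.createElement("div");
    tip.className = "definition-popup";   // the little window, positioned via CSS
    tip.textContent = definition;
    el.appendChild(tip);
  });
  el.addEventListener("mouseleave", () => {
    el.querySelector(".definition-popup")?.remove();
  });
});
```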

Don’t Take Overall Responsibility for Data Quality

Since Tech does not own a broken business process, it cannot fix it. Since Tech doesn’t own an important decision, it cannot tell the decision-maker what data is most critical. Since Tech doesn’t speak the same language as business people, it cannot dictate what business terms mean. Nor can Tech hold data creators and data customers accountable for following the instructions called for herein.

Most of the hard work of data quality management cannot be performed by Tech nor automated by technology. Indeed, Tech is singularly ill-positioned to lead a data quality effort. Too many people and companies assume otherwise.

Worse, too many Tech departments get tagged with data quality. In some cases, a subtle dynamic lies at the root of the misassignment. Data quality enters the conversation while some new system is under development. Its users have naturally assumed its data will be better than the old system’s. If not, why go to the new system? Interestingly, no one points out that simply moving bad data from an old system to a new one can’t improve quality.

For its part, Tech recognizes that bad data will imperil acceptance of its new system. With a big development project in jeopardy, Tech decides it must do whatever it can to clean up data. So it creates a data factory, though this one is expensive and not so easily hidden. And in so doing, Tech gets tagged with data quality. While the details differ, the underlying dynamic is exactly the same as with the rising star’s assistant in Chapter 2.

As with all hidden data factories, management must get in front of the dynamic. At a minimum, Tech should just say “No. We can’t do data quality well” and move on. This instruction also applies to calls to automate a broken process. Tech departments violate this instruction all of the time. I frequently hear two rationalizations:

“Data quality is important and, if no one else will address it, then Tech must.”

“If we don’t address data quality, then our systems will look bad.”

While both rationales seem compelling, they are based on the faulty premise that Tech can address data quality. It takes courage to admit you can’t, but displaying that courage beats the alternative!

More proactively, Tech must do a better job recognizing the effort required from its business counterparts if its projects are to succeed. It must be completely transparent about the people, time, and costs these efforts require and build them into its development plans. And it must engage early on to secure those resources.

Finally, and most proactively of all, senior Tech management should work with its business counterparts to put a proper data quality program in place.

Use Business Data Definitions in Systems Development

To open this chapter, I noted that Tech gets tagged with the “systems don’t talk” issue and the reduced quality that comes with it. In Chapter 4, I showed that the root of the problem lies in language and can only be resolved by the business. I also laid out several instructions for doing so. The last instruction, however, “Use all definitions, especially the common ones, in new systems development going forward,” is more properly directed at Tech. Doing so helps ensure that new systems faithfully represent the language employed by their users, in turn helping data customers know what the data mean.

Now to be clear, this instruction is not “insist that all new systems only use pre-existing data definitions.” That is far too rigid. But do understand, early on, the definitions a potential new system will employ, how closely aligned the system’s definitions are to your own data definitions, and what it will take to translate between the two.

Reduce Data Translation by Simplifying the Data Architecture

Data translation is necessary because data customers need data in the form and in the applications where they can best use it. At the same time, translation is fraught, contributing errors and leading customers to misinterpret data. Instructions so far aim to ensure that translation is done properly. When possible, Tech departments should also reduce the number of required translations.

Doing so depends on eschewing system-to-system translations in favor of an approach that makes reference to common definitions. The arithmetic is compelling: with ten systems, point-to-point translation can require as many as 10 × 9 = 90 one-way interfaces, while translating through common definitions requires at most 20, one in and one out per system. Figure 8.1 illustrates the basic idea.

To be clear, common data definitions are necessary, but not sufficient.38 Tech must also:

  1. Shift its development focus from one that concentrates solely on providing functionality to one that also embraces the “fit” of systems with one another.
  2. Use these common data definitions along with a conceptual data model, and ensure that all future systems (including purchased systems) adhere to them.
  3. To the degree possible, separate data from application and system.
  4. Acquire the skills needed to execute these steps; a conceptual data modeler and systems engineers are essential.

Figure 8.1 Common data definitions can simplify data architecture by reducing the number of interfaces needed between systems.
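In code terms, the idea is that each system translates to and from a single common form rather than directly to every other system. A minimal sketch, with invented record formats:

```typescript
// With a common form, each system needs only two translations (in and out),
// so n systems need at most 2n translators instead of up to n(n-1)
// point-to-point interfaces.
interface CommonCustomer {   // shaped by the common data definitions
  name: string;
  email: string;
}

interface Translator<TLocal> {
  toCommon: (local: TLocal) => CommonCustomer;
  fromCommon: (common: CommonCustomer) => TLocal;
}

// Moving a record from system A to system B always goes through the
// common form; A and B never need to know each other's formats.
function move<A, B>(record: A, a: Translator<A>, b: Translator<B>): B {
  return b.fromCommon(a.toCommon(record));
}
```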

A generation ago, a bedrock principle in corporate computer departments was the separation of data from application, taking the simplified architecture of Figure 8.1 still further. With respect to data quality, decoupling the two had several advantages:

  1. It made it easier to build and maintain applications, without disturbing the data.
  2. It made it easier to add to and change the data, without disturbing the application.
  3. It facilitated data sharing—several applications could use the same data.
  4. It minimized the error-prone, time-consuming, and costly work of moving data around and translating it.

Embrace this principle to the degree you can, even if full adherence is not possible.
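A minimal sketch of the principle, using a hypothetical customer repository: applications depend on one shared, stable data contract rather than on each other’s internal storage:

```typescript
// Data decoupled from application: every application reads and writes
// through one shared contract, so applications and data can evolve
// independently.
interface Customer {
  id: string;
  name: string;
  email: string;
}

interface CustomerRepository {
  get(id: string): Promise<Customer | undefined>;
  save(customer: Customer): Promise<void>;
}

// Billing, Marketing, and Support applications all take the same
// repository; none of them owns or duplicates the customer data.
async function sendInvoice(repo: CustomerRepository, customerId: string) {
  const customer = await repo.get(customerId);
  if (!customer) throw new Error(`Unknown customer: ${customerId}`);
  // ...render and send the invoice using customer.name and customer.email
}
```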

Put in Place a Powerful Team to Support Data Quality

Figure 8.2 depicts the team needed to carry out these instructions. Except for the Business Customer Team (which serves exactly the same role as a data creator’s Customer Team), the roles line up with the sections above. More and more Tech departments are also federating, with some groups aligning more closely with the day-in, day-out work of business functions and others dedicated to infrastructure. With the possible exception of the Architecture Team, these roles should line up as closely with DQ Teams, data customers, and data creators as possible.

Throughout this book I’ve stressed the importance of communications among all who touch data. Without it, too many things go wrong. And without a dedicated role, people and organizations revert to their silos. In many respects, the instruction to assign a Business Customer Team is no different than a data creator assigning a Customer Team. But I find this work much more difficult for Tech. I see plenty of reasons—the tech community (not surprisingly) views the work as non-technical, there are no career paths, and many in the Tech community are introverts, more comfortable at their keyboards than in face-to-face meetings with customers.

Figure 8.2 Organization structure for tech data quality team.

I have no easy solutions, though every technologist who establishes an ongoing dialogue with his or her business counterparts is glad he or she did!

In Summary

For too long, Tech has either taken or been given primary responsibility for data quality. But Tech is neither a major data creator nor data customer, so this responsibility is misplaced. Tech must help get these roles properly established.

Further, excellent data quality programs require excellent Tech support, particularly as they scale up. Much detailed work is needed and Tech’s first role is to do it well. Over the longer term, Tech can make an enormous contribution if it simplifies the data architecture. It should only take on that work if its business counterparts hold up their end in developing and standardizing business terminology/data definitions. Finally, to do these things, Tech must align more closely with the business, something it has not done well in the past.
