Chapter 1 Overview

If the state of quality of your company’s products and services was the same level of quality as the data in your databases, would your company survive or go out of business?

– Larry English



A corollary: If the state of quality of your company’s data was the same level of quality as your company’s products and services, how much more profitable would your company be?

– Mehmet Orun

The Impact of Information and Data Quality

Information quality problems and their impact are all around us: A customer does not receive an order because of incorrect shipping information; products are sold below cost because of wrong discount rates; a manufacturing line is stopped because parts were not ordered—the result of inaccurate inventory information; a well-known U.S. senator is stopped at an airport (twice) because his name is on a government “Do not fly” list; many communities cannot run an election with results that people trust; financial reform has created new legislation such as Sarbanes–Oxley.1

Information is not simply data, strings of numbers, lists of addresses, or test results stored in a computer. Information is the product of business processes and is continuously used and reused by them. However, it takes human beings to bring information to its real-world context and give it meaning. Every day human beings use information to make decisions, complete transactions, and carry out all the other activities that make a business run. Applications come and applications go, but the information in those applications lives on.

That’s where information quality comes into play. Effective business decisions and actions can only be made when based on high-quality information—the key here being effective. Yes, business decisions are based all the time on poor-quality data, but effective business decisions cannot be made with flawed, incomplete, or misleading data. People need information they can trust to be correct and current if they are to do the work that furthers business goals and objectives.

A firm’s basis for competition … has changed from tangible products to intangible information. A firm’s information represents the firm’s collective knowledge used to produce and deliver products and services to consumers. Quality information is increasingly recognized as the most valuable asset of the firm. Firms are grappling with how to capitalize on information and knowledge. Companies are striving, more often silently, to remedy business impacts rooted in poor quality information and knowledge.

– Kuan-Tsae Huang, Yang W. Lee,
and Richard Y. Wang2

Tom Redman says it well:

The costs of poor quality are enormous. Some costs, such as added expense and lost customers, are relatively easy to spot, if the organization looks. We suggest (based on a small number of careful, but proprietary studies), as a working figure, that these costs are roughly 10 percent of revenue for a typical organization…. This figure does not include other costs, such as bad decisions and low morale, that are harder to measure but even more important.3

What is the cost to a company of the sales rep, publicly announced to have won the top sales award for the year along with the trip to Hawaii, only to have it rescinded a few days later because the sales data were wrong? Do the resulting embarrassment and low morale influence that sales rep’s productivity and therefore sales, or even his decision to stay with the company? What is the cost to the embassy whose name was splashed across the front pages of a major U.S. city’s newspaper when its visa applications containing sensitive personal and business information, such as Social Security numbers and strategic business plans, were found thrown in an open dumpster instead of being properly disposed of? Does the resulting lack of trust in the management of that information influence another company’s decision to do business in that country?

What Is Information Quality?

Information quality is the degree to which information and data4 can be a trusted source for any and/or all required uses. Simply put, it is having the right information, at the right time and place, for the right people to use to run the business, serve customers, and achieve company goals. Quality information is also fit for its purpose—the level of quality supports all of its uses.

Definition

Information quality is the degree to which information and data can be a trusted source for any and/or all required uses. It is having the right set of correct information, at the right time, in the right place, for the right people to use to make decisions, to run the business, to serve customers, and to achieve company goals.

Where Do Information Quality Problems Come From?

Information quality problems may be caused by human, process, or system issues. They are not restricted to older or particular types of systems. Although everyone is aware that data cause problems from time to time, it may be difficult to perceive the extent to which these problems affect the business. Some normal business activities are indicative of data quality problems5:

  • Correction activities
  • Rework
  • Reprocessing orders
  • Handling returns
  • Dealing with customer complaints

Many of these activities do not appear to be associated with information quality, when in fact they are. Since processes and functions are distributed across an organization and many people, the cost and scope of data quality problems are often not visible.

Business processes create, update, and delete data in addition to applying information in many ways. Information technology (IT) teams are responsible for the quality of the systems that store and move the data, but they cannot be held completely responsible for the content. Both IT and the business must share in insisting on clearly articulated requirements, strict testing of systems, and the development of quality processes for data management.

The Information Quality Challenge

I believe that two major trends have created an environment where information quality is getting more of the attention it deserves. One is the increasing number of legal and regulatory data quality requirements. The need for and benefits from information quality have always been there and ready for any organization that invests in it. But human nature being what it is, the threat of bad publicity and high fines and the risk of a CEO going to jail have created the motivation to actually do something about data quality.

The second trend is the need for businesses to see information brought together in new ways. Examples include the need to see what top customers are doing across the enterprise through CRM (Customer Relationship Management), to have data available for decision support through business intelligence and data warehousing, to streamline business processes and information through ERP (Enterprise Resource Planning), and to deal with the high rate of mergers and acquisitions, which require the integration of data from different companies.

All these initiatives require data integration—bringing together data from two or more different sources and combining them in such a way that new and better uses can be made of the resulting information. Data that previously fulfilled the needs of one particular functional area in the business are now being combined with data from other functional areas—often with very poor results. We have different business uses for the same information; different platforms, systems, databases, and applications; different types of data (customer, vendor, manufacturing, finance, etc.); different data structures, definitions, and standards; and data, processes, and technology customized to fit the business, geography, or application. These are the challenges of the current environment.

What we need is the ability to share information with our customers and with each other across the company. We need the ability to find what we need, when we need it, and to be able to trust it when we get it. What is required for that to happen? We must consciously manage information as a resource (a source of help) and as an asset (a source drawn on by a company for making profit). We must have information that is real (an accurate reflection of the real world), recent (up to date), and relevant (that our business and customers need and care about).

This book is here to help.

About the Methodology: Concepts and Steps

“Doctor, my left arm hurts!” The doctor puts your arm in a sling, gives you an aspirin, and tells you to go home. But what if you were having a heart attack? You would expect the doctor to diagnose your condition and take emergency measures to save your life. After you were stabilized you would expect the doctor to run tests, get to the root cause of the heart attack, and recommend measures to correct any damage done (if possible) and prevent another attack from occurring. The doctor would have you come in for periodic tests and follow-up to assess your condition and determine if other measures needed to be taken.

This seems like common sense when talking about our health. But when it comes to data and information, how often do we address the immediate business problem, then go for the “easy fix” (the aspirin and sling) and expect that to take care of our problems? No tests or assessments are run to determine the location or magnitude of the problems, no root cause analysis is performed, and no preventive measures are put into place. And then we are surprised when problems appear and reappear!

This book describes a methodology, Ten Steps to Quality Data and Trusted Information, that represents a systematic approach to improving and creating data and information quality. The methodology combines a conceptual framework for understanding information quality and The Ten Steps process, which provides instructions, techniques, and best practices. The methodology is for practical use—put it to work to create and improve the quality of information in your business and to establish continuous improvement through better information management.

Just as with your own health, you can use the methodology presented in this book to prevent data quality “health” problems and to assess and take action if they appear. This book provides processes, activities, and techniques that will improve your company’s information quality health. Think of it as your “wellness” program for data and information.

The Ten Steps Process

The Ten Steps are explicit instructions for planning and executing information quality improvement projects with detailed examples, templates, techniques, and advice. They combine data quality dimensions and business impact techniques to present a picture of the current state of data and information quality in your business. Data quality dimensions are facets of data quality you can use to measure and manage your data and information quality—which can only be improved if they can be measured. You will choose the data quality dimensions to measure and manage that best address your business needs.
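To make the idea of measuring a data quality dimension concrete, here is a minimal sketch in Python. It is not part of The Ten Steps process itself. The records, the field names, the two dimensions shown (completeness and validity), and the patterns used to judge validity are all assumptions chosen for illustration; your own dimensions and rules should come from your business needs.

```python
import re

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"name": "Ana Ruiz", "postal_code": "94107", "email": "ana@example.com"},
    {"name": "Bo Chen", "postal_code": "", "email": "bo@example"},
    {"name": "", "postal_code": "9410", "email": "cy@example.com"},
    {"name": "Dee Park", "postal_code": "10001", "email": None},
]


def completeness(rows, field):
    """Share of rows in which the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field))
    return filled / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that match the expected pattern."""
    values = [row[field] for row in rows if row.get(field)]
    if not values:
        return 0.0
    matches = sum(1 for value in values if re.fullmatch(pattern, value))
    return matches / len(values)


# Assumed rules: a five-digit postal code and a loosely formed email address.
name_completeness = completeness(records, "name")
email_completeness = completeness(records, "email")
postal_validity = validity(records, "postal_code", r"\d{5}")
email_validity = validity(records, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+")

for label, score in [
    ("name completeness", name_completeness),
    ("email completeness", email_completeness),
    ("postal code validity", postal_validity),
    ("email validity", email_validity),
]:
    print(f"{label}: {score:.0%}")
```

Each score is a simple ratio, which makes it easy to repeat the same measurement later and compare the results.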

Business impact techniques are quantitative and qualitative techniques for assessing the impact of your information quality on the business. Using them answers the questions “What is the impact of the data quality issues?” and “Why should I care?” Results from assessing business impact are used to establish the business case for information quality. They are also used to gain support for and help determine the optimal level of investment in it. Following the assessments of quality and/or business impact, root cause analysis is conducted and appropriate actions for preventing and correcting data quality issues are put into place. Communication is critical to the success of any data quality effort, so it too is one of the Ten Steps that takes place throughout the life of every project.

All of the information contained in The Ten Steps is “how-to.” But just as you want a doctor who understands the theories and concepts of medicine so that specific actions can be correctly applied to your medical concerns, you also need to understand information quality basics so that the “how-to” can be properly applied in the many different situations that arise in your company. For that reason, the key concepts are presented first in this book, followed by The Ten Steps process.

The Key Concepts

The key concepts provide the foundation for understanding what information quality is and what is required to achieve it. They include the Framework for Information Quality, the Information Life Cycle, and the Information and Data Quality Improvement Cycle.

The Ten Steps process describes how to implement the key concepts. Just as with your own health, you can use the methodology presented in this book to prevent data quality “health” problems and to assess and treat them.

The Framework for Information Quality (FIQ) establishes the conceptual structure for understanding the components that contribute to quality. It helps you understand an existing complex environment that creates information quality problems. The concepts from the framework can also be used when developing an environment that will produce high-quality data.

The Information Life Cycle provides a view of how information is obtained, used, and discarded. When you use it to look at your information, you can see how quality information affects business processes during all phases in the life of information. By managing information as a resource throughout its life cycle, you can maximize its value to the business.

The Information and Data Quality Improvement Cycle explains the assessment, awareness, and action cycle as it leads to continuous quality improvement, for which The Ten Steps process provides concrete directions for execution.

Additional key concepts discussed include data categories (groupings of data with common characteristics or features such as reference data, master data, transactional data, and metadata), data specifications (data standards, data models, business rules, metadata, and reference data), data governance and stewardship, and best practices and guidelines for implementation. Other concepts introduced as key concepts, detailed in later chapters, include data quality dimensions, business impact techniques, and The Ten Steps process.

Additional Material

If you haven’t already done so, please read the Introduction. It contains useful background to help you make use of this book. Once you become familiar with The Ten Steps, Chapter 4 provides additional detail to the discussion on projects begun in this chapter. Techniques that can be used in several places throughout the methodology are included in Chapter 5.

The Appendix contains at-a-glance references for some of the concepts presented in the book. These are great for keeping close at hand (on your office or cube wall) to provide a quick reference for ideas that have been explained more thoroughly in the chapters that follow. Of course, the usual List of Figures, Glossary, Bibliography, and Index provide additional ways to help you find what you are looking for and where to go for more information.

A Word about Terminology

The full Ten Steps methodology consists of key concepts (outlined in Chapter 2) and The Ten Steps process—that is, Steps 1 to 10 (detailed in Chapter 3). For brevity I use the term “Ten Steps” to refer to the methodology (whose full name is Ten Steps to Quality Data and Trusted Information) and the phrase “The Ten Steps process” to refer to the steps themselves (the names of which appear in italic wherever they appear in the book). The point to remember is that the methodology is not just about The Ten Steps process, but is also about the key data quality concepts that underlie them.

Definition

In this book, a project is defined as any significant effort that makes use of the methodology.

Approaches to Data Quality in Projects

In this book, a project is any significant effort that makes use of The Ten Steps methodology. A project team can consist of a single person or a group of people. A project can focus solely on data quality improvement, or it can be data quality tasks integrated into another project or methodology (e.g., new application development or data migration or integration such as in a data warehouse or ERP implementation). A project can be the conscious application of specific steps in the methodology by an individual to solve an issue within his or her area of responsibility.

As briefly explained earlier, The Ten Steps process provides explicit instructions for planning and executing information quality improvement projects, with detailed examples, templates, techniques, and advice. The process is intended to be flexible so that you can use those steps and activities applicable to your needs and situation. I want to introduce you now to some of the ways the Ten Steps can be used, even though you know them only at a high level. (More detail on applying The Ten Steps process to projects is provided in Chapter 4—but do read Chapter 3 first.)

What is useful to know at this point is that The Ten Steps process is designed to be used by projects in a pick-and-choose manner. That is, use the steps in different combinations to initiate projects with different approaches or to engage the process at different levels of detail. To make the methodology more accessible no matter where you are in your data quality journey, use these simple guidelines:

  • Pick and choose the steps to structure a project that fits your situation. You don’t have to use all of them.
  • Any of the steps can be carried out at varying levels of detail. You make the choice as to which fits your needs.
  • Use your knowledge of the key concepts to help you choose the steps and level of detail when implementing them.

Project Approaches

The methodology can be used in many different business situations (data quality improvement, data warehouse development, ERP migration, etc.). You may be trying to have data quality activities integrated into any of these types of projects, and the projects may be at different stages of completion. For example, before obtaining full management sponsorship, you may need to build a business case for data quality. Or a specific data quality problem may have already been identified and the need is to determine business impact or find root causes and implement solutions. In any of these examples different steps and activities in The Ten Steps process can be used to address what is needed now. Some typical approaches to applying the methodology are described in the following sections. Remember, you can use these approaches whether you are an individual or a member of a project team.

Establish Business Case

An Establish Business Case approach may be an exploratory assessment or a quick proof of concept assessing quality on a very limited set of data. As an individual, you can implement a brief project that will help you make a business case for further data quality improvements. If you already have a specific data quality problem, you may just want to assess the business impact of that problem without further quality assessment.

Use the methodology to think through the problem, understand the information environment at a very high level, and write a few queries against the data. Use some of the less complicated or less time-consuming business impact techniques to quickly demonstrate the business impact of the data quality problems you discover. Decide if there is a need for in-depth efforts to deal with data quality.
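As an illustration of what “a few queries against the data” might look like, the following sketch uses Python’s built-in sqlite3 module with an in-memory database. The table, columns, sample rows, and the two checks (missing shipping addresses and out-of-range discount rates) are hypothetical; in practice you would run comparable queries against a copy or extract of your own data.

```python
import sqlite3

# In-memory database standing in for an extract of real order data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, customer TEXT, ship_address TEXT, discount_pct REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", "12 Main St", 5.0),
        (2, "Acme", None, 5.0),           # missing address
        (3, "Birch Co", "", 10.0),        # empty address
        (4, "Cedar LLC", "9 Oak Ave", 120.0),  # discount above 100 percent
    ],
)

total_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Orders that could not be shipped because the address is absent or blank.
missing_address = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE ship_address IS NULL OR TRIM(ship_address) = ''"
).fetchone()[0]

# Orders whose discount rate falls outside an assumed valid range of 0 to 100.
bad_discount = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE discount_pct < 0 OR discount_pct > 100"
).fetchone()[0]

print(f"{missing_address} of {total_orders} orders lack a shipping address")
print(f"{bad_discount} of {total_orders} orders have an out-of-range discount")
conn.close()
```

Counts like these are often enough to start a conversation about business impact before any in-depth assessment is funded.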

An Establish Business Case project may extend over a few days or weeks with one person doing the bulk of the work while enlisting help from colleagues (e.g., to gain access to data or determine business impact).

Establish Data Quality Baseline

An Establish Data Quality Baseline approach is used when the business has committed to improving data quality and there is support for a project team and resources. The project team will be looking at data in a single database or comparing them across databases. The data quality assessment will take longer here than in Establish Business Case. The goal is not just to uncover problems but to determine which ones are worth addressing, to identify the root causes for the high-priority issues, and to develop realistic action plans for improvement. The project may include purchasing and/or using some data quality tools for profiling or cleansing.

Often those who can correct the data errors found or implement the recommended improvements are not those on the project team. In this case, a new project is needed to implement improvement plans recommended by the baseline project.

Determine Root Causes

A Determine Root Causes approach is used when you already know the data quality issues and have decided that the impact of those issues warrants further investigation into their root causes. This may take the form of a focused workshop or a series of workshops that use techniques from the methodology for determining root causes. Once the causes are uncovered, this project should include developing specific recommendations for addressing them. The end result will be to find owners and gain commitment to implement the improvements.

Implement Improvements

An Implement Improvements approach executes the recommendations developed when the data quality assessment and business impact analysis have generated a data quality improvement plan. Or the recommendations may be data quality improvements suggested by another project. Many companies have resources dedicated to correcting data errors—but with no attention to preventing them. Correcting errors is an important part of improvement, particularly if the data errors are causing critical business problems. (For example, a product cannot be shipped and the problem is traced back to faulty master data.) However, spending the majority of resource time on correction with little or no time on prevention is a common pitfall and will only lead to more time wasted fixing more problems in the future.

It is equally important to ensure that data errors are prevented. Some of the prevention activities will warrant a project (e.g., to define and document data standards); others may be specific actions (e.g., to assign new quality-related responsibilities to an existing team and complete the associated training). Some recommendations will go to the business to improve its processes (e.g., to collect information needed or to educate and reward sales reps for synchronizing customer information from their handheld devices with the central database on a weekly basis).

Implement Ongoing Monitoring and Metrics

An Implement Ongoing Monitoring and Metrics approach focuses on instituting operational processes for monitoring, evaluating, and reporting results. When designing and implementing your control processes, remember to include actions for addressing issues found—both to correct current errors and to prevent future ones. It is less expensive and more efficient to incorporate monitoring and metrics during the initial system implementation.

Make use of the results from Establish Data Quality Baseline, if done, as part of your monitoring to provide the baseline metrics for tracking data quality improvement. Any data quality assessment will reveal many data quality issues—some big, some small, some important, some not. You may need to assess business impact to determine which data have high business value and are therefore worth tracking on a regular basis. The monitoring should also show if prevention improvements put into place are achieving the desired results.
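One way to picture tracking against a baseline is sketched below in Python. The metric names, the baseline, current, and target values, and the three status labels are all invented for illustration and are not prescribed by the methodology; the point is only that each monitored metric is compared with both where it started and where it needs to be.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    """One monitored data quality metric, expressed as a score from 0 to 1."""

    name: str
    baseline: float  # score recorded by the baseline assessment
    current: float   # score from the latest monitoring run
    target: float    # score the business has agreed is good enough

    def status(self) -> str:
        if self.current >= self.target:
            return "on target"
        if self.current > self.baseline:
            return "improving"
        return "needs action"


# Hypothetical metrics and scores.
checks = [
    MetricCheck("address completeness", baseline=0.82, current=0.96, target=0.95),
    MetricCheck("discount validity", baseline=0.90, current=0.93, target=0.99),
    MetricCheck("duplicate-free customers", baseline=0.88, current=0.85, target=0.97),
]

report = {check.name: check.status() for check in checks}
for name, status in report.items():
    print(f"{name}: {status}")
```

A metric that stays at or below its baseline after prevention measures are in place is a signal to revisit root causes, not only to correct the current errors.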

Address Data Quality as an Individual

You as an individual may be assigned to address data quality by yourself. If so, any of the project situations just described could apply to you. Even without a project team, you may need to establish a business case, do an initial assessment or implement monitoring on a focused set of data, put into operation a specific data quality improvement, or address other information quality issues. As an individual you will still need to apply good project management processes (e.g., you have to handle your scope carefully, and you still need management support, even if only from your direct superior). Also, you will still need to decide which steps to incorporate, communicate effectively, and consult with other knowledgeable sources to meet your project goals.

In some circumstances you will simply find techniques that are useful to incorporate into your everyday processes. In all cases, you will be required to make good decisions to apply the key concepts and The Ten Steps process to your particular situation. Everything in this book can help you as an individual contributor—it will just be implemented in a more abbreviated fashion than if you had a project team.

Integrate Data Quality Activities into Other Projects and Methodologies

You can combine The Ten Steps concepts and techniques with your company’s favored project management style. Many data quality activities can and should be included in the various phases of a project life cycle and can also be integrated into a third-party methodology such as a vendor’s ERP migration program. Building a new application, migrating data from existing applications to new ones, and making process improvements are other examples of where data quality activities can benefit a project. For example, you can improve the quality of, and decrease the time needed to complete, source-to-target mappings or institute data clean-up as part of any migration project.

Careful planning at the beginning of a project helps ensure that appropriate data quality activities are fully integrated into it. The earlier that data quality is incorporated into the project, the better. But even if you are engaging the project later in its timeline, adding in suitable data quality activities can still significantly contribute to its success.

Engaging Management

Support from the right level of management and suitable investments in time, money, and people are essential for success. While the critical topic of obtaining management support is far too broad for the scope of this book, the following suggestions are presented to stimulate your thinking about how you can engage your management.6

Best-Case Scenario—Engage the CEO. In the best-case scenario, the CEO of your company will be completely convinced of the need to improve information and data quality, and will allocate resources to support the culture-change activities that are necessary to create an environment that supports continuous quality improvement.

Right Level of Management—Not everyone who initiates an information and data quality improvement project has access to the CEO. The Establish Business Case approach, for example, is designed so that an individual can initiate the project and use it to gain support from management and team members. A successful project can be a significant victory for a department, and engaging the right level of management is necessary to make a project successful.

Using the Methodology—Use the results of the appropriate business impact techniques from Step 4 to show the importance of information quality and to gain support for your information quality improvement project. You can also incorporate ideas from Step 10—Communicate Actions and Results to help you engage managers in the information improvement process. Prepare managers for expected levels of resource commitment by communicating the business need and the improvement plan. This will increase the likelihood that you will receive the time, money, and participation needed. Likewise, provide regular status reports to managers and other working groups to enable them to see progress and continue to champion your project, and to prevent conflicting or duplicated work.

Communication Techniques—The ideas presented in Step 10—Communicate Actions and Results (see Chapter 3) will help you optimize communication of important data quality concepts and project progress. Use those ideas to plan your communication strategy. Of course, communication goes two ways and it is vital to open a dialogue, get feedback, gauge reaction, and gain trust.

Know Your Audience—Knowing your audience’s goals, values, and success criteria is one of the best ways to communicate effectively. Various audience groups need different communication formats and different levels of detail.

Right-Size the Message—Even though you may be excited about your improvement project, your management audience rarely needs to hear the details of your data profiling or the impressive way in which you can now merge records to decrease redundancy. What they want to hear is that the improvement is working and that it is going to positively impact their work. Right-size the message to emphasize the topics that are most important to them.

Repeat—Repeat your project goals and review the milestones. Repeating your goals is especially important for communicating with managers, who need to be able to perceive the essentials of a project and track its pace and resource usage.

Broaden the Scope—Be alert for opportunities to communicate your project goals and the information quality key concepts to a broader audience. The famous “elevator pitch” (a 30- to 60-second summary) is a good technique for communicating to new audiences outside of a formal presentation. Also consider creating a four-slide project summary to make available in other presentations, such as department meetings or quarterly updates.

Manager as Resource—Your manager can be a valuable networking resource, so keeping her or him informed performs double duty—ensuring support for your project and connecting you to other projects or efforts with which you can collaborate on shared goals.

1The Sarbanes–Oxley Act of 2002 was enacted in the United States with the purpose of protecting investors by improving the accuracy and reliability of corporate disclosures.

2Kuan-Tsae Huang, Yang W. Lee, and Richard Y. Wang, Quality Information and Knowledge (Prentice Hall PTR, 1999), p. 2.

3Tom Redman, Data Quality: The Field Guide (Digital Press, 2001), p. 3.

4Data are known facts or other items of interest to the business; information refers to facts within context. Examples of data are “01/16/2008” and “752-5914”; an example of information is “Order #752-5914 was shipped on 01/16/2008.” While there are academic differences between the two concepts, this approach does not generally differentiate between data and information. (There are a few exceptions, which are noted in the book.) Some organizations respond to “data quality,” others to “information quality.” It may also vary depending on whom you are speaking with. I tend to use information when talking to businesspeople, and data when discussing more detailed issues with those in IT or others responsible for the data. Use the term that will be most effective for you.

5Jack E. Olson has an excellent discussion of reasons for and impacts of data quality problems in his book Data Profiling: The Accuracy Dimension (Morgan Kaufmann, 2003), pp. 13-16.

6I’m indebted to Rachel Haverstick for her assistance in expressing these important ideas.
