In Chapter 6, we briefly described the purpose of data quality metrics as part of a self-feeding process for continuous improvement. We also discussed establishing and creating a data quality baseline for better understanding the current state of the data and its proper business alignment and fitness for use.
This chapter expands on that concept by defining a scalable and sustainable process in which data quality metrics become the central point of data quality assessment and, consequently, a critical source for proactive data quality initiatives.
Data quality metrics fall into two main categories: (1) monitoring and (2) scorecards or dashboards. Monitors are used to detect violations that usually require immediate corrective action. Scorecards or dashboards associate numbers with the quality of the data and are snapshot-in-time reports rather than real-time triggers. Notice that the results of monitor reports can be included in the overall calculation of scorecards and dashboards as well.
Data quality metrics need to be aligned with business key performance indicators (KPIs) throughout the company. Each line of business (LOB) will have a list of KPIs for its particular needs, which need to be collected by the data quality forum and properly implemented into a set of monitors and/or scorecards.
Associating KPIs to metrics is critical for two reasons:
1. As discussed earlier, all data quality activities need to serve a business purpose, and data quality metrics are no different.
2. KPIs are directly related to ROI. Metrics provide the underlying mechanism for associating numbers with KPIs and consequently ROI. They become a powerful instrument for assessing the improvement achieved through a comprehensive, ongoing data quality effort, which is key to an overall MDM program.
The actual techniques for measuring the quality of the data for both monitors and scorecards are virtually the same. The difference is primarily related to the time necessary for the business to react. If a critical KPI is associated with a given metric, a monitor should be in place to quickly alert the business about any out-of-spec measurements.
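The reactive side of a monitor can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the threshold values mirror the Red/Yellow/Green bands used later in Table 8.1, and the `print` call stands in for whatever alerting channel (e-mail, paging) would actually notify the DQLA contact.

```python
# Minimal sketch of a DQLA-driven monitor for a lower-is-better metric.
# Threshold values and the alert mechanism are illustrative assumptions.

def evaluate_monitor(measurement: float, green_max: float, yellow_max: float) -> str:
    """Classify a measurement against DQLA thresholds (lower is better)."""
    if measurement <= green_max:
        return "GREEN"
    if measurement <= yellow_max:
        return "YELLOW"
    return "RED"

def run_monitor(measurement: float, green_max: float, yellow_max: float) -> str:
    """Evaluate the measurement and raise an alert if it is out of spec."""
    status = evaluate_monitor(measurement, green_max, yellow_max)
    if status != "GREEN":
        # In practice this would notify the contact listed in the DQLA.
        print(f"ALERT: measurement {measurement:.1%} is {status}")
    return status
```

The same classification logic can feed a scorecard; only the immediate alert step distinguishes the monitor.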
Data quality level agreements (DQLAs) are an effective method to capture business requirements and establish proper expectations related to needed metrics. Well-documented requirements and well-communicated expectations can avoid undesirable situations and a stressed relationship between the data quality team and the business and/or IT, which can be devastating to an overall company-wide data quality program.
The next two sections describe typical DQLA and report components for monitors and scorecards.
Monitors
Bad data exists in the system and is constantly being introduced by seemingly harmless business operations that are, in theory, following proper processes. Furthermore, system bugs and limitations can contribute to data quality degradation as well.
But not all data quality issues are created equal. Some will impact the business more than others. Certain issues have a very direct business implication and need to be avoided at all costs. Monitors should be established against these sensitive attributes to alert the business when violations occur so proper action can be taken.
A typical DQLA between the business and the data quality team will include, for each monitor to be implemented, the kind of information shown in Table 8.1: an identifier, title, description, business impact, associated KPI, data quality dimension, impacted LOB(s), unit of measure, target value, thresholds, frequency, contact, root cause, and fix status.
Table 8.1 describes a potential scenario where a monitor is applicable. Notice the explanation of the root cause of the problem, and the measures being taken to minimize the issue. Sometimes it is possible to address the root cause and, over time, eliminate the need for a monitor altogether. In those cases, monitors should be retired when no longer needed.
ID | DQ001 |
Title | Number of duplicate accounts per customer |
Description | Business rule requires a single account to exist for a given customer. When duplicate accounts exist, users receive an error when trying to create or update a service contract transaction associated with one of the duplicated accounts. |
Business impact | The probability of users running into duplicate accounts is linearly proportional to the percentage of duplicates: a 1% increase in duplicates translates into a 1% increase in the probability of hitting an account error. Each account error delays the completion of the transaction by 4 hours, which increases the cost by 200% per transaction. Keeping the number of duplicates at 5% helps lower the overall cost by 2%. |
KPI | Lower the overall cost of completing service contract bookings by 5% this quarter. |
Dimension | Uniqueness |
Impacted LOB(s) | Services |
Unit of meas. | Percentage of duplicates |
Target value | 5% |
Threshold | ≤ 10% is Green, > 10% and ≤ 20% is Yellow, > 20% is Red |
Frequency | Weekly |
Contact | [email protected] |
Root cause | Duplicate accounts are a result of incorrect business practices, which are being addressed through proper training, communication, and appropriate business process updates. |
Fix in progress? | ___ Yes ___ No _x_ Mitigation ___ N/A |
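The DQ001 measurement itself is straightforward to compute. The sketch below assumes, purely for illustration, that account data arrives as (customer_id, account_id) pairs; the actual schema would come from the systems named in the DQLA.

```python
from collections import Counter

def duplicate_account_rate(accounts):
    """Fraction of customers owning more than one account.

    `accounts` is a list of (customer_id, account_id) pairs; this layout
    is an illustrative assumption, not a prescribed schema.
    """
    per_customer = Counter(customer_id for customer_id, _ in accounts)
    if not per_customer:
        return 0.0
    dup_customers = sum(1 for count in per_customer.values() if count > 1)
    return dup_customers / len(per_customer)

# Example: customer C1 holds two accounts, so 1 of 3 customers is duplicated.
sample = [("C1", "A1"), ("C1", "A2"), ("C2", "A3"), ("C3", "A4")]
rate = duplicate_account_rate(sample)  # 1/3, i.e. Red per the DQ001 thresholds
```

The resulting rate is then compared against the Green/Yellow/Red bands in the DQLA on the agreed weekly frequency.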
Monitor report results are best presented graphically. The graph type should be chosen according to the metric being measured, but it is almost always worth including a trend analysis to show whether the violation is getting better or worse over time.
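A simple way to quantify that trend is a least-squares slope over the periodic measurements. This is one possible approach, not the only one; the weekly cadence matches the DQ001 frequency, and the sign convention assumes a lower-is-better metric.

```python
def trend_slope(weekly_values):
    """Least-squares slope of periodic measurements.

    For a lower-is-better metric (e.g., percentage of duplicates), a
    positive slope means the violation is worsening over time.
    """
    n = len(weekly_values)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(weekly_values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weekly_values))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else 0.0
```

Plotted alongside the raw weekly values, the slope gives the business an at-a-glance answer to "is this getting better or worse?"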
Scorecards
Scorecards are typically used to measure the aggregate quality of a given data set and classify it by data quality dimension.
Recall the data quality baseline from Chapter 6 and the sample shown in Table 6.1. In essence, the numbers for a scorecard can be obtained from regularly executed baseline assessments. The individual scores can be organized in whatever ways the business needs and presented in a dashboard format.
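Aggregating baseline scores into a dashboard view is a simple grouping exercise. The sketch below assumes, for illustration only, that each baseline row carries an entity, attribute, dimension, and a 0–100 score; the actual row shape would follow Table 6.1.

```python
from collections import defaultdict

def scorecard_by_dimension(baseline_rows):
    """Average per-dimension scores from baseline assessment rows.

    Each row is (entity, attribute, dimension, score); this shape and
    the 0-100 score scale are illustrative assumptions.
    """
    scores = defaultdict(list)
    for _entity, _attribute, dimension, score in baseline_rows:
        scores[dimension].append(score)
    return {dim: sum(vals) / len(vals) for dim, vals in scores.items()}

rows = [
    ("Account", "name",  "Completeness", 98.0),
    ("Account", "phone", "Completeness", 74.0),
    ("Account", "id",    "Uniqueness",   95.0),
]
card = scorecard_by_dimension(rows)  # {"Completeness": 86.0, "Uniqueness": 95.0}
```

The same grouping could be keyed by LOB, entity, or attribute instead of dimension, depending on how the business wants the dashboard sliced.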
Table 8.2 shows a subset of Table 6.1, but it also adds thresholds, which will be discussed shortly. The objective is to obtain a score for a particular combination of context, entity(ies), attribute(s), and data quality dimension. Once the score is available, the scorecard report or dashboard can be organized in many different ways, such as by LOB, by entity, by attribute, or by data quality dimension.
Thresholds should be set according to business needs. Data quality issues represented by scores in the red or yellow categories should be the targets of specific data quality projects. Furthermore, the scorecard itself will become an indicator of the improvements achieved.
The scorecard becomes a powerful tool: it associates concrete numbers with the quality of the data, highlights the areas most in need of improvement, and measures improvement or deterioration over time.
Notice that the scorecard alone may not be sufficient to determine the root cause of a problem or to plan a data quality project in detail. The scorecard will highlight the areas that need improvement, as well as measure enhancement and deterioration, but it may still be necessary to profile the data and perform root cause analysis to determine the best way to solve the problem.
The DQLA for scorecards between the business and the data quality team can follow a format similar to Table 8.2.