Data Correction Initiatives


The ultimate goal of measuring data quality is to identify what needs to be fixed. The data quality baseline will, in essence, lead to a series of data quality projects. Data correction generally comes down to two main activities:

1. Cleanup of existing bad data

2. Correction of the offending system, process, or business practice causing the data problem

If the data quality baseline score is low for a given element, a data cleanup effort will obviously be required to improve the score. However, the source of the offending data is not always an ongoing issue. The bad data may have been caused by a one-time data migration effort, negating the need to correct the migration code, for example. It is also possible that an already corrected software bug or bad business process introduced the offending data, again negating the need to readdress the origin of the problem. Nonetheless, a detailed root cause analysis must be completed: no assumptions should be made about why bad data was introduced in the first place. As discussed earlier in this chapter, proactive data quality measures are essential to increase the maturity and effectiveness of a strong data management program.
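To make the notion of a baseline score concrete, here is a minimal sketch of how a score for a single element might be computed, assuming a simple validity-ratio metric; the customer table, the postal_code column, and the five-digit rule are all hypothetical:

    import re
    import sqlite3

    # Hypothetical rule: a postal code is valid when it matches a five-digit
    # pattern. The validity ratio across all rows becomes the element's score.
    POSTAL_CODE_RULE = re.compile(r"^\d{5}$")

    def baseline_score(conn):
        rows = conn.execute("SELECT postal_code FROM customer").fetchall()
        valid = sum(1 for (code,) in rows
                    if code is not None and POSTAL_CODE_RULE.match(code))
        return valid / len(rows) if rows else 1.0

    # Demo against an in-memory table standing in for the MDM repository.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, postal_code TEXT)")
    conn.executemany("INSERT INTO customer (postal_code) VALUES (?)",
                     [("94040",), ("1234",), (None,), ("73301",)])
    print(f"Baseline score: {baseline_score(conn):.2f}")  # 0.50 for this sample

A score below whatever threshold the business has agreed upon (0.95, say) would flag the element for a cleanup project and the root cause analysis described above.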

Assuming the data problem is ongoing, it will normally fall into one of the following two categories:

1. A rule exists, but it is not being followed or implemented properly. A particular rule can be enforced either at the application level or via business processes. A system bug would prevent an application-level rule from working properly, while an incorrect business practice would prevent a business process rule from being followed properly (an application-level example is sketched after this list). IT will most likely work on fixing the system bug, while data stewards and data governance will work with the offending LOB(s) to correct the bad business practice(s).

2. A rule does not exist, but it is now required. The design team should propose where the rule is best enforced. If it is through an application-level constraint, IT will again likely be the best option. If it is through business process enforcement, data stewards will work with data governance and the impacted LOB(s) to modify and/or create rules, policies, and procedures around the new requirement. Adding a new rule can be tricky, since it is necessary to consider whether it impacts or violates existing ones.
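To illustrate the first category, here is a minimal sketch of an application-level rule enforced at the data entry point; the field names and the rule itself are hypothetical:

    # Hypothetical application-level rule: a business customer must carry a
    # tax ID. If this check is buggy (category 1) or absent (category 2),
    # invalid records reach the repository and drag the baseline score down.
    def validate_customer(record):
        errors = []
        if record.get("customer_type") == "business" and not record.get("tax_id"):
            errors.append("tax_id is required for business customers")
        return errors

    # The front end would block the save when errors are returned; a business
    # process rule would instead rely on people performing the same check.
    print(validate_customer({"customer_type": "business", "tax_id": None}))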

Normally, the data quality team will need to work with the business to decide whether cleaning up the existing bad data takes priority over preventing more errors from being introduced. Needless to say, both activities will have to be prioritized along with other ongoing projects throughout the company competing for the same resources.

There are many ways to perform a data cleansing activity depending on the volume of data, level of access to the system, technical capabilities, system interface features, and existing rules, policies, and procedures.

Practically all Customer MDM repositories will offer some type of front-end interface allowing business users to modify master data records individually. For low-volume data correction, that is usually the obvious choice. As the volume increases, however, other considerations come into play. If the interface allows for bulk updates, that could be a viable option. If not, some type of automation could be sought. The tool itself may provide some custom automation capability. Another possibility is utilizing data entry automation software to expedite the work and avoid errors when performing hundreds or thousands of similar changes through the front end. Lastly, it is possible to execute automated back-end scripts to make the proper changes directly in the database, bypassing the front-end interface.
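As a sketch of that last option, the following applies a bulk correction directly to the database; the customer table, the legacy country values, and the normalization fix are all hypothetical, and a real script would run only under an approved change process with a tested rollback plan:

    import sqlite3

    # In-memory stand-in for the repository database; the table, column, and
    # legacy value are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT)")
    conn.executemany("INSERT INTO customer (country) VALUES (?)",
                     [("USA",), ("US",), ("USA",)])

    with conn:  # one transaction, so the bulk fix commits or rolls back as a unit
        # Preview the blast radius before changing anything.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM customer WHERE country = 'USA'").fetchone()
        print(f"Rows to normalize: {count}")
        # The correction itself: normalize the legacy value to the ISO code.
        conn.execute("UPDATE customer SET country = 'US' WHERE country = 'USA'")

    conn.close()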

Clearly, the last option has implications. One is the need for coordinated effort between a business member and a technical professional. Normally, a business user does not have the skills or the proper access to write and execute database scripts, while the IT professional will have to rely on the business partner for proper understanding, testing, and validation. It was stressed earlier in this chapter how a professional with combined business/technical competency can go a long way toward narrowing the gap between business and IT and achieving faster and more precise results. Keep this important option in mind.

Another implication of using back-end scripts is compliance with existing processes and other rules governing the company. Sometimes, the execution of back-end procedures is not approved because it could violate certain rules. For example, SOX requirements regulate what kind of information can be updated without close validation, sufficient testing, and appropriate tracking. Furthermore, some Customer MDM applications enforce data integrity constraints at the data entry point rather than at the database level. This means bulk updates performed on the back end would bypass those validations and consequently jeopardize the integrity of the data. Because of all these possibilities, it is critical to look for business- and data-governance-approved processes before starting a data cleansing effort.
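One way to mitigate that integrity risk is to re-apply the application's entry-point validations inside the back-end script before writing anything. A minimal sketch, reusing the hypothetical validate_customer rule from earlier in this section:

    def validate_customer(record):
        # Same hypothetical entry-point rule sketched earlier in this section.
        if record.get("customer_type") == "business" and not record.get("tax_id"):
            return ["tax_id is required for business customers"]
        return []

    def apply_if_valid(record, changes):
        # Merge the proposed changes, then refuse any result the application
        # itself would have rejected at the data entry point.
        candidate = {**record, **changes}
        errors = validate_customer(candidate)
        if errors:
            raise ValueError(f"change rejected: {errors}")
        return candidate

    # This back-end change is blocked instead of silently bypassing the
    # application-level constraint.
    record = {"customer_type": "business", "tax_id": "12-3456789"}
    try:
        apply_if_valid(record, {"tax_id": None})
    except ValueError as exc:
        print(exc)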

It is also not uncommon to have to perform the same type of data cleansing on a regular basis. This usually happens when it is not possible to correct the root cause of the problem immediately, making it necessary to run scheduled cleanups that keep the business operating properly until there is time to fix the problem for good. Scheduled cleanups are subject to the same considerations just described regarding manual versus automated fixes, front-end versus back-end correction, bulk updates, and so on. Obviously, the repetitive nature of these scheduled cleanups only exacerbates the need for automation.
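A minimal sketch of automating such a recurring fix, reusing the hypothetical country normalization from the earlier back-end example; the crontab entry in the comment is a placeholder:

    # Recurring cleanup runner: re-applies the same correction on a schedule
    # until the root cause is fixed for good. Scheduling can live outside the
    # script, e.g. a crontab entry (placeholder path and time):
    #   0 2 * * * /usr/bin/python3 /opt/dq/cleanup_country_codes.py
    import logging
    import sqlite3

    logging.basicConfig(level=logging.INFO)

    def run_cleanup(conn):
        with conn:
            cursor = conn.execute(
                "UPDATE customer SET country = 'US' WHERE country = 'USA'")
            # Log the affected row count so every run leaves an audit trail.
            logging.info("normalized %d rows", cursor.rowcount)

    if __name__ == "__main__":
        # Demo against an in-memory copy; a real run would connect to the
        # repository database under an approved, governance-blessed process.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT)")
        conn.executemany("INSERT INTO customer (country) VALUES (?)",
                         [("USA",), ("US",)])
        run_cleanup(conn)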

Recall the data quality process described earlier in this chapter. Once an issue is identified through the data quality baseline, it should be submitted to the data quality process team for proper evaluation, prioritization, feasibility analysis, risk assessment, solution design, and final corrective action. The actual data correction is embedded within the data quality process and subject to the observations presented previously.
