Data Quality Factors


So much has been said in previous chapters about establishing a culture focused on data quality. In an ever-changing business environment, applications come and go, and even though data is more permanent, the surrounding business practices and data requirements are also changing. Therefore, you can't expect sporadic data quality projects to properly address persistent issues and changing conditions. Good data is needed all the time, but data quality management projects can be very resource intensive and time-consuming if not orchestrated on a more regular basis. Therefore, the best approach is to foster the disciplines of data quality on an ongoing basis and throughout the entire enterprise. Data quality is everybody's responsibility, but it is a company's duty to create a favorable environment and an ongoing focus with appropriate technology, effective processes, proper skills, and a mechanism to recognize and reward quality efforts.

Chapters 6 and 8 articulate the many dimensions of data quality, showing how they can be used both to qualify data quality issues and to structure a strong data metric program. But actively enforcing information quality along a given set of dimensions can also render data more adaptable. Let's take a look at some of the basic data quality dimensions and explore this concept further.

Data Completeness

Completeness of information typically serves two primary purposes:

1. Better knowledge. More attributes populated translate into more information provided about their respective entities.

2. Easier entity resolution. The better characterized an entity is, the easier it is to identify duplicates. Missing information does not translate into matching information; therefore, when performing entity resolution, completeness of information becomes a differentiating factor (see the sketch after this list).
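The sketch, a minimal Python example with illustrative field names and records (none of them prescribed by this chapter), shows why missing values weaken entity resolution: an attribute absent on either side simply cannot participate in the comparison.

def match_score(rec_a, rec_b, fields=("name", "address", "phone", "email")):
    """Score two customer records; missing values contribute nothing."""
    comparable = 0
    matched = 0
    for field in fields:
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing information is not matching information
        comparable += 1
        if a.strip().lower() == b.strip().lower():
            matched += 1
    if comparable == 0:
        return 0.0  # nothing to compare, so the records cannot be resolved
    return matched / comparable

rec_a = {"name": "Acme Corp", "phone": "555-0100", "email": None}
rec_b = {"name": "ACME Corp", "phone": "555-0100", "email": "info@acme.com"}
print(match_score(rec_a, rec_b))  # 1.0, but based on only two comparable fields

The more complete both records are, the more attributes can participate in the comparison, and the more trustworthy the resulting score.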

But there is another aspect of data completeness that makes it more adaptable to constant changes. Evolving application systems will typically become more restrictive and require more information to define or characterize a given entity. Therefore, as companies progress through technology changes, records that do not comply with the new required standards will fall out or require work before the transition is completed.

Maintaining the highest level of data completeness can lessen that burden. Obviously, if newly required information does not yet exist anywhere, it may be impossible to prepare in advance. However, keeping existing information as complete as possible can definitely help absorb the impact of changes.

Fragmentation of key information, particularly in a customer data domain, is a common problem that also needs to be addressed in order to maintain consolidated and complete information. Good customer data integration (CDI) processes along with regular data merge and alignment routines will greatly help with the consolidation of fragmented data into a complete master record.
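As a simple illustration of that consolidation step, here is a hedged sketch of one survivorship rule: for each attribute, keep the first non-empty value found across the source fragments. Real CDI processes apply far richer rules (source ranking, recency, trust scores), and the record layouts below are hypothetical.

def consolidate(fragments):
    """Merge fragmented records into a single master record."""
    master = {}
    for record in fragments:
        for field, value in record.items():
            if value not in (None, "") and field not in master:
                master[field] = value
    return master

crm = {"name": "Acme Corp", "phone": "555-0100", "email": None}
billing = {"name": "Acme Corporation", "email": "ap@acme.com", "duns": "123456789"}
print(consolidate([crm, billing]))
# {'name': 'Acme Corp', 'phone': '555-0100', 'email': 'ap@acme.com', 'duns': '123456789'}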

Data Consistency

Keeping information consistent is probably the single most important factor in a smooth transition. It is an extremely time-consuming activity to manipulate inconsistent data because multiple rules have to be created for the same set of data elements.
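A minimal sketch of that cost, using a hypothetical status field: every variant representation of the same value needs its own rule before the data can be handled uniformly, and every newly discovered variant means another rule.

STATUS_RULES = {
    "active": "ACTIVE", "a": "ACTIVE", "act": "ACTIVE", "1": "ACTIVE",
    "inactive": "INACTIVE", "i": "INACTIVE", "closed": "INACTIVE", "0": "INACTIVE",
}

def standardize_status(raw):
    """Map a raw status variant to its standard value."""
    try:
        return STATUS_RULES[raw.strip().lower()]
    except KeyError:
        raise ValueError(f"No rule yet for status value {raw!r}")

print(standardize_status(" Act "))  # ACTIVE

Consistent sources would need a single rule; each inconsistent source multiplies the rule set.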

Consistent data is much easier to migrate, augment, cleanse, standardize, and consolidate. One of the challenges during a data migration effort is to intelligently combine multiple disparate sources of data, and many of those sources will have inconsistent information. Therefore, it is necessary to perform both inter-system and intra-system reconciliations. Expanding data quality analysis and metrics across the master data environment and the source systems will drive more end-to-end consistency and can reduce issues later if and when both the master data and the transactional data need to be migrated and realigned.

Adding to the challenge is that consistency of information at the data entry point is very difficult to validate. Users have very creative imaginations and are capable of entering the very same information in many different ways, especially in free-form entry fields. Most consistency is left to business process instructions, which are usually weakly enforced. Training, along with creative ways to recognize and reward good practices, becomes a critical tool for minimizing these issues.

Implementing a Customer MDM program in your company will help resolve many existing discrepancies, but the work doesn't end there if the program is to withstand ongoing changes. Quality of information, across all its dimensions, is not a single project; therefore, its significance must be constantly reiterated.

Data Integrity

Data integrity is multifaceted and is achieved by multiple means, such as data accuracy, validity, and referential integrity. In the end, data that is validated against, or maintains a link to, a trusted source or reference is better positioned to withstand data conversions, migrations, and newly added operational and BI requirements.

Referential integrity is two-fold:

1. It guarantees entered information meets predefined values, which is one of the most effective mechanisms to assure healthy data.

2. It aids in the expansion of concepts. When support for new functionality is added to a system, the work is easier if the new functionality is based on reference information that can be readily modified and made available automatically to the entities and attributes mapped to it. For example, a given system may only support a set of five account types today. If account type information is normalized, it can more easily be expanded in the future to support newly required types (as sketched after this list).
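The sketch below uses illustrative account types in Python: when the types live in reference data rather than being hard-coded in each application, supporting a new type becomes a data change instead of a code change.

ACCOUNT_TYPES = {"CHK": "Checking", "SAV": "Savings", "MMA": "Money Market",
                 "CDA": "Certificate of Deposit", "IRA": "Retirement"}

def validate_account_type(code):
    """Enforce referential integrity at the point of entry."""
    if code not in ACCOUNT_TYPES:
        raise ValueError(f"Unknown account type {code!r}")
    return ACCOUNT_TYPES[code]

# Supporting a new product later only requires extending the reference data;
# every entity and attribute mapped to it picks up the new value automatically.
ACCOUNT_TYPES["HSA"] = "Health Savings"
print(validate_account_type("HSA"))  # Health Savings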

External and even internal data references can also be great resources for expandable knowledge and functionality. If the source of information can be delegated to specialized systems, there is a better chance for enhancement: specialized systems are naturally better prepared to evolve their own disciplines and associated information. Consequently, it becomes easier to tap into newly developed insights through existing mappings. For example, a DUNS number associated with a customer record allows retrieval of whatever information D&B provides today; as D&B expands its reach, additional insight becomes readily available.
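Sketched below is how such a delegation might look. fetch_dnb_profile is a hypothetical stand-in for a real D&B client, and the attributes it returns are illustrative; the point is that the master record stores only the DUNS number, and everything D&B publishes about that entity flows in through the mapping.

def fetch_dnb_profile(duns):
    """Placeholder: a real implementation would call a D&B service."""
    return {"duns": duns, "legal_name": "Acme Corp", "employees": 250}

def enrich_customer(customer):
    duns = customer.get("duns")
    if duns:
        # As D&B adds attributes, they arrive through this same mapping.
        customer.update(fetch_dnb_profile(duns))
    return customer

print(enrich_customer({"customer_id": 42, "duns": "123456789"}))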

Internal references are subject to the same concept. A given LOB may be better suited to control a particular set of information and its associated rules and algorithms. Other LOBs that depend on that resource are better off referencing it rather than creating their own version. For example, a customer-pricing structure may be very complex and rigid; replicating it across systems can further hamper upgrades.

Accurate and valid data also indicate better preparation. Data validated for accuracy against a given reference source typically signals an acceptable level of completeness and consistency, which, as discussed previously, are key factors in an evolving data program.
