Master data governance and stewardship
This chapter introduces the concept of master data governance, its objectives, and its scope. It defines master data governance for multidomain master data management (MDM), including several data governance aspects and policy control types. It also describes the roles of the data governance processes and key performance indicators (KPIs) and their critical importance for achieving mature levels of data governance.
This chapter includes the following sections:
3.1, "Definition and objectives of master data governance"
3.2, "Data governance maturity"
3.3, "Master data quality"
3.1 Definition and objectives of master data governance
It is commonly recognized that business information is one of the most important assets of any modern enterprise. Enterprises have developed policies that declare information a most important corporate asset. However, these policies often cannot be acted upon. No commonly accepted standard exists for measuring and reporting on corporate information assets. A lack of information asset KPIs makes it difficult to establish data governance organizations. It is also difficult to assign individuals who are accountable for the overall quantity and quality of enterprise information assets. These assets might greatly contribute to corporate equity and market capitalization. However, enterprises lack the ability to set meaningful quantitative targets for corporate information assets, their growth, and their improvement.
Master data is the most critical and valuable subset of enterprise information assets. Master data refers to data that is foundational to business processes and is widely distributed. The distributed data, when well managed, directly contributes to the success of an organization, and when not well managed, poses the most risk.
Master data typically includes a few major master information domains, such as party/customer, product, location, service, account, employee, and supplier. It also includes the relationships between those domains. In addition, master data includes reference data that serves as qualifiers for the master information domains, such as account types, customer categories, industry codes, and geographical areas. This breadth and criticality justify the priority treatment of master data, which requires both a technology strategy and a business strategy.
In many enterprises, master data has been a major strategy and implementation focus area for the last few years. The MDM strategy and implementation effort require support from business. Insufficient business directions and support can adversely impact the outcomes of the MDM initiative. By the nature of master data, its broad distribution, and multifunctional use, no single business function can take exclusive responsibility for master data requirements, rules, and processes. In some cases, requirements from different functional areas can even conflict with each other in terms of their content and priorities. This situation substantiates the need for a cross-functional master data governance council.
The MDM market recognizes that data governance is critical for enterprise MDM implementations. Master data governance is a focus area of data governance that is dedicated to MDM implementations. Master data governance, as a discipline, concentrates on the controls, methodologies, capabilities, and tools that modern MDM has developed over the last decade.
Master data governance has the following objectives:
Establish a master data governance council or board.
Formulate master data governance policies that establish accountability and enforcement.
Monitor, oversee, and enforce proactive, collaborative, and effective data stewardship that is driven by data governance.
Acquire and use tools that enable master data governance and data governance-driven stewardship, including policy administration, enforcement, remediation, and monitoring.
IBM offers a comprehensive set of products and components for master data management and governance.
Figure 3-1 illustrates the IBM multidimensional approach to master data governance.
Figure 3-1 Multidimensional approach to master data governance
As illustrated in Figure 3-1, policy management is addressed by the policy administration, enforcement, and monitoring components. All aspects of master data governance are addressed. The governance aspects include data quality and stewardship for master data, data visibility, privacy, and security (especially critical for customer data), and business and technical metadata.
The master data governance council is a cross-functional leadership team. This team works to approve and establish policies that dictate how master data is captured, managed, propagated, and used across the enterprise to achieve short-term and long-term goals. The pursuit of master data governance is an ongoing effort, rather than a one-time project.
Data stewards are responsible for everyday operations that enforce the guidance and policies of the master data governance council. Policy enforcement in mature data governance can require complex data stewardship remediation processes to adequately support business and master data governance requirements with agile and flexible data flows. Traditionally built applications might be unable to provide the required level of agility and flexibility in the administration of master data governance policies and their remediation. This requirement drives the need for business process management (BPM) software and templates that are integrated with MDM.
Master data governance is part of an information governance or data governance discipline. The primary focus is on master data implementations where modern MDM patterns are used with MDM data hub technologies. If an enterprise establishes a data governance council, a section of this council that specializes on MDM can play a role on the master data governance council. Conversely, if a data governance council is not established in the enterprise, master data governance can become a seed and starting point for an enterprise data governance council.
Most information governance and data governance methodologies lack a master data focus and do not reflect the specifics of modern MDM hub implementations. Few organizations achieve high master data governance levels of maturity. The percentage is even lower than for data governance in general.
3.2 Data governance maturity
The IBM Data Governance Council defines five levels of data governance maturity:
Level 1: Initial
Ad hoc operations rely on individuals’ knowledge and decision making.
Level 2: Managed
Projects are managed but lack cross-project and cross-organizational consistency and repeatability.
Level 3: Defined
Consistency in standards across projects and organizational units is achieved.
Level 4: Quantitatively Managed
The organization sets quantitative quality goals that use statistical and quantitative techniques.
Level 5: Optimizing
Quantitative process improvement objectives are firmly established and are continuously revised to manage process improvement.
To achieve advanced levels of data governance maturity (levels 4 and 5), organizations should be able to define and set quantitative policies, to effectively administer them, and to enforce the policies through agile data quality and stewardship processes.
Innovative IBM software products and components can help enterprises establish the advanced levels of master data governance maturity. They can also enable and sustain proactive, agile, and efficient policy-driven data stewardship in MDM initiatives and MDM-powered projects and programs.
3.3 Master data quality
Master data quality relies on the following components to manage master data effectively: policies, processes, and metrics and KPIs.
3.3.1 Policies
One goal of master data governance is to enable a master data governance council to define and issue master data quality policies. Policies, or sets of business rules, quantify the policy compliance criteria and define how certain levels of data quality must be achieved. Hard policies prevent a record from being saved if a policy is not met; soft policies are more flexible. By using soft policies, a user can save records even if a record is not compliant with data governance policies. The noncompliant records are routed to a data stewardship inbox for resolution.
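The hard versus soft policy behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation; the `Policy` class, `save_record` function, and `stewardship_inbox` list are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    name: str
    check: Callable   # returns True when the record is compliant
    hard: bool        # hard policies block the save; soft policies allow it

# Noncompliant records that pass soft policies land here for steward resolution.
stewardship_inbox = []

def save_record(record, policies):
    """Apply governance policies before persisting a master data record."""
    for policy in policies:
        if not policy.check(record):
            if policy.hard:
                # Hard policy: the record cannot be saved at all.
                raise ValueError(f"Record rejected by hard policy '{policy.name}'")
            # Soft policy: the save proceeds, but the record is routed
            # to the data stewardship inbox for later remediation.
            stewardship_inbox.append((policy.name, record))
    persist(record)

def persist(record):
    # Stand-in for the real persistence layer.
    print(f"Saved record {record.get('id')}")
```

A record that violates only a soft policy is saved and queued for a steward, while a hard-policy violation blocks the save entirely.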
Figure 3-2 illustrates a typical set of master data quality policies.
Figure 3-2 Master Data Policy management
In IBM InfoSphere Master Data Management V10.1, policy administration and policy enforcement are implemented within IBM Business Process Manager Express. Policy monitoring is implemented in the Master Data Policy Dashboard. IBM Cognos® is used to report on the data quality metrics and their compliance with established policy targets.
3.3.2 Processes
Also known as procedures, workflow, or practices, master data quality processes define how policies are to be implemented. Usually processes refer to the role or job function that is responsible for taking an action. They might also specify certain systems, screens, or forms that users in those roles must read, follow, or complete. The processes also address how exceptions are managed, which is typically by starting a separate process.
 
Exceptions: Some of the exceptions might be common across many different policies and processes. Therefore, if changes are made to the parent processes and policies, you must ensure that you keep them in sync.
Processes might specify certain time constraints for a process. For example, they might state that certain actions must be accomplished within a number of hours, minutes, or seconds. Processes might also specify escalation paths for higher-level approvals by people in other roles. The processes might start other processes. Therefore, use care to ensure that the processes and procedures are clearly articulated, leave little, if any, room for misinterpretation, and are as complete as possible.
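The time-constraint and escalation behavior described above can be sketched as a simple rule. This is a hypothetical example; the role names, the four-hour SLA, and the `escalation_target` function are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical escalation chain: each elapsed SLA period moves the task
# one level up, capped at the top of the chain.
ESCALATION_CHAIN = ["data_steward", "lead_steward", "governance_council"]

def escalation_target(task_opened, now, sla=timedelta(hours=4)):
    """Return the role a stewardship task should sit with, given its age."""
    levels_overdue = int((now - task_opened) / sla)         # whole SLA periods elapsed
    index = min(levels_overdue, len(ESCALATION_CHAIN) - 1)  # cap at the highest role
    return ESCALATION_CHAIN[index]
```

A task still within its first SLA period stays with the data steward; each additional elapsed period escalates it one level.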
Processes are also meant to be living specifications. That is, as experience is gained by practitioners, as systems and regulatory guidelines change, and as other factors come into play, policies and procedures might change.
At a high level, two categories of processes are used with MDM. The first category includes business processes that use master data, such as the following examples:
Global account opening and client onboarding
Patient registration
Customer relationship management (CRM)
Building the content of complex data warehousing dimensions
Product definition, development, and catalog management
Industry-specific and function-specific applications
The second category includes data quality and stewardship processes. These processes, despite differences in customer requirements, can be defined in a general way. This section focuses on the following data quality processes:
The Benchmark Maintenance process is responsible for the creation and maintenance of the golden record. The notion of the enterprise-wide trusted record, which is referred to as the golden record, is one of the key concepts in MDM. Data governance experts agree that an enterprise has reached an important milestone in its master data governance journey when it can agree on the golden record for customer, product, location, service, and other master entities.
The Benchmark Proliferation process ensures that all enterprise systems upstream and downstream of the MDM data hub use the golden record that is created and maintained in the Benchmark Maintenance process. The importance of this process is fairly clear: why does the enterprise need the golden record if it is not used consistently across the organization? Unfortunately, enterprises that implement MDM often lack a properly implemented Benchmark Proliferation process, which limits the usefulness of the MDM solution.
Each process, which is shown in Figure 3-3, requires quantitatively defined policies and agile configurable workflows.
Figure 3-3 Master data quality processes
3.3.3 Metrics and KPIs
Data governance is a control discipline. A master data governance council initiates its control through master data quality policies. Metrics are necessary to the degree that the master data governance council wants to monitor and measure the levels of quality and consistency that are being achieved, in accordance with policies. For example, a policy states that a goal of completeness or uniqueness must be achieved, or the number of overlays must be reduced. Then, the metrics are the measurements that are taken so that managers can monitor trends in achieving the goals of the policies.
The metrics in Table 3-1 can be used as KPIs for the data quality processes.
Table 3-1 Primary metrics that drive data quality for master data
Metric: Estimated number of false positives in the GOLDEN RECORD SET
Quality process: Benchmark Maintenance
Description: Computed with the use of a Receiver Operating Characteristic (ROC) curve, which was originally developed in Signal Detection Theory. Provides an estimate for the false positive rate in the GOLDEN RECORD (Composite View). For mission-critical applications (such as healthcare and financial services), the false positive rate is expected to be in the range of 10⁻⁷ to 10⁻⁸.

Metric: Estimated number of false negatives in the GOLDEN RECORD SET
Quality process: Benchmark Maintenance
Description: Computed with the use of an ROC curve, which was originally developed in Signal Detection Theory. Provides an estimate for the false negative rate in the GOLDEN RECORD (Composite View). For practical purposes, the false negative rate is often in the range of 10⁻⁴ to 10⁻⁵.

Metric: Self-score completeness of the GOLDEN RECORD SET
Quality process: Benchmark Maintenance
Description: Computed by using the scoring application programming interfaces (APIs) that were developed earlier to match pairs of records. Within this metric computation, each GOLDEN RECORD (Composite View) is matched to itself to quantify the amount of identifying information on the record. The concept is similar to entropy. A self-score is expressed in the same units as the thresholds (autolink and clerical review).

Metric: Source to GOLDEN RECORD consistency
Quality process: Benchmark Proliferation
Description: Computed by using the scoring APIs to match source system records (members) to the corresponding entity record (Composite View). Organizations often require at least 80% consistency between each source and the golden record.

Metric: Uniqueness within sources
Quality process: Benchmark Proliferation
Description: Used to identify duplicates within each source. Shows 100% in the absence of duplicates.
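The two Benchmark Proliferation metrics can be sketched as simple computations over source records. This is an illustrative sketch only: the `source_consistency` and `uniqueness_within_source` functions are hypothetical, and the `score` argument stands in for a scoring API normalized so that a perfect match scores 1.0.

```python
def source_consistency(source_records, golden_lookup, score):
    """Average normalized score (0.0-1.0) of source records against their
    corresponding golden records, expressed as a percentage.
    The chapter cites an 80% target per source."""
    total = sum(score(rec, golden_lookup[rec["entity_id"]])
                for rec in source_records)
    return 100.0 * total / len(source_records)

def uniqueness_within_source(source_records, key):
    """Share of records that are unique on the given key within one source;
    100% means the source contains no duplicates."""
    values = [key(rec) for rec in source_records]
    return 100.0 * len(set(values)) / len(values)
```

With a real probabilistic scoring API in place of the toy `score` function, these percentages become the KPIs that the governance council tracks per source system.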
The policies that are defined by the master data governance council must be clearly defined. Mature data governance processes require quantitatively managed goals, advanced statistical metrics, and techniques. Quantitative process improvement objectives must be firmly established and continuously revised to manage process improvement.
The first two metrics, the estimated number of false positives and the estimated number of false negatives, determine the accuracy with which golden record uniqueness is defined. A false positive indicates that an entity record incorrectly combines two customers and should be separated into two customer records. A false negative indicates that two entity records should be linked because they represent one customer.
Advanced scoring algorithms are developed and used to quantify similarity of records for matching.
Scoring above a certain threshold, as illustrated in Figure 3-4, indicates that the two records belong to the same customer. This approach has been successfully used in probabilistic matching. The same scoring APIs can be used to quantify the completeness of the entity records. The metric that is obtained by scoring a record on itself is referred to as self-score completeness.
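The self-score idea can be illustrated with a toy weighted scorer. This is a hypothetical sketch: the attribute weights, the `score` function, and the `self_score` function are invented for illustration and are much simpler than a real probabilistic matching engine.

```python
# Hypothetical attribute weights, expressed in the same units as the
# autolink and clerical review thresholds.
WEIGHTS = {"name": 4.0, "birth_date": 3.0, "phone": 2.0, "address": 2.0}

def score(a, b):
    """Toy pairwise score: sum the weights of identifying attributes
    that are populated on both records and agree exactly."""
    return sum(w for attr, w in WEIGHTS.items()
               if a.get(attr) and a.get(attr) == b.get(attr))

def self_score(record):
    """Completeness proxy: scoring a record against itself yields the
    total weight of its populated identifying attributes."""
    return score(record, record)
```

A fully populated record self-scores high, while a sparse record self-scores low, which is exactly why the self-score distribution works as a completeness metric.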
Figure 3-4 illustrates the difference between incomplete records in the lower part of the chart and a complete record set in which most of the records score high and the score distribution is shifted to the right side of the chart.
Figure 3-4 Master data completeness
The master data governance council directs data stewards to focus on the low scoring records, on the left side of the distribution in the lower graph in Figure 3-4. As a result of the data stewardship activity, the graph is expected to be transformed to look more like the distribution in the upper graph in Figure 3-4.
Figure 3-5 illustrates how the consistency between the data sources and the golden record is measured with the use of the scoring APIs.
Figure 3-5 Master data policy metrics
When the attribute value in a source is identical to the value in the golden record, that source shows 100% data quality for the attribute. Data quality is 0% when the attribute value in the source is blank or has an anonymous value, such as “Baby girl.” The data quality in the source can even be negative if the attribute value is incorrect. The scoring APIs can recognize that “Bill” is a nickname for “William,” and can identify and penalize small deviations, such as missing or transposed characters.
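The attribute-level scoring behavior described above can be sketched as follows. This is a simplified illustration only: the nickname table, the anonymous-value list, the -50 penalty, and the `attribute_quality` function are assumptions for this sketch, not the behavior of the actual scoring APIs.

```python
# Illustrative lookup tables only; a real engine uses much larger dictionaries
# and edit-distance logic for missing or transposed characters.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
ANONYMOUS = {"", "baby girl", "unknown", "n/a"}

def attribute_quality(source_value, golden_value):
    """Toy per-attribute quality: 100 for an exact or nickname match,
    0 for a blank or anonymous value, negative for a conflicting value."""
    s = (source_value or "").strip().lower()
    g = (golden_value or "").strip().lower()
    if s in ANONYMOUS:
        return 0          # no information: neither right nor wrong
    if s == g or NICKNAMES.get(s) == g or NICKNAMES.get(g) == s:
        return 100        # exact match or recognized nickname
    return -50            # hypothetical penalty for an incorrect value
```

Under this sketch, “Bill” scores as a full match against “William,” an anonymous placeholder scores zero, and a conflicting value is penalized below zero, mirroring the 100%/0%/negative scale in the text.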
 