MDM Approaches and Architectures

Master Data Management (MDM) is about bringing master data together to support master data management services (data governance and stewardship; data quality, metadata, hierarchy, and overall data lifecycle management) and, ultimately, to serve as the single source of truth for the business. Customer MDM focuses on the customer data domain in particular and its associated properties, such as company name, tax ID, addresses, contacts, accounts, and company hierarchy.

In addition to data domains, such as customers, products, partners, and suppliers, data inside a company can also be classified as operational or nonoperational. Operational data is collected in real time in support of a company's daily activities. Nonoperational data is normally captured in a data warehouse on a less frequent basis and used for business intelligence (BI). This classification is relevant here because it can be used to distinguish the most common types of MDM initiatives.

Although the very essence of implementing MDM lies in the application and fine-tuning of MDM practices to fit the enterprise architecture and business model, MDM implementations as a whole can generally be categorized into three major types of initiatives, based on whether their primary focus is operational or nonoperational data:

1. Analytical MDM: addresses BI

2. Operational MDM: addresses business operations

3. Enterprise MDM: addresses both BI and operations

Each has a somewhat different objective and carries distinct levels of complexity, risk, and impact. Companies should perform detailed analysis to decide which approach is required. At a minimum, an MDM program must take into consideration business and IT requirements, time frame, resource availability, priority, and the size of the problem to be addressed.

Deciding which approach to implement depends on the business case, which is explained in more detail later in this chapter. Because each of these approaches targets a different category of information, each ultimately impacts the company to a different degree. Figure 1.1 depicts the level of intrusiveness of each MDM approach.

Figure 1.1 MDM Approaches

Operational data is inherently more critical to a company than nonoperational data due to its usability and timeliness. Therefore, analytical MDM is the least intrusive approach, followed by operational MDM and obviously the all-encompassing enterprise MDM, which is a combination of both analytical and operational MDM.

Naturally, more intrusive MDM projects involve both higher risks and a higher likelihood of disrupting a company's daily operations. It is important to note that the figure does not suggest a sequence or phases to be adopted when implementing an MDM solution. In fact, phased deployments can be viewed from two different perspectives. One is concerned with progressing from one approach to another, such as starting with an operational MDM and then adding an analytical one to complete the enterprise solution. The other is phasing within a particular approach: it is not uncommon to start an operational MDM by integrating just a few legacy systems and slowly incorporating others. Phased deployments are discussed further in Chapter 3.

Next, each approach is explored further, along with the most common architectures employed for it. Keep in mind that these are generic frameworks. MDM can be so encompassing and pervasive that the potential combinations are many, and hybrid solutions are very common. Finally, many subjects in the MDM arena lack a universal terminology. What this book calls approaches and architectures may be called styles, frameworks, or implementations in other books, with other varying definitions. What is important is to understand how the master data is integrated, used, maintained, improved, and governed.

Analytical MDM

Historically, analytical MDM has been the most commonly adopted MDM approach. This stems mostly from the relative simplicity of leveraging data warehouse projects. It is beyond the scope of this book to describe data warehouses in detail, but the following summary of the three primary data warehouse architectures might help you understand how MDM projects can benefit from this already existing integration:

1. Top-down. Major proponent is Bill Inmon. Primarily characterized by a data warehouse serving as a centralized and normalized repository for the entire enterprise, with dimensional data marts containing the data needed for specific business processes. Up-front costs are normally higher, and it initially takes longer until the common structure is designed and built and the sources are integrated, but the result is more adaptable afterward.

2. Bottom-up. Major proponent is Ralph Kimball. Data marts are first created to provide reporting and analytical capabilities for specific business processes, and can eventually be integrated to create a comprehensive data warehouse. Provides results quickly to each independent business unit, but overall data integration is potentially harder to achieve.

3. Hybrid. A combination of the top-down and bottom-up approaches, characterized by a high-level normalized enterprise model that is integrated more quickly with business-specific data marts for faster results.

One may ask: if there is already a data warehouse integrating data from across the enterprise, isn't that MDM? The answer is: not necessarily. It depends on what is being done with that data. Bringing the data together is just one piece of MDM. The other piece is applying MDM practices, such as identity resolution; data cleansing, standardization, clustering, consolidation, enrichment, categorization, synchronization, and lineage; metadata management; governance; and data stewardship.
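To make that distinction concrete, here is a minimal, hypothetical sketch of two of those practices, standardization and identity resolution with consolidation (often called match and merge). The record layout, the matching rule, and the survivorship rule are illustrative assumptions, not any vendor's algorithm; production MDM tools typically add probabilistic matching and configurable survivorship rules.

```python
from dataclasses import dataclass

@dataclass
class CustomerRecord:
    source: str    # originating system, e.g., "CRM" or "ERP"
    name: str
    tax_id: str

def standardize(rec: CustomerRecord) -> CustomerRecord:
    """Standardization: collapse whitespace, uppercase names, strip
    punctuation from tax IDs so records become comparable."""
    return CustomerRecord(
        source=rec.source,
        name=" ".join(rec.name.upper().split()),
        tax_id=rec.tax_id.replace("-", "").strip(),
    )

def match_key(rec: CustomerRecord) -> str:
    """Identity resolution via a deterministic key; falls back to the
    standardized name when the tax ID is missing."""
    return rec.tax_id or rec.name

def consolidate(records: list[CustomerRecord]) -> dict[str, CustomerRecord]:
    """Cluster records by match key and keep one surviving golden record."""
    golden: dict[str, CustomerRecord] = {}
    for rec in map(standardize, records):
        # Survivorship rule (illustrative): first record wins; richer rules
        # would score completeness, recency, or source trustworthiness.
        golden.setdefault(match_key(rec), rec)
    return golden
```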

The bottom line is that a data warehouse and data mart infrastructure can work as the conduit to a much larger and more encompassing MDM program. Conversely, the business intelligence, analytics, reports, and other outputs relying on the data warehouse and data marts will greatly benefit from the additional practices imposed by MDM, above all data quality and hierarchy management improvements. Keep in mind that in this context a strategic or tactical BI implementation is implied, rather than operational BI, since the underlying data is nonoperational.

Figure 1.2 depicts a common architecture adopted by companies implementing an analytical MDM approach.

Figure 1.2 Analytical MDM

Figure 1.2 shows an extract, transform, load (ETL) process gathering data from disparate operational systems. Ultimately, the data is stored in an enterprise data warehouse (EDW). The EDW and its associated data marts become the source of master data for BI and analytics. Since the EDW is now a single source from an analytical perspective, it is also the centerpiece for what can be called MDM services.
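As a hedged illustration of this flow, the sketch below pushes rows from two hypothetical operational sources through a tiny extract, transform, and load cycle into a warehouse table. SQLite stands in for the EDW, and the source rows, the table name, and the standardization rules are all assumptions made for the example.

```python
import sqlite3

# Illustrative stand-ins for disparate operational sources; a real ETL
# process would read from production databases, files, or message queues.
CRM_ROWS = [("Acme Corp", "12-3456789"), ("acme corp ", "12-3456789")]
ERP_ROWS = [("Globex Inc", "98-7654321")]

def transform(name: str, tax_id: str) -> tuple[str, str]:
    """Apply the same standardization rules to every source."""
    return " ".join(name.upper().split()), tax_id.replace("-", "")

def load_edw() -> sqlite3.Connection:
    edw = sqlite3.connect(":memory:")  # in-memory stand-in for the EDW
    edw.execute("CREATE TABLE dim_customer "
                "(name TEXT, tax_id TEXT PRIMARY KEY)")
    for source_rows in (CRM_ROWS, ERP_ROWS):
        for name, tax_id in source_rows:                    # extract
            row = transform(name, tax_id)                   # transform
            edw.execute("INSERT OR IGNORE INTO dim_customer "
                        "VALUES (?, ?)", row)               # load, deduplicated
    edw.commit()
    return edw

# The duplicate Acme rows from the CRM collapse into a single record.
print(load_edw().execute("SELECT * FROM dim_customer").fetchall())
```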

Analytical MDM is the quick-hit approach. Companies can quickly make a tremendous impact with respect to reporting and BI, but the relatively small investment yields correspondingly limited returns. Specifically, companies fail to harvest the benefits of the MDM services back in their operational data. Remember, the data improvements happen in the data warehouse, which is downstream from the operational systems. What's more, the analytical MDM approach does not help enforce regulatory or audit requirements, since those are mandatory at the operational level.

Another drawback of this implementation is the possibility of adding one more fragmented and incomplete data system to the company. Obviously, the quality of the results will be directly related to the quality of the MDM services applied to the data. A less obvious conclusion is that the quality of the results is also directly related to the number of data sources integrated. Certain lines of business (LOBs) are very sensitive about feeding data warehouses with their operational and strategic information, making it hard to achieve comprehensive integration.

On the other hand, it is possible for companies implementing an analytical MDM to influence the operational world. Analytical teams have access to an integrated view of the data and its underlying quality. They can recognize bad data, and the practices likely at its root cause, relatively quickly, as well as correlate discrepancies across LOBs. This is powerful knowledge that a strong data governance team can use to influence and improve data quality and business practices at the source. Be aware, however, that operational LOBs tend to be very resistant to this approach; to succeed with this practice, strong sponsorship from high-level executives is necessary.

Operational MDM

Operational MDM targets operational systems and data. It provides the opportunity to consolidate many, and ideally all, of the disparate operational data systems across the company and to become a true system of reference. This is obviously an enormous task. From a data integration perspective, the difficulty increases with the volume of data to be integrated and the level of disparity among the systems to be combined. But it is much more than simple data integration. It is about business process integration and massive technological infrastructure change, which can impact virtually everyone in the company.

Depending on the size of the company, an operational MDM will likely be deployed in phases, and how those phases are scoped can vary widely. One method of phased deployment is gradually migrating each data system into a single MDM repository until all systems in scope have reached end-of-life (EOL).

Another method is gradually migrating portions of data from a single system. Sometimes this is necessary because a particular legacy system cannot be promptly EOLed while some of its business processes have yet to be transitioned to the new application. It may sound strange to start transferring data while the system is still operating, but that is sometimes necessary to support already migrated systems that depend on that particular legacy data.

Finally, a combination of both phased methods is not uncommon either, with whole systems and portions of data making their way to the single MDM source through different techniques and at different times. The data integration component of MDM is obviously complex, and companies need to be very creative in finding the best method for consolidating legacy data.
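The following minimal sketch suggests what one migration phase might look like, using plain dictionaries as stand-ins for a legacy system and the MDM hub. The key formats are invented for the example, but the cross-reference table is the essential idea: it is what lets already migrated systems resolve the legacy identifiers that other systems still emit while the transition is under way.

```python
# Hypothetical legacy store, hub, and cross-reference, all in memory.
legacy_crm = {"C-001": {"name": "ACME CORP"}, "C-002": {"name": "GLOBEX INC"}}
hub: dict[str, dict] = {}               # master records keyed by hub ID
xref: dict[tuple[str, str], str] = {}   # (system, legacy key) -> hub ID

def migrate_phase(system: str, records: dict, batch: list) -> None:
    """Move only the keys listed in `batch`; the remaining records stay
    on the legacy system until a later phase."""
    for legacy_key in batch:
        hub_id = f"MDM-{len(hub) + 1:06d}"
        hub[hub_id] = records[legacy_key]
        xref[(system, legacy_key)] = hub_id  # keep the mapping for lookups

migrate_phase("CRM", legacy_crm, ["C-001"])  # phase 1
migrate_phase("CRM", legacy_crm, ["C-002"])  # phase 2 completes the system
```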

The bottom line is that it can be very difficult to EOL a given operational system because it is not only a technical issue; it is a business issue as well. Changing business practices that have been in place for years can be overwhelming. Besides, it could impact customer relations, and that is the last thing anyone wants. Therefore, to avoid disrupting current business practices, a common technique is to implement a temporary interface between the legacy and the new system until the transition is finally complete.

Chapter 3 will get into more detail regarding phased deployments, data migration, business process reengineering, build versus buy MDM, and so on.

Nonetheless, once an operational MDM is implemented, companies can leverage it into the analytical world for a complete enterprise MDM solution with relative ease. Operational MDM can be accomplished via three different architectures:

1. Single central repository architecture

2. Central hub and spoke architecture

3. Virtual integration

Note that a service-oriented architecture (SOA) with an enterprise service bus (ESB) and business process orchestration is not required to make the MDM repository or the federation system available, but it is the most common and effective architecture.
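To suggest what that service layer might look like, the sketch below defines the kind of narrow contract an ESB could front so applications invoke MDM services rather than reaching into the repository directly. The method names and the toy in-memory implementation are assumptions made for illustration, standing in for what would normally be formal service definitions (for example, SOAP or REST interfaces).

```python
from typing import Protocol

class MasterDataService(Protocol):
    """Illustrative contract the bus might expose to consuming systems."""
    def get_customer(self, hub_id: str) -> dict: ...
    def upsert_customer(self, record: dict) -> str: ...

class InMemoryHub:
    """Toy implementation satisfying the contract above."""
    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def get_customer(self, hub_id: str) -> dict:
        return self._store[hub_id]

    def upsert_customer(self, record: dict) -> str:
        # Reuse the caller-supplied hub ID or mint a new one.
        hub_id = record.get("hub_id") or f"MDM-{len(self._store) + 1:06d}"
        self._store[hub_id] = {**record, "hub_id": hub_id}
        return hub_id

service: MasterDataService = InMemoryHub()
new_id = service.upsert_customer({"name": "ACME CORP"})
```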

Single Central Repository Architecture (SCRA)

In this architecture, a single central repository within the operational environment serves as the source of data to an integrated suite of applications and processes. Only one physical copy of master data exists.

It is important to emphasize that this approach may obviate the need for certain applications. In other words, after the consolidation of data, a company may not need all of its previous applications. The applications that remain and depend on that data might need to be rewritten, or will likely require new interfaces or other major changes to maintain integration.

SCRA guarantees consistency of master data. However, it can be very expensive, if not impossible, to implement due to potentially inflexible off-the-shelf applications in use (although, if achieved, it could actually be the easiest and cheapest architecture to maintain). SCRA can also require a massive data conversion effort, depending on the size of the company and the number of disparate systems.

In Figure 1.3, multiple legacy systems go through a data conversion step to bring data into a central hub. This conversion normally takes place in phases to minimize impact and lower the risk of concurrently converting multiple legacy systems. Once the central hub is operational, it is used by application systems that either replace legacy systems or add new functionality to the company. In this particular case, new application systems do not have their own versions of master data.

Figure 1.3 Single Central Repository Architecture

Central Hub and Spoke Architecture (CHSA)

This is a more common variation of SCRA. Like SCRA, CHSA has an independently deployed common repository. However, CHSA does not require that all applications and processes be fully coupled to the hub.

The major advantage of this architecture is the efficiency of a central hub hosting the master data, combined with the flexibility to support spoke systems that operate in a relatively decoupled fashion. This flexibility is important when integrating commercial, off-the-shelf (COTS) applications with an MDM solution.

Some applications can act as spoke systems with independent data models that are cross-referenced and synchronized with the central data. To be sure, CHSA alleviates some of the problems presented by SCRA, but it can still require a massive data conversion effort and new interfaces between the hub and its spokes.
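A minimal sketch of that cross-referencing and synchronization follows, assuming dictionary stand-ins and a single hypothetical spoke with its own local schema. Real deployments would typically propagate changes as events over the bus rather than through direct calls, but the mapping logic is the same in spirit.

```python
# Hub, spoke, and cross-reference as in-memory stand-ins.
hub = {"MDM-000001": {"name": "ACME CORP", "tax_id": "123456789"}}
spoke: dict[str, dict] = {}        # spoke keeps its own data model
spoke_xref: dict[str, str] = {}    # spoke key -> hub ID

def sync_to_spoke(hub_id: str) -> str:
    """Push a hub change into the spoke, translating to its local schema."""
    master = hub[hub_id]
    spoke_key = f"SPK-{len(spoke) + 1}"
    spoke[spoke_key] = {"customer_name": master["name"]}  # local model
    spoke_xref[spoke_key] = hub_id  # cross-reference for round trips
    return spoke_key

sync_to_spoke("MDM-000001")  # the spoke now mirrors the master record
```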

In Figure 1.4, multiple legacy systems go through a data conversion step to bring data into a central hub. Again, this conversion normally takes place in phases to minimize impact and lower the risk of concurrently converting multiple legacy systems. When the central hub is operational, application systems access it to either replace legacy systems or add new functionality to the company. Spoke systems are synchronized and integrated with the central hub.

Figure 1.4 Central Hub and Spoke Architecture (CHSA)

Virtual Integration (VI)

Virtual integration is a generic term for solutions that do not physically copy existing data into a new repository. This is a fundamental difference from the previous two methods. Registry and data federation (DF) are common VI architectures. A VI system aggregates data from multiple sources into a single view by maintaining a metadata definition of all sources. Data across multiple sources is collected in real time through pre-established keys connecting the VI system and its sources. MDM services are applied to the dynamically collected data, which becomes a new source of trusted data for downstream processes and applications. DF systems normally provide a more robust infrastructure than a simple registry implementation.
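As a rough, hypothetical sketch of the registry idea, the code below holds only metadata (which source serves which attributes, joined by a pre-established tax ID key) and assembles a single customer view at request time. The lookup functions are stand-ins for live queries against real source systems; nothing is persisted in the federation layer itself.

```python
def crm_lookup(tax_id: str) -> dict:   # stand-in for a live CRM query
    return {"name": "ACME CORP", "contact": "jane@acme.example"}

def erp_lookup(tax_id: str) -> dict:   # stand-in for a live ERP query
    return {"credit_limit": 50000}

# Metadata repository: which source serves which attributes, connected
# to the VI system through the pre-established tax ID key.
METADATA = [
    ("CRM", crm_lookup, ["name", "contact"]),
    ("ERP", erp_lookup, ["credit_limit"]),
]

def federated_view(tax_id: str) -> dict:
    """Assemble a single view at request time; no data is copied
    into the federation layer."""
    view = {"tax_id": tax_id}
    for _system, lookup, attrs in METADATA:
        row = lookup(tax_id)  # real-time call to the source system
        view.update({attr: row[attr] for attr in attrs})
    return view

print(federated_view("123456789"))
```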

The biggest drawback of this implementation is that data improvements do not propagate back to the sources. VI provides benefits to the consumers of its services, but not to the original sources of the data. On the other hand, due to its nondisruptive nature, it is relatively simple to deploy. It can be a good first step before embarking on a central hub implementation.

In Figure 1.5, a data service federation (DSF) system collects real-time data from multiple existing sources. The data is not physically copied into the federated system. Information about the data in each source is stored in a metadata repository, which is used to determine which system and data element to access for each request made to the DSF.

Figure 1.5 Data Service Federation (DSF) System

Enterprise MDM

Enterprise MDM is a combination of both operational and analytical MDMs. As such, it can be implemented by combining the architectures previously discussed.

A data warehouse solution can be added to any of the three operational MDM architectures. As an added bonus, most of the MDM services needed in the warehouse are already functional in the operational system, making the data warehouse much easier to maintain. Furthermore, the ETL function of the analytical MDM should be much simpler, since companies now maintain fewer systems from which to extract data. What's more, the data should be cleaner, standardized, and already consolidated.

Data federation offers another potential solution. DF can be expanded to provide a view into multiple departmental data warehouses in addition to operational systems. Through this method, DF becomes the single point for resolving complex BI queries. This solution reduces both cost and complexity by eliminating the need for an additional, expensive database server. However, there is no free lunch here.

DF technology takes a toll on the performance of the operational and transactional data sources it queries, and it requires that those sources always be available. This is in stark contrast to batch loading data at preset, convenient times, as data warehouse implementations normally do (for example, at 4 A.M., while few users are accessing the system). BI queries can be quite complex and aggregate a multitude of data. Data warehouses are normally optimized to support those queries, which can make a DF implementation for this purpose unfeasible. Companies that go this route should proceed with caution and perform extensive load testing to confirm viability.

Figure 1.6 shows one possible enterprise MDM architecture implementation.

Figure 1.6 Enterprise MDM

In conclusion, the number of possible combinations of MDM approaches and architectures is large. The previous figures and categories are meant as general guidelines covering the most common implementations. It is important to consider the data domain in scope (obviously customer, in the context of this book), the purpose of managing the data (operational or analytical), and the technical architecture (central hub, data warehouse, virtual integration, or hybrid).

Next is a description of the business cases commonly used to justify the deployment of a Customer MDM solution, followed by which approach(es) and architecture(s) best fit each of them. The chapter concludes with a discussion of the elusive ROI question.
