Solution reference architecture
This chapter describes how and where IBM InfoSphere Master Data Management Reference Data Management Hub (InfoSphere MDM Ref DM Hub) fits into an information management reference architecture. The chapter describes the core and supporting reference data objects and services that are provided by the InfoSphere MDM Ref DM Hub. It also examines the integration patterns with common enterprise information management components: InfoSphere Master Data Management, SAP, content and taxonomy management systems, and data warehouses.
2.1 Base reference architecture
Overall, the InfoSphere MDM Ref DM Hub serves as an integration, management, and distribution point in the enterprise for reference data sets, maps between reference data sets, and hierarchies over reference data.
Figure 2-1 shows an overall view of where RDM fits into an enterprise reference architecture.
Figure 2-1 Reference Data Management Hub in an enterprise architecture
Reference data sets and hierarchies that InfoSphere MDM Ref DM Hub provides are consumed by enterprise information systems (such as InfoSphere MDM, SAP, data warehouses, business intelligence systems, and so on) to ensure that business objects are accurately and consistently described across the enterprise. Reference data maps (described in “Maps”) are used by data integration layers (such as IBM InfoSphere Information Server, or an enterprise service bus) to map reference data values between source systems and target systems.
Stewardship of reference data consists of the following tasks:
Importing or authoring reference data from source systems
Managing the change to the reference data in an orderly fashion
Distributing the reference data to downstream systems
2.1.1 InfoSphere MDM Ref DM Hub
Master data is the common set of business objects (such as customers, products, locations, accounts, and so on) that are shared across an enterprise. The InfoSphere MDM Ref DM Hub introduces a new domain of master data (in this case, reference data) that is hosted within the InfoSphere Custom Domain Hub. InfoSphere Custom Domain Hub serves as a framework that makes the InfoSphere MDM Ref DM Hub domain objects available as services, and provides ancillary services to InfoSphere MDM Ref DM Hub. The ancillary services include security, event notification, enterprise-specific extensions to InfoSphere MDM Ref DM Hub services, and so on. For more information about master data and master data management, see Enterprise Master Data Management: An SOA Approach to Managing Core Information, by Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul Van Run, and Dan Wolfson.
Reference data (sets, maps, hierarchies) is either imported into the InfoSphere MDM Ref DM Hub or entered directly through the InfoSphere MDM Ref DM Hub user interface.
2.1.2 InfoSphere MDM Ref DM Hub services and user interface
All of the InfoSphere MDM Ref DM Hub objects can be accessed through the web services. Additionally, a REST layer on top of the InfoSphere MDM Ref DM Hub web services is used by the InfoSphere MDM Ref DM Hub user interface. The InfoSphere MDM Ref DM Hub UI is the standard way to manage the InfoSphere MDM Ref DM Hub and to govern reference data.
2.1.3 Core reference data domain objects
There are three core reference data domain objects: sets, maps, and hierarchies (Figure 2-2). Each object supports the standard create, read, update, and delete (CRUD) operations. Each object also supports the notion of a validity period (the time when an object becomes active, and the time when an object is no longer valid). Sets and maps also support extensibility and lifecycle. These capabilities are described in 2.1.5, “Supporting services”.
Figure 2-2 Reference data domain objects
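The validity period can be pictured as a pair of dates. The following minimal Python sketch assumes a half-open interval, in which the object is active from its effective date up to, but not including, its expiry date. The class and field names are hypothetical and are not part of the InfoSphere MDM Ref DM Hub services.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ValidityPeriod:
    """Hypothetical model of a validity period (not the product schema)."""
    effective: date                 # when the object becomes active
    expires: Optional[date] = None  # when it is no longer valid; None = open-ended

    def is_active(self, on: date) -> bool:
        started = self.effective <= on
        not_ended = self.expires is None or on < self.expires
        return started and not_ended


period = ValidityPeriod(date(2012, 1, 1), date(2013, 1, 1))
print(period.is_active(date(2012, 6, 30)))  # a date inside the period
print(period.is_active(date(2013, 1, 1)))   # the expiry date itself counts as inactive
```

Whether the expiry date is inclusive is a design choice; the sketch treats it as exclusive so that consecutive periods do not overlap.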
Sets
Sets are the core object of a reference data domain. A set is the list of values of a data element that is used to classify data. In other systems, reference data sets are known as code tables, lookup tables, property lists, and value lists. Commonly seen sets include the following examples:
Country codes
Employee types (for example, full time, part time, temporary, contractor)
Courtesy title (for example, Mr., Mrs., Dr., Miss, Rev.)
Values within sets have two required attributes:
Code: This attribute is the underlying system representation of an individual value. Codes are the key for a value (and can be extended to support multiple codes).
Name: This attribute is the actual meaning of the particular code.
Figure 2-3 shows a simple set.
Figure 2-3 A set representing ISO-3166 country codes
Set values also have an additional attribute, translations. Names of individual values can be translated into separate strings for separate languages, for example, doctor in English and médico in Spanish. Sets can also be extended with additional properties, both at the set level and at the value level.
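The structure of a set value (code, name, and translations) can be sketched as a small data model. The following Python example is illustrative only: the class names, the locale keys, and the codes DR and MR are assumptions and do not reflect the InfoSphere MDM Ref DM Hub schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SetValue:
    """One value in a reference data set (hypothetical model)."""
    code: str   # system representation; the key of the value
    name: str   # meaning of the code
    translations: Dict[str, str] = field(default_factory=dict)  # locale -> name

    def display_name(self, locale: Optional[str] = None) -> str:
        # Fall back to the default name when no translation exists.
        if locale is None:
            return self.name
        return self.translations.get(locale, self.name)


@dataclass
class ReferenceDataSet:
    name: str
    values: Dict[str, SetValue] = field(default_factory=dict)

    def add(self, value: SetValue) -> None:
        # The code is the key, so it must be unique within the set.
        if value.code in self.values:
            raise ValueError(f"duplicate code: {value.code}")
        self.values[value.code] = value


titles = ReferenceDataSet("Courtesy title")
titles.add(SetValue("DR", "Doctor", {"es": "Médico"}))
titles.add(SetValue("MR", "Mr."))

print(titles.values["DR"].display_name("es"))  # uses the Spanish translation
print(titles.values["MR"].display_name("es"))  # no translation, falls back to the name
```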
Maps
Maps are a special type of relationship between sets; in particular, maps are used to link values in one set to values in another set. For example, Table 2-1 and Table 2-2 both represent the concept of countries, but each set represents a country differently.
Table 2-1 shows the values for countries as telephone dialing prefixes, where each country is represented by a numeric code.
Table 2-1 A set of telephone country codes
Code   Value
1      United States of America
44     United Kingdom
49     Germany
...    ...
Table 2-2 shows a set based on the International Organization for Standardization (ISO) 3166 standard for country codes, in which each country is represented by a two-character code.
Table 2-2 A set of country codes represented by two characters per ISO-3166
Code   Value
DE     Germany
UK     United Kingdom
US     United States of America
...    ...
Table 2-3 is a map that shows which values in two sets are equivalent. In this example, the value 1 in the Telephone Dialing Prefix Set is equivalent to US in the ISO 3166 Country Code Set. The link between one value in a set and the equivalent value in another set is called a value map.
Table 2-3 A map between telephone dialing codes for countries and ISO 3166
Telephone Dialing Prefix Set   ISO 3166 Country Codes Set
1                              US
44                             UK
49                             DE
Reference data maps are used by data integration programs (where data is transferred between separate systems that represent the same information differently) and data distribution programs. The data integration layer takes a reference value from a source set and uses the map to find the corresponding value in a target set. Thus, in the earlier example, the value 1 is mapped to US.
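The way a data integration layer might apply a map can be sketched as a simple lookup. The following Python example uses the value maps from Table 2-3; the function name and the error handling are assumptions made for illustration.

```python
# Value maps from Table 2-3: telephone dialing prefix -> country code.
dialing_to_iso = {"1": "US", "44": "UK", "49": "DE"}


def transcode(source_code: str, value_map: dict) -> str:
    """Return the target value that is mapped to source_code."""
    try:
        return value_map[source_code]
    except KeyError:
        # An unmapped value is reported rather than silently passed through.
        raise LookupError(f"no value map for source code {source_code!r}") from None


print(transcode("1", dialing_to_iso))

# The reverse direction is safe to derive only when the map is one-to-one.
iso_to_dialing = {target: source for source, target in dialing_to_iso.items()}
print(transcode("DE", iso_to_dialing))
```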
Hierarchies
Hierarchies are also a relationship between reference data values, but are different from a map. A map is a list of relationships between values in different sets; hierarchies are a relationship of values within a set and are known as set hierarchies.
Table 2-4 shows a set of travel expense accounting codes, with each code having a numeric value and a name representing the detailed type of expense.
Table 2-4 Travel expense accounting code set
Code   Name
100    Travel
121    Airfare
124    Hotel
200    Non-Travel
225    Phone
233    Internet Service
2251   Mobile Phone
Figure 2-4 illustrates a simple hierarchy of values in this set.
Figure 2-4 Hierarchy of expense accounting codes
Hierarchies can be consumed by a number of enterprise information systems, but are often used by business intelligence systems, for use in reports that roll up data. For example, a report might show expenses summarized by types of expense (travel versus non-travel).
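A roll-up over a set hierarchy can be sketched as follows. The codes and names come from Table 2-4. The parent-child links are an assumption about what Figure 2-4 shows (Airfare and Hotel under Travel, Phone and Internet Service under Non-Travel, Mobile Phone under Phone), and the expense amounts are invented purely for illustration.

```python
from collections import defaultdict

# Codes and names from Table 2-4.
names = {
    "100": "Travel", "121": "Airfare", "124": "Hotel",
    "200": "Non-Travel", "225": "Phone", "233": "Internet Service",
    "2251": "Mobile Phone",
}

# Assumed child -> parent links; the figure itself is not reproduced in the text.
parent = {"121": "100", "124": "100", "225": "200", "233": "200", "2251": "225"}


def top_level(code: str) -> str:
    """Walk up the hierarchy until a code without a parent is reached."""
    while code in parent:
        code = parent[code]
    return code


def roll_up(expenses):
    """Sum (code, amount) pairs by the name of each code's top-level ancestor."""
    totals = defaultdict(float)
    for code, amount in expenses:
        totals[names[top_level(code)]] += amount
    return dict(totals)


# Illustrative amounts only.
expenses = [("121", 450.0), ("124", 300.0), ("2251", 60.0), ("233", 40.0)]
print(roll_up(expenses))
```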
Hierarchies are implemented in InfoSphere MDM Ref DM Hub as a layer on top of underlying InfoSphere Custom Domain Hub hierarchies. There are other types of hierarchical relationships between sets, but those are not explicitly represented as InfoSphere MDM Ref DM Hub objects. For example, a country set might have a property that indicates the continent in which that country resides, and that continent might be a value in a continent set. This arrangement is known as a level-based hierarchy. Level-based hierarchies can also be represented in maps.
2.1.4 Supporting domain objects
In addition to the core reference data domain objects, some supporting objects are used in the reference data domain. These objects range from providing underlying support for core objects (types), to providing organizational containers (folders), and finally to providing a set of objects that are linked to the core objects (subscriptions, managed systems) as part of the reference data ecosystem.
Types
Types are used to specify the attributes and properties of reference data sets and maps. All reference data sets and reference data maps must have a type. InfoSphere MDM Ref DM Hub provides a default type for sets and maps.
A type can be customized for a specific set or map. The customizations for a set include the following items:
Additional properties at the set level
For example, a URL property can be added to reference the external standard for a set. Properties can be string, text, Boolean, URL, integer, date, time stamp, and other sets.
A validation routine for the code of a set
Validation routines are implemented as regular expressions. For example, a code that must consist of exactly two uppercase letters can be validated with the following expression:
[A-Z]{2}
Additional properties at the value level
For example, a country set might have two additional properties, latitude and longitude. Each property might also have additional attributes:
 – Whether this property is also a key of the set
Properties marked as keys are effectively concatenated together when used in a map, or when a set is imported and exported.
 – Whether the property is required
 – Whether the property is unique
 – A validation routine for the property (similar to the one at the set level)
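The type customizations above (a validation routine for the code, required properties, and unique properties) can be sketched in Python. The regular expression is the one shown earlier; the function name, the row layout, and the error messages are assumptions and do not describe how the product performs validation.

```python
import re

# Two uppercase letters, as in the validation routine shown above.
CODE_PATTERN = re.compile(r"[A-Z]{2}")


def validate_values(rows, required=("code", "name"), unique=("code",)):
    """Return a list of error messages; an empty list means the rows are valid."""
    errors = []
    seen = {prop: set() for prop in unique}
    for i, row in enumerate(rows, start=1):
        for prop in required:
            if not row.get(prop):
                errors.append(f"row {i}: missing required property {prop!r}")
        code = row.get("code", "")
        # fullmatch requires the whole code to match; match() would accept "DEU".
        if code and not CODE_PATTERN.fullmatch(code):
            errors.append(f"row {i}: code {code!r} fails validation")
        for prop in unique:
            value = row.get(prop)
            if value in seen[prop]:
                errors.append(f"row {i}: duplicate value {value!r} for {prop!r}")
            elif value:
                seen[prop].add(value)
    return errors


rows = [
    {"code": "DE", "name": "Germany"},
    {"code": "de", "name": "Germany"},  # lowercase code fails the pattern
    {"code": "DE", "name": ""},         # missing name and duplicate code
]
for message in validate_values(rows):
    print(message)
```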
Maps are also typed and they allow for extensibility at the value map level. For example, you can add a Boolean attribute called “verified” to a value map, indicating that the particular value mapping has been verified by a data steward.
Managed systems and subscriptions
Managed systems are enterprise information systems that are providers and consumers of reference data. Subscriptions are used to define which reference data is being consumed by a particular managed system. Applications that distribute reference data can use subscriptions to determine which managed systems should receive the reference data they require.
The managed systems objects contain information about what sort of enterprise information system is using or providing reference data (and can be extended generically with attributes like host name, and so on). Subscriptions can include the following information:
Which managed system the subscription is for
How frequently the managed system wants to receive updates
Which delivery mechanism to use
Which format to use to send reference data to the subscribed system (CSV or XML)
Which lifecycle state (approved, draft, test, and so on) is of interest
The last update of a managed system for this subscription
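A distribution application could use this subscription information to decide which managed systems are due for an update. The following Python sketch models the attributes listed above; the field names, the `set_name` attribute, the delivery mechanism values, and the system names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Subscription:
    """Hypothetical model of a subscription (not the product schema)."""
    managed_system: str      # which managed system the subscription is for
    set_name: str            # which reference data is consumed (assumed attribute)
    frequency: timedelta     # how often updates are wanted
    delivery: str            # delivery mechanism, for example "file" (assumed value)
    fmt: str                 # "CSV" or "XML"
    lifecycle_state: str     # lifecycle state of interest
    last_update: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        return self.last_update is None or now - self.last_update >= self.frequency


def due_subscriptions(subs: List[Subscription], set_name: str,
                      state: str, now: datetime) -> List[Subscription]:
    """Select subscriptions for a set in a given state that are due for delivery."""
    return [s for s in subs
            if s.set_name == set_name and s.lifecycle_state == state and s.is_due(now)]


now = datetime(2024, 1, 10, 12, 0)
subs = [
    Subscription("warehouse", "Country codes", timedelta(days=1), "file", "CSV",
                 "approved", datetime(2024, 1, 9, 6, 0)),
    Subscription("crm", "Country codes", timedelta(days=7), "queue", "XML",
                 "approved", datetime(2024, 1, 8, 6, 0)),
    Subscription("test-env", "Country codes", timedelta(days=1), "file", "CSV",
                 "draft"),
]
due = due_subscriptions(subs, "Country codes", "approved", now)
print([s.managed_system for s in due])
```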
Folders
Folders are a convenient way to organize sets in the InfoSphere MDM Ref DM Hub. Sets can be placed into individual folders, for both viewing convenience and for security (different access rights can be assigned to different folders). In the InfoSphere MDM Ref DM Hub, folders are similar to folders that are used in email, directories that are used in file systems, and so on.
2.1.5 Supporting services
Besides the standard create, read, update, and delete functions on objects, the InfoSphere MDM Ref DM Hub includes additional services that are specific to the RDM domain. These services address the lifecycle, security, and stewardship of the RDM objects.
Lifecycle
The InfoSphere MDM Ref DM Hub core objects (sets, maps, and hierarchies) support two types of lifecycles:
Approval and lifecycle state
The core InfoSphere MDM Ref DM Hub objects can have a well-defined lifecycle that specifies the governance over the state of a reference data object. The lifecycle specifies how these objects are updated over time, and by whom. The current state indicates where the object is in that lifecycle. Example states are draft, pending approval, approved, and retired.
Version
Reference data can support multiple versions of a set, map, or hierarchy. Versioning is useful in the following situations:
 – Major changes are introduced to these objects.
 – Consumers or providers of reference data are fixed on a particular version of a reference data object and cannot be easily changed.
For example, a particular application might work with version 1.0 of a set of employee codes and cannot be upgraded to work with version 2.0 because the data for that application cannot easily be updated.
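The two kinds of lifecycle can be pictured together as a state machine applied to each version of an object. In the following Python sketch, the state names are the examples given above, but the allowed transitions are an assumption; an actual lifecycle is configured in the product and might differ.

```python
# Assumed transitions between the example states named in the text.
ALLOWED_TRANSITIONS = {
    "draft": {"pending approval"},
    "pending approval": {"approved", "draft"},  # approve, or send back for rework
    "approved": {"retired"},
    "retired": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


# Each version of a set carries its own lifecycle state, so version 1.0 can stay
# approved for existing consumers while version 2.0 moves through approval.
versions = {"1.0": "approved", "2.0": "draft"}
versions["2.0"] = transition(versions["2.0"], "pending approval")
versions["2.0"] = transition(versions["2.0"], "approved")
print(versions)
```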
Stewardship and security
The InfoSphere MDM Ref DM Hub implements a security and stewardship model on top of reference data objects and their lifecycle states. Different users can have access to different objects, based on the ownership attribute of the object (which is a list of groups that have access to the object). For objects with lifecycles, users can be placed in different roles in the lifecycle process (steward and approver, for example).
The InfoSphere MDM Ref DM Hub supports a set of basic predefined roles that can be mapped to groups in an enterprise directory (LDAP), as follows:
Administrator: Can perform all InfoSphere MDM Ref DM Hub operations, including defining types.
Data Integrator: Can set up managed systems and do import and export of InfoSphere MDM Ref DM Hub objects.
Data Steward: Can create, update, and import and export sets, maps, and hierarchies, and can manipulate folders.
Approver: Can approve changes to sets and maps.
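The mapping from directory groups to roles, and from roles to operations, can be sketched as two lookup tables. The role names below are the predefined roles listed above; the permission names and the LDAP group names are hypothetical and are chosen only to illustrate the idea.

```python
# Role -> permitted actions (permission names are illustrative assumptions).
ROLE_PERMISSIONS = {
    "Administrator": {"*"},  # all operations, including defining types
    "Data Integrator": {"manage_systems", "import", "export"},
    "Data Steward": {"create", "update", "import", "export", "manage_folders"},
    "Approver": {"approve"},
}

# Enterprise directory group -> roles (group names are hypothetical).
GROUP_ROLES = {
    "cn=rdm-admins": {"Administrator"},
    "cn=rdm-stewards": {"Data Steward"},
    "cn=rdm-approvers": {"Approver"},
}


def is_allowed(groups, action: str) -> bool:
    """Return True if any role granted through the user's groups permits the action."""
    for group in groups:
        for role in GROUP_ROLES.get(group, ()):
            permissions = ROLE_PERMISSIONS[role]
            if "*" in permissions or action in permissions:
                return True
    return False


print(is_allowed(["cn=rdm-stewards"], "update"))
print(is_allowed(["cn=rdm-stewards"], "approve"))
print(is_allowed(["cn=rdm-stewards", "cn=rdm-approvers"], "approve"))
```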
Import and export
Reference data is prevalent in enterprise systems. For reference data to become managed reference data, it must be imported into the InfoSphere MDM Ref DM Hub for stewardship, managed there, and then exported back to the enterprise systems. To work with existing reference data (sets, maps, and hierarchies), InfoSphere MDM Ref DM Hub supports services to import and export reference data in both CSV and XML format. Transform operations can then be used to change the exported data into the format consumable by the enterprise systems (database tables, properties files, and so on).
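A transform operation of this kind can be sketched with the Python standard library. The example assumes that an exported set arrives as CSV with Code and Name columns (the actual export layout might differ) and converts it to properties-file lines. The key prefix is hypothetical, and escaping of special characters in properties files is omitted for brevity.

```python
import csv
import io


def csv_to_properties(csv_text: str, prefix: str) -> str:
    """Convert exported CSV rows (assumed Code and Name columns) to key=value lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f"{prefix}.{row['Code']}={row['Name']}" for row in reader]
    return "\n".join(lines)


# Assumed shape of an exported set; the values are taken from Table 2-2.
exported = "Code,Name\nDE,Germany\nUS,United States of America\n"
print(csv_to_properties(exported, "country"))
```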
2.1.6 Supporting Custom Domain Hub services
Because the InfoSphere MDM Ref DM Hub is a Custom Domain Hub application, the InfoSphere MDM Ref DM Hub can avail itself of common Custom Domain Hub services (security, persistence, and hierarchies). The behavior of the InfoSphere MDM Ref DM Hub can also be enhanced (behavior extensions) or linked to transaction subscribers (event notification).
2.1.7 Batch utilities
The InfoSphere MDM Ref DM Hub provides command-line utilities to import and export reference data sets, maps, hierarchies, and types. These utilities can be used to automate the integration of reference data into the InfoSphere MDM Ref DM Hub, and the distribution of reference data out to enterprise systems.
2.2 Enterprise system integration
One of the critical functions of InfoSphere MDM Ref DM Hub is to interact with the reference data that is found in other enterprise systems. This section explores how InfoSphere MDM Ref DM Hub obtains reference data from key enterprise information systems and how those managed reference data objects are then used in conjunction with those enterprise information systems. In particular, this section explores integration with master data management, SAP, taxonomy and content management, and data warehouse systems.
2.2.1 InfoSphere Master Data Management
The InfoSphere Master Data Management (MDM) Custom Domain Hub hosts the reference data domain, but the core server itself has some other interaction patterns with MDM. In particular, InfoSphere MDM Ref DM Hub interacts with MDM for the following key functions:
Management of MDM code tables. MDM itself has code tables for several items (customer type, types of privacy policy, address type, and so on). InfoSphere MDM Ref DM Hub can be used to manage them as reference data sets in InfoSphere MDM Ref DM Hub, and export those as code tables to MDM.
Management of maps used to import data from different sources into MDM. As data is moved from source systems to MDM, reference data maps are used to transform reference data from a source system format to the MDM format for reference data. Usually this process is done as part of an extract, transform, and load (ETL) job loading data into MDM from source systems.
Management of maps used to export data from MDM to target systems. This function is the opposite of the use case previously mentioned, where master data containing reference data is exported from MDM into consuming systems.
Runtime transcoding of reference values used in transactions from bespoke systems that create, update, or retrieve data from MDM. As transactions are executing against MDM, the reference data in the transaction (for an update) might be in the format of a source system. MDM could call into RDM to read the map of values from the source system reference data to the values in InfoSphere MDM Ref DM Hub, and then transcode the reference data to the MDM value.
2.2.2 MDM and SAP
IBM InfoSphere Master Data Management Server is a repository that can centralize and manage an organization’s critical master data entities such as customer, product, supplier, and more. The centralization of these entities creates a single view of customers and products that results in better service, improved customer satisfaction, and improved relationships with partners and suppliers. Because many, if not all, of the organization’s applications (SAP applications, for example) and business processes operate on these entities, a reliable and flexible delivery of the master data is a key characteristic of the solution architecture. This section describes an MDM-SAP reference architecture to help you understand how IBM InfoSphere Master Data Management Server and the InfoSphere MDM Ref DM Hub can work with SAP regarding the management of customer data. The integration approach demonstrated here can also be applied for other business objects managed by the MDM Server (product, supplier, and so on).
Figure 2-5 illustrates a scenario in which data flows in both directions. Customer data is managed in MDM Server and sent to SAP. An SAP transaction is used to add, for example, the tax ID to the customer record. This additional information must be sent to MDM Server to update the central customer entities.
Figure 2-5 MDM-SAP system integration
A possible implementation of this architecture performs the following steps:
1. The customer data is created or updated with the Data Steward Console and saved on the MDM system.
2. MDM Server behavior extensions create an SAP customer ID (SAP KUNNR) for new records and send the customer data to a JMS topic.
3. An enterprise service bus (ESB) mediation flow reads the customer data from the JMS topic, performs the transcoding for country, province, and other codes of the record and calls the WebSphere adapter for SAP, which then sends an SAP IDoc containing the customer record to the SAP system.
4. If the SAP system provides additional information, for example, a tax ID, that must be stored in the customer record within MDM Server, SAP sends this information to the ESB, which then updates the customer record through MDM Server web service calls.
The bus implementation must translate various MDM Server-specific codes into the appropriate values used in the SAP system. The InfoSphere MDM Ref DM Hub is used to import the MDM-specific values and the SAP-specific values, to define mappings between them, and to export the mappings to create the transcoding tables that are read within the ESB mediation flow.
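The transcoding step in the mediation flow can be sketched as a per-field lookup against the exported transcoding tables. The following Python example is not ESB code. The field names and all code values are invented for illustration, and a real flow would load the tables from the maps exported by the InfoSphere MDM Ref DM Hub.

```python
# Hypothetical transcoding tables: field -> {MDM code: SAP code}.
TRANSCODING = {
    "country": {"185": "US", "31": "DE"},
    "province": {"108": "NY"},
}


def transcode_record(record: dict, tables: dict) -> dict:
    """Return a copy of the record with every mapped field transcoded."""
    result = dict(record)
    unmapped = []
    for field_name, table in tables.items():
        if field_name not in record:
            continue
        source_value = record[field_name]
        if source_value in table:
            result[field_name] = table[source_value]
        else:
            unmapped.append((field_name, source_value))
    if unmapped:
        # Fail the record instead of sending untranslated codes downstream.
        raise LookupError(f"unmapped reference values: {unmapped}")
    return result


customer = {"name": "Example Corp", "country": "185", "province": "108"}
print(transcode_record(customer, TRANSCODING))
```

Rejecting a record with unmapped values, rather than passing the source code through, is one possible policy; it surfaces gaps in the maps so that a data steward can correct them.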
2.2.3 Taxonomy management for ECM pattern
In Enterprise Content Management (ECM) systems, taxonomies are used to classify content to make it easier to find in content delivery systems. InfoSphere MDM Ref DM Hub can be used as a component in an enterprise taxonomy management solution for content management and delivery. This challenge is often complex for the following reasons:
Many older systems are involved that have manual processes for managing reference data and taxonomies for content tagging.
These systems often do not easily support the introduction of new versions of the reference data and thus cannot accept automatic feeds from the trusted source.
Consuming systems often need to obtain supersets and subsets of, and to join and transform, multiple objects from InfoSphere MDM Ref DM Hub.
There can be hundreds of concepts and many variations of each that must be mapped to each other. An example might be a simple list of industries that might have three or four variations for the business to use, one for marketing, one for sales, and one for finance.
These ECM systems often support a business function that must change rapidly and frequently in response to changes in the market. Therefore, the process and tools of applying taxonomy changes into the consuming systems should be simple and easy to use.
Figure 2-6 shows the components of a taxonomy management system (TMS) that includes InfoSphere MDM Ref DM Hub.
Figure 2-6 Components of a taxonomy management system with InfoSphere MDM Ref DM Hub
This solution pattern has the following key components, as shown in the figure:
Configurable import, export, and publishing services that are built on base services of InfoSphere MDM Ref DM Hub and are part of a taxonomy management system (1).
A transform component that extracts reference data sets, hierarchies, and mappings from InfoSphere MDM Ref DM Hub, and publishes reports and feeds. This published reference data can be consumed by downstream system administrators.
A services component that can combine InfoSphere MDM Ref DM Hub output and deliver it to systems that directly consume the output.
Synchronization with various master data sources that use the InfoSphere MDM Ref DM Hub reference data (2).
Because ECM systems often require a human to implement a change in a taxonomy, the TMS solution needs to include notifications, ideally as part of an integrated business process.
Controls in the content management systems (3) that support the following tasks:
 – Authors select values from reference data lists and tag content.
 – Content owners identify updates that are needed when a new version of a taxonomy is released.
 – Codes from the taxonomy are syndicated to the delivery platforms.
Delivery systems (4), such as web applications, portals, search engines, and document management user interfaces, that use the official taxonomies.
This component involves close change management coordination with the TMS and the ECM systems.
Not shown in Figure 2-6 is the need for notifications and integrated workflow management across systems.
Figure 2-7 shows another view of a possible implementation.
Figure 2-7 Taxonomy Management System and InfoSphere MDM Ref DM Hub pattern
2.2.4 Data warehouse
Data warehouses are used as the enterprise repository for business intelligence data. Information is collected from a variety of operational systems (typically through ETL) to provide a common repository for reports, dashboards, and analysis of enterprise information. Reports are usually rolled up based on common dimensions, such as country, region, product type, and so on.
Reference data plays three key roles in a data warehouse environment:
Providing a consistent canonical definition of reference codes to be used in all tables in the warehouse. Reports are typically organized around common classifications (sales by country, types of transactions, and so on). Therefore, having the reference data defined in InfoSphere MDM Ref DM Hub and then pushed into the warehouse enables consistency and governance around reference data dimensions.
Delivering a well managed set of hierarchies for reporting and analysis. Many reports are interactive and hierarchical in nature. For example, you might have the following data:
 – Sales by region
 – Sales by country within a region
 – Sales by province within a country within a region
Regions, countries, and provinces can be represented as reference data. The relationship between regions, countries, and provinces is a hierarchy over that reference data. InfoSphere MDM Ref DM Hub can be used to ensure the consistency, integrity, and governance of that hierarchy before the hierarchy is published to the warehouse.
Ensuring that reference data from source systems is mapped to the common canonical reference data used in the warehouse as part of the ETL process. The standard pattern for moving transactional data (and other dimensions, such as master data) into a warehouse is to use an ETL tool, such as InfoSphere DataStage, to take updates from source systems and transform and load those into the dimensional tables in a warehouse. The representation of the reference data in the various source systems will likely vary from the canonical representation in the warehouse. For example, a source system might represent countries using a numeric code like 93 (dialing code for Afghanistan) or using the ISO 3166-1 two-character code (AF), while the warehouse might use the ISO 3166-1 three-character code (AFG).
InfoSphere MDM Ref DM Hub is ideal for managing the sets that are associated with the different source systems and the warehouse, and for empowering business data stewards to create and version the maps between the sources and the warehouse. The resulting maps can then be exported to a lookup table used by the ETL system to transform reference data in source systems to the canonical representation for a particular warehouse.
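The lookup step in such an ETL job can be sketched as follows. The code values 93, AF, and AFG come from the example above; the source system names, the row layout, and the handling of rejected rows are assumptions, and a real job would read the lookup table from the exported maps rather than define it inline.

```python
# Lookup table keyed by (source system, source code); system names are hypothetical.
LOOKUP = {
    ("billing", "93"): "AFG",  # source uses dialing codes
    ("crm", "AF"): "AFG",      # source uses two-character codes
}


def transform(rows):
    """Split rows into those with a canonical country code and those rejected."""
    loaded, rejected = [], []
    for row in rows:
        canonical = LOOKUP.get((row["source"], row["country"]))
        if canonical is None:
            rejected.append(row)  # no map exists; route to a steward for review
        else:
            loaded.append({**row, "country": canonical})
    return loaded, rejected


rows = [
    {"source": "billing", "country": "93", "amount": 100},
    {"source": "crm", "country": "AF", "amount": 250},
    {"source": "crm", "country": "ZZ", "amount": 10},
]
loaded, rejected = transform(rows)
print(loaded)
print(rejected)
```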
A visualization of this architectural pattern is shown in Figure 2-8.
Figure 2-8 InfoSphere MDM Ref DM Hub integration pattern with a Data Warehouse
 