Integration
This chapter describes key reference data integration scenarios and the considerations that apply to them. As important as the underlying integration pieces are, it is equally essential to understand the reference data management functionality and features that play a key role in plumbing those pieces together. In that context, this chapter provides an overview of several common import, transcoding, and export features that, together with a closely related permissions model, enable effective loading, semantic alignment, and distribution of reference data. Further, through a series of example integration scenarios with products in the IBM InfoSphere suite and related master data management (MDM) domains, this chapter describes how IBM InfoSphere Master Data Management Reference Data Management Hub (InfoSphere MDM Ref DM Hub) features can be used and extended to realize robust integration patterns.
6.1 Data loading and import
One of the key steps in any reference data integration scenario is to load data from source systems into the reference data management hub. The wide variety of use cases and solution alternatives for performing data loads makes this a non-trivial step with no clear best answer in many cases. This section describes the available mechanisms, associated considerations, and trade-offs when loading and importing data.
6.1.1 Input transformation
In most cases, the source data differs from what a reference data management hub expects as input. The difference can be in data format, quality, semantics, or any combination of these. Although quality and semantic differences usually are handled in the analysis phase (see Chapter 4, “Requirement analysis” on page 81), differences in the expected format still must be resolved.
InfoSphere MDM Ref DM Hub accepts imports in character-separated values (CSV) or Extensible Markup Language (XML) format. Any source data must be transformed into one of these formats before any import interface is used. Figure 6-1 shows an example of a source format that is incompatible with InfoSphere MDM Ref DM Hub. You can write custom adapters to perform such conversions. If the data source is unstructured, the adapter must parse it, induce appropriate structure, and then convert it to the expected character-separated or markup representation. If the data source is already structured but in a different format than what InfoSphere MDM Ref DM Hub expects, the adapter converts it into one of the expected formats after augmenting and modifying it with the expected delimitation or markup.
Figure 6-1 Example source data format incompatible with InfoSphere MDM Ref DM Hub
Some of our clients use custom code to combine data from multiple sources into a single CSV file before performing a bulk load into the InfoSphere MDM Ref DM Hub. This combination is often non-trivial when data from different sources represents different aspects or attributes of the same reference data value. For instance, when importing data into a country code table in InfoSphere MDM Ref DM Hub, some of the attributes (code, name) might come from a Location entity, while the description might come from the Address field within a Person entity.
Figure 6-1 on page 112 shows a file snippet containing reference data from a source system that is not compatible with InfoSphere MDM Ref DM Hub. Figure 6-2 shows the data after appropriate input transformation to make it compatible with RDM input expectations, where columns represent reference data value properties and rows represent reference data value instances.
Figure 6-2 Transformed data consumable by InfoSphere MDM Ref DM Hub
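As a concrete illustration of such an adapter, the following minimal Java sketch converts a hypothetical semicolon-delimited name=value source format (for example, code=US;name=United States;desc=...) into a CSV layout like the one in Figure 6-2. The field names, file names, and source format are illustrative assumptions, not part of any product API, and the sketch ignores CSV escaping for simplicity:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal input-transformation adapter sketch. It assumes a hypothetical
 * source format in which each line carries name=value pairs separated by
 * semicolons, and produces a CSV file with one reference data value per row.
 */
public class InputTransformAdapter {

    public static void main(String[] args) throws IOException {
        List<String> csvRows = new ArrayList<>();
        csvRows.add("code,name,description"); // header columns map to type properties

        for (String line : Files.readAllLines(Path.of("source.txt"))) {
            String code = "", name = "", desc = "";
            for (String pair : line.split(";")) {
                String[] kv = pair.split("=", 2);
                if (kv.length < 2) continue; // skip malformed pairs
                switch (kv[0].trim()) {
                    case "code": code = kv[1].trim(); break;
                    case "name": name = kv[1].trim(); break;
                    case "desc": desc = kv[1].trim(); break;
                }
            }
            csvRows.add(String.join(",", code, name, desc));
        }
        Files.write(Path.of("import.csv"), csvRows); // ready for the RDM importer
    }
}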
6.1.2 Import interfaces
InfoSphere MDM Ref DM Hub provides several ways to import reference data. The most user-friendly method is the user interface (UI) import, which is mostly a manual process. Other import methods are batch and command-line based mechanisms. By linking the batch importer to a cron job that is defined on a fixed or variable time window, the entire import process can be automated. Typically, application programming interfaces are also exposed to achieve fine-tuned control of the import process.
The decision of whether to use a UI import rather than a batch process to perform bulk imports has several implications beyond simplicity of use. For instance, depending on the nature of the specific RDM project in question, letting the development team bulk-load all the reference data initially while establishing the environment and processes might be preferable. One of our early user groups built a bulk load capability that proved to be useful beyond the initial data load because they were able to use the bulk load job for periodic batch updates of RDM tables from their back-end systems. However, if the project is constrained by initial setup time, a preferable approach might be to train an initial set of business users to import their own data when the initial environment and governance processes are established. This approach can help get the initial RDM environment up and running sooner.
Another argument in favor of letting the development team load data (instead of business users) is that the data analysis and modeling for the loaded data can be well-aligned from the start. Letting business users load their own data has implications for enforcing suitable governance processes and modeling constraints. Business users who are used to maintaining their data in ad hoc tools (spreadsheets, plain text, and so on) might not model the data appropriately in InfoSphere MDM Ref DM Hub. For instance, if party address data was previously maintained in a single spreadsheet, a business user is not likely to separate Country and State reference data, which might be highly desirable in an InfoSphere MDM Ref DM Hub. As a preventive measure, the project management team can decide to closely link modeling with the workflow of reference data entities; however, that has implications for the time necessary to process each reference data entity from the draft state through approval.
In some scenarios, a middle path might be more suitable, where the development team loads enterprise-level data with broader usage and applicability, while sets of business users load suborganization-wide data later. Overall, there is often no single data loading mechanism that is best for every scenario. Consequently, giving clients the power to choose is critical to the broad applicability of a comprehensive reference data management hub implementation.
User interface import
After the source reference data is properly transformed and represented in a CSV or XML file, it can easily be imported into the RDM hub. For a step-by-step process overview of the user interface import function in InfoSphere MDM Ref DM Hub, see 7.3, “Implementing InfoSphere MDM Ref DM Hub model” on page 172. Here, our focus of discussion is on various implications and considerations with respect to integration scenarios.
Importing code values into reference data sets
A typical integration scenario, such as ETL-based warehousing or real-time ESB integration, invariably begins with importing the code values that reside in code or lookup tables within source systems into InfoSphere MDM Ref DM Hub. The user interface import functionality can make importing data from external source systems easy; however, there are considerations when planning the import process:
InfoSphere MDM Ref DM Hub requires the reference data set versions into which the data must be imported to be present before the import process can begin. For this reason, you must define a standard procedure to either use an existing set version to load delta changes, or define a new version on each load. Depending on the requirement, you might need to retire the previous version before loading data into a new version. After this process is fixed, the reference data type definition must be linked with the import module to ensure that the import file format is aligned with the properties defined in the reference data type definition. All required properties must be specified in the import file.
The InfoSphere MDM Ref DM Hub user interface offers a wizard-based importer (Figure 6-3). With it, you have the flexibility to select the column-name-to-attribute mapping, the file format, the separator used in the file, the date and time stamp formats (enabling the importer to resolve any potential date and time field ambiguities), and so on. Determine these configurations early in the process to streamline the entire import process.
Figure 6-3 Flexible wizard-based import interface of InfoSphere MDM Ref DM Hub
A common issue in reference data management is the representational gap between the data in source systems and what InfoSphere MDM Ref DM Hub can consume; input transformation is then required. If you use an XML file for importing data, ensure that the import file conforms to the published and bundled schema for the kind of reference data entity that is being imported (set values, value mappings, or hierarchies). When building custom adapters for creating XML import files, see the published schemas used in InfoSphere MDM Ref DM Hub in the information center (by selecting Reference → Data and configuration reference → Schemas for XML import).
If the entity is defined to contain a compound key (a key defined using multiple reference data value attributes), the import file must contain values that can map to each part of that compound key. For instance, if a reference data set for countries has a key defined on both the code and name properties, then both become required fields during the import process.
During the import process, any existing records (matched using the key) are updated while new ones are added. After the import finishes, you can navigate back to the set and inspect the values. For reporting purposes, a summary page is presented after the import process finishes, indicating the number of successful imports (additions or updates) and failed cases (exceptions). The error records can be exported to a file and corrected offline. After correction, they can be reimported by using the same procedure.
Defining level-based hierarchies while importing values in sets
The import procedure also allows importing rows for a data set that has a compound key formed by using the values from another set. By doing this, you can establish relationships between sets and define what are known as level-based hierarchies (Figure 6-4), where each level is defined by a unique reference data set.
Figure 6-4 Level-based hierarchy example
To import such relationships, you must define the reference relationship to the other set (for example, States at Level 2 to Countries at Level 1 in Figure 6-4). The InfoSphere MDM Ref DM Hub import wizard provides the flexibility to do this. For instance, as shown in Figure 6-5 on page 117, while importing reference data values for a States set, you can decide to reference values from a Countries set by including country codes in the compound keys.
Figure 6-5 Defining level-based hierarchies while importing reference data values
Because level-based hierarchies are defined through relationships across multiple sets, the set that is referenced by the set being imported must be marked as active. In most cases, being active requires the set version to be marked as current by using effective dating (an effective date prior to the current date and an expiry date set in the future).
Importing set hierarchies
In addition to importing reference data values into reference data sets, you can import set hierarchies (Figure 6-6), which induce a classification structure over existing values within a single set (in contrast to level-based hierarchies, which are defined across multiple sets).
Figure 6-6 Set hierarchies defined on a single set of reference data values
These imports are separate from set imports and require a set with pre-existing values as a prerequisite. Therefore, if you want to bootstrap-import set hierarchies, you must define your import so that it occurs sequentially after the corresponding set import has finished.
As noted earlier for reference data value imports, an exception that results from an incorrect entry in the import file can be exported from the summary page and corrected. That step is straightforward because each row in the import file is imported independently of any other rows. However, for a set hierarchy, because every row specifies a new relationship in a fully connected tree, any incorrect entry can potentially affect all the subsequent rows in the import file. For example, the first row in the import file is often used to specify the root. If this row itself is incorrect, all the remaining rows result in errors because the root of the hierarchy is not defined.
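Because of this ordering sensitivity, it can pay to pre-validate a hierarchy import file before submitting it. The following minimal Java sketch assumes a hypothetical two-column parent,child file layout (not the actual RDM hierarchy schema) and checks that the first row introduces the root and that every parent is defined before it is referenced:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Validates that a parent,child hierarchy file is loadable top-down. */
public class HierarchyFileValidator {

    public static void main(String[] args) throws IOException {
        Set<String> seen = new HashSet<>();
        int lineNo = 0;
        for (String line : Files.readAllLines(Path.of("hierarchy.csv"))) {
            lineNo++;
            String[] cols = line.split(",", 2);
            String parent = cols[0].trim();
            String child = cols.length > 1 ? cols[1].trim() : "";
            if (lineNo == 1) {
                seen.add(parent); // first row is expected to introduce the root
            } else if (!seen.contains(parent)) {
                System.out.printf("Line %d: parent '%s' not yet defined%n", lineNo, parent);
            }
            if (!child.isEmpty()) seen.add(child);
        }
    }
}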
Import using batch processor or service interfaces
A batch processing framework is bundled with InfoSphere MDM Ref DM Hub and can be used to build custom batch jobs that run RDM import transactions and support custom input and output data formats. For more information and examples about configuring, using, and customizing the batch interface, go to the information center and select InfoSphere MDM Server → Planning to use InfoSphere MDM Server → InfoSphere MDM Server platform technical features → Batch processing using MDMBatch.
Certain reference data management implementations can expose a lightweight Representational State Transfer (REST) wrapper interface on top of the core services, which can ease the labor of determining which exact transactions to invoke when performing data loads through a batch processor.
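For example, a thin wrapper of this kind might be invoked as in the following Java sketch. The endpoint path, host, and payload format are purely illustrative assumptions about such a custom wrapper, not a documented RDM interface:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Posts an import file to a hypothetical custom REST wrapper around RDM services. */
public class RestImportClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                // Illustrative endpoint exposed by the custom wrapper, not by RDM itself.
                .uri(URI.create("https://rdm.example.com/wrapper/sets/CountryCodes/import"))
                .header("Content-Type", "text/csv")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("import.csv")))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Import status: " + response.statusCode());
    }
}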
When loading data in bulk, you must understand the pattern of the ongoing load of changed data from back-end systems. Depending on the load volume and frequency, loading a delta file that contains only the changes might be easier than loading the entire file for a set. Similarly, for large sets, loading changed data into a new (empty) version rather than an already full version might make more sense, especially if nearly all the code values have changed.
In addition to imports that update existing reference data or add new data, there might be a requirement to remove existing data while performing imports. InfoSphere MDM Ref DM Hub provides a few alternatives for this. One is to use effective dating to update the records that no longer need to be active with an expiry date that is before the current date or effective date. Alternatively, you can explicitly add a flag property to the reference data set type that is set to Active or Inactive. This property is then updated during the import process for data that must be deactivated or reactivated.
In addition to the batch processor framework, programmatic approaches that directly call InfoSphere MDM Ref DM Hub services offer a flexible and highly customizable mechanism to load data into the hub. As an example, one form of reference data concerns taxonomies. We use taxonomies for classifying web content as part of Enterprise Web Taxonomy Tooling (TMT). We must coordinate these taxonomies across many content management systems (CMS) so that we can use the same taxonomies across all those CMS. However, those systems have different histories and heritages, and therefore different designs. A few examples of such taxonomies are industries (the industries in which customers conduct their business), geographic locations, and solution areas. One of the challenges during the import process is to conform the input XML feed to a format that can be consumed by InfoSphere MDM Ref DM Hub. The scenario is illustrated in Figure 6-7.
Figure 6-7 Customizing import interface: Enterprise Web Taxonomy Tooling
The TMT Transform and Put component uses configuration information, stored in an ETL definition in RDM, to fetch XML over HTTP and apply an XML transform to generate one or more import files in the RDM-consumable format. It then uses the RDM service APIs to load the data that is contained in the import files into the appropriate RDM entities. The Import Configuration in the TMT Transform component simply passes the name of the ETL configuration to use when importing, and the source file to use when getting the information. In this way, the transform can be run against new versions of the source data without having to create a whole new ETL definition in RDM. Overall, the process of setting up a new import in the TMT solution is as follows (a minimal sketch of the transform step follows the list):
1. Find the source and determine the method to be used to consume the reference data.
2. Configure the import:
a. Create an XSL that builds the required import file.
b. Create a new ETL definition in the ETL reference data set in RDM.
3. Create a new import configuration document in the TMT Transform component.
4. Trigger the import.
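The transform step (2a) can be realized with the standard Java XSLT API, as in this minimal sketch. The file names are illustrative assumptions; the XSL itself would encode the mapping described by the ETL definition:

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

/**
 * Sketch of the TMT-style transform step: apply an XSL to a source XML
 * feed to produce an RDM-consumable import file.
 */
public class TaxonomyTransform {

    public static void main(String[] args) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("taxonomy-feed.xsl")));
        transformer.transform(
                new StreamSource(new File("source-feed.xml")),
                new StreamResult(new File("rdm-import.xml")));
        // rdm-import.xml can now be loaded through the RDM service APIs.
    }
}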
This example scenario demonstrates how the various InfoSphere MDM Ref DM Hub import mechanisms can be used to achieve the level of customization you want according to the integration scenario under consideration.
6.2 Data distribution and export
Exporting data out of InfoSphere MDM Ref DM Hub for distribution to target systems and consumers is another important step in an integration scenario. Without flexible mechanisms both for performing the export and for easily transforming the exported data into a representation that downstream systems can consume, the ability to enable seamless integrations can be severely limited. This section describes the considerations and challenges involved in the export and data distribution procedure. It also presents the mechanisms available in InfoSphere MDM Ref DM Hub for overcoming these issues.
6.2.1 Downstream system challenges consuming reference data
InfoSphere MDM Ref DM Hub provides several methods, available for immediate use, to export reference data sets and hierarchies. The most common one is the user interface based manual export, which requires no programming experience. Alternatives, such as batch export or using the service interfaces and APIs, require basic programming knowledge. InfoSphere MDM Ref DM Hub also provides utilities to define and export the reference data mappings that are used for transcoding.
As with data import, during data export you face the issue of representational differences between the default export format that a typical reference data management hub implementation provides and the format expected by the target consumers of the exported data. In rare cases, the target system can directly consume exports from the product. However, in most scenarios, a transformation is required. In environments where multiple downstream systems must consume reference data from a typical reference data hub, it is impractical to assume that the format in which reference data is exported exactly matches the format that is used by all those systems. This is true whether you use manual or automated export processes.
In certain scenarios, suitable assembly of export information is required. For example, in one of our internal use cases, combining the reference data set and set hierarchy exports was required. Because the InfoSphere MDM Ref DM Hub functions that are available for immediate use are based on separate schemas and exports for reference data sets and set-based hierarchies, special adapters had to be written for the scenario to work.
Similarly, if the export is from a newly created version of a reference data set to which a downstream system is subscribed, be sure to assess the extent of the changes to determine whether they must be consumed differently. With multiple consumers, there might be a need to batch and distribute multiple transformations, a subset of which might be equivalent, while others might be distinct, catering to a different subset of downstream systems.
Delta files
If the export is incremental, you might want to obtain a delta export of the changes since the last export for certain downstream systems, rather than the entire set of reference data values. To enable this approach, a reference data hub implementation must compare the current version of the reference data set with the version that was last exported, and allow export of only the changes. The output transformer can then consume only this delta file and transform it into a series of add or update jobs that can be directly run on the downstream systems.
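A minimal Java sketch of such a delta computation over two CSV exports follows. It keys rows on a hypothetical first (code) column; a real implementation would use the full compound key and also track deletions:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Emits rows that are new or changed since the previously exported version. */
public class DeltaExport {

    public static void main(String[] args) throws IOException {
        Map<String, String> previous = new HashMap<>();
        for (String row : Files.readAllLines(Path.of("export-v1.csv"))) {
            previous.put(row.split(",", 2)[0], row); // key on the code column
        }
        for (String row : Files.readAllLines(Path.of("export-v2.csv"))) {
            String key = row.split(",", 2)[0];
            if (!row.equals(previous.get(key))) {
                System.out.println(row); // added or updated since the last export
            }
        }
    }
}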
For all these reasons, you must carefully consider and plan the transformation of reference data exports into formats that can be consumed by downstream systems. In addition, pay close attention to situations where special assembly of output is wanted. Several of these considerations are described in 6.2.3, “Output transformation” on page 124, after a discussion of the various considerations that are involved in using the export interfaces that are available in InfoSphere MDM Ref DM Hub. See the step-by-step export scenarios with InfoSphere MDM Ref DM Hub in 7.3, “Implementing InfoSphere MDM Ref DM Hub model” on page 172.
6.2.2 Export interfaces
As with the import interfaces, reference data residing in InfoSphere MDM Ref DM Hub can be extracted in CSV or XML format by using the user interface export and batch export mechanisms. This exported data can be consumed by an adapter or transformer that outputs a format consumable by target systems. If you use a programmatic approach, you can code the transformer or adapter to call the services directly, eliminating one level of serialization or deserialization to or from a file (CSV or XML).
User interface export
When using the user interface export, several considerations apply to ensure that the exported file is as close to the target system expectation as possible, minimizing the transformation overhead.
Like the user interface import function, the user interface export gives you the flexibility to pick the correct format for the target systems and to customize the amount of information you want exported. For instance, if the export is part of a continuous update process, you can skip some of the set and value properties if they are redundant across continuous updates.
The user interface export wizard also provides the flexibility to conform the export to the target system expectation. Using the Edit Properties to Export page (Figure 6-8), you can edit the names of the properties to decide how the data appears in the export file. This feature is useful in reducing the burden on the later transformation stage. You also have the option to save the format as a reusable template if data in the same export format will be distributed to the target systems periodically.
Figure 6-8 Flexible wizard-based export interface in InfoSphere MDM Ref DM Hub
Batch export and services interface
Batch export is a command-line API, included with the product, that allows you to export reference data from the InfoSphere MDM Ref DM Hub. It allows you to choose between XML and CSV formats with a wide choice of character delimiters. Depending on whether you want a reference data set, hierarchy, or mapping export, you can set an appropriate configuration. You have the flexibility to provide the state and version of the specific reference data entity that must be exported.
Batch export is based on a J2SE client that can talk to the RDM web services programmatically. You can use the same client to invoke the RDM export services programmatically in the same way that the batch export does, bootstrapping the whole export process instead of going through RDM’s bundled batch export application.
6.2.3 Output transformation
Transforming the output from a reference data management hub is necessary in many situations because of the challenges in consuming reference data in downstream systems (described in 6.2.1, “Downstream system challenges consuming reference data” on page 121).
As with the import transformation process, the output transformers must accept the export as input and process it, performing structural and semantic changes to produce something that is consumable by downstream systems. Revisiting the Enterprise Web Taxonomy Tooling (TMT) example from “Import using batch processor or service interfaces” on page 119, another challenge is that some of these taxonomies are handled differently across these systems. That is, although several or all systems might be able to ingest an XML file, the format of that XML file is not the same in all those systems.
To overcome this problem, we adopted the approach of exporting the reference data from InfoSphere MDM Ref DM Hub by using the automated batch export capability. We produce an XML file in a standard format that mimics the structure of the data inside InfoSphere MDM Ref DM Hub. We then apply a standard transformation using XSL to produce the desired XML output files. This XML file has a structure that allows consuming applications to find similar information with the same tags across all taxonomies (reference data sets). In particular, because properties on the taxonomy nodes (reference data values) are important to our consuming applications, and because the properties differ considerably from taxonomy to taxonomy, we present those properties consistently no matter which taxonomy is being used. In this way, the consuming applications can more easily use these properties.
Figure 6-9 illustrates the components that were added to InfoSphere MDM Ref DM Hub to enable the taxonomy administrator to manage repeated publishing of both simple and complex transforms.
Figure 6-9 Customizing export interface: Enterprise Web Taxonomy Tooling
The key components are as follows:
The TMT Transform component provides a UI and repository to capture the information needed to assemble a taxonomy.
A Publishing Configuration document contains information about the source InfoSphere MDM Ref DM Hub objects that are required for the transform, a URI pointing to the XSL used for the transform, and the target location for the output. It provides the administrator with a way to preview the input data and the output result. The document also provides a way to trigger the export of InfoSphere MDM Ref DM Hub objects.
The TMT Automated Export component provides a simple web interface to the InfoSphere MDM Ref DM Hub batch export jobs.
The information that is required to configure the export is passed from the TMT Transform component. The taxonomy administrator then runs the batch job and gets log information back in a web browser.
A corporate-wide file system is used to host the InfoSphere MDM Ref DM Hub XML file that is exported, the XSLs needed for the transform, and the final output files.
Assuming all the InfoSphere MDM Ref DM Hub objects are configured and that a standard transform (XSL) is available, the taxonomy administrator can take the following actions to create output:
1. Create the publishing configuration for a given input and output.
2. Trigger one or more exports and verify that the InfoSphere MDM Ref DM Hub XML was created.
3. Trigger the publishing transform and verify the results.
The taxonomy can then be consumed by multiple systems. Owners and users of the taxonomies can also use the reports created by the transforms.
Figure 6-10 shows an example of how a set of source taxonomies is transformed into a set of output taxonomies used to feed a search engine. The needed InfoSphere MDM Ref DM Hub object XML is exported, transformed, and published for reuse.
Figure 6-10 Conversion of source taxonomies into target output
Figure 6-11 through Figure 6-17 on page 134 show examples of the output of the InfoSphere MDM Ref DM Hub export for a reference data set and a mapping, the XSL that is used to create one of the output formats, and two examples of the output files.
Figure 6-11 XML output from InfoSphere MDM Ref DM Hub reference data set
Figure 6-12 XML output from InfoSphere MDM Ref DM Hub mapping (Part 1 of 2)
Figure 6-13 XML output from InfoSphere MDM Ref DM Hub mapping (Part 2 of 2)
Figure 6-14 XSL used in TMT scenario to create an XML feed file
Figure 6-15 Final output XML after applying the XSL for the XML feed file
Figure 6-16 Final output XML after applying the XSL for the XML feed file
Figure 6-17 Final output XML after applying the XSL for the report
We provide application-specific transformations for some consuming applications and let certain consuming applications transform the standard XML to the specific XML format that the applications need.
6.3 Transcoding
Although enterprise reference data tends to be standardized before being consumed by individual operational systems and applications, it often differs in representation or semantics across different, often siloed, applications. This semantic difference can be unavoidable because applications often require local representations for improved processing.
For example, in Figure 6-18, the fields for country codes in the source and target represent the set of country codes, which is a type of reference data. However, the source and target have different code representations for the same countries. For this reason, before performing data integration or distribution, the source representation must be transcoded to one that the target can understand. This process is known as reference data transcoding and is a key step in many integration scenarios (for example, a master data integration and distribution pipeline).
Figure 6-18 Representational differences across source and target tables
In a typical reference data hub implementation, such as InfoSphere MDM Ref DM Hub, reference data transcoding is achieved through reference data mappings from a source reference data set to a target reference data set. Depending on the integration scenario, the transcoding might be simple (one-to-one), where a single source representation is mapped to a single target representation, or complex (one-to-many, many-to-one, or many-to-many), where one or more source representations are mapped to one or more target representations. An example of a one-to-many scenario is where a single canonical representation is mapped to multiple application-specific representations. Each scenario is described next.
6.3.1 Simple transcoding
Simple transcoding is applicable in integration or distribution scenarios where there is a single source system (or a set of source systems all backed by the same code tables) and a single target system (or a set of target systems all backed by the same code tables). To perform the required mappings, reference data from sources and targets is imported into InfoSphere MDM Ref DM Hub by using one of the mechanisms described in 6.1, “Data loading and import” on page 112. After the data is imported, a mapping set can be defined by using the UI, with the source and target specified as the respective sets into which the reference data was imported. Finally, the value mappings are created either by using the UI or through an import process. To perform the import, a mapping file must be constructed with the source and target specification and any properties you want for the mappings. Figure 6-19 illustrates the concept.
Figure 6-19 Defining reference data mappings from source to target set
As reference data goes through its lifecycle, stewards or business users make changes (additions, deletions, updates) to these mappings as necessary. Eventually, these mappings can be exported by using the UI or the batch interface, and distributed to downstream systems for transcoding during integration scenarios.
For example, during the master data integration and distribution process, the mappings are used to construct translation tables that help in the semantic alignment of values in the alignment area. The alignment handles two types of transcoding:
The source and target systems have the same code values but different semantics in the description.
The source and target systems have different code value sets for the same reference data domain.
In either case, without replacing the reference data values from the source MDM system with their semantic equivalents in the target, the semantic integrity of the records during the data integration process cannot be guaranteed.
The mechanism for this replacement is a translation table, similar to the one shown in Figure 6-20, which defines the rules that govern the replacement of reference values.
Figure 6-20 Translation tables for reference data alignment
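In code, a translation table of this kind reduces to a lookup keyed on the source value (or, for the complex transcoding described next, on the source system, target system, and value together). A minimal Java sketch with illustrative codes:

import java.util.Map;

/** Applies a translation table to transcode source codes into target codes. */
public class Transcoder {

    // Illustrative entries; in practice these rows would be generated from
    // the value mappings exported from the RDM hub.
    private static final Map<String, String> TRANSLATION = Map.of(
            "USA", "US",
            "CAN", "CA",
            "DEU", "DE");

    public static String transcode(String sourceCode) {
        String target = TRANSLATION.get(sourceCode);
        if (target == null) {
            // Unmapped values break semantic integrity; fail loudly.
            throw new IllegalArgumentException("No mapping for code: " + sourceCode);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(transcode("CAN")); // prints CA
    }
}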
6.3.2 Complex transcoding using multiple maps
Complex transcoding is applicable in scenarios where multiple source systems are backed by different code tables, or multiple target systems are backed by different code tables. To perform the required mappings, reference data from all sources and targets is imported into the reference data management hub by using one of the mechanisms described in 6.1, “Data loading and import” on page 112. After the data is imported, multiple mapping sets are defined by using the UI, with each pair of source and target specified as the respective sets into which the reference data was imported. Finally, the value mappings are created and evolve as part of lifecycle activities.
When distributed, these mappings are used in a similar manner as in simple transcoding, except that the translation tables have entries for each pair of source and target systems (as shown in Figure 6-21).
Figure 6-21 Complex transcoding across more than one source and target pair
6.4 Attribute-level permissions
A comprehensive reference data implementation, such as InfoSphere MDM Ref DM Hub, provides a fine-grained access control and security model that can be tuned to define attribute-level permissions. Using this security feature, the author of a reference data entity (set, mapping, or hierarchy) can restrict access to certain attributes of that entity depending on the role of the user who is accessing the entity. For example, a custom property defined on a reference data set can be configured to be read-only, hidden, or updatable depending on the role of the logged-in user who is trying to access that reference data set. This fine-grained attribute-level access control has direct implications for data import and export because, depending on the visibility of an attribute to a user, the user might or might not be able to import data into it, or export data contained in it.
As seen in previous sections, the InfoSphere MDM Ref DM Hub UI presents an import wizard. In this wizard, an option is given to map the data columns (Figure 6-5 on page 117) in the (character-delimited) import file to the corresponding attributes of the entity into which the data is being imported. If attribute-level permissions are enabled, the attributes on the mapping screen are displayed appropriately, depending on the level of visibility that you have. For example, you are not presented with the hidden fields, and you are not allowed to import data into the fields that are configured as read-only for your role. Likewise, when exporting data from a reference data entity, you are not allowed to export the attributes to which your access is restricted.
This security feature can be contrasted with the security that is based on ownership groups, where certain users (grouped by ownership groups) are not allowed to modify specific reference data sets that do not list those groups in their owners field. In such cases, the users cannot modify any of the values in those sets. With attribute-level security, in contrast, users can modify the values; however, some attributes might be unmodifiable or hidden. This feature can be easily configured on a per-set basis by using property files at the time of implementation of the scenario.
6.5 IBM InfoSphere Information Server
IBM InfoSphere Information Server is a data integration platform that helps you understand, cleanse, transform, and deliver trusted information to your critical business initiatives, such as big data, master data management, and point-of-impact analytics. Because reference data is prevalent in the information landscape, it plays a key role in enabling several integration scenarios across the IBM InfoSphere Information Server suite of technologies. This section discusses several of these integration scenarios.
6.5.1 InfoSphere Business Glossary
IBM InfoSphere Business Glossary (BG) is a tool that provides definitions of business terms across an enterprise for unified understanding. This section describes how InfoSphere Business Glossary can be integrated with InfoSphere MDM Ref DM Hub to enable seamless determination of the range of valid values in InfoSphere MDM Ref DM Hub for the terms in Business Glossary.
Figure 6-22 describes the integration procedure at a high level.
Figure 6-22 InfoSphere Business Glossary and InfoSphere MDM Ref DM Hub integration
The process is as follows:
1. Through the Administration tab of the InfoSphere MDM Ref DM Hub user interface, create a new data type to be used as a base type for reference data sets, with a link to Business Glossary as shown in Figure 6-23. Create a new set-level property of URL type for the base data type. This set property is not required, but it can have a default value of the base URL to Business Glossary.
Figure 6-23 Creating a custom data type in InfoSphere MDM Ref DM Hub
2. In Business Glossary, a custom attribute must be created to contain the link to the reference data set that is created in InfoSphere MDM Ref DM Hub. For this step, through the Administration tab of Business Glossary, create a custom attribute that applies only to terms and has a string type.
In Figure 6-24, the Valid Values attribute is created to link the country codes term to a reference data set representing the list of valid country codes for that term.
Figure 6-24 Creating custom attribute in InfoSphere Business Glossary
3. After completion of the initial setup of Business Glossary and InfoSphere MDM Ref DM Hub, you can create terms and reference data sets and establish cross-links between them. Links can be created by using URI addressable links, available in both InfoSphere MDM Ref DM Hub and Business Glossary. The linking steps are as follows:
a. Create a new reference data set in InfoSphere MDM Ref DM Hub that uses the base type that was created with a URL set property.
b. Create a new term in Business Glossary so that the created reference data set represents reference data for the term.
c. Fill in the custom attribute of the term with a URL to the InfoSphere MDM Ref DM Hub set that was created. You can use the following special format in the Valid Values attribute of Business Glossary to create a hyperlink to the specified URL that is displayed with the provided description text:
[<URL> | <Description>]
InfoSphere MDM Ref DM Hub provides a URI-addressable link that can be used to link to a current summary of a reference data set. To use this link, create a URL in the following format:
<Base RDM URL>/RefDataClient/ShowRefDataSet.html?setName=<name of reference data set>
In place of any spaces that occur in the name of the reference data set, you must use the following characters:
%20
For example, a reference data set named Country Codes results in the following URL:
<Base RDM URL>/RefDataClient/ShowRefDataSet.html?setName=Country%20Codes
This URL is then placed as the Valid Values attribute. See Figure 6-25.
Figure 6-25 Linking InfoSphere MDM Ref DM Hub URL to terms in InfoSphere Business Glossary
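If you generate these links programmatically, the space handling can be done as in the following small Java sketch. The base URL is a placeholder; the path and encoding follow the documented format above:

/** Builds the URI-addressable link to a reference data set summary. */
public class RdmSetLink {

    public static String buildLink(String baseRdmUrl, String setName) {
        // RDM expects spaces in the set name to be encoded as %20.
        return baseRdmUrl + "/RefDataClient/ShowRefDataSet.html?setName="
                + setName.replace(" ", "%20");
    }

    public static void main(String[] args) {
        System.out.println(buildLink("https://rdm.example.com", "Country Codes"));
    }
}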
6.5.2 InfoSphere Information Analyzer
In this section, we describe an integration scenario between InfoSphere MDM Ref DM Hub and InfoSphere Information Analyzer. This integration comprises the following main steps:
Using InfoSphere Information Analyzer to create a reference table from a column analysis of some set of data
Transferring that reference data into RDM to be managed and manipulated
Transferring the updated reference data back into InfoSphere Information Analyzer
Figure 6-26 illustrates the high-level overview of the integration of InfoSphere MDM Ref DM Hub and InfoSphere Information Analyzer.
Figure 6-26 InfoSphere MDM Ref DM Hub and InfoSphere Information Analyzer integration overview
The reference data in both RDM and Information Analyzer must be kept synchronized to have consistent reference data across the two tools, which enables accurate management and analysis of data. The integration process consists of four main steps:
1. Finding reference data in Information Analyzer
2. Using InfoSphere MDM Ref DM Hub to manage and update reference data
3. Creating or updating reference tables for the modified reference data in DB2
4. Updating the reference tables in Information Analyzer to conduct analysis with the new reference data set
Finding reference data in Information Analyzer
You can use Information Analyzer to find reference data by conducting a column analysis on a column of data from a table. The results can be saved in a reference table, which can then be exported for use by other tools, such as RDM.
To find reference data, use a column analysis on an existing table of data:
1. Open IBM InfoSphere Information Server Console to gain access to Information Analyzer services.
2. Open or create an Information Analyzer project.
3. Select Investigate → Column Analysis, select the column from which to find reference data, and then select to open or run a column analysis for that column of data.
4. After viewing an analysis, select View Details to view the details of that column analysis, such as the statistics determined by the analysis. See Figure 6-27.
Figure 6-27 Results of running a column analysis on a candidate reference data column
To create a new reference table that contains reference data for the selected column, complete the following steps:
1. Create a new table by selecting Reference Tables → New Reference Table.
2. Choose a name for the reference table, select the type as Valid and save it. Currently, only the Valid type is supported in RDM for reference data.
3. View the new reference table by selecting Reference Table → View Reference Table from the column analysis view.
4. Export the reference table by using CSV format:
a. Select Investigate → Table Management.
b. Open the newly created reference table.
c. Select Export. Select the path of where to save the file and keep the remaining options as the default values. Choose to save the reference table using CSV format (Figure 6-28).
Figure 6-28 Exporting a reference table from Information Analyzer
Managing reference data in InfoSphere MDM Ref DM Hub
When reference data is discovered in Information Analyzer, it can be imported into InfoSphere MDM Ref DM Hub for management and manipulation. After reference data is modified in RDM, it can be exported so that Information Analyzer can be updated with those modifications.
Use the following steps to import reference data from a CSV file:
1. Determine which data type is to be used to create a reference set for the reference data. If the data type is not already created and you are not using the default reference data type, create a custom data type through the Administration tab in the InfoSphere MDM Ref DM Hub web interface.
2. Create a new reference set using the selected data type.
3. Select to import the reference table that was exported from Information Analyzer into the created reference set by right-clicking the new reference data set and selecting Import.
4. In the Import Reference Data Set window, select the CSV file that was exported from Information Analyzer as the file to import.
5. In the Map Columns tab, select DISTINCT VALUE as the import file column for both the Code and Name properties.
6. In the Preview File tab, verify that the data shown is the correct reference data, then click Finish to import the data.
If required, make any necessary changes to the reference data set, such as adding or deleting values in the set through the InfoSphere MDM Ref DM Hub web interface.
You can then export the reference data set in CSV format by using the following steps:
1. Right-click the reference set and select Export (Figure 6-29).
Figure 6-29 Exporting modified reference data set
2. Keep the default values of Code and Name for the Value properties being exported and click Finish.
3. Select to download the exported file.
Updating the reference tables and running analysis in InfoSphere Information Analyzer
The exported reference data can be brought back into Information Analyzer by updating the original code table with the new data. To do this step, you create a JDBC connection to the original code table in Information Analyzer and update the original table data with the exported CSV file.
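A minimal Java sketch of that update step is shown below. The JDBC URL, credentials, table, and column names are illustrative assumptions (and the DB2 JDBC driver is assumed to be on the classpath); the CSV is the file exported from RDM in the previous step:

import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

/** Refreshes the original code table with the reference data exported from RDM. */
public class CodeTableRefresh {

    public static void main(String[] args) throws Exception {
        List<String> rows = Files.readAllLines(Path.of("rdm-export.csv"));
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://host:50000/SAMPLE", "user", "password")) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("DELETE FROM CODE_TABLE"); // replace table contents
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO CODE_TABLE (CODE, NAME) VALUES (?, ?)")) {
                for (String row : rows.subList(1, rows.size())) { // skip header row
                    String[] cols = row.split(",", 2);
                    ps.setString(1, cols[0].trim());
                    ps.setString(2, cols[1].trim());
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit();
        }
    }
}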
You can then run a new analysis in Information Analyzer with the updated reference table by using the following steps:
1. Select Investigate → Column analysis.
2. Select the column that the reference data applies to.
3. Select View Details to view the details of the last column analysis.
4. Navigate to the Domain and Completeness tab.
5. Set the domain type as Reference table.
6. Select Table Name as the name of the new reference table. This is the reference table to run the analysis against.
7. Save the selections.
8. Select Rebuild inferences and observe that the analysis statistics are changed (Figure 6-30).
Figure 6-30 Running column analysis with the updated reference table
6.5.3 Conversion workbench
This section describes how InfoSphere MDM Ref DM Hub is used in SAP migrations and consolidations. Moving business data, for example, customer objects and material objects, from one or more established systems to a target SAP system is the main process of such projects. While the data is moved from the source to the target system, the fields containing reference data must be transcoded from the values used in a source system into the values to be used in the target system. You can use InfoSphere MDM Ref DM Hub to provide the mapping rules that are required to perform the transcoding.
All SAP migration and consolidation projects are similarly structured into three phases:
Discover: In the first phase, the data models of the various source systems are discovered, followed by an extraction of the data into a staging area.
Prepare: The major steps of this second phase are structural alignment and data cleansing.
Deliver: In this third phase, the data is transformed from the alignment data model to the target system data model and loaded into the target system.
The InfoSphere Conversion Workbench for SAP Application provides the blueprint that describes the necessary tasks to do in each phase and the appropriate tools to realize these tasks.
The blueprint snippet in Figure 6-31 describes the basic sequence of structural alignment, transcoding, and data cleansing. The transcoding step is based upon transcoding tables that provide the source-to-target value mappings.
Figure 6-31 Transcoding in SAP migration and consolidation projects
The snippet in Figure 6-32 illustrates that the InfoSphere MDM Ref DM Hub is used by the Functional Data Analyst to define the value mappings. The Conversion Workbench application then populates the transcoding tables based on the value mappings defined in InfoSphere MDM Ref DM Hub.
Figure 6-32 Functional Data Analyst uses InfoSphere MDM Ref DM Hub and the Conversion Workbench Application to populate transcoding tables
To define value mappings and populate the transcoding tables, the Functional Data Analyst (FDA) does the following tasks by using the Conversion Workbench Application and InfoSphere MDM Ref DM Hub:
1. The FDA uses the Conversion Workbench Application to extract the reference data of the target SAP system and all source systems into InfoSphere MDM Ref DM Hub.
In addition, all required mapping objects that associate the appropriate source and target Reference Data Sets are created in InfoSphere MDM Ref DM Hub.
Figure 6-33 shows how the created InfoSphere MDM Ref DM Hub source sets, matching target sets, and initially created mapping objects are listed in the Conversion Workbench Application.
Figure 6-33 FDA creates InfoSphere MDM Ref DM Hub source and target reference data sets and initial mapping objects
2. After all sets and mapping objects are created in InfoSphere MDM Ref DM Hub, the FDA defines the value mappings in InfoSphere MDM Ref DM Hub as in Figure 6-34.
Figure 6-34 FDA defines value mappings with InfoSphere MDM Ref DM Hub
3. After the value mappings are defined in InfoSphere MDM Ref DM Hub, the FDA uses the Conversion Workbench Application to import the InfoSphere MDM Ref DM Hub mappings into the Conversion Workbench transcoding tables (Figure 6-35).
Figure 6-35 FDA imports InfoSphere MDM Ref DM Hub value mappings into transcoding tables
4. The Application Developer generates the data movement ETL jobs that make use of the populated transcoding tables. The data movement jobs are generated with the Conversion Workbench Rapid Generator tool and IBM InfoSphere DataStage®.
Figure 6-36 shows a generated data movement ETL job.
Figure 6-36 Data movement job using transcoding tables
6.6 Workflow
A key aspect of a typical InfoSphere MDM Ref DM Hub implementation is imposing standard governance and procedural guidelines for seamless management of reference data. To that effect, an essential goal is a well laid-out process workflow that can determine and manage the steps, tasks, assignments, and operations that enable smooth transition of reference data from assignment to approval, and eventually distribution. This section introduces several mechanisms to achieve this goal in InfoSphere MDM Ref DM Hub.
6.6.1 Basic workflow using lifecycle processes
Every resource (reference data set, mapping, or hierarchy) has an associated lifecycle process that determines the states that the particular resource can go through. This lifecycle is used by data stewards to control which specific versions of the resources are in use. Although InfoSphere MDM Ref DM Hub provides a way to define your own custom lifecycle processes, it also provides a set of ready-to-use lifecycle process definitions, each providing a different set of states.
InfoSphere MDM Ref DM Hub provides the following lifecycle process definitions:
Simple Approval process
State Machine - 2
Active Editable
Two Step Approval
See “Lifecycles and states” on page 22 for more details about these lifecycle processes. Also see the details in the information center.
6.6.2 Advanced workflow using integration with external tooling
Although the InfoSphere MDM Ref DM Hub lifecycle processes provide a set of states that can be assigned to resources, a business process (such as an approval process) usually has multiple participants (people and systems) changing the state of resources and sharing information through various transport channels (email, web portal, and so on). To realize these scenarios, you need a workflow engine, such as IBM Business Process Manager (BPM), that provides the means to interact with these people and systems (for instance, with worklists and notification capabilities). In addition to design and execution, a workflow engine like BPM also provides the means to simulate a workflow and determine its costs.
6.6.3 Reference data workflow example
Figure 6-37 on page 157 illustrates a business process that combines the BPM and InfoSphere MDM Ref DM Hub lifecycle capabilities. The overall workflow and the interaction between users and data stewards can be realized with BPM. Some of the process steps require that the data stewards interact with InfoSphere MDM Ref DM Hub to modify sets and change the resource state by using the Simple Approval lifecycle associated with the set.
Figure 6-37 Business process
In this example scenario, a business user starts the process by submitting a reference data change request on a web form. This request might be for a new code that represents a product type. The request is added to the worklist of the lead data steward, who verifies the information that is provided through the web form and, based on this information, either rejects the request or proceeds with processing it. Based on the provided information, the lead data steward assigns a data steward who is responsible for the requested subject area (for example, the product area) to the request and, optionally, also links the InfoSphere MDM Ref DM Hub sets and mappings to be processed. The data steward receives the request in a worklist, applies the necessary changes, and sets the state of the modified sets and mappings to Pending Approval. The approval request is added to the worklist of the lead data steward, who then verifies the changes and sets the state of the modified sets and mappings to Approved or Rejected. The user is notified whether the submitted change request was approved or rejected.
 