Requirement analysis
This chapter describes requirements that must be gathered and analyzed during an IBM InfoSphere Master Data Management Reference Data Management Hub (InfoSphere MDM Ref DM Hub) implementation. At the end of the analysis, you should have a scope of work that the client agrees to and a clear specification of what must be implemented.
This chapter includes the following topics:
4.1 Discovering reference data
As organizations and their business-critical applications grow over time, reference data becomes widely dispersed over many systems. One of the first tasks in identifying reference data is to identify the systems that create it. Discuss this topic with the client, because it defines the scope of the assignment. When a system or a group of systems is identified, you can locate the reference data within it by using various methods:
Data model
Profiling tools
Existing data
4.1.1 Data model
If the client documented its data assets by creating, maintaining, and using data models and metadata reports, you can use these artifacts to understand where reference data is being used. For example, by analyzing the data model or schema, it is possible to get an idea of the tables that contain lookup values that are being referenced in other tables. Often, lookup tables, such as state codes or industry classifications, reference other grouping or classification tables. These can be identified too through the data model.
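As a minimal sketch of this kind of schema analysis, the following Python example uses the standard sqlite3 module to list the tables that other tables reference through foreign keys. The schema, table names, and column names are hypothetical and serve only as an illustration. A real client schema would be read from its own database catalog or data modeling tool.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE industry (code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT,
    state_code TEXT REFERENCES state(code),
    industry_code TEXT REFERENCES industry(code)
);
CREATE TABLE supplier (
    id INTEGER PRIMARY KEY,
    state_code TEXT REFERENCES state(code)
);
"""


def find_lookup_candidates(conn):
    """Return {referenced_table: sorted list of tables that reference it}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    referenced = {}
    for table in tables:
        # PRAGMA foreign_key_list returns one row per foreign-key column;
        # index 2 of each row is the referenced (parent) table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            referenced.setdefault(fk[2], set()).add(table)
    return {parent: sorted(children) for parent, children in referenced.items()}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
candidates = find_lookup_candidates(conn)
for parent, children in sorted(candidates.items()):
    print(f"{parent} is referenced by {', '.join(children)}")
```

Tables that are referenced by many others, such as state in this sketch, are likely lookup tables and therefore candidates for reference data.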
4.1.2 Profiling tools
Sometimes, you might not have a data model as a starting point. Even if you do have data models, you do not necessarily understand the quality of the reference data. This is where you can use data discovery or profiling tools, such as Information Analyzer or InfoSphere Discovery, to discover and profile reference data.
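To illustrate the idea behind column profiling, the following sketch computes a few basic statistics for a column and applies a simple heuristic: a column with few distinct values that are reused across many rows might hold reference data. This is not how Information Analyzer or InfoSphere Discovery work internally. The function names, the sample data, and the 0.1 threshold are assumptions made for the example.

```python
from collections import Counter


def profile_column(values):
    """Return simple profile statistics for one column of values."""
    non_blank = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        # A low ratio means few distinct values reused across many rows.
        "distinct_ratio": len(counts) / len(non_blank) if non_blank else 0.0,
    }


def looks_like_reference_data(profile, max_ratio=0.1):
    """Heuristic only: the 0.1 threshold is an arbitrary assumption."""
    return profile["distinct"] > 0 and profile["distinct_ratio"] <= max_ratio


# Hypothetical sample: 40 customer rows with a state code column.
state_codes = ["AL", "AZ", "CA", "CO"] * 10
customer_ids = list(range(40))

print(looks_like_reference_data(profile_column(state_codes)))   # True
print(looks_like_reference_data(profile_column(customer_ids)))  # False
```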
As part of discovery, it is also important to understand where reference data is being consumed. The organic growth of applications and systems over time makes it increasingly difficult for the data stewards of reference data to trace this data lineage and to understand the impact of a change. Identifying reference data lineage is not a trivial task, and often requires due diligence in speaking with the owners of the consuming applications and understanding their system interfaces.
4.1.3 Reference data is already known
Sometimes, organizations have already taken the step of consolidating the reference data in a single system. In such situations, the reference data is identified and documented. Different organizations have different views of what reference data is. Having reference data, master data, and operational data housed in the same “reference data” application is possible. In these cases, a critical task is to “tease” the reference data apart from master data domains such as customer, product, and location. Although storing a reference to master data (for example product code and description) within InfoSphere MDM Ref DM Hub is natural, avoid the storage of any further master data attributes.
After reference data is discovered, the process of analysis begins. Data and process requirements of the client must be analyzed and mapped to InfoSphere MDM Ref DM Hub capabilities. The following sections discuss the requirements that must be analyzed.
4.2 Data requirements
The discovery process results in a catalog of reference data that then must be analyzed to determine how the reference data map to InfoSphere MDM Ref DM Hub objects such as reference sets, mappings, and hierarchies. Identifying patterns of reference sets is important in this process.
As an example, Table 4-1 and Table 4-2 list two patterns of reference data. Table 4-1 describes a simple pattern that stores a state code and name.
Table 4-1 Simple reference data defining a code along with a description
State code | Name
AL | Alabama
AZ | Arizona
CA | California
CO | Colorado
Table 4-2 describes a pattern that stores country information such as a name, time zone, and currency.
Table 4-2 Reference data defining a code, its description and attributes
Country code | Name | Time zone | Currency
AF | Afghanistan | UTC+04:30 | Afghani
AR | Argentina | UTC-03:00 | Argentine peso
BE | Belgium | UTC+01:00 | European euro
CA | Canada | UTC-08:00 | Canadian dollar
These two patterns must be modeled in InfoSphere MDM Ref DM Hub as different reference data types.
Use the following analysis tasks:
Map metadata such as primary keys, foreign keys, attributes, and their data types to reference data properties and associated property data types.
Map metadata such as foreign keys and their data types to reference data custom properties and associated property data types.
Identify required and optional attributes.
Capture validation rules.
InfoSphere MDM Ref DM Hub implements validation of attributes by using regular expressions. Therefore, any validation that can be represented as a regular expression can be configured in InfoSphere MDM Ref DM Hub. Table 4-3 lists the metadata that must be captured for our example.
Table 4-3 Metadata describing a reference data type
Reference table | Attribute name | Attribute data type | Primary key (1) | Required | Default value | Validation rule
Country type | Code | String(10) | Y | Y | N/A | Cannot contain numeric values
Country type | Name | String(30) | N | Y | N/A | N/A
Country type | Timezone | String(30) | N | Y | N/A | N/A
Country type | Currency | Reference set | N | Y | N/A | References currency

(1) Indicates whether the attribute is a primary key (Y) or is not a primary key (N).
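The rules captured in Table 4-3 can be checked against sample data before they are configured. The following sketch expresses the Code, Name, and Timezone rules as regular expressions by using the Python re module. The rule structure and the patterns are assumptions made for illustration, and the regular expression dialect that the product accepts might differ from the Python dialect.

```python
import re

# Hypothetical rule set keyed by attribute name, based on Table 4-3.
RULES = {
    # Code: required, at most 10 characters, no digits.
    "Code": {"required": True, "pattern": r"[^0-9]{1,10}"},
    # Name and Timezone: required, at most 30 characters, no further rule.
    "Name": {"required": True, "pattern": r".{1,30}"},
    "Timezone": {"required": True, "pattern": r".{1,30}"},
}


def validate(record):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for attribute, rule in RULES.items():
        value = record.get(attribute)
        if value is None or value == "":
            if rule["required"]:
                errors.append(f"{attribute}: value is required")
            continue
        # fullmatch requires the whole value to match, not just a prefix.
        if re.fullmatch(rule["pattern"], value) is None:
            errors.append(f"{attribute}: does not match {rule['pattern']}")
    return errors


print(validate({"Code": "BE", "Name": "Belgium", "Timezone": "UTC+01:00"}))
print(validate({"Code": "B3", "Name": "", "Timezone": "UTC+01:00"}))
```

Running such checks against the existing source data gives an early indication of how many values would fail a rule if it were enforced.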
Mappings in InfoSphere MDM Ref DM Hub address the requirement of translating values between different types of reference data. One example of where this translation occurs is when standard classification codes are being retired and new standards are being created, such as the International Classification of Diseases standards ICD-9 and ICD-10. Another requirement that the mappings address is when reference values from a source system must be translated into enterprise values represented in a data warehouse.
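Conceptually, a mapping is a lookup from a source value to a target value. The following sketch models the translation of source system values into enterprise values with a plain dictionary. The codes and the function names are hypothetical, and the product stores and manages mappings in its own data model rather than in this form.

```python
# Hypothetical mapping from source system fee type codes to enterprise codes.
SYSTEM_A_TO_ENTERPRISE = {
    "ANN": "ANNUAL_FEE",
    "LATE": "LATE_PAYMENT_FEE",
    "LPF": "LATE_PAYMENT_FEE",  # two source values can share one target value
}


def translate(source_value, mapping):
    """Return the mapped value, or raise if the source value is unmapped."""
    try:
        return mapping[source_value]
    except KeyError:
        raise ValueError(f"no mapping defined for {source_value!r}") from None


def unmapped(source_values, mapping):
    """Return the source values that have no entry in the mapping."""
    return sorted(set(source_values) - set(mapping))


print(translate("LPF", SYSTEM_A_TO_ENTERPRISE))
print(unmapped(["ANN", "WAIVE", "LATE"], SYSTEM_A_TO_ENTERPRISE))
```

Listing the unmapped source values during analysis helps to size the mapping work before the initial load.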
Often, there are requirements to manage hierarchies. Such requirements need careful analysis to determine how best to model hierarchies in InfoSphere MDM Ref DM Hub. A requirement to model a region, country, and state hierarchy can be modeled in InfoSphere MDM Ref DM Hub by creating Region, Country, and State reference sets so that the State is related to Country, and Country is related to Region. Such hierarchies are referred to as level-based hierarchies in InfoSphere MDM Ref DM Hub. Figure 4-1 is an example of a level-based hierarchy.
Figure 4-1 Example of a parent child relationship across different reference tables
However, a requirement to model an industry classification structure, which is self-referential in nature, can be modeled by creating a hierarchy object that has parent and child values that belong to the same industry classification reference set.
Figure 4-2 shows a self-referential hierarchy. In this example, the parent and child codes, and also the relationship between them, are defined within the same source reference table.
Figure 4-2 Example of a parent child relationship within the same reference table
If the source reference data contains self-relationships such as the one shown in Figure 4-2, these relationships can be imported into InfoSphere MDM Ref DM Hub through the hierarchy object. The hierarchy object is part of the Custom Domain Hub framework upon which InfoSphere MDM Ref DM Hub is built. You can then view the hierarchy as a ragged tree structure by expanding and collapsing the nodes, and also change the parent-child relationships by clicking and dragging the nodes.
Figure 4-3 shows a representation of a ragged hierarchy.
Figure 4-3 Representation of a ragged hierarchy
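To make the self-referential pattern concrete, the following sketch builds a tree from rows that each hold a code and its parent code, and then renders the tree with indentation. The industry codes are hypothetical. The leaf depths differ between branches, which is what makes the hierarchy ragged.

```python
from collections import defaultdict

# Hypothetical industry classification rows: (code, parent_code).
ROWS = [
    ("IND", None),
    ("MFG", "IND"),
    ("SVC", "IND"),
    ("MFG-FOOD", "MFG"),
    ("MFG-FOOD-DAIRY", "MFG-FOOD"),
    ("SVC-FIN", "SVC"),
]


def build_children(rows):
    """Group codes by their parent code; root codes have parent None."""
    children = defaultdict(list)
    for code, parent in rows:
        children[parent].append(code)
    return children


def render(children, parent=None, depth=0):
    """Return the tree as indented lines, depth first."""
    lines = []
    for code in children.get(parent, []):
        lines.append("  " * depth + code)
        lines.extend(render(children, code, depth + 1))
    return lines


def leaf_depths(children, parent=None, depth=0):
    """Return {leaf_code: depth}; unequal depths indicate a ragged tree."""
    result = {}
    for code in children.get(parent, []):
        if children.get(code):
            result.update(leaf_depths(children, code, depth + 1))
        else:
            result[code] = depth
    return result


tree = build_children(ROWS)
print("\n".join(render(tree)))
print(leaf_depths(tree))
```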
Certain data requirements must be captured from an InfoSphere MDM Ref DM Hub user interface perspective, such as reference set naming standards and reference set classification. These requirements govern how the reference sets are named and organized in the user interface. Although a classification system helps to organize the reference sets into folders for easy access, an appropriate naming convention with search keywords embedded in the reference set name helps to provide a quick search facility. For example, if a client is implementing a mapping from source values to canonical values, the reference sets that contain the source values might have a naming convention that includes the source system and also the reference set. Therefore, if the enterprise code in question is “Fee Type,” which is sourced from two applications, “System A” and “System B,” naming the sets System A Fee Type and System B Fee Type helps in searching for the reference set by using either the source system name or the enterprise code name. As an example, consider the following reference sets:
System A Fee Type
System B Fee Type
System A Loan Type
System B Loan Type
A search that uses System A as the criterion lists all the reference sets that are sourced from System A, namely “System A Fee Type” and “System A Loan Type.” However, a search that uses Fee Type as the criterion lists the Fee Type reference set from all sources, namely “System A Fee Type” and “System B Fee Type.”
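The effect of this naming convention can be sketched with a simple keyword search over the four reference set names. The case-insensitive substring match used here is an assumption made for illustration. The search behavior of the product user interface might differ.

```python
REFERENCE_SETS = [
    "System A Fee Type",
    "System B Fee Type",
    "System A Loan Type",
    "System B Loan Type",
]


def search(names, keyword):
    """Case-insensitive substring match over reference set names."""
    needle = keyword.casefold()
    return [name for name in names if needle in name.casefold()]


print(search(REFERENCE_SETS, "System A"))
# ['System A Fee Type', 'System A Loan Type']
print(search(REFERENCE_SETS, "Fee Type"))
# ['System A Fee Type', 'System B Fee Type']
```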
The data quality reports produced from the discovery phase must be analyzed for completeness and accuracy. If there are requirements for InfoSphere MDM Ref DM Hub to perform validation checks at the attribute level, then the quality of the data must be assessed to determine whether a cleanup of existing source reference data is required before loading. The next section describes the data quality issues to consider before migrating reference data into InfoSphere MDM Ref DM Hub.
4.3 Data quality
Definitions of data quality refer to the tools and processes that enable the creation of correct, complete, and valid data necessary to support good decision-making. Data is of good quality if it accurately represents the real-world construct that it is intended to represent. Reference data that is maintained and stored in InfoSphere MDM Ref DM Hub is meant to enforce data quality in applications through the use of code tables. However, data quality problems are also found within the code tables themselves, and these problems can allow errors to propagate throughout systems, applications, and data warehouses.
Problems can be of the following types:
Completeness within code tables
Code tables have, as a minimum, the attributes value, description, and start date. More attributes are possible, such as owner, end date, language, lifecycle, and status. Even within these few attributes, problems occur: a code table might have a space character instead of a value, descriptions might be missing, or a description might simply repeat the value, so that A1 is described as “A1.” Different date formats are also seen in the same table, or in related tables that should be interchangeable.
Completeness about code tables
You might have incomplete knowledge of where the code table is used in applications, interfaces, and business processes. Without this knowledge, control of the authorship and approval processes is difficult to manage.
Use of code tables
Cases are found where the codes are misused in business processes. For example, a country code that is used to define citizenship might classify a person as “Great Britain and Northern Ireland” instead of “British.”
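Some of the completeness problems within code tables that are described in this list can be detected with a short script. The following sketch checks for blank values, missing descriptions, descriptions that repeat the value, and mixed date formats. The row layout, the attribute names, and the two candidate date formats are assumptions made for the example.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed candidate formats


def date_format_of(text):
    """Return the first candidate format that parses the text, or None."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return fmt
        except ValueError:
            continue
    return None


def check_code_table(rows):
    """Return (row_number, issue) pairs; row number 0 means a table-level issue."""
    issues = []
    formats_seen = set()
    for number, row in enumerate(rows, start=1):
        value = (row.get("value") or "").strip()
        description = (row.get("description") or "").strip()
        if not value:
            issues.append((number, "blank value"))
        if not description:
            issues.append((number, "missing description"))
        elif description == value:
            issues.append((number, "description repeats value"))
        fmt = date_format_of(row.get("start_date") or "")
        if fmt is None:
            issues.append((number, "unparseable start date"))
        else:
            formats_seen.add(fmt)
    if len(formats_seen) > 1:
        issues.append((0, "mixed date formats"))
    return issues


sample = [
    {"value": "A1", "description": "A1", "start_date": "2012-01-01"},
    {"value": " ", "description": "Unknown", "start_date": "01/02/2012"},
    {"value": "B2", "description": "", "start_date": "2012-03-01"},
]
for issue in check_code_table(sample):
    print(issue)
```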
Remedying data quality in code tables is a judgment regarding the extent to which the lack of quality adversely affects enterprise data quality and detracts from operational efficiency, balanced against how much work needs to be done to detect and improve the quality. By their nature, code tables are small compared to transactional data files, and every line in the file should be different. Because of these characteristics, mining tools are of limited use, and managing data quality in code tables is still largely a manual effort.
4.4 Discovering business rules
Business rules are additional constraints that are applied to the default behavior of InfoSphere MDM Ref DM Hub. These constraints are normally beyond the product capabilities, but can help customers to better manage reference data. The discovery of business rules plays an important role to help identify custom validations needed in InfoSphere MDM Ref DM Hub during the loading, maintaining, and publishing of reference data.
Consider the following scenarios where custom validations might need to be applied:
A mapping cannot be approved until the underlying reference sets are also approved.
A reference value that is expired cannot be used in a mapping.
When populating a mapping within an automated load process, can a source reference value be mapped to multiple target reference values over time? If yes, what rules must exist to set the effective date of the mapping value?
During initial load, what is the default lifecycle state that a reference set needs to be set to?
When extracting or exporting reference sets, are there specific criteria that might prevent a reference set from being exported? For example, must the reference set be in an approved state to be considered for export?
Must reference values be validated? For example, should a reference value be only numeric or only alpha? Another example is that the reference value must have a maximum of n digits.
Are there specific rules for change notification? For example, whenever a reference value changes, a change notification must be sent, and the notification data must contain the values before and after the change.
These scenarios are examples of where the product might not have the capabilities available initially but can certainly be built into the product.
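The first two scenarios in the list can be sketched as simple predicate functions. The classes, field names, and state names below are hypothetical and do not represent the product API. The sketch also assumes that a value counts as expired when its expiry date is on or before the date of use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReferenceSet:
    name: str
    state: str  # hypothetical lifecycle state, for example "Draft" or "Approved"


@dataclass
class ReferenceValue:
    code: str
    expiry: Optional[date] = None  # None means the value does not expire


def can_approve_mapping(source_set, target_set):
    """A mapping can be approved only if both underlying sets are approved."""
    return source_set.state == "Approved" and target_set.state == "Approved"


def can_use_in_mapping(value, on):
    """An expired reference value cannot be used in a mapping."""
    return value.expiry is None or value.expiry > on


source = ReferenceSet("System A Fee Type", "Approved")
target = ReferenceSet("Enterprise Fee Type", "Draft")
print(can_approve_mapping(source, target))  # False

old_value = ReferenceValue("LPF", expiry=date(2012, 12, 31))
print(can_use_in_mapping(old_value, on=date(2013, 6, 1)))  # False
```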
4.5 Use cases
Development of use cases is critical to understanding how reference data will be managed within the InfoSphere MDM Ref DM Hub user interface. Although the capabilities of the interface are already defined in terms of what it can or cannot do, working with the client and mapping their process requirements to the capabilities within InfoSphere MDM Ref DM Hub is important.
An understanding, from a client perspective, of how InfoSphere MDM Ref DM Hub fits into the client business processes is critical to define what product capabilities can be used. Establish a series of meetings to understand how reference data will be created, managed, and consumed by the organization. The use cases that are identified as a result of these meetings determine what configuration or customization must be done to achieve the business requirements.
An example is the way that changes to a reference set after approval might be managed differently by different organizations. In some cases, after a reference set is approved, it cannot be changed; the change can happen only by creating a new version of the reference set. In other cases, creating versions might be considered an overhead, and the solution might be to move the state of the reference set from Approved to Draft. Because there is no one way of implementing a change, these requirements must be captured, and they usually result in certain types of configuration to InfoSphere MDM Ref DM Hub. The configuration results in the creation of a new lifecycle process and new states.
Sometimes, the client might make a distinction between the types of changes that can occur. For example, creation of a new reference set along with its data might not need to be communicated, whereas adding new data or changing data on an existing reference set might mandate a notification. In some situations, the type of reference data also matters. Although reference data that is created internally within an organization most certainly requires an approval lifecycle process, this might not be the case for external reference data (such as NAICS or ICD-10) to which the organization subscribes.
Prototyping workshops provide the opportunity for the implementation team to demonstrate the InfoSphere MDM Ref DM Hub user interface to the client. A good practice is to ask the client for sample data that can be used to drive these requirements. Because you are dealing with reference data, the volumes are typically small and can be loaded within a relatively short period of time into the InfoSphere MDM Ref DM Hub database. After such a prototype is created, the client can more easily articulate requirements such as folder organization, reference set naming standards, and so on. This process also helps the client to understand the functionality of the user interface. During these sessions, gaps between business requirements and InfoSphere MDM Ref DM Hub capabilities can be identified. Any such gaps must be addressed in the project governance, either through customization or by scoping out the requirement to a future phase.
The outcome is a list of well-defined use cases that clearly articulate the InfoSphere MDM Ref DM Hub capabilities that are required to fulfill the business requirements.
4.6 Security requirements
InfoSphere MDM Reference Data Management Hub provides role-based security control over the reference data sets, mappings, hierarchies, and managed systems. Requirements for security must be viewed from three perspectives:
User Interface capabilities that are available to a role
Update operations that are permitted to a role in a specific lifecycle state
Entity level access through ownership groups
Figure 4-4 lists the user interface capabilities that are provided to the roles, by default. Custom roles can be added in InfoSphere MDM Ref DM Hub. When determining security requirements, the user interface capabilities permitted for these custom roles must be indicated.
Figure 4-4 Role and functionality matrix
A role can also be granted the create, update, and delete permissions to a reference data set or mapping based on a lifecycle state. For example, a Steward might not be able to update a reference data set while it is in a Pending Approval state; however, an Approver can update the reference set values. Such requirements must also be captured.
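One way to capture such requirements is as a matrix of role, lifecycle state, and permitted operations. The following sketch shows the idea with the Steward and Approver example. The matrix contents are hypothetical and do not reflect the product defaults or its configuration format.

```python
# Hypothetical permission matrix: (role, lifecycle state) -> allowed operations.
PERMISSIONS = {
    ("Steward", "Draft"): {"create", "update", "delete"},
    ("Steward", "Pending Approval"): set(),
    ("Approver", "Pending Approval"): {"update"},
    ("Approver", "Approved"): set(),
}


def is_allowed(role, state, operation):
    """Return True if the role can perform the operation in the given state.

    Combinations that are missing from the matrix are denied by default.
    """
    return operation in PERMISSIONS.get((role, state), set())


print(is_allowed("Steward", "Pending Approval", "update"))   # False
print(is_allowed("Approver", "Pending Approval", "update"))  # True
```

Denying combinations that are not listed keeps the matrix short, but each custom role and each new lifecycle state still needs an explicit decision from the client.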
Security can also be granted to specific reference data sets or mappings by associating roles to ownership groups. Figure 4-5 associates the default roles to the default ownership groups in InfoSphere MDM Ref DM Hub.
Figure 4-5 Role and ownership group association
When defining the security at the ownership group level, understand the following information:
Users are assigned to user roles.
Users are also assigned to ownership groups.
Users who belong to an ownership group can only update data sets that are created by a user who belongs to the same group.
All users are able to view all data sets.
In the configuration in Figure 4-5, the steward user can modify only reference sets that are assigned to the enterprise ownership group. The reference sets that are assigned to the mdm ownership group cannot be modified by the steward user; they can be modified only by the steward2 user. Similarly, steward2 cannot make changes to a reference set that is assigned to the enterprise ownership group.
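The ownership group rules can be summarized in a short sketch. The user and group names follow the example in the text, whereas the reference set names, the data structures, and the function names are hypothetical.

```python
# Users and ownership groups as in the example; reference sets are hypothetical.
USER_GROUPS = {
    "steward": {"enterprise"},
    "steward2": {"mdm"},
}
SET_OWNERSHIP = {
    "Country": "enterprise",
    "Fee Type": "mdm",
}


def can_view(user, reference_set):
    """All users are able to view all data sets."""
    return True


def can_update(user, reference_set):
    """A user can update a set only if the user belongs to its ownership group."""
    return SET_OWNERSHIP[reference_set] in USER_GROUPS.get(user, set())


print(can_update("steward", "Country"))    # True
print(can_update("steward", "Fee Type"))   # False
print(can_update("steward2", "Fee Type"))  # True
```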
As part of the requirement-gathering process, the project team should ask the customer to provide information about the ownership groups that are associated with each reference data set or mapping.
4.7 Integration requirements
The InfoSphere MDM Ref DM Hub solution runs in the client IT environment and is administered by IT specialists. Data stewards and business users carry out the authorship and maintenance of the actual reference sets and mappings themselves. For InfoSphere MDM Ref DM Hub to run smoothly, it must integrate with the existing business processes and run in the IT environment meeting non-functional requirements.
4.7.1 Business process integration
The requirements for business process integration are often arrived at or finalized during the analysis stage of an InfoSphere MDM Ref DM Hub implementation, rather than being a predetermined collection of requirements that must be met. The reason is that the functionality of InfoSphere MDM Ref DM Hub has a significant influence on designing the business process. In many cases, analysis shows that code tables are managed in many places and without a general standard. With InfoSphere MDM Ref DM Hub, the introduction of a central user interface and import and export routines enables a rationalization and standardization of the process that was not possible before.
For example, an existing business process uses an external set of industry codes to classify business customers. Updates to the code set are issued twice yearly and might change 30% of 10,000 values, retiring some and bringing in new ones. The staff who need the codes concentrate only on the subset that they frequently use, and manually check at data entry whether a value is still valid. This process is a candidate for change, where the new code set can be updated by a batch import process: either the whole set with a new version, or only the delta incorporated with validity dates updated.
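The delta approach can be sketched as a comparison of the current code set with the incoming one, followed by an update of validity dates. The codes, descriptions, and function names below are hypothetical, and the actual batch import is performed with the import facilities of the product.

```python
from datetime import date


def diff_code_sets(current, incoming):
    """Compare two {code: description} dictionaries and return the delta."""
    common = set(current) & set(incoming)
    return {
        "added": sorted(set(incoming) - set(current)),
        "retired": sorted(set(current) - set(incoming)),
        "changed": sorted(c for c in common if current[c] != incoming[c]),
    }


def apply_delta(validity, delta, effective):
    """Return updated {code: (start_date, end_date)}; end_date None is open-ended."""
    updated = dict(validity)
    for code in delta["retired"]:
        start, _ = updated[code]
        updated[code] = (start, effective)  # close the validity period
    for code in delta["added"]:
        updated[code] = (effective, None)   # open a new validity period
    return updated


# Hypothetical industry codes.
current = {"1000": "Farming", "2000": "Mining", "3000": "Retail"}
incoming = {"1000": "Agriculture", "3000": "Retail", "4000": "Software"}
validity = {code: (date(2010, 1, 1), None) for code in current}

delta = diff_code_sets(current, incoming)
print(delta)
print(apply_delta(validity, delta, effective=date(2013, 1, 1)))
```

Reviewing the size of the delta before each import also gives the data steward a quick check that the update is in the expected range.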
Another example is where a code set is updated in one system and that change must be repeated in many other systems. The existing process circulates the change by email or spreadsheet so that the administrators of the consuming systems can update their own instance of the code table. This approach is inherently error prone and can be improved by setting up a subscription in InfoSphere MDM Ref DM Hub.
The new business process might become the following process:
1. The business user approves automatic updates from one system to many others and then requests an IT specialist to implement them.
2. The IT specialist sets up subscriptions in InfoSphere MDM Ref DM Hub and tests them.
3. The new process is reviewed after a period of time.
The documented business processes in most enterprises reveal an emphasis on ensuring that the process is documented, mandatory, and meets external compliance and governance standards. Few of these processes deal with the authorship and management of reference data sets, which gives an InfoSphere MDM Ref DM Hub implementation an opportunity to add value by supporting new and improved business processes.
4.7.2 System integration
There are published prerequisites for system integration that detail precisely all the required components, down to the release number and fix pack. These details vary with the environment into which InfoSphere MDM Ref DM Hub is being introduced, but a purely illustrative example might be as follows:
DB2 9.7 Workgroup or Enterprise Edition installed
WebSphere Application Server 7 with fix pack 15 or later installed
Web 2.0 feature pack for WebSphere Application Server installed
WebSphere MQ 7.0.1.3 installed
Verify that these versions are installed. The requirement is that they are proved to run on the client IT infrastructure.
The InfoSphere MDM Ref DM Hub installation itself is made in the WebSphere Application Server and DB2.
If there is a suitable Active Directory in the client environment, a more straightforward approach is to link InfoSphere MDM Ref DM Hub to it, rather than set up InfoSphere MDM Ref DM Hub specific permissions.
4.8 Test cases
As with any system implementation, identification and execution of test cases is necessary to check whether any configuration or customization of the product meets the requested functionality. This section describes the test cases that must be documented and executed in the testing phase of an InfoSphere MDM Ref DM Hub implementation.
4.8.1 Configuration testing
Several aspects of configuration apply to InfoSphere MDM Ref DM Hub. The following list is an example of where configurations occur and therefore must be tested properly:
Creation of reference data types
These test cases check whether all the specified reference data types are created properly and that all the attributes are created with correct data types.
Lifecycle
These cases test whether the reference set flows through the lifecycle states as required. Another aspect of the testing is whether all the required roles have the appropriate permissions for every state in the lifecycle.
Validations
Determine whether all the required properties are specified. In cases where property values are being validated, determine whether they are being validated according to the requirements.
Security
Determine if all the authorized users have the correct roles and ownership groups.
User interface
Be sure the folder structure is set up according to the requirements. Check whether the proper naming convention was adopted, and whether it is possible to search for reference sets and reference values according to the defined search criteria.
4.8.2 System integration testing
Because InfoSphere MDM Ref DM Hub is all about the process of authoring, managing, and distributing reference data, the project test plan must cover some form of integration testing. For example, if a use case involves a source system sending an update of reference data to InfoSphere MDM Ref DM Hub, which is then approved by the data steward and ends with a notification being sent to a consuming application, the interfaces into and out of InfoSphere MDM Ref DM Hub must be tested to ensure that updates and notifications are happening as expected. All interface functionality, whether a batch update, notification, or real-time web service call, must have a set of test cases.
4.9 Deliverables
The deliverable of this phase is a requirements document that lists the data and functional requirements of the reference data solution. It is essentially a summary of the topics discussed in this chapter.
The following list is a summary of requirements that must be captured:
Data requirements
 – List the entities that will be modeled and managed by InfoSphere MDM Ref DM Hub and provide the requirements for each entity. This section might include a mapping document that defines where the source reference data must be stored in the InfoSphere MDM Ref DM Hub data model. For example, a particular source table is to be stored as a reference set and another is to be stored as a mapping.
 – List and define the areas of the InfoSphere MDM Ref DM Hub data model that will be used, that is, reference sets, reference data type definitions, mapping definitions, managed systems, subscriptions, folders, and so on.
Integration requirements
 – List the requirements about batch loads or web service calls from consuming applications.
 – List the requirements to communicate or notify changes to consuming applications.
 – List the integration requirements with other InfoSphere Information Server products such as Business Glossary.
Data governance requirements
 – Document the approval lifecycle process for reference sets and mappings.
 – Document any requirements to integrate InfoSphere MDM Ref DM Hub within an external workflow process. Although external to InfoSphere MDM Ref DM Hub, such requirements can be used to develop and direct the use cases regarding data governance.
 – Capture and list business rules for validation or standardization of data.
 – Capture and list any reporting requirements necessary to monitor data governance metrics.
Security requirements
 – List the users, roles, ownership groups.
 – List the user-to-role mappings and the user-to-ownership-group mappings.
Use case specifications
 – Document the set of interactions between external actors and the InfoSphere MDM Ref DM Hub user interface.
 – Identify and detail the use cases required from a reference data governance perspective.
 