Business use cases for InfoSphere CDC
IBM InfoSphere Change Data Capture (InfoSphere CDC) can be used in any organization where data is changing in one system (or in more than one system, for bidirectional replication) and those changes must be replicated to other systems within the enterprise. Although this situation is conceptually simple, the additional features and functionality in InfoSphere CDC can be used to resolve many of the typical business problems associated with managing your data environment. This chapter describes the application of InfoSphere CDC and highlights some business use cases from client experiences and internal projects.
3.1 InfoSphere CDC techniques for transporting changed data
There are many methods for integrating data between applications. Some methods work at the application level and others work at the database table or transaction log level. This section focuses on the different techniques InfoSphere CDC can use to transport changed data.
In cases where you need to establish a reporting/shadow database that matches the contents and structure of the source database, the topology in Figure 3-1 suffices. Table changes are picked up from the source database through its transaction log, then transported and applied to the target database.
Figure 3-1 Change data capture
For more complex data replication requirements, use InfoSphere CDC with extract, transform, and load (ETL) tools. In those situations, InfoSphere CDC replicates data to the target databases through ETL tools in various ways, such as those described in the following options:
3.1.1 Option 1: Database staging
This option (Figure 3-2) typically has the characteristics described in the following list:
1. InfoSphere CDC captures changes made to source systems.
2. Changes are written to a staging area table or tables.
3. The ETL processes extract the changes from the staging table or tables, and transform and clean the data as needed.
4. After the ETL processes are finished, the replication is resumed and the staging tables are populated for the next run.
Figure 3-2 Change data capture - database staging
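To make step 3 concrete, the following minimal Java sketch polls such a staging area and reads the accumulated changes in time-stamp order. The table and column names (CDC_STAGE, ENTRY_TYPE, AUD_TIMESTAMP), the connection details, and the assumption that a DB2 JDBC driver is on the classpath are all illustrative, not InfoSphere CDC defaults.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch of an ETL poll against a CDC staging table.
public class StagingTableReader {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:db2://etlhost:50000/STAGEDB", "etluser", "secret");
             Statement st = con.createStatement();
             // Read the accumulated change rows in commit order.
             ResultSet rs = st.executeQuery(
                 "SELECT ENTRY_TYPE, AUD_TIMESTAMP, CUST_ID, CUST_NAME " +
                 "FROM CDC_STAGE ORDER BY AUD_TIMESTAMP")) {
            while (rs.next()) {
                // ENTRY_TYPE distinguishes inserts, updates, and deletes so
                // that the downstream transform can treat each accordingly.
                System.out.printf("%s %s cust=%s%n",
                        rs.getString("ENTRY_TYPE"),
                        rs.getTimestamp("AUD_TIMESTAMP"),
                        rs.getString("CUST_ID"));
            }
        }
        // After the ETL run completes, the staging table is cleared and
        // replication resumed, per step 4 in the list above.
    }
}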
3.1.2 Option 2: Message queue (MQ) based integration
The functionality of this option is delivered with IBM WebSphere® MQ (WebSphere MQ). It is shown in Figure 3-3 and typically has the characteristics shown in the following list:
1. InfoSphere CDC captures changes made to the source systems.
2. The captured changes are extracted and written to a message queue (MQ).
3. The downstream ETL processes (through the WebSphere MQ connector) extract the new messages.
4. Those messages are then passed to other downstream processes to be applied to the target database.
Figure 3-3 Change data capture - MQ-based integration
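As an illustration of steps 3 and 4, the following minimal Java sketch drains CDC messages from a WebSphere MQ queue using the classic JMS API. The host, queue manager, channel, queue name, and credentials are assumptions for the example, and each message is assumed to carry one captured change as text (for example, XML).

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import com.ibm.mq.jms.MQQueueConnectionFactory;

// Minimal sketch of a downstream consumer for CDC messages on WebSphere MQ.
public class CdcQueueConsumer {
    public static void main(String[] args) throws Exception {
        MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
        cf.setHostName("mqhost");
        cf.setPort(1414);
        cf.setQueueManager("QM1");
        cf.setChannel("SYSTEM.DEF.SVRCONN");
        cf.setTransportType(1); // 1 = TCP/IP client transport

        Connection con = cf.createConnection("mquser", "secret");
        Session session = con.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("CDC.CHANGES");
        MessageConsumer consumer = session.createConsumer(queue);
        con.start();

        // Each message carries one captured change; hand it to the
        // downstream process that applies it to the target database.
        while (true) {
            TextMessage msg = (TextMessage) consumer.receive();
            System.out.println("change: " + msg.getText());
        }
    }
}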
3.1.3 Option 3: File-based integration
This option (Figure 3-4) typically has the characteristics shown in the following list:
1. InfoSphere CDC captures changes made to source systems.
2. Changes are then written to flat files.
3. The downstream ETL processes extract the changes from the flat files.
4. The data is then passed to other downstream processes to be applied to the target database.
Figure 3-4 Change data capture - file-based integration
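A minimal Java sketch of step 3 might look like the following, assuming the subscription lands pipe-delimited records with the operation code in the first column and a key in the second; the file path and layout are illustrative assumptions.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Minimal sketch of an ETL step reading CDC changes landed as a flat file.
public class FlatFileChangeReader {
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Path.of("/data/cdc/changes.txt"))) {
            lines.forEach(line -> {
                String[] cols = line.split("\\|", -1);
                // cols[0] = operation (I/U/D), cols[1] = key,
                // remaining columns = the row image to pass downstream.
                System.out.printf("op=%s key=%s%n", cols[0], cols[1]);
            });
        }
    }
}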
3.1.4 Option 4: InfoSphere DataStage Direct Connect
This option (Figure 3-5) typically has the characteristics shown in the following list:
1. InfoSphere CDC captures the changed data from the source systems and passes this data to a continuously active DataStage job.
2. The DataStage job receives transactions through the CDC Transaction Stage operator.
3. In the job, the data can be transformed and passed on to other downstream locations.
4. The CDC bookmark is maintained as DataStage applies the changes to the target database.
Figure 3-5 Change data capture - direct connection integration
3.2 Data warehousing and business intelligence
A data warehouse is part of a decision support environment and is populated by extracting, transforming, and loading operational data from one or more online transaction processing (OLTP) systems. This data is then available for reporting and analysis.
3.2.1 Active data warehousing
At one time, data warehousing and decision support were focused on the analysis of historical data. However, that analysis is not sufficient in today's competitive market, where decisions must be based on more current data.
There is an inherent cost to producing up-to-date reports in the traditional fashion using batch extracts. This cost is typically measured in resource utilization (such as processor usage and processing time) split between data extraction and report generation, and it continues to grow as corporate data volumes and business requirements increase.
Active data warehousing is the process of capturing transactions when they occur, and immediately integrating those changes into the data warehouse. In an active data warehouse, OLTP transactions are automatically captured upon completion, transformed as required, and loaded immediately into the data warehouse. InfoSphere CDC enables active data warehousing.
Active data warehousing: Sample implementation
In this example, clients run multiple operational applications in an IBM DB2 for z/OS mainframe environment. Data from these operational applications is extracted at the end of each day through ETL batch processing (with IBM DataStage) and loaded into the data warehouse, which serves many lines of business. The long-term objective of the data warehouse is to provide up-to-date business intelligence (BI) and analytics at any point in the business day.
Although the initial batch implementation was successful, some issues and requirements were identified. For example, the data warehouse was only as current as the previous night’s batch update. However, the lines of business required more up-to-date information during the business day to make insightful decisions. There are typically two options available to accomplish this task:
1. Allow business users to run reports and queries on the production system during the day, which increases resource utilization and cost.
2. Implement a solution that enables intra-day updates from the production reporting applications to the data warehouse.
Active data warehousing: Proposed solution
The proposed solution is to implement IBM InfoSphere CDC to synchronize changed data from the operational production applications with the DataStage ETL server continuously during the business day. The solution is shown in Figure 3-6, and provides the following benefits:
Reduces the volume of changed data that needs to be extracted during the nightly batch window (time and resources required), which in turn minimizes the time that production applications are unavailable to users.
Provides continuous claims data changes (through ETL staging tables), which enables intra-day ETL updates to the data warehouse. This setup improves overall data visibility for the lines of business.
Figure 3-6 Proposed data warehouse solution - ETL-based integration
3.2.2 Slowly changing dimensions
Dimensions are logical groupings of data, such as product, customer, and time. Slowly changing dimensions (SCDs) are dimensions that change slowly over time. The ability to process SCDs is required to track historical changes in dimension tables. There are six ways of working with SCDs, of which the following are the most common:
Type 1 SCD: Here the assumption is that the business does not need to track historical changes to data attributes. Any change in a data attribute results in overwriting the existing column data. This replication is standard data replication that inserts, updates, and deletes data rows to reflect data operations in the source system.
Type 2 SCD: With this method, there is a business need to track historical changes to attribute values in data. Updates to data attributes are inserted as new database records using a surrogate key to support the natural key. Two additional columns, Effective Date and Active Indicator, are also included in the database table to track the historical changes to the data. Type 2 SCD allows for unlimited history preservation.
Type 3 SCD: This method also caters to tracking historical changes to attribute values in data. However, whereas type 2 SCD allows for unlimited history preservation, type 3 SCD stores historical information in an additional column, which limits history preservation.
InfoSphere CDC can be used to update the appropriate data in SCD tables using a combination of the journal control fields and Java-based or stored procedure-based user exits.
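As an illustration of the Type 2 case, the following minimal Java sketch shows the kind of logic such a user exit might run for each captured change: end-date the currently active dimension row, then insert the new version effective as of the replication date. The table and column names (DIM_CUSTOMER, EFFECTIVE_DATE, END_DATE, ACTIVE_IND) are illustrative assumptions, and a production dimension would normally also generate a surrogate key.

import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;

// Minimal sketch of Type 2 SCD maintenance for one customer change.
public class Type2ScdApply {
    public static void apply(Connection con, String custKey,
            String newLocation, Date replicationDate) throws Exception {
        // 1. Close out the currently active version of this customer.
        try (PreparedStatement endDate = con.prepareStatement(
                "UPDATE DIM_CUSTOMER SET END_DATE = ?, ACTIVE_IND = 'N' " +
                "WHERE CUST_KEY = ? AND ACTIVE_IND = 'Y'")) {
            endDate.setDate(1, replicationDate);
            endDate.setString(2, custKey);
            endDate.executeUpdate();
        }
        // 2. Insert the new version, effective as of the replication date.
        //    A surrogate key (for example, an identity column) would
        //    normally be generated here as well.
        try (PreparedStatement insert = con.prepareStatement(
                "INSERT INTO DIM_CUSTOMER (CUST_KEY, LOCATION, " +
                "EFFECTIVE_DATE, END_DATE, ACTIVE_IND) " +
                "VALUES (?, ?, ?, NULL, 'Y')")) {
            insert.setString(1, custKey);
            insert.setString(2, newLocation);
            insert.setDate(3, replicationDate);
            insert.executeUpdate();
        }
    }
}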
Slowly changing dimensions: Sample implementation
In this example, the client runs their operational system on IBM DB2 for Linux, UNIX, and Windows. Customer data is replicated from their operational systems to the Enterprise Data Warehouse (EDW) in large volumes, including new customers and updates to customer information, such as location.
Customer information is updated through an ETL mechanism that performs lookups against existing customer information to identify data changes and update customer records as appropriate. Although this approach works, the following problems and new requirements were identified:
An increasing volume of customer updates results in increased latency.
Increased workload on the EDW due to increased look-up queries.
New business requirements dictate the need for near real-time customer data updates.
Updates to the customer records need to be preserved to be able to reflect historical states.
Slowly changing dimensions: Proposed solution
The proposed solution is to implement IBM InfoSphere CDC to synchronize customer updates from the operational application to the target database tables continuously during the business day (Figure 3-7). The customer requirements dictate unlimited history preservation, so a Type 2 SCD table structure is maintained: existing customer records are end-dated, and updates to customer information are recorded with the effective date set to the data replication date. The slowly changing dimensional information (customer data) is handled using DB2 stored procedure-based user exits. The user exit stored procedure performs UPDATE and INSERT operations: it updates the SCD columns in the target to end-date the current data, and then inserts each customer update as a new row with an effective date set to the data replication date. This solution provides the following benefits:
Reduces the latency in receiving customer data updates.
Reduces ongoing workload on the EDW as a result of customer lookup queries.
Figure 3-7 Using InfoSphere CDC to synchronize updates
3.3 Consolidation
Business needs can drive an organization to consolidate its data for various reasons, such as to reduce cost, increase data quality, run decision-making reports, or normalize data.
IBM InfoSphere Change Data Capture is flexible, and can enable, as examples, the following types of business requirements:
Consolidation can be achieved at the database level, where clients want to bring together their data from various source application databases and make it available to users in one database.
A client can consolidate the data from various tables into one table on the target side.
A client can merge data from multiple tables into one table at the row level.
3.3.1 Consolidation: Sample implementation 1
In this example, the client runs several divisions across the United States and decides to consolidate the customer data from all of its divisions. Each division keeps its data on its own source system. The client wants to consolidate all the data in the target database and then move it to another database. The environment is as follows:
Each division has 22 tables, all with the same structure.
The client wants all source data manipulation language (DML) operations (inserts, updates, and deletes) applied to the target database tables.
The client wants access to all source system information (such as operation type, and date and time) while data is replicating.
The client wants to stop the data replication process every night and move the data to the data warehouse.
After processing the data to another database, the client wants to truncate the target tables.
In addition, the client does not want to perform an initial load from all the source tables to the target tables.
Consolidation: Proposed solution 1
The solution uses InfoSphere CDC, installed and configured on each division's source server and on the target server. The client configures all tables using the InfoSphere CDC Live Audit functionality, and uses journal control fields to capture the DML operation type and time stamp. The initial load is performed outside of InfoSphere CDC. The replication method for all tables is set to Mirror, and their status to Active.
The solution is shown in Figure 3-8.
Figure 3-8 Customer consolidation
3.3.2 Consolidation: Sample implementation 2
The client is one of the world's largest manufacturers of adhesives and sealants. They employ over 4000 people and their products are sold in over 120 countries. In this example, the client has merged with another company that uses a different ERP system, so they want to consolidate the data for analysis and reporting. With such large and global operations, they could gather their data only at night, to avoid impacting the performance of their systems during the day. However, data volume and reporting demands are growing and, at the same time, the merger caused additional complications in data consolidation, so business operations are not running as well as they should be.
To summarize:
Corporate data is consolidated on a nightly basis from two ERP systems for analysis and reporting, to avoid impacting daytime performance.
The volume of data and reporting needs exceeds nightly batch load capacity and hinders business performance.
Consolidation: Proposed solution 2
InfoSphere CDC consolidates data from the ERP systems into a single database and transforms it into a single format. The client implements an InfoSphere CDC real-time solution in their infrastructure and realizes significant benefits, such as the following:
Flexible reporting and analysis schedules. Reporting can now be done at any time.
Using the most recent data from the consolidated ERP systems provides more accurate reporting, and removes the previously existing execution bottlenecks.
The data consolidation of two different ERP systems is easily handled by InfoSphere CDC transformation capabilities, and provides a single view of the corporate ERP data.
The batch time slot at night is free for other activities. As a result, the client's business performance has improved, with a 10% increase in on-time delivery of orders and an 80% decline in late orders.
3.4 Distribution
Data and workload distribution is about moving data between different servers that run parts of the same application.
This setup allows the workload to be efficiently distributed across multiple server environments. Hardware vendors, such as IBM, are extending their server ranges from the low end to the high end, which makes this proposition increasingly viable. Software vendors often price their applications based on the processing power band of the host server, so it makes economic sense to offload workload, for example, query and reporting, onto other servers. Some servers handle batch work (such as scheduled queries) better than interactive work (such as ad hoc user queries and data analysis), and they also do this work at a much lower cost. Distributing data to multiple servers in an organization also improves availability and resilience.
InfoSphere CDC can satisfy these data and workload distribution requirements in near real-time scenarios. An example of data and workload distribution is shown in Figure 3-9.
Figure 3-9 Data and workload distribution
3.4.1 Distribution: Sample implementation 1
In this example, a retail department store client has several stores and a distribution center, which runs a corporate IBM System i server. The client requirement is to upgrade the electronic point of sale (EPOS) equipment in all stores to clients running Microsoft SQL Server. Within each store, the new EPOS terminals are connected to a consolidation server that also runs SQL Server. It is important that all SQL Server EPOS terminals have access to up-to-date information for the many thousands of products that the company sells.
The master files are maintained centrally on the System i server at the distribution center. The volume of sales transactions varies significantly throughout the year and is highest in January during the yearly sell-off sale. It is important that the distribution center is kept informed about sales transactions throughout the day for stock level and distribution purposes.
Distribution: Proposed solution 1
The client implements a solution that uses InfoSphere CDC to distribute data changes to the consolidation server in each store. Using the InfoSphere CDC Scheduled End (formerly known as Net Change) functionality, only changes in the master files are replicated, thus reducing the replication workload. Sales transactions are mirrored continuously to the distribution center, which provides a reliable and up-to-date position in each store. The solution is shown in Figure 3-10.
Figure 3-10 InfoSphere CDC Scheduled End
3.4.2 Distribution: Sample implementation 2
A brokerage firm wants to develop an in-house Global Common Database application that could be available at each of their locations worldwide. The application allows entry of new trades from the stock exchange floors to an application using a SQL Server database and running on Microsoft Windows. The client has an existing communications infrastructure with utilities to communicate between all their System i servers worldwide. The first location to go live would be London and then the solution would be rolled out to other locations. The database structures for SQL Server are different from the structures on the System i servers.
Distribution: Proposed solution 2
The client implements a solution that uses InfoSphere CDC to continuously mirror data between SQL Server servers and System i servers, using a local LAN and TCP/IP plus an existing worldwide network to distribute data around the world. The ALIAS functionality is used extensively to remap columns from SQL Server tables to fields in System i physical files, because SQL Server allows long column names but System i limits field names to 10 characters.
The implementation time is short, and the solution exceeds the client's performance expectations. The client initially considered developing an in-house replication solution, but rejected it because it would have taken many months to develop, would have unknown performance, would be expensive, and would require a programmer each time changes needed to be made. The solution that is used is shown in Figure 3-11.
Figure 3-11 Global database
3.5 Database migration
Database migrations might be necessary in many organizations, for example, when a company changes its hardware, performs major database upgrades, or switches a business application from one database to another. The basic idea is that whenever a company must migrate a database from one server to another, the process must be smooth, uninterrupted, inexpensive, and involve minimal downtime.
InfoSphere CDC can load data from one server to another using the InfoSphere CDC Refresh functionality or the set bookmark capability to set the point from which InfoSphere CDC starts capturing delta changes (after using other vendor tools to load the data on the target database):
The Refresh capability provides a full copy of the data on the target side while your source applications perform normal daily business operations without any interruptions. After refreshing the data, InfoSphere CDC continues capturing the delta changes and applying them on the target side.
If the client wants to use another vendor's tool to load the data and then capture the delta changes for their daily business activities, InfoSphere CDC is flexible enough to adjust to this business need by using the Mark Table Capture Point function.
3.5.1 Database migration: Sample implementation
SAP heterogeneous system copy is defined as the change of the hardware or database platform for an SAP instance. Many SAP systems can be migrated with SAP standard tools (such as R3load), which are also available to IBM and other service providers. The requirements can become critical for very large SAP databases, because they typically have small windows of downtime.
The overall approach for porting SAP instances with minimum downtime to another hardware or database platform is based on a combination of using InfoSphere CDC for the 30 - 50 largest tables in the SAP database. You then use standard tools provided by SAP (such as R3load) for the remainder of the tables.
Database migration: Proposed solution
IBM implements an InfoSphere CDC-based SAP migration utility that fulfills the client's business requirements (Figure 3-12).
Figure 3-12 SAP heterogeneous system copy
The implementation consists essentially of the following actions:
1. Build the target DB2 database and SAP application by running SAPINST up to the point where the database load would be performed by SAPINST or the migration monitor.
2. Create the large tables in the target system using the SAP R3load utility. This action ensures the consistency of the data definition required for the SAP data dictionary.
3. Start initial data transfer using the Refresh function of InfoSphere CDC. This function reads the data from the source database while the source SAP system is up and running. This action might have a performance impact and should be initiated during times with little workload on the table in question.
Alternatively, R3load can be used for initial synchronization. InfoSphere CDC is then only used to replicate the delta changes starting from a marked Table Capture Point, which must be consistent with the R3load export.
4. Start replicating the data by scraping the log files and applying the changes to the target tables.
5. When tables on source and target are synchronized and a downtime window is available, the remaining tables are migrated using the R3load utility.
3.6 Application integration
In this example, an organization wants its various running applications linked together so that it can automate its business processes.
InfoSphere CDC integrates client applications with each other, which can provide the following benefits:
InfoSphere CDC unidirectional functionality provides full integration of the data from the applications.
InfoSphere CDC bidirectional functionality allows the application and user to enter the data from both sides (source and target). InfoSphere CDC makes sure that the data is synchronized between the source and target databases.
InfoSphere CDC can be used in such a way that data can be extracted from the source database and provided to a client on a handheld device application.
3.6.1 Application integration: Sample implementation 1
A client wants to integrate their billing application data with their CRM process to better serve customer needs and improve their business processes. Their objectives and environment are as follows:
Improve customer retention against lower-cost operators.
Provide better customer service by being able to react in real time.
No extra bandwidth for this type of processing (the billing system is already overloaded).
Pre-paid customers should receive short message service (SMS) messages when low on minutes, which prevents customers from losing service, and improves the customer experience. An additional benefit is that revenue is increased.
When customers activate a new phone, they should be able to start using it immediately because of instantaneous billing.
Application integration: Proposed solution 1
IBM implements an InfoSphere CDC solution that integrates the client's billing system data with their CRM system by using the InfoSphere CDC event server messages functionality. InfoSphere CDC scrapes the data from the source database and generates XML messages that are passed to target queues. The Enterprise Service Bus (ESB) application then picks up the data from the InfoSphere CDC message queues and makes it available to the CRM system.
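For illustration, the following minimal Java sketch parses one such XML change message on the ESB side. The message layout shown here is hypothetical; the actual schema is determined by the InfoSphere CDC event server configuration.

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Minimal sketch of parsing a hypothetical CDC XML change message.
public class BillingChangeParser {
    public static void main(String[] args) throws Exception {
        String msg = "<change table=\"BILLING.USAGE\" op=\"U\">"
                + "<col name=\"CUST_ID\">42</col>"
                + "<col name=\"MINUTES_LEFT\">3</col></change>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(msg)));
        Element root = doc.getDocumentElement();
        // The ESB would route low-minutes updates to the CRM/SMS flow.
        System.out.printf("table=%s op=%s%n",
                root.getAttribute("table"), root.getAttribute("op"));
    }
}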
The results are:
Low impact (minimal additional workload on the billing system)
High volume (typical for billing systems)
Real-time (Changes are captured and responded to immediately.)
Easy to implement (Uses the existing infrastructure.)
Real-time visibility of customer billing information in the CRM system
Minimal impact on the operational system
Flexible for changing data requirements
Minimal latency
Scalability
The solution is shown in Figure 3-13.
Figure 3-13 Integrating billing and CRM
3.6.2 Application integration: Sample implementation 2
A leasing company is planning to integrate their front-end application with their back-end application. The objective is to give customers quick quotations through the front-end application.
The front-end application's strengths are user and web interfaces. The back-end application's strengths are mainly in the calculation area. The quotations are calculated quickly and with maximum flexibility.
To speed up the company's integration process, the company decides to use the strengths of both applications and have them work together: the front-end application captures the information required for a quotation, and the back-end application performs the calculations.
Integration between the applications is a great challenge, primarily because of database differences, which includes code page differences.
Application integration: Proposed solution 2
InfoSphere CDC is installed and configured with a bidirectional solution between these two servers to meet the client's business objectives. The project had already started and was struggling when InfoSphere CDC was implemented. The result is that development costs are reduced and the application integration goes smoothly, resulting in a quicker time to market. The solution is shown in Figure 3-14.
Figure 3-14 Application integration
3.6.3 Application integration: Sample implementation 3
A Fortune 500 retailer needed to integrate their website with their production systems to serve their customers in real time while enhancing their own internal operations.
To provide a superior shopping experience to its customers, the company improves its online gift registry system by implementing real-time data integration. For example, when a customer makes a purchase on the website, the gift registry system is updated immediately, so that no duplicate purchase can occur to cause customer dissatisfaction and lost sales through product returns and refunds. At the same time, the inventory count is adjusted to reflect the new product statistics for all other customers. For a retailer known for exquisite service, having real-time data is a mission-critical capability. By integrating the web application with its corporate systems, the company is able to deliver accurate information that is updated continuously in real time.
The company's internal operations also benefit greatly from real-time data. Because the company's products are not produced in high volumes, inventory management requires careful planning, especially with 150 stores and product demand that can come from any part of the world. By implementing a real-time infrastructure, the company is able to receive the most up-to-date ordering information, which enables it to make the most informed business decisions in managing its inventory. This setup enables the following capabilities:
Access to up-to-date inventory and product information for online purchasing
Purchases made online are accurately reflected in the inventory system
Application integration: Proposed solution 3
The solution is implemented with the following results:
Implement InfoSphere CDC. InfoSphere CDC real-time data flows to the e-commerce application for online customer purchasing, then back to the inventory systems after purchases are made.
Increase customer satisfaction with more accurate online gift registry information, which is a significant portion of the company's online business.
Improve inventory tracking results due to more effective inventory management.
3.7 Integration with master data management
Leading organizations are working to gain control of their most important and commonly shared enterprise information assets. They are looking for adaptive solutions to help them make more informed business decisions. These initiatives are based on master data management (MDM), a practice designed to achieve and maintain a single version of data across the enterprise.
Managing product master data plays an important role in any company's measure of business performance. Establishing a single view of product master data and associated domains, such as suppliers, vendors, and locations, enables companies to address key strategic business initiatives throughout their organization. An implementation for product master data might require features such as a flexible data model, granular data access and security, real-time access, workflow, rich user interfaces, and business logic.
IBM InfoSphere Master Data Management Server is an enterprise customer master data management application that provides a unified customer view and updated environment to multiple channels. It aligns these front-end systems with multiple back-end systems in real time, providing a single source of customer data across the enterprise.
The IBM InfoSphere MDM Server enterprise application uses a relational database management system that contains the operational tables, history tables, reference tables, metadata tables (business objects and attributes, and MDM transactions and actions), management tables, and configuration tables.
InfoSphere CDC provides near real-time data to applications, such as ETL tools. Those applications can then feed the data to an MDM server in near real time.
3.7.1 Integration with master data management: Sample implementation
A high-level solution architecture design provides an enterprise-level system implementation design for collecting, improving, retaining, and distributing information that describes the customer's enterprise master data. It is important to note that the architecture described is a high-level representation of the key components for an MDM Server system based on information available now.
Integration with master data management: Proposed solution
This solution uses InfoSphere CDC to extract changes from the mainframe DB2 logs; the MDM services, not the Rapid Deployment Package (RDP) asset, then load the MDM database. The data volume for a delta load is much lighter than for the initial load: an initial load might consist of 16.5 million customer records, while a delta load might be only around 20 K rows per day. A record is added to a flat file whenever a change occurs in a table. This process typically occurs during the day, and at a predefined time the files are closed and the update to MDM is initiated in batch. This situation is shown in Figure 3-15.
Figure 3-15 Integration with MDM
RDP has a limitation when processing Standard Interface Format (SIF) records with multiple updates of the same type in the same run. For example, if an address for a customer is changed more than once during the delta capture period, only the last row is processed.
Initially, running the delta ETL job once a day was sufficient, but now it should run more often. After InfoSphere CDC hardens (marks as complete) the landed files, the delta ETL job can run. If multiple changes for the same record are in these files, only the last update is processed. Although this situation should not be a problem, because the most up-to-date change is reflected in MDM, the prior change history is lost. The process includes the following actions:
InfoSphere CDC processes the delta changes from the database log and sends them to an InfoSphere CDC process on the Information Server, which saves them to a local directory in the Flat File format.
ETL transforms the CDC / CDB data into a Standard Interface Format (SIF) file. The address lines must be calculated (based on codes), whereas in the initial load this task was completed by the client. Address data is captured by the InfoSphere CDC functionality (user exit) so that the address is available whether or not it has been changed.
The layout of the Standard Interface Format (SIF) is a flat-file format delimited by the pipe symbol (|) between columns and the new line character between records; a sketch of last-update-wins processing over such a file follows this list.
The SIF Sequencer runs. This process consists of ETL jobs designed to convert the regular SIF input into multiple files that contain SIF records for delta maintain transactions.
The SIF Parser and Maintenance Business Transactions are used to load data through the Batch Framework. A role-based Sample User Interface (UI) is also included. This UI is an umbrella GUI that incorporates a Reporting UI, a registry that serves and renders Business Intelligence and Reporting Tools (BIRT) reports, a Customer Matching Critical Data Rule UI that allows defining matching criteria, a Data Stewardship UI, and a Party Maintenance UI to view and maintain data in the MDM data repository.
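The last-update-wins behavior described above can be illustrated with a minimal Java sketch that sequences a pipe-delimited SIF file, keeping only the final record per party key. The file path and the position of the key column are assumptions for the example.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of last-update-wins sequencing over a SIF delta file.
public class SifSequencer {
    public static void main(String[] args) throws IOException {
        Map<String, String> lastByKey = new LinkedHashMap<>();
        for (String record : Files.readAllLines(Path.of("/data/mdm/delta.sif"))) {
            // Column 0 is assumed to hold the party key; a later record
            // for the same key replaces the earlier one, so intermediate
            // changes in the same run are dropped (their history is lost).
            String key = record.split("\\|", -1)[0];
            lastByKey.put(key, record);
        }
        // Only the most recent change per party is passed on to MDM.
        lastByKey.values().forEach(System.out::println);
    }
}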
3.8 Integration with IBM Information Server
IBM Information Server is a leading solution in the market that integrates data, performs cleansing and transformations, and delivers information to your critical business initiatives.
InfoSphere CDC provides near real-time data for building business initiatives. InfoSphere CDC scrapes near real-time data from the source systems and either writes it into a flat file or connects directly by using InfoSphere CDC for DataStage on the target system. The ETL application reads the data from the flat file and processes it into the target.
3.8.1 Integration with IBM Information Server: Sample implementation
In this example, a company creates a Bulk Cumulative Volume copy of its database once a night for the ETL server. The ETL application compares the Bulk Cumulative Volume copy with the existing copy, and the delta changes are loaded into the data warehouse by the ETL application. The entire process takes more than 22 hours.
The company has the following requirements:
The company wants access to more near real-time data for various business initiatives and requirements.
The company wants to reduce the total processing time to feed the data to a data warehouse.
Integration with IBM Information Server: Proposed solution
The proposed solution is shown in Figure 3-16. It shows that the billing application generates database logs on the production server. To move the data, a script is written on the source side that, using FTP, transfers the archive logs to the InfoSphere CDC and DataStage server upon creation and every 15 minutes. The InfoSphere CDC capture agent has a database connection to the production database, and processes the new logs on the InfoSphere CDC and DataStage server, scraping information from the archived logs. The InfoSphere CDC Apply process also runs on that same server, and generates flat files. Those flat files are then written to the data warehouse.
Figure 3-16 Integration with IBM Information Server
3.9 Operational business intelligence
Operational business intelligence, sometimes called real-time business intelligence, can enable companies to meet their operational business needs.
Companies have good reasons to develop BI systems to fulfill their operational needs. For example, BI has become a tool that provides comprehensive views of business direction. BI systems provide quantitative data and the means to manipulate it, which enables business analysis and supports decision making. Most BI systems are supported by an information management infrastructure that includes data warehouses, data marts, and other integrated data resources, such as an operational data store (ODS). The ODS provides access to data that can help show the company's current situation.
InfoSphere CDC also plays a vital role in building near real-time operational BI, as a way for companies to fulfill their daily business needs.
3.9.1 Operational business intelligence: Sample implementation
A large company is building their data warehouse to see the daily trends of their customers so they can make quick decisions when fulfilling their customer needs. The company has the following challenges in building their operational BI:
Effectively loading production data into the data warehouse for business intelligence analytics
High impact of batch extraction from production system, which places operations at risk
Operational business intelligence: Proposed solution
The company implements InfoSphere CDC, installing and configuring it on their source and target systems (Figure 3-17). They use the InfoSphere CDC time stamp and soft delete functionality to achieve this business need, as follows (a sketch of how the ETL tool might consume these columns follows Figure 3-17):
1. InfoSphere CDC is used to extract and integrate data into the operational data warehouse.
2. Time stamps are added to operations in the ODS and used by the ETL tool.
3. Delete operations are applied as soft deletes in the ODS: instead of removing the record on the target side, InfoSphere CDC flags it as deleted.
4. Granular data provides detailed information for complex analysis to identify customer trends.
Figure 3-17 Operational BI
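The following minimal Java sketch shows how a downstream ETL read might consume these time stamp and soft delete columns, pulling only rows changed since the last run. The table and column names (ODS.CUSTOMER, AUD_TIMESTAMP, DELETED_IND) and connection details are illustrative assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

// Minimal sketch of an incremental ETL pull from a soft-delete ODS table.
public class OdsIncrementalRead {
    public static void main(String[] args) throws Exception {
        Timestamp lastRun = Timestamp.valueOf("2024-01-01 00:00:00");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:db2://odshost:50000/ODSDB", "etluser", "secret");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT CUST_ID, CUST_NAME, DELETED_IND " +
                 "FROM ODS.CUSTOMER WHERE AUD_TIMESTAMP > ?")) {
            ps.setTimestamp(1, lastRun);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Soft-deleted rows remain in the ODS but are flagged,
                    // so the load can treat them as deletions while the
                    // ODS keeps the row for trend analysis.
                    boolean deleted = "Y".equals(rs.getString("DELETED_IND"));
                    System.out.printf("%s %s deleted=%b%n",
                            rs.getString("CUST_ID"),
                            rs.getString("CUST_NAME"), deleted);
                }
            }
        }
    }
}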