Introduction and overview
This book describes the IBM InfoSphere Change Data Capture (InfoSphere CDC) solution and how it fits in an Optimized Data Integration solution. Optimized Data Integration addresses the increasing need of businesses for timely access to changed data before making critical business decisions. It is a solution that allows businesses to access, move, and deliver data in a timely and cost-effective manner from the source systems on which it is located to the target systems and applications where it is required.
In today's demanding business environment, users and consumers of business data want real-time access to personalized information and instantaneous updates, setting a new level of expectation for data within organizations. To make more informed business decisions, better serve customers, and increase operational efficiencies, organizations must ensure that they are aware of changes to key data as they occur. These changes must be immediately delivered to the people and processes that need to act upon them.
From data warehousing to service-oriented architecture (SOA), application consolidation, and master data management, business leaders are aware of the power of information to streamline processes, reduce costs, and make businesses more efficient and effective. To be successful, these projects must have steady and reliable delivery of timely business information from across the enterprise, which can be both expensive and resource-intensive.
Change data capture technology helps businesses overcome these challenges by capturing only changed operational data and transmitting it across the enterprise. This approach provides substantial business value while helping to reduce risk and to deliver cost and speed advantages that enhance traditional data movement processes.
1.1 Optimized data integration
Most businesses have information management environments organized similarly to the one shown in Figure 1-1. These environments consist of transactional source databases that contain the raw data that all companies generate and require to run their operational systems (lower left). As you move up and right from these source databases through the data usage continuum, you find users, analytics, and the consuming applications that companies use to derive business insight from their data (upper right). In the middle, you find a reporting or data warehousing environment that most companies use to organize and analyze their raw data to enable the delivery of business insights.
Figure 1-1 Optimized data integration
The main goals of optimized data integration are to move and deliver data in the most timely and cost-effective manner. These goals are achieved by reducing the resource consumption and latency associated with data movement: only the data that a user requests, or that has been changed or updated on the source systems, is moved. Moving this subset rather than the entire source data set saves time and money. As a result, any organization that moves data to support business intelligence (BI) and reporting systems, application consolidations and migrations, continuous availability solutions, or general data distribution and synchronization scenarios can do so knowing that the raw data is delivered efficiently and effectively.
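The idea of moving only changed data can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a representation of the InfoSphere CDC API or its internals: the `Change` record, the `apply_changes` function, and the sample rows are all hypothetical, and the target table is modeled as a plain dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical change record: one captured source operation."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image (None for deletes)


def apply_changes(target: dict, changes: list) -> int:
    """Apply captured changes to the target in order; return rows moved."""
    applied = 0
    for change in changes:
        if change.op in ("insert", "update"):
            target[change.key] = dict(change.row)
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        applied += 1
    return applied


# Target copy of a three-row source table (sample data, assumed).
target = {
    1: {"name": "Ann", "balance": 100},
    2: {"name": "Bob", "balance": 200},
    3: {"name": "Cy", "balance": 300},
}

# Only the rows that changed on the source are transmitted.
changes = [
    Change("update", 2, {"name": "Bob", "balance": 250}),
    Change("delete", 3),
    Change("insert", 4, {"name": "Dee", "balance": 75}),
]

moved = apply_changes(target, changes)
```

In this sketch the target is brought up to date by transmitting three change records. A full refresh would instead retransmit every row in the source table, and that cost grows with the size of the table rather than with the volume of changes.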
Figure 1-2 provides a simple schematic of how optimized data integration is most commonly implemented to enable businesses to make better informed business decisions, run smoother operations, win new customers, and increase their bottom line.
Figure 1-2 How optimized data integration is used
Some examples of how optimized data integration is used include the following:
Provides feeds of changed data for Data Warehouse or Master Data Management (MDM) projects, enabling users to make operational and tactical business decisions using the latest information.
Dynamically routes data based on content to message queues to be consumed by one or more applications, ensuring consistent, accurate, and reliable data across the enterprise.
Populates real-time dashboards for on-demand analytics, continuous business monitoring, and business process management, and integrates information between mission-critical applications and web applications, ensuring that customers and employees have access to real-time data.
Consolidates financial data across systems in different regions, departments, and business units.
Improves the operational performance of systems that are adversely affected by shrinking nightly batch windows or expensive queries and reporting functions.
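As one illustration of the content-based routing mentioned in the list above, the following Python sketch distributes change records to in-memory queues according to a value in each record. It is a simplified sketch, not the product's routing mechanism: the `route_changes` function, the record layout, and the use of `collections.deque` in place of a real message queue are all assumptions made for the example.

```python
from collections import defaultdict, deque


def route_changes(changes, routing_key):
    """Place each change record on the queue chosen by its content."""
    queues = defaultdict(deque)
    for change in changes:
        queues[routing_key(change)].append(change)
    return queues


# Sample change records (assumed layout).
changes = [
    {"table": "orders", "region": "EU", "amount": 120},
    {"table": "orders", "region": "US", "amount": 75},
    {"table": "orders", "region": "EU", "amount": 40},
]

# Route by the "region" field; each consumer reads only its own queue.
queues = route_changes(changes, lambda change: change["region"])
```

Passing the routing rule as a function keeps the routing decision separate from the delivery logic, so the same records could be routed by table name or any other field without changing `route_changes`.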
1.2 InfoSphere architecture
Before beginning an in-depth discussion about change data capture, this section describes how it fits into the broader IBM InfoSphere Information Server architecture strategy.
IBM InfoSphere Information Server helps business and IT personnel collaborate to understand the meaning, structure, and content of any type of information across any sources. It provides breakthrough productivity and performance for cleansing, transforming, and moving this information consistently and securely throughout the enterprise so it can be accessed and used in new ways to drive innovation, increase operational efficiency, and help lower risk. With a unified metadata foundation, IBM InfoSphere Information Server allows users from various roles in the organization to establish a common vocabulary and understanding of the business from end-to-end to enrich and streamline operations.
InfoSphere Information Server achieves new levels of information integration speed and flexibility by providing the following capabilities:
A comprehensive and unified foundation for enterprise information architectures, scalable to any volume and processing requirement
Auditable data quality as a foundation for trusted information across the enterprise
Metadata-driven integration, providing breakthrough productivity and flexibility for integrating and enriching information
Consistent and reusable information services, along with application services and process services, essential for enterprises
Accelerated time to value with proven and industry-aligned solutions and expertise
Broadest and deepest connectivity to information across diverse sources, such as structured, unstructured, mainframe, and applications
IBM InfoSphere Information Server provides every capability needed to integrate information across heterogeneous systems, including understanding source data, applying data quality, complex transformation, and various ways to deliver information. It has a unique, metadata-driven design that helps align business goals and IT activities, provides a consistent understanding of what things mean, captures business specifications and uses them to automate development tasks, and provides deeper insight into data by tracking its lineage.
InfoSphere Information Server is composed of four main pillars (Figure 1-3):
Understanding your data
Cleansing your data
Transforming your data
Delivering your data to where it is needed
Figure 1-3 InfoSphere Information Server
Here is a brief description of those four pillars:
Understanding your data: InfoSphere Foundation Tools provides tools that enable you to discover your data across systems by the following means:
 – Through comprehensive data discovery and mapping
 – With trusted information structures for business optimization through a common business vocabulary to define an enterprise model and design specifications
 – Through governing the data over time by managing data quality and monitoring data flows
Cleansing your data: InfoSphere data quality capabilities ensure that you have reliable and accurate information by identifying the source of data quality problems, defining business rules to monitor and maintain quality, removing duplicate data, and validating, standardizing, and enriching the data.
Transforming your data: InfoSphere data transformation extracts, transforms, and loads data between multiple sources and targets, supporting massive scalability requirements and delivering data in batch or real time.
Delivering your data: InfoSphere data delivery provides timely and reliable movement of heterogeneous data.
As shown in Figure 1-4, moving data and getting it to where it needs to be in a timely manner has a key role in data integration projects. Enterprises have many different, disparate, and heterogeneous data sources from which they require updated and fresh data to make timely and trusted business decisions. InfoSphere Information Server supports a broad range of data delivery styles to address business and IT objectives.
Figure 1-4 InfoSphere Deliver Pillar
The Deliver Pillar addresses the key challenges encountered by businesses that need trusted and up-to-date data to make decisions that positively affect the business, whether for strategic initiatives or for improving company efficiencies. A common challenge is heavily used or overutilized source systems and applications that contain business-critical data and cannot afford the additional workload of queries or extracts for reporting purposes. Batch windows are the traditional approach for updating data warehouses or reporting databases. As business data volume increases, the time required to bulk load these data changes also increases, which can impact the operational systems. Change data capture allows incremental data changes to be captured and delivered throughout the enterprise with minimal processing cost or impact to source systems. Batch windows can be shortened or eliminated by real-time feeds of data changes, and large volumes of data can be moved quickly and efficiently.
The remaining chapters of this book provide details and examples about implementation and operational aspects of InfoSphere Change Data Capture.