Environmental considerations
This chapter outlines some of the specific environmental considerations you need to be aware of, including:
Dealing with different time zones or encodings is described in 7.1, “Globalization with InfoSphere CDC” on page 138.
Firewalls are commonly put in place to protect the servers in an IBM InfoSphere Change Data Capture (InfoSphere CDC) landscape from unauthorized access, and they contribute to almost all InfoSphere CDC connectivity issues. How to account for firewalls when designing your replication landscape is described in 7.2, “Firewall configurations” on page 149.
If and how you can use InfoSphere CDC when access to the source or target is restricted is described in 7.4, “Remote processing capabilities” on page 162.
When replicating from a resilient environment or when InfoSphere CDC must be part of the resilient environment you have set up or been asked to design, see 7.5, “Using InfoSphere CDC in resilient environments” on page 168.
How to cooperate with changes in your database environment or replication needs is outlined in 7.6, “Change management” on page 190.
This chapter describes the initial load aspects when you start (or reinitiate) InfoSphere CDC.
Whenever your environment imposes specific requirements on these settings, the information in this chapter should help you with design considerations and InfoSphere CDC recommendations. This information is useful in the design phase, and the chapter also includes hands-on examples that can assist during the implementation of the designed architecture.
7.1 Globalization with InfoSphere CDC
In this fast-growing economic environment, globalization is an important factor to consider when combining information from around the world. Two key aspects that this section addresses are:
How do you combine data from around the world into a centralized time zone?
How do you translate the different characters into a single character set?
The good news is that current IT environments have been provided with the technical means to solve these questions:
Time zones and Coordinated Universal Time (UTC)
Unicode (an international character encoding standard)
When systems with different time zones or code pages interact with each other, you need to take care of proper conversions. This section describes the means and methods to deal with these issues while using InfoSphere CDC.
7.1.1 Time zone considerations
Time zone considerations can be made both from a business and a technical viewpoint.
For certain business requirements, the time when the data was created or the time when the data was consolidated is important. For example, in a banking environment, you might want to create the planning and forecasting report only after the data from all the worldwide business units has been consolidated at noon local time.
From a technical point of view, you could be investigating a data consistency problem by comparing source logging with the target log. You must be sure to compare the events that happened at the same time, independent of the time zone where the event occurred.
Technical considerations when dealing with time zones
Within InfoSphere CDC, you have date and time information in the following areas:
Date and time fields within the data.
These fields are the date and time fields within the table. They can be generated either by the source application or by the database (for example, current time fields). From an InfoSphere CDC perspective, when they are part of the selected columns for replication, they are not modified, so the time that was put in the column is carried into the target database without any time zone conversion.
Time stamp within the log.
This time stamp is provided by the database and is put in the database logging mechanism.
Time stamp from the events.
This time stamp is put in the event log by InfoSphere CDC.
Time zone considerations for replicating data
Consider time stamps within the log. This section uses the example shown in Figure 7-1. Assume that the worldwide travel policy is that travel can only be booked after approval has been received from the travel manager. The travel policy is audited on a regular basis by the worldwide expense auditor, who does not approve any expenses when the travel policy has not been followed.
Figure 7-1 Considerations without time zone
As outlined in Figure 7-1, both the Travel Requester and Travel Approver use the Travel Approval System for the approval process, which keeps its system time in UTC. At 12:30 local time, the Travel Requester (who is in an area where the local time is UTC+2) puts in a travel request. This travel request is approved by the Travel Approver (who is located in a UTC-2 time zone) at 10:15 local time. Assume that the records do not keep a date and time and that the operation time stamp in the database log is used by InfoSphere CDC to generate the time of request and approval in the Consolidated Audit Information, which uses the system time.
The Travel Requester did not wait for the Travel Approver before booking his trip. However, due to the time zone differences, the invalid sequence of events is not reflected in the Consolidated Audit Information. When the auditor compares the time of approval with the time of booking, and without having the location information of each of the roles, no irregularities are found and the booking passes.
When the time zone of the Travel Requester is considered when the data is replicated, the sequence of events becomes clear. In this case, assume that the log entries are changed to the UTC time zone. This time, the Consolidated Audit Information reflects that the Travel Requester did not wait for approval before booking his trip (Figure 7-2).
Figure 7-2 Considerations with time zone
With the correct timing sequence, it is clear to the worldwide auditor that the process for Travel Request has not been done in the correct sequence and therefore fails the audit.
To illustrate the situation, consider the sample scenario outlined in the following sections.
Booking system
The booking system is set to UTC+2, and has the following database table definition:
CREATE TABLE "CDCDEMO"."STD_TRAVEL_BOOKED" ("TRAVELID" NUMBER(10)
  NOT NULL, "TRAVEL_INFO" VARCHAR2(30),
  CONSTRAINT "TRAVELID" PRIMARY KEY("TRAVELID"))
  TABLESPACE "USERS"
In this sample, the STD_TRAVEL_BOOKED table represents the booking application. It records TRAVEL_INFO but does not have a booking time in the table itself. The booking time is taken from a journal control field.
Within InfoSphere CDC, the instance refers to the CDC_Oracle_Redo data store.
Approval system
This system is set to UTC-7, and has the following database table definitions:
CREATE TABLE "CDCDEMO"."STD_TRAVEL_AUDIT" ("TRAVELID" NUMBER(10)
  NOT NULL, "REQUEST" DATE NOT NULL, "APPROVED" DATE,
  "BOOKED" DATE,
  CONSTRAINT "TRAVELID" PRIMARY KEY("TRAVELID"))
  TABLESPACE "USERS";
 
CREATE TABLE "CDCDEMO"."STD_TRAVEL_APPROVE" ("TRAVELID"
  NUMBER(10) NOT NULL, "REQUEST" DATE, "APPROVED" DATE, PRIMARY
  KEY("TRAVELID"))
  TABLESPACE "USERS"
In this sample, STD_TRAVEL_AUDIT represents the auditor's audit application, which contains the Request, Approved, and Booked dates. The STD_TRAVEL_APPROVE table represents the manager's approval system.
Two subscriptions need to be created:
The TRAVEL_MANAGER (Figure 7-3) replicates the approval date and time to the audit application.
Figure 7-3 TRAVEL_MANAGER
The TRAVEL_BOOKING (Figure 7-4) replicates the booking date and time to the audit application.
Figure 7-4 TRAVEL_BOOKING
The tables are mapped with one-to-one consolidation and the TRAVELID is used as the key.
Wrong mapping without time zone considerations
STD_TRAVEL_APPROVE is mapped to STD_TRAVEL_AUDIT (Figure 7-5).
Figure 7-5 Wrong mapping approval
STD_TRAVEL_BOOKED is mapped to STD_TRAVEL_AUDIT (Figure 7-6). As there is not a booked date in the source table, the journal time stamp is used.
Figure 7-6 Wrong mapping booking
Besides using Consolidation One-to-One, we did not take anything else into consideration and automapped the available fields.
To simulate the application, complete the following steps:
1. An employee files a booking request in the Application Approval system. The application uses the local application time, which is put in the table shown in Figure 7-7.
Figure 7-7 Approval table
2. The employee decides not to wait and uses the booking application to confirm the booking (Figure 7-8). The local system has been added for information, but is not replicated.
Figure 7-8 Booking table
3. The manager approves the travel (Figure 7-9).
Figure 7-9 Manager approval
The resulting travel table opens (Figure 7-10).
Figure 7-10 Result table
Although in the simulation we made sure that the booking was made before the approval date and time were entered in the approval system, the audit table incorrectly reflects that everything was booked in the correct sequence.
Date and time fields within the data
When replicating across time zones, it is important to understand how the different time zones are understood by InfoSphere CDC.
Complete the following steps:
1. On the source database, create a table by running the following command:
CREATE TABLE "CDCDEMO"."REP_TIME" ("NUMBER" NUMBER(10) NOT NULL,
"SRCAPPTIME" DATE, "SRCLOCTIME" DATE, "SRCUTCTIME" DATE,
"SRCLOGTIME" DATE, "TGTAPPTIME" DATE, "TGTLOCTIME" DATE,
"TGTUTCTIME" DATE)
TABLESPACE "USERS";
2. On the target database, create a table by running the following command:
CREATE TABLE "CDCDEMO"."TimeOverView" ("NUMBER" NUMBER(10),
"SRCAPPTIME" DATE, "SRCLOCTIME" DATE, "SRCUTCTIME" DATE,
"SRCLOGTIME" "TIMESTAMP(6)", "SRCTIMEOFF" NUMBER(10),
"TGTAPPTIME" DATE, "TGTLOCTIME" DATE, "TGTUTCTIME" DATE,
"TGTTIMEOFF" NUMBER(10))
TABLESPACE "USERS";
3. Create a subscription and set up an audit replication with mappings (Figure 7-11).
Figure 7-11 TimeOverViewMappings
The derived column, DERSRCLOC, contains the following expression:
%TODATETIME(%CURDATE('*LOC'),'*YYMD',%CURTIME('*LOC'))
The derived column, DERSRCUTC, contains the following expression:
%TODATETIME(%CURDATE('*UTC'),'*YYMD',%CURTIME('*UTC'))
The derived column, DERSRCTIMOFF, contains the following expression:
%TONUMBER(%SUBSTRING(%CURTIME('*LOC'),1,2))-%TONUMBER(%SUBSTRING(%CURTIME('*UTC'),1,2))
When you start the replication and add a record in the source table, you have a target table record (Figure 7-12).
Figure 7-12 TimeOverViewTargetRecord
When the user, or user application, inputs a time stamp, date, or time, the time defined by the application or user is used. This time is either the system time or the application time.
InfoSphere CDC replicates this time as is; if you need to centralize this time, you must convert it. Because only the system where the data is created is aware of its time zone, any conversions that are needed should be done on that system.
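As an illustration of converting on the source system, the following minimal sketch uses GNU date on a Linux source server to turn an application time stamp entered in the source local time zone (UTC+2, as in the earlier example) into UTC before it is stored or passed on. The time stamp value and the Etc/GMT-2 zone name (note the inverted POSIX sign convention) are assumptions for illustration only.

# Interpret "12:30" as local time in a UTC+2 zone (Etc/GMT-2 in POSIX
# notation) and print the equivalent UTC time.
TZ="Etc/GMT-2" date -d "2011-06-15 12:30" -u "+%Y-%m-%d %H:%M:%S"
# Output: 2011-06-15 10:30:00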
If possible, you could modify the data model so the date and time field can hold the time zone where the data is created. You could also pass the date and time by using one of the following techniques:
Delivering a centralized UTC time stamp
Delivering local time overview table
Time zone considerations for monitoring
InfoSphere CDC uses local time for events, so when using the event viewer in the Management Console, both source and target events are shown with their local time.
7.1.2 Encoding conversions
In most circumstances, InfoSphere CDC handles code page conversions automatically. When you want to override the automatic code page conversions within InfoSphere CDC, you need to determine the code pages to transform from and to.
How InfoSphere CDC handles encoding
InfoSphere CDC dynamically determines the code pages of the source and target during the startup of the subscription. InfoSphere CDC detects the data encoding based on either the column encoding or the database encoding and assigns the detected encoding to each column. The encoding is based on what the database knows about the encoding of the data found in a specific column. Before assigning the encoding to a column, InfoSphere CDC normalizes it to an Internet Assigned Numbers Authority (IANA) encoding name (see the IANA character set registry for more information).
Every database has its specific knowledge about data encoding. That knowledge is represented by either a number (code page, Coded Character Set Identifier (CCSID), or Oracle charset ID) or a string (Oracle charset name or Sybase encoding name). Some databases can have a different encoding for each column, and some of them might have a database encoding for non-Unicode columns and then the Unicode national encoding for columns supporting that encoding.
When a table is added to the catalog, the column code page is requested from the database and stored in the internal metadata. InfoSphere CDC always keeps the encoding at the column level and uses this encoding during the replication.
On the mainframe, any necessary encoding conversion is done by using the Unicode Conversion Services (UCS), which must be enabled on the system.
During startup of the subscription, the source and target negotiate which side does the encoding conversion for each column and the encoding information is put into memory.
Overriding code page conversions
When you have applications that write data into character columns with a different encoding from what has been defined in the database, InfoSphere CDC provides the flexibility to override the column encoding to specify the encoding of the actual data. Besides specifying the encoding conversion for character data, you can override the encoding to specify that no conversion should be done (binary data) or specify the conversion for binary columns, making InfoSphere CDC treat them as character data.
To override the automatic code page conversion, you need to go to the Encoding window of the table mapping. All character type data is shown there. You can map source column data to different column encodings. For example, in Japanese environments with different types of the same encoding (such as IBM-943 and IBM-943C), this capability can be useful. An example is shown in Figure 7-13.
Figure 7-13 Overriding code page
You can also override the encoding to specify that it is a binary column or you can specify encodings for binary columns, making them character columns. InfoSphere CDC then treats the source column as though it were a binary or character column from the perspective of what target columns it can be mapped to, and this setting changes the functions in which InfoSphere CDC can be used.
7.2 Firewall configurations
An InfoSphere CDC landscape can consist of several servers that must interconnect to be able to perform configuration and replication activities. If the servers that host any of the InfoSphere components (Management Console, Access Server, or Replication instance) are in different networks or network segments, firewalls are usually in place to protect these servers from unauthorized access.
Firewalls contribute to almost all InfoSphere CDC connectivity issues. The configuration of these firewalls must be considered when designing your replication landscape.
7.2.1 How InfoSphere CDC uses TCP/IP
Before elaborating on how firewalls affect your replication configuration, it is important to understand how InfoSphere CDC uses the TCP/IP application protocol. As with all TCP-based applications, InfoSphere CDC uses a client-server model: a client is a requester of a service, and a server is a program that receives a request, performs the required service, and sends back the results in a reply.
Each process (client or server) that wants to communicate with another process identifies itself to the TCP/IP protocol suite by one or more ports. This port is used by the operating system to determine to which application program (process) it must deliver incoming messages. After it is identified to the TCP/IP protocol, a socket is registered. The socket is uniquely identified by an address that consists of the transport protocol (TCP), local network address, and the local port number, for example TCP, 172.16.5.14:10901. The process of registering a socket is also called binding.
When a client socket communicates with a server socket, the connection is identified by an association that completely specifies the two processes that make up the connection using the following attributes: transport protocol (TCP), local network address:local port number, and foreign (destination) network address:foreign (destination) port number. An example is tcp, 172.16.5.10:47112, 172.16.5.14:10901.
A server can usually deal with multiple requests and multiple requesting clients at the same time. Most servers wait for requests at a well-known port so that clients know which port (and in turn, which application) they must direct their requests to. The client typically uses an arbitrary port called an ephemeral port for its communication. A connection that is established between a client and a server is full duplex, meaning that data streams can flow in both directions concurrently.
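To see which associations are active on a server, you can list its TCP sockets. A minimal sketch, assuming a Linux or UNIX server and the instance port 10901 from the example association above:

# List all TCP sockets that involve the CDC instance port. The client side
# of each association shows the ephemeral source port that TCP/IP assigned.
netstat -an | grep 10901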
InfoSphere CDC typically requires multiple connections from a client to a server. Each connection is identified by its own association where the destination port number is well-known (fixed) and the source port number is variable. For example, if a subscription is running to replicate data from an Oracle to a DB2 database, InfoSphere CDC establishes two connections, one to replicate the change data and the other one to send / receive control information. If you include the Management Console and Access Server in the picture, the connections that are active at a given point in time could look as shown in Figure 7-14.
Figure 7-14 InfoSphere CDC connections
You can see the connections and their associations and the fact that multiple connections are activated. The InfoSphere CDC Management Console client process connects to the Access Server server process on port 10101. It establishes at least two connections for configuration and monitoring activities and an additional connection for each connected data store. In this example, the source port for each connection has been requested from, and assigned by, TCP/IP. This port is dynamically assigned based on availability and does not follow a specific sequence.
The Access Server routes all Management Console requests to the respective CDC engines. In the context of setting up firewall rules, it is important to know that there are no direct connections between the Management Console and the CDC engines.
7.2.2 Firewalls
Typically, firewalls protect a network from the outside (untrusted) environment (Internet and intranet). Firewalls can consist of one or more functional components, the most important one in the context of InfoSphere CDC being the packet-filtering router.
Packet-filtering routers can forward or discard packets according to filtering rules. When a packet (network data and a header that includes address and routing information) arrives at the packet-filtering router, the router extracts information from the packet header and decides, according to the filter rules, whether the packet passes through or is discarded. The following information, among other details, can be extracted from the packet header:
Source IP address
Destination IP address
TCP/UDP source port
TCP/UDP target port
Because most server applications run on a specific server and use well-known TCP/UDP port numbers, it is possible to allow or deny services by using the related destination IP address and target port in the filter. Most firewalls are configured this way: They shield servers from unauthorized inbound connection attempts coming from a client.
In a number of cases, firewalls are also configured to only forward packets coming from a certain network, indicated by the source IP address.
Rarely, firewalls are configured to forward packets coming from a certain source port or range of source ports. Configuring a firewall in this manner requires client applications to always bind to a certain port or a port within a certain range.
7.2.3 InfoSphere CDC in a firewalled network environment
InfoSphere CDC can operate in most firewalled network environments, including ones where packet filtering is based on a source IP address and source ports. However, when firewalls are configured with source port filtering, you must carefully compare your intended InfoSphere CDC configuration against the restrictions put in place by the firewall. Failure to do this comparison might cause replication to not start successfully, and it might be difficult to determine the cause. Generally, when a firewall exists anywhere on the route between a client and a server, its configuration must be known and validated against the InfoSphere CDC configuration.
Looking at the different components in Figure 7-15, the following connections must be reviewed when firewalls are in place:
Management Console connections to the Access Server
Access Server connections to the source and target CDC instances
Source CDC instances to target CDC instances (subscriptions)
Replication instances to database instances (remote log reading and remote apply)
Figure 7-15 InfoSphere CDC firewalled network
All connections between the InfoSphere CDC Management Console and the Access Server are initiated from the workstation running Management Console. For every connection, the target port is the port on which the Access Server listens. Connections are established when the Management Console connects to the Access Server (after the user and password are entered) and when data stores are connected. Traffic on all connections is mostly bidirectional.
Similarly, all connections between the InfoSphere CDC Access Server and the CDC instances are initiated from the server running the Access Server. Connections are established when a user connects to a data store from the Management Console and new connections are established if monitoring functions are used from within the Management Console. For every connection, the target port is the port that has been configured for the CDC instance. Traffic on these connections is mostly bidirectional.
Connections between CDC source and target instances are initiated from the server running the CDC source instance. These connections are established when subscriptions are started.
Connections between CDC instances (source and target) and the database are initiated from the server running the CDC instance in question. These connections are established when the CDC instance is started and when subscriptions are active.
7.2.4 Configuring source port restrictions
If your firewall also filters based on source port, you can and should configure a starting source port for the InfoSphere CDC component in question, whether it is Management Console, Access Server, or Subscription. When specifying a source port, the component in question establishes the first connection using this port. Any subsequent connection uses the previously issued port number + 1.
Management Console to Access Server
The starting source port and number of ports Management Console can use is configured in the Connection preferences of your Management Console client (Figure 7-16). When connecting to the Access Server, which listens on port 10101, the first connection has the following association: tcp, source_ip:59001, target_ip:10101. The next source port is 59002, and so on.
Figure 7-16 Connection preferences
When planning for the firewall configuration, the total number of connections that will be established can be calculated by the following formula:
2 + (number of connected data stores * 2)
For example, with starting port 59001 and 20 ports, the source ports to be opened in the firewall would range from 59001 through 59020. The Management Console is able to connect to the Access Server and have a maximum of 9 data stores connected simultaneously.
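A minimal sketch of this calculation in a shell, assuming nine connected data stores and the starting port 59001 from the example:

# Formula from above: 2 base connections + 2 per connected data store.
DATASTORES=9
START_PORT=59001
CONNECTIONS=$((2 + DATASTORES * 2))
END_PORT=$((START_PORT + CONNECTIONS - 1))
echo "Open source ports ${START_PORT}-${END_PORT} (${CONNECTIONS} ports) in the firewall"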
Access Server to CDC instances
The starting source port and number of ports Access Server can use is configured in the dmaccessserver.vmargs file that is located in the Access Server config directory. You can edit the text file and restart the Access Server, after which the changes take effect. The format of the arguments is:
jar lib/server.jar local_port:<first_port> local_port_count:<number_available_ports> <Access_Server_Listener_Port>
The number of source ports to open when defining Access Server source ports depends on the number of Management Consoles that are routed through to the different InfoSphere CDC instances. This number can be calculated using the following formula:
2 * (number of management consoles + (number of management consoles * number of connected data stores) + number of data stores)
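For example, one Management Console connected to five data stores needs 2 * (1 + (1 * 5) + 5) = 22 source ports. The following hypothetical dmaccessserver.vmargs line follows the format above and reserves 40 source ports starting at 59101 for an Access Server that listens on port 10101; the port numbers are assumptions for illustration only.

jar lib/server.jar local_port:59101 local_port_count:40 10101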
CDC source to target instances (subscriptions)
The starting source port must be configured for each subscription in the Advanced settings. Each subscription potentially establishes up to six connections to the target CDC instance. This situation depends on the type of CDC engine that is running on the source and target.
The number of source ports to open in your firewall configuration can be calculated using the following formula:
6 * number of subscriptions
CDC instance to database
The source port for connections established from the CDC instance to the database cannot be configured. As a best practice, keep the CDC engine close to the database server and do not use firewall rules with source port restrictions.
7.2.5 Troubleshooting CDC connection issues
This section describes basic troubleshooting steps to determine whether a firewall is blocking traffic between CDC components.
The first step in analyzing connectivity issues is determining whether the client can find the server by its configured IP address or host name. Often, connectivity issues are caused by a mismatch on the host tables of the server that runs Access Server and the CDC source server.
For example, the host name for the CDC target server, tgtsvr, might be resolved to its real address of 172.16.5.1 in your DNS. The server running the Access Server might have a host table that causes the tgtsvr host name to resolve to the address 10.1.1.5. This configuration causes the Access Server to be unable to reach the target server.
Another example is when the tgtsvr host name is resolved to address 10.1.1.5 in the source server's host table, while the Access Server uses the DNS and resolves the target CDC server to address 172.16.5.1. You can map source and target tables up to the point where the subscription tables are described and the source CDC engine attempts to establish a connection to the tgtsvr host name. The description fails because of the invalid 10.1.1.5 address that is listed in the source server's host table.
If the environment allows this action and firewalls do not block ICMP requests, you can run ping from each of the servers running a CDC component to ensure that host names are resolved to the expected IP addresses. Alternatively, you can configure the data stores in your Access Server by their IP addresses.
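A minimal sketch of such a check on a Linux or UNIX server, assuming the tgtsvr host name from the example; getent shows what the local resolver (hosts file or name service switch) returns, and nslookup queries DNS directly:

# Compare the locally resolved address with the address returned by DNS.
getent hosts tgtsvr
nslookup tgtsvr
# Confirm basic reachability (only meaningful if ICMP is not blocked).
ping -c 3 tgtsvr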
If there are still connectivity issues after you have confirmed that the IP addresses are correct, a firewall might be blocking traffic coming from a certain server. Each of the CDC components (Access Server and the CDC engines) responds to basic connection requests coming from a telnet client. Simple tests can be run to determine whether the route between any of the components is blocked by a firewall or affected by a different issue.
The command to test the activity of a server port is:
telnet <ip address of server> <target port>
If a CDC component is active on the specified IP address and listening on the target port, the connection is established successfully and opens a telnet session. You can return to the telnet command line by pressing ^] (Ctrl+]) and then run quit to exit.
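For example, a test against the target CDC instance from the earlier association (listening on port 10901) could look like the following sketch; the address and port are the hypothetical values used earlier in this section:

telnet 172.16.5.14 10901
Trying 172.16.5.14...
Connected to 172.16.5.14.
Escape character is '^]'.
^]
telnet> quit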
When testing connectivity, it is important to remember which CDC component establishes the connection. If a connectivity issue occurs when tables are being mapped, you need to determine whether the source CDC engine can connect to the target CDC engine. This situation means that the telnet test must be done on the CDC source server. Should you be able to connect to the Access Server but not be able to connect to a certain data store, run the test on the server hosting the Access Server.
7.3 Log retention
As described in Chapter 1, “Introduction and overview” on page 1, InfoSphere CDC depends on the logging mechanism for its push capabilities. This section describes some general guidelines for log retention in combination with InfoSphere CDC. If you require some specific platform or database guidelines, see 7.3.2, “Log retention platform-specific guidelines” on page 159.
7.3.1 Log retention general guidelines
Each database has an initial fast logging mechanism, typically referred to as an online log. The DBA tunes the online log by providing fast storage. High-performance storage comes at a higher cost, so the storage space reserved for the online log is limited. When the online log reaches its predefined limits, it is moved to what is referred to as archive logging. These predefined limits mostly depend on the available fast storage and the number of transactions made by the database, or by different databases using the same storage space.
With InfoSphere CDC implemented, the online log is read during normal operations. If there are frequent cases where InfoSphere CDC runs with latency (for example, during backup procedures and in environments where InfoSphere CDC runs periodically), it is necessary to provide storage for the archive log that also has an adequate read speed. This ensures that the performance of the InfoSphere CDC scraping process stays within limits and can cope with the replication needs.
Long open transactions in the source database can cause InfoSphere CDC to require a log much older than necessary. For example, a user who logged in to the database and issued a command without having issued a subsequent commit or rollback can cause this situation. If the user does not exit their session, a transaction can be open for hours, days, or even weeks. InfoSphere CDC must retain information about all open transactions in the database in the staging store on Linux, UNIX, and Windows or the log cache on z/OS. As the transaction ages, it can be removed from this storage as room is needed for current transactions. Ending replication for a subscription while this situation exists causes InfoSphere CDC to require the log where the long open transaction began when it is restarted. A standard practice when running InfoSphere CDC is to ensure that there is a procedure in place to end open transactions in the database after a certain period of inactivity. At minimum, a policy should be in place requiring users to log out at the end of the work day.
Make sure that all log data stays available until InfoSphere CDC has been able to replicate the required data to the defined destinations.
7.3.2 Log retention platform-specific guidelines
As described in 7.3.1, “Log retention general guidelines” on page 158, usage of the database log is key to replicating data using InfoSphere CDC in near real-time mode. Some client environments require regular and frequent cleanup or archiving of the logs to meet their business / disk space requirements. If database logs are removed before InfoSphere CDC has completed processing the transactions within them, mirroring fails. The following sections outline considerations for log retention / management in a replication environment.
Log retention for DB2 on Linux, UNIX, and Windows
For this type of log retention, InfoSphere CDC requires the following items (a minimal setup sketch follows the list):
Log retention must be enabled for any DB2 for Linux, UNIX, and Windows database to be replicated.
A database backup is required before an initial connection is established from InfoSphere CDC.
Physical access to the DB2 online logs and archive logs on disk is required.
Retention of the DB2 online logs and archive logs for as long as InfoSphere CDC is shut down or latent.
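A hedged sketch of preparing a hypothetical DB2 for Linux, UNIX, and Windows database named SRCDB for these requirements; the archive and backup paths are assumptions, and the exact settings should follow your DBA standards:

# Enable log retention (archive logging) for the database.
db2 update db cfg for SRCDB using LOGARCHMETH1 DISK:/db2/archlogs
# A full backup is required before InfoSphere CDC connects for the first
# time (the database is left in backup pending state after the change).
db2 backup database SRCDB to /db2/backups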
Log retention for DB2 for System i
For this type of log retention, InfoSphere CDC requires the following items:
InfoSphere CDC replicates data from physical files. The files must be journaled with both before and after journal images.
When tables that are selected within InfoSphere CDC for mirroring are not journaled, InfoSphere CDC activates journaling for those tables in the default journal. The standard default journal for InfoSphere CDC is DMIRROR/DMCJRN, but this journal can (and should) be changed to a client-specific journal.
Retention of the DB2 online logs and archive logs for as long as InfoSphere CDC is shut down or latent.
Log retention for z/OS
For this type of log retention, InfoSphere CDC requires the following items:
Retention of archive logs for as long as any of the subscriptions are inactive or latent.
Fast access to archive logs if they are stored on tape. However, the preferred configuration is archive logs on disk.
Where a log cache is configured, archive logs are required only when a subscription is inactive or latent longer than the oldest data retained in the cache.
Here are some guidelines about InfoSphere CDC on z/OS:
Long open transactions (Units of Recovery (URs))
Messages are issued to the event log when a UR has been open in the database for a long period (60 minutes by default, which can be changed by using the OPENCOMMITWARNAGE parameter). If the subscription is shut down while the UR remains open, InfoSphere CDC needs to reread the logs from the beginning of that UR. You should ensure that there are no long open transactions before ending a subscription. These messages can optionally be issued to the system console by using the CONSOLEMSGS parameter.
Log cache
With a log cache configured, requirements to reread data from the DB2 logs are reduced. The log cache is a circular buffer and retains information about all URs and all data for tables with DATA CAPTURE CHANGES configured. The cache reader keeps up with the head of the DB2 log and removes the oldest data to make room for new URs as the DB2 log is extended.
Suppose a subscription is inactive and requires data older than the data stored in the log cache, or a subscription experiences such extreme latency that it is removed from the cache. In that case, it requests the data from the DB2 instrumentation facility interface (IFI) directly and returns to reading from the log cache when the data it needs is available there.
Log retention for Oracle platforms
InfoSphere CDC requires access to both the redo and the archive logs, which need to be configured for supplemental logging.
Retention of the logging is required for both redo and archive logging for as long as InfoSphere CDC is shut down or latent.
Log retention for Microsoft SQL Server
For this type of log retention, InfoSphere CDC requires the following items:
Physical access to the online transaction logs on disk. Multiple physical log files are supported.
Physical access to the transaction log backups on disk. Transaction log backups may not be compressed or encrypted or otherwise transformed from the format output by SQL Server.
 – InfoSphere CDC can only read files in the original unaltered format generated by the built-in SQL backup utility. Log files may not be moved after they have been saved.
 – When using third-party back-up tools, you must make sure that they do not compress or move the transaction logs from the original location.
Retention of the online transaction logs and the transaction log backups is required for as long as InfoSphere CDC is shut down or latent.
Log retention for Sybase
For this type of log retention, InfoSphere CDC requires the following items:
Physical access to the online logs on disk.
Physical access to the archive logs on disk. All archive logs must be located in one directory and InfoSphere CDC must have read permission for this directory.
Retention of the online logs and archive logs for as long as InfoSphere CDC is shut down or latent.
Here are some guidelines:
The online logs and archive logs can be on different disks.
Ensure that the truncate log on checkpoint option is disabled (a sketch for disabling it follows this list). If the option is enabled, the database log is truncated automatically, without a backup, every time a database checkpoint is performed.
Never run truncate_only operations. Running a dump transaction with the truncate_only option deletes inactive transactions from the log without creating a backup.
Use only data or log segments, not a combination. If a database has mixed segments, log backup is not allowed and only full database backups can be performed.
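A hedged sketch of disabling the truncate log on checkpoint option with isql; the server name, login, and database name are assumptions for illustration:

isql -Usa -Psecret -SSYBSRV <<EOF
use master
go
-- Disable automatic log truncation at checkpoint for the cdcdemo database.
exec sp_dboption 'cdcdemo', 'trunc log on chkpt', 'false'
go
use cdcdemo
go
checkpoint
go
EOF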
7.4 Remote processing capabilities
Although a general preferred practice is to install InfoSphere CDC on the server where the source or target database server is located, in some environments business reasons may dictate that it must be installed on a separate or remote server. This requirement might exist because the source / target database server lacks the resources to run InfoSphere CDC, because of corporate security standards, or because of other vendor application environments.
This section outlines implementation methods that can be used to implement InfoSphere CDC in a remote capture or apply configuration.
Some options for these configurations include:
InfoSphere CDC is installed for both the source and target on the target database server and configured to remotely scrape the logs of the source database system (see 7.4.1, “Remote source” on page 163 for more details).
InfoSphere CDC supports log shipping for Oracle databases. This functionality has been designed to cater to environments where log retention policies conflict with the use of replication solutions against those logs or when access to the original database logs is not possible or is forbidden (see 7.4.1, “Remote source” on page 163 for more details).
InfoSphere CDC is installed for both the source and target on the source database server and configured to remotely apply database changes to the target database (see 7.4.2, “Remote target” on page 164 for more details).
InfoSphere CDC is installed on an intermediate server for either remote source scrape or target apply (see 7.4.3, “Remote source and target” on page 165 for more details).
This section also describes InfoSphere CDC support for log shipping, which is available only on Oracle environments (see 7.4.4, “Log shipping” on page 166 for more details).
When InfoSphere CDC is installed on a different server from the database server it replicates changes from or to, there is always a requirement that a client for the accessed database be installed on the CDC server.
7.4.1 Remote source
When referring to a remote source in the context of InfoSphere CDC, the CDC engine is on a different server from the database that is generating the logs (Figure 7-17).
Figure 7-17 InfoSphere CDC reading logs from a remote server
Remote log reading is supported for DB2 on Linux, UNIX, and Windows, DB2 on System i, Sybase, and Oracle databases, and the mechanics for setting up remote log reading differ for each database. The following sections provide additional information regarding DB2 and Oracle as examples.
DB2 databases
When processing the DB2 logs from a different server, the DB2 client must be installed on the server that hosts InfoSphere CDC.
Before creating the InfoSphere CDC instance, the DB2 source database from which the log entries are read must be identified to the DB2 client. This task can be accomplished by completing the following steps:
1. Catalog the remote database node on the DB2 client (CDC) server.
2. Catalog the remote database on the DB2 client (CDC) server.
Specific commands and more details can be found in the DB2 documentation; a hedged example follows.
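A hedged sketch of these two steps from the DB2 command line processor on the InfoSphere CDC server; the node name, host name, port, database name, and user are assumptions for illustration:

# 1. Catalog the remote database node (the DB2 server host and its port).
db2 catalog tcpip node SRCNODE remote srcdbhost.example.com server 50000
# 2. Catalog the remote database at that node.
db2 catalog database SRCDB as SRCDB at node SRCNODE
# Verify connectivity with the user that InfoSphere CDC will use.
db2 connect to SRCDB user cdcuser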
Oracle databases
When remotely reading Oracle logs, the server that InfoSphere CDC is running on must have the same brand and chipset as the database server (for example, reading Oracle logs generated on IBM System p® from an HP server is not supported). Also, the endianness of the database server and the CDC server must be the same (processing database logs generated on a Linux server from a System p server is not supported).
The following items must be considered:
The archive and redo log files must be stored in a shareable file system (SAN or NFS mount). If available, a SAN is preferable over NFS for performance and throughput reasons; when log volumes are high, NFS mounts are not practical. Remote log reading is not possible for redo / archive files stored on raw devices.
If the first set of online and archive logs cannot be shared, consider having Oracle multiplex the archive logs (and, optionally, the online logs too).
The directories that hold the archive and redo log files must be accessible from the server on which InfoSphere CDC runs. The InfoSphere CDC user must have read permissions on archive and redo logs.
Some specifics to consider when setting up the remote source configuration are:
Create the dba group on the InfoSphere CDC server with the same GID as on the source database server and make the InfoSphere CDC UNIX account part of this group. This configuration avoids having to grant overly broad permissions to others.
Create a mount point on the InfoSphere CDC server that resembles the directory referenced in the v$archived_log and v$logfile views (a mount sketch follows this list).
 – For example, the archive logs are stored in /oradata/cdcdemo/archive, and if possible, create the following mount point on the InfoSphere CDC server:
/oradata/cdcdemo/archive
 – When InfoSphere CDC determines the name of the log it must read, it can go to the directory without having to replace the path component.
If the mount point cannot be created, the following parameters can be set to override the directories from which the log files are read:
 – mirror_archive_log_directory (location of the archive logs).
 – mirror_online_log_directory (location of the online logs).
If remote log reading must be performed in a RAC configuration and the original directories cannot be mounted the same way, you must ensure that archive logs of all RAC nodes are in the same directory.
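A minimal sketch of such a mount, assuming the source database server is named srcdbsvr and exports the archive log directory read-only over NFS; run as root on the InfoSphere CDC server:

# Re-create the source path locally, then mount the exported archive log
# directory read-only over it.
mkdir -p /oradata/cdcdemo/archive
mount -t nfs -o ro srcdbsvr:/oradata/cdcdemo/archive /oradata/cdcdemo/archive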
7.4.2 Remote target
In a remote target environment, InfoSphere CDC must apply the transactions to a database that is on a different server. On the InfoSphere CDC server, make sure that you are able to connect to the remote target database as though it was on the same server.
Database logs are read locally on the source server and changes are sent to the target CDC instance. This instance then applies them remotely to the target server (using the database client) (Figure 7-18).
Figure 7-18 Remote target
Remote apply is supported for DB2 on Linux, UNIX, and Windows, Sybase, SQL Server, and Oracle, and requires that the database client for the applicable database is installed on the server that hosts InfoSphere CDC. Additionally, the targeted database server must be identified in the configuration of the client so that it can be reached by CDC. For DB2, the database server and the remote database must be cataloged; for Oracle, the remote database should exist as a TNS entry in the SQL*Net configuration of the client.
7.4.3 Remote source and target
In this scenario, InfoSphere CDC is installed on a dedicated server that hosts the source and target engine. Besides the CDC engines, the source and target databases must have their database client installed on this server.
Database logs are read remotely from the source server and sent to the target CDC instance. This instance then applies them remotely to the target server (using the database client) (Figure 7-19).
Figure 7-19 Remote source and target
7.4.4 Log shipping
Log shipping is available for Oracle environments only.
In log shipping, InfoSphere CDC uses the database logs coming from another system (Figure 7-20).
Figure 7-20 InfoSphere CDC log shipping
InfoSphere CDC does not take care of the shipping of the logs. It is the responsibility of the DBA to make sure that the logs are sent to the defined system. Log shipping is only possible by using archive logs. Therefore, for a near real-time scenario, this option might not be the best solution.
Here is what you need to use log shipping in an Oracle environment. The InfoSphere CDC system parameters should be set as follows:
Oracle_log_shipping=true
Oracle_archive_logs_only=true
When changing these system parameters, the InfoSphere CDC instance must be restarted for them to take effect.
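A hedged sketch of setting these parameters from the command line of the Linux, UNIX, and Windows engine; the instance name cdcdemo is an assumption, the dmset command is used here as an illustration (verify the commands that your engine provides), and dmshutdown and dmts64 are the instance stop and start commands referenced later in this chapter:

# Set the log shipping parameters for the (hypothetical) instance cdcdemo.
dmset -I cdcdemo oracle_log_shipping=true
dmset -I cdcdemo oracle_archive_logs_only=true
# Restart the instance so that the parameter changes take effect.
dmshutdown -I cdcdemo
dmts64 -I cdcdemo &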
The shipping of the archive logs can be performed by either the Oracle Data Guard log shipping feature or by a client-created log shipping script.
When you use the Oracle Data Guard log shipping feature, InfoSphere CDC can safely pick up the archive log files stored in the oracle_archive_dir directory while Data Guard ensures the consistency of the files. The following considerations must be met:
Archive logs are shipped to a secondary destination by Oracle Data Guard.
InfoSphere CDC must be configured to read the logs from the secondary destination.
The extra data store system parameters to be set are:
 – oracle_using_log_transport_services=true
 – oracle_archive_destination_id=<destination_id_from_oracle>
 – oracle_archive_dir=<directory_holding_archive_logs>
When changing these system parameters, the InfoSphere CDC instance must be restarted for them to take effect.
When creating your own script to perform the log shipping, you must include the following steps (a sketch follows the list):
1. Select new archive files from v$archived_log with status="A" (marked available by Oracle). Ensure that you do not ship archive logs that do not have the “A” status, as this situation exposes the risk of incomplete shipment and losing transactions.
2. Run a checksum of each archive file before sending the file to the target.
3. Transfer the archive files to the InfoSphere CDC server.
4. Run a checksum of each archive file on the InfoSphere CDC server and compare with the checksum of source server.
5. Notify InfoSphere CDC for each archive file by using the dmarchivelogavailable command.
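The following sketch strings the five steps together in a single shell script run from the source server. It is a hedged outline only: the directories, host name, instance name, and SQL*Plus credentials are assumptions, it does not track which archive logs were already shipped, and the exact arguments of the dmarchivelogavailable command should be verified against your engine documentation. The related system parameters are described after the sketch.

#!/bin/sh
# Hypothetical custom log shipping script (source server side).
SRC_USER="cdcuser/password"                 # SQL*Plus credentials (assumption)
TGT_HOST=cdctarget                          # InfoSphere CDC server (assumption)
TGT_DIR=/oradata/cdcdemo/shipped_archive    # matches oracle_archive_dir (assumption)
CDC_HOME=/opt/ibm/cdc                       # CDC home on the target (assumption)

# 1. Select archive files that Oracle has marked available (status 'A').
sqlplus -s "$SRC_USER" <<EOF > /tmp/new_archives.lst
set heading off feedback off pagesize 0
SELECT name FROM v\$archived_log WHERE status = 'A';
EOF

while read ARCHIVE; do
  [ -f "$ARCHIVE" ] || continue
  BASE=$(basename "$ARCHIVE")
  # 2. Checksum the archive file on the source before sending it.
  SRC_SUM=$(cksum "$ARCHIVE" | awk '{print $1}')
  # 3. Transfer the archive file to the InfoSphere CDC server.
  scp "$ARCHIVE" "$TGT_HOST:$TGT_DIR/$BASE"
  # 4. Checksum the file on the target and compare with the source checksum.
  TGT_SUM=$(ssh "$TGT_HOST" "cksum $TGT_DIR/$BASE" | awk '{print $1}')
  if [ "$SRC_SUM" = "$TGT_SUM" ]; then
    # 5. Notify InfoSphere CDC that this archive file is available
    #    (command arguments are an assumption; check your documentation).
    ssh "$TGT_HOST" "$CDC_HOME/bin/dmarchivelogavailable -I cdcdemo -f $TGT_DIR/$BASE"
  else
    echo "Checksum mismatch for $ARCHIVE; InfoSphere CDC not notified" >&2
  fi
done < /tmp/new_archives.lst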
The following system parameters must be set when using custom scripting for archive log shipping:
oracle_archive_dir= <directory_holding_archive_logs>
or
oracle_log_path_userexit=<class_name>
The class must be located in the lib directory under the InfoSphere CDC home directory, must implement ArchiveLogPathUserExit, and must provide the ArchiveLogPath method.
Here are some considerations for the log shipping scripts:
If possible, run the entire log shipping from the source server.
All steps, from picking up the archive log and to making the archive log available for InfoSphere CDC, can then be handled by one single script. This situation allows for full control over comparison of checksums and availability of the archive logs.
If sending the archive logs and processing on the target cannot be done in a single script, consider the following items:
 – Send the archive log under a different name.
 – Rename the archive log file when the transmission is complete (for example, by using the SFTP rename subcommand).
 – The script that makes the archive logs available for InfoSphere CDC only picks up archive logs with the original name.
When refreshing tables or marking table capture points, ensure that you switch online log files to archive and ship the logs. Otherwise, the starting point for replicating the table may not be reached until the online logs are archived.
7.5 Using InfoSphere CDC in resilient environments
In many organizations, information is the most important business asset, which makes the database that holds it an important IT asset. Also important is having the data be highly available. There are two options to implement a highly available (HA) architecture for databases:
Hardware mirroring
Software mirroring (replication)
Hardware mirroring is implemented at the disk level, and is supplied by disk array vendors. For software mirroring, there are dedicated solutions available that ensure a backup server is kept synchronized (asynchronously) with production. This section describes the implementation of InfoSphere CDC in resilient environments, which includes typical configurations for the various platforms.
Database clusters are typically implemented for high availability, disaster recovery, load balancing, or a combination of those options. In a cluster scenario, one of the nodes can stop due to a planned activity or due to a failure. The process of stopping a node and turning over services to another node is called failover. InfoSphere CDC works directly with the database and can be installed on the node that might fail over at some point, which affects InfoSphere CDC functionality. While designing an InfoSphere CDC implementation in a cluster environment, two main points should be considered:
1. If InfoSphere CDC processing is moved from one node to another during failover, how do the external clients, such as Management Console and subscriptions, reach this instance?
2. How is replication processing by CDC started after a backup node takes over control? This situation includes finding the binary files and configuration (metadata). Both binary files and metadata of the instance should be accessible.
7.5.1 InfoSphere CDC reachability: Virtual IP
In a high availability cluster, a common resource is the virtual IP address. A virtual IP (VIP) is an IP address that is not connected to a specific computer or network interface card on a computer. Incoming packets are sent to the VIP address, but they are redirected to physical network interfaces.
InfoSphere CDC is typically implemented in an active / passive configuration where the replication is only active on one node at a time. The virtual IP address is directed to the InfoSphere CDC node and is used in other instances and in Management Console. When InfoSphere CDC has a failover to another node, nothing should be changed for external users, and all connections can be restored to the same (virtual) IP and port (Figure 7-21).
Figure 7-21 CDC cluster reachability
On the left side of Figure 7-21, Management Console connects to the data store using (virtual) IP address 192.168.21.68, which is directed to physical address 192.168.21.45 (node A). Also, data packets coming from subscriptions that target this data store are redirected to the physical IP address. When a failover of node A is performed, the traffic targeting the virtual IP address is redirected to physical IP address 192.168.21.56, which is on node B (Figure 7-21).
7.5.2 InfoSphere CDC binary files and metadata for the Linux, UNIX, and Windows engine
The location of the InfoSphere CDC installation binary files and metadata for the common (Java) engine has a significant impact on how the InfoSphere CDC cluster is maintained. It is important to know that the configuration of InfoSphere CDC consists of two components:
Configuration metadata (subscriptions and mapped tables): These items are kept in a directory on the file system under the cdc_home/instance directory.
Operational information, such as the instance signature and bookmarks: These items are kept in tables in the database from or to which CDC is replicating data.
The configuration metadata and the operational information are related through the signature and cannot be maintained independently of each other. The signature is established when the instance is created. You cannot re-create an instance with the same location of the operational information without invalidating the old instance.
The next sections describe three possible topologies of an InfoSphere CDC cluster implementation of the Linux, UNIX, and Windows engine and how it should be maintained in each of them. These topologies are: InfoSphere CDC on a shared volume, InfoSphere CDC on separate nodes with a shared database, and InfoSphere CDC on separate servers with separate databases.
7.5.3 InfoSphere CDC on a shared volume
To implement this topology for the Linux, UNIX, and Windows engine, you need a volume that can be accessed by all nodes in the cluster. Usually the shared volume is provided by a storage area network (SAN) or Network Attached Storage (NAS). In some configurations, the disk volume is only mounted on and accessible to the active node. Having a shared volume that keeps the installation of CDC provides the simplest topology for implementation and maintenance.
The InfoSphere CDC binary files and configuration metadata are stored on the shared volume, and any changes made to the configuration on the active node are centrally stored and available at failover time. The InfoSphere CDC operational information is stored centrally in the database, for which clustering must have been set up. All files in the CDC directory structure must be owned by the same user ID on all nodes in the cluster, and also by the same group ID if it is installed on UNIX / Linux. Because there is only one copy of the configuration metadata, all nodes on which this InfoSphere CDC instance runs must have the same database, services, database connection parameters, and database log locations.
The topology for a shared mount point installation is shown in Figure 7-22.
Figure 7-22 InfoSphere CDC on SAN
Installation and configuration steps
Here are the installation and configuration steps:
1. Install InfoSphere CDC once on the active node. The product home directory (referred to as $CDC_HOME) must be on a shared volume.
2. Create an InfoSphere CDC instance on the active node. Configuration metadata is created in the file system under the product home directory and operational metadata in the database.
3. Start InfoSphere CDC instance on the active node.
4. Configure subscriptions and table mappings.
Failover steps
If failover is planned, complete the following steps:
1. Stop replication for all subscriptions (run dmendreplication). If this InfoSphere CDC instance is a target, replication must be stopped from the source.
2. Stop InfoSphere CDC instance (run dmshutdown).
After the shared volume is available on the failover node, complete the following steps (a combined sketch follows):
1. Start the CDC instance (run dmts32 or dmts64).
2. Clear the staging store for the instance (run dmclearstagingstore).
3. Restart subscriptions (run dmstartmirror). If this InfoSphere CDC instance is a target, replication must be started from the source.
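A minimal sketch that strings these planned failover steps together, assuming the product home /cdc_shared/cdc on the shared volume and the instance name cdcdemo; the -A (all subscriptions) flag is used here as an illustration of stopping and starting every subscription and should be verified against your engine documentation:

#!/bin/sh
# Hypothetical planned failover for the shared volume topology.
CDC_HOME=/cdc_shared/cdc          # product home on the shared volume (assumption)
INSTANCE=cdcdemo                  # instance name (assumption)

# On the active node, before the failover:
$CDC_HOME/bin/dmendreplication -I $INSTANCE -A    # stop replication for all subscriptions
$CDC_HOME/bin/dmshutdown -I $INSTANCE             # stop the instance

# On the failover node, after the shared volume is mounted:
$CDC_HOME/bin/dmts64 -I $INSTANCE &               # start the instance
$CDC_HOME/bin/dmclearstagingstore -I $INSTANCE    # clear the staging store
$CDC_HOME/bin/dmstartmirror -I $INSTANCE -A       # restart all subscriptions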
7.5.4 InfoSphere CDC on separate nodes with a shared database
In this topology for the Linux, UNIX, and Windows engine (Figure 7-23), InfoSphere CDC is installed on each node in the cluster and there is no shared volume available to contain the CDC installation. The attached database is clustered using a shared volume. Although it is straightforward to implement, it requires additional maintenance and failover steps.
Figure 7-23 CDC resilient installation - separate nodes
InfoSphere CDC binary files and configuration metadata are kept independent from each other on both nodes, so you need to synchronize all the changes that are made to the configuration from the primary to the backup node. Files that are kept in the CDC home directory and its tree should have the same user ID and group ID when installed on UNIX and Linux. This setup ensures that the user running the instance on the backup node can actually read and update the configuration when failed over.
Because operational information is stored in the database and there is a link (through a signature) between this data and the configuration metadata, it is not possible to create a CDC instance on the backup node and redo the changes. Therefore, the configuration metadata on the primary node should periodically be backed up and restored onto the backup node. The configuration metadata can be synchronized as frequently as wanted. At a minimum, you should do this task after making configuration changes (mapped tables, subscriptions, and system parameters) and after the initial refresh of mapped tables has taken place and the table status has gone to active.
Because configuration metadata is replicated and operational metadata is the same, as in the previous topology, all nodes that this InfoSphere CDC instance runs on must have the same database, services, database connection parameters, and database log locations.
Installation and configuration steps
Here are the installation and configuration steps:
1. Install InfoSphere CDC on all nodes. Keep $CDC_HOME the same on the servers.
2. Create an InfoSphere CDC instance on the active node. The product metadata is created under the product home directory and in the database.
3. Copy the entire $CDC_HOME/instance directory to the backup server. Do not create an instance on the passive node, because it invalidates the instance created on the active node.
4. Start the InfoSphere CDC instance of the active node.
5. Configure subscriptions and map tables.
6. Periodically copy the file system metadata and event logs from the active to the passive node (a sketch follows). Metadata is kept in the $CDC_HOME/instance/<instance_name>/conf/md* files. You can run the dmbackupmd command to back up the files to the $CDC_HOME/instance/<instance_name>/backup/bnn directory. To keep the log history when the instance becomes active on the second node, the events database should also be copied. Events are kept in the $CDC_HOME/instance/<instance_name>/events/* files.
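A hedged sketch of this periodic synchronization, assuming the instance name cdcdemo, the passive host name nodeb, and passwordless SSH between the nodes; dmbackupmd is the command named in step 6:

#!/bin/sh
# Hypothetical metadata and event log synchronization to the passive node.
CDC_HOME=/opt/ibm/cdc             # product home (assumption; same path on both nodes)
INSTANCE=cdcdemo                  # instance name (assumption)
PASSIVE=nodeb                     # passive node host name (assumption)

# Back up the configuration metadata; the backup is written under
# $CDC_HOME/instance/$INSTANCE/backup/bnn.
$CDC_HOME/bin/dmbackupmd -I $INSTANCE

# Copy the metadata backups and the events database to the passive node.
scp -r $CDC_HOME/instance/$INSTANCE/backup $PASSIVE:$CDC_HOME/instance/$INSTANCE/
scp -r $CDC_HOME/instance/$INSTANCE/events $PASSIVE:$CDC_HOME/instance/$INSTANCE/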
Failover steps
If a failover is planned, complete the following steps:
1. Stop replication for all subscriptions (run dmendreplication). If this InfoSphere CDC instance is a target, replication must be stopped from the source.
2. Stop InfoSphere CDC instance (run dmshutdown).
After the InfoSphere CDC instance is stopped (or has failed) on the active node and the Virtual IP becomes available on the passive node:
1. Restore the latest version of the file system metadata and events.
2. Start the instance (run dmts32 or dmts64).
3. Clear the staging store for the instance (run dmclearstagingstore).
4. Restart subscriptions (run dmstartmirror). If this InfoSphere CDC instance is a target, replication must be started from the source.
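As a sketch of these failover steps on the passive node, assume that the metadata backups and events were copied periodically as shown in the previous list, that b03 is the most recent backup directory, and that the backed-up md* files can simply be copied back into the conf directory (all of these are assumptions; check the restore procedure documented for your InfoSphere CDC version):

# Restore the latest file system metadata (events were already copied to the events directory)
cp -p $CDC_HOME/instance/cdcprod/backup/b03/md* $CDC_HOME/instance/cdcprod/conf/

# Start the instance, clear the staging store, and restart the subscriptions
nohup dmts64 -I cdcprod &
dmclearstagingstore -I cdcprod
dmstartmirror -I cdcprod -A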
7.5.5 InfoSphere CDC on separate servers with separate databases
In this topology for the Linux, UNIX, and Windows engine, there are two separate instances of the database that use a technology, such as DB2 High Availability Disaster Recovery (HADR) or Oracle Data Guard, to keep a backup (standby) database synchronized with the primary database.
When the high availability solution is implemented on the CDC source, the log entry sequence numbers (LSN for DB2 and SCN for Oracle) are the same on the primary and the backup databases. The backup database is typically in standby mode or only opened for read activity (reporting).
If the CDC target is a database for which such a high availability solution is active, the synchronization of the log entry sequence numbers is not relevant. However, in such a scenario, the reachability of the server is an important consideration. Typically, a virtual IP address cannot be assigned because the primary and backup databases are remote from each other and on different network segments.
The InfoSphere CDC configuration and operational metadata differ between the servers and must be kept as matching pairs because of the signatures that link them. This topology should only be implemented if a virtual IP address cannot be assigned.
This configuration is shown in Figure 7-24. The InfoSphere CDC binary files and configuration metadata are kept independent from each other on both servers, so you need to synchronize all changes that are made to the configuration from the primary to the backup server. Files in the InfoSphere CDC home directory and its tree should have the same user ID and group ID when installed on UNIX and Linux. This setup ensures that the user running the instance on the backup node can actually read and update the configuration when failed over.
Figure 7-24 CDC in a HADR / Data Guard environment
The configuration metadata on the primary node should periodically be backed up and restored onto the backup node. The configuration metadata can be synchronized frequently, but at a minimum you should do this task after making configuration changes (mapped tables, subscriptions, and system parameters) and after the initial refresh of mapped tables has taken place and the table status has gone to active.
Because configuration metadata is replicated and operational metadata is the same, as described in 7.5.4, “InfoSphere CDC on separate nodes with a shared database” on page 173, all nodes that this InfoSphere CDC instance runs on must have the same database, services, database connection parameters, and database log locations.
Installation and configuration steps
Here are the installation and configuration steps:
1. Install InfoSphere CDC on all nodes. Keep $CDC_HOME the same on
the servers.
2. Create an InfoSphere CDC instance on the active server. The product metadata is created under the product home directory and in the database.
3. Copy the entire $CDC_HOME/instance directory to the backup server.
4. Start the InfoSphere CDC instance on the active server.
5. Configure subscriptions and map tables.
6. Periodically, copy the file system metadata and event logs from the active to the backup server. Metadata is kept in the $CDC_HOME/instance/<instance_name>/conf/md* files. You can run the dmbackupmd command to back up these files to the $CDC_HOME/instance/<instance_name>/backup/bnn directory. To keep the log history when the instance becomes active on the backup node, the events database should also be copied. Events are kept in the $CDC_HOME/instance/<instance_name>/events/* files.
Failover steps when the server is the source for InfoSphere CDC
If the failover is planned, complete the following steps:
1. Stop replication for all subscriptions (run dmendreplication). If this InfoSphere CDC instance is a target, replication must be stopped from
the source.
2. Stop the InfoSphere CDC instance (run dmshutdown).
After the InfoSphere CDC instance is stopped (or fails) on the primary server and the database on the backup server is active, complete the following steps on the backup server:
1. Restore the latest version of the file system metadata.
2. Start the instance (run dmts32 or dmts64).
3. Clear the staging store for the instance (run dmclearstagingstore).
4. Restart subscriptions (run dmstartmirror).
Failover steps when the server is the target for CDC
If the failover is planned, complete the following steps:
1. Stop replication for all subscriptions (run dmendreplication) on the
source server.
2. Stop the InfoSphere CDC instance (run dmshutdown).
After the InfoSphere CDC instance is stopped (or fails) on the primary server and the database on the backup server is active, complete the following steps on the backup server:
1. Restore the latest version of the file system metadata.
2. Start the instance (run dmts32 or dmts64).
Then, the subscriptions need to be redirected to the backup server. Complete the following steps:
1. From the Access Manager perspective, change the IP address (or host name) of the target data store.
2. Connect to the target data store from the Management Console
Configuration perspective.
3. Right-click the connected data store, click Properties, and click the
Aliases tab.
4. Add the IP address or host name of the backup server to the aliases.
5. Go to each subscription’s properties and click the Details button next to the target data store. You should be able to select the backup server's IP address or host name as the new destination for the subscription.
7.5.6 System i resilient environments
There are different scenarios for InfoSphere CDC resiliency when an IBM System i server (formerly known as AS/400 and IBM eServer™ iSeries®) is part of the replication landscape. There are two methods for providing resilience in a System i environment, one similar to the clustering solutions that are available for other platforms and the other using software-based high availability solutions.
This section focuses on how the two methods pertain to the configuration failover of InfoSphere CDC.
Resiliency using Independent Auxiliary Storage Pools
Independent Auxiliary Storage Pools (IASPs) are a System i supported means of storing business data and applications separately from the internal storage that comes with the server. IASPs lend themselves to using SANs and disk-based replication solutions for resiliency of data and applications. IASPs can be detached from one server and then attached to a backup server so that the backup server can take over production work.
A simple scenario of an IASP that can be attached to local servers is shown in Figure 7-25.
Figure 7-25 IASP attached to local servers
When using a disk-based mirroring solution, an IASP can have a physical backup that is remote from the primary production site. If there is a failure of the primary server or a site disaster, the server on the backup site can take over the production work.
A mirrored IASP is shown in Figure 7-26.
Figure 7-26 Mirrored IASP
An IASP can only be attached to one System i server at a time. Also, when disk mirroring is used to synchronize to a backup site, the backup IASP cannot be attached to the backup server until synchronization is stopped.
Using InfoSphere CDC in an IASP environment
InfoSphere CDC supports installation in environments where IASPs are used. The product has a number of objects that must be kept in internal storage (*SYSBAS ASP), such as a user profile (D_MIRROR), the subsystem description, and a job queue.
When installing InfoSphere CDC in a resilient environment, you must perform the installation on the active server that is connected to the IASP. During the installation, you are prompted to specify the name of the IASP and the product library; this library, which is kept on the IASP, is where the full product, including the configuration tables, is installed. The installation automatically creates another library on *SYSBAS ASP (the name is automatically determined by the installation program, being the first eight characters of the product library name, appended with 01), which is the work library. The work library holds all the objects that cannot be kept on the IASP.
After you have installed InfoSphere CDC on the active system, you must perform a work library installation on the inactive system. The work library installation just creates the library and objects that cannot be kept on the IASP and that are only used when CDC must be activated on the backup server.
If you change the D_MIRROR user profile or the objects in the InfoSphere CDC work library, these changes must also be made on the inactive server. The system ASP holds the service table entries, which define the port that the product listens on. The service table entry must be available on the backup server, or the listener does not start successfully when the subsystem is started. Configuration, such as mapped tables, subscriptions, and so on, are kept in the full product library, which is on the IASP, and shared (or mirrored) between the primary and backup server.
If there is a failover or switchover of InfoSphere CDC in an IASP environment, you must stop the subscriptions replicating from or to the primary System i server, and then start the subsystem in the CDC work library. This action automatically starts the TCP/IP listener. Once active, the replication can be resumed and should incur no loss or duplication of data.
Besides the InfoSphere CDC library, the IASP needs to hold the tables that are being replicated from or to. Also, if you are replicating from tables that are on an IASP, the journal and journal receivers of these tables must also be on the same storage. This action should already be part of your implemented
recovery strategy.
Using InfoSphere CDC with software-based high availability solutions
Many clients with System i servers employ software-based high availability solutions to provide for disaster recovery. An example of a software-based high availability solution is IBM iCluster® for i.
Software-based HA solutions operate by reading journal entries generated by the production application and applying these entries onto the target tables. These target (backup) tables are also journaled and therefore generate journal entries similar to the ones on the production side. Journaling on the backup server might be different from the journaling on the primary server (but typically not). The journal receivers containing the backup server's tables journal entries might (and probably do) have a different name from the primary server's journal receivers. In some cases, journaling on the backup server might even be disabled while the server is not running any production processes.
This situation adds additional complexity to the journal management and switchover process if InfoSphere CDC uses the System i production server as a source. This section lists the considerations when implementing InfoSphere CDC in such a landscape.
There are two main items that must be considered in a software-based
HA implementation:
Switchover and failover of InfoSphere CDC
Journal management (purging of obsolete journal receivers)
These considerations only apply when the System i server is used as a source for InfoSphere CDC. When targeting this server platform, switchover is much more straightforward and journal management considerations do not apply.
Switchover and failover
In high availability terminology, switchover is when switching control to a backup server is a planned activity. You can stop applications on the primary server and perform preparative actions on the backup server before resuming production activity, for example, when migrating to a new server. A failover occurs when there is an unplanned event that causes the primary server to be unavailable, such as a power outage or a site disaster.
If there is a switchover, the preparative actions that must be taken on the primary server are:
1. Stop business application activity so that journal entries are no
longer generated.
2. Stop the InfoSphere CDC subscriptions.
3. Restart the subscriptions with scheduled end of now (net change mirroring) to ensure that all transactions have been replicated to the target server.
4. Optionally, stop the subsystem.
After these steps have been performed, you should complete preparative steps on the backup system before resuming business application activity:
1. Start the InfoSphere CDC subsystem, but do not start any subscriptions yet.
2. Set the bookmark (SETJRNPOS) to any entry in the current journal receiver.
3. Mark table capture points for all tables in the subscriptions; this action marks the point from which changes are being replicated again.
4. The business applications can be restarted.
5. At any time, the subscriptions can now be started; any transactions that were generated after the marking of table capture points are replicated to the
target server.
When there is a failover, you might not exactly know up to which point the transactions in the primary server's journals have been replicated to the target server. The next few sections elaborate on the challenges that you are presented with in the event of a failover and how to overcome them to be able to resume replication without data loss.
Journals and journal receivers
On System i servers, the subscription bookmarks that are kept by InfoSphere CDC are identified by the name of the journal (and its library), the journal receiver (and its library), and a journal entry sequence number within the receiver.
Tables (physical files) on the primary (production) server are journaled and the journal has one or more receivers associated with it, storing database transactions in the current, attached receiver. If tables are also journaled on the backup server (this setup is optional, but is a best practice to facilitate failover), the attached journal should have the same name and be on the same library as the one on the production server.
The production journal could have the following journal receivers associated
with it:
RCV0578
RCV0579
RCV0580
On the backup server, the equivalent journal almost certainly has different journal receivers associated with it, for example, RCV1593 and RCV1594. Naming and switching of the journal receivers on the backup server is independent of the journal receivers on the primary server. The likelihood of journal receivers having the same name is small and cannot be enforced.
An example scenario for an InfoSphere CDC implementation coexisting with a software-based HA solution is shown in Figure 7-27.
Figure 7-27 Software-based HA solution
Bookmark and failover
In Figure 7-27, a journal entry 110 in receiver RCV0580 might correspond to the journal entry 81524 in receiver RCV1594 on the backup server. When InfoSphere CDC applies the primary server's journal entry to the target database, the bookmark that is written would be RCV0580-110.
Should production have to be switched over from the primary to the backup server and the subscription restarted from the backup server, the bookmark information would not be valid anymore. Most likely, the journal entry that is identified by the bookmark does not exist in the backup server's journal and the subscription fails to start.
Before starting the subscriptions on the backup server, the bookmark (journal position) must be reset on the backup server. Because the source server might no longer be available and the bookmark information kept in the database targeted by the subscription cannot be associated exactly with a journal entry on the backup server, you must use a strategy that replicates all entries. You cannot prevent some table operations from being replicated more than once, and it is likely that primary key violations occur when applying entries to the target tables. If the replication is targeting tables without unique keys, such as audit tables, or applications, such as DataStage or a JMS provider, there will be duplicates in the entries that are sent, and these duplicates might have to be resolved afterward.
A common strategy to set the journal position after a failover is to keep a time stamp for every subscription in the database / application that is targeted. A technical table is created on the source System i server and this table holds a (dummy) key and a time stamp column. The technical table is included in every subscription and resolves in a target table that is populated with the last update that was done on the source table. On the source server, a simple program is scheduled to run that updates the row in the source table with the current system's date and time.
By following this strategy, you have a table in the target database that holds the date and time of the last applied entry. This date and time can then be used to find the journal entry in the backup server's journal and set the bookmark using the SETJRNPOS command.
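As an illustration of this strategy, the following statements create a hypothetical heartbeat table on the source (the library, table, and column names are examples only; the table must be journaled and mapped 1:1 in every subscription like any other replicated table) and show the update that the scheduled program would run, for example every minute:

SQL> CREATE TABLE APPLIB.CDC_HEARTBEAT (HB_KEY INTEGER NOT NULL PRIMARY KEY, LAST_UPDATE TIMESTAMP NOT NULL);
SQL> INSERT INTO APPLIB.CDC_HEARTBEAT VALUES (1, CURRENT TIMESTAMP);
SQL> UPDATE APPLIB.CDC_HEARTBEAT SET LAST_UPDATE = CURRENT TIMESTAMP WHERE HB_KEY = 1;

After a failover, the LAST_UPDATE value in the target copy of this table tells you, per subscription, the approximate source time up to which changes were applied.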
In this scenario, assume that the mirroring by the high availability solution is up to date and has transferred all changes from the primary server to the
backup server.
To ensure that no changes are skipped for replication, you could set the bookmark to a journal entry a few minutes before the recorded time stamp. This setting covers situations where the system times of the primary and backup server are not synchronized and entries on the backup server have an earlier time stamp than the corresponding entries on the primary server.
If the target tables have primary keys, you can keep the subscription from stopping with a duplicate key or missing record error by choosing one of the following options:
Activate conflict detection on the replicated tables and specify “source wins”. If an insert cannot be applied to the target, it is turned into an update. An update of a row that does not exist results in an insertion, and deletion of a non-existing row is ignored. Additionally, the conflicts are logged in a conflict auditing table so they can be reviewed afterward.
Remap the tables with the Adaptive Apply mapping type. This action produces the same result as activating conflict detection, but it does not
log conflicts.
Instruct the InfoSphere CDC target engine to continue on error during mirroring. A system parameter can be set for this option. The name of the system parameter depends on the engine type; for the Linux, UNIX, and Windows engine, set mirror_end_on_error=false (see the sketch after this list).
These configuration changes should only be used to allow the subscription to continue until the source and target are synchronized again. All three options might have an adverse effect on the apply throughput, so consider changing the settings back to the original values as soon as possible.
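For the Linux, UNIX, and Windows engine, such a system parameter is typically changed with the dmset command, as in the following sketch; the instance name cdcprod is a placeholder, and the dmset syntax should be confirmed in the command reference for your version:

# Allow the apply to continue past errors (temporary measure only)
dmset -I cdcprod mirror_end_on_error=false

# When the source and target are synchronized again, restore the original setting
dmset -I cdcprod mirror_end_on_error=true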
Journal management
If there is a failover, the journal on the backup server must have all the entries that were not replicated to the subscription's target server. The journal receivers must only be removed from the backup server once they do not hold any data that might have to be replicated in a switch.
As a simple strategy for ensuring that journal entries are available at failover time, implement a clean-up strategy that keeps a number of journal receivers on the backup server, for example, for one day.
If there is insufficient space on the backup server, you must correlate the journal position (bookmark) of the subscriptions with a backup journal receiver. RTVDMJENT returns the oldest journal receiver that is still needed by any of the subscriptions; it must be run on the primary server. From the journal receiver that is returned, you can determine the time stamp of the first journal entry. That time stamp can be used to find the oldest journal receiver that is still required on the backup server. You should subtract a few minutes from the time stamp, as is also done when the bookmark is set after a failover.
Mirroring scope for a high availability solution
In addition to managing the journal receivers and setting the bookmark, consider the configuration of the high availability solution. On the backup server, a fully functional copy of the InfoSphere CDC installation must be available to be used when control is switched to it. At the same time, not all objects in the product library may be available for mirroring (save / restore) due to object locks.
Objects listed in Table 7-1 must be included in the mirroring scope of the high availability solution.
Table 7-1 Mirroring scope objects

Library                         Object    Object Type  Include or Exclude  Comments
InfoSphere CDC Product Library  *ALL      *ALL         Include             N/A
InfoSphere CDC Product Library  *ALL      *USRQ        Exclude             These objects are exclusively locked when subscriptions are running.
QSYS                            D_MIRROR  *USRPRF      Include             InfoSphere CDC User Profile
Initially, all objects in the InfoSphere CDC product library should be saved on the primary server and restored onto the backup server. There are objects such as communications user spaces that must exist on the target. Stop all subscriptions and the subsystem when doing the initial synchronization of the InfoSphere CDC product library to the backup server.
7.5.7 z/OS / Sysplex and InfoSphere CDC in resilient environments
Resiliency consists of being able to recover and restart replication if any of the main source or target components fail.
The overall architecture is shown in Figure 7-28. The components in solid boxes are the ones in active use; the components in dashed-line boxes become active should the current components fail. In general, each component can be physically on a different machine. Any active component could fail, requiring a failover to a backup machine.
Figure 7-28 Failover architecture
A failover entails the following actions:
Ensuring that the configuration and operational metadata on the failover machine matches the metadata required to continue replication. This action includes ensuring that database log restart positions are properly set.
Re-establishing communications.
Ensuring the required software is running on the failover machine.
Restarting replication within the new failover environment.
Resiliency with InfoSphere CDC z/OS can be achieved by working within a DB2 z/OS data sharing environment. The InfoSphere CDC z/OS address space must run in a z/OS instance corresponding to one of the members of the data sharing group. Should that instance of z/OS fail, or should that DB2 member not be operational, InfoSphere CDC z/OS can be restarted on another z/OS instance containing a member of the data sharing group. Because the data is shared, the failover instance of InfoSphere CDC z/OS uses the same metadata (including operational metadata) as the failed instance, so failover considerations are primarily around connectivity.
To prepare InfoSphere CDC z/OS to restart on a different member, ensure that the following tasks are accomplished:
The DB2 subsystem parameter SSID in the CHCDBMxx configuration member must refer to the group attachment name, not the subsystem ID of an individual member.
The data store host name, as configured in the Management Console, must resolve to a virtual IP address before any subscriptions are configured.
The InfoSphere CDC z/OS source machine's TCP/IP configuration must be configured to use the same virtual IP address for the InfoSphere CDC target configured in the Management Console.
Upon failover, the virtual IP address used by the Management Console to connect to the source or the target engine must refer to the new z/OS instances on which the InfoSphere CDC z/OS address space is running. The data store must be configured using virtual IP addresses in the Management Console before creating subscriptions.
Subscriptions are configured as persistent by right-clicking the subscription and clicking Properties → Advanced Settings... → Mark subscription as persistent.
The four main resiliency scenarios are:
Source database management system (DBMS) failure
InfoSphere CDC source failure
InfoSphere CDC target failure
Target DBMS failure
The InfoSphere CDC source failure and InfoSphere CDC target failure scenarios also cover the case of overall OS or hardware failures for the z/OS instance on which they are running.
Should the source DBMS become unavailable, all subscriptions are automatically ended by the product. With persistent subscriptions, the product detects when the DBMS becomes available and automatically restarts the ended subscriptions. If there is a hard DBMS failure and the DBMS member is unusable, then the InfoSphere CDC source needs to be failed over to use another member.
Failover of the InfoSphere CDC z/OS source entails ending the InfoSphere CDC z/OS address space if it is still running and restarting it with the same user ID on another member of the data sharing group. If the address space was ended while the subscriptions were running, then the subscriptions that have been marked persistent are restarted automatically when the address space is restarted. Otherwise, the subscriptions need to be restarted manually.
Because the target's IP address is unchanged and correctly configured in the shared metadata, only the virtual IP address for the InfoSphere CDC source needs to be redirected to the failover member for the Management Console to access it.
Resiliency for an unavailable InfoSphere CDC z/OS target could entail surviving network outages or, if the target becomes unusable, entail failing over the target to another member. For network outages, the persistent subscription feature of InfoSphere CDC z/OS provides a means of reconnecting automatically. Should the InfoSphere CDC target become unreachable through communication channels, the InfoSphere CDC z/OS source tries connecting to the target at intervals set by the AUTORESTARTINTERVAL configuration parameter. After the InfoSphere CDC target becomes available, the subscriptions marked as persistent are automatically restarted.
If a failover of the target is required, the persistent subscription feature simplifies the failover procedure. The InfoSphere CDC z/OS target address space needs to be ended (if it is still running) and then restarted using the same user ID on another member in the data sharing group. The virtual IP address used by the source and Management Console to reach the target needs to be updated to resolve to the failover member. With the target instance running and the virtual IP address for the target redirected, the persistent subscription automatically reconnects to the new target. The InfoSphere CDC z/OS source can be left running during a target failover. For InfoSphere CDC sources that do not support the persistent subscription feature, the subscriptions need to be manually restarted. Also, if the failover is run in a controlled manner, that is, by explicitly ending subscriptions, then the subscriptions need to be restarted manually. The automatic restart feature is designed to restart subscriptions that shut down due to loss of connectivity, not subscriptions that were shut down normally or due to a DBMS error on the target.
If the target DBMS becomes unavailable, the InfoSphere CDC z/OS target remains up and suspends its operations. For reliable suspension behavior, the retry cache must be configured through the RECOVERYRETRYLIMIT and RETRYCACHESIZE configuration parameters. The suspension results in data backing up and eventually leads to the product stalling. This situation is a normal recovery situation until the target DBMS becomes operational once again. The product detects that the DBMS is up, mirroring resumes automatically, and the product processes its backlog.
7.6 Change management
It is essential to know which changes to your environment have an impact on your InfoSphere CDC environment, and to incorporate InfoSphere CDC into your change management procedures. This section describes the change management workflow for InfoSphere CDC, its important steps, the role of bookmark information, and how certain activities can be automated.
Basically, there are two main topics that trigger change management in an InfoSphere CDC environment:
1. Database table changes that occur in the environment InfoSphere CDC is replicating from or to.
2. The configuration changes that are done within InfoSphere CDC. New business requirements or applications can demand changes in your replication environment.
Changes within just the InfoSphere CDC configuration typically do not require special treatment. Subscriptions to be deployed can be stopped, changed, or replaced (imported) with a new definition, and restarted.
Alterations to tables that are part of the InfoSphere CDC replication environment can have a significant effect on the behavior of InfoSphere CDC. A possible cause of changed table structures is a new application release. Within InfoSphere CDC, you map specific columns of a source table to specific columns of a target table, so changes to either the source or target table structure can affect replication. The following examples of DDL changes affect replication:
Adding new columns
Dropping in-scope columns
Dropping out-of-scope columns
Modifying column formats, such as data type, length, and precision
When the structure of a table from which InfoSphere CDC is replicating is changed, the format of the database log entries changes accordingly. If the log reader is actively reading entries and is not aware of the structure change, it does not correctly interpret the log entries and unexpected errors could occur. The InfoSphere CDC engines can detect DDL changes from the database log and take corrective actions in such cases. There are slight differences in how an InfoSphere CDC engine behaves in case of source table changes.
If the structure of a target table changes, InfoSphere CDC behaves as follows:
Add new column: The new column is not mapped by InfoSphere CDC, but is populated with the database default value.
Delete previously mapped column: InfoSphere CDC encounters an apply error because InfoSphere CDC attempts to write to a non-existent column.
Alter column definition: InfoSphere CDC might encounter mapping issues because of a changed target column definition.
7.6.1 Understanding InfoSphere CDC bookmarks
When implementing changes to source tables, understanding the bookmarks that InfoSphere CDC keeps is key to choosing the best method for a safe deployment of subscription changes in your environment. InfoSphere CDC maintains the following two types of bookmarks:
Subscription bookmark: This is the bookmark that InfoSphere CDC keeps per subscription; it is updated on the target side as transactions are applied. The bookmark provides the resilience of the product: if there is any disruption in the processing that causes replication to stop, InfoSphere CDC resumes reading the logs from the bookmark position so that no transactions are lost or duplicated.
Table capture point: This bookmark is the commit point in the database from which operations for the table that has its capture point marked are processed. All changes that are committed after the marked point are replicated. The table capture point is set when you run the mark table capture point routine for a table in a subscription. When the subscription is restarted after marking the table capture point, it resumes from its (subscription) bookmark position. However, changes to tables that have their table capture point marked are ignored until the log scraper advances and reaches the capture point. Each table can have its own table capture point; they do not have to correspond.
If there is a service window available where database and application changes can be applied, and you can ensure that there are no source database transactions during that time frame, this window provides an ideal way of deploying subscription changes. If you ensure that all transactions before the service window are sent and applied to the target side, the InfoSphere CDC configuration, including subscriptions, can be fully recreated and table capture points marked before the first application activity after the service window starts.
Figure 7-29 shows the deployment within the service window.
Figure 7-29 Deployment within a service window
Suppose there is no service window, or the InfoSphere CDC configuration changes cannot wait to be applied during a time of no activity. In that case, you want the subscriptions to continue processing changes from the last successfully applied transaction after the subscription changes have been deployed, without skipping any changes (Figure 7-30).
Figure 7-30 Deployment with no service window
The focus in this section is on deploying changes when a service window is in place. This section also briefly touches on the considerations for the different InfoSphere CDC engines when there is no service window.
7.6.2 Change Management sample environment
For our change management samples, we use the client environment shown in Figure 7-31. The architecture is split into a development (APPSRCDEV and APP_TGT_DEV) and a production environment (APPSRCPRD and APP_TGT_PRD), both replicating between the two locations in Toronto and Sydney. CM_DEV and CM_PROD are the two subscriptions that are identical in structure and are deployed in the client’s development and production environment. Both subscriptions contain several source tables that are mapped through standard replication in a 1:1 fashion to their corresponding target tables, with no transformations.
Figure 7-31 Sample client environment
Changes are first made in the development environment and then deployed to the production environment. In the development environment, there are no constraints with regards to refreshes and so on, but in the production environment, no refreshes can be done; the procedure must be such that no operations are lost or duplicated.
7.6.3 DDL changes in a service window
For our change management scenario, we assume that at least one table within the subscription needs to be changed in structure due to new business requirements (such as column added and column data type modified). These changes need to be run for the source and the target table.
In our planned scenario, we perform these changes in the development environment first. Complete the following steps:
1. Stop the CM_DEV subscription in the development environment.
2. To implement the changes in the development environment, run the DDL statements on the table in question to make the table structure changes for the source and target table.
3. Update the source and target table definitions for InfoSphere CDC through the Management Console. This step is necessary to make InfoSphere CDC aware of the structural changes and enable it to correctly interpret the database log entries (Figure 7-32 and Figure 7-33 on page 195) and the target table (Figure 7-34 on page 195 and Figure 7-35 on page 196).
Figure 7-32 Update source table - 1
Figure 7-33 Update source table - 2
Figure 7-34 Update target table - 3
Figure 7-35 Update target table - 4
4. Adjust the mappings in the development subscription. After you update the table definitions for the source and target tables, you must alter the table mappings for newly added columns that have a different name or data type on the source and target. The status of the changed tables has been set to Parked (Figure 7-36).
Figure 7-36 Changed tables are parked
To keep the example simple, assume that the changed table is refreshed in the development environment (Figure 7-37).
Figure 7-37 Refresh changed tables
5. To deploy the changed subscription in the production environment, export the updated subscription CM_DEV out of the development environment to an XML file for import in production (Figure 7-38).
Figure 7-38 Export development subscription
A prerequisite for successfully importing a subscription that has table mappings using an index is that a named index (not a system-generated name) must be specified and that the index exists in the import environment. Alternatively, the individual key columns can be specified. An unsupported system-generated index is shown in Figure 7-39.
Figure 7-39 System generated index
For the deployment in the production environment, you are not allowed to refresh the changed tables after making the DDL changes. Instead, mirroring should continue at the last confirmed bookmark without losing any transactions. To accomplish this task, ensure that replication for the tables in question is and remains active up to the point where the first DDL change is made. Ensure that all transactions to the tables in question have been successfully replicated to the target before making any DDL changes to them.
It is important, for the production environment, that the sequence of steps in this procedure is followed. If DDL changes are done to a source table without first having processed all entries in the database log, the only possible recovery is to refresh the table in question. This procedure is focused on keeping the downtime for the replication at a minimum. Therefore, the update of the target table definition and the import of the new subscription are the last steps. It also allows for automation of the change management process from the production side.
6. Stop activity on the tables to be altered. Ensure that the application has stopped so that no more transactions are generated on the tables for which the structure changes.
7. Stop the subscription and restart it in net change mirroring mode to apply any pending transactions. This action ensures that all the transactions up to the DDL change have been applied to the target tables. When the subscription automatically stops, the replication is up-to-date (Figure 7-40).
Figure 7-40 Restart replication in net change mode
8. Implement the DDL change at the source.
9. Update the source table definitions so that the data store reflects the new table structures. Repeat this step for all tables that had DDL changes applied (Figure 7-41 through Figure 7-44 on page 201).
Figure 7-41 Update Source Table definition - 1
Figure 7-42 Update Source Table definition - 2
Figure 7-43 Update Source Table definition - 3
Figure 7-44 Update Source Table definition - 4
10. The last step before resuming business application activity is to set the changed tables to active again. Select the tables and click Mark Table Capture Point for Mirroring (Figure 7-45 and Figure 7-46).
Figure 7-45 Mark Table Capture Point for Mirroring - 1
Figure 7-46 Mark Table Capture Point for Mirroring - 2
Now the business application can resume activity on the tables in question. Log entries are now written in the tables’ new format and must also be interpreted by InfoSphere CDC in that format.
11. Implement the necessary DDL changes at the target system. Then, update the target table definition through Management Console (Figure 7-47).
Figure 7-47 Update target table definition
12. Deploy the new version of the subscription with altered table mappings for the new columns that were exported from the development environment into the production environment. You can choose to replace the existing CM_PRD subscription and override the different schema names between the development and production environments, as shown in Figure 7-48 to Figure 7-51 on page 207.
Figure 7-48 Import Altered Subscription from development environment
Figure 7-49 Import Altered Subscription from development environment
Figure 7-50 Import Altered Subscription from development environment
Figure 7-51 Import Altered Subscription from development environment
13. Resume replication by restarting the subscription in Continuous Mirroring mode. This action replicates all pending and any new transactions from source to target.
7.6.4 DDL changes without a service window
There are differences in how source table DDL changes are dealt with across the different engines. This section elaborates on how InfoSphere CDC engines behave when DDL operations on in-scope tables are encountered in the database log.
To ensure that replication can continue without having to refresh the table, the database must log full image data for all columns in a table, including newly added columns. InfoSphere CDC relies on full row images being present in the database logs to accomplish this.
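How full row logging is enabled depends on the source database. For example, on a DB2 for Linux, UNIX, and Windows source it is typically switched on per table with DATA CAPTURE CHANGES, as in the following illustrative statement (the schema and table names are examples only); the Oracle equivalent, table-level supplemental logging, is shown in “Oracle considerations” later in this section:

SQL> ALTER TABLE APP_SRC_PRD.CLIENT DATA CAPTURE CHANGES;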
Linux, UNIX, and Windows engine behavior and procedure
InfoSphere CDC common (Java) engines can detect DDL statements in the database logs and, when encountered, they stop the replication. Then, the definition of the table in question must be updated before replication can
be resumed.
If a discrepancy in the table structure is detected when the replication is started, the subscription fails and requires the table definition to be updated before continuing. When updating the table definition for a table whose structure has changed, the table is parked and is no longer replicated if the subscription
is resumed.
To avoid losing transactions when applying DDL changes, complete the
following steps:
1. Ensure that the subscription is active with no latency.
2. Complete the DDL changes on the source tables; the subscription stops and reports that a DDL change has taken place.
3. Update the source table definition; the tables are then parked.
4. Mark the table capture point for the tables in question to change the status to active again.
5. Get the current bookmark from the target side of the subscription (by
running dmshowbookmark).
6. Set the subscription bookmark to the retrieved bookmark and ensure that the -a option is used to apply the bookmark to all active tables.
7. Restart the subscription.
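A minimal command sketch of steps 5 through 7, assuming a target instance named cdctgt, a source instance named cdcsrc, and a subscription named SUB1 (all placeholders); the option letters (-I, -s, -b, and -a) reflect common usage of dmshowbookmark, dmsetbookmark, and dmstartmirror, so verify them for your release:

# On the target engine: retrieve the current bookmark for the subscription
dmshowbookmark -I cdctgt -s SUB1

# On the source engine: set the retrieved bookmark, applying it to all active tables (-a)
dmsetbookmark -I cdcsrc -s SUB1 -b <bookmark_value> -a

# Restart the subscription
dmstartmirror -I cdcsrc -s SUB1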
This procedure only works if DDL statements are not mixed with DML statements for the replicated tables.
Table 7-2 shows some example statement sequences. In that example, Tables A, B, and C are replicated, while Table D is not.
Table 7-2 Sample statement sequences

Sequence 1: DML TABLE A, DML TABLE B, DML TABLE C, DML TABLE D, DDL TABLE A, DDL TABLE B, DML TABLE A, DML TABLE B, DML TABLE C, DML TABLE D
DDL change without refresh? OK
Comments: You can continue to replicate without refreshing. Replication must be active or current until the time the first DDL change is made. If replication is running, it then stops and InfoSphere CDC definitions can be adjusted to accommodate the new table structures.

Sequence 2: DML TABLE A, DML TABLE B, DML TABLE C, DDL TABLE A, DML TABLE D, DDL TABLE B, DML TABLE A, DML TABLE B, DML TABLE C, DML TABLE D
DDL change without refresh? OK
Comments: You can continue to replicate without refreshing. Replication must be active or current until the time the first DDL change is made. If replication is running, it then stops and InfoSphere CDC definitions can be adjusted to accommodate the new table structures. The only DML statement between the two DDL changes to the tables in scope belongs to Table D, which is not replicated.

Sequence 3: DML TABLE A, DDL TABLE A, DML TABLE B, DML TABLE C, DDL TABLE B, DML TABLE A, DML TABLE B, DML TABLE C
DDL change without refresh? NO
Comments: The subscription does not start until the definitions of both Table A and Table B have been adjusted in the InfoSphere CDC definitions. DML for Table B before the DDL for Table B must be skipped to avoid malformed data and unexpected behavior of the replication. Refresh the tables in this case.
Oracle considerations
One of the key elements in limiting the downtime for source table changes in an Oracle environment is making sure that the Oracle log entries are dynamically adapted to the table structure changes. As of Oracle 10, the database allows you to specify supplemental logging for all columns at the table level. In environments where the deployment of the changes in production is critical and downtime dependency between production and InfoSphere CDC must be reduced, ensure that table-level supplemental logging for all columns has been activated.
Here are SQL examples to activate table-level supplemental logging for
all columns:
SQL> ALTER TABLE APP_SRC_PRD.CLIENT ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
SQL> ALTER TABLE APP_SRC_PRD.PRODUCT ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
SQL> ALTER TABLE APP_SRC_PRD.SALESREP ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
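To verify which tables already have table-level supplemental logging defined, you could query the DBA_LOG_GROUPS dictionary view (appropriate privileges are required); tables with all-column logging show a LOG_GROUP_TYPE of ALL COLUMN LOGGING:

SQL> SELECT TABLE_NAME, LOG_GROUP_TYPE FROM DBA_LOG_GROUPS WHERE OWNER = 'APP_SRC_PRD';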
Table changes for DB2 on z/OS
The behavior of InfoSphere CDC for z/OS regarding DDL operations performed on in-scope tables depends on several factors:
DDL statement type
DB2 DATA CAPTURE CHANGES setting
Version of InfoSphere CDC
Version of DB2
DDL statements only impact InfoSphere CDC for z/OS if the types of DDL operations affect its ability to read the log. Here are examples of these types of DDL statements:
Adding columns
Modifying column formats (such as data type, length, or precision)
Changing column types (such as CHAR to VARCHAR)
Dropping and recreating tables
Turning off DATA CAPTURE CHANGES
These operations cause data to be written to the log that is not received by InfoSphere CDC for z/OS, or cause the log record format of the table (the layout of the data in the database logs) to change. For operations where the log record format changes, the log reader must be told how to proceed when such a change is encountered. The InfoSphere CDC metadata must be modified to accommodate the new log record format, or the log reader fails to properly decode log records after the point of the DDL change. Changes that do not affect the physical structure of the table in the log or the DATA CAPTURE CHANGES setting for the table do not interrupt replication.
InfoSphere CDC for z/OS can detect most DDL changes on in-scope tables only if DATA CAPTURE CHANGES is enabled on the SYSIBM.SYSTABLES system catalog table in DB2. If it is not enabled, InfoSphere CDC for z/OS is aware of DDL changes when a log record is encountered that does not match the definition in the metadata, but its actions depend on the type of DDL operation and the version of DB2.
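If your DB2 for z/OS version and authorizations allow it, DATA CAPTURE CHANGES can be enabled on the catalog table with a statement of the following form; treat this as a sketch and coordinate with your DB2 systems programmer before altering catalog tables:

SQL> ALTER TABLE SYSIBM.SYSTABLES DATA CAPTURE CHANGES;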
Detailed information about how InfoSphere CDC for DB2 for z/OS deals with DDL changes can be found in the Schema Evolution section of the IBM Information Center.
 