Sample continuous availability and disaster recovery scenarios
In this chapter, we describe several common client scenarios and requirements, and what we believe to be the most suitable solution for each case.
The following scenarios are described:
A client with a single data center that has already implemented IBM Parallel Sysplex with data sharing and workload balancing wants to move to the next level of availability.
A client with two centers needs a disaster recovery capability that will permit application restart in the remote site following a disaster.
A client with two sites (but all production systems running in the primary site) needs a proven disaster recovery capability and a near-continuous availability solution.
A client with two sites at continental distance needs to provide a disaster recovery capability.
A client with two sites at relatively long metropolitan distance needs to provide local continuous availability and remote disaster recovery with zero data loss.
A client that runs only IBM z/VM with Linux on z Systems guests (no z/OS in their environment), with two sites at metropolitan distance, requires an automated disaster recovery and near-continuous availability solution.
The scenarios described in this chapter pertain to using the GDPS products that are based on hardware disk replication. The scenarios for GDPS/Active-Active using software data replication are described in Chapter 8, “GDPS/Active-Active solution” on page 231.
12.1 Introduction
In the following sections, we describe how the various GDPS service offerings can address different continuous availability (CA) and disaster recovery (DR) requirements. Because every business is unique, the following sections do not completely list all the ways the offerings can address the specific needs of your business, but they do serve to illustrate key capabilities.
In the figures that accompany the text, we show minimal configurations for clarity. Many client configurations are more complex than this, but larger configurations are supported in the same way.
12.2 Continuous availability in a single data center
In the first scenario, the client has only one data center, but wants to have higher availability. The client has already implemented data sharing for their critical applications, and uses dynamic workload balancing to mask the impact of outages. They already mirror all their disks within the same site but have to take planned outages when they want to switch from the primary to secondary volumes in preparation for a disk subsystem upgrade or application of a disruptive microcode patch. They are concerned that their disk is their only remaining resource whose failure can take down all their applications. The configuration is shown in Figure 12-1.
Figure 12-1 Data sharing, workload balancing, mirroring: Single site
From a disaster recovery perspective, the client relies on full volume dumps. Finding a window of time that is long enough to create a consistent set of backups is becoming a challenge. In the future, they plan to add a second data center to protect them in the case of a disaster. In the interim, they want to investigate the use of FlashCopy to create a consistent set of volumes that they can then dump in parallel with their batch work. Their current focus, however, is on improved resiliency within their existing single center.
Table 12-1 lists the client’s situation and requirements, and shows which of those requirements can be addressed by the most suitable GDPS offering for this client’s requirements, namely GDPS/PPRC HyperSwap Manager.
Table 12-1 Mapping client requirements to GDPS/PPRC HyperSwap Manager attributes
Attribute | Supported by GDPS/PPRC HM
Single site | Y
Synchronous remote copy support | Y (PPRC)
Transparent swap to secondary disks | Y (HyperSwap)
Ability to create a set of consistent tape backups | Y¹
Ability to easily move to GDPS/PPRC in the future | Y

1 To create a consistent source of volumes for the FlashCopy in GDPS/PPRC HyperSwap Manager, you must create a freeze-inducing event and be running with a Freeze and Go policy.
This client's primary short-term objective is to provide near-continuous availability, but they want to ensure that they address it in a strategic way.
In the near term, they need the ability to transparently swap to their secondary devices in case of a planned or unplanned disk outage. Because they have only a single site, do not currently have a TS7700, and do not currently have the time to fully implement GDPS system and resource management, the full GDPS/PPRC offering is more than they currently need.
By implementing GDPS/PPRC HyperSwap Manager, they can achieve their near-term objectives in a manner that positions them for a move to full GDPS/PPRC in the future.
Figure 12-2 shows the client configuration after implementing GDPS/PPRC HyperSwap Manager. Now, if they have a failure on the primary disk subsystem, the controlling system will initiate a HyperSwap, transparently switching all of the systems in the GDPS sysplex over to what were previously the secondary volumes. The darker lines connecting the secondary volumes in the figure indicate that the processor-to-control unit channel capacity is now similar to that used for the primary volumes.
Figure 12-2 Continuous availability within a single data center
After the client has implemented GDPS and enabled the HyperSwap function, their next move will be to install the additional disk capacity so that they can use FlashCopy. The client will then be able to use the Freeze function to create a consistent view that can be flash-copied to create a set of volumes that can then be full-volume dumped for disaster recovery. This will create a more consistent set of backup tapes than the client has today (because today they are backing up a running system), and the backup window will now be only a few seconds rather than the hours that it currently takes. This enables the client to make more frequent backups.
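To make the sequence concrete, the following sketch outlines the order of operations for one backup cycle under a Freeze and Go policy. This is plain Python pseudocode, not GDPS code or syntax; the helper functions are hypothetical stand-ins for the automation and operator actions described above and in the footnote to Table 12-1.

# Conceptual sketch only: not GDPS code. The helpers are hypothetical
# placeholders for the actions described in the text.

def induce_freeze():
    print("Freeze: PPRC suspended; the secondary volumes are now consistent "
          "(with a GO policy, production continues on the primary volumes)")

def flashcopy(source, target):
    print(f"FlashCopy established: {source} -> {target} (takes only seconds)")

def resync_pprc():
    print("PPRC resynchronized: the mirror is brought back up to date")

def full_volume_dump(volumes):
    print(f"Full-volume dump of {volumes} to tape, in parallel with batch work")

def consistent_backup_cycle():
    """One backup cycle: the only 'backup window' is the few seconds between
    the freeze and the completion of the FlashCopy establish."""
    induce_freeze()
    flashcopy("secondary volumes", "FlashCopy targets")
    resync_pprc()
    full_volume_dump("FlashCopy targets")

consistent_backup_cycle()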
12.3 DR across two data centers at metro distance
The next scenario relates to a client that is under pressure to provide a disaster recovery capability in a short time frame, perhaps for regulatory reasons. The client has a second data center within metropolitan distance that is suitable for synchronous mirroring, but has not yet implemented mirroring between the sites. Before moving to a full GDPS/PPRC environment, the client had planned to complete their project to implement data sharing and workload balancing. However, events have overtaken them and they now need to provide the disaster recovery capability sooner than they had expected.
The client can choose between implementing the full GDPS/PPRC offering, as they had planned to do in the long term, or installing GDPS/PPRC HyperSwap Manager now. Because they will not be using the additional capabilities delivered by GDPS/PPRC in the immediate future, the client decides to implement the lower-cost GDPS/PPRC HyperSwap Manager option. Table 12-2 summarizes the client's situation and requirements and shows how those requirements can be addressed by GDPS/PPRC HyperSwap Manager.
Table 12-2 Mapping client requirements to GDPS/PPRC HyperSwap Manager attributes
Attribute | Supported by GDPS/PPRC HM
Two sites, 12 km apart | Y
Synchronous remote copy support | Y (PPRC)
Maintain consistency of secondary volumes | Y (Freeze)
Maintain consistency of secondary volumes during PPRC resynch | Y¹ (FlashCopy)
Ability to move to GDPS/PPRC in the future | Y

1 FlashCopy is used to create a consistent set of secondary volumes before a resynchronization, following a suspension of remote copy sessions.
This client needs to be able to quickly provide a disaster recovery capability. The primary focus in the near term, therefore, is to be able to restart their systems at the remote site as though they were restarting off the primary disks following a power failure. Longer term, however, the RTO (the time to get the systems up and running again in the remote site) will be reduced to the point that it can no longer be achieved without the use of automation; this will be addressed by a move to GDPS/PPRC. The client also has a requirement to have a consistent restart point at all times (even during DR testing).
This client will implement GDPS/PPRC HyperSwap Manager, with the controlling system in the primary site and the secondary disks in the remote site. The disk subsystems in the remote site are configured with sufficient capacity to use FlashCopy for the secondary devices; this allows the client to run DR tests without impacting their mirroring configuration.
GDPS/PPRC HyperSwap Manager will be installed and the Freeze capability enabled. After the Freeze capability is enabled and tested, the client will install the additional intersite channel bandwidth required to be able to HyperSwap between the sites. This configuration is shown in Figure 12-3. Later, in preparation for a move to full GDPS/PPRC, the client will move the controlling system (and its disks) to the remote site.
Figure 12-3 GDPS/PPRC 2-site HM configuration
12.4 DR and CA across two data centers at metro distance
The client in this scenario has two centers within metro distance of each other. The client already uses PPRC to remote copy the primary disks (both CKD and FB) to the second site. They also have the infrastructure in place for a cross-site sysplex; however, all production work still runs in the systems in the primary site.
The client is currently implementing data sharing, along with dynamic workload balancing, across their production applications. In parallel with the completion of this project, they want to start looking at how the two sites and their current infrastructure can best be used to provide disaster recovery and continuous or near-continuous availability in planned and unplanned outage situations, including the ability to dynamically switch the primary disks back and forth between the two sites.
Because the client is already doing remote mirroring, their first priority is to ensure that the secondary disks provide the consistency to allow restart, rather than recovery, in case of a disaster. Because of pressure from their business, the client wants to move to a zero (0) data loss configuration as quickly as possible, and also wants to investigate other ways to reduce the time required to recover from a disaster.
After the disaster recovery capability has been tested and tuned, the client’s next area of focus will be continuous availability, across both planned and unplanned outages of applications, systems, and complete sites.
This client is also investigating the use of z/VM and Linux on z Systems to consolidate several of their thousands of PC servers onto the mainframe. However, this is currently a lower priority than their other tasks.
Because of the disaster recovery and continuous availability requirements of this client, together with the work they have already done and the infrastructure in place, the GDPS offering for them is GDPS/PPRC. Table 12-3 shows how this offering addresses this client’s needs.
Table 12-3 Mapping client requirements to GDPS/PPRC attributes
Attribute | Supported by GDPS/PPRC
Two sites, 9 km apart | Y
Zero data loss | Y (PPRC with Freeze policy of SWAP,STOP)
Maintain consistency of secondary volumes | Y (Freeze)
Maintain consistency of secondary volumes during PPRC resynch | Y¹ (FlashCopy)
Remote copy and remote consistency support for FB devices | Y (Open LUN support)
Ability to conduct DR tests without impacting DR readiness | Y (FlashCopy)
Automated recovery of disks and systems following a disaster | Y (GDPS script support)
Ability to transparently swap z/OS disks between sites | Y (HyperSwap)
DR and CA support for Linux guests under z/VM | Y

1 FlashCopy is used to create a consistent set of secondary volumes before a resynchronization, following a suspension of remote copy sessions.
Although this client has performed a significant amount of useful work already, fully benefiting from the capabilities of GDPS/PPRC will take a significant amount of time, so the project is divided into the following steps:
1. Install GDPS/PPRC, define the remote copy configuration to GDPS, and start using GDPS to manage and monitor the configuration.
This will make it significantly easier to implement changes to the remote copy configuration. Rather than issuing many PPRC commands, the GDPS configuration definition simply needs to be updated and activated, and the GDPS panels then used to start the new remote copy sessions.
Similarly, any errors in the remote copy configuration will be brought to the operator’s attention using the NetView SDF facility. Changes to the configuration, to stop or restart sessions, or to initiate a FlashCopy, are far easier using the NetView interface.
2. After the staff becomes familiar with the remote copy management facilities of GDPS/PPRC, enable the Freeze capability, initially as PPRCFAILURE=GO and then moving to PPRCFAILURE=COND or STOP when the client is confident with the stability of the remote copy infrastructure (a conceptual sketch of these policy choices follows this list). Because HyperSwap will not be implemented immediately, they will specify a PRIMARYFAILURE=STOP policy to avoid data loss if recovery on the secondary disks becomes necessary after a primary disk problem.
Although the client has PPRC today, they do not have the consistency on the remote disks that is required to perform a restart rather than a recovery following a disaster. The GDPS Freeze capability will add this consistency, and enhance it with the ability to ensure zero (0) data loss following a disaster when a PPRCFAILURE=STOP policy is implemented.
3. Add the FB disks to the GDPS/PPRC configuration, including those devices in the Freeze group, so that all mirrored devices will be frozen in case of a potential disaster. As part of adding the FB disks, a second controlling system will be set up1.
Although the client does not currently have distributed units of work that update both the z/OS and FB disks, the ability to Freeze all disks at the same point in time makes cross-platform recovery significantly simpler.
In the future, if the client implements applications that update data across multiple platforms inside the scope of a single transaction, the ability to have consistency across all disks will move from being “nice to have” to a necessity.
4. Implement GDPS Sysplex Resource Management to manage the sysplex resources within the GDPS, and start using the GDPS Standard actions panels.
GDPS system and sysplex management capabilities are an important aspect of GDPS. They ensure that all changes to the configuration conform to previously prepared and tested rules, and that everyone can check at any time to see the current configuration, that is, which sysplex data sets and IPL volumes are in use. These capabilities provide the logical equivalent of the whiteboard used in many computer rooms to track this type of information.
5. Implement the GDPS Planned and Unplanned scripts to drive down the RTO following a disaster.
The GDPS scripting capability is key to recovering the systems in the shortest possible time following a disaster. Scripts run at machine speeds, rather than at human speeds. They can be tested over and over until they do precisely what you require. And they will always behave in exactly the same way, providing a level of consistency that is not possible when relying on humans.
However, the scripts are not limited to disaster recovery. This client sometimes has outages as a result of planned maintenance to their primary site. Using the scripts and HyperSwap, they can keep their applications available as they move their systems one by one to the recovery site in preparation for site maintenance, and then back to the normal locations after maintenance is complete.
Because all production applications will still be running in the production site at this time, the processor in the second site is much smaller. However, to enable additional capacity to be made available quickly in case of a disaster, the processor has the CBU feature installed. The GDPS scripts can be used to automatically enable the additional CBU engines as part of the process of moving the production systems to the recovery processor.
6. After the disaster recovery aspect has been addressed, HyperSwap will be implemented to provide a near-continuous availability capability for the z/OS systems. A controlling system should be set up in each site when using HyperSwap to ensure a system is always available to initiate a HyperSwap regardless of where the primary disks might be at that time. In the case of this client, they had already set up the second controlling system when they added the FB devices to the GDPS configuration.
The client will use both planned HyperSwap (to move their primary disks before planned maintenance on the primary subsystems) and unplanned HyperSwap (allowing the client to continue processing across a primary subsystem failure). They will test planned HyperSwap while their Primary Failure policy option is still set to STOP. However, when they are comfortable and ready, they will change to running with a PRIMARYFAILURE=SWAP,STOP policy to enable unplanned HyperSwap.
7. Finally, and assuming that the consolidation onto Linux on z Systems has proceeded, the heterogeneous disaster recovery capability will be implemented to manage the z/VM systems and their guests, and to add planned and unplanned HyperSwap support for z/VM and the Linux guests.
Although the ability to transparently swap FB devices using HyperSwap is not available for z/VM guest Linux systems using FB disks, it is still possible to manage PPRC for these disks. GDPS will provide data consistency and will perform the physical swap, and can manage the re-IPL on the swapped-to disks.
z/VM systems hosting Linux guests using CKD disks will be placed under GDPS xDR control, providing them with near-equivalent management to what is provided for z/OS systems in the sysplex, including planned and unplanned HyperSwap.
And because it is all managed by the same GDPS, the swap can be initiated as a result of a problem on a z/OS disk, meaning that you do not have to wait for the problem to spread to the Linux disks before the swap is initiated. Equally, a problem on a CKD Linux disk can result in a HyperSwap of the Linux disks and the z/OS disks.
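The policy options referred to in step 2 determine what GDPS does after it has frozen the secondary disks. The following sketch paraphrases that decision logic as it is described in this chapter. It is illustrative Python, not GDPS code; the function names and parameters are assumptions made for this example.

# Illustrative sketch of the Freeze (PPRCFAILURE) and PRIMARYFAILURE policy
# behavior described in this chapter. Not GDPS code; names are assumptions.

def freeze_decision(pprcfailure_policy, secondary_is_likely_cause):
    """Mirroring failure: the secondaries are always frozen first to preserve
    a consistent restart point; the policy decides what production does next."""
    actions = ["FREEZE secondary disks (consistent restart point preserved)"]
    if pprcfailure_policy == "GO":
        actions.append("GO: production continues on the primaries "
                       "(data loss is possible if this was the start of a disaster)")
    elif pprcfailure_policy == "STOP":
        actions.append("STOP: production systems are stopped (zero data loss)")
    elif pprcfailure_policy == "COND":
        # Conditional: continue only if the trigger points at the secondary disks.
        actions.append("GO" if secondary_is_likely_cause else "STOP")
    return actions

def primary_failure_decision(primaryfailure_policy):
    """Primary disk failure: swap if the policy allows it, otherwise stop."""
    if primaryfailure_policy.startswith("SWAP"):
        return ["HyperSwap all systems to the secondary disks"]
    return ["STOP: stop the systems so recovery on the secondaries loses no data"]

for policy in ("GO", "COND", "STOP"):           # the progression described in step 2
    print(policy, "->", freeze_decision(policy, secondary_is_likely_cause=False))
print("PRIMARYFAILURE=SWAP,STOP ->", primary_failure_decision("SWAP,STOP"))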
The projected final configuration is shown in Figure 12-4 (for clarity, we have not included the Linux components in the figure).
Figure 12-4 Active/standby workload GDPS/PPRC configuration
12.4.1 Active/active workload
As mentioned, this client is in the process of enabling all its applications for data sharing and dynamic workload balancing. This project will proceed in parallel with the GDPS project. When the critical applications have been enabled for data sharing, the client plans to move to an Active/Active workload configuration, with several production systems in the primary site and others in the recovery site.
To derive the maximum benefit from this configuration, it must be possible to transparently swap from the primary to the secondary disks. Therefore, it is expected that the move to an Active/Active workload will not take place until after HyperSwap is enabled.
The combination of multisite data sharing and HyperSwap means that the client’s applications will remain available across outages affecting a software subsystem (DB2, for example), an operating system, a processor, a coupling facility, or a disk subsystem (primary or secondary). The only event that can potentially result in a temporary application outage is an instantaneous outage of all resources in the primary site; this can result in the database managers in the recovery site having to be restarted.
The move to an Active/Active workload might require minor changes to the GDPS definitions, several new GDPS scripts, and modifications to existing ones, depending on whether new systems will be added or some of the existing ones moved to the other site. Apart from that, however, there is no fundamental change in the way GDPS is set up or operated.
12.5 DR and CA across two data centers at metro distance for z/VM and Linux on z Systems only
The client in this scenario runs their main production work on Linux on z Systems, which run as z/VM guests. The production data resides on CKD disks. The critical workloads are running on four z/VM systems. Two of the z/VM systems run in one site and the other two in the other site. They also have a couple of other, less important production z/VM systems running Linux guests. The z Systems server in each site is configured with IFL engines only (no general-purpose CPs), and the client has no z/OS systems or skills. They have two centers within metro distance of each other. The client already uses PPRC to remote copy the primary disks to the second site. They also have the infrastructure and connectivity in place for the SSI cluster.
The disk environment is well-structured. Although the various z/VM systems share a physical disk subsystem, the disks for each of the z/VM systems are isolated at an LSS level.
Because the client is already doing remote mirroring, their first priority is to ensure that the secondary disks provide the consistency to allow restart in case of a disaster, rather than recovery. Because of pressure from their business, the client wants to move to a zero (0) data loss configuration as quickly as possible, and also wants to investigate ways to reduce the time required to recover from a disaster.
There are also regulatory pressures that force the client to periodically demonstrate that they can run their production workload in either site for an extended period of time. Therefore, they also need to have processes to perform planned workload moves between sites as automatically and as fast as possible with minimum operator intervention.
Because of the disaster recovery and continuous availability requirements of this client, together with the work that they have already done and the infrastructure that is in place, the GDPS offering for them is the GDPS Virtual Appliance. Table 12-4 shows how this offering addresses this client’s needs.
Table 12-4 Mapping client requirements to GDPS Virtual Appliance attributes
Attribute | Supported by GDPS Virtual Appliance
Two sites, 9 km apart | Y
Zero data loss | Y (PPRC with Freeze policy of SWAP,STOP)
Maintain consistency of secondary volumes | Y (Freeze)
Maintain consistency of secondary volumes during PPRC resynch | Y¹ (FlashCopy)
Remote copy and remote consistency support for FB devices | Y (Open LUN support)
Ability to conduct DR tests without impacting DR readiness | Y (FlashCopy)
Automated recovery of disks and systems following a disaster | Y (GDPS script support)
Ability to transparently swap z/VM (and guest) disks between sites | Y (HyperSwap)
DR and CA support for Linux guests under z/VM | Y
Ability to automate planned move of systems between sites | Y (Script support)
z/OS skills not required | Y

1 FlashCopy is used to create a consistent set of secondary volumes prior to a resynchronization, following a suspension of remote copy sessions.
Although this client has already performed a significant amount of useful work, they are concerned about enabling appliance management for their entire production environment all at once. Because they have their disks isolated in separate LSSs for the SSI and the stand-alone z/VM systems, the following phased implementation of the function is possible:
1. Install a general-purpose CP engine on the Site2 z Systems server to run the GDPS Virtual Appliance2.
2. Install the GDPS Virtual Appliance to initially manage one of the stand-alone z/VM systems and the data for this system. Start with the least critical system.
Define the remote copy configuration to GDPS, and start by using GDPS to manage and monitor the configuration for the first z/VM system.
In this limited implementation, the client can test all aspects of the GDPS Virtual Appliance, isolated from their more important systems. They can code and test scripts, exercise Freeze and planned and unplanned HyperSwap, refine their operational procedures, and prepare for the cutover of their more important z/VM systems.
3. After the staff becomes familiar with the appliance, the client can then put the second z/VM system and the disks of this system under appliance management. They can perform more tests in this environment to understand how the appliance works when there are multiple systems under its control and make final preparations for moving the SSI environment to be under appliance control.
4. Finally, the client will add the 4-way SSI into the appliance managed environment, perform some more tests and finalize their implementation.
5. After all systems are under GDPS control, the client can schedule a test to move their entire workload to run in Site2 using the Site2 disks. The primary disk role will be swapped to Site2 by using planned HyperSwap, making the move transparent to the systems that were already running in Site2. The PPRC mirror will be reversed to run from the Site2 disks toward the Site1 disks, to retain unplanned HyperSwap capability while the workload is running in Site2. The systems running in Site1 will be stopped and re-IPLed in Site2 after the disks are swapped. A single planned action script will be used to perform this move, minimizing operator intervention and the time required to run the entire process (a conceptual outline of such a sequence is sketched after this list).
Similarly, a planned action script will be used to move the systems back to their “normal” locations.
The first time that the client performs this exercise, they plan to run production in Site2 over a weekend, returning to normal before Monday morning. However, using the same process and scripts, they will eventually schedule moves where they remain in Site2 for a longer period of time.
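The following sketch restates the planned-move sequence of step 5 as a single scripted action, together with the return move. It is plain Python, not GDPS script syntax; the helper function is a hypothetical placeholder for the statements that a GDPS planned action script would drive.

# Conceptual sketch only: not GDPS script syntax. The helper is a
# hypothetical placeholder for the scripted actions described in step 5.

def step(action):
    print("planned action:", action)

def move_workload_to_site2():
    """Planned move of the z/VM systems and the primary disk role to Site2."""
    step("Planned HyperSwap: the primary disk role moves to the Site2 disks "
         "(transparent to the systems already running in Site2)")
    step("Reverse PPRC: the mirror now runs Site2 -> Site1, so unplanned "
         "HyperSwap protection is retained while production runs in Site2")
    step("Stop the z/VM systems that were running in Site1")
    step("Re-IPL those systems in Site2 on the (now primary) Site2 disks")

def move_workload_back_to_site1():
    """The return move uses the same pattern in the opposite direction."""
    step("Planned HyperSwap: the primary disk role moves back to the Site1 disks")
    step("Reverse PPRC: the mirror runs Site1 -> Site2 again")
    step("Stop the systems running in Site2 and re-IPL them in Site1")

move_workload_to_site2()
move_workload_back_to_site1()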
12.6 Local CA and remote DR across two data centers at long metropolitan distance
The client has two data centers (Site1 and Site2) that are 100 km apart. They run all their systems in Site1; Site2 is their disaster recovery location.
They already use PPRC to mirror their data to Site2 and have GDPS/PPRC (single-site workload) implemented to manage the environment. They use GDPS/PPRC with a Freeze and Stop policy because they have a requirement for zero data loss (RPO=0). However, they have not enabled this environment for unplanned swaps because of the long distance between the sites. Also, because they do not have sufficient cross-site channel bandwidth, they cannot run production with their systems in Site1 using the disks in Site2. The reason that they have HyperSwap enabled is so that they can perform a graceful shutdown of their systems. After the systems are shut down, they move production to Site2.
The client has a large number of mirrored devices and defines their PPRC secondary devices in an alternate subchannel set to mitigate their UCB constraint. They have FlashCopy devices in Site2, which they use for periodic DR validation testing.
Because they are unable to fully benefit from HyperSwap, a disk failure is a single point of failure for their sysplex: they would need to invoke DR for a disk failure, which is a single component failure. They have a requirement to eliminate this single point of failure by providing a local PPRC-mirrored copy of the data, which will give them the full benefit of HyperSwap.
They are due for a disk technology refresh and would like to take advantage of this activity to add a local copy of the disk for CA.
Whatever solution they choose, the client must not be exposed from a DR risk perspective while implementing the solution.
Given their requirement for local PPRC and HyperSwap, they need to decide how to also protect their data for DR purposes. Although using XRC or GM in conjunction with the local PPRC mirror in an MGM or MzGM 3-site configuration could be an option, with XRC or GM they cannot achieve the zero data loss that is an absolute requirement for their business. MTMM can provide them with a synchronous mirror, both locally and in the remote data center, and meet their zero data loss requirement.
Another key consideration is the skills that the client has already built in using GDPS/PPRC as their DR solution. Although they understand that a new topology with an extra copy of data will necessitate changes, they would like to avoid reinventing the wheel with a radically different solution that would void their investment in GDPS technology. They would also like the solution to be phased in.
GDPS/MTMM is the ideal solution for this client. The MTMM copy technology meets their requirements for local CA and remote DR with minimal additional skill requirements, and their existing PPRC mirror can remain functional during the upgrade from GDPS/PPRC to GDPS/MTMM. The client is already using GDPS/PPRC, so they need a solution that provides all of the benefits of GDPS/PPRC and also meets their additional requirements. Table 12-5 shows how GDPS/MTMM can meet the client's requirements.
Table 12-5 Mapping client requirements to GDPS/MTMM attributes
Attribute | Supported by GDPS/MTMM
Two sites, 100 km apart | Y
Zero data loss | Y (Freeze policy with STOP)
Maintain consistency of secondary volumes | Y (Freeze)
Local CA and remote DR | Y (MTMM technology)
Ability to conduct DR tests without impacting DR readiness | Y (FlashCopy)
Automated recovery of disks and systems following a disaster | Y (GDPS script support)
Ability to transparently swap z/OS disks between the local copies of data | Y (HyperSwap, preferred leg)
Ability to transparently swap z/OS disks between one of the Site1 copies and the Site2 copy to facilitate orderly shutdown | Y (HyperSwap, non-preferred leg)
Support for a single PPRC leg (Site1-Site2) to facilitate a phased migration to the new topology | Y
Protect investment in GDPS/PPRC and GDPS/PPRC skills | Y
Maintain existing Site1-Site2 mirror while adding local mirror | Y
The client can plan for the following high-level steps when moving their GDPS/PPRC environment to a GDPS/MTMM environment:
1. Refresh the existing GDPS/PPRC Site1 and Site2 disks with new technology disks that support the MTMM technology. This is a process that clients are fairly familiar with already. Often, it can be achieved nondisruptively using HyperSwap or TDMF technologies. At this time, the client will also acquire the third set of disks that will be installed locally.
2. Upgrade GDPS/PPRC to GDPS/MTMM. This will initially be a GDPS/MTMM configuration with a single replication leg, which is the client's existing GDPS/PPRC mirror. GDPS/MTMM in a single-leg configuration functions very similarly to GDPS/PPRC, with some minor differences. At this point, the client has the same protection and capabilities that they had with GDPS/PPRC. The procedural changes required to accomplish this implementation step are quite minor because the overall topology of their mirror has not changed. The client will have to adjust some of their GDPS scripts and operational procedures, but this will not be a major change.
3. Finalize the implementation by adding the second, local replication leg to the GDPS/MTMM configuration. This step will again require some modifications to the client's existing GDPS automation scripts, and the addition of some new scripts, because the new topology with two replication legs can cater to additional planned and unplanned outage scenarios (a conceptual sketch of how the two legs are used follows this list). The operational procedures will also need to be changed in parallel. Because the client will have already familiarized themselves with the high-level differences between GDPS/PPRC and GDPS/MTMM while running in the single-leg configuration, this second step will not be a radical change from a skills perspective. When this step is complete, the client will meet all of their requirements.
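To illustrate how the two replication legs are used, the following sketch maps the swap scenarios from Table 12-5 to the leg that would be chosen. It is illustrative Python; the data structure and function are assumptions made for this example, not GDPS definitions.

# Illustrative sketch of how the two MTMM replication legs in this topology
# map to the client's swap scenarios (see Table 12-5). Names are assumptions.

LEGS = {
    "preferred": "local Site1 copy (continuous availability leg)",
    "non_preferred": "remote Site2 copy, 100 km away (disaster recovery leg)",
}

def choose_swap_target(reason):
    """Pick the leg to HyperSwap to for the scenarios described in 12.6."""
    if reason == "primary disk failure":
        # Unplanned swap: stay local so production continues at full speed.
        return "preferred"
    if reason == "planned Site1 shutdown":
        # Planned swap to the remote copy to allow an orderly move to Site2.
        return "non_preferred"
    raise ValueError(f"unhandled swap reason: {reason}")

for reason in ("primary disk failure", "planned Site1 shutdown"):
    leg = choose_swap_target(reason)
    print(f"{reason}: HyperSwap to the {LEGS[leg]} [{leg} leg]")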
12.7 DR in two data centers, global distance
The client in this scenario has a data center in Asia and another in Europe. Following the tsunami disaster in 2004, the client decided to remote copy their production sysplex data to their data center in Europe. The client is willing to accept the small data loss that will result from the use of asynchronous remote copy.
However, there is a requirement that the data in the remote site is consistent, to allow application restart. In addition, to minimize the restart time, the solution must provide the ability to automatically recover the secondary disks and restart all the systems. The client has about 10,000 primary volumes that they want to mirror. The disks in the Asian data center are IBM, but those in the European center that will be used as the secondary volumes are currently non-IBM.
The most suitable GDPS offering for this client is GDPS/XRC. Because of the long distance between the two sites (approaching 15,000 km), using a synchronous remote copy method is out of the question. Because the disks in the two data centers are from different vendors, GDPS/GM is also out of the question. Table 12-6 shows how the client's configuration and requirements map to the capabilities of GDPS/XRC.
Table 12-6 Mapping client requirements to GDPS/XRC attributes
Attribute | Supported by GDPS/XRC
Two sites, separated by thousands of km | Y
Willing to accept small data loss | Y (the actual amount of data loss depends on several factors, most notably the available bandwidth)
Maintain consistency of secondary volumes | Y
Maintain consistency of secondary volumes during resynch | Y¹ (FlashCopy)
Over 10,000 volumes | Y (coupled SDM support)
Data replication for, and between, multiple storage vendors' products | Y
Only z/OS disks need to be mirrored | Y
Automated recovery of disks and systems following a disaster | Y (GDPS script support)

1 FlashCopy is used to create a consistent set of secondary volumes before a resynchronization, following a suspension of remote copy sessions.
The first step for the client is to size the required bandwidth for the XRC links. This information will be used in the tenders for the remote connectivity. Assuming the cost of the remote links is acceptable, the client will start installing GDPS/XRC concurrently with setting up the remote connectivity.
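As a back-of-the-envelope illustration of this sizing exercise, the following sketch uses purely assumed numbers; they are not this client's measurements. Actual sizing is based on measured peak write activity across the roughly 10,000 volumes and on the characteristics of the links being procured.

# Back-of-the-envelope sketch with assumed numbers; not the client's data.
# Real sizing is driven by measured peak write activity and link characteristics.

peak_write_mbps = 180.0      # assumed sustained peak host write rate, MB per second
overhead_factor = 1.20       # assumed 20% allowance for protocol overhead and headroom
usable_link_mbps = 80.0      # assumed usable throughput of a single link, MB per second

required_mbps = peak_write_mbps * overhead_factor
links_needed = -(-required_mbps // usable_link_mbps)      # ceiling division

print(f"Required replication bandwidth: {required_mbps:.0f} MB/s")
print(f"Links needed at {usable_link_mbps:.0f} MB/s usable each: {int(links_needed)}")

# If fewer links are provisioned, the SDM falls behind during the write peak.
# The accumulated backlog is data that would be lost if a disaster struck at
# that moment, so under-sizing the bandwidth directly increases the achievable RPO.
provisioned_links = 2
shortfall_mbps = max(0.0, peak_write_mbps - provisioned_links * usable_link_mbps)
peak_duration_sec = 15 * 60                               # assumed 15-minute write peak
backlog_mb = shortfall_mbps * peak_duration_sec
print(f"Backlog after the peak with {provisioned_links} links: {backlog_mb:.0f} MB")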
Pending the availability of the remote connectivity, three LPARs will be set up for XRC testing (two SDM LPARs, plus the GDPS controlling system LPAR). This will allow the systems programmers and operators to become familiar with XRC and GDPS operations and control. The addressing of the SDM disks can be defined, agreed on, and added to the GDPS configuration in preparation for the connectivity becoming available.
The final configuration is shown in Figure 12-5. The GDPS systems are in the same sysplex and reside on the same processor as the European production systems. In case of a disaster, additional CBU engines on that processor will automatically be activated by a GDPS script during the recovery process.
Figure 12-5 Final GDPS/XRC configuration
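The following sketch lists the actions that the takeover processing automates in this configuration, as described above. It is plain Python, not GDPS script syntax; the helper is a hypothetical placeholder, and the point is simply that the sequence runs end to end without operator decision-making.

# Conceptual sketch only: not GDPS script syntax. The helper is a hypothetical
# placeholder for the recovery (takeover) actions described in the text.

def action(text):
    print("takeover script:", text)

def recover_in_europe():
    """Unplanned recovery at the European site, driven end to end by the
    script so that no operator decisions are needed under stress."""
    action("Recover the XRC secondary volumes to a consistent, restartable state")
    action("Activate the CBU engines on the European processor to provide the "
           "capacity that the production systems need")
    action("IPL the production systems from the recovered disks")

recover_in_europe()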
12.8 Other configurations
Many other combinations of configurations are possible. However, we believe that the examples provided here cover the options of one or two sites, short and long distances, and continuous availability and disaster recovery requirements. If your configuration does not fit into one of the scenarios described here, contact your IBM representative for more information about how GDPS can address your needs.
 

1 Only the GDPS controlling systems can see the FB disks. Therefore, a second controlling system is recommended to ensure the FB disks can always be managed even if a controlling system is down for some reason.
2 The option to purchase a z Systems general-purpose CP engine for clients that require one is included in the GDPS Virtual Appliance deal.