The checklist for the capture of requirements is as follows:
- An organisational disaster management plan has been created that identifies key people, business functions, systems, and processes, and puts in place mitigations to cope with a disaster.
- The system is modifiable and allows sufficient control to revert to an earlier version in the event of a failed upgrade.
- Data is backed up at a frequency proportional to its rate of change, and is stored securely and in controlled conditions both offsite (for disaster recovery) and onsite (for quick restores).
- The recovery media, which includes data, source code, and duplicated hard-copy materials, is stored in a geographically separate location that is far enough away to be insulated from regional incidents whilst remaining accessible in the event of a natural disaster or other incident.
- The recovery media is stored safely, securely and in controlled conditions.
- All mission-critical applications have a standby site maintained in an operationally ready state. This site is capable of replacing the primary site and provides a hot fail-over capability.
- Key roles are shared between different subject matter experts (SMEs) to aid recoverability in the case of staff turnover, injury or illness.
- An established methodology exists to test disaster management and recoverability on a periodic basis.
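As a rough illustration of the backup-frequency requirement, the sketch below derives a backup interval from an observed rate of change. It assumes the change rate and the maximum acceptable amount of unprotected change are measured in the same unit (for example GB or changed rows); the function name and the 15-minute floor and one-day ceiling are hypothetical defaults, not recommended values.

```python
from datetime import timedelta


def backup_interval(
    change_rate_per_hour: float,
    max_unprotected_change: float,
    min_interval: timedelta = timedelta(minutes=15),
    max_interval: timedelta = timedelta(days=1),
) -> timedelta:
    """Return how often to back up so that, at the observed rate of change,
    no more than max_unprotected_change accumulates between two backups.

    Both quantities must use the same unit (e.g. GB or changed rows).
    The default floor and ceiling are hypothetical placeholders.
    """
    if change_rate_per_hour <= 0:
        # Data is effectively static: back up at the longest allowed interval.
        return max_interval
    interval = timedelta(hours=max_unprotected_change / change_rate_per_hour)
    # Clamp: very volatile data is not backed up more often than min_interval,
    # and static data is still backed up at least once per max_interval.
    return max(min_interval, min(interval, max_interval))
```

For example, data changing at 50 GB per hour with a 100 GB tolerance would be backed up every two hours, while near-static data falls back to the daily ceiling.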
The checklist for architecture definition is as follows:
- Is a methodology to move from the current environment to the target environment defined?
- Is a migration strategy to transfer the load to the target application defined?
- Has an application backup procedure been identified?
- Does the defined methodology allow reliable restoration of the systems in an acceptable timespan?
- Will the operations team be able to monitor the application landscape in a runtime environment?
- Does the operations team have a clear understanding of the processes it needs to perform for the landscape?
- Does the solution take into account the time taken to recover from system failures?
- Does the backup technique ensure integrity of restored data?
- Does the backup technique support online backups, with acceptable performance degradation?
- Has consideration been given to restoring data from corrupt or incomplete backups?
- Will the application be able to handle errors and exceptions gracefully, logging and reporting them to the management and monitoring solution?
- Is a standby site defined in the DR plan? Is the standby site identical to the primary site, or does it offer reduced performance?
- Is the technique for switching from primary to standby site established?
- Have you assessed the impact of the availability solution on functionality and performance? Is this impact acceptable?
- Has the architecture been assessed for single points of failure and other bottlenecks?
- Does the fault-tolerant model extend to all vulnerable components and modules?
- Is the process for moving data from the existing environment into the target environment documented?
- Is a migration strategy outlined for moving the workload to the target system?
- How will the landscape deal with data synchronization challenges?
- Has an approach been identified that allows reliable system restoration within an acceptable timespan?
- Does the backup technique provide for integrity of restored data?
- Does the backup technique support an online backup model, with acceptable performance degradation?
- Has attention been given to restoring data from corrupt or incomplete backups?
- Is a standby site defined in the architecture? Is the standby identical to the primary site, or does it offer reduced performance?
- Has the technique for switching from the production to the standby environment been defined and quality-assured?
- Has the impact of the availability technique on functionality and performance been assessed and is it acceptable?
- Has the architecture been assessed for single points of failure and weaknesses?
- Does your fault-tolerant model extend to all vulnerable entities?
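The questions on the integrity of restored data and on restoring from corrupt or incomplete backups can be illustrated with a minimal sketch. It assumes each backup records a SHA-256 digest at the time it is written (the `Backup` type and the `make_backup`, `is_intact` and `select_restore_point` names are hypothetical), and that a restore should use the newest backup whose contents still match that digest, falling back to older backups otherwise.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Backup:
    taken_at: datetime
    payload: bytes        # backed-up data as read back from storage
    recorded_sha256: str  # digest recorded when the backup was written


def make_backup(taken_at: datetime, payload: bytes) -> Backup:
    """Hypothetical helper: capture data and record its digest at backup time."""
    return Backup(taken_at, payload, hashlib.sha256(payload).hexdigest())


def is_intact(backup: Backup) -> bool:
    """A backup is usable only if its contents still hash to the recorded digest."""
    return hashlib.sha256(backup.payload).hexdigest() == backup.recorded_sha256


def select_restore_point(backups: List[Backup]) -> Optional[Backup]:
    """Return the newest intact backup, skipping corrupt or truncated ones.

    Returns None when no backup passes the check; that case should be
    escalated rather than restoring damaged data.
    """
    for backup in sorted(backups, key=lambda b: b.taken_at, reverse=True):
        if is_intact(backup):
            return backup
    return None
```

A digest check only detects damage to the stored copy; it does not prove the data was consistent when it was captured, so periodic test restores are still needed.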
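One way to start the single-point-of-failure assessment is to model components and their connections as an undirected graph and look for articulation points: nodes whose removal disconnects the graph. This is a connectivity-only proxy, implemented here with Tarjan's depth-first low-link method; the component names are hypothetical, and shared dependencies such as power or DNS only show up if they are modelled as nodes.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def single_points_of_failure(links: List[Tuple[str, str]]) -> Set[str]:
    """Return components whose loss would split the connection graph
    (its articulation points), using Tarjan's DFS low-link method."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    discovery: Dict[str, int] = {}
    low: Dict[str, int] = {}
    critical: Set[str] = set()
    counter = 0

    def visit(node: str, parent: Optional[str]) -> None:
        nonlocal counter
        discovery[node] = low[node] = counter
        counter += 1
        children = 0
        for neighbour in graph[node]:
            if neighbour == parent:
                continue
            if neighbour in discovery:
                # Back edge: node reaches an earlier component directly.
                low[node] = min(low[node], discovery[neighbour])
            else:
                children += 1
                visit(neighbour, node)
                low[node] = min(low[node], low[neighbour])
                # A non-root node is critical if the subtree below it has
                # no path back above it except through it.
                if parent is not None and low[neighbour] >= discovery[node]:
                    critical.add(node)
        # The DFS root is critical only if it has two or more subtrees.
        if parent is None and children > 1:
            critical.add(node)

    for node in list(graph):
        if node not in discovery:
            visit(node, None)
    return critical


if __name__ == "__main__":
    # Hypothetical landscape: users reach two web servers through one load
    # balancer, and both web servers use one database that alone reaches storage.
    links = [
        ("users", "lb"),
        ("lb", "web1"), ("lb", "web2"),
        ("web1", "db"), ("web2", "db"),
        ("db", "storage"),
    ]
    print(single_points_of_failure(links))
```

For this hypothetical layout the function flags `lb` and `db`, while the duplicated web servers are not flagged because each can be lost without disconnecting anything.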