The checklist for the capture of requirements is as follows:
- An organisational disaster management plan must be created that identifies key people, business functions, systems, and processes, and puts in place mitigations to cope with disaster.
- The system has sufficient control to revert to an earlier version in the event of failed upgrades.
- The data can be backed up at a frequency proportional to the rate of change of the data, and stored securely and in controlled conditions both offsite (disaster support) and onsite (quick restore).
- The recovery media, which includes data, source code, and duplicate hard copy materials, is stored in a geographically different location that provides sufficient insulation from regional emergencies whilst remaining accessible in the event of a natural disaster or incident.
- The recovery media is stored safely, securely and in controlled conditions.
- The business-critical applications have a standby site maintained in an operationally ready state. This site is capable of replacing the primary site and can provide a hot fail-over capability.
- Key roles are shared between different people to facilitate recoverability in the case of staff turnover, injury, or illness.
- A methodology exists to QA the disaster management and recoverability on a periodic basis.
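
One way to read the requirement that backup frequency be proportional to the rate of change of data is to derive the backup interval from how many changes the business can afford to lose. The following is a minimal sketch under that assumption; the function name, the parameters, and the 15-minute and one-day limits are all hypothetical and are not prescribed by the checklist.

```python
from datetime import timedelta


def backup_interval(
    changes_per_hour: float,
    max_changes_at_risk: float,
    min_interval: timedelta = timedelta(minutes=15),  # assumed lower bound
    max_interval: timedelta = timedelta(days=1),      # assumed upper bound
) -> timedelta:
    """Return the time to wait between backups.

    The interval shrinks as the change rate grows, so that roughly
    max_changes_at_risk changes accumulate between two backups.
    The result is clamped to [min_interval, max_interval].
    """
    if changes_per_hour <= 0:
        # Data that does not change still gets a backup at the slowest cadence.
        return max_interval
    interval = timedelta(hours=max_changes_at_risk / changes_per_hour)
    return max(min_interval, min(interval, max_interval))
```

For example, with 1,000 changes per hour and a tolerance of 500 unsaved changes, the sketch yields a 30-minute interval; doubling the change rate halves the interval until the lower bound is reached.
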
The checklist for architecture definition is as follows:
- Are development standards leveraged (for example, schemes, programming, database approach, recognizable nomenclature, and user interfaces)?
- Are the subsystems distributed?
- Are check processes or watchdogs established?
- Is the operations team capable of providing status information?
- Are input, output, and processing implemented separately?
- Are the programs well-structured and easy to understand?
- Is the critical functionality contained in distinct modules?
- Will data processing be done in parallel?
- Has a methodology been defined to move from the current environment to the target environment?
- Has a migration strategy been defined to transfer the workload to the target system?
- Has the application backup procedure been identified?
- Does the defined methodology allow reliable restoration of the systems in an acceptable timespan?
- Will the operations team be able to monitor and control the application landscape in a production environment?
- Do the administrators have full knowledge of the processes to perform for the landscape?
- Does the solution emphasize the time to recover from system failures, for example, restoring backups?
- Does the backup technique ensure transactional integrity of restored data?
- Does the backup technique provide online backups, with acceptable performance degradation?
- Has consideration been given to restoring data from corrupt or incomplete backups?
- Will the application be able to respond gracefully to exceptions and report them to the management and monitoring solution?
- Is a standby site defined in the DR plan? Is the standby site identical to the primary site, or does it offer reduced performance?
- Are the techniques for switching from the primary site to the standby site defined?
- Has the impact of the availability solution on functionality and performance been assessed?
- Has the architecture been assessed for single points of failure and other bottlenecks?
- Does the fault-tolerant model extend to all entities in the landscape?
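
The question about check processes or watchdogs can be made concrete with a heartbeat monitor: each subsystem reports in periodically, and any subsystem that stays silent longer than a timeout is flagged to the operations team. The sketch below is illustrative only; the `Watchdog` class, its method names, and the timeout value are hypothetical, and a production landscape would normally rely on its monitoring solution rather than hand-written code.

```python
import time
from typing import Callable, Dict, List


class Watchdog:
    """Tracks the last heartbeat of each component and reports silent ones."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str) -> None:
        # Record that the component is alive right now.
        self._last_seen[component] = self._clock()

    def stale_components(self) -> List[str]:
        # A component is stale when its last heartbeat is older than the timeout.
        now = self._clock()
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A scheduler could call `stale_components()` at a fixed cadence and forward any names it returns to the management and monitoring solution, which also gives the operations team the status information asked for above.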