Methodology

Reliability is the ability of a system to operate without faults or failures and to recover from the faults and exceptions that do occur. This includes precise data and transformations, correct state management, and recovery from failure events without corrupting data. Creating reliable systems depends on the entire software development life cycle (SDLC), from architecture and early design through the build, deployment, and ongoing maintenance.

This process involves the following key aspects:

  • Build management and instrumentation information into the system:

In the architecture stage, it is critical to include health monitoring information for the application. This information includes resource consumption, response times, status conditions, and warnings. Monitoring is a critical best practice: it enables continuous analysis, identification, and isolation of potential system failures before they occur and cripple the infrastructure.
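As a rough illustration of building monitoring information into an application, the Python sketch below wraps a function so that each call records its response time and call and error counts, and logs a warning when a call exceeds a latency threshold. The names `instrumented`, `METRICS`, and `lookup_order`, and the threshold values, are hypothetical. A production system would typically export such figures to a dedicated monitoring tool rather than keep them in a module-level dictionary. Resource consumption (CPU, memory) is not covered by this sketch.

```python
import functools
import logging
import time
from collections import defaultdict

log = logging.getLogger("health")

# Hypothetical in-process metrics store, keyed by function name.
METRICS = defaultdict(
    lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0, "max_seconds": 0.0}
)


def instrumented(slow_threshold_s: float = 0.5):
    """Record call count, error count, and response time; warn on slow calls."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stats = METRICS[func.__name__]
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                stats["errors"] += 1  # status condition: the call failed
                raise
            finally:
                # Runs for both successful and failed calls.
                elapsed = time.perf_counter() - start
                stats["calls"] += 1
                stats["total_seconds"] += elapsed
                stats["max_seconds"] = max(stats["max_seconds"], elapsed)
                if elapsed > slow_threshold_s:
                    log.warning(
                        "%s took %.3fs (threshold %.3fs)",
                        func.__name__, elapsed, slow_threshold_s,
                    )

        return wrapper

    return decorator


@instrumented(slow_threshold_s=0.25)
def lookup_order(order_id: int) -> dict:
    """Hypothetical business function used to demonstrate the decorator."""
    if order_id < 0:
        raise ValueError("invalid order id")
    return {"id": order_id}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    lookup_order(1)
    try:
        lookup_order(-1)
    except ValueError:
        pass
    stats = METRICS["lookup_order"]
    print(stats["calls"], stats["errors"])  # 2 1
```

The decorator leaves the wrapped function's behavior unchanged: exceptions are counted and then re-raised, so monitoring never hides a failure from the caller.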

  • Leverage redundancy for reliability:

Design methodologies for achieving reliability are based on redundancy of software and hardware components. Redundancy enables recovery from various failure events without corrupting data.

Redundant software components might be double- or triple-redundant instances running in parallel with common validation checks. Alternative techniques include clustering, load balancing, replication, and wrapping complex operations in transactions to ensure data integrity.
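One common form of the "common validation check" for triple-redundant components is majority voting. The Python sketch below assumes that each replica is a callable returning a hashable result. A replica that crashes simply casts no vote, and a wrong answer from one replica is outvoted by the other two. The names `vote` and `NoMajorityError` are hypothetical and not taken from any particular framework.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class NoMajorityError(RuntimeError):
    """Raised when the redundant replicas cannot agree on a result."""


def vote(replicas: Sequence[Callable[[], T]]) -> T:
    """Run every replica and return the value a strict majority agrees on.

    Results must be hashable, because they are tallied with a Counter.
    """
    results = []
    for replica in replicas:
        try:
            results.append(replica())
        except Exception:
            continue  # a crashed replica casts no vote
    if results:
        value, count = Counter(results).most_common(1)[0]
        if count > len(replicas) // 2:  # strict majority of all replicas
            return value
    raise NoMajorityError(f"no majority among {len(replicas)} replicas: {results!r}")


if __name__ == "__main__":
    def healthy() -> int:
        return 42

    def faulty() -> int:
        return 41  # silently wrong result

    def crashed() -> int:
        raise IOError("replica down")

    print(vote([healthy, healthy, faulty]))   # 42
    print(vote([healthy, crashed, healthy]))  # 42
```

The majority is counted against the total number of replicas rather than the number that answered, so two crashes out of three are reported as an error instead of trusting a single surviving result.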

Redundant hardware components: A best-practice redundancy strategy includes disk arrays, redundant network interfaces, and redundant power supplies. With such an infrastructure, individual components can fail without affecting the overall reliability of the system.

  • Leverage quality development tools:

Software tools and frameworks should support the development of fault-tolerant, robust applications and provide an IDE with a rich UI for coding, QA, and deploying distributed applications.

  • Leverage robust error handling and health checks:

Error handling is a critical means of resolving failures in many distributed architectures. A well-architected application must respond to exception conditions systematically. The application also needs to run scheduled health checks continuously. The process consists of identifying the error condition, determining the resolution, and continuing to run the application gracefully.
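The identify, resolve, and continue steps, together with scheduled health checks, can be sketched in Python as follows. This is an illustrative sketch only. It treats `TimeoutError` and `ConnectionError` as transient conditions worth retrying, treats every other exception as non-transient, and falls back to a degraded result so that the application keeps running. All function names, the retry limit, and the backoff values are hypothetical choices.

```python
import logging
import threading
import time
from typing import Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("resilience")


def run_with_resolution(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 3,
    backoff_s: float = 0.1,
) -> T:
    """Identify the error, choose a resolution, and keep the application running."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable as exc:
            # Identified a transient condition: resolve by retrying with backoff.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                time.sleep(backoff_s * 2 ** (attempt - 1))
        except Exception as exc:
            # Identified a non-transient condition: retrying will not help.
            log.error("non-retryable failure: %s", exc)
            break
    # Continue gracefully with a degraded result instead of crashing.
    return fallback()


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Evaluate every named check; a check that raises counts as unhealthy."""
    status = {}
    for name, check in checks.items():
        try:
            status[name] = bool(check())
        except Exception:
            status[name] = False
    return status


def schedule_health_checks(
    checks: Dict[str, Callable[[], bool]], interval_s: float, stop: threading.Event
) -> None:
    """Re-run the checks every interval_s seconds until stop is set."""
    while not stop.wait(interval_s):  # wait() returns True once stop is set
        for name, ok in run_health_checks(checks).items():
            if not ok:
                log.warning("health check failed: %s", name)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("upstream timed out")
        return "fresh data"

    print(run_with_resolution(flaky, fallback=lambda: "cached data"))  # fresh data
```

The health-check loop is meant to run in a background thread, for example `threading.Thread(target=schedule_health_checks, args=(checks, 30.0, stop), daemon=True).start()`. Setting the event ends it cleanly.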

  • Remove single points of failure from the design:

A reliable system provides a significant benefit: such an application is much simpler to enhance, while an unreliable application costs much more to change.
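A common way to remove a single point of failure in application code is to put several redundant replicas of a service behind a failover pool, so that losing any one replica does not take the service down. The Python sketch below assumes that each replica is a callable that takes a request (standing in for a network call). It rotates the starting replica round-robin to spread load, which is a simple form of the load balancing mentioned earlier. `FailoverPool` and `AllReplicasFailed` are hypothetical names, and the class is not thread-safe as written.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")
T = TypeVar("T")


class AllReplicasFailed(RuntimeError):
    """Raised only when every redundant replica has failed."""


class FailoverPool:
    """Route each request to a replica, failing over to the next on error."""

    def __init__(self, replicas: Sequence[Callable[[R], T]]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas: List[Callable[[R], T]] = list(replicas)
        self._next = 0

    def call(self, request: R) -> T:
        n = len(self._replicas)
        start = self._next
        self._next = (self._next + 1) % n  # rotate the starting replica (round-robin)
        errors = []
        for offset in range(n):
            replica = self._replicas[(start + offset) % n]
            try:
                return replica(request)
            except Exception as exc:
                errors.append(exc)  # record the failure and try the next replica
        raise AllReplicasFailed(f"all {n} replicas failed: {errors!r}")


if __name__ == "__main__":
    def replica_a(req: int) -> str:
        raise ConnectionError("replica a is down")

    def replica_b(req: int) -> str:
        return f"b:{req}"

    pool = FailoverPool([replica_a, replica_b])
    print(pool.call(1))  # b:1
```

With a single replica, the pool offers no protection. The point of the design is that at least two independent replicas must fail before the caller sees an error.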

  • Leverage the SDLC process:

Use a consistent, repeatable software development methodology that leads to a reliable application. A formal process establishes a detailed analysis that leads to innovation and discovery. An efficient data center relies on documented processes for software development, capacity planning, configuration management, change management, incident management, and network and security operations.