Chapter 3

Design business continuity solutions

Cloud Solution Architects understand the importance of designing a business continuity solution. Most enterprises have a well-established business continuity and disaster recovery plan, also known as a BC/DR plan. Typically, the best starting point when defining and choosing a business continuity solution is to perform a business criticality assessment, which helps you determine the criticality of systems and their impact on the business if an outage occurs. This assessment should guide you in developing the right business continuity strategy for the company. Once you have performed the criticality assessment and identified critical applications, the next step is to define your backup and disaster recovery strategy.

The AZ-305 certification exam expects you to demonstrate a solid understanding of designing a business continuity and disaster recovery plan. The Azure Solution Architect certification is an expert-level exam, so this exam expects you to have advanced-level knowledge of each domain objective.

Skills covered in this chapter:

Skill 3.1: Design a solution for backup and disaster recovery

The success of any application, especially when it runs in the cloud, depends on how gracefully it handles failures and continues to deliver as much business value as possible. This approach is also known as designing for failure. When designing a solution for backup and recovery, you should first identify failure situations and their potential impacts on your organization. Then you should perform analysis and a criticality assessment, develop a business continuity strategy, and document your data protection requirements. Finally, you should develop backup and recovery plans to address the data protection requirements identified by your analysis.

Note: Successful architects typically follow this same approach when designing backup and recovery solutions.

Recommend a recovery solution for Azure, hybrid, and on-premises workloads that meets recovery objectives (recovery time objective [RTO], recovery level objective [RLO], recovery point objective [RPO])

When your systems are unavailable, your company could directly or indirectly face some reputational harm. Large-scale outages or disasters can disrupt your business, staff, and users. Also, your company could face financial losses such as lost revenue or penalties for not meeting availability agreements for your services.

Business continuity and disaster recovery (BC/DR) plans are formal documents that organizations develop to define the scope of, and the steps to be taken during, a disaster or large-scale outage. Each disruption is assessed on its own merits.

For example, consider a scenario in which an earthquake has damaged your datacenter power and communication lines. This situation has rendered your corporate datacenter useless until power is restored and lines of communication are fixed. A fiasco of this magnitude could take your organization’s services down for hours or days, if not weeks. This is why you need a complete BC/DR plan: to get the services back online as quickly as possible.

RTOs, RPOs, and RLOs

As part of your BC/DR plan, you must identify your application’s recovery time objectives (RTOs) and recovery point objectives (RPOs).

Together, these objectives help you establish a baseline approach with a clear commitment to a speed of recovery (RTO) and a tolerable amount of data loss (RPO).

Before diving into the solutions, let us look at three widely used terms that define recovery objectives: RPO, RTO, and RLO.

  • Recovery point objective (RPO): The recovery point objective is the maximum acceptable interval between the last available backup and a potential failure point. The RPO therefore determines how much data a business can afford to lose in a failure. For example, if your backup runs every 24 hours at 4 a.m. and a disaster happens at 1 p.m. later that day, 9 hours of data would be lost. If your company's RPO is 12 hours, your backup strategy meets the objective, because the 9 hours of lost data falls within the acceptable limit. However, if the RPO is 4 hours, your backup strategy would not meet the requirement, and the business would be harmed.

  • Recovery time objective (RTO): The recovery time objective is the maximum time a data recovery process can take. It is defined by the amount of time the business can afford for the site or service to be unavailable. For example, let's say one of your applications has an RTO of 12 hours. This means your business can manage for 12 hours if this application is unavailable. However, if the downtime is longer than 12 hours, your business would be seriously harmed.

  • Recovery level objective (RLO): The recovery level objective defines the granularity at which you must be able to recover data, such as whether you must restore the entire application stack or only individual components.

Figure 3-1 explains the recovery point and recovery time concepts. The recovery time is the amount of time needed to recover the data, whereas the recovery point is the last point a successful backup was made.

The figure shows a timeline from T-x to T+y. A red X icon representing failure is shown in the center at time T. The figure shows the Recovery Point as being how far back the last successful backup was taken and how long data recovery took after the failure, which is shown as Recovery Time.

FIGURE 3-1 Recovery point objective and recovery time objective
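The RPO arithmetic from the earlier example can be sketched as a quick check. The dates and helper names below are illustrative, not part of any Azure API:

```python
from datetime import datetime

def data_loss_hours(last_backup: datetime, failure: datetime) -> float:
    """Hours of data lost: the gap between the last good backup and the failure."""
    return (failure - last_backup).total_seconds() / 3600

def meets_rpo(loss_hours: float, rpo_hours: float) -> bool:
    """A backup strategy meets the RPO when the worst-case loss fits within it."""
    return loss_hours <= rpo_hours

last_backup = datetime(2024, 5, 2, 4, 0)   # daily backup taken at 4 a.m.
failure = datetime(2024, 5, 2, 13, 0)      # disaster strikes at 1 p.m.
loss = data_loss_hours(last_backup, failure)

print(loss)                 # 9.0 hours of data lost
print(meets_rpo(loss, 12))  # True: a 12-hour RPO is met
print(meets_rpo(loss, 4))   # False: a 4-hour RPO is not met
```

The same check generalizes to any backup cadence: the worst-case loss equals the full backup interval, so the interval must never exceed the RPO.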

Azure Site Recovery

To meet your business continuity and disaster recovery strategy, you should leverage Azure Site Recovery.

Azure Site Recovery supports applications running on Windows- or Linux-based physical servers, VMware, or Hyper-V. Using Azure Site Recovery, you can perform application-aware replication to Azure or to a secondary site. You can use Azure Site Recovery to manage replication, perform a DR drill, and run failover and failback.

Azure Site Recovery (ASR) is recommended for application-level protection and recovery:

  • ASR can be used to replicate workloads running on a supported machine.

  • ASR offers near-real-time replication with RPOs as low as 30 seconds. Typically, this meets the needs of most critical business apps.

  • ASR can take app-consistent snapshots for single- or multi-tier applications.

  • ASR also integrates with SQL Server AlwaysOn and other application-level replication technologies such as Active Directory replication and Exchange database availability groups (DAGs).

  • ASR recovery plans are very flexible and enable you to recover the entire application stack with a single click and include external scripts and manual actions in the plan.

  • ASR offers advanced network management capabilities to simplify app network requirements, such as the ability to reserve IP addresses, configure load-balancing, and integrate with Azure Traffic Manager for low RTO network switchovers.

  • A rich automation library is available, which provides production-ready, application-specific scripts that can be downloaded and integrated with recovery plans.

Important: Frequently Asked Questions About ASR

Microsoft documentation has a very comprehensive list of FAQs for ASR that cover various workload types and disaster recovery scenarios. To learn more, visit the Microsoft documentation at https://docs.microsoft.com/en-us/azure/site-recovery/site-recovery-faq.

Azure Backup service

The Azure Backup service provides a secure and cost-effective solution to back up your data and keep it safe and recoverable in case of service disruption, accidental deletion, or data corruption. ASR and Azure Backup complement each other, helping organizations design end-to-end BC/DR plans.

Azure Backup helps you back up files, folders, machine states, and other workloads running on on-premises and Azure virtual machines (VMs). You can use Azure Backup to protect the following workload types:

  • Azure VMs: Back up entire Windows and Linux Azure VMs using the Azure VM backup extension.

  • Azure Managed Disks: Back up Azure Managed Disks using a Backup vault.

  • Azure File shares: Back up Azure File shares using a Recovery Services vault.

  • SQL Server in Azure VMs: Back up SQL Server databases running on Azure VMs.

  • SAP HANA databases in Azure VMs: Back up SAP HANA databases running on Azure VMs.

  • Azure Database for PostgreSQL servers: Back up Azure Database for PostgreSQL servers with long-term retention.

  • Azure Blobs: Azure Backup helps you protect blobs in the storage account and enhance data protection at scale.

  • On-premises machines: Use the Microsoft Azure Recovery Services (MARS) agent to back up Windows Server machines, or use System Center Data Protection Manager (DPM) or Microsoft Azure Backup Server (MABS) to protect VMs (Hyper-V, VMware).

Azure Backup stores backed-up data in two types of vaults: the Recovery Services vault and the Backup vault. A vault is a storage entity in Azure that holds data such as backup copies, recovery points, and backup policies.

Consider the following recommendations when you create storage vaults:

  • Use separate vaults for Azure Backup and Azure Site Recovery.

  • Use role-based access control (RBAC) to protect and manage access to storage vaults.

  • Design for redundancy. This means specifying how data in vaults is replicated. Azure offers the following three options to replicate data:

    • Locally redundant storage (LRS): To protect data from server rack and drive failures, use LRS. LRS replicates data three times within a single datacenter in the primary region and provides at least 99.999999999 percent (11 nines) annual durability.

    • Geo-redundant storage (GRS): To protect data from region-wide outages, use GRS. GRS replicates data to a secondary region. The Recovery Services vault uses GRS by default.

    • Zone-redundant storage (ZRS): ZRS replicates data across availability zones, guaranteeing data residency and resiliency in the same region.

Understand the recovery solutions for containers

Many organizations' cloud adoption strategies focus heavily on modern application development, and containers are central to that approach. Containerization is an approach used in software development in which an application or service and its dependencies are packaged together as a container image. Containerized applications help organizations accelerate time to market, reduce operating overhead, make workloads more portable, and modernize legacy workloads.

Azure Kubernetes Service (AKS) is the most popular service used by organizations to deploy and manage containerized applications in Azure. Although AKS is a fully managed service that provides built-in high availability (HA) by using multiple nodes in a virtual machine scale set (VMSS), the built-in HA within a region does not protect your system from a regional failure.

Consider the following best practices and recommendations to maximize uptime and recover solutions faster in case of a regional disruption:

  • Deploy AKS clusters in multiple regions. Choose Azure-paired regions, which are designed explicitly for disaster-recovery scenarios.

  • Use Azure Container Registry to store container images and geo-replicate the registry to each AKS region. You need a Premium SKU to use geo-replicated instances of Azure Container Registry.

  • Back up AKS clusters using Velero and Azure Blob storage. Velero is an open-source community standard tool you can use to back up and restore Kubernetes cluster objects and persistent volumes.

EXAM TIP

Velero is an open-source community tool you can use to back up and restore AKS cluster persistent volumes and other additional cluster objects.

Recommend a backup and recovery solution for compute

As you learned earlier in this chapter, you can use Azure Backup to back up supported compute resources such as Azure virtual machines and restore them seamlessly when needed. Azure Backup consists of two tiers:

  • Snapshot: In this tier, backups are stored locally for up to five days. A restore from the snapshot tier is much faster because there is no wait time for snapshots to copy to the vault before the restore is triggered.

  • Recovery Services vault: After the snapshots are created, Azure Backup transfers the data to the Recovery Services vault for additional security and longer retention.

Consider the following recommendations for Azure virtual machine backup and recovery:

  • Backup schedule policies: Create separate backup policies for critical and noncritical virtual machines. Schedule the start times of different backup policies at different times of the day, and ensure the time slots do not overlap.

  • Backup retention policies: Implement both short-term (daily) and long-term (weekly) retention.

  • Cross-Region Restore (CRR): Using CRR, you can also restore Azure VMs in a secondary region. This option lets you conduct drills to meet audit or compliance requirements.

  • Optimize restore time: When restoring multiple VMs from a single vault, use a separate general-purpose v2 Azure Storage account for each VM to avoid transient errors. For example, if five VMs are restored, use five different storage accounts.

  • Monitoring: Use Azure Monitor alerts for Azure Backup to receive notifications when a backup or restore fails.
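The guidance to stagger backup start times can be expressed as a small scheduling check. The helper and the sample windows below are illustrative:

```python
def windows_overlap(start_a: float, hours_a: float,
                    start_b: float, hours_b: float) -> bool:
    """True if two same-day backup windows (start hour, duration in hours)
    overlap. For simplicity this sketch ignores windows that cross midnight."""
    end_a, end_b = start_a + hours_a, start_b + hours_b
    return start_a < end_b and start_b < end_a

# Critical VMs back up at 01:00 for 2 hours; noncritical at 04:00 for 3 hours
print(windows_overlap(1, 2, 4, 3))  # False: the slots do not overlap
print(windows_overlap(1, 4, 4, 3))  # True: the 04:00-05:00 hour is shared
```

Running such a check against every pair of backup policies before deployment helps catch overlapping time slots early.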

EXAM TIP

In a VM replication scenario, create a Recovery Services vault in any region except the source region you want to replicate from. In a VM backup scenario to protect data sources, create a Recovery Services vault in the same region as the data source.

Recommend a backup and recovery solution for databases

You learned earlier in this chapter that Azure Backup is the service you should use to back up and recover SQL Servers running on virtual machines and SAP HANA databases running on Azure virtual machines.

This section covers recommendations for the backup and recovery of the Azure SQL Database.

Azure SQL Database and Azure SQL Managed Instance have a built-in automated backup system that enables point-in-time restore (PITR). PITR retains backups for 7 to 35 days, depending on your database service tier, and allows you to restore a database to any time in the past within the retention period. You incur an additional cost only if you use the restore capability to create a new database.

The automated backup system creates full backups weekly, differential backups every 12 to 24 hours, and transaction log backups every 5 to 10 minutes.

You might wonder, what if you need to keep backups for longer than 35 days for audit or compliance reasons? In this case, you can use the long-term retention (LTR) feature. With LTR, you can store Azure SQL Database backups in read-access geo-redundant storage (RA-GRS) blobs for up to 10 years. If you need access to any backup in LTR, you can restore it as a new database using either the Azure Portal or PowerShell.
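The PITR and LTR retention windows described above can be captured in a small decision helper. This function is a sketch of the rules in this section, not an Azure SDK call:

```python
from datetime import datetime, timedelta

def restore_source(requested_point: datetime, now: datetime,
                   pitr_days: int = 35, ltr_years: int = 10) -> str:
    """Decide which Azure SQL backup feature can serve a restore request."""
    age = now - requested_point
    if age <= timedelta(days=pitr_days):
        return "PITR"        # within the automated point-in-time window
    if age <= timedelta(days=365 * ltr_years):
        return "LTR"         # requires long-term retention backups
    return "unavailable"     # beyond every retention window

now = datetime(2024, 6, 1)
print(restore_source(datetime(2024, 5, 20), now))  # PITR
print(restore_source(datetime(2022, 6, 1), now))   # LTR
```

Note that PITR restores to any point in time within its window, whereas LTR restores only the retained full backups, so the achievable recovery point is coarser once you fall back to LTR.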

More Info: Long-Term Retention (LTR) of Backup Data Using the Azure Backup Archive Tier

You can also consider the Azure Backup Archive tier for long-term data retention to meet data compliance requirements. You can find the list of supported workloads and Azure regions for the Azure Backup Archive tier in the Microsoft documentation at https://docs.microsoft.com/en-us/azure/backup/archive-tier-support.

Recommend a backup and recovery solution for unstructured data

Azure Blob storage is a storage solution for unstructured data. Unstructured data doesn’t adhere to a particular data model or definition. Examples of unstructured data include text and binary data.

Azure Storage account has a built-in local data protection solution called operational backup for Blobs. The operational backup solution protects the block Blobs from various data loss scenarios such as container deletion, Blob deletion, or accidental storage account deletion. The data is stored locally within the storage account and can be recovered when needed to a selected point in time within a maximum retention period of 360 days.

Consider the following recommendations to enhance data protection and recovery for Azure Blob storage:

  • Soft delete: You can enable soft delete at the container level or for individual blobs. When soft delete is enabled, you can recover a deleted container and its blobs within the retention period.

  • Versioning: Blob versioning automatically maintains previous versions of a blob. When blob versioning is enabled, you can restore an earlier version of a blob when needed.

  • Resource locks: Soft delete does not protect you against deletion of the storage account itself. Use resource locks to prevent accidental deletion of the storage account. You can use the following lock types:

    • CanNotDelete: Authorized users can read and modify a resource but can't delete it.

    • ReadOnly: Authorized users can read but cannot modify or delete a resource.

Skill 3.2: Design for high availability

Resiliency, fault tolerance, and high availability are essential attributes for mission-critical systems so that they can recover from failures and continue to function. You should design cloud applications keeping in mind the fact that failures do happen, so you should be able to minimize the effects of failing components on business operations. Every system has particular failure modes, which you must consider when designing and implementing your application.

High availability (HA) is the capability of any computing system to provide desired and consistent uptime, even in the event of an underlying infrastructure failure. This requirement is vital for mission-critical systems that will not tolerate an interruption in the service availability. HA is also imperative for any system for which any downtime would cause damage or monetary loss.

HA systems guarantee a percentage of uptime. The number of nines in the percentage is usually used to specify the degree of high availability offered. For example, "five nines" indicates a system that is up 99.999 percent of the time. A system with a 99.9 percent uptime SLA can be down only 0.1 percent of the time, which works out to roughly 8.77 hours of downtime per year.
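The downtime figures follow directly from the SLA percentage. A short sketch, using a 365.25-day year:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours, averaging in leap years

def max_annual_downtime_hours(sla_percent: float) -> float:
    """Maximum downtime per year permitted by an uptime SLA."""
    return (1 - sla_percent / 100) * HOURS_PER_YEAR

print(round(max_annual_downtime_hours(99.9), 2))    # 8.77 hours ("three nines")
print(round(max_annual_downtime_hours(99.999), 2))  # 0.09 hours ("five nines")
```

Five nines leaves barely five minutes of downtime a year, which is why each additional nine raises the cost and complexity of the design so sharply.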

Designing apps for high availability and resiliency usually means running them in a healthy state without significant downtime. This design begins with gathering requirements and asking the right questions. For example:

  • How much downtime is acceptable?

  • What does this potential downtime cost your business?

  • What are your customer’s availability requirements?

  • How much can you invest in making your application highly available?

  • How much risk versus the cost can you tolerate?

Following are three essential characteristics of a highly available system:

  • Redundancy: Ensure that any elements crucial to system operations have additional redundant components that can take over in the event of failure.

  • Monitoring: Gather data from the running system to identify when a component fails or stops responding.

  • Failover: Provide a mechanism that can automatically switch from the currently active component to a redundant component if monitoring shows a breakdown of the active component.

Microsoft Azure services are designed and built at every layer to deliver the highest levels of redundancy and resilience. Azure infrastructure is composed of geographies, regions, and availability zones, which limit the blast radius of a failure and its potential impact on customer applications and data.

Microsoft defines its SLA for each Azure service. If you need to have a higher SLA than what Azure offers, you can set up redundant components with failover.

Identify the availability requirements of Azure resources

As you learned in the previous section regarding high availability (HA) and different service-level agreements (SLAs), depending on the SLA, your cloud workload can provide a continuous user experience with no apparent downtime, even when things go wrong.

Highly available workloads have the following quality attributes:

  • They do not have a single point of failure.

  • They can scale on demand to meet performance needs when load increases.

  • They can detect and respond to failure gracefully.

Consider the following recommendations when defining the requirements to design resilient and highly available Azure applications:

  • Identify workload types and usage patterns: An SLA in Azure defines Microsoft's commitment to the uptime of an Azure service. Different services have different SLAs. For example, App Service has an SLA of 99.95 percent, and Azure SQL Database has an SLA of 99.99 percent. Used together, the two services provide a composite SLA of about 99.94 percent. Understanding the overall SLA expectation for the application is vital to designing the application architecture appropriately to meet the business's SLA needs.

  • Cost and complexity: As you move toward more nines, cost and complexity grow. The higher the SLA, the less frequently the service can go down, and the quicker the service must recover. To achieve four nines (99.99 percent), you can't rely on manual intervention to recover from failures; the application must be self-diagnosing and self-healing.

  • Start with failure mode analysis (FMA): FMA is a process for building resiliency into a system by identifying possible failure points in that system. Create end-to-end dependency mapping for the application architecture and identify the dependencies. Pay particular attention to dependencies that might be a single point of failure or cause scaling bottlenecks. If a workload requires 99.99 percent uptime but depends on a service with a 99.9 percent SLA, that service can't be a single point of failure in the system.

  • Understand availability metrics: Following are two measures you should use to plan for redundancy and determine SLAs:

    • Mean time to recovery (MTTR): The average time it takes to restore a component after a failure.

    • Mean time between failures (MTBF): How long a component can reasonably be expected to last between outages.
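The composite SLA quoted for App Service and Azure SQL Database is simply the product of the individual SLAs, because the application is down whenever either dependency is down. A minimal sketch:

```python
def composite_sla(*slas_percent: float) -> float:
    """Composite SLA of serially dependent services: the product of their SLAs."""
    fraction = 1.0
    for sla in slas_percent:
        fraction *= sla / 100
    return fraction * 100

# App Service (99.95 percent) depending on Azure SQL Database (99.99 percent)
print(round(composite_sla(99.95, 99.99), 2))  # 99.94
```

Adding more serial dependencies only lowers the composite figure, which is why long dependency chains make high uptime targets hard to meet.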

Recommend a high-availability solution for compute

Microsoft Azure global datacenters and underlying infrastructure are designed to deliver the highest redundancy and resiliency for an application running on Azure services. However, failures do happen. Therefore, the key to designing a reliable application in the cloud is to design applications to handle failures and minimize business disruptions gracefully.

In this section, you’ll learn the recommendations to increase the availability of Azure VMs:

  • Single VM: A single VM has an SLA offered by Azure. If you use premium storage for all operating system and data disks, you get a 99.9 percent SLA.

  • Availability sets: These can help you increase the SLA from 99.9 percent to 99.95 percent. Availability sets protect a set of VMs from localized hardware failures, such as a disk or network switch failure, by ensuring that not all VMs are deployed on the same underlying hardware. Each virtual machine in an availability set is assigned an update domain and a fault domain by default. Each availability set can be configured with up to three fault domains and 20 update domains. Update domains indicate groups of virtual machines that can be rebooted simultaneously. For example, if you deploy 10 virtual machines in an availability set with three update domains, at least six VMs are always available during planned maintenance.

  • Availability zones: These are unique physical locations within an Azure region. Each zone is composed of one or more datacenters with independent power, cooling, and networking. The physical separation of availability zones within a region limits the impact on applications and data from zone failures, such as large-scale flooding or other natural disasters that could take down an entire datacenter. Availability zones help you increase the SLA from 99.95 percent to an industry-best 99.99 percent uptime.

More Info: Availability Zones Supported Regions

Not every Azure region supports availability zones. You can find the list of supported Azure regions for availability zones in the Microsoft documentation at https://docs.microsoft.com/en-us/azure/availability-zones/az-region#azure-regions-with-availability-zones.

  • Proximity placement groups (PPGs): A proximity placement group is a logical grouping that ensures Azure compute resources are physically located close to one another for low network latency between VMs. You can use PPGs with both availability sets and availability zones.

  • Virtual machine scale sets (VMSS): To achieve redundancy, high availability, and improved performance, applications are distributed across multiple instances. Azure VMSS is used to create and manage a group of load-balanced VMs. The number of virtual machine instances can automatically scale (increase or decrease) on demand or per defined time schedules.
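The update-domain arithmetic from the availability-set example can be checked with a short sketch. The function below assumes Azure's round-robin spreading of VMs across update domains; the names are illustrative:

```python
def min_vms_available(total_vms: int, update_domains: int) -> int:
    """During planned maintenance Azure reboots one update domain at a time,
    so in the worst case the largest domain's VMs are all down at once."""
    base, remainder = divmod(total_vms, update_domains)
    largest_domain = base + (1 if remainder else 0)
    return total_vms - largest_domain

# 10 VMs spread over 3 update domains (4 + 3 + 3)
print(min_vms_available(10, 3))  # 6 VMs remain available
```

Increasing the number of update domains shrinks the largest domain, which raises the guaranteed minimum of VMs that stay up during maintenance.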

EXAM TIP

Virtual machine scale sets can be deployed across multiple availability zones to achieve resiliency and fault tolerance against datacenter-level failures within a region.

EXAM TIP

Always place VMs that perform the same role in a single availability set. An availability set with two or more VMs provides redundancy so that at least one VM remains up and running if a failure occurs.

Recommend a high-availability solution for non-relational data storage

Azure Storage provides several redundancy options to help ensure your data is available. Azure stores multiple copies of your data in Azure Storage to prevent unplanned disruptions. Redundancy ensures that your storage account fulfills the SLA for Azure Storage.

While deciding which redundancy option is best, you should consider the trade-offs between cost and durability. The factors that help determine which storage type you should choose include the following:

  • How do you replicate your data on the primary site?

  • If your data needs to be replicated to a second site, is it geographically distant from the primary site to protect against regional disasters?

  • Does your application need read access to the replicated data in the secondary region if the primary region is no longer available?

As noted, Azure maintains multiple copies of your data stored in Azure Storage. Azure offers two options for Azure Storage, based on how data will be replicated throughout the primary region:

  • Locally redundant storage (LRS): With LRS, data is replicated synchronously three times within a single physical location in the primary region. Because LRS provides only local redundancy, it is the least expensive option, but it is not recommended for mission-critical applications that require better availability.

  • Zone-redundant storage (ZRS): With ZRS, data is replicated synchronously across three Azure availability zones in the primary region. Use ZRS in the primary region for applications requiring high availability, and for additional protection, replicate to a secondary region as well.

For mission-critical applications requiring the best availability, you can also replicate the data in your Azure Storage account to a secondary region hundreds of miles away from the primary region. Your data is more durable when your Azure Storage account is replicated to a secondary region: you are covered even in the case of a complete regional outage or a disaster in which the primary region is not recoverable.

Microsoft offers two options for Azure Storage that offer redundancy for your data to another region:

  • Geo-redundant storage (GRS): With GRS, data is replicated synchronously three times within a single physical location in the primary region using LRS. Azure then copies the data asynchronously to a single physical location in the secondary region, where it is again replicated three times. You get enhanced redundancy with a total of six copies of your data.

  • Geo-zone-redundant storage (GZRS): With GZRS, data is replicated synchronously across three Azure availability zones in the primary region using ZRS. Azure then copies the data asynchronously to a single physical location in a secondary region, where it is again replicated three times. You get enhanced redundancy with a total of six copies of your data.

If you compare GRS and GZRS, you will find the only difference is how data is copied in the primary region. There is no difference in replication to the secondary region. For both options, data is always replicated in the secondary region three times using LRS. This LRS redundancy in the secondary region protects the data against hardware failures.

For both GRS and GZRS, the data in the secondary location is not available for read or write access unless you fail over to the secondary region. If you need read access to data in the secondary location, use read-access geo-redundant storage (RA-GRS). If you also need zone redundancy, use read-access geo-zone-redundant storage (RA-GZRS).

When the primary region is unavailable, you can fail over to the secondary region. Once the failover completes, the secondary region becomes the new primary region, and you can again read and write data.
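The decision factors above map onto the redundancy options in a straightforward way. The helper below is a sketch of the decision logic in this section, not an Azure SDK API:

```python
def choose_redundancy(needs_geo: bool, needs_zones: bool,
                      needs_secondary_reads: bool) -> str:
    """Map availability requirements onto an Azure Storage redundancy option."""
    if not needs_geo:
        # No secondary region needed: choose within the primary region only
        return "ZRS" if needs_zones else "LRS"
    if needs_zones:
        return "RA-GZRS" if needs_secondary_reads else "GZRS"
    return "RA-GRS" if needs_secondary_reads else "GRS"

print(choose_redundancy(False, False, False))  # LRS
print(choose_redundancy(True, False, True))    # RA-GRS
print(choose_redundancy(True, True, True))     # RA-GZRS
```

Cost rises in the same order, so in practice you pick the cheapest option that still satisfies every requirement.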

More Info: Failing Over to the Secondary Region

For more information on failing over to the secondary region, see the Microsoft documentation at https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance.

Table 3-1 describes critical parameters for each redundancy option.

TABLE 3-1 Redundancy parameters

  • Percent durability of objects over a given year. LRS: at least 99.999999999 percent (11 9s); ZRS: at least 99.9999999999 percent (12 9s); GRS/RA-GRS: at least 99.99999999999999 percent (16 9s); GZRS/RA-GZRS: at least 99.99999999999999 percent (16 9s).

  • Availability SLA for read requests. LRS: at least 99.9 percent (99 percent for Cool access tier); ZRS: at least 99.9 percent (99 percent for Cool access tier); GRS: at least 99.9 percent (99 percent for Cool access tier); RA-GRS: at least 99.99 percent (99.9 percent for Cool access tier); GZRS: at least 99.9 percent (99 percent for Cool access tier); RA-GZRS: at least 99.99 percent (99.9 percent for Cool access tier).

  • Availability SLA for write requests. LRS, ZRS, GRS/RA-GRS, and GZRS/RA-GZRS: at least 99.9 percent (99 percent for Cool access tier).

More Info: Azure Storage Guarantees

For more information about Azure Storage guarantees for durability and availability, see https://azure.microsoft.com/support/legal/sla/storage/.

Table 3-2 depicts the durability and availability of data in various scenarios, depending on which type of redundancy is in effect for your storage account.

TABLE 3-2 Durability and availability of data

  • A node within a datacenter becomes unavailable. LRS: Yes; ZRS: Yes; GRS/RA-GRS: Yes; GZRS/RA-GZRS: Yes.

  • An entire datacenter (zonal or non-zonal) becomes unavailable. LRS: No; ZRS: Yes; GRS/RA-GRS: Yes; GZRS/RA-GZRS: Yes.

  • A region-wide outage occurs in the primary region. LRS: No; ZRS: No; GRS/RA-GRS: Yes; GZRS/RA-GZRS: Yes.

  • Read access to the secondary region is available if the primary region becomes unavailable. LRS: No; ZRS: No; GRS/RA-GRS: Yes (with RA-GRS); GZRS/RA-GZRS: Yes (with RA-GZRS).

Note: Account Failover

Account failover is required to restore write availability if the primary region becomes unavailable. For more information, see https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance.

Recommend a high-availability solution for relational databases

All applications need databases to store business data for the functionalities and features they provide to end-users. It’s important that these apps, and their respective databases, be highly available and recoverable.

Following are four major potential disruption scenarios that could affect a database's availability and, in turn, the application:

  • Local hardware or software failures affecting the database node: An example of such a scenario is a disk-drive failure.

  • Data corruption or deletion caused by an application bug or human error: Such failures are application-specific and typically cannot be detected by the database service.

  • A datacenter-wide outage, possibly caused by a natural disaster: This scenario requires some level of geo-redundancy with application failover to an alternate datacenter.

  • Upgrade or maintenance errors: Unanticipated issues during planned infrastructure maintenance or upgrades might require rapid rollback to a previous database state.

Azure SQL Database from the Azure SQL product family provides several business continuity features that you can use to mitigate various unplanned scenarios. For example:

  • Temporal tables allow you to restore row versions from any point in time.

  • Built-in automated backups and Point-in-Time Restore enable you to restore a complete database within the configured retention period of up to 35 days in the past.

  • You can restore a deleted database to the point at which it was deleted if the server has not been deleted.

  • Long-term backup retention allows you to keep backups for up to 10 years. This is in limited public preview for SQL Managed Instance.

  • Active geo-replication is another out-of-the-box feature that creates readable replicas of your database and allows you to manually fail over to any replica in case of a datacenter outage or during an application upgrade.

  • An auto-failover group allows for the recovery of a group of databases in a secondary region if a regional disaster occurs or if there is a full or partial loss of an Azure SQL database or Azure SQL Managed Instance.
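As a quick reference, the disruption scenarios above can be paired with the Azure SQL Database features that mitigate them. The dictionary below is our own shorthand summary of this section, not an Azure API; the keys and feature labels are illustrative.

```python
# Shorthand mapping (our own labels) of the disruption scenarios discussed
# above to the Azure SQL Database business-continuity features that address them.
MITIGATIONS = {
    "data_corruption_or_deletion": [
        "temporal tables",
        "point-in-time restore (up to 35 days)",
        "deleted-database restore",
    ],
    "regional_outage": [
        "active geo-replication",
        "auto-failover groups",
    ],
    "failed_upgrade_or_maintenance": [
        "point-in-time restore (up to 35 days)",
        "manual failover to a geo-replica",
    ],
}

def mitigations_for(scenario):
    """Return the list of mitigating features for a scenario, or [] if unknown."""
    return MITIGATIONS.get(scenario, [])

print(mitigations_for("regional_outage"))
```

A lookup like this is also a useful exam-study device: for each failure mode in a question, ask which feature in the mapping satisfies the stated RTO and RPO.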

Chapter summary

  • As part of your BC/DR plan, identify the RTOs, RPOs, and RLOs for your applications.

  • ASR gives you the flexibility to fail over to Azure if a disaster occurs and to fail back to on-premises machines after the event is over.

  • AKS is a popular service for running container workloads. To maximize uptime for AKS, deploy AKS clusters in multiple regions and use geo-replication for container image registries.

  • Azure Backup provides simple, secure, cost-effective solutions to back up your compute, databases, and unstructured data.

  • Availability zones are distinct physical locations within an Azure region, each made up of one or more datacenters with independent power, cooling, and networking. The physical separation of availability zones within a region limits the impact of zone failures on applications and data.

  • Autoscaling is a process of dynamically allocating computing resources to match performance requirements.

  • Azure stores multiple copies of your Azure Storage data to protect against planned and unplanned incidents, including transient hardware failures, network or power outages, and substantial natural disasters.

  • Azure Storage offers a durable platform and multiple geo-redundant storage options to ensure high availability. Storage account options with geo-redundant replication such as GRS and GZRS first synchronously replicate data in the primary region and then asynchronously replicate data to a secondary region at least a few hundred miles away.

  • GZRS/RA-GZRS provides the highest availability and durability, but at a higher cost.

Thought experiment

Now it is time to validate your skills and knowledge of the concepts you learned in this chapter. You can find answers to this thought experiment in the next section, “Thought experiment answers.”

You have been hired as a Cloud Solution Architect for Contoso. You must design disaster recovery and high-availability strategies for the company’s internally hosted applications, databases, and storage. The company has a primary office in Seattle and branch offices in New York, Chicago, and Dallas. As part of this project, you plan to move three on-premises applications, each belonging to a different department, to the cloud. Each application has different business continuity requirements:

  • Sales department. The application must be able to fail over to a secondary datacenter.

  • HR department. Application data must be retained for three years. From a disaster recovery perspective, the application must be able to run from a different Azure region with an RTO of 15 minutes.

  • Supply-chain department. The application must be able to restore data at a granular level. The RTO requirement is six hours.

You must recommend which services should be used by each department. While there could be multiple answers, choose the options that help minimize cost.

  1. Which of the following would you use for the sales department?

     A. Azure Backup only

     B. ASR only

     C. ASR and Azure Migrate

     D. ASR and Azure Backup

  2. Which of the following services would you recommend for the HR department?

     A. Azure Backup only

     B. ASR only

     C. ASR and Azure Migrate

     D. ASR and Azure Backup

  3. Which of the following services would you recommend for the supply-chain department?

     A. Azure Backup only

     B. ASR only

     C. ASR and Azure Migrate

     D. ASR and Azure Backup

Thought experiment answers

This section contains the answers to the “Thought experiment” questions.

  1. Which of the following would you use for the sales department?

     Answer: B (ASR only)

     Explanation: You can use ASR to fail over the application to a secondary region, which is the sales department’s only requirement. The other options are incorrect: Azure Backup alone does not provide failover, and Azure Migrate is used to migrate machines (such as VMware VMs) to Azure, not for ongoing disaster recovery.

  2. Which of the following services would you recommend for the HR department?

     Answer: D (ASR and Azure Backup)

     Explanation: As stated in the requirements, backups must be retained for three years, so you need Azure Backup. You also need ASR so that the application can run in another Azure region with an RTO of 15 minutes in case of a disaster. Only the combination of Azure Backup and ASR meets both requirements.

  3. Which of the following services would you recommend for the supply-chain department?

     Answer: A (Azure Backup only)

     Explanation: The requirement is granular data restore with an RTO of six hours, which Azure Backup satisfies. Azure Backup automatically creates recovery points as subsequent backups are taken, so you can run restore operations from any available recovery point. ASR is unnecessary here and would add cost.
