Chapter 22. Creating a Fault-Tolerant Environment

Because more and more businesses rely on constant and uninterrupted access to their IT network resources, many technologies have been created to help ensure continuous uptime of servers and applications. Windows Server 2003 is in line with these technologies, meeting the demands of the modern business model that seeks to provide a fault-tolerant network environment where unexpected downtime is a thing of the past. By combining Windows Server 2003 technologies with the appropriate hardware and general best practices, IT organizations can realize both file-level and system-level fault tolerance to maintain a high level of availability for their business-critical applications and network services.

This chapter highlights the features available in Windows Server 2003 that target fault tolerance and provides best practices for implementing and applying them in the IT environment. On the file-system side, in addition to proper disk management and antivirus protection, Windows Server 2003 provides Distributed File System (DFS), Volume Shadow Copy, and Remote Storage technologies. For system-level fault tolerance, Windows Server 2003 includes the Microsoft Cluster Service (MSCS) and Network Load Balancing (NLB) technologies to provide redundancy and failover capabilities.

Optimizing Disk Management for Fault Tolerance

System administrators have long relied on Redundant Array of Inexpensive Disks (RAID) technologies to provide fault tolerance for their server disk resources. And though the technology is a familiar mainstay in server management, its importance should not be overlooked. There are two ways to leverage RAID to optimize disk management in Windows Server 2003: creating RAID arrays using the disk controller's configuration utilities, or creating them using dynamic disk configuration from within the Windows Server 2003 operating system.

Hardware-based RAID Solutions

Using two or more disks, different RAID-level arrays can be configured to provide fault tolerance that can withstand disk failures and still provide uninterrupted disk access. Hardware-based RAID is achieved when a separate RAID disk controller is used to configure and manage the disks participating in the RAID array. The RAID controller stores the information on the array configuration, including disk membership and status.

Implementing hardware-level RAID configured and stored on the disk controller is preferred over the software-level RAID configurable within Windows Server 2003 Disk Management because the disk management and synchronization processes are offloaded to the RAID controller. With those processes handled by the controller rather than the operating system, the server performs better overall.

Another reason hardware-level RAID is a best practice is that the configuration of the disks does not depend on the operating system, which gives administrators greater flexibility when it comes to recovering server systems and performing upgrades.

Because there are many hardware-based RAID solutions available, it is important to refer to the manufacturer’s documentation on creating RAID arrays to understand the particular functions and peculiarities of the RAID disk controller in use.

Using Dynamic Disk RAID Configurations

Windows Server 2003 supports two types of disks: basic and dynamic. Basic disks are backward compatible, meaning that basic partitions formatted with FAT can be accessed by previous Microsoft operating systems such as MS-DOS and Windows 95, and basic partitions formatted with NTFS can be accessed by Windows NT, Windows 2000, and Windows Server 2003.

Dynamic disks are managed by the operating system and provide several configuration options, including software-based RAID sets and the capability to extend volumes across multiple disks. Though there are several configuration options, including spanned and striped volumes, the only fault-tolerant dynamic disk configurations are mirrored volumes (RAID 1) and RAID 5 volumes, as described in the following list:

  • Mirrored Volume (RAID 1). Mirrored volumes require two separate disks, and the space allocated on each disk must be equal. Mirrored sets duplicate data across both disks and can withstand a single disk failure. Because the mirrored volume is an exact replica of the first disk, the space capacity of a mirrored set is limited to half of the total allocated disk space.

  • RAID 5 Volume. Software-based RAID 5 volumes require three or more disks and provide faster read performance than a single disk. The space allocated on each disk of the RAID set must be equal. RAID 5 sets can withstand a single disk failure and can continue to provide access to data using only the remaining disks. This capability is achieved by reserving a portion of each disk’s allocated space, equal in total to the capacity of one disk, to store parity information that can be used to rebuild a failed disk or to continue to provide data access. For example, three 100GB disks in a RAID 5 set yield roughly 200GB of usable space.

Using the Disk Management MMC

Most disk-related administrative tasks can be performed using the Disk Management MMC snap-in. This tool is located in the Computer Management console, but the standalone snap-in can also be added in a separate Microsoft Management Console window. Disk Management is used to identify disks, define disk volumes, and format the volumes.

New Feature in the Windows Server 2003 Disk Management Console

A new feature in the Windows Server 2003 Disk Management console enables administrators to also manage disks on remote machines.

To use Disk Management to create a software-based RAID set, the disks that will participate in the array must first be converted to dynamic disks. This is a simple process in which the administrator right-clicks each disk in question and chooses Convert to Dynamic Disk, as shown in Figure 22.1.

Figure 22.1. Convert basic disks to dynamic.

The system will require a reboot to complete the conversion if the system volume is being converted to dynamic. After the disks are converted, perform the following steps to set up a mirrored volume (RAID 1) of the system volume:

  1. Click Start, All Programs, Administrative Tools, Computer Management.

  2. In the left pane, if it is not already expanded, double-click Computer Management (local).

  3. Click the plus sign next to Storage, and select Disk Management.

  4. In the right pane, right-click the system volume and choose Add Mirror.

  5. Choose the disk on which to create the mirror for the system volume and click Add Mirror.

  6. The volumes on each disk start a synchronization process that might take a few minutes or longer, depending on the size of the system volume and the types of disks being used. When the mirrored volume’s status changes from Re-synching to Healthy, select File, Exit in the Computer Management console to close the window.

Using the Diskpart Command-Line Utility

Diskpart.exe is a flexible command-line disk management utility that performs most of the functions available in the Disk Management console. Using Diskpart.exe, both basic and dynamic volumes can be extended, whereas Disk Management can extend only dynamic volumes. The real value of Diskpart.exe is that it can be run with a script to automate volume management. This is particularly useful when automating server builds across several servers that have the same characteristics. For more information on automated server installations, refer to Chapter 11, “Implementing Windows Server 2003.”

Extend a Basic Volume Using Diskpart.exe

If you want to extend a basic volume using diskpart.exe, the unallocated disk space must be on the same disk as the original volume and must be contiguous with the volume you are extending. Otherwise, the command will fail.

The syntax for Diskpart.exe is as follows:

Diskpart.exe /s script

The script referenced by the utility is a text file that will include the specific instructions necessary for the desired function. For example, to extend a volume using unallocated space on the same disk, the associated script file would look like this:

Select Volume 2
Extend
Exit
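
Diskpart.exe can also script the creation of a software mirror. The following is a minimal sketch, assuming that volume 0 is the system volume and that disk 1 is a second dynamic disk with enough unallocated space; verify the actual numbers with the List Volume and List Disk commands before running the script with Diskpart.exe /s mirror.txt:

Rem mirror.txt - add a mirror of volume 0 onto disk 1
Select Volume 0
Add Disk=1
Exit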

Maximizing Redundancy and Flexibility with Distributed File System

One method for creating low-cost failover and redundancy of file shares is leveraging Microsoft’s Distributed File System (DFS). A feature introduced in Windows NT 4, DFS improves file share availability by providing a single unified namespace to access shared folders hosted across different servers. Because the same data can be synchronized through replication across multiple servers, there is no single point of failure for access to the data.

Further, because a DFS root can support multiple targets physically distributed across a network, the network load for accessing particular file shares can be load-balanced rather than taxing a single server.

DFS also improves the users’ experience for accessing files because the user needs to remember only a single server or domain name and share name to connect to a DFS-shared folder. Because domain-based DFS, available from Windows 2000, is published in Active Directory, the DFS namespace is always visible to users in the domain. Moreover, if a server hosting a particular share becomes unavailable, DFS will use the site and costing information in Active Directory to route the user to the next closest server.

Finally, because DFS uses NTFS and file sharing permissions, administrators can improve security of data by ensuring only authorized users have access to DFS shares.

The next section explains new DFS features available in Windows Server 2003 and provides best practices for deploying DFS in a Windows Server 2003 network environment.

New DFS Features in Windows Server 2003

Administrators deploying DFS in Windows NT 4, or even in a Windows 2000 Active Directory, often found that the technology promised more than it could deliver. With Windows Server 2003, such problems have been worked out, startup and configuration times have been reduced, memory usage has been improved, and new features have been added.

Closest Site Selection

One such enhancement deals with how DFS selects targets based on Active Directory site information. When a client accesses a DFS namespace, DFS connects the client to a DFS root target in the client’s site. In Windows 2000, if there are no available root targets in the client’s site, the client is connected at random to a DFS root target in any site.

Intersite Topology Generator (ISTG) Must Be Running

For Closest Site Selection to work on link targets, Intersite Topology Generator (ISTG) must be running on Windows Server 2003. All domain controllers in a domain must be running Windows Server 2003 for Closest Site Selection to work on domain root targets.

With Windows Server 2003, if a root target is not available in the client’s site, DFS refers the client to a target in the next closest site, and so on. This feature, called Closest Site Selection, improves upon simple site awareness by automatically connecting the client to the closest available DFS target.

To enable Closest Site Selection, use the DFSutil.exe command-line tool that is installed with the Windows Server 2003 support tools. The syntax for the command is as follows:

Dfsutil /root:\\<servername>\<dfsrootname> /sitecosting /enable
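
After the command completes, the change can be verified with the same switch. The following is a sketch, assuming a hypothetical domain-based root named \\companyabc.com\apps and that the /display option of the support tools version of DFSutil.exe reports the current SiteCosting state:

Dfsutil /root:\\companyabc.com\apps /sitecosting /display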

Multiple Roots per Server

With Windows 2000 DFS, administrators were limited to creating a single DFS root per server. With Windows Server 2003, a server can contain multiple DFS roots. This new feature provides an immediate server and namespace consolidation opportunity for existing Windows 2000 DFS deployments.

More importantly, Windows Server 2003 provides an opportunity to set up different DFS roots on a single server that each have unique security settings. For companies that want to delegate administration of different DFS roots to particular organizational groups, this can now be accomplished from a single server.

With Windows Server 2003 Enterprise or Datacenter Edition, server clusters can support multiple DFS roots. Multiple DFS roots can exist in multiple resource groups, and each group can be hosted on a different node in the cluster. Microsoft Cluster Service (MSCS) is discussed in a later section of this chapter.

Administration Improvements

Windows Server 2003 provides a new DFS Microsoft Management Console (MMC) snap-in that eases the administration of the File Replication Service (FRS). Replication of DFS targets can now be configured via a wizard that includes a built-in topology generator as shown in Figure 22.2.

Figure 22.2. Configuring DFS Replication topology.

DFS and Security

Although DFS in Windows Server 2003 enables delegation of administration for assigning permissions to DFS roots and links, it does not provide any additional security to the actual DFS root or link targets. What this means for administrators is that the permissions will need to be set on the NTFS shares manually to provide proper access to files and folders within DFS targets.

Combining the Functionality of DFS with Software Distribution Via Active Directory Group Policies

When combining the functionality of DFS with software distribution via Active Directory Group Policies, it is important to set appropriate NTFS permissions on the shares that contain the software installation packages. If Group Policies are used to push software to computer accounts from DFS shares, make sure those computer accounts have NTFS permission to the file shares.

Moreover, when multiple targets are involved, it is important for administrators to duplicate the NTFS permissions exactly for each additional target. Otherwise, administrators might inadvertently grant users elevated privileges or deny users access completely. To prevent this problem, administrators should create the target file share and configure the share and NTFS permissions manually at the shared folder level before defining the share as a DFS target.
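
One way to keep NTFS permissions consistent across targets is to apply them from the command line on each server hosting a target. The following is a minimal sketch using the built-in cacls.exe utility; the folder path and group name are hypothetical, /T applies the change to the entire folder tree, /E edits the existing ACL rather than replacing it, and :R grants Read access:

cacls D:\Shares\Apps /T /E /G COMPANYABC\Software-Install:R

Running the same command with the same path structure on each server hosting a target helps ensure that every target presents identical permissions.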

Simplifying Fault Tolerance with Volume Shadow Copy

When users delete a file from their workstations and need to recover it, they can restore the file from the Recycle Bin (assuming the Recycle Bin has not been emptied). In a traditional networking environment, when that same user deletes a file on a network share, the file is gone. To restore it, a call must be made to the help desk, an administrator must load a backup tape, and a restore process ensues. This typical routine can consume a great deal of time and effort. Further, if the file in question is a business-critical database, the company might experience costly downtime waiting for the file to be restored.

Windows Server 2003 provides a solution to this downtime scenario with the Volume Shadow Copy Service (VSS). VSS is a new technology that provides file system–based fault tolerance that does not rely on the typical backup-and-restore routine. VSS is used to perform a point-in-time backup of an entire NTFS volume, including open files, to a separate area of local disk storage. The process completes in a very short period of time but is powerful enough to be used to restore an entire volume, if necessary. VSS can be scheduled to automatically back up a volume once, twice, or several times a day.

Configuring Volume Shadow Copies

If shadow copies of a server’s volume are created on a separate local disk, administrators can avoid having to restore data from a backup tape or library. Volume Shadow Copy is already installed and is automatically available using NTFS-formatted volumes.

To enable and configure shadow copies, follow these steps:

  1. Log on to the desired server using an account with Local Administrator access.

  2. Click Start, All Programs, Administrative Tools, Computer Management.

  3. In the left pane, if it is not already expanded, double-click Computer Management (local).

  4. Click the plus sign next to Storage.

  5. Select Disk Management.

  6. Right-click Disk Management, select All Tasks, and select Configure Shadow Copies.

  7. On the Shadow Copies page, select a single volume for which you want to enable shadow copies and click Settings.

  8. The Settings page enables you to choose an alternative volume to store the shadow copies. Select the desired volume for the shadow copy, as shown in Figure 22.4.

    Figure 22.4. Selecting an alternative drive to store the shadow copies.

  9. Configure the maximum amount of disk space that will be allocated to shadow copies.

  10. The default schedule for shadow copies is twice a day at 7 a.m. and 12 p.m. If this does not meet the business requirements, click the Schedule button and configure a custom schedule.

  11. Click OK to enable shadow copies on that volume and to return to the Shadow Copies page.

  12. If necessary, select the next volume and enable shadow copying; otherwise, select the enabled volume and immediately create a shadow copy by clicking the Create Now button.

  13. If necessary, select the next volume and immediately create a shadow copy by clicking the Create Now button.

  14. After the shadow copies are created, click OK to close the Shadow Copies page, close the Computer Management console, and log off the server.
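
The same shadow copy tasks can also be performed from the command line with the vssadmin.exe utility included with Windows Server 2003. The following is a sketch, assuming E: is the data volume and F: is the volume chosen to hold the shadow copy storage; the copy schedule itself is still configured through the GUI or a scheduled task:

Rem allocate shadow copy storage for E: on F:, capped at 4GB
vssadmin Add ShadowStorage /For=E: /On=F: /MaxSize=4GB
Rem create a shadow copy of E: immediately
vssadmin Create Shadow /For=E:
Rem list the shadow copies that now exist
vssadmin List Shadows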

Restoring Data from a Shadow Copy

The server administrator or a standard user who has been granted permissions can recover data using previously created shadow copies. The files stored in the shadow copy cannot be accessed directly, but they can be accessed by connecting to the share on the volume for which a shadow copy has been created.

To recover data from a file share, follow these steps:

  1. Log on to a Windows Server 2003 system or Windows XP SP1 workstation with either Administrator rights or with a user account that has permissions to restore the files from the shadow copy.

  2. Click Start, Run.

  3. At the Run prompt, type \\servername\sharename, where servername represents the NetBIOS or fully qualified domain name of the server hosting the file share. The share must exist on a volume in which a shadow copy has already been created.

  4. In the File and Folder Tasks window, select View Previous Versions, as shown in Figure 22.5.

    Figure 22.5. Using shadow copies to view previous file versions.

  5. When the window opens to the Previous Versions property page for the share, select the shadow copy from which you want to restore and click View.

  6. An Explorer window then opens, displaying the contents of the share when the shadow copy was made. If you want to restore only a single file, locate the file, right-click it, and select Copy.

  7. Close the Explorer window.

  8. Close the Share Property pages by clicking OK at the bottom of the window.

  9. Back in the actual file share window, browse to the original location of the file, right-click on a blank spot in the window, and select Paste.

  10. Close the file share window.

Optimizing Disk Utilization with Remote Storage

Another fault-tolerance technique provided in Windows Server 2003 to protect the file system is Remote Storage. When it is installed and configured, Remote Storage has the ability to migrate eligible files from an NTFS volume to a library of tape or magneto-optical media, thus freeing up space on the production server’s managed volume. Eligibility is determined in the configuration of Remote Storage based on certain criteria: the percentage of free space on the volume, the size of the files, and the time period over which the files have not been accessed.

When Remote Storage migrates a file, it is replaced on the volume with a file link called a junction point. Junction points take up very little room, which reduces the amount of used disk space but leaves a way for the data to be accessed later in its original location. When a junction point is accessed, it triggers the Remote Storage service to retrieve the file that was migrated to tape.

Not Available on Windows XP or the Standard and Web Editions of Windows Server 2003

Remote Storage functionality is only available in Windows Server 2003 Enterprise and Datacenter editions. It is not available on Windows XP or the standard and Web editions of Windows Server 2003.

The next section explains how Remote Storage is configured and provides best practices for its use.

Configuring Remote Storage

Remote Storage is not installed by default in Windows Server 2003, but is easily added from the install media through the familiar Add/Remove Windows Components section of the Add or Remove Programs applet. Once installed, the administrator must configure the backup device that will be used, allocate backup media, and then configure the settings Remote Storage will use to determine whether files should be migrated to the media.

Remote Storage Supports All SCSI Class 4mm, 8mm, DLT, and Magneto-Optical Devices

Remote Storage supports all SCSI class 4mm, 8mm, DLT, and magneto-optical devices that are supported by Removable Storage. Using Remote Storage with Exabyte 8200 tape libraries is not recommended. Remote Storage does not support QIC tape libraries or rewritable compact disc and DVD formats.

Configuring the Backup Device

Ideally, the backup device used with Remote Storage will be a tape library, so that file retrieval from junction points can occur automatically.

To enable a device, follow these steps:

  1. Install the backup device or library on the Windows Server 2003 system. Use the backup device manufacturer’s documentation to accomplish this process.

  2. After the backup device is connected, boot up the server and log on using an account with Local Administrator access.

  3. Click Start, All Programs, Administrative Tools, Computer Management.

  4. In the left pane, if it is not already expanded, double-click Computer Management (local).

  5. Click the plus sign next to Storage.

  6. Click the plus sign next to Removable Storage.

  7. Click the plus sign next to Libraries.

  8. Right-click the library (backup device) and select Properties.

  9. On the General tab of the Device Properties page, check the Enable Drive box, and click OK.

Allocating Media for Remote Storage

After the backup device is configured, tape media needs to be allocated for Remote Storage usage. New, unused media inserted into the device is placed in the free media pool. Previously used media will be placed in the import, unrecognized, or backup media pools. Remote Storage uses the Remote Storage media pool, but will look in the free media pool if it does not find available media in Remote Storage.

Specify the Type

Remote Storage can support only a single tape or disk type for use as Remote Storage. Specify the type during the Remote Storage Setup Wizard process.

To inventory a backup device and allocate media for Remote Storage, follow these steps:

  1. Locate the desired device, as outlined in the preceding section. Then right-click the device and choose Inventory.

  2. After the device completes the inventory process, select the backup device in the left pane. The media will then be listed in the right pane.

  3. Right-click the media listed in the right pane and select Properties.

  4. On the Media tab of the Media Properties page, note the media pool membership in the Location section. Figure 22.6 shows media that are part of the ImportDLT media pool.

    Figure 22.6. Removable media in the ImportDLT media pool.

  5. Click Cancel to close the Media Properties page.

Configuring Remote Storage Settings

After the backup device and media are properly configured and allocated, a volume can be managed by configuring Remote Storage settings. To configure a managed volume, follow these steps:

  1. Click Start, All Programs, Administrative Tools, Remote Storage.

  2. If this is the first time the Remote Storage console has been opened or no volumes on the server have been configured for Remote Storage management, the Remote Storage Wizard will begin. Click Next on the Welcome screen to continue.

  3. On the Volume Management page, choose whether to manage all volumes or manage only selected volumes by selecting the appropriate radio button. In this example, select Manage Selected Volumes, and click Next.

  4. Select the volume to manage and click Next.

  5. On the Volume Settings page, enter the amount of free space for the managed volume.

  6. On the same page, configure the minimum file size before it will be migrated by Remote Storage; then configure the number of days a file must remain unaccessed before Remote Storage will make it a possible candidate for migration, and then click Next.

    Figure 22.7 shows a volume setting that will migrate data to Remote Storage when a volume has 10% free space remaining, and the file that will be migrated must be larger than 12KB and must remain unaccessed for 120 days.

    Figure 22.7. Setting typical Remote Storage volume settings.

  7. On the Media Type page, choose the media type associated with the backup device enabled for Remote Storage to use. Choose a media type from the Media Types pull-down menu.

  8. On the next page, you can configure a schedule to perform the file copy. The default is to run at 2 a.m. seven days a week. Click the Change Schedule button to configure a custom schedule or click Next to accept the default schedule.

  9. Click Finish on the Completing the Remote Storage Wizard page to complete the process.

Optimizing Clusters to Simplify Administrative Overhead

Microsoft Cluster Service (MSCS) is included with the Enterprise and Datacenter versions of Windows Server 2003. MSCS provides system fault tolerance through a process called failover. When a system fails or is unable to respond to client requests, the clustered services are taken offline and moved from the failed server to another available server, where they are brought online and begin responding to existing and new client requests.

Cluster Support

Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition both support clusters of up to eight nodes. A cluster cannot be made up of nodes running both Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition.

MSCS is best used to provide fault tolerance to such resources as file shares, print queues, e-mail or database services, and back-end applications. Applications and other services defined and monitored by the cluster, in addition to cluster hardware, are called cluster resources.

Choosing the Best Cluster Configuration Model

MSCS can be deployed in one of three different configuration models: single-quorum device cluster, single-node cluster, and majority node set cluster. Choosing the best model depends on the type of service that will be clustered and the type of fault tolerance intended.

The Single-Quorum Device Cluster

The most common model adopted for clustering is the Single-Quorum Device Cluster. The defining characteristic of this model is that each node in the cluster is connected to a shared storage device that houses a single instance of the quorum, or cluster configuration, data.

This configuration is well suited to providing fault tolerance to applications and services that access large amounts of mission-critical data. Examples include file, messaging, and database servers. When the cluster encounters a problem with a cluster group containing a shared storage disk resource, the group is failed over to the next node with little or no noticeable disruption to the end user.

The Single-Node Cluster

A single-node cluster, as its name suggests, utilizes only a single node. In addition to running solely on local disks, a single-node cluster has the ability to use shared storage. A single-node cluster is primarily created as a first step toward creating a single-quorum cluster. Because only a single server with local resources is needed, single-node clusters are also useful for developing and testing cluster-aware applications.

Because the single-node cluster only contains one node, there is no failover when the server goes down.

The Majority Node Set Cluster

The Majority Node Set (MNS) cluster can use shared storage devices but it does not depend on the shared resource for configuration data as does the single-quorum cluster. Each node in an MNS cluster maintains a local copy of the quorum device data. As such, MNS clusters can be deployed across a WAN in a geographically distributed environment to provide fault tolerance to two distinct sites in an IT organization.

In situations where the cluster needs to fail over across sites, the two sites must either be bridged, or a virtual private network (VPN) or Network Address Translation (NAT) solution must be installed and configured, for proper recovery to occur. The latency between the cluster nodes for private communication must not exceed 500 milliseconds; otherwise, the cluster will go into a failed state.

For an MNS cluster to remain up and running, more than half of the nodes in the cluster must be operational. For example, in a four-node cluster, three nodes must be operational; a three-node cluster requires two operational nodes.

Installing Microsoft Cluster Service

The Cluster Service is installed by default in the Enterprise and Datacenter editions of the operating system. Both the GUI-based Cluster Administrator and the command-line Cluster.exe utility can be used to create and manage clusters. In either case, Microsoft recommends that the Manage Your Server and Configure Your Server wizards not be used to configure cluster nodes.

To install the first node in the cluster using the Cluster Administrator, perform the following steps:

  1. Shut down both the cluster nodes and shared storage devices.

  2. Connect cables as required between the cluster nodes and shared storage devices.

  3. Connect each node’s NICs to a network switch or hub using appropriate network cables.

  4. If a shared storage device is being used, power on the shared storage device and wait for bootup to complete.

  5. Boot up the first node in the cluster. If a shared SCSI disk will be used, configure the SCSI adapter ID on each cluster node to a different number. For example, use ID 6 for node 1 and ID 7 for node 2.

  6. Log on with an account that has Local Administrator privileges.

  7. If the server is not a member of a domain, add the server to the correct domain and reboot as necessary.

  8. Configure each network card in the node with the correct network IP address information. Network cards that will be used only for private communication should have only an IP address and subnet mask configured. Default Gateway, DNS, and WINS servers should not be configured. Also, uncheck the Register This Connection’s Address in DNS box, as shown in Figure 22.8, on the DNS tab of the Advanced TCP/IP Settings page. For network cards that will support public or mixed networks, configure all TCP/IP settings as they would normally be configured.

    Figure 22.8. TCP/IP DNS configuration settings.

  9. Log on to the server using an account that has Local Administrator privileges.

  10. Click Start, Administrative Tools, Cluster Administrator.

  11. When the Cluster Administrator opens, choose Create New Cluster from the Action drop-down menu, and click OK.

  12. Click Next on the New Server Cluster Wizard Welcome screen to continue.

  13. Choose the correct domain from the Domain pull-down menu.

  14. Type the cluster name in the Cluster Name text box and click Next to continue.

    Cluster Service Account

    The Cluster Service account needs to be only a regular domain user, but specifying this account as the Cluster Service account gives it Local Administrator privileges on the cluster node and also grants it a few user rights, including the ability to act as part of the operating system and to add computers to the domain.

  15. Type the name of the cluster node and click Next to continue. The wizard defaults to the local server, but clusters can be configured remotely. The cluster analyzer analyzes the node for functionality and cluster requirements. A detailed log containing any errors or warnings that can stop or limit the installation of the Cluster server is generated.

  16. Review the log and make changes as necessary; then click Re-analyze or click Next to continue.

  17. Enter the cluster IP address and click Next.

  18. Enter the Cluster Service account name and password and choose the correct domain. Click Next to continue.

  19. On the Proposed Cluster Configuration page, review the configuration and choose the correct quorum type by clicking the Quorum button, as shown in Figure 22.9.

    Figure 22.9. Choosing the cluster quorum configuration.

    • To create an MNS cluster, click the Quorum button on the Proposed Cluster Configuration page, choose Majority Node Set, and click OK.

    • If a SAN is connected to the cluster node, the Cluster Administrator will automatically choose the smallest basic NTFS volume on the shared storage device. Make sure the correct disk has been chosen and click OK.

    • To configure a single node cluster with no shared storage, choose the Local Quorum resource and click OK.

  20. Click Next to complete the cluster installation.

  21. After the cluster is created, click Next and then Finish to close the New Server Cluster Wizard and return to the Cluster Administrator.

After the cluster is created on the first node, additional nodes can be added. To add a node to a cluster, perform the following steps:

  1. Log on to the desired cluster node using an account that has Local Administrator privileges.

  2. Click Start, Administrative Tools, Cluster Administrator.

  3. When the Cluster Administrator opens, choose Add Nodes to a Cluster and type the name of the cluster in the Cluster Name text box. Click OK to continue.

  4. When the Add Nodes Wizard appears, click Next to continue.

  5. Type in the server name of the next node and click Add.

  6. Repeat the preceding steps until all the additional nodes are in the Selected Computer text box. Click Next to continue. The cluster analyzer will then analyze the additional nodes for functionality and cluster requirements.

  7. Review the log and make changes as necessary; then click Re-analyze or click Next to continue.

  8. Enter the Cluster Service account password and click Next to continue.

  9. Review the configuration on the Proposed Cluster Configuration page and click Next to configure the cluster. After this is finished, click Next and then Finish to complete the additional node installation.

  10. Select File, Close to exit the Cluster Administrator.
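
After the additional nodes are joined, the state of the cluster can be confirmed from the command line with Cluster.exe. This is a sketch; the cluster name CLUSTER1 is hypothetical:

Rem display the state of each node and of each cluster group
cluster /cluster:CLUSTER1 node /status
cluster /cluster:CLUSTER1 group /status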

Configuring Failover and Failback

Although failover is configured automatically on clusters of two or more nodes, failback needs to be configured manually. Failback is designed to allow a preferred server, assuming it is available, to always run a cluster group. Failover functionality can be configured manually as well to set a threshold number of failovers after which the cluster group is changed to a failed state.

Configuring a failover/failback process automates how the server cluster responds when a node fails. To create a failover/failback process, perform the following steps:

  1. Click Start, Administrative Tools, Cluster Administrator.

  2. When the Cluster Administrator opens, choose Open Connection to Cluster and type the name of the cluster in the Cluster Name text box. Click OK to continue. If the local machine is part of the cluster, enter . (period) as the cluster name, and the program will connect to the cluster running on the local machine.

  3. Right-click the appropriate cluster group and select Properties.

  4. Select the Failover tab and set the maximum number of failovers allowed during a predefined period of time. When the number of failovers is exceeded within the Period interval, the Cluster Service will change the group to a failed state.

  5. Select the Failback tab, choose the Allow Failback radio button, and set time options for allowing failback.

  6. Click OK to complete the failback configuration.

  7. Select File, Close to exit the Cluster Administrator.
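
The same thresholds can also be set by modifying the cluster group's common properties with Cluster.exe. The following is a sketch rather than a definitive procedure; the cluster and group names are hypothetical, and the values mirror a configuration that allows at most 10 failovers in a 6-hour period and permits failback only between 1 a.m. and 3 a.m.:

Rem failover threshold and period for the group
cluster /cluster:CLUSTER1 group "File Share Group" /prop FailoverThreshold=10
cluster /cluster:CLUSTER1 group "File Share Group" /prop FailoverPeriod=6
Rem allow failback to the preferred node within a time window
cluster /cluster:CLUSTER1 group "File Share Group" /prop AutoFailbackType=1
cluster /cluster:CLUSTER1 group "File Share Group" /prop FailbackWindowStart=1
cluster /cluster:CLUSTER1 group "File Share Group" /prop FailbackWindowEnd=3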

Leveraging Network Load Balancing for Improved Availability

Another method used to provide fault tolerance to system services in Windows Server 2003 is Microsoft’s second clustering technology, Network Load Balancing (NLB). An NLB cluster works by distributing network traffic targeted at the cluster across a group of host servers, each running the clustered service. The load weight to be handled by each host can be configured as necessary. Hosts can be added dynamically to the cluster to handle increased load. Additionally, Network Load Balancing can direct all traffic to a designated single host, called the default host. Network Load Balancing allows all the computers in the cluster to be addressed by the same set of cluster IP addresses (while also maintaining their existing unique, dedicated IP addresses).

Whereas MSCS is intended primarily for clustering stateful services with dynamic content, such as database, e-mail, and file and print services, NLB is best used for clustering services that provide static content. Good candidates for NLB include Terminal Services, VPN services, proxy services, Web server applications, and streaming media services.

Choosing a Network Load Balancing Model

When an NLB cluster is created, a general port rule for the cluster is also created to define the type of network traffic that will be load-balanced. Additionally, the administrator will need to choose an operational mode, either unicast or multicast, for the cluster. Within the port rule, three types of filtering modes are available: Single Host, Disable Port Range, and Multiple Host. The combination of the operational mode with a filtering mode defines the NLB model for the cluster.

Most NLB clusters will leverage the unicast operational mode unless the functionality delivered by the cluster is specifically multicast-based, such as streaming media, Internet radio, or Internet training courses. NLB does not support a mixed unicast/multicast environment within a single cluster. Within each cluster, all network adapters in that cluster must be configured as either multicast or unicast.

The filtering modes are defined as follows:

  • Single Host. This filtering mode directs the specified network traffic to a single host. For example, in an IIS Web farm in which only one server contains the SSL certificate for a secure Web site, the single host port rule will direct port TCP 443 (SSL port) traffic to that particular server.

  • Disable Port Range. This filtering mode specifies ports that the cluster will not listen on, dropping such packets without investigation. Disabling ports that do not need to be load balanced secures and enhances the performance of NLB clusters.

  • Multiple Host. This is the default filtering mode, and it allows network traffic to be handled by all the nodes in the cluster. Application requirements then determine the multiple host affinity configuration.

There are three types of multiple host affinities:

  • None. This affinity type can send a single client’s requests to different servers in the cluster during the session. This can speed up server response times but is well suited only for serving static data to clients. This affinity type works well for general Web browsing and read-only file and FTP servers.

  • Class C. This affinity type routes traffic from a particular class C address space to a single NLB cluster node. This mode is not used often but can accommodate client sessions that require stateful data. This affinity does not work well if all the client requests are proxied through a single firewall, because all those requests would then be directed to a single node.

  • Single. This affinity type is the most widely used. After the initial request is received by the cluster nodes from a particular client, that node will handle every request from that client until the session is completed. This affinity type can accommodate sessions that require stateful data.

Creating a Network Load Balancing Cluster

The Network Load Balancing Manager is a new tool in Windows Server 2003 that is used for creating and managing NLB clusters. Administrators still have the ability to configure NLB clusters through the network interface card properties page, or through the NLB.EXE command-line utility, though the preferred method is through the NLB Manager. The NLB Manager also simplifies the process by which additional nodes are added to the cluster.

To create a cluster, perform the following steps:

  1. Log on to the local console of a cluster node using an account with Local Administrator privileges.

  2. Click Start, All Programs, Administrative Tools, Network Load Balancing Manager.

  3. Choose Cluster, New.

  4. Enter the cluster IP address and subnet mask of the new cluster.

  5. Enter the fully qualified domain name for the cluster in the Full Internet Name text box.

  6. Choose the mode of operation (unicast will meet most NLB application deployments).

  7. If you want to manage the NLB cluster remotely using the command-line utility NLB.exe, enable remote control and configure a remote control password; then click Next to continue.

  8. Enter any additional IP addresses that will be load-balanced and click Next to continue.

  9. Configure the appropriate port rules for each IP address in the cluster, being careful to set the correct affinity for the load-balanced applications.

  10. After creating all the allowed port rules, create disabled port rules to reduce network overhead for the cluster nodes. Be sure to have a port rule for every possible port and click Next on the Port Rules page after all port rules have been created. Figure 22.10 shows a best practice port rule for an NLB Terminal server implementation.

    Figure 22.10. Port rule settings for NLB configuration.

  11. On the Connect page, type the name of the server you want to add to the cluster in the Host text box and click Connect.

  12. In the Interface Available window, select the NIC that will host the cluster IP address and click Next to continue.

  13. On the Host Parameters page, set the cluster node priority. Each node requires a unique host priority, and because this is the first node in the cluster, leave the default of 1.

  14. If the node will perform non–cluster-related network tasks in the same NIC, enter the dedicated IP address and subnet mask. The default is the IP address already bound on the network card.

  15. For nodes that will join the cluster immediately following the cluster creation and after startup, leave the initial host state to Started. When maintenance is necessary, change the default state of a particular cluster node to Stopped or Suspended to keep the server from joining the cluster following a reboot.

  16. After all the information is entered on the Host Parameters page, click Finish to create the cluster.

  17. When the cluster is ready for release to the production environment, add a host (A) record for the new cluster name to the DNS zone.

Use the Network Load Balancing Manager to add nodes to the existing cluster by performing the following steps:

  1. Click Start, All Programs, Administrative Tools and right-click Network Load Balancing Manager.

  2. Choose the Run-as option and specify an account that has Administrative permissions on the cluster.

  3. Choose Cluster, Connect to Existing.

  4. In the Host text box, type the IP address or name of the cluster and click Connect.

  5. From the Clusters window, select the cluster and click Finish to connect.

  6. In the right pane, right-click the cluster name and choose Add Host to Cluster, as shown in Figure 22.11.

    Figure 22.11. Choosing to add a host to the cluster.

  7. On the Connect page, type the name of the server you want to add to the cluster in the Host text box and click Connect.

  8. In the Interface Available window, select the NIC that will host the cluster IP address and click Next to continue.

  9. On the Host Parameters page, set the cluster node priority. Each node requires a unique host priority; because this is an additional node, accept the next unused priority (for example, 2 for the second node in the cluster).

  10. If the node will perform non-cluster–related network tasks in the same NIC, enter the dedicated IP address and subnet mask. The default is the IP address already bound on the network card.

  11. For nodes that will join the cluster immediately following the cluster creation and after startup, leave the initial host state to Started. When maintenance is necessary, the default state of a particular cluster node can be changed to Stopped or Suspended to keep the server from joining the cluster following a reboot.

  12. After all the information is entered in the Host Parameters page, click Finish to add the node to the cluster.
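
After hosts have been added, NLB.exe can also be used locally on any cluster host to check state or to take the host out of service for maintenance. The commands below operate on the local host; managing a remote host additionally requires remote control to be enabled with a password:

Rem display the current state of the local host's NLB configuration
nlb query
Rem finish serving existing connections, then stop handling new traffic
nlb drainstop
Rem return the local host to the cluster
nlb start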

Realizing Rapid Recovery Using Automated System Recovery (ASR)

Another new feature of Windows Server 2003 is Automated System Recovery (ASR). ASR is more a recovery feature than a fault-tolerance tool, but in an effort to increase server availability in the event of a disaster, ASR can be a valuable component of the overall solution.

The primary goal of ASR is to accelerate recovery time in the event of the loss of a server by bringing a nonbootable system to a state from which a backup and restore application can be executed. This includes configuring the physical storage to its original state, and installing the operating system with all the original settings.

Improving the Disaster Recovery Process

Prior to Windows Server 2003, the process by which a lost server is rebuilt and recovered was a time-consuming ordeal. The old methods usually resembled the following process:

  1. The administrator gets new hardware.

  2. Windows is reinstalled from installation media.

  3. Physical storage is manually configured to match the original system.

  4. Backup and restore application and drivers are installed.

  5. The original operating system configuration is manually restored from backup to recover settings.

  6. The server is rebooted, and services are manually adjusted.

  7. Data is restored.

With ASR, many of the steps in the old model are eliminated or automated. The new recovery method now proceeds as follows:

  1. The administrator gets new hardware.

  2. From the Windows CD, the administrator executes ASR (by pressing F2 on startup).

  3. The administrator inserts other media when prompted.

  4. Data is restored.

ASR is broken down into two parts: backup and restore. The backup portion is executed through the Automated System Recovery Preparation Wizard located in the Backup utility (ntbackup.exe). The Automated System Recovery Preparation Wizard backs up the System State data, system services, and all disks associated with the operating system components. It also creates a floppy disk that contains information about the backup, the disk configurations (including basic and dynamic volumes), and how to accomplish a restore.

The restore portion of ASR is initiated by pressing F2 during the text portion of Windows Server 2003 setup. When the ASR restore process is initiated, ASR reads the disk configurations from the floppy disk and restores all the disk signatures, volumes, and partitions on the disks required to start your computer. ASR then installs a simple installation of Windows and automatically starts to restore from backup using the backup ASR set created by the Automated System Recovery Preparation Wizard.

A Full Data Backup

ASR is primarily involved with restoring the system; it does not back up data. Always include a full data backup in disaster recovery solutions.
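
A full data backup can be scripted with the ntbackup.exe utility included with Windows Server 2003 so that it runs alongside the ASR set. The following is a minimal sketch; the drive letter, job names, and backup file paths are hypothetical:

Rem back up the data volume to a backup file on another disk
ntbackup backup D:\ /j "Nightly full backup" /f "E:\Backups\full.bkf"
Rem back up the System State separately
ntbackup backup systemstate /j "System State backup" /f "E:\Backups\sysstate.bkf"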

To take advantage of ASR in a disaster recovery solution, systems must meet a limited set of requirements:

  • Similar hardware. The restored server must have identical hardware to the original server with the exception of network cards, video cards, and hard drives.

  • Adequate disk space. Obviously, the restored server must have adequate disk space to restore all critical disks from the original server. Disk geometries must also be compatible.

  • ASR state file (asr.sif) must be accessible from a floppy. ASR requires local floppy drive access. Remote or network recovery procedures do not work with ASR.

  • ASR supports FAT volumes of 2.1GB maximum. For volumes larger than 2.1GB, the volume should be formatted with NTFS.

Using ASR to Recover Cluster Services

ASR can be used to recover a cluster node that is damaged because of corrupt or missing system files, cluster registry files, or hard disk failure. To prepare for an ASR recovery of clustered servers, run the Automated System Recovery Preparation Wizard on all nodes of the cluster and make sure that the cluster service is running when the Automated System Recovery backup is run. Make sure that one of the nodes on which the Automated System Recovery Preparation Wizard is run is listed as the owner of the quorum resource while the wizard is running.

In addition to having the ASR disk, recovering a damaged node in a cluster requires the Windows Server 2003 installation media, backup media containing data backup, and potentially the mass storage driver for the new hardware. With these in hand, perform the following steps to recover a damaged cluster node:

  1. Insert the original operating system installation CD into the CD drive of the damaged cluster node.

  2. Restart the computer. If prompted to press a key to start the computer from CD, press the appropriate key.

  3. If there is a separate driver file for the mass storage device, press F6 when prompted to use the driver as part of setup.

  4. Press F2 when prompted during the text-only mode section of Setup. This will generate a prompt for the ASR disk.

  5. Follow the directions on the screen.

  6. If there is a separate driver file for the mass storage device, press F6 (a second time) when prompted after the system reboots.

  7. Follow the directions on the screen.

  8. After all the restore steps have completed, the restored node can rejoin the cluster.

Restoring a Disk Signature to a Damaged Cluster Disk

If you are restoring a disk signature to a damaged cluster disk, power down all other cluster nodes except the one on which you are performing the ASR restore. This cluster node must have exclusive rights to the damaged cluster disk.

Summary

As this chapter has demonstrated, there are many ways to add fault tolerance to network services and resources running on Windows Server 2003 servers. Moreover, each of the features discussed in this chapter is included with the installation of the operating system: no additional licensing fees for third-party software are required to add redundancy and increase availability to Windows Server 2003. Depending on the type of fault tolerance required in an organization’s Service Level Agreements, there might be an increased investment in hardware. However, with server consolidation opportunities available with Windows Server 2003, organizations might find that they have freed up hardware that can be re-assigned to participate in fault tolerance solutions.
