Using Server Clusters

Server clustering is implemented using the Microsoft Cluster service and provides failover support for applications and services. A server cluster can consist of up to eight nodes. Each node is attached to one or more cluster storage devices, which allow different servers to share the same data and, by reading this data, provide failover for resources. You can use shared Small Computer System Interface (SCSI) or Fibre Channel devices. Fibre Channel is the preferred technique and is recommended when you have three or more nodes. For server clusters running 64-bit editions of Windows Server 2003, Fibre Channel is the only technique that should be used.

Server Cluster Configurations

Server clusters can be set up using many different configurations. Servers can be either active or passive, and different servers can be configured to take over the failed resources of another server. Failover can take several minutes, depending on the configuration and the application being used, but is designed to be transparent to the user.

When a node is active, it makes its resources available. Clients access these resources through dedicated virtual servers. The Cluster service uses the concept of virtual servers to specify groups of resources that fail over together. Thus, when a server fails, the group of resources configured on that server for clustering fail over to another server. The server that handles the failover should be configured for the extra capacity needed to handle the additional workload. When the failed server comes back online, the Cluster service can be configured to allow failback to the original server or to allow the current server to continue to process requests.

Windows Server 2003 supports three basic types of server clusters:

  • Single-node clusters

  • Single quorum device multinode clusters

  • Majority node clusters

Figure 18-10 shows an example of a single-node cluster. A single-node cluster doesn't make use of failover but does provide easier administration for sharing resources and network storage. The main advantage of a single-node cluster is that the Cluster service monitors and automatically restarts applications and dependent resources that fail or freeze. A single-node cluster could work with file, print, or Web shares when the primary concern is to make it easy for users to access resources, but it isn't practical otherwise. Single-node clusters are also useful for test and development purposes, allowing you to develop cluster-aware applications and test them using limited hardware.

Figure 18-10. A single-node server cluster

To get the full benefit of clustering, administrators must implement a multinode cluster. The key multinode cluster models are active/passive and active/active. In an active/passive configuration, one or more nodes actively process user, application, and system requests, while one or more other nodes are idle. The nodes processing requests are referred to as active, or primary, nodes. The idle nodes are referred to as standby, or passive, nodes and are ready to be used when a failover occurs on a primary node. By contrast, in an active/active configuration all nodes actively process user, application, and system requests and there are no standby nodes. When an active node fails, the remaining active nodes temporarily take up the slack until the failed node can be brought back online.
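
To see what this choice means for capacity planning, consider the following sketch. It is a simplified calculation, not anything the Cluster service performs itself, and it assumes a failed node's workload is spread evenly across the surviving nodes; the function name and example numbers are illustrative only.

```python
# Simplified capacity check for the multinode models described above.
# Assumption: a failed node's workload is redistributed evenly across the
# surviving active nodes. Names and numbers are illustrative only.

def max_safe_utilization(active_nodes: int, passive_nodes: int) -> float:
    """Highest steady-state utilization each active node can sustain while
    still leaving room to absorb one failed node's workload."""
    if passive_nodes > 0:
        # Active/passive: an idle standby node absorbs the failed node's
        # entire workload, so active nodes can run close to full capacity.
        return 1.0
    # Active/active: the remaining (active_nodes - 1) nodes must absorb the
    # failed node's workload, so each node needs 1/active_nodes of headroom.
    return (active_nodes - 1) / active_nodes

print(max_safe_utilization(2, 2))   # 1.0  -- standby nodes provide the spare capacity
print(max_safe_utilization(4, 0))   # 0.75 -- four active nodes should stay below 75%
```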

Figure 18-11 shows a multinode cluster with a single quorum device configuration. Every node is attached to one or more cluster storage devices that all nodes share, and the cluster configuration data is stored on a single cluster storage device called the quorum device.

Figure 18-11. Server cluster using shared storage

The final type of server cluster is the majority node set, which is primarily used in larger end-to-end solutions developed by original equipment manufacturers and independent hardware vendors. In a majority node cluster configuration, nodes don't have to be connected to shared storage devices. Each node can have its own storage device. The cluster configuration data is stored on multiple disks across the cluster. This allows each node to have a local quorum device.

As discussed previously, majority node clusters are often used with geographically separated servers. Primarily, this is because each node can have its own storage and its own copy of the cluster configuration data. Geographic separation isn't a requirement, however. The servers could just as easily be in the same location.
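
The defining behavior of a majority node set is that the cluster continues to run only while more than half of its configured nodes are online and able to communicate. The short sketch below restates that rule as arithmetic; the function names are illustrative and are not part of the Cluster service.

```python
# Majority node set quorum arithmetic: the cluster keeps running only while
# more than half of its configured nodes are online and can communicate.
# Function names are illustrative; this is not a Cluster service API.

def nodes_required_for_quorum(total_nodes: int) -> int:
    """Minimum number of nodes that must be online to retain quorum."""
    return total_nodes // 2 + 1

def has_quorum(total_nodes: int, nodes_online: int) -> bool:
    return nodes_online >= nodes_required_for_quorum(total_nodes)

print(nodes_required_for_quorum(5))   # 3 -- a five-node set tolerates two failures
print(has_quorum(4, 2))               # False -- two of four is not a majority
```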

Server Cluster Resource Groups

A resource group is a unit of failover. Resources that are related or dependent on each other are associated through resource groups. All resources grouped together must be from the same node. If any of the resources in a group fails, all the resources fail over together according to the failover policy defined for the group. When the cause of a failure is resolved, the group fails back to its original location based on the failback policy of the group.
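
The following sketch models a resource group in code to make the failover and failback policies concrete. The class, field, and method names are invented for illustration; they are not Cluster service APIs.

```python
# Illustrative model of a resource group as a unit of failover. The class,
# field, and method names are invented for this sketch; they are not part of
# the Cluster service API.
from dataclasses import dataclass

@dataclass
class ResourceGroup:
    name: str
    resources: list          # related or dependent resources that move together
    owner_node: str          # node currently hosting the group
    preferred_node: str      # node the group fails back to, per its policy
    allow_failback: bool = True

    def fail_over(self, surviving_nodes: list) -> None:
        """Move every resource in the group to another node as one unit."""
        self.owner_node = surviving_nodes[0]      # simplistic target selection

    def fail_back(self) -> None:
        """Return the group to its preferred node once the failure is resolved."""
        if self.allow_failback:
            self.owner_node = self.preferred_node

group = ResourceGroup("FileServices",
                      ["IP address", "Network name", "File share"],
                      owner_node="NODE1", preferred_node="NODE1")
group.fail_over(["NODE2", "NODE3"])
print(group.owner_node)   # NODE2 -- all three resources moved together
group.fail_back()
print(group.owner_node)   # NODE1
```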

Note

Only applications that need high availability should be part of a resource group. Other applications can run on a cluster server but don't need to be a part of a resource group.

Before adding an application to a resource group, you must determine whether the application can work within the cluster environment. Applications that can work within the cluster environment and support cluster events are called cluster-aware. Cluster-aware applications can register with the Server cluster to receive status and notification information. Applications and services that are cluster-aware include the following:

  • Distributed File System (DFS)

  • DHCP

  • Exchange Server

  • File shares

  • IIS

  • Microsoft Distributed Transaction Coordinator (MS DTC)

  • Microsoft Message Queuing (MSMQ)

  • Network News Transfer Protocol (NNTP)

  • Print spooler

  • Simple Mail Transfer Protocol (SMTP)

  • SQL Server

  • Windows Internet Naming Service (WINS)

Generic applications and services can also be cluster-aware. Check with the software vendor to determine compatibility with the Cluster service.

Applications that do not support cluster events are called cluster-unaware. Some cluster-unaware applications can be assigned to resource groups and can be failed over. The following provisions apply:

  • IP-based protocols are used for cluster communications. The application must use an IP-based protocol for its network communications. Applications cannot use NetBIOS Extended User Interface (NetBEUI), Internetwork Packet Exchange (IPX), AppleTalk, or other protocols to communicate.

  • Nodes in the cluster access application data through shared storage devices. If the application isn't able to store its data in a configurable location, the application data won't be available on failover.

  • Client applications experience a temporary loss of network connectivity when failover occurs. If client applications cannot retry and recover from this interruption, they will cease to function normally (a retry sketch follows this list).

Applications that meet these criteria can be assigned to resource groups.
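
The third provision is worth illustrating: during a failover the virtual server briefly stops answering, so a cluster-friendly client should retry with a short delay rather than fail on the first error. The sketch below shows that pattern; connect_with_retry and the example host name are placeholders, not part of any Microsoft library.

```python
# Retry pattern for a client of a clustered service: during failover the
# virtual server briefly stops answering, so retry with a growing delay instead
# of failing on the first error. connect_with_retry and the host name below are
# placeholders, not part of any Microsoft library.
import socket
import time

def connect_with_retry(host: str, port: int, attempts: int = 5, delay: float = 2.0):
    """Open a TCP connection, tolerating a short failover window."""
    for attempt in range(1, attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=5)
        except OSError:
            if attempt == attempts:
                raise                      # failover took longer than we can wait
            time.sleep(delay * attempt)    # back off a little more each attempt

# Example (hypothetical virtual server name used by clients):
# conn = connect_with_retry("files.cpandl.com", 445)
```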

Optimizing Hardware for Server Clusters

After determining which applications and services need high availability and which don't, administrators should focus on selecting the right hardware to meet the needs of the business system. A cluster model should be chosen that adequately supports resource failover and the availability needs of the system. Based on the model chosen, excess capacity should be added to ensure resources remain available if a resource fails and failover substantially increases the workload on another server.

The configuration of the hardware should be adjusted to maximize total throughput and optimize performance for the types of applications and services that will experience the greatest demand. Different servers have different optimization needs. A Web server with static HTML pages might need fast hard disk drives and additional RAM to cache files in memory but typically doesn't need high-end CPUs. A typical database server needs high-end CPUs, fast hard disk drives, and additional RAM.

Administrators should carefully optimize the performance of each server node in the cluster. A key area where optimization can have huge benefits is the paging file. Key rules for paging files are as follows (a sizing sketch follows the list):

  • Paging files should have a fixed size to prevent excess paging and shouldn't be located on the shared cluster storage device.

  • Whenever more than 4 GB of RAM is installed, the paging file size should be reduced. Try setting it to 2060 megabytes (MB) to ensure effective use of disk space.

  • If multiple local drives are available, place the paging file on separate drives to improve performance.
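
The sizing rules above can be restated as a small calculation. The 2060 MB figure comes from the text; the 1.5-times-RAM fallback is a common Windows rule of thumb rather than a clustering requirement, and the function itself is only a planning aid.

```python
# Restates the paging-file rules as a calculation. The 2060 MB figure is the
# recommendation from the text; the 1.5-times-RAM fallback is a general Windows
# rule of thumb, not a clustering requirement.

def recommended_paging_file_mb(installed_ram_mb: int) -> int:
    """Fixed paging file size, in MB, for a cluster node."""
    if installed_ram_mb > 4 * 1024:
        return 2060                        # large-RAM systems: keep the file small
    return int(installed_ram_mb * 1.5)     # otherwise, roughly 1.5 times RAM

print(recommended_paging_file_mb(8192))    # 2060
print(recommended_paging_file_mb(2048))    # 3072
```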

With a clustered SQL Server configuration, you should consider using high-end CPUs, fast hard disk drives, and additional memory. SQL Server 2000 and standard services together use over 100 MB of memory as a baseline. User connections consume about 24 kilobytes (KB) each. Although the minimum memory for query execution is 1 MB of RAM, the average query can require 2 to 4 MB of RAM. Other SQL Server processes use memory as well.
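
Putting those figures together gives a rough planning estimate. The sketch below is simple arithmetic based on the numbers cited above; it ignores data caching and the other SQL Server processes mentioned, so treat the result as a floor rather than a target.

```python
# Rough memory estimate for a clustered SQL Server node, using the figures
# cited above: about 100 MB baseline, about 24 KB per user connection, and
# roughly 2 MB to 4 MB per executing query. Planning arithmetic only; it
# ignores data caching and other SQL Server processes.

def sql_memory_estimate_mb(connections: int, concurrent_queries: int,
                           per_query_mb: float = 4.0) -> float:
    baseline_mb = 100.0                         # SQL Server 2000 plus standard services
    connection_mb = connections * 24 / 1024     # 24 KB per user connection
    query_mb = concurrent_queries * per_query_mb
    return baseline_mb + connection_mb + query_mb

# 500 connections and 50 simultaneous queries need on the order of 312 MB.
print(round(sql_memory_estimate_mb(500, 50)))   # 312
```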

Cluster storage devices should be optimized based on performance and availability needs. Table 18-1 provides an overview of common hardware RAID configurations for clusters. The entries are listed from the highest RAID level to the lowest, and a capacity comparison sketch follows the table.

Table 18-1. Hardware RAID Configurations for Clusters

RAID Level: 5+1
RAID Type: Disk striping with parity + mirroring
Description: Uses at least six volumes, each on a separate drive. Each volume is configured identically as a mirrored stripe set with parity error checking.
Advantages and Disadvantages: Provides a very high level of fault tolerance but has a lot of overhead.

RAID Level: 5
RAID Type: Disk striping with parity
Description: Uses at least three volumes, each on a separate drive. Each volume is configured as a stripe set with parity error checking. In the case of failure, data can be recovered.
Advantages and Disadvantages: Provides fault tolerance with less overhead than mirroring. Better read performance than disk mirroring.

RAID Level: 1
RAID Type: Disk mirroring
Description: Uses two volumes on two drives. The drives are configured identically, and data is written to both drives. If one drive fails, there is no data loss because the other drive contains the data. This approach does not include disk striping.
Advantages and Disadvantages: Provides redundancy with better write performance than disk striping with parity.

RAID Level: 0+1
RAID Type: Disk striping with mirroring
Description: Uses two or more volumes, each on a separate drive. The volumes are striped and mirrored. Data is written sequentially to drives that are identically configured.
Advantages and Disadvantages: Provides redundancy with good read and write performance.

RAID Level: 0
RAID Type: Disk striping
Description: Uses two or more volumes, each on a separate drive. Volumes are configured as a stripe set. Data is broken into blocks, called stripes, and then written sequentially to all drives in the stripe set.
Advantages and Disadvantages: Provides speed and performance without data protection.
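
To weigh the overhead mentioned in Table 18-1, it helps to compare usable capacity. The sketch below applies standard RAID arithmetic under the assumption that all drives are the same size; it is not a vendor sizing tool, and the 146 GB drive size is only an example.

```python
# Usable-capacity comparison for the RAID levels in Table 18-1, assuming all
# drives are the same size. Standard RAID arithmetic, not a vendor sizing tool.

def usable_capacity_gb(raid_level: str, drives: int, drive_gb: int) -> int:
    if raid_level == "0":        # striping: every drive stores data
        return drives * drive_gb
    if raid_level == "1":        # mirroring: one drive of each pair holds a copy
        return (drives // 2) * drive_gb
    if raid_level == "0+1":      # striped mirrors: half the raw capacity
        return (drives // 2) * drive_gb
    if raid_level == "5":        # striping with parity: one drive's worth of parity
        return (drives - 1) * drive_gb
    if raid_level == "5+1":      # mirrored RAID 5: half the drives, minus parity
        return (drives // 2 - 1) * drive_gb
    raise ValueError(f"unsupported RAID level: {raid_level}")

for level, drives in [("0", 6), ("0+1", 6), ("5", 6), ("5+1", 6), ("1", 2)]:
    print(f"RAID {level}: {usable_capacity_gb(level, drives, 146)} GB usable from {drives} x 146 GB drives")
```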

Optimizing Networking for Server Clusters

The network configuration of the cluster can also be optimized. All nodes in a cluster must be a part of the same domain and can be configured as domain controllers or member servers. Ideally, multinode clusters have at least two nodes that act as domain controllers and provide failover for critical domain services. If this isn't the case, the availability of cluster resources might be tied to the availability of the controllers in the domain.

Typically, nodes in a cluster are configured with both private and public network addresses. Private network addresses are used for node-to-node communications, and public network addresses are used for client-to-cluster communications. However, some clusters might not need public network addresses and instead can be configured to use two private networks. Here, the first private network is for node-to-node communications and the second private network is for communicating with other servers that are a part of the service offering.

Increasingly, clustered servers and storage devices are connected over SANs. SANs use high-performance interconnections between secure servers and storage devices to deliver higher bandwidth and lower latency than comparable traditional networks. Enterprise Edition and Datacenter Edition implement a feature called Winsock Direct that allows direct communication over a SAN using SAN providers.

SAN providers have user-mode access to hardware transports. When communicating directly at the hardware level, the individual transport endpoints can be mapped directly into the address space of application processes running in user mode. This allows applications to pass messaging requests directly to the SAN hardware interface, which eliminates unnecessary system calls and data copying.

SANs typically use two transfer modes. One mode is for small transfers, which primarily consist of transfer control information. For large transfers, SANs can use a bulk mode whereby data is transferred directly between the local system and the remote system by the SAN hardware interface without CPU involvement on the local or remote system. All bulk transfers are prearranged through an exchange of transfer-control messages.

In addition to improved communication modes, SANs have other benefits. They allow you to consolidate storage needs, using several highly reliable storage devices instead of many. They also allow you to share storage with non-Windows operating systems, allowing for heterogeneous operating environments.
