Chapter 24. Scaling Up and Scaling Out Strategies

As networks grow from a few servers supporting a few hundred users to large farms of servers supporting several thousand users, it's important to understand how to scale services to support greater user loads. This chapter offers solutions for taking some common services to the next level so that they can support large numbers of users.

Size Does Matter

Any time a server is needed, the first question is always “How big a server should I order?” If the server is too big for its role, money is wasted. If the server is too small, it will have to be upgraded or replaced long before its useful life runs out, and that costs money too. The trick for IT administrators is to purchase a server that is just right. In doing so they are able to keep not only their users happy but also the accounting department.

It is important to note that the size of a server should bring to mind not only the amount of processing capacity and storage but also the physical size and requirements of the server. Most companies don’t have limitless space in the data center nor do they have infinite power and cooling. Minimizing the footprint of the server can go a long way towards controlling costs and increasing overall system stability. The other item that must be addressed is the scalability of the server. Is it upgradeable? Does it have the capacity to grow as the environment grows? If not, how will that be handled? Although the answers to those questions will vary based on application and environment, this chapter endeavors to answer those questions and give you insight into the options for scaling various technologies.

Determining Your Needs

There are many factors that will influence the purchase of a new server. The most obvious factor is the role of the server. Although there are many decision points that apply to almost any server, there are also several issues specific to particular server roles that must be addressed. Those specifics are covered in the following sections.

Building Bigger Servers

The simplest path to scaling an application to handle a greater load is improving the hardware, and nowhere is this more apparent than in building a bigger server. Bigger in this sense refers to a server with high-speed processors, usually several of them, and a fully populated bank of memory. These are the big iron of the client/server world and are designed to be not only fast but also stable and easily managed.

Beefy Single Boxes

The most common way to scale a server to handle more users or a greater load is to simply upgrade to a bigger box. Upgrading to a higher clock speed or a new generation of processor can add significant performance to a server. Adding memory or a faster disk subsystem can also improve performance. Always analyze a system that is going to be upgraded to determine where its bottleneck is, and focus on upgrading that subsystem for the largest performance gains. By increasing the performance of the system it will be able to scale further and support more users.

Windows 2003 and Multiple Processors

Windows 2003 handles multiple processors far better than previous versions of Windows. Depending on the edition, Windows 2003 can support up to 64 processors right out of the box: Standard Edition supports up to 4, Enterprise Edition up to 8, and Datacenter Edition up to 64.

Multinode Clusters

Sometimes building a bigger server isn't an option; a server might already be as big as it can get. If you can't add more processors, faster processors, or more memory, it might be time to look at getting the server some help. Creating multinode clusters enables you to scale a system beyond what a single server could do. Active/Active clusters can effectively double the capacity of a system. This concept adds the most value where it is not feasible to split users off to another system, which is most commonly seen with databases. Sites that have multiterabyte databases often can't realistically split the data across multiple databases; in these situations the key to scaling the database is clustering. Windows 2003 Enterprise Edition offers support for clusters of up to eight nodes. This represents a paradigm shift in how Windows can scale to support high-demand applications. Many Microsoft applications, such as SQL Server 2000 and Exchange 2000, were built with clustering in mind.

Requires Technologies Like SAN or NAS

Creating clusters of more than two nodes requires technologies like SAN or NAS to provide the shared storage; a shared SCSI enclosure works only for two-node clusters. Always work with the hardware vendor to ensure that servers are certified to work with Windows 2003 clustering.

Building Server Farms

The computer industry is like a pendulum, swinging back and forth between two extremes of computing styles. In the old days it was common to see large groups of computers working together on a common task, mostly because of a lack of processing power. As processors became more powerful, the industry swung in the opposite direction, moving toward single, massively powerful computers. Recently the pendulum has returned toward distributed computing. Like a real pendulum, each swing covers less distance than the last. The industry has now swung to a point where groups of computers work together to support a specific need. These groups are commonly referred to as Server Farms. A Server Farm consists of servers, usually configured identically, a common source of data, and a load-balancing device. The farm is able to scale performance by simply adding more servers. Some of the larger farms in the industry handle applications like computational chemistry, where literally thousands of servers work together to perform complex mathematical modeling.

Avoiding the Pitfalls

Simply buying the biggest, meanest server you can with the most memory isn’t always the best plan for a server. Scaling without planning can often result in more problems than it fixes. By avoiding the pitfalls associated with scaling technologies to handle larger loads, you can build an environment that is not only high performance but also low maintenance.

Buying the Wrong Hardware

All too often administrators purchase a server upgrade for a specific application to run on, and it ends up being slower than the old system. As counterintuitive as this might seem, it usually reflects a lack of understanding of how the application uses hardware. Knowing the idiosyncrasies of an application is critical to purchasing hardware for it. An application that is Floating Point Unit–intensive will respond favorably to a system with a large L2 cache; moving an FPU-intensive application to a newer server with a higher clock speed but a smaller L2 cache can result in the application actually running slower. Applications that are write-intensive, such as databases, often run faster on independent disks than they do on a RAID 5 subsystem because of the parity calculation involved in disk writes. Whenever possible you should discuss server selection with the vendor of the application the server is being purchased for and get concrete performance numbers for various hardware configurations. Often the hardware vendors can supply performance numbers for popular applications on their hardware as well. Clever administrators arm themselves with as much information as possible before making hardware purchases.

Is the Application Multiprocessor-Capable?

Many times servers are purchased with multiple processors. All too often people believe that if one processor is good, two must be better. Often the additional processors can add a significant amount of performance to a system. Unfortunately, not all applications are able to take advantage of multiple processors. Before purchasing a multiprocessor server that is destined to run a specific application you should research the application and determine if it will take advantage of the additional processor. If it won’t, you should consider taking the money saved on the secondary processor and upgrading the primary processor.

If a multiprocessor system is inherited and the application it will run is not able to take advantage of the second processor, there are still ways to improve performance over a single-processor server. Through the use of processor affinity you can assign a particular process to run on a specific processor; any threads spawned by the process automatically inherit that affinity. This means that a particular application can be assigned to the second processor while the first processor handles all the other Windows-related tasks. In this way the application does not compete with the operating system and runs faster than it would if it shared the first processor.
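
As an illustration only (this is not a tool from the chapter), the following sketch uses the third-party psutil library to pin a hypothetical application's processes to the second processor; in practice the same thing can be done interactively through Task Manager's Set Affinity option or with the Windows SetProcessAffinityMask API.

import psutil  # third-party package: pip install psutil

def pin_to_cpu(process_name: str, cpu_index: int) -> None:
    """Pin every process matching process_name to one CPU; its threads inherit the affinity."""
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == process_name:
            proc.cpu_affinity([cpu_index])
            print(f"Pinned PID {proc.pid} to CPU {cpu_index}")

if __name__ == "__main__":
    # Hypothetical line-of-business application; CPU 1 is the second processor (numbering starts at 0).
    pin_to_cpu("lob_app.exe", 1)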

Windows System Resource Manager

Windows System Resource Manager enables you to limit not only the amount of resources used by an application but also to tie its usage to a specific processor. By limiting less important applications and dedicating a specific processor to the desired application, applications can be made to run faster and scale further to support end users.

Protecting Against System Outages

One of the pitfalls of buying big servers is the tendency to put too many eggs in one basket. Although consolidating servers into a single powerful server has been shown to reduce costs in the IT environment, it also opens the door to single points of failure. A clever administrator understands that part of scaling an environment is ensuring that individual servers don't get out of hand.

Administrators often fall victim to the affordability of disk space. When a server runs low on space it is very easy to add more disks to the chassis or add an external chassis. This creates two dangerous situations. First, a single server ends up holding a tremendous amount of important data, which makes it very difficult to perform maintenance on the server. If it is a database server, the sheer volume of the data might result in database maintenance taking longer than the available maintenance window. If the server fails, many users will need data that is now unavailable. It is critical to determine at what point it makes sense to scale out by adding another server. Technologies such as DFS, which hide the physical server structure, enable an administrator to add file servers to an environment without altering user configurations or mappings.

The second danger of allowing servers to bloat before splitting off more servers is backup and restore. If a file server has so much data that it would take more than eight hours to restore it, it is probably time to split off data onto a separate server.

Although server consolidation is a good thing, don't fall into the trap of consolidating blindly. Look at the capacity of your backup and restore system and determine the most data you can restore in a reasonable period of time. If the data is going to overshoot that number (you've been monitoring disk space, right?), it's time to add a server.
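
A quick calculation like the following can flag when a server has outgrown its restore window; the restore throughput and window used here are illustrative assumptions, not measured numbers.

def restore_hours(data_gb: float, restore_mb_per_sec: float) -> float:
    """Hours needed to restore data_gb at a sustained restore rate in MB/sec."""
    return data_gb * 1024 / restore_mb_per_sec / 3600

if __name__ == "__main__":
    window_hours = 8                      # available maintenance window
    for data_gb in (500, 1000, 2000):
        hours = restore_hours(data_gb, restore_mb_per_sec=40)   # illustrative tape restore rate
        verdict = "fits the window" if hours <= window_hours else "time to split the server"
        print(f"{data_gb} GB -> {hours:.1f} hours ({verdict})")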

Ensuring that Your Facilities Can Support Your Systems

By and large, administrators are experts in the area of technology. They understand servers, they understand IO, and they understand applications. They spend all their time thinking about the next great server and how to tweak it for maximum performance. Ask an administrator how many amps his server draws on startup and how many BTUs of heat it produces and his eyes will go dull.

Far too often administrators purchase server hardware without regard for how it will affect facilities. Knowing how much power the servers draw and knowing how the electrical circuits in the data center are provisioned is critical to avoiding unnecessary system outages. Knowing things like HVAC capacity of the data center is critical in making informed decisions about hardware. It’s depressing to have a 4TB SAN arrive only to find out the data center can’t support its electrical or cooling needs. That can be an expensive oversight.

It’s also important to avoid falling into the trap of always adding servers. It used to be a very common practice to scale Web sites by simply adding more and more Web servers, and it was not uncommon to hear of sites that had more than a thousand front-end Web servers. Even with 1U servers, that is 42 servers per rack, or about 24 racks of servers. A typical rack with servers installed takes up six square feet of space, so that's roughly 144 square feet just for Web servers. Companies with data centers will understand the cost associated with that amount of data center space. Factor in the 1,000-amp current draw and the amount of heat generated, and you will quickly realize that blindly adding servers isn't the best method.
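
The back-of-the-envelope math is worth writing down; the per-server figures below are illustrative assumptions.

import math

servers = 1000                 # front-end Web servers in the example above
servers_per_rack = 42          # a 42U rack filled with 1U servers
rack_sq_ft = 6                 # typical footprint per rack
amps_per_server = 1.0          # illustrative draw per 1U server

racks = math.ceil(servers / servers_per_rack)
print(f"Racks needed:  {racks}")                               # 24 racks
print(f"Floor space:   {racks * rack_sq_ft} square feet")      # 144 square feet
print(f"Current draw:  {servers * amps_per_server:.0f} amps")  # about 1,000 amps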

Look Beyond the Plug

When considering 220V hardware it is important to not only determine whether the data center can support 220V but to look beyond the plug. Determine whether you can support 220V at your UPS. If there is a recovery site you must ensure that it can also support the 220V devices.

Making It Perform

One of the easiest ways to scale a system is to purchase the system with parts that are known to be able to support the current load and two to three years of anticipated growth. Factors such as the number of processors, the amount and type of memory, and the disk subsystem work together to determine the maximum capacity of the server. Careful choices here will allow you to get the most bang for your buck.

Never Overclock the Processor on a Production Server

Never let the desire for performance overshadow the need for stability. No matter how tempting it might look on paper, never overclock the processor on a production server.

Choosing the Right Processor Type

There are many processors on the market today. Sixty-four-bit processors have already hit the market, and 32-bit processors are faster than ever. With 64-bit operating systems reaching the market it can be quite tempting to jump on the bandwagon of 64-bit processing. When you are faced with this decision it is critical to do the research and make sure it's the right decision. Don't be fooled by 64-bit processors and 64-bit operating systems: if the application isn't 64-bit, the system will run it in a backward-compatibility mode, and more often than not this will result in slower performance than a high-end 32-bit processor. As more and more 64-bit applications are released this will become less and less of an issue.

Always Ask for Benchmarks and Let the Numbers Tell the Story

Not all 64-bit applications will take advantage of a 64-bit processor and operating system. Just because an application is ported to 64 bit doesn’t mean that it is optimized for 64 bit. Always ask for benchmarks and let the numbers tell the story.

Eliminating Unnecessary Services

By reducing the number of unnecessary services running on a server, the overall performance of the server can be enhanced. Services that need to be present but don't require a lot of resources can be tightly controlled by Windows System Resource Manager. A side benefit of removing unnecessary services is decreased exposure to security vulnerabilities. You should exercise common sense in determining which services should be removed: although a Web server might not need the Print Spooler service, it might still need the Server service. More information on this type of tuning can be found in Chapter 23, “Tuning and Optimization Techniques.”

Not All Memory Is Created Equal

Many of today’s servers use a type of memory known as DDR. DDR stands for Double Data Rate, meaning data is transferred on both the rising and falling edges of the clock cycle, effectively doubling the speed of the memory. This speed is tied to the front-side bus speed of the motherboard, which in turn relates to the speed of the processor. DDR memory is rated by the speed at which it is capable of running; PC3500 memory, for example, runs at an effective 433MHz. The competing memory standard is RDRAM, or Rambus. Rambus is also rated by the speed at which it can run; RDRAM1066, for example, runs at 1066MHz and is designed for systems with a 533MHz FSB. Not unlike processors, people tend to automatically associate clock speed with performance. This can be misleading because the real story is told by memory bandwidth: RDRAM1066 is capable of about 4.2GB/sec of transfer (in a dual-channel configuration), whereas PC3500 DDR is capable of about 3.5GB/sec. Suddenly the gap has closed.
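
For reference, the bandwidth figures quoted above fall out of simple arithmetic, assuming a 64-bit (8-byte) DDR module and a dual-channel 16-bit RDRAM configuration.

def ddr_peak_mb_sec(effective_mhz: int, bus_bytes: int = 8) -> int:
    """Peak transfer rate of a 64-bit DDR module: effective clock x 8 bytes per transfer."""
    return effective_mhz * bus_bytes

def rdram_peak_mb_sec(clock_mhz: int, channels: int = 2, bus_bytes: int = 2) -> int:
    """Peak transfer rate of RDRAM: 16-bit channel x clock, typically run dual-channel."""
    return clock_mhz * bus_bytes * channels

print(f"PC3500 DDR:  {ddr_peak_mb_sec(433)} MB/sec")     # 3464 MB/sec -- the ~3.5GB/sec above
print(f"RDRAM 1066:  {rdram_peak_mb_sec(1066)} MB/sec")  # 4264 MB/sec -- the ~4.2GB/sec above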

To further confuse the issue, memory is also rated in terms of its latency. Most server memory is rated with a CAS latency of 2.5. Memory with a CAS latency of 2.0 is noticeably faster and will allow shrewd administrators to squeeze every ounce of performance from their servers.

If the Server Vendor Certifies the Memory, You Have Options

When purchasing memory for a server ask the server vendor if any other brands of memory are certified for use in his server. Oftentimes third-party companies produce memory that is faster and less expensive than the OEM memory from the server manufacturer. Think of the primary server manufacturers and ask yourself which of them are memory companies. If the server vendor certifies the memory, you have options.

Planning for Disk Subsystems

There are many ways to access disks on a server. Careful planning of the disk subsystem will enable you to take full advantage of space and performance now and enable you to scale storage and performance later as the need for additional capacity arises.

Current SCSI controllers are capable of up to 320MB/sec of combined throughput. It is important to understand that this is the speed of the bus and not the speed of the SCSI drives. Unless the system is running multiple disks, an Ultra 320 controller isn't necessarily going to be faster than an Ultra 160 controller.

Fibre Channel controllers have even greater throughput than SCSI controllers. Using Fibre Channel to connect to locally attached drive arrays or Storage Area Networks can result in amazing disk IO performance.

Although IDE technologies have long lagged behind SCSI solutions, current-generation Serial ATA drives offer very attractive pricing for excellent disk performance. Serial ATA RAID controllers (RAID 0, 1, and 0+1) offer even better performance with the option of redundancy. For systems like Web servers in a Web farm, Serial ATA can be a very viable alternative to more costly SCSI drives.

Scaling the Active Directory

When a company becomes very large it can be a challenge to make sure the environment can properly scale to support the expanding Active Directory. As more and more objects are added to Active Directory and as more and more fields are used the domain controllers can become more and more taxed. Many companies take an approach of making all the domain controllers identical in terms of hardware and configuration. This greatly reduces the support load of the servers. To do this it is important to properly size the domain controllers to handle the load. Toward this end, Microsoft offers a tool called the Active Directory Sizer Tool.

Active Directory Sizer Tool

In a perfect world, each domain controller would be capable of handling the authentication load of the entire environment. That way, if the rest of the network went to hell in a handbasket, users would still be able to authenticate to get to resources. In the real world this is actually an attainable goal; the difficulty comes in determining just how beefy a server must be to support the entire user base. Luckily, Microsoft offers a tool that was developed for exactly this purpose: the Active Directory Sizer Tool. By following a simple wizard and having the necessary information about the enterprise, you can determine the required specifications for the domain controllers. This tool takes into account factors such as Exchange 200x, typical user activity, and replication schedules. Although it is not an exact science, it does provide a good starting point for determining the hardware specification of the domain controllers. Do not forget to factor in room for growth; if a company has a policy of replacing hardware on a regular schedule, you must take growth over that replacement period into account.

Based on user inputs and internal formulas, this tool can provide estimates for the following:

  • Domain controllers per domain per site.

  • Global Catalog servers per domain per site.

  • CPUs per machine and type of CPU.

  • Disks needed for Active Directory data storage.

  • Amount of memory required.

  • Network bandwidth utilization.

  • Domain database size.

  • Global Catalog database size.

  • Inter-site replication bandwidth required.

Additional information on the Active Directory Sizer Tool can be found on the Microsoft Web site at http://www.microsoft.com/windows2000/techinfo/planning/activedirectory/adsizer.asp.

File Locations Matter

On any application that uses a database and log files, it behooves you to pay careful attention to where files are placed on the system. In a perfect world the operating system would have its own set of spindles, the swap file would have a separate set, the database would have another set, and the log files would have a fourth. The concept is that any set of spindles can be read or written independently of any other set. A side benefit is that if the spindles supporting the database were to fail, the log files would not be affected, which would greatly aid the recovery process for the database.

Active Directory is an application that uses a database and log files. By placing the database and log files on separate disks, you enable the system to read and write both of these simultaneously. This eliminates bottlenecks where the log files aren’t written until the database access is done. Similarly by placing the operating system on its own set of disks, operating system tasks that access the disk will not affect the database or logs. Placing the swap file on its own set of disks is perhaps the most critical of all. The swap file is read from and written to almost constantly. If this activity had to compete with the OS, the database, and the log files, overall performance of the system would suffer.

Many Companies Would Rather Mirror Disks than Break Out the Roles

Because of cost constraints and disk backplane capacity it is often not realistic to break up the OS, swap file, database, and logs to the degree mentioned previously. For system resiliency, many companies would rather mirror disks than break out the roles. When faced with this compromise on a domain controller, you can prioritize the functions as follows with regard to which function should get its own spindles first:

Swap file

NTDS.DIT

Active Directory logs

Configuring Your Disks the Right Way

The optimal configuration for a domain controller is for the operating system to be mirrored on a pair of drives. Each drive should be on its own channel of its own controller; this protects against both disk failure and controller failure. The swap file should also be on a pair of mirrored drives, with each drive on its own channel of the same two controllers running the OS. A third pair of drives should be mirrored in the same manner as the OS drives to hold the log files. The drives for the Active Directory database should also be mirrored in the same manner. If the database will be larger than a single drive, it is preferable to run mirrored stripe sets (RAID 0+1) for the database. This is preferred over RAID 5 because of the performance hit on writes associated with RAID 5.

RAID 0+1 combines the performance of striping with the redundancy of mirroring. Striping multiple disks allows each disk to be read or written simultaneously, which allows disk performance to scale nearly 1:1 with additional disks. RAID 5 maintains parity, so each small write requires reading the existing data and parity blocks and then writing the new data and updated parity. This configuration results in good read speed but reduced write speed. The parity allows the RAID to lose one disk and still operate, based on the ability to re-create the values on the missing drive, and it also allows the RAID to rebuild the data onto a replacement disk.
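
The difference shows up most clearly in random write throughput. The sketch below assumes the classic penalties (four I/Os per RAID 5 random write, two per mirrored write) and an illustrative per-disk I/O rate.

def random_write_iops(disks: int, iops_per_disk: int, write_penalty: int) -> float:
    """Aggregate random-write throughput once the RAID write penalty is applied."""
    return disks * iops_per_disk / write_penalty

DISKS, DISK_IOPS = 8, 120      # illustrative eight-drive array of 10K RPM disks
print(f"RAID 0+1 writes: {random_write_iops(DISKS, DISK_IOPS, write_penalty=2):.0f} IOPS")
print(f"RAID 5 writes:   {random_write_iops(DISKS, DISK_IOPS, write_penalty=4):.0f} IOPS")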

Understanding Your Replication Topology

Properly scaling Active Directory goes beyond simply sizing the domain controllers and optimizing the location of files. When Active Directory becomes very large it is critical to address the replication topology. Logical placement of bridgehead servers helps to break up replication traffic. Rather than force all domain controllers to replicate back to a hub site, plan out replication to reduce traffic across slow links. If a network had a main office in San Jose, an office in New York, and an office in New Jersey, and the New Jersey office connected only to New York, it would not be optimal to have hub-and-spoke replication back to the main office in San Jose. By allowing New York to act as a bridgehead, with New Jersey using New York as its preferred bridgehead, replication traffic across the slow links would be reduced. If the domain controller in New York failed, the domain controller in New Jersey could still replicate with the domain controllers in San Jose, assuming site link bridging was still enabled. Site link bridging is enabled by default.

If an Active Directory site is going to have more than one domain controller, one of the DCs should be configured as a bridgehead server. This allows the other DCs in that site to get their replication from the local server. Without this type of architecture it would be hard to scale Active Directory across a large environment. Site link bridging is useful for creating simple redundancy, but as an environment grows, the Knowledge Consistency Checker (the function that generates the replication topology across bridged site links) becomes unable to keep up, and manually managed site links are required. A good rule of thumb is to check the environment against the following complexity formula; if the result exceeds the limit, the topology is too complex for the KCC to handle:

(1 + D) * S^2 <= 100,000 (where D = number of domains and S = number of sites in your network)

KCC in Windows 2003 Is Greatly Improved from the Version in Windows 2000

By default, site link bridging is enabled. The KCC in Windows 2003 is greatly improved from the version in Windows 2000. Replication traffic in Windows 2003 is also reduced as a result of replicating attribute-level changes instead of entire user objects. Even so, it is still important to monitor the KCC to ensure that replication is occurring correctly.
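
As a quick illustration of the rule of thumb, the following sketch plugs sample values into the complexity formula above; the domain and site counts are hypothetical.

def kcc_within_limits(domains: int, sites: int) -> bool:
    """True if (1 + D) * S^2 stays at or under 100,000."""
    return (1 + domains) * sites ** 2 <= 100_000

print(kcc_within_limits(domains=3, sites=150))   # (1 + 3) * 150^2 = 90,000  -> True
print(kcc_within_limits(domains=3, sites=200))   # (1 + 3) * 200^2 = 160,000 -> False, manage site links manually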

Scaling for the File System

As companies begin to take advantage of new technologies such as Volume Shadow Copy, Redirected Folders, and desktop backups, there is a need for larger and larger file servers. The issue is that as more users access the systems, the servers are unable to keep up with the demand. Increasing the available disk space on servers only encourages users to store more data, which further degrades performance. Because the amount of data stored per user has historically grown year after year, the only option is to increase the scalability of the file servers.

Scalability is the key to reducing operation costs

By properly scaling file servers it is possible to consolidate file servers. This reduces hardware costs and maintenance costs, and frees up valuable space in the data center. Scalability is the key to reducing operation costs.

Disk IO is Critical—SCSI/RAID/IDE

Most modern servers come with a SCSI controller for the disk subsystem with the option of a hardware RAID controller. It is important to distinguish a hardware RAID from a software RAID. The easy way to distinguish them is that a hardware RAID is RAID “all the time.” A software RAID is only RAID after the operating system has started. Software RAID requires processor time and is generally less efficient. Hardware RAID traditionally offers more advanced features in the area of distributing memory caches and in dynamic reconfigurations such as adding drives to an existing array.

SCSI comes in many flavors—Wide, Ultra, Ultra Wide, Ultra 160, Ultra 320. Each of these flavors refers to a specific type of drive it supports and an overall bandwidth of the bus. Ultra 320, for example, has a total bandwidth of 320MB/sec. The important thing to note is that the bandwidth of the controller doesn’t have anything to do with the bandwidth of a drive. An Ultra 320 hard drive doesn’t have a throughput of 320MB/sec. The advantage of the controller having that amount of bandwidth is that it is able to control multiple hard drives before it becomes the bottleneck in the system. This allows a server to scale more efficiently because adding hard drives will increase performance until they saturate the bus. When this occurs you have the option of adding more controllers and reallocating disks such that none of the controllers are oversubscribed.
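
A rough way to think about it: divide the bus bandwidth by a drive's sustained transfer rate to see how many spindles a bus can feed before it becomes the bottleneck. The per-drive figure below is an illustrative assumption.

def drives_to_saturate(bus_mb_sec: int, drive_mb_sec: int) -> int:
    """Number of drives streaming at drive_mb_sec that fully consume the bus."""
    return bus_mb_sec // drive_mb_sec

for bus in (160, 320):
    count = drives_to_saturate(bus, drive_mb_sec=50)   # illustrative sustained rate per drive
    print(f"Ultra {bus}: about {count} busy drives before the bus is the bottleneck")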

RAID traditionally refers to a Hardware RAID controller with an attached set of disks. RAID has the ability to take attached disks to another level. By writing to the disks in a specific manner a system can gain the ability to increase read and write performance and offer the ability to continue serving data even after disk failures have occurred. RAID is offered in several levels, each with different characteristics.

Using RAID over single attached disks allows servers to scale well because data is protected and access to data is improved. RAID technologies allow larger numbers of users to be supported on file servers. Striping disks allows the aggregate space to be treated as a single disk. This enables an administrator to surpass the physical limitations of a single disk.

When Does an Environment Justify Using SAN/NAS?

As requirements for data storage and data access become extreme a server with locally attached SCSI or RAID storage can become unable to keep up with the rate of requests for data. Network operating systems such as Windows, Unix, or even Linux are very good at handling and servicing data requests but eventually they become overtaxed and another technology must be employed.

NAS stands for Network Attached Storage. SAN stands for Storage Area Network. These two technologies differ in one key area: NAS uses file-level access, whereas SAN uses block-level access. A SAN allows another system to treat a portion of the SAN as local raw disk. NAS adds an abstraction layer, presenting storage to other systems as network file shares rather than raw disk. SAN is generally higher performance and is often used for databases for that reason. NAS has only recently entered the database arena as improvements in its technology and associated abstraction layers have produced performance sufficient for databases. NAS and SAN have a big advantage over attached storage in that they do not run a full operating system designed with hundreds of tasks in mind; they have dedicated cores designed purely for high-performance data access. The other key area in which NAS and SAN differ is their method of attachment. NAS runs over Ethernet (TCP/IP) and can take advantage of an existing LAN environment, although the use of Ethernet somewhat limits the bandwidth available to NAS. SAN, on the other hand, runs over Fibre Channel. This technology has much greater bandwidth than Ethernet but is also significantly more expensive. Not unlike most things in life, as performance goes up, so does cost.

NAS and SAN offer great flexibility in their ability to centrally manage storage and dynamically allocate space to servers. Some technologies such as large node clustering and large application farms would be nearly impossible without NAS or SAN.

When an environment gets to the point where the file servers are unable to service user requests for data in a timely manner or when attached storage capacity is simply exceeded it is time to strongly consider a NAS or SAN. For applications like Terminal Server farms, where users will attach to the system on any of the nodes, it is highly advisable to store the user’s files on a SAN or NAS device. This ensures high performance access to these files from any server in the farm. Without this type of central storage, management of users and their data would be very difficult.

Fibre Channel

Fibre Channel can be run across tremendous distances. Companies often use Fibre Channel networks to maintain mirrored data in other states. The bandwidth combined with the long-haul capabilities makes Fibre Channel a very valuable technology to use with data storage.

Remember RAM-disks?

Some situations call for extremely fast access to data but not necessarily large volumes of data. Computational analysis, databases, and system imaging software are just a few examples of applications that could benefit from extremely fast access to read-only data in the under 2GB range. This type of situation can greatly benefit from the use of RAM-disks. By partitioning off a chunk of system memory and treating it as a disk you can get memory speed performance for applications that traditionally accessed disks. Although this information can be written back to disk for storage, it somewhat defeats the purpose of the RAM-disk. By preloading information into the RAM-disk the system can spool out the data as fast as the network interface can handle. For situations like imaging hundreds and hundreds of desktops from a single server image the increase in performance can be stunning. Although Windows 2003 does not natively offer a RAM-disk, there are several third-party RAM-disk programs available such as RamDisk Plus from Superspeed or SuperDisk from EEC Systems.

RAM-disks Are Best Suited to Read-only Data

RAM-disks lose all data when the power is turned off. Ensure that the data will be committed to disk upon shutdown if the data will be read/write. Be aware that a system crash will result in any new data in the RAM-disk being lost. RAM-disks are best suited to read-only data.

Distributed File System

Another great way to scale file server performance is through Distributed File System. DFS essentially enables you to hide the file servers behind an abstracted layer. Instead of accessing shares in the traditional manner of \\fileserver\share, the user attaches to \\dfsroot\share. The DFS structure consists of links to other file shares, which hides the location of the data from the user. The advantage of this is that shares can be moved to larger servers without the user having to remap her resources. Replicas of the data can be created, and Active Directory will allow the user to connect to the closest replica of the data. This allows a DFS structure to scale without consuming all available WAN bandwidth. It also offers a level of redundancy to the environment: if a DFS replica is down, users connect to the next closest source of the data. This gives you tremendous flexibility in scaling the file servers.
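
To make the abstraction concrete, here is a conceptual sketch only; the namespace, server names, and site-matching logic are all hypothetical, and real DFS referrals are driven by Active Directory site information rather than name matching.

DFS_LINKS = {
    r"\\corp\dfs\engineering": [r"\\fs-sj-01\engineering", r"\\fs-ny-02\engineering"],
    r"\\corp\dfs\marketing":   [r"\\fs-sj-03\marketing"],
}

def resolve(dfs_path: str, site_tag: str) -> str:
    """Return the replica matching the caller's site tag, falling back to the first target."""
    targets = DFS_LINKS[dfs_path]
    for target in targets:
        if site_tag in target:
            return target
    return targets[0]

print(resolve(r"\\corp\dfs\engineering", "ny"))   # a New York user is referred to the NY replica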

Excellent Candidate for DFS

Read-only data that is accessed heavily is an excellent candidate for DFS. By placing multiple replicas of the data on the same network the DFS structure will load balance the access to the data, resulting in excellent end-user performance.

Scaling for RAS

Companies today are moving toward the philosophy that data should be available anywhere and anytime. Users should be able to access resources from home, from hotels, and even from Internet cafes. Technologies such as Virtual Private Networks, wireless, and modems work together to allow users to access their data. Setting up basic remote access systems can be fairly straightforward. Scaling these systems is another situation entirely. Companies like AT&T support literally millions of users in their Remote Access systems.

Never Compromise Security Policies

Don’t let the scaling of RAS get in the way of network security. Never compromise security policies to increase VPN or RAS performance.

Hardware Cryptographic Accelerators

VPNs are an amazing way to take advantage of the Internet as a backbone network for remote access. Windows 2003 offers support for both Point-to-Point Tunneling Protocol and Layer 2 Tunneling Protocol (with IPSec) as VPN technologies. Windows 2003 does a pretty good job of handling these services, but as administrators attempt to scale this access to larger and larger numbers of users, they quickly discover that the VPN takes up a fair amount of system resources. Rather than just add more and more RAS servers, a clever administrator can increase performance by using a hardware cryptographic accelerator. A cryptographic accelerator offloads encryption tasks from the CPU and performs them on dedicated hardware. This allows a RAS server to greatly increase the number of simultaneous connections it can service. It also allows administrators to enforce a higher level of encryption than they would otherwise have used because of performance constraints.

A Hardware IPSec Accelerator Is Probably Overkill

In many environments, the Internet bandwidth becomes the VPN bottleneck long before the VPN server does. If a company only has a T-1 connection to the Internet, a hardware IPSec accelerator is probably overkill.

When to Make the Move from Software to Hardware

Many hardware RAS solutions on the market offer features and levels of performance not found in Windows 2003 Routing and Remote Access. One of the primary reasons to move to an appliance for remote access is to get away from a multipurpose operating system like Windows 2003 and onto a more dedicated operating system. Because RAS devices often run in parallel with the firewall, the security of the system is of paramount importance. By eliminating general-purpose code, these appliances are able to greatly mitigate their exposure to security exploits.

When looking at hardware VPN/RAS devices pay special attention to whether they support native VPN clients. Having the ability to use PPTP or L2TP/IPSec can be a great advantage in not having to purchase or manage a third-party VPN client.

PPTP Security

The industry often gives PPTP a pretty hard time about its security. White papers were published accusing PPTP of being susceptible to a “man-in-the-middle” attack. It is important to point out that the security flaw exposed was not in PPTP itself but in MS-CHAP, the authentication protocol used with PPTP at the time. This flaw has long since been fixed in MS-CHAPv2, the authentication protocol used for PPTP in Windows 2000 and Windows 2003.

Multiplexing for Modem Support

Companies that maintain their own dial-up services can take advantage of newer technologies to reduce their costs and maintenance efforts. When looking at adding analog lines for modems, always look into getting an aggregated line and a multiplexer. In many areas it is cheaper and easier to get a T-1 line than it is to get 10 analog lines. The T-1 is cheaper, takes up less space in the intermediate distribution frame, and has the capacity for a total of 23 analog lines. Always work closely with the telecom group to see what facilities are already in place and take advantage of them whenever possible.

Software modems take this concept to another level. By plugging a single T-1 into a software modem device, up to 23 virtual modems are created that act exactly like physical modems. This is a more cost-effective modem solution, and it takes up less space in the data center. Most ISPs use virtual modem technologies to support their dial-up users.

Taking Advantage of Multihoming Your Internet Connection

As companies become more and more dependent on their VPN environments, they often look into making the VPN more resilient. Redundant VPN hardware is usually the first upgrade, with additional bandwidth being second. Multihoming the Internet connection is an upgrade that is often overlooked. By attaching to multiple ISPs, a company can protect against failures of its upstream providers. Additionally, multihoming can bring a network “closer” to other specific networks. For example, if a company has its Internet connection through one provider and its dial-up services through another, there is no guarantee of performance when traffic crosses from one network to the other. Traffic from ISP A will hit the Internet through a public access point as quickly as possible and eventually reach ISP B; from there it reaches the company's VPN system. By attaching the company's VPN system to ISP B as well (ISP B being the provider of the dial-up services, with POPs in every city), there is a much more direct path back to the VPN system. This not only improves performance by reducing latency and hop count but also acts as a secondary route for Internet connectivity. Technologies such as BGP4 (Border Gateway Protocol version 4) allow a company to be reachable via multiple ISPs without having to route only through the primary ISP. This allows a company to scale its VPN and RAS solutions through added capacity and added resiliency.

Scaling Web Services

For many companies, Web services see more traffic than any other single system. As the company grows its identity grows. As its identity grows, more and more people want to find out about the company. This results in more and more traffic for the Web servers. Companies are using the Web for providing not only information but also for supporting their products. Fully indexed searches, dynamic content, and driver and patch downloads all result in increased loads on the Web servers. The Web services must be able to scale if the company is to keep up with the rest of the industry.

Beefy Boxes Versus Many Boxes

Traditionally, applications scaled by improving the performance of the hardware. Almost all applications ran on a single server, and the only way to make one faster or increase its capacity was to upgrade the server. Web services introduced the industry to an application where much of the data was static. Even today's dynamic sites are mostly static frameworks with bits of data read from another system. This created an environment where a large portion of the data was read-only, which meant that data could be replicated to multiple locations and updated in batches. Users were not making changes on one system that then had to be replicated to the others. This was a prime environment for multiple servers and load balancers. By adding Web servers, giving them a local copy of the static content, and pointing them to a central source for dynamic content, performance could be scaled to amazing levels. Soon the Internet was filled with farms of hundreds of front-end Web servers servicing hundreds of millions of hits each day.

Using Cryptographic Accelerators for SSL

As the increase in Web server usage swept the Internet, new uses for Web servers were appearing. Traditional brick-and-mortar companies were doing business on the Internet. Security for these business transactions became a strict requirement. Companies turned to encryption to offer a secure method of doing business on the Internet. SSL, or Secure Sockets Layer, became something of a de facto standard for encrypting Web traffic. The use of SSL requires the Web server to perform cryptographic processing on data. These processes take up CPU cycles and can quickly bog down a Web server. To continue to scale Web services with SSL, administrators continued to add more and more Web servers. The industry quickly realized that this was not an optimal solution, and SSL accelerators were created. By offloading cryptographic processing onto a dedicated hardware device, the CPU is freed up to perform other tasks. SSL encryption loads can reduce the performance of a Web server by as much as 75%. An SSL accelerator can return that performance without having to add servers. This reduces maintenance tasks and warranty costs and frees up valuable data center space.
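
The capacity math is straightforward. The sketch below uses the roughly 75% penalty cited above; the request rates are illustrative assumptions, not benchmarks.

import math

def servers_needed(total_rps: float, per_server_rps: float, ssl_penalty: float) -> int:
    """Servers required when SSL consumes ssl_penalty of each server's capacity."""
    return math.ceil(total_rps / (per_server_rps * (1 - ssl_penalty)))

total_rps, base_rps = 4000, 500    # site peak and one server's plain-HTTP capacity (illustrative)
print(servers_needed(total_rps, base_rps, ssl_penalty=0.75))   # SSL in software: 32 servers
print(servers_needed(total_rps, base_rps, ssl_penalty=0.0))    # SSL offloaded:   8 servers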

SSL accelerators

Many load balancers, also known as Layer 4-7 switches, now offer SSL acceleration. Other SSL accelerators come in the form of PCI cards installed in the servers themselves.

n-tier Application Model

Many Web-based applications start their lives as a single box that is a Web server, an application, and a data store. This works well for small applications and keeps the data neatly bundled in a single system. As these applications are scaled, a single box is often not sufficient to keep up with the needs of the application. To scale these types of applications it is useful to take the application to an n-tier model. By separating the database from the application, an administrator is able to dedicate a system to being a database server, building and tuning it with a specific database in mind so that it scales well. The application layer often has different requirements as well. It might be demanding enough to warrant multiple application servers running in a load-balanced group to keep up with the application. The Web layer can be scaled like any other Web server: by load-balancing a group of Web servers, they can be scaled to meet the demands of the users, and by pointing them to the load-balanced application layer, they can take advantage of the distributed processing of the application. The application servers draw their data from the database and feed it up into the Web presentation layer. This type of model scales very well. As components of the system prove too demanding to share resources with other components, they are simply split off onto dedicated hardware.

Scaling Web Services via Web Farms

When the Dot Com boom first started to hit, companies scrambled to build systems powerful enough to keep up with the demands of their users. Early Dot Com companies put up powerful Unix systems to run their Web sites. It was soon realized that this was a very inefficient method. Because the Dot Com world required resources to be accessible 24 hours a day, seven days a week, it became very expensive to maintain redundant Unix systems. The concept of the Web farm caught on very quickly. By running multiple Web servers, the load from the user base was distributed across the systems through the use of a load balancer. With this architecture, the environment could run on very inexpensive servers. The stability of the individual systems was not a great concern because if a server failed, the other servers would take up the load. If the load became too high, the administrator could simply add more Web servers to the farm. This became the de facto standard for high-traffic Web sites. By replicating the content or by having the servers draw their content dynamically from another source, the systems could be brought online very quickly and easily. Sites using this methodology have scaled to the point of being able to support more than 300 million hits per day.
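
For a sense of scale, spreading the 300 million hits per day cited above across a hypothetical farm works out to a modest per-server rate, which is exactly why the model is so attractive.

HITS_PER_DAY = 300_000_000     # figure cited above
SERVERS = 100                  # illustrative farm size

per_server_per_sec = HITS_PER_DAY / SERVERS / 86_400
print(f"About {per_server_per_sec:.0f} hits per second per server")   # roughly 35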

Scaling for Terminal Services

Terminal services have changed drastically over the years. Early versions of terminal services had issues with multiple processors, they didn’t have the ability to load balance, and they had no ability to leverage local client resources. Third-party add-ons to Terminal Services addressed these types of issues and increased the ability to scale Terminal Services into larger and larger deployments. The current version of Terminal Services in Windows 2003 has natively addressed these concerns and has proven its ability to scale to very large implementations.

Big Processors Versus Multi-Processors

Current processor technologies allow servers to perform incredible amounts of computing. Single servers can host literally hundreds of simultaneous users. Windows 2003 Terminal Services is able to scale performance nearly 1:1 with the addition of multiple processors up to four processors. Although Terminal Services can be run with more processors, benchmarking has shown that scaling beyond four processors results in greatly diminishing returns.

Memory, Memory, and More Memory

Terminal Services can be a fairly memory-hungry beast. Exact memory requirements will vary depending on the types of users accessing the system and the applications running on the system itself. A safe rule of thumb is 16MB per user running specific applications and 32MB per user running a full desktop session. Memory for the system itself should be added to this value to determine the total memory needed for the terminal server. Without enough memory to support the users, individual user performance will suffer. Sufficient memory is needed to properly scale Terminal Services.
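
The arithmetic behind that rule of thumb is simple enough to capture in a short sketch; the user counts and the base allowance for the operating system are illustrative assumptions.

def terminal_server_memory_gb(app_users: int, desktop_users: int, base_mb: int = 512) -> float:
    """Rule-of-thumb estimate: 16MB per application user, 32MB per desktop user, plus a base for the OS."""
    return (base_mb + app_users * 16 + desktop_users * 32) / 1024

print(f"{terminal_server_memory_gb(app_users=100, desktop_users=50):.1f} GB")   # about 3.6 GB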

Terminal Service Farms

To support very large numbers of Terminal Services users it is necessary to go beyond one or two Terminal Servers and into a full Terminal Services farm. Because of the unique needs of Terminal Services, there are a few components that are critical to the success of the farm. Some applications require the tracking of session state; a load-balanced Web server, for example, might need the user to return to the specific Web server that was tracking the user's actions in order for an application to work properly. Terminal Services takes this concept much further: it gives the user the capability to disconnect from a session but have the applications continue to run. For this reason it is an absolute necessity for a user to be able to reconnect to his original session. This is accomplished via the Session Directory.

By having Terminal Servers join the Session Directory, the Session Directory Servers will track which users are on which Terminal Servers. This allows users who are intentionally or unintentionally disconnected to return to their original session when they reconnect. Because of the importance of this role, it is recommended that the Session Directory be run on a cluster.
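
Conceptually, the Session Directory is a lookup table mapping users to the servers that hold their sessions. The following sketch illustrates the idea only; the server names, session counts, and least-loaded fallback are hypothetical and not how the actual service is implemented.

sessions = {"jdoe": "TS-FARM-03"}                                        # user -> server holding a disconnected session
session_counts = {"TS-FARM-01": 45, "TS-FARM-02": 38, "TS-FARM-03": 51}  # active sessions per server

def route(user: str) -> str:
    """Send a returning user back to the original session; otherwise pick the least-loaded server."""
    if user in sessions:
        return sessions[user]
    return min(session_counts, key=session_counts.get)

print(route("jdoe"))     # TS-FARM-03 -- reconnects to the existing session
print(route("asmith"))   # TS-FARM-02 -- new logon lands on the least-loaded server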

Because users of a Terminal Services farm can conceivably connect to any server in the farm it is important that their personal resources be reachable from any session. It is best to think of a user’s terminal server session as a disposable resource. Nothing unique to that user should be stored on a Terminal Server. The easiest way to accomplish this is through the use of Terminal Server profiles and redirected folders. These folders should redirect to a central file store. This file store could potentially be used by hundreds of Terminal Servers and therefore thousands of Terminal Server users. Use of NAS with a clustered head is highly recommended for this role. To make sure that users don’t store data locally to the session the session should be locked down via GPOs. Decisions on whether or not to allow users to connect to their local host are left to the individual administrator.

The responsible administrator will take the time to lock down the servers application by application to prevent users from altering resources that they shouldn’t have access to.

Improving Scalability by Load Balancing Applications

Windows 2003 offers load balancing among Terminal Servers. For some environments, this isn't enough. Several third-party vendors have added the concept of Application Load Balancing to Terminal Services. This means that if a user wanted to run Word, the request would reach an Application Load Balancer, which would check to see which server offering Word had the lowest load and connect the user to that server. This allows an administrator to load balance all of her servers behind a single name and IP address without having to install all applications on all servers. This can be especially helpful when running applications that have specific local peripheral requirements.

Summary

In this chapter you saw that there are two primary ways to scale a technology. The first way is to improve the hardware on which the technology runs. Servers with multiple processors, loads of memory, and high performance disk subsystems are a great way to get a technology to support more users.

You learned that most technologies can be scaled by adding more of the same item and balancing the load across them. By creating these farms, you can scale an application to handle far more users than a single system ever could.

You saw how dedicated hardware devices like accelerators can increase performance beyond the original system’s capabilities. Dedicated operating systems can also serve to further scale applications by removing unnecessary services and therefore removing potential security flaws.

Third-party applications can serve not only to extend the functionality of systems like Web servers and Terminal Servers but also to add scalability by offering load-distribution technologies that were not originally present.

By taking advantage of these features and implementing best practices you can scale almost any application to support huge numbers of users and performance loads. By carefully planning the applications, you can scale without falling into the trap of making a system unmanageable.
