Chapter 7. Storage Management Strategies

One of the most important aspects of a database is the storage system it is housed on and how that system is managed. Advances in CPU, memory, and disk systems have allowed us to extend database technology to meet the needs of myriad data-intensive workloads. Indeed, hardware continues to advance at a fast pace. CPUs have for many years progressed according to Moore's Law. Memory density and capacity follow a similar path, and solid-state disk drives are narrowing the gap between disk and memory to the point where hosting an entire database in memory is becoming practical. Although these advances allow us to do more, they can also tempt us to ignore the fundamental principles of good storage and database design. We have seen poorly designed databases run twice as fast on new hardware, and we have also witnessed well-designed databases see little or no performance improvement on the latest hardware. In this chapter, we will cover the various storage options that are available, how best to configure your database on the storage you select, I/O best practices, and finally high availability options.

Storage Systems

Database administration is a complex role. There are many different aspects of the database environment that must be cared for. With every new release of SQL Server, both the management and the maintenance of the database environment become easier. We find ourselves using DBCC less and less in favor of SQL Server Management Studio. But even with the help of Management Studio, the workload for the typical DBA has grown in complexity and volume. For example, the number of databases that you are expected to manage may have grown from 20 to 200; or, the 20 databases that you currently manage now account for terabytes of storage. Interestingly, we have customers who are looking at petabyte data warehouses in the not too distant future. Yes, that's around a quadrillion bytes.

To the novice, setting up SQL Server is a simple matter of clicking through a wizard and answering a series of questions during the installation process. Accepting the defaults is perfectly fine for a simple development environment, but for a serious data management environment, you need to address a number of fundamentals and best practices. One of the most important among these is the storage system configuration and how it's used with SQL Server. Your goal as a DBA is to ensure the best performance of the storage system while at the same time maximizing the availability of the system.

You will have to consider many factors as you select your storage system and as you configure SQL Server with the storage system. For example, you will need to factor in cost, expected workload, required uptime, data size, and expected growth rate, just to name a few. In this section, we will start at the lowest level, the disk subsystem.

As shown in Figure 7-1, there are three major categories of storage: direct attached storage (DAS), network attached storage (NAS), and storage area network (SAN). In the following sections, we will describe each. Keep in mind that these storage types are not mutually exclusive. That is, you can combine them to achieve the storage system that is appropriate for your needs.

Figure 7-1. The three major categories of storage (DAS, SAN, and NAS)

Before we discuss the different storage types, we'll cover the different connection technologies that will allow you to attach your disk subsystem to your server (and database). Figure 7-1 illustrates how application software, in our case the database, connects via the operating system to the storage subsystem.

Storage System Interfaces

You can use a number of interface technologies to connect your disk subsystem to your database. In this section, we will cover most of the popular interface standards. The interface that you choose can have a major impact on performance in that the interface will support a maximum transfer rate according to the manufacturer's specification. To get the maximum performance out of your storage subsystem, you will need to make sure that the interface can handle the data load being transferred across it. Although we will focus strictly on the storage subsystem in this chapter, it is important to note that a database server must be balanced with respect to data input/output. For example, if your disk subsystem (through your interface) can support a sustained transfer rate of 320 megabytes per second (MB/s), then you want to make sure that the CPU and memory can support at least that volume of data.

IDE ATA

IDE ATA stands for Integrated Drive Electronics Advanced Technology Attachment. It is a disk drive technology in which the controller is integrated with the disk drive, as opposed to residing on or being connected to the motherboard. IDE ATA is a parallel interface, meaning data bits travel side by side across the cable, and data travels asynchronously from the drive to the data bus.

SATA

Serial Advanced Technology Attachment (SATA) is a physical storage interface with serial data processing. SATA II is a second generation of SATA. When people speak of SATA today, they often mean SATA II.

SATA's interface provides a mechanism for connecting a computer's host bus adapter to mass storage devices. Data travels synchronously at high speed via a serial cable. The main design advantage over IDE is that SATA II supports command queuing, known in SATA as Native Command Queuing (NCQ), the successor to SCSI's Tagged Command Queuing (TCQ). This technology is designed to increase the performance of the hard drive by allowing it to optimize the order in which read and write commands are executed. Command queuing can decrease the movement of the disk heads and increase performance.

SATA uses a thin cable design and smaller form factor and length, which makes wiring more efficient. Another advantage of SATA is that it supports the hot-plugging and hot-swapping of disk drives. SATA can support transfer speeds in the range of 150–300MB/s, based on SATA type.

SCSI

Small Computer System Interface (SCSI) is a parallel interface (also referred to as SPI) that is used for connecting and transferring data between a computer and peripherals such as hard disks. In general, SCSI provides a faster data transmission rate than SATA. For example, Ultra-640 SCSI can support transfer rates of 640MB/s.

Serial Attached SCSI

Serial attached SCSI (SAS) is a serial interface that combines the benefits of SCSI with SATA's physical advantages listed earlier. The current SAS version operates slightly slower than the latest parallel SCSI interface, but the transfer speeds are expected to double with the next release of the technology.

Fibre Channel Direct Attached Storage

Fibre Channel direct attached storage (FC DAS) is a gigabit-speed network technology, implemented as a physical layer protocol, that provides serial, full-duplex communication with high-performance storage systems. Fibre Channel performs at rates as high as 10 gigabits per second (Gb/s). Fibre Channel in a DAS topology is directly attached to a server and not openly accessible to other servers. FC DAS is not commonly used.

Fibre Channel Storage Area Network

Fibre Channel storage area network (FC SAN) refers to a dedicated network that allows access between storage devices and servers using the Fibre Channel technology. The Fibre Channel technology uses Fibre Channel Protocol (FCP), which is similar to TCP/IP for transporting SCSI commands over the FC network.

iSCSI

iSCSI stands for Internet Small Computer System Interface. It's an implementation of Internet Protocol (IP) used for linking data storage to servers. The iSCSI protocol transmits SCSI commands as IP packets. Because it uses IP, iSCSI can leverage any existing TCP/IP network to facilitate data transfers over intranets and can manage storage that is geographically dispersed. iSCSI is a popular SAN protocol because it allows organizations to centrally locate storage shared across servers without the distance constraints that Fibre Channel imposes.

InfiniBand

InfiniBand is a point-to-point bidirectional serial interface like Fibre Channel and Serial ATA. The interface is a switched-fabric communications link that is designed to be scalable. The interface supports double data rate (DDR) and quad data rate (QDR) signaling, which effectively double and quadruple the base transfer rate. There are three interface widths: 1×, 4×, and 12×, which represent 2, 8, and 24 differential pairs. The most common is the 4× width, which provides 8Gb/s at the base rate and 16Gb/s at DDR.

Storage System Types

Earlier in this chapter, we briefly talked about the three main storage system types (DAS, SAN, and NAS). In this section, we will discuss each storage system type in detail and the impact of each on the performance of your SQL Server 2008 database.

Direct Attached Storage

DAS is considered the "traditional" storage type. In this type of storage, the disks are connected directly to the server. The storage device can be either local disks or an external disk array. Since the disks are not networked, their use is limited to the server to which they are connected. Although the setup is easy and the cost is usually low, with DAS you run the risk of running out of storage capacity. You can add more disks to your DAS array, but once the cabinet is filled, your only options are to purchase a larger cabinet, increase disk density, or purchase an additional cabinet, which will force you to add another HBA to your server and may not be possible. Obviously, the direct attach strategy imposes limits on your ability to extend your storage capacity because of the 1:1 pairing of server interface cards (HBA or disk controller) to disks (DAS storage cabinets or local server disks).

DAS can use different types of interconnects, such as SCSI, iSCSI, or Fibre Channel. DAS can also use different types of storage, such as external disk cabinets, local disks, or even a SAN. The key factor in the definition of DAS is the physical connection between the server and the storage.

DAS is typically used for small SQL Server databases where the database growth is well defined and can comfortably fit within the capacity of the physical storage device. As an example, you can use an external disk cabinet that may be 50 percent populated with disks, allowing for expansion by adding disks to the enclosure as required.

You can cluster DAS systems for high availability. The number of nodes supported in a SQL Server 2008 cluster depends on the SQL Server 2008 edition. SQL Server 2008 Enterprise supports up to 16 nodes, while Standard supports only 2 nodes. No other SQL Server 2008 edition supports clustering.

Storage Area Network

SAN is the most commonly used type of storage system for databases in medium to large businesses. A SAN is a centralized network of storage devices. A SAN provides a number of benefits over DAS, as follows:

  • Centralizes and consolidates disk resources and supporting equipment (for example, cooling and power)

  • Promotes high availability

  • Improves data storage management and reduces costs

  • Enables efficient hardware deployment and utilization

  • Improves data backup efficiency and accessibility

  • Enables storage virtualization, including heterogeneous disk arrays

  • Supports hosts running on different operating systems

SAN technology transfers raw disk blocks directly onto the network. A centralized storage network offloads I/O processing from the host, because the disk controllers/processors are built into the SAN rather than residing on the host, allowing the server's power to be focused on the business application. A SAN also allows the host to boot directly from the SAN and provides for storage expansion. See Figure 7-2 for an example of using SAN storage as part of an application architecture.

Figure 7-2. SAN architecture

A host server operating system views the SAN as a collection of logical unit numbers (LUNs), and the OS will maintain a file system on each LUN; see the "Logical Unit Numbers (LUNs)" section for a more detailed discussion of LUNs. A SAN typically comprises two controllers and a cabinet of disks. Depending on the SAN model, you can add more cabinets of disks for expansion. However, make sure you check the limits of your particular model. In addition, each SAN will host only a limited number of host machines, so plan your selection of SAN model based on future growth.

A SAN can be based on either the iSCSI or Fibre Channel interface standard. Typically, the choice of iSCSI vs. Fibre Channel comes down to cost. Fibre Channel SANs are well suited to mission-critical, I/O-intensive applications where availability and high performance are required. However, the Fibre Channel environment requires dedicated Fibre Channel HBAs on the hosts and a network infrastructure that is complex to manage and maintain.

Fibre Channel networks use dedicated optic cables that are limited by connection distance. iSCSI, as an alternative to Fibre Channel, is a more affordable solution since it does not require specialized hardware and network infrastructure (it leverages the IP network). iSCSI can support much greater distances than Fibre Channel; however, it is dependent on the Ethernet and IP network bandwidth. iSCSI HBAs are available to offload the iSCSI processing from the host processor for improved performance.

It is possible to use iSCSI on a host server to connect to an FC SAN without having to add the HBA Fibre Channel hardware to the host. The technique is called iSCSI bridging. As you might imagine, there is some overhead and latency to this approach due primarily to translation between network protocols.

A SAN can offer superior performance as compared to DAS by supporting a disk cache. However, since a SAN is typically shared among a number of hosts, the cache may not be available when SQL Server needs it, which will result in degraded read-write performance. If you are expecting to leverage the benefits of the SAN cache, then you should test to ensure that the SAN cache is adequate.

Logical Unit Numbers (LUNs)

An important part of how a SAN is configured is related to defining logical units of storage. A logical unit is the SCSI protocol entity that is addressed by actual input/output (I/O) operations. Each SCSI target provides one or more logical units and does not perform I/O as itself, but only on behalf of a specific logical unit. A LUN is used on a SCSI bus to distinguish between devices that share the same bus. Commands that are sent to the SCSI controller identify devices based on their LUNs.

The SAN can be "carved up" into LUNs, which associate raw physical space with logical space and a unit number. There is not a 1:1 ratio between drives and LUNs. For example, a 500GB drive can be partitioned into one 200GB LUN and one 300GB LUN, which would appear as two unique drives to the host server. Most SANs support zoning, which allows the SAN administrator to dedicate entire disks in the SAN to your LUN in order to isolate the I/O on this drive to your SQL Server 2008 database.

Select the largest appropriate LUN possible because this will distribute the I/O better over the multiple disks that comprise your LUN. Ultimately this will offer better I/O performance. You should also ensure that your log files are on a separate LUN consisting of dedicated disks, because log files are typically written in sequential patterns, unlike data files. A LUN whose drives are shared with another application will not provide optimal I/O performance. We have seen this particular problem in a lot of SQL Server database implementations: during peak load, the performance of the SQL Server databases degrades at a rapid rate because of disk contention caused by physical disks being shared across LUNs. Even though you might think that the database data and log files are on separate disks, the Windows OS volumes (which are based on LUNs) are sharing the physical disks behind the scenes. Be careful of this problem. In a lot of organizations, the team that manages the SAN is completely separate from the database team, so you will have to coordinate with them.

Network Attached Storage

A NAS device is a server dedicated to file sharing. In essence, a NAS unit is a dedicated computer connected to an IP network whose sole purpose is supplying file-based storage. The NAS can exist anywhere in the network. The consolidation advantages of NAS are very similar to the ones noted previously for a SAN (for example, centralized management and extensibility).

The major difference between NAS and a SAN is that NAS accesses data by file name and byte offsets, whereas a SAN accesses data by raw data blocks. We do not recommend using NAS with SQL Server because of the overhead that is incurred by file-level data access protocols. Common Internet File System (CIFS), Server Message Block (SMB), and Network File System (NFS) are examples of popular file-based protocols. Depending on the deployment, a NAS may be subject to the performance impact of a network containing a large number of users. See Figure 7-3 for an example of NAS storage architecture.

Figure 7-3. NAS architecture

Some NAS systems cannot guarantee the write ordering and write-through required by SQL Server. However, systems that are Windows Hardware Quality Lab (WHQL) certified meet the requirement of guaranteeing those two things. For more information, see the Microsoft Knowledge Base article "Description of Support for Network Database Files in SQL Server" at http://support.microsoft.com/kb/304261.

If you decide to use NAS storage, you must enable the use of network files before proceeding because SQL Server 2008 disables this feature by default.

The following example shows how to enable trace flag 1807 globally (so you can place SQL Server files on a NAS):

DBCC TRACEON (1807, -1);
GO

This command tells SQL Server 2008 to bypass the check for data files on a network drive. If you attempt to place SQL Server 2008 data files on a network drive without enabling the 1807 trace flag, you will get one of two errors, as follows:

5105 Device Activation Error
5110 File 'file_name' is on a network device not supported for database files.
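
Note that a trace flag enabled with DBCC TRACEON does not survive a service restart; to make the setting permanent, you can add -T1807 to the SQL Server startup parameters. To verify that the flag is currently active globally, check its status with DBCC TRACESTATUS:

DBCC TRACESTATUS (1807, -1);
GO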

The recommendation from Microsoft is that you do not store database files on a network storage device because of the risk of network errors (which can cause integrity issues) and possible performance issues associated with network latencies. Instead, Microsoft recommends that database files be stored on direct attached storage or on a SAN.

In many circumstances, database administrators take SQL Server storage for granted. Although it may seem like a relatively mundane task, the selection and management of the storage system is an area that you must understand very well. By understanding the storage system selected, you can maximize the performance of SQL Server, plan and understand expansion choices, adhere to service-level agreements, and troubleshoot issues related to database I/O. In the next section, we will talk about the concept of high availability and how to configure your storage system to provide the level of availability to meet the needs of your organization.

Disk Configuration: RAID

Data is the most valuable asset of any business today. Loss of data cannot be tolerated. Even if you back up regularly, you need a fail-safe way to ensure that your data is protected and can be accessed without interruption in the event of a disk failure. Adding Redundant Array of Independent Disks (RAID) technology to your storage configuration is one of the most cost-effective ways to maintain both data protection and efficient data access.

RAID is a software or hardware technique that works with a collection of disk drives to optimize data reliability and/or performance. There are various RAID schemes or levels that divide and/or replicate data among multiple, physical hard drives. When a disk collection is set up to use RAID technology, it is known as a RAID array. Although a RAID array distributes data across multiple disks, the array as a whole appears to the host as a single disk.

RAID 0 (Striping)

RAID 0 is the simplest of the RAID configurations. RAID 0 provides improved performance but does not provide any redundancy or fault tolerance. Data striping refers to sequentially writing data in a round-robin style up to a certain stripe size, which is a multiple of a disk sector (usually 512 bytes). Data striping yields improved performance because multiple disks are concurrently servicing the I/O requests. The positive points for RAID 0 are the cost, the performance, and the storage efficiency. Since there is no redundancy, all the disks are used, with the only limits being set by the controller and the smallest disk size. A failure of any of the drives will take the entire RAID array offline. For critical data, the risk of a failure taking out the entire RAID array outweighs the performance benefits.

RAID 1 (Mirroring and Duplexing)

A RAID 1 configuration creates an exact copy or mirror of all the data on two or more disks. This RAID level provides fault tolerance: the array survives disk errors and the failure of all but one of its drives. The drawback to this approach is poor storage efficiency. For example, if two drives are used, then the storage efficiency is 50 percent because the data is duplicated on each disk. A variant of RAID 1 is duplexing, which specifies a controller per disk. The result of duplexing is that your data is duplicated, and you also have two controllers that are each capable of accessing your data. Thus, you can lose one controller and one disk and still be running with full access to all of your data through the second controller and disk. Duplexing, or splitting, removes single points of failure and allows multiple redundant paths. Because of the additional hardware costs, duplexing is not that common.

RAID 10 (Stripe of Mirrors)

RAID 10, also known as RAID 1+0, is in essence a combination of RAID 1 and RAID 0. The order here is significant: RAID 10 is a RAID 0 stripe across RAID 1 mirrored sets (see Figure 7-4). RAID 10 requires a minimum of four disks and, further, requires an even number of disks. RAID 10 provides fault tolerance and improved performance but increases complexity.

Figure 7-4. RAID 10 write operation and architecture

You can also combine the levels in the reverse fashion as RAID 0+1, which is a mirrored (RAID 1) set of striped (RAID 0) disks. To summarize, the key difference between RAID 10 and RAID 0+1 is that RAID 10 creates a striped set from a series of mirrored drives, whereas RAID 0+1 creates a mirrored set from striped drives. In a failed disk situation, RAID 10 performs better because all the remaining disks continue to be used. The array can sustain multiple drive losses so long as no mirror loses all its drives.

RAID 10 is essentially many sets of RAID 1 (mirrored) drives in a RAID 0 configuration. This configuration combines the best attributes of striping and mirroring: high performance and good fault tolerance. For these reasons, we recommend using this RAID level with SQL Server 2008 production (critical) operational workloads. However, there is a drawback: the loss of 50 percent of your raw storage capacity is the price you pay for the high performance and reliability inherent in RAID 10.

RAID 5 (Striping with Parity)

RAID 5 uses block-level striping and distributed parity. It is one of the most popular RAID levels. RAID 5 stripes both data and parity information across three or more drives. Fault tolerance is provided by placing the parity information for any given block of data on a different drive than the data itself (see Figure 7-5). This technique requires all drives but one to be present to operate; a failed drive requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any missing data can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will lose data in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. RAID 5 gives fault tolerance and storage efficiency. However, RAID 5 pays a penalty on write operations because of the parity overhead: every write implies reading the old data and old parity, computing the new parity, and writing both the new data and the new parity, so a single random write can turn into four disk I/Os.

Figure 7-5. RAID 5 write operation and architecture

Table 7-1 summarizes the primary features of each RAID level and its recommended use. Note that in all RAID configurations, the usable capacity is always based on a multiple of the smallest drive size.

Table 7-1. I/O Activity of RAID Levels and Their Recommended Use

Reliability
  RAID 0: Low. No fault tolerance; a drive failure results in data loss.
  RAID 1: Very good. Even better with duplexing.
  RAID 5: Good. Can tolerate a single fault.
  RAID 10: Great.

Minimum number of disks required
  RAID 0: 2
  RAID 1: 2 (exactly)
  RAID 5: 3
  RAID 10: 4

Storage efficiency
  RAID 0: 100%
  RAID 1: 50%
  RAID 5: >50%, <100% ((#drives - 1) / #drives)
  RAID 10: 50% (from mirroring)

Random read
  RAID 0: Great.
  RAID 1: Fair. Worst of the RAID levels but better than a single drive.
  RAID 5: Great.
  RAID 10: Great.

Random write
  RAID 0: Great.
  RAID 1: Fair. Worse than a single drive but better than some RAID levels.
  RAID 5: Fair. Generally better with larger stripe sizes.
  RAID 10: Very good.

Sequential read
  RAID 0: Great.
  RAID 1: Fair. Comparable to a single drive.
  RAID 5: Very good. Generally better with smaller stripe sizes.
  RAID 10: Great.

Sequential write
  RAID 0: Great.
  RAID 1: Good. Better than other RAID levels.
  RAID 5: Fair.
  RAID 10: Very good.

Cost
  RAID 0: Low (maximizes the use of disk storage).
  RAID 1: Moderate. Relatively high cost due to redundant drives.
  RAID 5: Moderate.
  RAID 10: High.

Recommended use
  RAID 0: Good for noncritical data or data requiring fast write performance at very low cost.
  RAID 1: Good for data that requires high fault tolerance at relatively low hardware cost. Best for log files.
  RAID 5: Very good for read-only data. Good for data warehousing databases.
  RAID 10: Data requiring high performance for both read and write and high reliability. Appropriate for production and critical database solutions.

Hardware and Software RAID

RAID can be implemented in a variety of ways: hardware, software, and firmware. Functionally, the result is the same; the differences lie in cost, performance, and manageability.

The hardware implementation refers to RAID logic implemented on a special-purpose RAID controller. The RAID controller implements the RAID level. Usually hardware RAID controllers use proprietary disk layouts, so it may not be possible to span controllers from different manufacturers. Most hardware implementations provide a read-write cache, which may improve performance based on your workload. One key advantage to a hardware implementation is guaranteed performance and no impact on the host CPU. Hardware implementations normally support hot swapping of failed drives.

The software implementation refers to RAID logic provided by the operating system. A software layer sits above the disk drivers and provides an abstraction layer between the RAID-level implementation and the physical drives. Software RAID is typically the cheaper of the two solutions. However, there are a few drawbacks to using software RAID. The software must run on a host server attached to storage, and the server's processor must dedicate processing time to run the RAID software. For RAID 0 and RAID 1, this processing time is not significant, but it may become significant when using parity-based RAID implementations. In contrast, the hardware RAID solution avoids this issue because the RAID software executes on a separate dedicated chip.

Selecting a Storage System for SQL Server 2008

Up to this point we have talked about the fundamentals of storage systems. Now you will leverage that information to determine which implementation is best for your SQL Server 2008 storage needs.

I/O Performance

SQL Server performance is dependent on I/O performance, so you want to ensure that you select a storage system that will maximize I/O. The first thing to consider is the transport protocol. As we discussed earlier, a block-level protocol will give you the best I/O performance. Therefore, you should consider DAS or SAN with one of the block-level interface technologies.

There are a number of factors that will help guide you in your decision as to whether DAS or SAN is most appropriate for your environment. You should consider the following:

  • The database(s) size, because the size of your databases will play a key role in determining your storage requirements.

  • The number of users and the activity they will be performing. Think about the following:

  • How many transactions per second (one way to measure this on an existing system is shown after this list)

  • What is the read-write ratio

  • The number of drives required, including those for backups, to meet your sizing estimates.

  • The speed of the drives and the number of spindles required to give you the best I/O for your system.

  • The growth rate of the system: both users and data.

  • Storage system sharing and the potential impacts, such as how many other applications will be accessing the storage environment and how will that impact your performance.

  • The service-level agreements (SLAs) and high availability (HA) requirements that this system will be required to adhere to. Considerations include:

  • The expected response time for applications using the database environment

  • The RAID configuration

  • The SQL Server HA technique (discussed later in this chapter)
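
If you are replacing or consolidating an existing system, you can measure the transaction rate rather than guess. The following is a minimal sketch that samples the cumulative Batch Requests/sec counter (a common proxy for transaction throughput) from sys.dm_os_performance_counters twice and computes the average rate; the 10-second window is an arbitrary choice, and you would sample during representative peak load:

-- Sample the cumulative counter, wait, sample again, and average.
DECLARE @first bigint, @second bigint;

SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';

WAITFOR DELAY '00:00:10';

SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';

SELECT (@second - @first) / 10 AS BatchRequestsPerSec;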

A number of hardware vendors have produced SQL Server sizing calculators that can provide insight into your specific storage needs. For example, the HP Transaction Storage Planning Calculator gives you storage-related information and produces alternative HP storage solutions and pricing for you (see Figure 7-6). You can download the calculator at http://h71019.www7.hp.com/ActiveAnswers/us/en/sizers/sizer_trans_proc_s_plan_calculator.html. Currently, the calculator has not been updated for SQL Server 2008; however, since this tool is used for storage calculation, it will still give you valuable information about how to set up your storage system and the options that are available to you.

Figure 7-6. HP Transaction Storage Planning Calculator

To illustrate the analysis you would go through to determine the storage selection, we will discuss the characteristics of three sample database environments: small, medium, and large:

  • Small database server: This would be a database system that has a small number of users and a low number of transactions per second. For instance, the number of users may be less than 50, and the size of the database(s) would be from several hundred megabytes to tens of gigabytes (including backup). In this case, it would likely make sense to use a DAS system with a SCSI interconnect to keep the costs down. You'll select the RAID type based on cost and your HA requirements. With a small data size and a small number of users, a RAID 5 configuration would suffice, again keeping the costs down with respect to the number of disks.

  • Medium database server: This would be a system that has a larger number of users and about 100+ transactions per second. This would equate to around 500 to 1,000 users (assuming no more than 10 percent user concurrency). The size of the database(s) can be in the range of tens of gigabytes to a few terabytes. At this point, DAS is still an option; however, it is likely that you will need to deal with expansion, and you will want high I/O performance, which may lead you to a Fibre Channel interconnect. From a RAID configuration perspective, you are moving into the RAID 10 level, which will give you fault tolerance and very good read-write speeds. Also, bear in mind that as you move to SAN storage and RAID 10, your costs increase.

  • Large database server: This system is for thousands of users and many hundreds to many thousands of transactions per second. The database(s) size can be from several hundred gigabytes to tens or hundreds of terabytes. When designing systems of this size, you should not only separate the data files, log files, and tempdb files, but also break the database into smaller data files placed on multiple drives. For a large database with a heavy workload, you should consider the RAID configurations carefully. To maximize I/O performance, look into the different options available per database file based on your workload(s). This can include moving the indexes to separate files on RAID 1 or RAID 5 arrays or moving BLOB data to a separate RAID 5 array.

Later in this chapter we will talk about I/O best practices including SQL Server file layout (data, logs, and tempdb).

Redundancy Is Critical

Database storage is one of the primary points of failure in a database environment. We have talked about RAID technology, which can give you a degree of redundancy on the disk array. If your storage is local to your server, then a fault-tolerant RAID level and a solid backup strategy are sufficient for redundancy. However, when you move to disk arrays that are separate from your hosts, you have to consider the possibility of losing your connection to the array. Redundant connection hardware is the best way to ensure that a communications failure will not take down your database. In the case of DAS, redundant host HBAs can be used. Midrange and high-end DAS systems have embedded redundant controllers to provide hardware fault tolerance for the DAS (see Figure 7-7).

Figure 7-7. Redundancy built into a DAS

Microsoft Cluster Service (MSCS) can be used to increase the availability of the data environment with DAS and SAN (discussed in a moment). A cluster can be configured in an active/active or active/passive mode. The term active/active refers to a cluster with at least one virtual server running on each node. The term active/passive refers to failover cluster configurations in which one or more cluster nodes are actively processing requests for cluster applications, while at least one cluster node simply waits for an active node to fail. An active/passive configuration is more costly in terms of price/performance because one or more servers remain idle most of the time.

A direct attached clustered server has redundant and active paths to the storage system. Failover software running on each node monitors the paths to the storage system and can reroute traffic in the event of a failure in the HBA, cable system, or storage processor (see Figure 7-8).

Figure 7-8. Direct attached cluster with redundant active paths to the DAS storage enclosure

In the case of a SAN, each server and each storage controller will have at least two connections, one to each fabric. The two fabrics are built from separate switches that do not communicate with each other. This holds true for Fibre Channel SANs and iSCSI SANs. In the case of iSCSI SANs, the architecture prescribes using two physically separate networks with two sets of switches (see Figure 7-9).

Figure 7-9. SAN clustered server with redundant networks and storage enclosures

Like the direct attached cluster discussed earlier, a SAN cluster has redundant and active paths to the storage systems. Failover software running on each node monitors the paths to the storage system and can reroute traffic in the event of a failure in an HBA, cable system, Fibre Channel switch, or storage processor. As you might imagine, you can extend this model on the server-side cluster by adding multiple clusters to the SAN, as shown in Figure 7-10.

Figure 7-10. Multiple server clusters with redundant connectivity to a single SAN

You should also consider having hot-spare hard drives within a disk array/enclosure. Although spare drives do reduce the amount of total usable storage within an array, they will allow your array to recover automatically from a disk failure.

Of course, redundancy and availability come at a cost. You need to consider the cost of the additional hardware required to achieve the level of availability in which you are interested. It goes without saying that no matter what redundancy solution you select, the importance of a solid backup strategy is not diminished.

Configuring Your Database

SQL Server uses operating system files to map the database onto a storage system. Filegroups are used to simplify data placement and administration by "grouping" files into a collection. You can improve database performance by placing files and filegroups on separate disks based on your storage system. Be careful, though, to make sure that you understand the storage subsystem and how it is configured. For example, you might have two separate Windows drives that were created from a SAN. If the SAN was "carved up" from the same set of disks, then you are not truly separating your files (or filegroups) onto separate disks. Refer to the beginning of this chapter to understand the various storage types and disk configurations.

In addition, specific tables can be placed on a specific filegroup. Doing so can improve performance because I/O for that table can be directed to a specific disk or disk set. The following are suggestions to improve I/O in your database system and environment; a file-layout sketch illustrating several of them follows the list. Make sure that you do not overoptimize and saturate the I/O bus. You can use a load-testing utility such as SQLIOSim to test for that problem.

  • Maximize parallel disk I/O:

  • Put tables used in the same join queries in different filegroups.

  • Put heavily accessed tables and the nonclustered indexes that belong to those tables on different filegroups.

  • Do not put the transaction log file or files on the same physical disk that has the other files and filegroups.

  • Place multiple database files on separate drives or drive arrays.

  • Limit the number of database files to the number of CPUs. The maximum number of threads that SQL Server will utilize to access database files is equal to the number of CPUs (CPU cores) included in your system, taking into account any affinity configuration settings.

  • We recommend, for each CPU (CPU core), allocating 0.25 to 1 data files for allocation-intensive workloads. So if you have a server with two quad-core CPUs (eight CPU cores), you would allocate between two and eight data files. Since you can't allocate a quarter of a data file, round up if the math gives you a partial file.

  • Especially for tempdb, take care to ensure that you do not exceed one data file per CPU.

  • When you create files in filegroups, make sure they are all the same size and do not use the autogrow option. SQL Server uses a proportional fill algorithm that favors allocations in files with more free space.
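
To tie several of these suggestions together, the following is a minimal sketch of a database layout. The database name, file names, drive letters (E:, F:, and G:), and sizes are all hypothetical; the point is that the log file lives on its own drive, the data files within the filegroup are equally sized with autogrow disabled (per the proportional fill note), and a heavily accessed table is directed to its own filegroup:

CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_Primary, FILENAME = 'E:\SQLData\Sales_Primary.mdf', SIZE = 2GB),
FILEGROUP SalesData
    -- Equal sizes and no autogrow, so proportional fill stays balanced.
    (NAME = Sales_Data1, FILENAME = 'E:\SQLData\Sales_Data1.ndf', SIZE = 4GB, FILEGROWTH = 0),
    (NAME = Sales_Data2, FILENAME = 'F:\SQLData\Sales_Data2.ndf', SIZE = 4GB, FILEGROWTH = 0)
LOG ON
    -- Log on a physically separate drive for its sequential write pattern.
    (NAME = Sales_Log, FILENAME = 'G:\SQLLog\Sales_Log.ldf', SIZE = 2GB);
GO

USE Sales;
GO

-- Direct a heavily accessed table to the SalesData filegroup.
CREATE TABLE dbo.OrderDetail
(OrderID int, ProductID int, Quantity int)
ON SalesData;
GO

Remember that the drive letters must map to physically separate disks or LUNs; two volumes carved from the same physical disks provide no real separation.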

Note

For a more detailed list of storage best practices, visit http://sqlcat.com/top10lists/archive/2007/11/21/storage-top-10-best-practices.aspx.

Microsoft provides a utility, SQLIOSim, to simulate SQL Server activity on a storage subsystem (see http://support.microsoft.com/kb/231619/en-us for more details), which allows you to gauge the performance of your storage subsystem under load. SQLIOSim tests SQL Server I/O patterns and does not require that SQL Server 2008 be installed, because it simulates the activity itself against your storage system. If you have already installed SQL Server 2008, you don't have to download the tool, because it is included with the installation in the MSSQL\Binn directory. Figure 7-11 shows an example of the application's interface.

Figure 7-11. SQLIOSim database I/O testing tool

Data Compression

SQL Server 2008 advances data compression over what was available in SQL Server 2005. Data compression can significantly reduce the amount of storage required by your database. The amount of compression will differ based on the type of data you are storing in SQL Server, but later in this section we will show you how to determine how much storage you can save with compression. SQL Server 2005 Service Pack 2 provided a storage format called vardecimal that could be enabled for compression. In addition, SQL Server 2005 supported NTFS compression for read-only secondary data files.

SQL Server 2008 provides two types of compression, data and backup (see Chapter 8 for details). SQL Server 2008 data compression consists of row- and page-level compression for tables and indexes. The following objects are supported:

  • Clustered tables and nonclustered indexes

  • Indexed views

  • Partitioned tables and indexes

  • Heap tables

Row-level compression stores all fixed-length data types in a variable-length storage format. This is a superset of the vardecimal storage format used for compression in SQL Server 2005. The data types that can be compressed include SMALLINT, INT, BIGINT, DECIMAL, NUMERIC, BIT, SMALLMONEY, MONEY, FLOAT, REAL, DATETIME, DATETIME2, CHAR, NCHAR, BINARY, and TIMESTAMP/ROWVERSION. BLOB/LOB data is not row compressed.

For more details on the specific compression applied, see http://msdn.microsoft.com/en-us/library/cc280576.aspx. No application changes are required to take advantage of compression with the benefit of reducing the amount of storage space for tables and indexes. The following examples show how to enable row compression for a table and index, respectively:

CREATE TABLE T1
(c1 int, c2 nvarchar(50) )
WITH (DATA_COMPRESSION = ROW);

CREATE NONCLUSTERED INDEX IX_INDEX_1
    ON T1 (C2)
WITH (DATA_COMPRESSION = ROW);

Page-level compression applies the concept of storing redundant or duplicate values only once on a page and is a superset of row-level compression. In addition to row-level compression, page-level compression adds column prefix and dictionary compression. BLOB/LOB data types can potentially benefit from page compression if they are in-row. As data is added to a page, the compression algorithms are applied. For more details on the specifics of how page-level compression is applied, see http://msdn.microsoft.com/en-us/library/cc280464.aspx.

The following examples show how to enable page compression for a table and index, respectively:

CREATE TABLE T2
(c1 int, c2 nvarchar(50) )
WITH (DATA_COMPRESSION = PAGE);

CREATE NONCLUSTERED INDEX IX_INDEX_1
    ON T2 (C2)
WITH (DATA_COMPRESSION = PAGE);

To determine how changing the compression state will affect a table or an index, use the sp_estimate_data_compression_savings stored procedure. This procedure is available only in the editions of SQL Server 2008 that support data compression (Enterprise and Developer).
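
For example, using the T1 table created earlier, you could estimate the savings from page compression and then, if the numbers justify the CPU cost, rebuild the table with the new setting:

EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'T1',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';
GO

-- Apply page compression to the existing table.
ALTER TABLE T1 REBUILD WITH (DATA_COMPRESSION = PAGE);
GO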

Of course, compression comes at a cost in terms of CPU usage to compress the data on write and decompress the data on read. In many cases, the increased I/O performance (because of a smaller data footprint and less data moving over the storage interface) far outweighs the additional CPU overhead. Configurations that are bottlenecked on I/O may see a significant increase in performance. In some cases, tests have shown a 60 percent savings in disk space with page compression and a threefold improvement in index rebuild time.

Disk Volume Alignment

On Windows Server 2003, disks report 63 hidden sectors at the start of every new partition, so by default the first partition begins at the 64th sector. This misaligns the partition with the stripe units and cache boundaries of the underlying storage, which means reads and writes that should touch a single stripe unit can span two, doubling the I/O. To remedy this problem, it is highly recommended that all partitions be created with the DISKPART utility (Windows Server 2003), aligning the partition to 64KB. Windows Server 2008 remedies this problem; however, once a partition has been created with the wrong alignment offset, you cannot change it. The only way to fix it is to create a new, aligned volume and migrate your SQL Server files to the new partition.
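
For illustration, a DISKPART session might look like the following sketch. The disk number and drive letter are hypothetical, and creating the partition destroys any existing data on the selected disk, so verify the target carefully:

C:\> diskpart
DISKPART> select disk 1
DISKPART> create partition primary align=64
DISKPART> assign letter=E
DISKPART> exit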

See "Disk Configuration Best Practices & Common Pitfalls" at http://technet.microsoft.com/en-us/library/cc966412.aspx for more information.

Summary

If you have some data that needs fast storage and other data, such as archives, that can tolerate slower storage, request storage from two different tiers. This way, you get the storage you need, and the storage administrator can balance storage needs across the entire environment.

An example of leveraging a SAN for a data warehouse database is the SQL Server 2008 Fast Track reference architecture: http://msdn.microsoft.com/en-us/library/dd459178.aspx. This architecture is a collection of best practices from the SQL Server product and field teams related to maximizing I/O for SQL Server 2008, including balancing CPU, memory, and SAN throughput. You will notice that this solution recommends using compression to maximize storage capacity and, for sequential workloads, to increase performance.
