Virtual Volume Management

With standard disk devices, each disk slice has its own physical and logical device, and with standard Solaris file systems, a file system cannot span more than one disk slice. In other words, the maximum size of a file system is limited to the size of a single disk. On a large server with many disk drives, standard methods of disk slicing are inadequate and inefficient. This was a limitation of all UNIX systems until the introduction of virtual disks, also called virtual volumes. To eliminate the limitation of one slice per file system, virtual volume management packages create volume structures in which a single file system can span a nearly unlimited number of disks or partitions. The key feature of these packages is that they transparently present a virtual volume made up of many physical disk partitions. In other words, disk partitions are grouped across several disks to appear as one volume to the operating system.

Each flavor of UNIX has its own methods of creating virtual volumes. Sun addresses virtual volume management with its Solaris Volume Manager (SVM) product, which is included in Solaris 9.

Solaris Volume Manager (SVM)

SVM, called Solstice DiskSuite in prior Solaris releases, comes bundled with the Solaris 9 operating system and uses virtual disks called volumes to manage physical disks and their associated data. A volume is functionally identical to a physical disk in the view of an application. You might also hear volumes referred to as virtual or pseudo devices.

Note

If you are familiar with DiskSuite, you’ll remember that virtual disks were called metadevices. SVM uses a special driver, called the metadisk driver, to coordinate I/O to and from physical devices and volumes, enabling applications to treat a volume like a physical device. This type of driver also is called a logical, or pseudo, driver.


In SVM, volumes are built from standard disk slices that have been created using the format utility. Using the format command, however, limits you to eight partitions per disk. Now, with SVM, you can utilize soft partitions to break this eight-slice-per-disk barrier. You simply use format to create a single slice (usually spanning the entire disk) and use SVM to create soft partitions on that slice.
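
For example, after using format to create a single slice that spans the disk, a soft partition can be built on that slice with the -p option of metainit. This is a minimal sketch; the volume name, slice name, and size are examples only:

# metainit d1 -p c0t1d0s0 2g   - creates a 2-GB soft partition named d1 on slice c0t1d0s0

Additional soft partitions can be created on the same slice until its space is exhausted.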

Using the SVM command-line utilities or the Solaris Management Console (SMC) graphical user interface, described in Chapter 18, “Solaris Management Console,” the system administrator creates each device by dragging slices onto one of four types of SVM objects: volumes, disk sets, state database replicas, and hot spare pools. These elements are described in Table 11.4.

Table 11.4. SVM Elements
Object Description
Volume A volume is a group of physical slices that appear to the system as a single, logical device. A volume is used to increase storage capacity and increase data availability. The various types of volumes are described in the next section.
State database A state database is a database that stores information about the state of the SVM configuration. Each state database is a collection of multiple, replicated database copies. Each copy is referred to as a state database replica. SVM cannot operate until you have created the state database and its replicas.
Disk sets A set of disk drives containing state database replicas, volumes, and hot spares that can be shared exclusively but not at the same time by multiple hosts. If one host fails, another host can take over the failed host’s disk set. This type of fail-over configuration is referred to as a clustered environment.
Hot spare pool A collection of slices (hot spares) reserved for automatic substitution in case of slice failure in either a submirror or RAID5 metadevice. Hot spares are used to increase data availability.
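
Because SVM cannot operate until the state database and its replicas exist, creating replicas is typically the first configuration step. The following is a minimal sketch using the metadb command; the slice names are examples only:

# metadb -a -f -c 2 c0t0d0s7 c0t1d0s7   - creates two replicas on each of the two slices
# metadb -i                              - displays the status of all replicas

The -f option is required when creating the very first replicas, and -c specifies how many replicas to place on each slice.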

SVM Volumes

The types of SVM volumes you can create using Solaris Management Console or the SVM command-line utilities are concatenations, stripes, concatenated stripes, mirrors, RAID5 volumes, and transactional volumes. All of these are described here, and example commands for building several of them follow the list:

  • Concatenation Concatenations work much the way the UNIX cat command is used to concatenate two or more files to create one larger file. If partitions are concatenated, the addressing of the component blocks is done sequentially, which means that data is written to the first available slice until it is full and then moves to the next available slice. The file system can use the entire concatenation, even though it spreads across multiple disk drives. This type of volume provides no data redundancy, and the entire volume fails if a single slice fails.

  • Stripe A stripe is similar to a concatenation, except that the addressing of the component blocks is interlaced across the slices rather than sequential. In other words, all disks are accessed at the same time, in parallel. Striping is used to gain performance. When data is striped across disks, multiple controllers can access data simultaneously. The interlace value specifies the size of the logical data chunks on a stripe; different interlace values can improve performance.

  • Concatenated stripe A concatenated stripe is a stripe that has been expanded by concatenating additional striped slices.

  • Mirror A mirror is composed of one or more stripes or concatenations. The volumes that are mirrored are called submirrors. SVM makes duplicate copies of the data located on multiple physical disks and presents one virtual disk to the application. All disk writes are duplicated; disk reads come from one of the underlying submirrors. A mirror replicates all writes to a single logical device (the mirror) and then to multiple devices (the submirrors) while distributing read operations. This provides redundancy of data in the event of a disk or hardware failure.

  • RAID5 This stripes the data across multiple disks to achieve better performance (see striping earlier in this list). In addition to striping, RAID5 replicates data by using parity information. In the case of missing data, the data can be regenerated using the available data and the parity information. A RAID5 metadevice is composed of multiple slices. Some space is allocated to parity information and is distributed across all slices in the RAID5 metadevice. A striped metadevice offers better performance than a RAID5 metadevice, but it provides no data protection (redundancy).

  • Transactional Used to log a UFS file system. A transactional volume is composed of a master device and a logging device. Both of these devices can be a slice, simple metadevice, mirror, or RAID5 metadevice. The master device contains the UFS file system.
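
The following metainit commands illustrate how a few of these volume types might be created from existing slices. The metadevice and slice names are examples only; mirror and RAID 5 examples appear in the RAID sections that follow:

# metainit d10 3 1 c0t1d0s2 1 c0t2d0s2 1 c0t3d0s2     - a concatenation of three slices
# metainit d20 1 3 c1t0d0s2 c2t0d0s2 c3t0d0s2 -i 32k  - a stripe of three slices with a 32-KB interlace
# metainit d30 -t d31 d32                             - a transactional volume with master d31 and log d32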

RAID

When describing SVM volumes, it’s common to describe which level of RAID the volume conforms to. RAID is an acronym for redundant array of inexpensive (or independent) disks. Usually these disks are housed together in a cabinet and referred to as an array. There are several RAID levels, each referring to a method of distributing data while ensuring data redundancy. These levels are not ratings but rather classifications of functionality. Different RAID levels offer dramatic differences in performance, data availability, and data integrity, depending on the specific I/O environment. Table 11.5 describes the various levels of RAID.

Table 11.5. The Various Levels of RAID
RAID Level Description
0 Striped disk array without fault tolerance.
1 Maintains duplicate sets of all data on separate disk drives.
2 Data striping and bit interleave. Data is written across each drive in succession, one bit at a time. Checksum data is recorded on a separate drive. This method is very slow for disk writes and is seldom used today because ECC is embedded in almost all modern disk drives.
3 Data striping with bit interleave and parity checking. Data is striped across a set of disks one byte at a time, and parity is generated and stored on a dedicated disk. The parity information is used to recreate data in the event of a disk failure.
4 Same as level 3 except that data is striped across a set of disks at a block level. Parity is generated and stored on a dedicated disk.
5 Unlike RAID 3 and 4, where parity is stored on one disk, both parity and data are striped across a set of disks.
6 Similar to RAID-5 but with additional parity information written to recover data if two drives fail.
7 Optimized asynchrony for high I/O rates as well as high data-transfer rates.
10 Combination of RAID 0 for performance and RAID 1 for fault tolerance.
53 Combines RAID 0 for performance and RAID 3 for fault tolerance.

SVM supports RAID levels 0, 1, and 5. RAID level 0 does not provide data redundancy, but it is usually included as a RAID classification because it is the basis for the majority of RAID configurations in use. Table 11.5 describes all of the available RAID levels, but many are not provided in SVM. The following is a more in-depth description of the RAID levels provided in SVM.

RAID 0

Although they do not provide redundancy, stripes and concatenations are often referred to as RAID 0. With striping, data is spread across relatively small, equally sized fragments that are allocated alternately and evenly across multiple physical disks. Any single drive failure can cause the volume to fail and could result in data loss. RAID 0, particularly striping, offers a high data-transfer rate and high I/O throughput, but it has lower reliability and availability than a single disk. RAID 0 is used on file servers where you want the lowest cost per megabyte of storage and where high availability is not required.
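
After a RAID 0 volume exists, it is used like any other disk device. For instance, assuming a stripe named d20 like the one shown earlier, a UFS file system could be created on it and mounted (the mount point is an example only):

# newfs /dev/md/rdsk/d20
# mount /dev/md/dsk/d20 /export/data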

RAID 1

RAID 1 employs data mirroring to achieve redundancy. Two copies of the data are created and maintained on separate disks, each containing a mirror image of the other. RAID 1 provides an opportunity to improve performance for reads because read requests will be directed to the mirrored copy if the primary copy is busy. RAID 1 is the most expensive of the array implementations because the data is duplicated. In the event of a disk failure, RAID 1 provides the highest performance because the system can switch automatically to the mirrored disk with no impact on performance and no need to rebuild lost data. RAID 1 is commonly used in financial and accounting applications where high availability is required. Because these applications are read intensive, they lend themselves well to a RAID 1 environment.
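
A mirror is typically built by creating two submirrors, creating a one-way mirror from the first, and then attaching the second. This is a minimal sketch; the metadevice and slice names are examples only:

# metainit d51 1 1 c0t0d0s2   - first submirror
# metainit d52 1 1 c1t0d0s2   - second submirror
# metainit d50 -m d51         - one-way mirror d50 using submirror d51
# metattach d50 d52           - attaches d52 as the second submirror and starts a resync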

RAID 5

RAID 5 provides data striping with distributed parity. RAID 5 does not have a dedicated parity disk; instead, it interleaves both data and parity on all disks. In RAID 5, the disk access arms can move independently of one another. This enables multiple concurrent accesses to the multiple physical disks, thereby satisfying multiple concurrent I/O requests and providing higher transaction throughput. RAID 5 is best suited for random access data in small blocks. There is a “write penalty” associated with RAID 5. Every write I/O will result in four actual I/O operations: two to read the old data and parity and two to write the new data and parity.

RAID 5 provides more storage space than RAID 1 while providing some level of redundancy over RAID 0. Some sites might choose this option when data redundancy is important and the RAID 1 solution is too expensive.
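
A RAID 5 volume requires at least three slices and is created with the -r option of metainit. This is a minimal sketch; the volume and slice names are examples only:

# metainit d45 -r c2t3d0s2 c3t0d0s2 c4t0d0s2   - a RAID 5 volume built from three slices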

RAID 1+0 and RAID 0+1

SVM supports both RAID 1+0 (mirrors that are then striped) and RAID 0+1 (stripes that are then mirrored) redundancy, depending on the context. Both combine the performance benefits of RAID 0 with the redundancy of RAID 1.

Planning Your SVM Configuration

When designing your storage configuration, keep the following guidelines in mind:

  • Striping generally has the best performance, but it offers no data protection. For write-intensive applications, RAID 1 generally has better performance than RAID 5.

  • RAID 1 and RAID 5 volumes both increase data availability, but they both generally have lower performance, especially for write operations. Mirroring does improve random read performance.

  • RAID 5 requires less disk space; therefore, RAID 5 volumes have a lower hardware cost than RAID 1 volumes. RAID 0 volumes have the lowest hardware cost.

  • Identify the most frequently accessed data and increase access bandwidth to that data with mirroring or striping.

  • Both stripes and RAID 5 volumes distribute data across multiple disk drives and help balance the I/O load.

  • Use available performance-monitoring capabilities and generic tools such as the iostat command to identify the most frequently accessed data (see the iostat example following this list). Once that data is identified, the access bandwidth to it can be increased by using striping.

  • The RAID 0 stripe’s performance is better than that of the RAID 5 volume, but RAID 0 stripes do not provide data protection (redundancy).

  • RAID 5 volume performance is lower than stripe performance for write operations because the RAID 5 volume requires multiple I/O operations to calculate and store the parity.

  • For raw random I/O reads, the RAID 0 stripe and the RAID 5 volume are comparable. Both the stripe and RAID 5 volume split the data across multiple disks, and the RAID 5 volume parity calculations aren’t a factor in reads except after a slice failure.

  • For raw random I/O writes, the stripe is superior to RAID 5 volumes.
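
As mentioned in the guidelines above, iostat can help identify the most frequently accessed data. One common invocation, shown here as an example, reports extended device statistics every five seconds; devices that are consistently busy are candidates for striping or mirroring:

# iostat -xn 5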

Using SVM, you can utilize volumes to provide increased capacity, higher availability, and better performance. In addition, the hot spare capability provided by SVM can provide another level of data availability for mirrors and RAID 5 volumes. A hot spare pool is a collection of slices (hot spares) reserved by SVM to be automatically substituted in case of a slice failure in either a submirror or RAID5 volume.
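
A hot spare pool is created with metainit and then associated with a submirror or RAID 5 volume using the metaparam command. This is a minimal sketch; the pool, slice, and volume names are examples only:

# metainit hsp001 c2t2d0s2 c3t2d0s2   - creates hot spare pool hsp001 with two hot spares
# metaparam -h hsp001 d51             - associates hsp001 with submirror d51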

After you have set up your configuration, you can use Solaris utilities such as iostat, metastat, and the mdmonitord daemon to report on its operation. You can also use SVM’s SNMP trap-generating daemon with a network monitoring console to receive SVM error messages automatically. Configure SVM’s SNMP trap-generating daemon to generate traps for the following events:

  • A RAID 1 or RAID 5 subcomponent goes into “needs maintenance” state.

  • A hot spare volume is swapped into service.

  • A hot spare volume starts to resynchronize.

  • A hot spare volume completes resynchronization.

  • A mirror is taken offline.

  • A disk set is taken by another host and the current host panics.
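
As noted above, the metastat command reports on the status of volumes, hot spare pools, and state database replicas. For example:

# metastat        - displays the status of all volumes
# metastat d50    - displays the status of volume d50 only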

Metadisk Driver

The metadisk driver, the driver used to manage SVM volumes, is implemented as a set of loadable pseudo device drivers. It uses other physical device drivers to pass I/O requests to and from the underlying devices. The metadisk driver operates between the file system and application interfaces and the device driver interface, interpreting information from both the UFS (or applications) and the physical device drivers. After passing through the metadisk driver, information is received in the expected form by both the file system and the device drivers. The metadisk driver is a loadable device driver, and it has all the same characteristics as any other disk device driver.

The standard volume name begins with “d” and is followed by a number. By default, there are 128 unique metadisk devices in the range 0 to 127. Additional volumes, up to 8192, can be added to the kernel by editing the /kernel/drv/md.conf file. The metablock device accesses the disk using the system’s normal buffering mechanism. There also is a character (or raw) device that provides for direct transmission between the disk and the user’s read or write buffer. The names of the block devices are found in the /dev/md/dsk directory, and the names of the raw devices are found in the /dev/md/rdsk directory. The following is an example of a block and raw logical device name for metadevice d0:

/dev/md/dsk/d0   - block metadevice d0 
/dev/md/rdsk/d0  - raw metadevice d0 
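
To raise the default limit of 128 volumes, edit the nmd field in /kernel/drv/md.conf and perform a reconfiguration reboot. The value shown here is an example only:

nmd=256             - allows volume names d0 through d255
# reboot -- -r      - reconfiguration reboot so the new value takes effect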

You must have root access to administer SVM, or have equivalent privileges granted through RBAC (described in Chapter 17, “Role-Based Access Control”).

Configuring Solaris Volume Manager is a complex topic and would require an entire book to describe. For more information on SVM, refer to the Solaris Volume Manager Administration Guide by Sun Microsystems at http://docs.sun.com.
