SCSI reservations
This appendix describes SCSI reservations and how they can provide faster disk fallover times when the underlying storage supports them. For example, SCSI 3 Persistent Reservation (PR) allows the stripe group manager, also known as the file system manager, to “fence” disks during node fallover by removing the reservation keys of that node. In contrast, non-PR disk fallover causes the system to wait until the disk lease expires.
 
Attention: Do not run these commands on your systems. This section uses them only to show how disk reservations work, especially in a clustered environment, where managing disk reservations demands more care.
SCSI reservations
SCSI 2 reservations provide a mechanism to reserve and control access to a SCSI device from a node. An initiator obtains ownership of the device by issuing a reserve command, and the reservation then acts as a lock against any I/O attempt from other initiators. Another initiator that tries to access the reserved disk receives a reservation conflict error. Only the original initiator can release the reservation, by issuing a release or reset command.
SCSI 3 Persistent Reservations provide a mechanism to control access to a shared device from multiple nodes. A persistent reservation persists even if the bus is reset for error recovery. This is not the case with SCSI 2 reservations, which do not survive a reset or a node restart. Also, SCSI 3 PR supports multiple paths from a host to a disk, whereas SCSI 2 works with only one path. The scope of a persistent reservation is the entire logical unit.
SCSI 3 Persistent Reservations use the concept of register and reserve. Multiple nodes can register their reservation keys (also known as PR_Key) with the shared device and establish a reservation in any of the following modes, as shown in Table A-1.
Table A-1 Types of SCSI reservations
Type                                   Code
Write Exclusive                        1h
Exclusive Access                       3h
Write Exclusive - Registrants Only     5h
Exclusive Access - Registrants Only    6h
Write Exclusive - All Registrants      7h
Exclusive Access - All Registrants     8h
In the All Registrants types of reservation (WEAR and EAAR), every registered node is a persistent reservation (PR) holder, so the reported PR Holder value is zero. The All Registrants type is an optimization that makes all cluster members equal, so if any member fails, the others continue.
In all other types of reservation, there is a single reservation holder, which is one of the following I_T nexuses:
The nexus for which the reservation was established with a PERSISTENT RESERVE OUT command with the RESERVE service action, the PREEMPT service action, the PREEMPT AND ABORT service action, or the REPLACE LOST RESERVATION service action.
The nexus to which the reservation was moved by a PERSISTENT RESERVE OUT command with the REGISTER AND MOVE service action.
An I_T nexus is the combination of an initiator port on the host with a target port on the storage device. The reservation types are described as follows:
1h Write Exclusive (WE)
Only the persistent reservation holder is permitted to write to the device. There is only one persistent reservation holder at a time.
3h Exclusive Access (EA)
Only the persistent reservation holder is permitted to access (read or write) the device. There is only one persistent reservation holder at a time.
5h Write Exclusive Registrants Only (WERO)
Write commands are permitted only from registered nodes. A cluster that is designed around this type must declare one cluster owner (the persistent reservation holder) at a time; if the owner fails, another must be elected. The PR_Holder_Key value is the PR_Key of the I_T nexus that holds the reservation of the disk. There is only one persistent reservation holder at a time, but all registered I_T nexuses are allowed to write to the disk.
6h Exclusive Access Registrants Only (EARO)
Access to the device is limited to registered nodes. As with WERO, if the current owner fails, the reservation must be established again to gain access to the device. Only one persistent reservation holder at a time is permitted, but all registered I_T nexuses may read from and write to the disk.
7h Write Exclusive All Registrants (WEAR)
While this reservation is active, only registered initiators may write to the device. This reservation does not inhibit read operations from any initiator, nor does it conflict with a read exclusive reservation from any initiator. Each registered I_T nexus is a reservation holder and may write to the disk.
8h Exclusive Access All Registrants (EAAR)
While this reservation is active, only registered initiators are permitted any access to the device. Each registered I_T nexus is a reservation holder and is allowed to read from and write to the disk.
Table A-2 shows which read and write operations are allowed for registered and unregistered initiators under the All Registrants and Registrants Only types.
Table A-2 Read and write operations with the All Registrants type
Type          WEAR (7h)        WERO (5h)     EAAR (8h)        EARO (6h)
Registered?   Not registered   Registered    Not registered   Registered
WRITE         Not allowed      Allowed       Not allowed      Allowed
READ          Allowed          Allowed       Not allowed      Allowed
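The access rules of the six reservation types, including the cells of Table A-2, can be summarized in one predicate. This is a minimal sketch for illustration only: is_allowed and its parameters are hypothetical names, and it models plain READ and WRITE commands, not the full SCSI command set.

```python
# Reservation type codes from Table A-1.
WE, EA, WERO, EARO, WEAR, EAAR = 0x1, 0x3, 0x5, 0x6, 0x7, 0x8
WRITE_EXCLUSIVE_TYPES = {WE, WERO, WEAR}
REGISTRANT_ACCESS_TYPES = {WERO, EARO, WEAR, EAAR}


def is_allowed(pr_type: int, op: str, registered: bool, holder: bool = False) -> bool:
    """Return True if an initiator may perform op ("read" or "write").

    registered: the initiator's key is registered with the device.
    holder: the initiator is the single reservation holder (WE, EA, RO types).
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op!r}")
    if holder:
        return True  # the reservation holder may always read and write
    if registered and pr_type in REGISTRANT_ACCESS_TYPES:
        return True  # RO and AR types admit every registrant
    # Everyone else may only read, and only under a write exclusive type.
    return op == "read" and pr_type in WRITE_EXCLUSIVE_TYPES


# Print the WRITE/READ result for each type and registration state.
for pr_type, name in ((WEAR, "WEAR"), (WERO, "WERO"), (EAAR, "EAAR"), (EARO, "EARO")):
    for registered in (False, True):
        write = is_allowed(pr_type, "write", registered)
        read = is_allowed(pr_type, "read", registered)
        print(f"{name} registered={registered}: write={write} read={read}")
```

The loop prints both registration states for every type, which is a superset of the four columns in Table A-2. The holder flag covers the single-holder types (1h and 3h), where a registered node that is not the holder gets no extra access.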
In the Registrants Only (RO) types, the reservation is exclusive to one of the registrants. The reservation of the device is lost if the current PR holder removes its PR key from the device. To avoid losing the reservation, any other registrant can make itself the persistent reservation holder (an operation known as a preempt). By contrast, in the All Registrants (AR) types, the reservation is shared among all registrants.
ODM reserve policy
To open the device with any of the previous reservation types, the reserve_policy attribute of the device in the AIX Object Data Manager (ODM) must be set accordingly. You can list the valid values of the reserve_policy attribute by running lsattr with the -R option, as shown in Example A-1.
Example A-1 Current valid values of the reserve_policy attribute
# lsattr -Rl <hdisk#> -a reserve_policy
no_reserve
single_path
PR_exclusive
PR_shared
 
Note: The values that are shown in Example A-1 can change according to the ODM definitions or host attachment scripts that are provided by the disk or storage vendors.
The following attribute values are valid:
The no_reserve value does not apply any reservation methodology to the device. The device can be accessed by any initiator.
The single_path value applies a SCSI 2 reserve methodology.
The PR_exclusive value applies a SCSI 3 persistent reserve exclusive host methodology. Write Exclusive Registrants Only (5h) reservations require the reserve_policy attribute to be set to PR_exclusive.
The PR_shared value applies a SCSI 3 persistent reserve shared host methodology. Write Exclusive All Registrants (7h) reservations require the reserve_policy attribute to be set to PR_shared.
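To check which policies a disk accepts, a script can map the lsattr -R output to the methodologies listed above. This is a minimal Python sketch that assumes the output has already been captured as text (here, the values from Example A-1). supported_policies is a hypothetical helper name. Values that this appendix does not describe are reported as unknown rather than guessed.

```python
# Reservation methodology for each reserve_policy value, as described above.
RESERVE_POLICY_METHOD = {
    "no_reserve": "no reservation",
    "single_path": "SCSI 2 reserve",
    "PR_exclusive": "SCSI 3 PR, Write Exclusive Registrants Only (5h)",
    "PR_shared": "SCSI 3 PR, Write Exclusive All Registrants (7h)",
}


def supported_policies(lsattr_output: str) -> dict:
    """Map each value printed by `lsattr -Rl <hdisk#> -a reserve_policy` to its method.

    Vendor ODM definitions can add or remove values, so anything not listed
    in RESERVE_POLICY_METHOD is reported as unknown.
    """
    values = [line.strip() for line in lsattr_output.splitlines() if line.strip()]
    return {value: RESERVE_POLICY_METHOD.get(value, "unknown (vendor-defined)") for value in values}


# Output copied from Example A-1.
sample = """no_reserve
single_path
PR_exclusive
PR_shared
"""
policies = supported_policies(sample)
for value, method in policies.items():
    print(f"{value:13} -> {method}")
print("SCSI 3 PR capable:", any(value.startswith("PR_") for value in policies))
```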
This attribute can be set and read as shown in Example A-2.
Example A-2 Setting the disk attribute to PR_shared
# chdev -l hdisk1 -a reserve_policy=PR_shared
hdisk1 changed
 
# lsattr -El hdisk1 -a reserve_policy
reserve_policy PR_shared Reserve Policy True+
The lsattr command with the -E option displays the effective policy for the disk in the AIX ODM. The -P option displays the policy that was in effect when the device was last configured. This is the reservation information in the AIX kernel that is used to enforce the reservation when the disk is opened.
Setting these attributes by using the chdev command can fail if the resource is busy, as shown in Example A-3.
Example A-3 Setting the disk attribute with the chdev command
# chdev -l hdisk1 -a reserve_policy=PR_shared
Method error (/usr/lib/methods/chgdisk):
0514-062 Cannot perform the requested function because the specified device is busy.
When the device is in use, you can run chdev with the -P flag to change the effective policy only. The change is made to the ODM database and is applied to the device when the system is restarted. Alternatively, you can use the -U flag, which updates the reservation information in both the AIX ODM and the AIX kernel. However, not all devices support the -U flag. One way to determine whether a device supports it is to look for the True+ value in the lsattr output, as shown in Example A-4.
Example A-4 Checking whether the device supports the U flag by using the lsattr command output
# lsattr -Pl hdisk1 -a reserve_policy
reserve_policy PR_shared Reserve Policy True+
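When checking many disks, the lsattr lines in Examples A-2 and A-4 can be parsed to read the value and detect the True+ marker. This sketch assumes the four-column layout shown above, with the description in the middle. parse_lsattr_line and change_pending are hypothetical helpers. change_pending simply compares the -E and -P values, following this appendix's description of the two options.

```python
from typing import NamedTuple


class AttrLine(NamedTuple):
    name: str
    value: str
    description: str
    user_settable: bool
    supports_u_flag: bool  # "True+" marks devices that accept chdev -U


def parse_lsattr_line(line: str) -> AttrLine:
    """Parse one line of `lsattr -El` or `lsattr -Pl` output for a single attribute."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError(f"unexpected lsattr line: {line!r}")
    settable = parts[-1]
    return AttrLine(
        name=parts[0],
        value=parts[1],
        description=" ".join(parts[2:-1]),
        user_settable=settable.startswith("True"),
        supports_u_flag=settable.endswith("+"),
    )


def change_pending(effective_line: str, configured_line: str) -> bool:
    """True if the -E (ODM) value differs from the -P (last configured) value."""
    return parse_lsattr_line(effective_line).value != parse_lsattr_line(configured_line).value


# Lines copied from Example A-2 (-E) and Example A-4 (-P).
effective = "reserve_policy PR_shared Reserve Policy True+"
configured = "reserve_policy PR_shared Reserve Policy True+"
attr = parse_lsattr_line(effective)
print(attr.value, "supports -U:", attr.supports_u_flag)
print("change pending until restart:", change_pending(effective, configured))
```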
Persistent Reserve IN
 
Attention: Do not run these commands on your systems. This section uses them only to show how disk reservations work, especially in a clustered environment, where managing disk reservations demands more care.
Persistent Reserve IN (PRIN) commands are used to obtain information about active reservations and registrations on a device. The following PRIN service actions are commonly used:
Read keys To read PR Keys of all registrants of the device.
Read reservation To obtain information about the Persistent Reservation Holder. The PR Holder value is zero if the All Registrants type of reservation exists on the device; otherwise, it is the PR Key of the node holding the reservation of the device exclusively.
Report capabilities To read the capability information of the device. The capability bits indicate whether the device supports persistent reservations and which reservation types it supports. Example A-5 shows this service action issued through the devrsrv command.
Example A-5 Output of the devrsrv implementation
# devrsrv -c prin -s 2 -l hdisk1
PR Capabilities Byte[2] : 0x1 PTPL_C
PR Capabilities Byte[3] : 0x81 PTPL_A
PR Types Supported : PR_WE_AR PR_EA_RO PR_WE_RO PR_EA PR_WE PR_EA_AR
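The following sketch shows how the fields in Example A-5 map to bits. It decodes REPORT CAPABILITIES parameter data using the bit positions defined in the T10 SPC-3 standard. Bytes 2 and 3 of the sample buffer match Example A-5. Bytes 4 and 5 (the persistent reservation type mask) are an assumption, chosen so that all six listed types are set. The decoder also reports the TMV (type mask valid) bit of byte 3, which Example A-5 does not print. decode_capabilities is a hypothetical helper, not a devrsrv interface.

```python
# REPORT CAPABILITIES parameter data bit layout (T10 SPC-3).
CAP_BYTE2 = {0x08: "SIP_C", 0x04: "ATP_C", 0x01: "PTPL_C"}
CAP_BYTE3 = {0x80: "TMV", 0x01: "PTPL_A"}
TYPE_MASK = (  # (byte offset, bit, name as printed by devrsrv in Example A-5)
    (4, 0x80, "PR_WE_AR"),
    (4, 0x40, "PR_EA_RO"),
    (4, 0x20, "PR_WE_RO"),
    (4, 0x08, "PR_EA"),
    (4, 0x02, "PR_WE"),
    (5, 0x01, "PR_EA_AR"),
)


def decode_capabilities(data: bytes) -> tuple:
    """Return (capability flags, supported PR types) from REPORT CAPABILITIES data."""
    if len(data) < 6:
        raise ValueError("REPORT CAPABILITIES data must be at least 6 bytes long")
    flags = [name for bit, name in CAP_BYTE2.items() if data[2] & bit]
    flags += [name for bit, name in CAP_BYTE3.items() if data[3] & bit]
    # The type mask is meaningful only when TMV (type mask valid) is set.
    if not data[3] & 0x80:
        return flags, []
    types = [name for offset, bit, name in TYPE_MASK if data[offset] & bit]
    return flags, types


# Bytes 0-1: length (8); bytes 2-3 as in Example A-5; bytes 4-5: assumed mask.
sample = bytes([0x00, 0x08, 0x01, 0x81, 0xEA, 0x01, 0x00, 0x00])
flags, types = decode_capabilities(sample)
print("Capabilities:", " ".join(flags))
print("PR Types Supported:", " ".join(types))
```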
Persistent Reserve OUT
 
Attention: Do not run these commands on your systems. This section uses them only to show how disk reservations work, especially in a clustered environment, where managing disk reservations demands more care.
Persistent Reserve OUT (PROUT) commands are used to reserve, register, and remove reservations and reservation keys. The following PROUT service actions are commonly used:
Register To register and unregister a PR key with a device.
Reserve To create a persistent reservation for the device.
Release To release the selected persistent reservation and not remove any registrations.
Clear To release any persistent reservations and remove all registrations on the device.
Preempt To replace the persistent reservation or remove registrations.
Preempt and abort Along with preempting, to abort all tasks for one or more preempted nodes.
The results of the Preempt and Preempt and Abort actions depend on the value of the service action reservation key and on the reservation type, so these two service actions are described in more detail here.
A PROUT command with PREEMPT or PREEMPT AND ABORT is used to perform one of the following actions:
Preempt (that is, replace) the persistent reservation and remove registrations.
Remove registrations.
The PREEMPT AND ABORT service action behaves like the PREEMPT service action, except that all tasks from the I_T nexuses that are associated with the persistent reservations or registrations being preempted (but not the task containing the PROUT command itself) are aborted. See Table A-3.
Table A-3 Effects of preempt and abort under different reservation types
Reservation type   Service action reservation key         Action
All Registrants    Zero                                   Preempt the persistent reservation and remove registrations.
                   Not zero                               Remove registrations.
All other types    Zero                                   Invalid request.
                   Reservation holder’s reservation key   Preempt the persistent reservation and remove registrations.
                   Any other, nonzero reservation key     Remove registrations.
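Table A-3 reduces to a short decision function. This is a minimal sketch. preempt_outcome, its parameters, and the outcome strings are hypothetical names chosen for illustration.

```python
from typing import Optional

PREEMPT_AND_REMOVE = "preempt the persistent reservation and remove registrations"
REMOVE_ONLY = "remove registrations"
INVALID = "invalid request"


def preempt_outcome(all_registrants: bool, sa_key: int, holder_key: Optional[int] = None) -> str:
    """Look up the PREEMPT (or PREEMPT AND ABORT) outcome from Table A-3.

    all_registrants: the current reservation is type 7h or 8h.
    sa_key: the service action reservation key in the PROUT command.
    holder_key: the reservation holder's key (ignored for AR types).
    """
    if all_registrants:
        return PREEMPT_AND_REMOVE if sa_key == 0 else REMOVE_ONLY
    if sa_key == 0:
        return INVALID
    return PREEMPT_AND_REMOVE if sa_key == holder_key else REMOVE_ONLY


# RO reservation held by key 0x1, preempted with action key 0x1.
print(preempt_outcome(False, 0x1, holder_key=0x1))
```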
Understanding register, reserve, and preempt
As an example, consider a cluster of four systems with shared access to a disk, as shown in Figure A-1. Assign a PR_key_value on each node, and set the reserve_policy of the target disk to PR_shared or PR_exclusive. The unique PR_key of each node is registered with the disk, and the disk is reserved with a SCSI PR reservation that gives access to registered nodes only.
Figure A-1 Four-node cluster setup with shared disk
Perform the register action from each system (1 - 4) to register its reservation key with the disk, and the reserve action to establish the reservation. The PR_Holder_Key value represents the current reservation holder of the disk. As shown in Table A-4, in the RO type only one system can hold the reservation of the disk at a time (key 0x1 in our example). Under the AR type, however, all four registered systems hold the reservation of the disk, so the PR_Holder_Key value is zero.
Table A-4 Differences with RO and AR
Type             All Registrants (Types 7h/8h)   Registrants Only (Types 5h/6h)
Registrants      0x1 0x2 0x3 0x4                 0x1 0x2 0x3 0x4
PR_Holder_Key    0                               0x1
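The registration and reservation steps above can be modeled with a toy in-memory disk that reproduces Table A-4. This is a minimal Python sketch under stated assumptions: each node has exactly one I_T nexus, and the keys are 0x1 to 0x4 as in the example. SharedDisk and four_node_cluster are hypothetical names, not part of any AIX or PowerHA interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

AR_TYPES = {0x7, 0x8}  # All Registrants types (WEAR, EAAR) from Table A-1


@dataclass
class SharedDisk:
    """Toy model of the persistent reservation state of one shared disk.

    Assumption: each node has one I_T nexus, so a node is identified by its PR key.
    """
    registrants: Set[int] = field(default_factory=set)
    pr_type: Optional[int] = None  # None: no persistent reservation
    holder: Optional[int] = None   # single holder key for non-AR types

    def register(self, key: int) -> None:
        self.registrants.add(key)

    def reserve(self, key: int, pr_type: int) -> None:
        if key not in self.registrants:
            raise PermissionError("reservation conflict: key is not registered")
        if self.pr_type is None:
            self.pr_type = pr_type
            self.holder = None if pr_type in AR_TYPES else key
        elif pr_type != self.pr_type or not (pr_type in AR_TYPES or key == self.holder):
            raise PermissionError("reservation conflict: held by another node")

    def read_keys(self) -> List[int]:
        return sorted(self.registrants)

    def read_reservation(self) -> Optional[int]:
        """PR_Holder_Key: 0 for AR types, the holder's key otherwise."""
        if self.pr_type is None:
            return None
        return 0 if self.pr_type in AR_TYPES else self.holder


def four_node_cluster(pr_type: int) -> SharedDisk:
    disk = SharedDisk()
    for key in (0x1, 0x2, 0x3, 0x4):  # systems 1 - 4 register their keys
        disk.register(key)
        try:
            disk.reserve(key, pr_type)
        except PermissionError:
            pass  # RO type: only the first node to reserve becomes the holder
    return disk


ar = four_node_cluster(0x7)  # Write Exclusive All Registrants
ro = four_node_cluster(0x5)  # Write Exclusive Registrants Only
print("AR:", [hex(k) for k in ar.read_keys()], "PR_Holder_Key:", hex(ar.read_reservation()))
print("RO:", [hex(k) for k in ro.read_keys()], "PR_Holder_Key:", hex(ro.read_reservation()))
```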
A read keys command displays all of the reservation keys that are registered with the disk (0x1, 0x2, 0x3, and 0x4). The read reservation command returns the value of PR_Holder_Key, which depends on the reservation type. If a network failure or any other failure prevents system 1 and the rest of the systems from communicating with each other for a certain period, a split-brain (or split-cluster) situation results, as shown in Figure A-2.
Figure A-2 Split-cluster situation
Suppose that your cluster manager decides that system 2 (or the subcluster that contains system 2) takes ownership. System 2 can then issue a PROUT command with the Preempt or Preempt and Abort service action to remove the PR_Key 0x1 registration from the disk. As a result, the reservation is moved away from system 1, as shown in Table A-5, and system 1 is denied access to the shared disk.
Table A-5 Differences with RO and AR
Type             All Registrants (Types 7h/8h)   Registrants Only (Types 5h/6h)
PR_Holder_Key    0                               0x2
The Preempt and Preempt and Abort functions take the following arguments:
Current_key PR_key of the node issuing the command, for example, 0x2.
Disk The shared disk in question.
Action_key PR_key on which the action must be taken.
With the RO type of reservation, the action_key is 0x1. With the AR type, the action_key can be either 0 or 0x1. The two methods of preempting in the AR case are explained as follows:
Method 1: Zero action key
If the action key is zero, the following actions take place:
 – Registrations of systems 1, 3, and 4 are removed.
 – The persistent reservation is removed.
 – A reservation from system 2 is created.
As a result, only system 2 has access to the disk, as shown in Figure A-3.
Figure A-3 Result of the preempt with action key zero
If the remaining systems of the active subcluster (systems 3 and 4) must regain access, run an event that reregisters their keys with the disk.
Method 2: Nonzero action key
If the action key is nonzero (in our case, 0x1, the key of system 1), the persistent reservation is not released, but the registration of PR_Key 0x1 is removed. This fences system 1 from the disk, as shown in Figure A-4.
Figure A-4 Disk fencing
Table A-6 shows the results of PRIN commands after system 1 is preempted.
Table A-6 Difference with RO and AR
The scsipr command   All Registrants (Types 7h/8h)      Registrants Only (Types 5h/6h)
                     Method 1       Method 2
Read key             0x2            0x2 0x3 0x4         0x2 0x3 0x4
Read reservation     0              0                   0x2
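The preempt rules can be checked against Table A-6 with a model of the split-cluster example, in which system 2 preempts system 1. The sketch is illustrative only: PRState, preempt, and after_split are hypothetical names. Each node is assumed to have one I_T nexus. Task aborts (the only difference between PREEMPT and PREEMPT AND ABORT) are not modeled. In Method 1, system 2 is assumed to re-create the reservation with the same All Registrants type.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

AR_TYPES = {0x7, 0x8}  # WEAR and EAAR from Table A-1


@dataclass
class PRState:
    """Toy PR state of the shared disk; assumes one I_T nexus per node."""
    pr_type: Optional[int]
    registrants: Set[int] = field(default_factory=set)
    holder: Optional[int] = None  # single holder key for non-AR types

    def read_keys(self) -> List[int]:
        return sorted(self.registrants)

    def read_reservation(self) -> Optional[int]:
        if self.pr_type is None:
            return None
        return 0 if self.pr_type in AR_TYPES else self.holder


def preempt(disk: PRState, current_key: int, action_key: int, new_type: int) -> None:
    """PREEMPT as summarized in Table A-3 (task aborts are not modeled)."""
    if current_key not in disk.registrants:
        raise PermissionError("reservation conflict: issuing key is not registered")
    if disk.pr_type in AR_TYPES:
        if action_key == 0:
            # Method 1: drop every other registration and re-create the
            # reservation on behalf of the issuing node.
            disk.registrants = {current_key}
            disk.pr_type = new_type
            disk.holder = None if new_type in AR_TYPES else current_key
        else:
            # Method 2: fence only the node that owns action_key.
            disk.registrants.discard(action_key)
        return
    if action_key == 0:
        raise ValueError("invalid request: zero action key with a non-AR reservation")
    if disk.pr_type is not None and action_key == disk.holder:
        # Preempting the holder moves the reservation to the issuing node.
        disk.pr_type = new_type
        disk.holder = None if new_type in AR_TYPES else current_key
    disk.registrants.discard(action_key)


def after_split(pr_type: int, action_key: int) -> PRState:
    holder = None if pr_type in AR_TYPES else 0x1  # system 1 held the RO reservation
    disk = PRState(pr_type, {0x1, 0x2, 0x3, 0x4}, holder)
    preempt(disk, current_key=0x2, action_key=action_key, new_type=pr_type)
    return disk


for label, disk in (
    ("AR, method 1", after_split(0x7, 0x0)),
    ("AR, method 2", after_split(0x7, 0x1)),
    ("RO", after_split(0x5, 0x1)),
):
    keys = [hex(k) for k in disk.read_keys()]
    print(f"{label}: read keys {keys}, read reservation {hex(disk.read_reservation())}")
```

Calling preempt with current_key=0x1 on any of the resulting states raises the reservation conflict, matching the note that system 1 can no longer preempt once its key is removed.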
Unregister request
A registered PR_key can be removed by issuing a Register or Register and Ignore Existing Key command from that node with the service action reservation key set to zero. Table A-7 shows the registrants and the PR_Holder_Key at this point.
Table A-7 Differences with RO and AR
Type             All Registrants (Types 7h/8h)   Registrants Only (Types 5h/6h)
Registrants      0x2 0x3 0x4                     0x2 0x3 0x4
PR_Holder_Key    0                               0x2
In the RO type of reservation, if the unregistered key is the PR_Holder_Key (0x2), the reservation of the disk is lost along with the registration. In the AR type, removing key 0x2 has no impact on the reservation, and the same is true when other keys are removed.
Any preempt attempt by system 1 fails with a conflict because its key is not registered with the disk.
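The unregister rules can be sketched in the same toy style, starting from the state in Table A-7. unregister and PRState are hypothetical names, and one key per node is assumed. One rule does not come from the example above: an All Registrants reservation is released only when the last registration is removed. That rule is taken from the T10 SPC-3 standard.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

AR_TYPES = {0x7, 0x8}  # WEAR and EAAR from Table A-1


@dataclass
class PRState:
    """Toy PR state; assumes one I_T nexus per node (one key per node)."""
    pr_type: Optional[int]
    registrants: Set[int] = field(default_factory=set)
    holder: Optional[int] = None  # single holder key for non-AR types

    def read_reservation(self) -> Optional[int]:
        if self.pr_type is None:
            return None
        return 0 if self.pr_type in AR_TYPES else self.holder


def unregister(disk: PRState, key: int) -> None:
    """REGISTER (or REGISTER AND IGNORE EXISTING KEY) with a zero service action key."""
    if key not in disk.registrants:
        return  # nothing is registered for this node, so nothing to remove
    disk.registrants.discard(key)
    if disk.pr_type in AR_TYPES:
        # Per T10 SPC-3: the AR reservation goes away with the last registration.
        if not disk.registrants:
            disk.pr_type = None
    elif key == disk.holder:
        # RO types: the holder's reservation is lost together with its key.
        disk.pr_type = None
        disk.holder = None


# Start from Table A-7 and unregister key 0x2 in both cases.
ar = PRState(0x7, {0x2, 0x3, 0x4})
ro = PRState(0x5, {0x2, 0x3, 0x4}, holder=0x2)
for disk in (ar, ro):
    unregister(disk, 0x2)
print("AR reservation after removing 0x2:", ar.read_reservation())  # 0
print("RO reservation after removing 0x2:", ro.read_reservation())  # None
```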
Release request
A release request from the persistent reservation holder releases only the reservation of the disk; the PR keys remain registered. Referring to Table A-7, with the AR type of reservation, a release command from any of the registrants (0x2, 0x3, or 0x4) removes the reservation. With the RO type, a release command from a node that is not the PR holder (0x3 or 0x4) completes successfully but has no effect on the reservation or the registrations. In this case, only a release request from the PR holder (0x2) releases the reservation.
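The release rules can be sketched under the same assumptions: one key per node, hypothetical names release and PRState, and a state that starts from Table A-7. The SCSI check that the RELEASE type matches the current reservation type is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

AR_TYPES = {0x7, 0x8}  # WEAR and EAAR from Table A-1


@dataclass
class PRState:
    """Toy PR state; assumes one I_T nexus per node (one key per node)."""
    pr_type: Optional[int]
    registrants: Set[int] = field(default_factory=set)
    holder: Optional[int] = None  # single holder key for non-AR types


def release(disk: PRState, key: int) -> bool:
    """PROUT RELEASE; returns True if the reservation was actually released.

    Registrations are never removed by RELEASE.
    """
    if key not in disk.registrants:
        raise PermissionError("reservation conflict: key is not registered")
    if disk.pr_type is None:
        return False  # no reservation to release
    if disk.pr_type in AR_TYPES or key == disk.holder:
        disk.pr_type = None
        disk.holder = None
        return True
    return False  # RO type, not the holder: completes successfully, no effect


ar = PRState(0x7, {0x2, 0x3, 0x4})
ro = PRState(0x5, {0x2, 0x3, 0x4}, holder=0x2)
print(release(ar, 0x3), release(ro, 0x3), release(ro, 0x2))  # True False True
```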
Clear request
Referring again to Table A-7, if any of the nodes sends a clear request to the target device, the persistent reservation of the disk is lost, and all of the PR keys that are registered with the disk (0x2, 0x3, and 0x4) are removed. As the T10 SCSI standard suggests, the Clear action should be restricted to recovery operations because it defeats the persistent reservation feature that protects data integrity.
 
Note: When a node opens the disk or a register action is performed, the node registers its PR_key value through each path to the disk. Therefore, you can see multiple registrations (I_T nexuses) with the same key. The number of registrations equals the number of active paths from the host to the target because each path represents an I_T nexus.
Storage
Contact your storage vendor to find out whether your device or multipathing driver supports SCSI Persistent Reservation, and which types of reservations it supports. Your storage vendor can also provide the minimum firmware level and driver version that are needed, and the flags that are required to enable support for persistent reservations.
The following configurations provide examples of support for persistent reservations:
IBM XIV®, IBM DS8000, and IBM SAN Volume Controller storage systems with native AIX MPIO support¹ SCSI PR Exclusive and Shared reservations by default, as shown in Example A-6.
Example A-6 IBM storage support with native AIX MPIO of the SCSI PR Exclusive
# lsattr -Rl hdiskx -a reserve_policy | grep PR
PR_exclusive
PR_shared
You can use the devrsrv utility to verify the capabilities of your disks.
Hitachi disks with native AIX MPIO support² all SCSI PR reservation types if Host Mode Options (HMOs) 2 and 72 are set. The minimum code level that supports HMO 72 is 70-04-31-00/00.
With PowerPath V6.0, EMC disks support³ PR Shared reservations but not PR Exclusive reservations, as shown in Example A-7.
Example A-7 EMC disk reservation support with powerpath V6.0
# lsattr -Rl hdiskpowerX -a reserve_policy | grep PR
PR_shared
The SCSI3 Interface (SC3) and SCSI Primary Commands (SC2) director bits must be enabled. The SCSI3_persist_reserv flag must also be enabled to use persistent reservations on PowerPath devices.
More about PR reservations
During the reset sequence of the disk through a path, send a PR IN command with the READ RESERVATION (01h) service action. This command returns the current reservation key on the disk if a persistent reservation exists. If an All Registrants type reservation is on the disk, the returned key is zero.
In the case of a PR_exclusive type of reservation, the following actions occur:
If the current reservation key is the same as the node’s key in the ODM, register the key by using a PR OUT command with the Register and Ignore Existing Key service action.
If the current reservation key is zero and the TYPE field (the persistent reservation type, as shown in Table A-1) is also 0, no persistent reservation is on the disk. In this case, complete the following steps:
a. Register the key on the disk by using a PR OUT command with the Register and Ignore Existing Key service action.
b. If not reserved already by this host, reserve it by using a PR OUT command with the Reserve service action and a type of Write Exclusive Registrants Only (5h).
If the current reservation key is different from the current host’s key, another host holds the reservation. If you are not opening the disk with the -force flag, the open call fails. If you are opening the disk with the -force flag, complete the following steps:
a. Register your key on the disk by running a PR OUT command with the Register and Ignore Existing Key service action.
b. Preempt the current reservation by running a PR OUT command with the Preempt and Abort service action to remove the registration and reservation of the current reservation holder. The key of the current reservation holder is given in the Service Action Reservation Key field.
In the case of a PR_shared reservation, the following actions occur:
If the current reservation key is zero and the TYPE field (the persistent reservation type, as shown in Table A-1) is also 0, there is no persistent reservation on the disk. If the TYPE field is Write Exclusive All Registrants (7h), another host is already registered for shared access. In either case, complete the following steps:
a. Register the host’s key on the disk by using a PR OUT command with the Register and Ignore Existing Key service action.
b. Reserve the disk by using a PR OUT command with the RESERVE service action and the type Write Exclusive All Registrants (7h).
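The open-time decisions for PR_exclusive and PR_shared can be expressed as two small functions that return the PR OUT actions to send. This is a hedged sketch of the logic as described in the steps above, not the AIX disk driver's code. The function names, the Action tuple, and the example keys (0x11, 0x22, 0x33) are hypothetical. The READ RESERVATION result is assumed to be reported as key 0 and type 0 when no persistent reservation exists.

```python
from typing import List, Tuple

WERO, WEAR = 0x5, 0x7  # reservation types used by PR_exclusive and PR_shared
Action = Tuple[str, int]  # (PR OUT service action, key or type argument)


def pr_exclusive_open(current_key: int, pr_type: int, my_key: int, force: bool = False) -> List[Action]:
    """PR OUT actions for opening a PR_exclusive disk.

    current_key and pr_type come from READ RESERVATION; my_key is the node's key in the ODM.
    """
    register = ("REGISTER AND IGNORE EXISTING KEY", my_key)
    if current_key == my_key:
        return [register]  # this host already holds the reservation
    if current_key == 0 and pr_type == 0:
        return [register, ("RESERVE", WERO)]  # no reservation on the disk
    if not force:
        raise PermissionError("open fails: another host holds the reservation")
    # Forced open: preempt the holder, whose key goes in the
    # Service Action Reservation Key field.
    return [register, ("PREEMPT AND ABORT", current_key)]


def pr_shared_open(current_key: int, pr_type: int, my_key: int) -> List[Action]:
    """PR OUT actions for opening a PR_shared disk."""
    if current_key == 0 and pr_type in (0, WEAR):
        return [("REGISTER AND IGNORE EXISTING KEY", my_key), ("RESERVE", WEAR)]
    raise NotImplementedError("case not covered by the steps above")


print(pr_exclusive_open(0, 0, 0x11))
print(pr_exclusive_open(0x22, WERO, 0x11, force=True))
print(pr_shared_open(0, WEAR, 0x33))
```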
When closing the disk, and only for PR_exclusive reservations, send a PR OUT command with the Clear service action to the disk to clear all existing reservations and registrations. Send this command through any one of the good paths of the disk (an I_T nexus where registration succeeded).
When you use chdev to change the reserve_policy from PR_shared to PR_exclusive, or from PR_shared or PR_exclusive to single_path (or to no_reserve, if the key in the ODM is one of the keys registered on the disk), send a PR OUT command with the Clear service action to the disk to clear all existing reservations and registrations.
Persistent reservation commands
The devrsrv command of AIX queries, and can even break, persistent reservations on the device. For more information about the usage of the devrsrv command, see the IBM Knowledge Center.
Use the following syntax for the devrsrv command:
devrsrv -c query | release | prin -s sa | (prout -s sa -r rkey -k sa_key -t prtype) -l devicename
The clrsrvmgr command of PowerHA V7.2 lists and clears the reservation of a disk or a group of disks in a volume group (VG).
Use the following syntax for the clrsrvmgr command:
clrsrvmgr -r {[-l DiskName]|[-g VGname]} [-v]
clrsrvmgr -c {[-l DiskName]|[-g VGname]} [-v]
clrsrvmgr -h
This command lists or clears the reservation status of a disk or a VG. The command displays the following key attributes that are related to disk reservations:
Configured Reserve Policy The reservation information in the AIX kernel that is used to enforce the reservation during disk opens.
Effective Reserve Policy Reservation policy for the disk in the AIX ODM.
Reservation Status The status of the actual reservation on the storage disk itself.
The options are mostly self-explanatory:
-r Read
-c Clear
-h Help
-v Verbose
-l Expects a disk name
-g Expects a volume group name
The clrsrvmgr command does not guarantee that an operation succeeds, because disk operations depend on the accessibility of the device. However, when used with the -v option, it tries to show the reason for a failure. The utility does not support operations at the disk and VG levels together, so the -l and -g options cannot be combined. At the VG level, the number of disks in the VG and the name of each target disk are displayed, as shown in the following example:
# clrsrvmgr -rg PRABVG
Number of disks in PRABVG: 2
hdisk1011
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
hdisk1012
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
At the disk level, the disk name is not displayed because the target device is already known:
# clrsrvmgr -rl hdisk1015 -v
Configured Reserve Policy : PR_shared
Effective Reserve Policy : PR_shared
Reservation Status : No reservation
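If you collect clrsrvmgr -r output from scripts, a small parser can flag disks that hold a reservation or whose configured and effective policies differ. This Python sketch assumes the output layout shown in the examples above: a bare disk name line followed by "Attribute : value" lines. parse_clrsrvmgr is a hypothetical helper, and the layout might differ between PowerHA levels.

```python
from typing import Dict, Optional


def parse_clrsrvmgr(output: str, disk_name: Optional[str] = None) -> Dict[Optional[str], Dict[str, str]]:
    """Parse `clrsrvmgr -r` output into {disk: {attribute: value}}.

    Disk-level output (-l) omits the disk name, so pass it as disk_name.
    """
    disks: Dict[Optional[str], Dict[str, str]] = {}
    current = disk_name
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("Number of disks"):
            continue  # skip blank lines, the command prompt, and the VG header
        if ":" in line:
            attribute, _, value = line.partition(":")
            disks.setdefault(current, {})[attribute.strip()] = value.strip()
        else:
            current = line  # a bare line is the name of the next disk
    return disks


vg_output = """# clrsrvmgr -rg PRABVG
Number of disks in PRABVG: 2
hdisk1011
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
hdisk1012
Configured Reserve Policy : no_reserve
Effective Reserve Policy : no_reserve
Reservation Status : No reservation
"""
report = parse_clrsrvmgr(vg_output)
reserved = [d for d, a in report.items() if a.get("Reservation Status") != "No reservation"]
pending = [d for d, a in report.items()
           if a.get("Configured Reserve Policy") != a.get("Effective Reserve Policy")]
print("disks:", list(report), "reserved:", reserved, "policy change pending:", pending)
```

A disk appears in pending when the kernel (configured) and ODM (effective) policies differ, for example after chdev -P and before the next restart.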
 

1 Confirm with the storage and driver vendors.
2 Confirm with the storage and driver vendors.
3 Confirm with the storage and driver vendors.