Clusterware Stack Management and Troubleshooting
by Syed Jaffar Hussain, Kai Yu
In Chapter 1, we mentioned that the Oracle RAC cluster database environment requires cluster manager software (“Clusterware”) that is tightly integrated with the operating system (OS) to provide the cluster management functions that enable the Oracle database in the cluster environment.
Oracle Clusterware was originally introduced in Oracle 9i on Linux with the original name Oracle Clusterware Management Service. Cluster Ready Service (CRS) as a generic cluster manager was introduced in Oracle 10.1 for all platforms and was renamed to today’s name, Oracle Clusterware, in Oracle 10.2. Since Oracle 10g, Oracle Clusterware has been the required component for Oracle RAC. On Linux and Windows systems, Oracle Clusterware is the only clusterware we need to run Oracle RAC, while on Unix, Oracle Clusterware can be combined with third-party clusterware such as Sun Cluster and Veritas Cluster Manager.
Oracle Clusterware combines a group of servers into a cluster environment by enabling communication between the servers so that they work together as a single logical server. Oracle Clusterware serves as the foundation of the Oracle RAC database by managing its resources. These resources include Oracle ASM instances, database instances, Oracle databases, virtual IPs (VIPs), the Single Client Access Name (SCAN), SCAN listeners, Oracle Notification Service (ONS), and the Oracle Net listener. Oracle Clusterware is responsible for startup and failover for the resources. Because Oracle Clusterware plays such a key role in the high availability and scalability of the RAC database, the system administrator and the database administrator should pay careful attention to its configuration and management.
This chapter describes the architecture and complex technical stack of Oracle Clusterware and explains how those components work. The chapter also describes configuration best practices and explains how to manage and troubleshoot the clusterware stack. The chapter assumes the latest version of Oracle Clusterware 12cR1.
The following topics will be covered in this chapter:
Clusterware 12cR1 and Its Components
Before Oracle 11gR2, Oracle Clusterware was a distinct product installed in its own home directory, separate from Oracle ASM and the Oracle RAC database. Since Oracle 11gR2, and likewise in a standard 12cR1 cluster, Oracle Clusterware and Oracle ASM are combined into a single product called Grid Infrastructure and installed together into a single home directory. In Unix and Linux environments, part of the Grid Infrastructure installation is owned by the root user, and the rest is owned by a dedicated grid user that is distinct from the oracle user who owns the Oracle database software. The grid user also owns the Oracle ASM instance.
Only one version of Oracle Clusterware can be active at a time in the cluster, no matter how many versions are installed on it. The Clusterware version must be equal to or higher than the version of the Oracle Database it supports. Oracle 12cR1 Clusterware supports all RAC database versions from 10gR1 through 12cR1. ASM is always the same version as Oracle Clusterware and likewise supports Oracle Database versions from 10gR1 through 12cR1.
Oracle 12cR1 introduced Oracle Flex Cluster and Flex ASM, whose architecture differs from that of the standard 12cR1 cluster. We will discuss Oracle Flex Cluster and Flex ASM in Chapter 5; this chapter focuses on the standard 12cR1 cluster.
Storage Components of Oracle Clusterware
Oracle Clusterware consists of a storage structure and a set of processes running on each cluster node. The storage structure consists of two pieces of shared storage, the Oracle Cluster Registry (OCR) and the voting disk (VD), plus two local files, the Oracle Local Registry (OLR) and the Grid Plug and Play (GPnP) profile.
OCR is used to store the cluster configuration details. It stores the information about the resources that Oracle Clusterware controls. The resources include the Oracle RAC database and instances, listeners, and virtual IPs (VIPs) such as SCAN VIPs and local VIPs.
The voting disk (VD) stores the cluster membership information. Oracle Clusterware uses the VD to determine which nodes are members of a cluster. Oracle Cluster Synchronization Service daemon (OCSSD) on each cluster node updates the VD with the current status of the node every second. The VD is used to determine which RAC nodes are still in the cluster should the interconnect heartbeat between the RAC nodes fail.
Both OCR and VD must reside on shared storage that is accessible to all servers in the cluster. They could be stored on raw devices for 10g Clusterware or on block devices in 11gR1 Clusterware. With 11gR2 and 12cR1, a fresh installation must place them in an ASM disk group or on a cluster file system. They may remain on raw or block devices if the Clusterware was merely upgraded from 10g or 11gR1 to 11gR2; however, it is recommended to migrate them to an ASM disk group or a cluster file system soon after the upgrade. If you want to upgrade a Clusterware and Database stack stored on raw or block devices to Oracle Clusterware 12c and Oracle Database 12c, you must move the database and the OCR/VDs to ASM before the upgrade, because Oracle 12c no longer supports raw or block device storage. To avoid a single point of failure, Oracle recommends keeping multiple OCR copies (up to five are allowed) and at least three VDs, always an odd number. On Linux, the /etc/oracle/ocr.loc file records the OCR location:
$ cat /etc/oracle/ocr.loc
ocrconfig_loc=+VOCR
local_only=FALSE
In addition, you can use the following command to find the VD location:
$ ./crsctl query css votedisk
The Oracle ASM disk group is the recommended primary storage option for OCR and VD. Chapter 5 includes a detailed discussion of storing OCR and VDs in an ASM disk group.
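Both ocr.loc and olr.loc are simple key=value files, so a script can read the configured locations directly. A minimal sketch follows (the get_loc helper is our own name, not an Oracle tool), exercised against the ocr.loc contents shown above:

```shell
# Extract the value of a key from an ocr.loc/olr.loc-style key=value file.
get_loc() {
  local key="$1" file="$2"
  # Match the key at the start of a line, then keep everything after '='.
  grep "^${key}=" "$file" | cut -d= -f2-
}

# Demonstrate against the ocr.loc contents shown earlier.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ocrconfig_loc=+VOCR
local_only=FALSE
EOF

get_loc ocrconfig_loc "$tmp"   # prints +VOCR
rm -f "$tmp"
```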
Two other Oracle Clusterware files, the OLR and the GPnP profile, are stored in the grid home on the local file system of each RAC node. OLR is the local version of the OCR: it stores the metadata for the local node and is managed by the Oracle High Availability Services daemon (OHASD). OLR holds less information than OCR, but it provides its metadata directly from local storage, without needing to access the OCR stored in an ASM disk group. One OLR is configured for each node, and the default location is $GRID_HOME/cdata/<hostname>.olr. The location is also recorded in /etc/oracle/olr.loc, or you can check it with the ocrcheck command:
$ cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/knewracn1.olr
crs_home=/u01/app/12.1.0/grid
$ ocrcheck -local -config
Oracle Local Registry configuration is :
Device/File Name : /u01/app/12.1.0/grid/cdata/knewracn1.olr
The GPnP profile records a lot of important information about the cluster, such as the network profile and the VD. The information stored in the GPnP profile is used when adding a node to a cluster. Figure 2-1 shows an example of the GPnP profile. By default, this file is stored in $GRID_HOME/gpnp/<hostname>/profiles/peer/profile.xml.
Figure 2-1. GPnP profile
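Because the profile is plain XML, its key attributes can be inspected quickly from the shell. The sketch below greps a hypothetical, heavily trimmed profile fragment of our own making; a real profile.xml carries many more elements and a digital signature, so it must never be edited by hand:

```shell
# Pull the ASM discovery string out of a GPnP-style profile.
# The fragment below is a hypothetical, trimmed example for illustration;
# a real profile.xml is much larger and digitally signed.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<gpnp:GPnP-Profile ProfileSequence="4" ClusterName="knewrac">
  <orcl:CSS-Profile DiscoveryString="+asm" LeaseDuration="400"/>
</gpnp:GPnP-Profile>
EOF

grep -o 'DiscoveryString="[^"]*"' "$tmp"   # prints DiscoveryString="+asm"
rm -f "$tmp"
```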
Beginning with Oracle 11gR2, Oracle redesigned Oracle Clusterware into two software stacks: the High Availability Service stack and the CRS stack. Each stack consists of several background processes, which together facilitate the operation of the Clusterware. Figure 2-2 shows the processes of the two stacks in Oracle 12cR1 Clusterware.
Figure 2-2. Oracle Clusterware 12cR1 stack
High Availability Cluster Service Stack
The High Availability Cluster Service stack is the lower stack of the Oracle Clusterware. It is based on the Oracle High Availability Service (OHAS) daemon, which is responsible for starting all other clusterware processes. In the next section, we will discuss the details of the clusterware startup sequence.
OHAS uses and maintains the information in OLR. The High Availability Cluster Service stack consists of the following daemons and services:
GPnP daemon (GPnPD): This daemon accesses and maintains the GPnP profile and ensures that all nodes have the current profile. When OCR is stored in an ASM disk group, OCR is not available during the initial startup of the clusterware because ASM itself is not yet up; the GPnP profile contains enough information to start the Clusterware.
Oracle Grid Naming Service (GNS): This process provides name resolution within the cluster. With 12cR1, one GNS can serve multiple clusters, in contrast to the earlier single-cluster version.
Grid Interprocess Communication (GIPC): This daemon supports Grid Infrastructure communication by enabling Redundant Interconnect Usage.
Multicast Domain Name Service (mDNS): This daemon works with GNS to perform name resolution.
This stack also includes the System Monitor Service daemon (osysmond) and Cluster Logger Service daemon (ologgerd).
Cluster Ready Service Stack
The CRS stack is the upper-level stack of the Oracle Clusterware; it relies on the services of the lower High Availability Cluster Service stack. The CRS stack includes the following daemons and services:
CRS: This service is primarily responsible for managing high availability operations. The CRS daemon (CRSD) manages the start, stop, monitor, and failover operations of cluster resources, and it maintains the configuration information in OCR. If the cluster hosts an Oracle RAC database, the resources managed by CRS include the Oracle database and its instances, listeners, ASM instances, VIPs, and so on. This service runs as the crsd.bin process on Linux/Unix and as OracleOHService on Windows.
CSS: This service manages and monitors the node membership in the cluster and updates the node status information in VD. This service runs as the ocssd.bin process on Linux/Unix and OracleOHService (ocssd.exe) on Windows.
CSS Agent: This process monitors, starts, and stops the CSS. This service runs as the cssdagent process on Linux/Unix and cssdagent.exe on Windows.
CSS Monitor: This process works with the cssdagent process to provide I/O fencing, rebooting the RAC node to protect data integrity in case of a problem with the ocssd.bin process, CPU starvation, or an OS lockup. This service runs as cssdmonitor on Linux/Unix or cssdmonitor.exe on Windows. Both cssdagent and cssdmonitor are features introduced in 11gR2 that replace the Oracle Process Monitor daemon (oprocd) used in 11gR1.
Cluster Time Synchronization Service (CTSS): A daemon process introduced with 11gR2 that handles time synchronization among all the nodes in the cluster. You can use the OS's Network Time Protocol (NTP) service to synchronize the time; if you disable the NTP service, CTSS provides the time synchronization service instead. This service runs as the octssd.bin process on Linux/Unix or octssd.exe on Windows.
Event Management (EVM): This background process publishes events to all the members of the cluster. On Linux/Unix, the process name is evmd.bin, and on Windows, it is evmd.exe.
ONS: This is the publish and subscribe service that communicates Fast Application Notification (FAN) events. This service is the ons process on Linux/Unix and ons.exe on Windows.
Oracle ASM: Provides the volume manager and shared storage management for Oracle Clusterware and Oracle Database.
Clusterware agent processes: Oracle Agent (oraagent) and Oracle Root Agent (orarootagent). The oraagent is responsible for managing all Oracle-owned ohasd resources, and the orarootagent for all root-owned ohasd resources.
Clusterware Startup Sequence
Oracle Clusterware is started automatically when the RAC node boots. The startup process runs through several levels. Figure 2-3 shows the multiple-level startup sequence that brings up the entire Grid Infrastructure stack plus the resources that Clusterware manages.
Figure 2-3. Startup sequence of 12cR1 Clusterware processes
Level 0: The OS automatically starts Clusterware through the OS init process. The init process spawns only one init.ohasd, which in turn starts the OHASD process. This is configured in the /etc/inittab file:
$cat /etc/inittab|grep init.d | grep –v grep
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
Oracle Linux 6.x and Red Hat Enterprise Linux 6.x deprecated inittab; on those releases, the startup of init.ohasd is configured in /etc/init/oracle-ohasd.conf:
$ cat /etc/init/oracle-ohasd.conf
......
start on runlevel [35]
stop on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
This starts "init.ohasd run", which in turn spawns the ohasd.bin background process:
$ ps -ef | grep ohasd | grep -v grep
root 4056 1 1 Feb19 ? 01:54:34 /u01/app/12.1.0/grid/bin/ohasd.bin reboot
root 22715 1 0 Feb19 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
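A script can make the same check. The sketch below (the ohasd_running helper is our own name, not an Oracle tool) decides from ps -ef text whether ohasd.bin is up:

```shell
# Return success if an ohasd.bin process appears in the given ps -ef text.
# The [o] in the pattern keeps the grep from matching itself in live use.
ohasd_running() {
  grep -q '[o]hasd\.bin' <<< "$1"
}

# Check the live process list on this machine.
ps_text=$(ps -ef)
if ohasd_running "$ps_text"; then
  echo "ohasd.bin is running"
else
  echo "ohasd.bin is NOT running"
fi
```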
Once OHASD is started at Level 0, it is responsible for starting the rest of the Clusterware and the resources that Clusterware manages, directly or indirectly, through Levels 1-4. The following discussion walks through the four levels of the cluster startup sequence shown in Figure 2-3.
Level 1: OHASD directly spawns four agent processes:
Level 2: On this level, OHASD oraagent spawns five processes:
Then, OHASD orarootagent spawns the following processes:
Next, the cssdagent starts the CSSD (CSS daemon) process.
Level 3: The CRSD spawns two CRSD agents: CRSD orarootagent and CRSD oraagent.
Level 4: On this level, the CRSD orarootagent is responsible for starting the following resources:
Then, the CRSD oraagent is responsible for starting the rest of the resources as follows:
ASM and Clusterware: Which One is Started First?
If you have used Oracle RAC 10g or 11gR1, you might remember that the Oracle Clusterware stack had to be up before the ASM instance could start on the node. Because OCR and VD can also be stored in ASM beginning with 11gR2, the million-dollar question in everyone's mind is, "Which one is started first?" This section will answer that interesting question.
The Clusterware startup sequence that we just discussed provides the answer: ASM is a part of the CRS stack of the Clusterware, and it is started after the High Availability stack is up and before CRSD starts. The next question, then, is: "How does the Clusterware get the stored cluster configuration and the cluster membership information, which are normally stored in OCR and VD, respectively, without starting an ASM instance?" The answer is that during the startup of the High Availability stack, Oracle Clusterware reads the cluster configuration from OLR and the GPnP profile instead of from OCR. Because these two components are stored in $GRID_HOME on the local disk, neither the ASM instance nor an ASM disk group is needed to start the High Availability stack. Oracle Clusterware also doesn't rely on an ASM instance to access the VD: the location of the VD file is recorded in the ASM disk header. We can see the location information with the following command:
$ kfed read /dev/dm-8 | grep -E 'vfstart|vfend'
kfdhdb.vfstart: 352 ; 0x0ec: 0x00000160
kfdhdb.vfend: 384 ; 0x0f0: 0x00000180
The kfdhdb.vfstart value is the beginning AU offset of the VD file, and kfdhdb.vfend is the ending AU offset. Oracle Clusterware uses these two values to locate the VD file directly on disk.
In this example, /dev/dm-8 is the disk of the ASM disk group VOCR, which stores the VD file, as shown by running the following command:
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 7141f13f99734febbf94c73148c35a85 (/dev/dm-8) [VOCR]
Located 1 VD(s).
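The two kfed offsets translate directly into byte positions on the disk. The sketch below assumes the default 1 MB ASM allocation unit (check kfdhdb.ausize on a real disk, since a larger AU changes the arithmetic):

```shell
# Translate the kfed AU offsets into byte positions on the disk.
# Assumes the default 1 MB allocation unit; larger AU sizes change this.
AU_SIZE=$((1024 * 1024))
VF_START_AU=352    # kfdhdb.vfstart from the kfed output above
VF_END_AU=384      # kfdhdb.vfend

VD_START_BYTE=$((VF_START_AU * AU_SIZE))
VD_END_BYTE=$((VF_END_AU * AU_SIZE))
echo "voting file occupies bytes $VD_START_BYTE to $VD_END_BYTE"
```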
The Grid Infrastructure Universal Installer takes care of the installation and configuration of the Oracle Clusterware and the ASM instance. After the installation, the Clusterware and ASM restart automatically every time the server starts. Most of the time, this entire stack works well without much manual intervention. However, as the most important infrastructure for Oracle RAC, the stack does need proper management and ongoing maintenance. Oracle Clusterware provides several tools, utilities, and log files that a Clusterware admin can use for management, troubleshooting, and diagnostic work. This section discusses the management tools, and the next few sections cover Clusterware troubleshooting and diagnosis.
Clusterware Management Tools and Utilities
Oracle provides a set of tools and utilities that can be used for Oracle Grid Infrastructure management. The most commonly used is the Clusterware control utility crsctl, a command-line tool for managing Oracle Clusterware. Oracle Clusterware 11gR2 added cluster-aware commands to crsctl that allow you to perform check, start, and stop operations on the clusterware from any node. Use crsctl -help to print Help for all crsctl commands:
$ crsctl -help
Usage: crsctl add        - add a resource, type, or other entity
       crsctl backup     - back up voting disk for CSS
       crsctl check      - check a service, resource, or other entity
       crsctl config     - output autostart configuration
       crsctl debug      - obtain or modify debug state
       crsctl delete     - delete a resource, type, or other entity
       crsctl disable    - disable autostart
       crsctl discover   - discover DHCP server
       crsctl enable     - enable autostart
       crsctl eval       - evaluate operations on resource or other entity without performing them
       crsctl get        - get an entity value
       crsctl getperm    - get entity permissions
       crsctl lsmodules  - list debug modules
       crsctl modify     - modify a resource, type, or other entity
       crsctl query      - query service state
       crsctl pin        - pin the nodes in the nodelist
       crsctl relocate   - relocate a resource, server, or other entity
       crsctl replace    - replace the location of voting files
       crsctl release    - release a DHCP lease
       crsctl request    - request a DHCP lease or an action entrypoint
       crsctl setperm    - set entity permissions
       crsctl set        - set an entity value
       crsctl start      - start a resource, server, or other entity
       crsctl status     - get status of a resource or other entity
       crsctl stop       - stop a resource, server, or other entity
       crsctl unpin      - unpin the nodes in the nodelist
       crsctl unset      - unset an entity value, restoring its default
You can get the detailed syntax of a specific command, for example with crsctl status -help. Starting with 11gR2, crsctl commands replace a few deprecated crs_* commands, such as crs_start, crs_stat, and crs_stop. In the following sections, we discuss the management tasks along with the corresponding crsctl commands.
Another set of command-line tools are based on the srvctl utility. These commands are used to manage the Oracle resources managed by the Clusterware.
A srvctl command consists of four parts:
$ srvctl <command> <object> [<options>]
The command part specifies the operation to perform, and the object part specifies the resource on which the operation will be executed. You can get Help on the overall srvctl syntax by running srvctl -h. For detailed Help on each command and object and the options for their use, run the following commands:
$ srvctl <command> -h or
$ srvctl <command> <object> -h
There are also other utilities:
As we discussed in the previous section, through the OS init process, Oracle Clusterware is automatically started up when the OS starts. The clusterware can also be manually started and stopped by using the crsctl utility.
The crsctl utility provides the commands to start up the Oracle Clusterware manually:
Start the Clusterware stack on all servers in the cluster, or on one or more named servers in the cluster:
$ crsctl start cluster [-all | -n server1[,...]]
For example:
$ crsctl start cluster -all
$ crsctl start cluster -n k2r720n1
Start the Oracle High Availability Services daemon (OHASD) and the Clusterware service stack together on the local server only:
$ crsctl start crs
Both of these crsctl startup commands require root privilege on Linux/Unix. The 'crsctl start crs' command will fail if OHASD has already been started.
The crsctl utility also provides similar commands to stop the Oracle Clusterware manually. It also requires root privilege on Linux/Unix to stop the clusterware manually.
The following command stops the clusterware stack on the local node, on all nodes, or on specified local or remote nodes. Without the -f option, this command stops the resources gracefully; with -f, it forces the Oracle Clusterware stack to stop, along with the resources that Oracle Clusterware manages.
$ crsctl stop cluster [-all | -n server_name[...]] [-f]
The following command stops the Oracle High Availability service on the local server. Use the -f option to force any resources to stop as well as the Oracle High Availability service itself:
$ crsctl stop crs [-f]
You can use the following command to check the cluster status:
$ crsctl check cluster [-all]
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
Check the CRS status with the following command:
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
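Monitoring scripts usually need a single yes/no answer rather than four message lines. The sketch below (the stack_healthy helper is our own name, not an Oracle tool) counts the "is online" messages in crsctl check crs output:

```shell
# Decide whether the full stack is healthy from 'crsctl check crs' text:
# all four messages (OHAS, CRS, CSS, EVM) must report "is online".
stack_healthy() {
  [ "$(grep -c 'is online$' <<< "$1")" -eq 4 ]
}

# Example using the output shown above; on a live node use:
#   out=$($GRID_HOME/bin/crsctl check crs)
out='CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online'

stack_healthy "$out" && echo "clusterware stack is up"
```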
Check the OHASD status:
$GRID_HOME/bin/crsctl check has
CRS-4638: Oracle High Availability Services is online
Check the current status of all the resources using the following command, which replaces the crs_stat -t command used in 11gR1 and earlier:
[grid@knewracn1 ∼]$ crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
ora.DATA1.dg
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
ora.LISTENER_LEAF.lsnr
OFFLINE OFFLINE knewracn5 STABLE
OFFLINE OFFLINE knewracn6 STABLE
OFFLINE OFFLINE knewracn7 STABLE
OFFLINE OFFLINE knewracn8 STABLE
ora.net1.network
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
ora.ons
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
ora.proxy_advm
ONLINE ONLINE knewracn1 STABLE
ONLINE ONLINE knewracn2 STABLE
ONLINE ONLINE knewracn4 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE knewracn2 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE knewracn4 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE knewracn1 STABLE
ora.MGMTLSNR
1 ONLINE ONLINE knewracn1 169.254.199.3 192.168.9.41,STABLE
ora.asm
1 ONLINE ONLINE knewracn1 STABLE
2 ONLINE ONLINE knewracn2 STABLE
3 ONLINE ONLINE knewracn4 STABLE
ora.cvu
1 ONLINE ONLINE knewracn1 STABLE
ora.gns
1 ONLINE ONLINE knewracn1 STABLE
ora.gns.vip
1 ONLINE ONLINE knewracn1 STABLE
ora.knewdb.db
1 ONLINE ONLINE knewracn2 Open,STABLE
2 ONLINE ONLINE knewracn4 Open,STABLE
3 ONLINE ONLINE knewracn1 Open,STABLE
ora.knewracn1.vip
1 ONLINE ONLINE knewracn1 STABLE
ora.knewracn2.vip
1 ONLINE ONLINE knewracn2 STABLE
ora.knewracn4.vip
1 ONLINE ONLINE knewracn4 STABLE
ora.mgmtdb
1 ONLINE ONLINE knewracn1 Open,STABLE
ora.oc4j
1 ONLINE ONLINE knewracn1 STABLE
ora.scan1.vip
1 ONLINE ONLINE knewracn2 STABLE
ora.scan2.vip
1 ONLINE ONLINE knewracn4 STABLE
ora.scan3.vip
1 ONLINE ONLINE knewracn1 STABLE
-------------------------------------------------------------------------------------
These commands can be executed by the root user, the grid user (GI owner), or the oracle user (RAC owner). You can also disable or enable the automatic startup of the Clusterware stack:
$GRID_HOME/bin/crsctl disable crs
$GRID_HOME/bin/crsctl enable crs
Managing OCR and the Voting Disk
Oracle provides three tools to manage OCR: ocrconfig, ocrdump, and ocrcheck. The ocrcheck command lists the OCR and its mirrors.
The following example lists the OCR location in the +VOCR disk group and its mirror in the +DATA1 disk group. In 11gR2 and 12cR1, OCR can have up to five mirrored copies, each on an ASM disk group or a cluster file system:
$ ocrcheck
Status of Oracle Cluster Registry is as follows:
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3192
Available space (kbytes) : 258928
ID : 1707636078
Device/File Name : +VOCR
Device/File integrity check succeeded
Device/File Name : +DATA1/
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
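The same output can feed space monitoring. The sketch below (the ocr_free_kb helper is our own name, not an Oracle tool) pulls the available-space figure out of ocrcheck text:

```shell
# Pull 'Available space (kbytes)' out of ocrcheck output text.
ocr_free_kb() {
  grep 'Available space' <<< "$1" | grep -o '[0-9]\+'
}

# Example using the ocrcheck output shown above; on a live node use:
#   out=$(ocrcheck)
out='Total space (kbytes)      : 262120
Used space (kbytes)       : 3192
Available space (kbytes)  : 258928'

ocr_free_kb "$out"    # prints 258928
```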
You can also use the ocrconfig command to add, delete, or replace OCR files. For example, add another OCR copy in +DATA2:
$GRID_HOME/bin/ocrconfig -add +DATA2
Or remove the OCR copy from +DATA1:
$GRID_HOME/bin/ocrconfig -delete +DATA1
The ocrdump command dumps the contents of the OCR to a text or XML file. It can be executed only by the root user, and the default output file name is OCRDUMPFILE:
$ ./ocrdump
$ ls -l OCRDUMPFILE
-rw------- 1 root root 212551 Dec 28 20:21 OCRDUMPFILE
The OCR is backed up automatically every four hours on at least one of the nodes in the cluster. The backups are stored in the $GRID_HOME/cdata/<cluster_name> directory. To show the backup information, use the ocrconfig -showbackup command:
$GRID_HOME/bin/ocrconfig -showbackup
knewracn1 2013/03/02 07:01:37 /u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr
knewracn1 2013/03/02 03:01:33 /u01/app/12.1.0/grid/cdata/knewrac/backup01.ocr
knewracn1 2013/03/01 23:01:32 /u01/app/12.1.0/grid/cdata/knewrac/backup02.ocr
knewracn1 2013/03/01 03:01:21 /u01/app/12.1.0/grid/cdata/knewrac/day.ocr
knewracn1 2013/02/20 02:58:55 /u01/app/12.1.0/grid/cdata/knewrac/week.ocr
knewracn1 2013/02/19 23:15:34 /u01/app/12.1.0/grid/cdata/knewrac/backup_20130219_231534.ocr
knewracn1 2013/02/19 23:05:26 /u01/app/12.1.0/grid/cdata/knewrac/backup_20130219_230526.ocr
.....
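The newest automatic backup is the backup00.ocr entry, and its path can be extracted for scripting; the latest_ocr_backup helper name below is our own. (A backup can also be taken on demand with ocrconfig -manualbackup.)

```shell
# Extract the most recent automatic OCR backup path from
# 'ocrconfig -showbackup' output (the backup00.ocr entry).
latest_ocr_backup() {
  awk '/backup00\.ocr/ {print $NF; exit}' <<< "$1"
}

# Example using the first two lines of the listing shown above.
out='knewracn1 2013/03/02 07:01:37 /u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr
knewracn1 2013/03/02 03:01:33 /u01/app/12.1.0/grid/cdata/knewrac/backup01.ocr'

latest_ocr_backup "$out"   # prints /u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr
```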
The steps to restore OCR from a backup file are as follows:
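As a hedged sketch of that flow, based on Oracle's documented ocrconfig -restore procedure: DRY_RUN=echo only prints each command here, and the backup path is an example, so remove the guard and run as root on a real cluster:

```shell
# Sketch of an OCR restore; DRY_RUN=echo prints the commands instead of
# running them, so the flow can be reviewed safely. The backup path is
# an example taken from the listing shown earlier.
DRY_RUN=echo
BACKUP=/u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr

$DRY_RUN crsctl stop crs -f                 # on every node
$DRY_RUN ocrconfig -restore "$BACKUP"       # on one node only
$DRY_RUN crsctl start crs                   # on every node
$DRY_RUN ocrcheck                           # verify the restored registry
$DRY_RUN cluvfy comp ocr -n all -verbose    # optional integrity check
```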
You can use the following command to check the VD location:
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 7141f13f99734febbf94c73148c35a85 (/dev/dm-8) [VOCR]
Located 1 voting disk(s).
To move the VD to another location, you can use the following crsctl command:
$GRID_HOME/bin/crsctl replace votedisk +DATA3
The srvctl utility can be used to manage the resources that the Clusterware manages. The resources include the database, instance, service, nodeapps, vip, asm, diskgroup, listener, scan, scan listener, server pool, server, oc4j, home, file system, and gns. The managed resource is specified in the <object> part of the command, and the operation to perform on it is specified in the <command> part. The operations include enable, disable, start, stop, relocate, status, add, remove, modify, getenv, setenv, unsetenv, config, convert, and upgrade.
srvctl <command> <object> [<options>]
Here are a few examples of srvctl commands.
Check the SCAN configuration of the cluster:
$ srvctl config scan
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 0 IPv4 VIP: -/scan1-vip/172.16.150.40
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 1 IPv4 VIP: -/scan2-vip/172.16.150.83
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 2 IPv4 VIP: -/scan3-vip/172.16.150.28
Check the node VIP status on knewracn1:
$ srvctl status vip -n knewracn1
VIP 172.16.150.37 is enabled
VIP 172.16.150.37 is running on node: knewracn1
Check the node apps on knewracn1:
$ srvctl status nodeapps -n knewracn1
VIP 172.16.150.37 is enabled
VIP 172.16.150.37 is running on node: knewracn1
Network is enabled
Network is running on node: knewracn1
ONS is enabled
ONS daemon is running on node: knewracn1
Adding and Removing Cluster Nodes
The flexibility of Oracle Clusterware shows in its ability to scale the existing cluster up and down online, adding and removing nodes as the demands of the business change. This section outlines the procedures to add a node to and remove a node from an existing cluster.
Assume that you have a two-node cluster and want to bring in an additional node (named rac3) to scale up the environment, and that the candidate node meets all prerequisites for beginning the add-node procedure.
Adding a new node to the existing cluster typically consists of the following stages:
When the new node is ready with all necessary prerequisites to become part of the existing cluster, such as storage, network, OS, and patches, use the following step-by-step procedure to add the node:
From the first node of the cluster, execute the following command to initiate integrity verification checks for the cluster and on the node that is going to be part of the cluster:
$ cluvfy stage -pre nodeadd -n rac3 -fixup -verbose
When no verification check failures are reported, use the following example to launch the procedure to add the node, assuming that the Dynamic Host Configuration Protocol (DHCP) and Grid Naming Service (GNS) are not configured in the current environment:
$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}" \
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}"
Use the following example when adding to the Flex Cluster setup:
$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}" \
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}" "CLUSTER_NEW_NODE_ROLES={hub}"
Execute the root.sh script as the root user when prompted on the node that is joining the cluster. The script will initialize cluster configuration and start up the cluster stack on the new node.
After successfully completing the procedure to add a new node, perform post node add verification checks from any cluster node using the following example:
$ cluvfy stage -post nodeadd -n rac3
$ crsctl check cluster -all    -- verify the cluster health from all nodes
$ olsnodes -n                  -- list all existing nodes in the cluster
After a successful node addition, execute the following from $ORACLE_HOME to clone the Oracle RDBMS software over the new node to complete the node addition procedure:
$ORACLE_HOME/oui/bin/addNode.sh "CLUSTER_NEW_NODES={rac3}"
When prompted, execute the root.sh script as the root user on the new node.
Once the new node is successfully added to the cluster, run the following post-nodeadd verification:
$ cluvfy stage -post nodeadd -n rac3 -verbose
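Pulled together, the add-node flow can be rehearsed as a dry run. DRY_RUN=echo only prints each command, and the home paths are example assumptions for illustration:

```shell
# Dry-run sketch of the complete add-node flow for node rac3 (non-GNS
# setup); DRY_RUN=echo prints each command so the sequence can be
# reviewed before a real run. Home paths below are example assumptions.
DRY_RUN=echo
GRID_HOME=${GRID_HOME:-/u01/app/12.1.0/grid}
ORACLE_HOME=${ORACLE_HOME:-/u01/app/oracle/product/12.1.0/dbhome_1}

$DRY_RUN cluvfy stage -pre nodeadd -n rac3 -fixup -verbose
$DRY_RUN $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}" \
    "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}"
# ... run root.sh on rac3 when prompted ...
$DRY_RUN $ORACLE_HOME/oui/bin/addNode.sh "CLUSTER_NEW_NODES={rac3}"
# ... run root.sh on rac3 when prompted ...
$DRY_RUN cluvfy stage -post nodeadd -n rac3 -verbose
```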
Assume that you have a three-node cluster environment and want to delete the rac3 node from the existing cluster. Ensure the node that is going to be dropped has no databases, instances, or other services running. If any do exist, either drop them or just move them over to other nodes in the cluster. The following steps outline a procedure to remove a node from the existing cluster:
The node that is going to be removed shouldn't be pinned; if it is, unpin it before starting the procedure. The following examples demonstrate how to identify whether a node is pinned and how to unpin it:
$ olsnodes -n -s -t
You will get the following typical output if the nodes are pinned in the cluster:
rac1 1 Active Pinned
rac2 2 Active Pinned
rac3 3 Active Pinned
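The pinned state in the fourth column can also be checked mechanically. The following is a minimal sketch; the sample output is embedded in a variable here so the example is self-contained, whereas on a real node you would pipe the output of `olsnodes -n -s -t` itself. It prints only the pinned nodes, which are the candidates for `crsctl unpin css`:

```shell
# Sample `olsnodes -n -s -t` output (assumption: illustrative values only).
olsnodes_output='rac1 1 Active Pinned
rac2 2 Active Pinned
rac3 3 Active Unpinned'

# Print only the node names whose fourth field is "Pinned".
pinned_nodes=$(printf '%s\n' "$olsnodes_output" | awk '$4 == "Pinned" {print $1}')
printf '%s\n' "$pinned_nodes"
```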
Ensure that the cluster stack is up and running on node rac3. If the cluster is inactive on the node, bring it up first before commencing the procedure to delete the node.
Execute the following command as the root user from any node if the node that is going to be removed is pinned:
$ crsctl unpin css -n rac3
Run the following command as the root user on the node that is going to be removed:
$GRID_HOME/deinstall/deinstall -local
Note The -local argument must be specified to remove the local node; otherwise, the cluster will be deinstalled from every node of the cluster.
Run the following command as the root user from an active node in a cluster:
$ crsctl delete node -n rac3
From any active node, execute the following command to update the Oracle inventory for GI and RDBMS homes across all nodes:
$GRID_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={rac1,rac2}" CRS=TRUE -silent
$GRID_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rac1,rac2}" -silent
When you specify the -silent option, the installer runs in non-interactive mode and doesn't display any interactive screens.
From any active node, verify the post-node deletion:
$ cluvfy stage -post nodedel -n rac3 -verbose
$ olsnodes -n -s -t
Clean up the following directories manually on the node that was just dropped:
/etc/oraInst.loc, /etc/oratab, /etc/oracle, /tmp/.oracle, /opt/ORCLfmap
Also remove the directories on the filesystem where the cluster and RDBMS software was installed.
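Before deleting anything by hand, it is safer to first list what actually remains. The sketch below is an illustration only: it checks inside a scratch directory standing in for `/`, so it is harmless to run anywhere; on the dropped node you would check the absolute paths themselves and remove them after review:

```shell
# ROOT would be / on the real node; a scratch directory keeps this example safe.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc/oracle" "$ROOT/tmp/.oracle"
: > "$ROOT/etc/oratab"

leftovers=""
for p in etc/oraInst.loc etc/oratab etc/oracle tmp/.oracle opt/ORCLfmap; do
    # Collect only the paths that actually exist, for review before removal.
    [ -e "$ROOT/$p" ] && leftovers="$leftovers /$p"
done
echo "review before removal:$leftovers"
rm -rf "$ROOT"
```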
Troubleshooting Common Clusterware Stack Startup Failures
Various factors can prevent the cluster stack from coming up automatically after a node eviction, failure, or reboot, or when cluster startup is initiated manually. This section covers key facts and guidelines that help in troubleshooting common causes of cluster stack startup failures. Though the symptoms discussed here are not exhaustive, the key points explained in this section provide a solid perspective for diagnosing common startup failures of the various cluster daemon processes.
Imagine that after a node failure or a manual cluster shutdown, the subsequent startup doesn't bring up the Clusterware as expected. Upon verifying the cluster or CRS health status, the DBA encounters one of the following error messages:
$GRID_HOME/bin/crsctl check cluster
CRS-4639: Could not contact Oracle High Availability Services
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors
OR
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
OHASD startup failures – This section explains how to diagnose common startup failures of the Oracle High Availability Services (OHAS) daemon process and provides workarounds for the following issues:
CRS-4639: Could not contact Oracle High Availability Services
OR
CRS-4124: Oracle High Availability Services startup failed
CRS-4000: Command Start failed, or completed with errors
First, review the Clusterware alert and ohasd.log files to identify the root cause for the daemon startup failures.
Verify the existence of the ohasd pointer, as follows, in the OS-specific init file (/etc/inittab, or /etc/init on some platforms):
h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
This pointer should have been added automatically during cluster installation or upgrade. If no pointer is found, add the preceding entry toward the end of the file and, as the root user, either start the cluster manually or reload the init configuration so that the entry takes effect automatically.
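A quick way to confirm the pointer is a simple grep. In this hedged sketch the inittab content is embedded in a variable so the example runs anywhere; on the node itself you would grep the real /etc/inittab instead:

```shell
# Stand-in for /etc/inittab contents (assumption: sample lines only).
inittab='id:3:initdefault:
h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'

# Flag whether the ohasd respawn entry is present.
if printf '%s\n' "$inittab" | grep -q 'init\.ohasd'; then
    ohasd_entry=present
else
    ohasd_entry=missing
fi
echo "ohasd inittab entry: $ohasd_entry"
```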
If the ohasd pointer exists, the next thing to check is the cluster high availability daemon auto start configuration. Use the following command as the root user to confirm the auto startup configuration:
$ $GRID_HOME/bin/crsctl config has    -- High Availability Service
$ $GRID_HOME/bin/crsctl config crs    -- Cluster Ready Service
Optionally, you can also verify the files under the /var/opt/oracle/scls_scr/hostname/root or /etc/oracle/scls_scr/hostname/root location to identify whether the auto config is enabled or disabled.
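The auto-start flag under scls_scr is just a small text file. The following sketch assumes, per the locations mentioned above, a flag file named ohasdstr containing the word enable or disable (the exact file name may vary by version); a temporary directory stands in for the real path so the example is safe to run:

```shell
# Simulated flag file; the real path would be
# /etc/oracle/scls_scr/<hostname>/root/ohasdstr (or /var/opt/oracle/...).
dir=$(mktemp -d)
echo enable > "$dir/ohasdstr"

# The file holds the literal word "enable" or "disable".
status=$(cat "$dir/ohasdstr")
echo "OHAS autostart flag: $status"
rm -rf "$dir"
```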
As the root user, enable the auto start and bring up the cluster manually on the local node when auto startup is not configured, which is indicated by the following message:
CRS-4621: Oracle High Availability Services autostart is disabled.
Use the following examples to enable has/crs auto start:
$ $GRID_HOME/bin/crsctl enable has   -- turns on the auto startup option of ohasd
$ $GRID_HOME/bin/crsctl enable crs   -- turns on the auto startup option of crs
$ $GRID_HOME/bin/crsctl start has    -- initiates OHASD daemon startup
$ $GRID_HOME/bin/crsctl start crs    -- initiates CRS daemon startup
If the ohasd daemon process still doesn't start and the problem persists, examine the component-specific trace files to identify the root cause. Follow these guidelines:
Verify the existence of the ohasd daemon process on the OS. From the command-line prompt, execute the following:
$ ps -ef | grep init.ohasd
Examine OS platform–specific log files to identify any errors (refer to the operating system logs section later in this chapter for more details).
Refer to the ohasd.log trace file under the $GRID_HOME/log/<hostname>/ohasd location, as this file contains useful information about the symptoms.
Address any OLR issues reported in the trace file. If OLR corruption or inaccessibility is reported, repair or resolve the issue by taking the appropriate action. To restore the OLR, restore it from a previous valid backup using the following command:
$ ocrconfig -local -restore $backup_location/backup_filename.olr
Verify Grid Infrastructure directory ownership and permission using OS level commands.
Additionally, remove the cluster startup socket files from the /var/tmp/.oracle, /usr/tmp/.oracle, or /tmp/.oracle directory and start up the cluster manually; which of these directories exists depends on the operating system.
CSSD startup issues – In case the CSSD process fails to start up or is reported as unhealthy, the following guidelines help in identifying the root cause of the issue:
Error: CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
Review the Clusterware alert.log and ocssd.log file to identify the root cause of the issue.
Verify the CSSD process on the OS:
$ ps -ef | grep cssd.bin
Examine the alert_hostname.log and ocssd.log logs to identify the possible causes that are preventing the CSSD process from starting.
Ensure that the node can access the voting disks (VDs). Run the crsctl query css votedisk command to verify accessibility. If the node can't access the VD files for any reason, check the disk permissions and ownership and look for logical corruption. Take the appropriate action to resolve the issue by either resetting the ownership and permissions or restoring the corrupted VD files.
If any heartbeat (network|disk) problems are reported in the logs mentioned earlier, verify the private interconnect connectivity and other network-related settings on the node.
If the VD files are placed on ASM, ensure that the ASM instance is up. In case the ASM instance is not up, refer to the ASM instance alert.log to identify the instance’s startup issues.
Use the following command to verify the status of the asm, cluster_interconnect, cssd, and other cluster resources:
$ crsctl stat res -init -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac1 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE OFFLINE rac1 STABLE
ora.crsd
1 ONLINE OFFLINE rac1 STABLE
ora.cssd
1 ONLINE OFFLINE rac1 STABLE
ora.cssdmonitor
1 ONLINE UNKNOWN rac1 STABLE
ora.ctssd
1 ONLINE ONLINE rac1 ACTIVE:0,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.drivers.acfs
1 ONLINE ONLINE rac1 STABLE
ora.evmd
1 ONLINE ONLINE rac1 STABLE
ora.gipcd
1 ONLINE ONLINE rac1 STABLE
ora.gpnpd
1 ONLINE ONLINE rac1 STABLE
ora.mdnsd
1 ONLINE ONLINE rac1 STABLE
ora.storage
1 ONLINE ONLINE rac1 STABLE
If you find that the ora.cluster_interconnect.haip resource is OFFLINE, you might need to verify the interconnect connectivity and check the network settings on the node. You can also try to start the offline resource manually using the following command:
$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init
Bring up the offline cssd daemon manually using the following command:
$GRID_HOME/bin/crsctl start res ora.cssd -init
The following output will be displayed on your screen:
CRS-2679: Attempting to clean 'ora.cssdmonitor' on 'rac1'
CRS-2681: Clean of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRSD startup issues – When CRSD-related startup and other issues are reported, the following guidelines help in troubleshooting the root cause of the problem:
Error: CRS-4535: Cannot communicate with Cluster Ready Services
Verify the CRSD process on the OS:
$ ps -ef | grep crsd.bin
Examine the crsd.log to look for any possible causes that prevent the CRSD from starting.
Ensure that the node can access the OCR files; run the ocrcheck command to verify. If the node can't access the OCR files, check the following:
Check the OCR disk permission and ownership.
If OCR is placed on the ASM diskgroup, ensure that the ASM instance is up and that the appropriate diskgroup is mounted.
Repair any OCR-related issue encountered, if needed.
Use the following command to ensure that the CRSD daemon process is ONLINE:
$ crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crsd
1 ONLINE OFFLINE rac1
You can also start the individual daemon manually using the following command:
$GRID_HOME/bin/crsctl start res ora.crsd -init
If the Grid Infrastructure malfunctions or its resources are reported as unhealthy, starting the stack in exclusive mode can help with maintenance and recovery:
Beginning with 11gR2 (11.2.0.2), the cluster stack can be started in exclusive mode to carry out certain exclusive cluster maintenance tasks, such as restoring the OCR and VDs, troubleshooting root.sh issues, and so on. To start the cluster in this mode on a particular node, the cluster stack must not be active on any other node in the cluster. When a cluster is started in exclusive mode, neither the VDs nor the networks are required. Use the following command as the root user to bring the cluster up in exclusive mode:
$ crsctl start crs -excl {-nocrs} {-nowait}
With the -nocrs argument, Oracle Clusterware is started without the CRSD process, and with the -nowait argument, the Clusterware start does not wait for the Oracle High Availability Services (ohasd) daemon.
In the event of OCR issues, such as logical corruption, missing permissions or ownership, integrity failures, or loss of a mirror copy, the following troubleshooting and workaround methods are extremely helpful for identifying the root cause and resolving the issue:
Verify the OCR file integrity using the following cluster utilities:
$ ocrcheck                           -- verifies OCR integrity and logical corruption
$ ocrcheck -config                   -- lists OCR disk locations and names
$ ocrcheck -local -config            -- lists OLR name and location
$ cluvfy comp ocr -n all -verbose    -- verifies integrity from all nodes
$ cluvfy comp ocr -n rac1 -verbose   -- verifies integrity on the local node
With the ocrdump utility, you can dump either the entire contents or just a section from the OCR file into a text file. The following commands achieve that:
$ ocrdump <filename.txt>   -- to obtain detailed output, run the command as the root user
With the preceding command issued, the OCR contents are dumped into a text file; if no output filename is given, a file named OCRDUMPFILE is generated in the current directory.
$ ocrdump -stdout -keyname SYSTEM.css {-xml}
The preceding command lists the css section–specific contents of the current OCR file; the contents are displayed at the prompt if the output is not diverted into a file.
$ ocrdump -backupfile <filename and location>   -- dumps the contents of a specific backup
Diagnosing, Debugging, and Tracing Clusterware and RAC Issues
When the debugging information generated by the Oracle Clusterware processes at their default trace-level settings doesn't provide enough clues to reach a conclusion about a problem, it becomes necessary to increase the trace levels of specific components and their subcomponents to obtain comprehensive information about the problem. The default tracing level of Clusterware components is 2, which is sufficient in most cases.
In the following sections, we will demonstrate how to modify, enable, and disable the debugging tracing levels of various cluster components and their subcomponents using the cluster commands.
To understand and list various cluster attributes and their default settings under a specific Clusterware component, use the following example command:
$ crsctl stat res ora.crsd -init -t -f
The output from the preceding example helps you to find the default settings for all arguments of a specific component, like stop/start dependencies, logging/trace level, auto-start, failure settings, time, and so on.
Debugging Clusterware Components and Resources
Oracle lets you dynamically modify and disable the default tracing levels of any of the cluster daemon (CRSD, CSSD, EVMD) processes and their subcomponents. The crsctl set {log|trace} command allows modification of the default debug setting dynamically. The trace levels range from 1 to 5, whereas the value 0 turns off the tracing option. Higher trace levels generate additional diagnostic information about the component.
The following example lists the default log settings for all modules of a component; the command must be executed as the root user to avoid an Insufficient User Privileges error:
$ crsctl get log {css|crs|evm} ALL
The following output fetches the default trace levels of various subcomponents of CSSD:
Get CSSD Module: BCCM Log Level: 2
Get CSSD Module: CLSF Log Level: 0
Get CSSD Module: CLSINET Log Level: 0
Get CSSD Module: CSSD Log Level: 2
Get CSSD Module: GIPCBCCM Log Level: 2
Get CSSD Module: GIPCCM Log Level: 2
Get CSSD Module: GIPCGM Log Level: 2
Get CSSD Module: GIPCNM Log Level: 2
Get CSSD Module: GPnP Log Level: 1
Get CSSD Module: OLR Log Level: 0
Get CSSD Module: SKGFD Log Level: 0
To list all modules, and the subcomponents of a given module, use the following examples as the root user:
$ crsctl lsmodules                 -- displays the list of modules
$ crsctl lsmodules {css|crs|evm}   -- displays the subcomponents of a module
To set a non-default tracing level, use the following syntax as the root user:
Syntax:
$ crsctl set log {module} "component_name=debug_level"
$ crsctl set log res "resourcename=debug_level"
Example:
$ crsctl set log crs crsmain=3
$ crsctl set log crs crsmain=3,crsevt=4   -- lets you set different log levels for multiple modules
$ crsctl set log crs all=5
$ crsctl set log res ora.rondb.db:5
If a node is being evicted due to mysterious network heartbeat (NHB) issues and the default information is not sufficient to diagnose the cause, you can increase the CSSD tracing to a higher level. For example, to troubleshoot NHB-related issues, set the log level to 4 as the root user, as shown in the following example:
$ crsctl set log css ocssd=4
The following examples disable the tracing:
$ crsctl set log crs crsmain=0
$ crsctl set log res ora.rondb.db:0
$ crsctl set log res ora.crsd:0 -init
The -init flag must be specified when modifying the debug mode of a key cluster daemon process. To list the current logging and tracing levels of a particular component and its subcomponents, use the following example:
$ crsctl stat res ora.crsd -init -f | grep LEVEL
Tracing levels also can be set by specifying the following environmental variables on the local node (however, you need to restart the cluster on the local node to enforce the logging/tracing changes):
$ export ORA_CRSDEBUG_ALL=1    -- sets debugging level 1 for all modules
$ export ORA_CRSDDEBUG_CRS=2   -- sets debugging level 2 for the CRS module
You should also be able to use the OS-specific tracing utility (gdb, pstack, truss, strace, and so on) to dump the debug information of an OS process. The following exercise demonstrates the procedure:
Identify the ID of the process for which you want to set OS-level tracing; for example:
$ ps -ef | grep orarootagent.bin
Attach the process with the OS-specific debug utility; for example, on the HP-UX platform:
$pstack /u00/app/12.1.0/grid/bin/orarootagent.bin 4558
You can then provide the information to Oracle Support or consult your OS admin team to help you identify any issues that were raised from the OS perspective.
The cluvfy (runcluvfy) utility can be used to perform pre- and post-component verification checks, covering OS, network, storage, overall system readiness, and Clusterware best practices. When the utility fails for no apparent reason, and the -verbose argument doesn't yield sufficient diagnostic information about the issue, enable debugging mode for the utility and re-execute the command to acquire adequate information about the problem. The following example demonstrates enabling debugging mode:
$ export SRVM_TRACE=true
Rerun the failed command after setting the preceding environment variable. A detailed output file will be generated under the $GRID_HOME/cv/log location, which can be used to diagnose the real cause. When debug settings are modified, the details are recorded in the OCR file, and the changes take effect on that node only.
In addition, when Java-based Oracle tools (such as srvctl, dbca, dbua, cluvfy, and netca) fail for unknown reasons, the preceding setting will also help to generate additional diagnostic information that can be used to troubleshoot the issues.
Example:
$ srvctl status database -d
Note When the basic information from the CRS logs doesn't provide sufficient detail to determine the root cause of a cluster or RAC database issue, setting different trace levels might produce useful additional information to resolve the problem. However, higher debug levels will have an impact on overall cluster performance and can generate a huge amount of information in the respective log files. It is therefore highly advisable to seek the advice of Oracle Support before changing the default settings of cluster components.
Grid Infrastructure Component Directory Structure
Each component of the Grid Infrastructure maintains a separate log file and records sufficient information under normal and critical circumstances. The information written in these log files assists in diagnosing and troubleshooting Clusterware components and cluster health-related problems. By exploring the appropriate information in these log files, the DBA can diagnose the root cause of frequent node evictions, fatal Clusterware problems, and Clusterware installation and upgrade difficulties. In this section, we explain some of the important CRS logs that can be examined when various Clusterware issues occur.
alert<HOSTNAME>.log: Similar to a typical database alert log file, Oracle Clusterware maintains an alert log file under the $GRID_HOME/log/<hostname> location and posts messages whenever important events take place, such as when a cluster daemon process starts, when a process aborts or fails to start a cluster resource, when node eviction occurs, or when a voting disk or OCR file becomes inaccessible on the node.
Whenever Clusterware confronts any serious issue, this should be the very first file to be examined by the DBA seeking additional information about the problem. The error message also points to a trace file location where more detailed information will be available to troubleshoot the issue.
Following are a few sample messages extracted from the alert log file, which explain the nature of the event, like node eviction, CSSD termination, and the inability to auto start the cluster:
[ohasd(10937)]CRS-1301:Oracle High Availability Service started on node rac1.
[/u00/app/12.1.0/grid/bin/oraagent.bin(11137)]CRS-5815:Agent '/u00/app/12.1.0/grid/bin/oraagent_oracle' could not find any base type
entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:1:2} in /u00/app/12.1.0/grid/log/rac1/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
[cssd(11168)]CRS-1713:CSSD daemon is started in exclusive mode
[cssd(11168)]CRS-1605:CSSD voting file is online: /dev/rdsk/oracle/vote/ln1/ora_vote_002; details in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log.
[cssd(11052)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log
[cssd(3586)]CRS-1608:This node was evicted by node 1, rac1; details at (:CSSNM00005:) in /u00/app/12.1.0/grid/log/rac2/cssd/ocssd.log.
ocssd.log: The Cluster Synchronization Service daemon (CSSD) is undoubtedly one of the most critical components of Clusterware; its primary functions include node monitoring, group services management, lock services, and cluster heartbeats. The process maintains a log file named ocssd.log under the $GRID_HOME/log/<hostname>/cssd location and writes all important event messages to it. This is one of the busiest CRS log files and is continuously written to; when the debug level of the process is set higher, the file collects more detailed information about the underlying issue. Before a node eviction happens, warning messages are written to this file. In situations such as node eviction, VD access issues, or inability of the Clusterware to start up on the local node, it is strongly recommended that you examine this file to find out the reasons.
Following are a few sample entries of the log file:
2012-11-30 10:10:49.989: [CSSD][21]clssnmvDiskKillCheck: not evicted, file /dev/rdsk/c0t5d4 flags 0x00000000, kill block unique 0, my unique 1351280164
ocssd.l04:2012-10-26 22:17:26.750: [CSSD][6]clssnmvDiskVerify: discovered a potential voting file
ocssd.l04:2012-10-26 22:36:12.436: [CSSD][1]clssnmvDiskAvailabilityChange: voting file /dev/rdsk/c0t5d4 now online
ocssd.l04:2012-10-26 22:36:10.440: [CSSD][1]clssnmReadDiscoveryProfile: voting file discovery string(/dev/rdsk/c0t5d5,/dev/rdsk/c0t5d4)
2012-12-01 09:54:10.091: [CSSD][30]clssnmSendingThread: sending status msg to all nodes
ocssd.l01:2012-12-01 10:24:57.116: [CSSD][1]clssnmInitNodeDB: Initializing with OCR id 1484043234
Oracle certainly doesn’t recommend removing the log file manually for any reason, as it is governed by Oracle automatically. Upon reaching a size of 50 MB, the file will be automatically archived as cssd.l01 in the same location as part of the predefined rotation policy, and a fresh log file (cssd.log) will be generated. There will be ten archived copies kept for future reference in the same directory as part of the built-in log rotation policy and ten-times-ten file retention formula.
When the log file is removed before it reaches 50 MB, unlike the database alter.log file, Clusterware will not generate a new log file instantly until the removed file reaches 50 MB. This is because despite the removal of the file from the OS, the CSS process will be still writing the messages to the file until the file becomes a candidate for the rotation policy. When the removed file reaches a size of 50 MB, a new log file will appear or be generated and will be available in this way. However, the previous messages won’t be able to recall in this context.
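The rotation naming scheme described above can be illustrated with a toy script. This is only a simulation of the naming behavior (the current log is archived as ocssd.l01, older archives shift down, and at most ten archives are retained), not Oracle's actual implementation:

```shell
dir=$(mktemp -d)
touch "$dir/ocssd.log"

rotate() {
    # Shift ocssd.l09 -> ocssd.l10, ..., ocssd.l01 -> ocssd.l02,
    # implicitly dropping anything older than the tenth archive.
    i=10
    while [ "$i" -gt 1 ]; do
        prev=$(printf 'ocssd.l%02d' $((i - 1)))
        next=$(printf 'ocssd.l%02d' "$i")
        [ -f "$dir/$prev" ] && mv "$dir/$prev" "$dir/$next"
        i=$((i - 1))
    done
    mv "$dir/ocssd.log" "$dir/ocssd.l01"   # current log becomes the newest archive
    touch "$dir/ocssd.log"                 # a fresh log is opened
}

rotate
rotate
files=$(echo $(ls "$dir" | sort))
echo "$files"
rm -rf "$dir"
```

After two rotations the directory holds the fresh log plus two archives.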
When the CSSD fails to start up, or is reported as unhealthy, refer to this file to ascertain the root cause of the problem.
crsd.log: CRSD is another critical component of Clusterware; its primary functions include resource monitoring, resource failover, and managing the OCR. The process maintains a log file named crsd.log under the $GRID_HOME/log/<hostname>/crsd location and writes all important event messages to it. Whenever a cluster or non-cluster resource stops or starts, a failover action is performed, or any resource-related warning or communication error occurs, the relevant information is written to the file. When facing issues such as resources failing to start, the DBA can examine this file for relevant information that could assist in resolving the issue.
Deleting the log file manually is not recommended, as it is managed and archived by Oracle automatically. On reaching a size of 10 MB, the file is archived as crsd.l01 in the same location, and a fresh log file (crsd.log) is generated. Ten archived copies are kept in the same directory for future reference.
When the CRSD fails to start up, or is unhealthy, refer to this file to find out the root cause of the problem.
ohasd.log: The Oracle High Availability Services daemon (OHASD), first introduced with 11gR2, manages and controls the rest of the cluster stack. Its primary responsibilities include managing the OLR; starting, stopping, and verifying cluster health status on the local and remote nodes; and supporting cluster-wide commands. The process maintains a log file named ohasd.log under the $GRID_HOME/log/<hostname>/ohasd location and writes all important event messages to it. Examine the file when you face issues running the root.sh script, when the ohasd process fails to start up, or in case of OLR corruption.
Oracle doesn't encourage deleting the log file for any reason, as it is managed by Oracle automatically. On reaching a size of 10 MB, the file is archived as ohasd.l01 in the same location, and a fresh log file (ohasd.log) is generated. As with crsd.log and ocssd.log, ten archived copies are kept in the same directory for future reference. The following are sample entries from the ohasd.log file:
2013-04-17 11:32:47.096: [ default][1] OHASD Daemon Starting. Command string :reboot
2013-04-17 11:32:47.125: [ default][1] Initializing OLR
2013-04-17 11:32:47.255: [ OCRRAW][1]proprioo: for disk 0 (/u00/app/12.1.0/grid_1/cdata/rac2.olr), id match (1), total id sets,
Upon successful execution of the ocrdump, ocrconfig, olsnodes, oifcfg, and ocrcheck commands, a log file is generated under the $GRID_HOME/log/<hostname>/client location. For details relevant to the EVM daemon (EVMD) process, look at the evmd.log file under the $GRID_HOME/log/<hostname>/evmd location. Cluster Health Monitor (CHM) and logger service logs are maintained under the $GRID_HOME/log/<hostname>/crfmond and crflogd directories.
Figure 2-4 depicts the hierarchy of the Clusterware component directory structure.
Figure 2-4. Unified Clusterware log directory hierarchy
Operating system (OS) logs: Referring to the OS-specific log file will be hugely helpful in identifying Clusterware startup and shutdown issues. Different platforms maintain logs at different locations, as shown in the following example:
HPUX - /var/adm/syslog/syslog.log
AIX - /bin/errpt -a
Linux - /var/log/messages
Windows - Refer .TXT log files under Application/System log using Windows Event Viewer
Solaris - /var/adm/messages
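The platform-to-log mapping above can be wrapped in a small helper. This is a hedged sketch; path conventions can vary between releases of each OS, so treat the mapping as a starting point:

```shell
# Map a `uname -s` style platform name to its default system log source.
os_log_source() {
    case "$1" in
        Linux)  echo /var/log/messages ;;
        HP-UX)  echo /var/adm/syslog/syslog.log ;;
        SunOS)  echo /var/adm/messages ;;
        AIX)    echo 'errpt -a' ;;   # AIX uses the error-report command, not a flat file
        *)      echo unknown ;;
    esac
}

os_log_source Linux
os_log_source "$(uname -s)"
```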
Caution Oracle Clusterware generates and maintains 0-sized socket files in the hidden '.oracle' directory under /etc or /var/tmp (depending on the platform). Removing these files as part of regular log cleanup, or unintentionally, might lead to a cluster hang situation.
Note It is mandatory to maintain sufficient free space in the filesystem on which the Grid and RDBMS software are installed to prevent Clusterware issues; in addition, Oracle suggests not removing the logs manually.
Oracle Clusterware Troubleshooting - Tools and Utilities
Managing and troubleshooting various issues related to Clusterware and its components are two of the key responsibilities of any Oracle DBA. Oracle provides a variety of tools and utilities in this context that the DBA can use to monitor Clusterware health and to diagnose and troubleshoot serious Clusterware issues. Some of the key tools and utilities are as follows: CHM, diagcollection.sh, ProcWatcher, RACcheck, oratop, OSWatcher Black Box Analyzer (OSWbba), Lite Onboard Monitor (LTOM), and Hang File Generator (HANGFG).
In the following sections, we will cover some of the uses of these very important tools and describe their advantages.
Starting with 11.2.0.3, the cluster verification utility (cluvfy) is capable of carrying out post-installation health checks for the Clusterware and the database. With the new -healthcheck argument, the best practices, mandatory requirements, deviations, and proper functionality of cluster and database components can be verified.
The following example collects detailed information about best-practice recommendations for Clusterware in an HTML file named cvucheckreport_<timestamp>.htm:
$ ./cluvfy comp healthcheck -collect cluster -bestpractice -html
When no further arguments are attached to the -healthcheck parameter, both the Clusterware and database checks are carried out. In the following example, because no -html argument is specified, the output is stored in a text file:
$ ./cluvfy comp healthcheck
The cluvfy utility supports the following arguments:
-collect cluster|database
-bestpractices|-mandatory|-deviations
-save -savedir   -- to save the output under a particular location
-html            -- output will be written to an HTML file
Real-Time RAC Database Monitoring - oratop
The oratop utility, currently restricted to the Linux operating system, resembles the OS-specific top utility, providing near–real-time resource monitoring for RAC and single-instance databases from 11.2.0.3 onward. It is a very lightweight monitoring utility that consumes minimal resources on the server (about 0.20% memory and less than 1% CPU). With this utility, you can monitor a RAC database or a stand-alone database, local as well as remote.
Download the oratop.zip file from My Oracle Support (MOS) at https://support.oracle.com/epmos/faces/MosIndex.jspx?_afrLoop=463945118311568&_afrWindowMode=0&_adf.ctrl-state=19ctrm4ozz_4. Unzip the file and set the appropriate permissions on the oratop file (chmod 755 oratop on the Linux platform). Ensure that the database initialization parameter timed_statistics is set to TRUE and that statistics_level is set to TYPICAL. Also, the following environment settings need to be made on the local node before invoking the utility:
$ export ORACLE_UNQNAME=<dbname>
$ export ORACLE_SID=<instance_name1>
$ export ORACLE_HOME=<db_home>
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH
The following example runs the utility with the refresh interval set to ten seconds (the default is three seconds):
$ oratop -i 10
$ oratop -t <tns_name_for_remote_db> -- to monitor a remote database
Input the database user name and password when prompted. When no credentials are entered, the utility uses the default user SYSTEM with MANAGER as the default password to connect to the database. If you are using a non-SYSTEM database user, ensure that the user has read permission on the relevant dictionary dynamic views, such as v_$SESSION, v_$SYSMETRIC, v_$INSTANCE, v_$PROCESS, v_$SYSTEM_EVENT, and so on.
Figure 2-5 shows the output window of the oratop utility.
Figure 2-5. oratop output screen shot
The granular statistics that appear in the window help to identify database performance contention and bottlenecks. The live window is organized into three major sections: 1) top five events (similar to the AWR/ASH report), 2) top Oracle sessions on the server in terms of high I/O and memory, and 3) DB load (which also provides blocking-session details, etc.). Press q or Q to quit the utility and press Ctrl+C to abort.
Note The tool is available for downloading only through MOS, which requires additional support licensing.
RAC Configuration Audit Tool - RACcheck
RACcheck is a tool that performs audits on various important configuration settings and provides a comprehensive HTML-based assessment report on the overall health check status of the RAC environment.
The tool is currently certified on the majority of operating systems; it can be used in interactive and non-interactive modes and supports multiple databases in a single run. It can be run across all nodes, on a subset of cluster nodes, or on the local node. When the tool is invoked, it carries out health checks on various components, such as cluster-wide settings, CRS, Grid, RDBMS, ASM, general database initialization parameters, OS kernel settings, and OS packages. The most suitable times for performing health checks with this tool are immediately after deploying a new RAC environment, before and after planned system maintenance, prior to major upgrades, and quarterly.
With its Upgrade Readiness Assessment module, the tool simplifies upgrade preparation and makes it more reliable. Apart from regular upgrade prerequisite verifications, the module lets you perform automatic prerequisite verification checks for patches, best practices, and configuration. This will be of great assistance before planning any major cluster upgrade.
Invoke the RACcheck Tool
Download the raccheck.zip file from MOS, unzip it, and set the appropriate permission on the raccheck file (chmod 755 raccheck on Unix platforms). To invoke the tool in interactive mode, run the following command as the Oracle software owner and provide the input when prompted:
$./raccheck
To perform RAC upgrade readiness verification checks, use the following example and respond to the prompts:
$./raccheck –u –o pre
The tool supports the following arguments:
$ ./raccheck -h
Usage : ./raccheck [-abvhpfmsuSo:c:rt:]
        -a      All (perform best practice check and recommended patch check)
        -b      Best practice check only; no recommended patch check
        -h      Show usage
        -v      Show version
        -p      Patch check only
        -m      Exclude checks for Maximum Availability Architecture
        -u      Run raccheck to check pre-upgrade or post-upgrade best practices;
                -o pre or -o post is mandatory with the -u option, e.g., ./raccheck -u -o pre
        -f      Run offline; checks will be performed on data already collected
        -o      Argument to an option; if -o is followed by v, V, Verbose, or VERBOSE,
                checks that pass are printed on the screen; if -o is not specified,
                only failures are printed on the screen, e.g., raccheck -a -o v
        -r      Include High Availability best practices in the regular health check,
                e.g., ./raccheck -r (not applicable for exachk)
        -c      Pass a specific module or component to check best practices for
The assessment report provides a better picture of the RAC environment and includes an overall system health check rating (out of 100), Oracle Maximum Availability Architecture (MAA) best practices, bug fixes, and patch recommendations.
Note The tool is available for download only through MOS, which requires additional support licensing. Executing the tool when the systems are heavily loaded is not recommended. Because the tool doesn't ship with the Oracle software by default, it is also recommended to test it in a non-production environment first.
Cluster Diagnostic Collection Tool - diagcollection.sh
Whenever you run into serious Clusterware issues, such as node eviction, you typically look at various CRS-level and OS-level logs to gather the information required to comprehend the root cause of the problem. Because Clusterware maintains a huge number of log and trace files, it can be cumbersome to review many logs from each cluster node. The diagcollection.sh tool, located under GRID_HOME/bin, is capable of gathering the required diagnostic information from various important sources, such as CRS logs, trace and core files, OCR data, and OS logs.
With the diagnostic collection tool, you have the flexibility to collect diagnostic information at different levels, such as cluster, Oracle RDBMS home, Oracle base, and core analysis. The information gathered from the various sources is packaged into a few zip files, which you can then upload to Oracle Support for further analysis to resolve the problem.
The following example will collect the $GRID_HOME diagnostic information:
./diagcollection.sh --collect --crs $GRID_HOME
The following CRS diagnostic archives will be created in the local directory:
crsData_usdbt43_20121204_1103.tar.gz -> logs, traces, and cores from CRS home.
Note Core files will be packaged only with the --core option.
ocrData_usdbt43_20121204_1103.tar.gz -> ocrdump, ocrcheck etc
coreData_usdbt43_20121204_1103.tar.gz -> contents of CRS core files in text format
osData_usdbt43_20121204_1103.tar.gz -> logs from operating system
Collecting crs data
log/usdbt43/cssd/ocssd.log: file changed size
Collecting OCR data
Collecting information from core files
Collecting OS logs
After data collection is complete, the following files will be created in the local directory:
crsData_$hostname_20121204_1103.tar.gz
ocrData_$hostname_20121204_1103.tar.gz
coreData_$hostname_20121204_1103.tar.gz
osData_$hostname_20121204_1103.tar.gz
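Because the script drops its archives into the current working directory, a small wrapper that creates a per-incident directory first keeps collections from different runs apart. This is only a sketch with an illustrative directory name; on a real cluster node, diagcollection.sh itself must still be run as the root user from $GRID_HOME/bin:

```shell
# Create a per-run collection directory and work from there, so archives
# from different incidents never mix (directory name is just a convention).
RUNDIR=/tmp/diagcol_$(date +%Y%m%d_%H%M%S)
mkdir -p "$RUNDIR"
cd "$RUNDIR"
echo "collecting into $RUNDIR"

# On a real node, as root, you would now run:
# $GRID_HOME/bin/diagcollection.sh --collect --crs $GRID_HOME
```

A dated directory name also makes it obvious which archives belong to which incident when you later upload them to Oracle Support.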
The following example will assist you in getting the supported parameters list that can be used with the tool (output is trimmed):
./diagcollection.sh -help
--collect
    [--crs]              For collecting crs diag information
    [--adr]              For collecting diag information for ADR; specify ADR location
    [--chmos]            For collecting Cluster Health Monitor (OS) data
    [--all]              Default. For collecting all diag information
    [--core]             Unix only. Package core files with CRS data
    [--afterdate]        Unix only. Collects archives from the specified date
    [--aftertime]        Supported with -adr option. Collects archives after the specified time
    [--beforetime]       Supported with -adr option. Collects archives before the specified time
    [--crshome]          Argument that specifies the CRS Home location
    [--incidenttime]     Collects Cluster Health Monitor (OS) data from the specified time
    [--incidentduration] Collects Cluster Health Monitor (OS) data for the specified duration
NOTE:
1. You can also do the following
    ./diagcollection.pl --collect --crs --crshome <CRS Home>
--clean         Cleans up the diagnosability information gathered by this script
--coreanalyze   Unix only. Extracts information from core files and stores it in a text file
Use the --clean argument with the script to clean up previously generated files.
Note Ensure that enough free space is available at the location where the files are being generated. Furthermore, depending upon the level used to collect the information, the script might take a considerable amount of time to complete the job; hence, keep an eye on resource consumption on the node. The tool must be executed as the root user.
Cluster Health Monitor (CHM)
The Oracle CHM tool is designed to detect and analyze OS and cluster resource-related degradation and failures. Formerly known as Instantaneous Problem Detector for Clusters, or IPD/OS, this tool tracks OS resource consumption on each RAC node at the process and device level and also collects and analyzes the cluster-wide data. The tool stores real-time operating metrics in the CHM repository and reports an alert when certain metrics pass their resource utilization thresholds. It can also be used to replay historical data to trace back what was happening at the time of a failure, which can be very useful for root cause analysis of many issues that occur in the cluster, such as node eviction.
For Oracle Clusterware 10.2 to 11.2.0.1, the CHM/OS tool is a standalone tool that you need to download and install separately. Starting with Oracle Grid Infrastructure 11.2.0.2, the CHM/OS tool is fully integrated with the Oracle Grid Infrastructure. In this section we focus on this integrated version of the CHM/OS.
The CHM tool is installed to the Oracle Grid Infrastructure home and is activated by default in Grid Infrastructure 11.2.0.2 and later for Linux and Solaris and 11.2.0.3 and later for AIX and Windows. CHM consists of two services: osysmond and ologgerd. osysmond runs on every node of the cluster to monitor and collect the OS metrics and send the data to the cluster logger services. ologgerd receives the information from all the nodes and stores the information in the CHM Repository. ologgerd runs in one node as the master service and in another node as a standby if the cluster has more than one node. If the master cluster logger service fails, the standby takes over as the master service and selects a new node for standby. The following example shows the two processes, osysmond.bin and ologgerd:
$ ps -ef | grep -E 'osysmond|ologgerd' | grep -v grep
root 3595 1 0 Nov14 ? 01:40:51 /u01/app/11.2.0/grid/bin/ologgerd -m k2r720n1 -r -d /u01/app/11.2.0/grid/crf/db/k2r720n2
root 6192 1 3 Nov08 ? 1-20:17:45 /u01/app/11.2.0/grid/bin/osysmond.bin
The preceding ologgerd daemon uses '-d /u01/app/11.2.0/grid/crf/db/k2r720n2', which is the directory where the CHM repository resides. The CHM repository is a Berkeley DB-based database stored as *.bdb files in the directory. This directory requires 1GB of disk space per node in the cluster.
$ pwd
/u01/app/11.2.0/grid/crf/db/k2r720n2
$ ls *.bdb
crfalert.bdb crfclust.bdb crfconn.bdb crfcpu.bdb crfhosts.bdb crfloclts.bdb crfts.bdb repdhosts.bdb
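Since the repository is simply a set of Berkeley DB files, a quick way to watch its disk footprint is to total the *.bdb file sizes. The sketch below creates a throwaway directory with dummy .bdb files purely for illustration; on a real node you would point REPO at $GRID_HOME/crf/db/<hostname> and compare the result against the 1GB-per-node guideline:

```shell
# Simulated CHM repository directory (illustration only); a real check
# would use REPO=$GRID_HOME/crf/db/$(hostname).
REPO=$(mktemp -d)
dd if=/dev/zero of="$REPO/crfclust.bdb" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$REPO/crfcpu.bdb"   bs=1024 count=32 2>/dev/null

# Sum the sizes of all *.bdb files, in kilobytes.
total_kb=$(du -ck "$REPO"/*.bdb | awk '/total/ {print $1}')
echo "CHM repository size: ${total_kb} KB"
rm -rf "$REPO"
```

Run periodically from cron, such a check warns you before the repository starves the Grid home of disk space.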
Oracle Clusterware 12cR1 has enhanced the CHM by providing a highly available server monitor service and also support for the Flex Cluster architecture. The CHM in Oracle Clusterware 12cR1 consists of three components:
The System Monitor Service process (osysmon) runs on every node of the cluster. The System Monitor Service monitors OS and cluster resource-related degradation and failure, collects the real-time OS metric data, and sends the data to the cluster logger service.
Instead of running on every cluster node as in Oracle Clusterware 11gR2, there is only one cluster logger service per every 32 nodes in Oracle Clusterware 12cR1. For high availability, this service is restarted on another node if it fails.
On the node that runs both osysmon and ologgerd:
[grid@knewracn1]$ ps -ef | grep -E 'osysmond|ologgerd' | grep -v grep
root 4408 1 3 Feb19 ? 08:40:32 /u01/app/12.1.0/grid/bin/osysmond.bin
root 4506 1 1 Feb19 ? 02:43:25 /u01/app/12.1.0/grid/bin/ologgerd -M -d /u01/app/12.1.0/grid/crf/db/knewracn1
On other nodes that run only osysmon:
[grid@knewracn2 product]$ ps -ef | grep -E 'osysmond|ologgerd' | grep -v grep
root 7995 1 1 Feb19 ? 03:26:27 /u01/app/12.1.0/grid/bin/osysmond.bin
In Oracle Clusterware 12cR1, all the metrics data that the cluster logger service receives are stored in the central Oracle Grid Infrastructure Management Repository (the CHM repository), which is a new feature in 12c Clusterware. The repository is configured during the installation or upgrade to Oracle Clusterware by selecting the “Configure Grid Infrastructure Management Repository” option in Oracle Universal Installer (OUI), as shown in Figure 2-6.
Figure 2-6. Configure Grid Infrastructure Management Repository in OUI
This repository is an Oracle database. Only one node runs this repository in a cluster. If the cluster is a Flex Cluster, this node must be a hub node. Chapter 4 will discuss the architecture of Oracle Flex Clusters and different types of cluster nodes in a Flex Cluster.
To reduce the private network traffic, the repository database (MGMTDB) and the cluster logger service process can be configured to run on the same node, as shown here:
$ ps -ef | grep -v grep | grep pmon | grep MGMTDB
grid 31832 1 0 Feb20 ? 00:04:06 mdb_pmon_-MGMTDB
$ ps -ef | grep -v grep | grep 'osysmon'
root 2434 1 1 Feb 20 ? 00:04:49 /u01/app/12.1.0/grid/bin/osysmond.bin
This repository database runs under the owner of the Grid Infrastructure, which is the “grid” user in this example. The database files of the CHM repository database are located in the same diskgroup as the OCR and VD. To accommodate the Grid Infrastructure repository, the size requirement of this diskgroup is therefore larger than the size required for the OCR and VD alone. The actual size and the retention policy can be managed with the oclumon tool, which provides a command interface to query the CHM repository and perform various administrative tasks on it.
For example, we can get the repository information such as size, repository path, the node for the cluster logger service, and all the nodes that the statistics are collected from using a command like this:
$ oclumon manage -get repsize reppath alllogger -details
CHM Repository Path = +DATA1/_MGMTDB/DATAFILE/sysmgmtdata.260.807876429
CHM Repository Size = 38940
Logger = knewracn1
Nodes = knewracn1,knewracn2,knewracn4,knewracn7,knewracn5,knewracn8,knewracn6
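If you want to feed these values into a monitoring script, the output is easy to parse. The sketch below works on a captured copy of the output shown above, fed from a here-document since oclumon itself is only available on a cluster node:

```shell
# Sample 'oclumon manage -get repsize reppath alllogger -details' output,
# captured to a file; on a real node you would run the command directly.
cat > /tmp/oclumon.out <<'EOF'
CHM Repository Path = +DATA1/_MGMTDB/DATAFILE/sysmgmtdata.260.807876429
CHM Repository Size = 38940
Logger = knewracn1
Nodes = knewracn1,knewracn2,knewracn4,knewracn7,knewracn5,knewracn8,knewracn6
EOF

# Split each "key = value" line on ' = ' and pick out the fields we need.
repsize=$(awk -F' = ' '/Repository Size/ {print $2}' /tmp/oclumon.out)
logger=$(awk -F' = '  '/^Logger/ {print $2}'         /tmp/oclumon.out)
echo "size=${repsize} logger=${logger}"
```

The same pattern extends to the Nodes line if you need to confirm that every cluster member is reporting in.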
The CHM admin directory $GRID_HOME/crf/admin contains crf<hostname>.ora, which records the information about the CHM repository:
cat /u01/app/12.1.0/grid/crf/admin/crfknewracn1.ora
BDBLOC=default
PINNEDPROCS=osysmond.bin,ologgerd,ocssd.bin,cssdmonitor,cssdagent,mdb_pmon_-MGMTDB,kswapd0
MASTER=knewracn1
MYNAME=knewracn1
CLUSTERNAME=knewrac
USERNAME=grid
CRFHOME=/u01/app/12.1.0/grid
knewracn1 5=127.0.0.1 0
knewracn1 1=127.0.0.1 0
knewracn1 0=192.168.9.41 61020
MASTERPUB=172.16.9.41
DEAD=
knewracn1 2=192.168.9.41 61021
knewracn2 5=127.0.0.1 0
knewracn2 1=127.0.0.1 0
knewracn2 0=192.168.9.42 61020
ACTIVE=knewracn1,knewracn2,knewracn4
HOSTS=knewracn1,knewracn2,knewracn4
knewracn5 5=127.0.0.1 0
knewracn5 1=127.0.0.1 0
knewracn4 5=127.0.0.1 0
knewracn4 1=127.0.0.1 0
knewracn4 0=192.168.9.44 61020
knewracn8 5=127.0.0.1 0
knewracn8 1=127.0.0.1 0
knewracn7 5=127.0.0.1 0
knewracn7 1=127.0.0.1 0
knewracn6 5=127.0.0.1 0
knewracn6 1=127.0.0.1 0
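The same file is easy to query from a script when you need the current master and the active node list, for example before deciding where to run diagcollection.pl. A minimal sketch against a sample copy of the file (the real file lives under $GRID_HOME/crf/admin):

```shell
# Sample crf<hostname>.ora fragment (illustration only).
cat > /tmp/crf.ora <<'EOF'
MASTER=knewracn1
MYNAME=knewracn1
CLUSTERNAME=knewrac
ACTIVE=knewracn1,knewracn2,knewracn4
HOSTS=knewracn1,knewracn2,knewracn4
EOF

# Split on '=' and match the exact key, so MASTER is not confused
# with MASTERPUB.
master=$(awk -F= '$1=="MASTER" {print $2}' /tmp/crf.ora)
active=$(awk -F= '$1=="ACTIVE" {print $2}' /tmp/crf.ora)
echo "master=${master}"
echo "active nodes: ${active}"
```

Note the exact-key match: a plain grep for MASTER would also hit the MASTERPUB line present in the real file.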
You can collect CHM data on any node by running the diagcollection.pl utility on that node as the root user. The steps are as follows:
First, find the cluster node where the cluster logger service is running:
$ /u01/app/12.1.0/grid/bin/oclumon manage -get master
Master = knewracn1
Log in to the cluster node that runs the cluster logger service as a privileged user (in other words, the root user) and run the diagcollection.pl utility. This utility collects all the available data stored in the CHM repository. You can also specify a specific time and duration for the data collection:
[root@knewracn1 ∼]# /u01/app/12.1.0/grid/bin/diagcollection.pl -collect -crshome /u01/app/12.1.0/grid
Production Copyright 2004, 2010, Oracle. All rights reserved
CRS diagnostic collection tool
The following CRS diagnostic archives will be created in the local directory.
crsData_knewracn1_20130302_0719.tar.gz -> logs,traces and cores from CRS home. Note: core files will be packaged only with the --core option.
ocrData_knewracn1_20130302_0719.tar.gz -> ocrdump, ocrcheck etc
coreData_knewracn1_20130302_0719.tar.gz -> contents of CRS core files in text format
osData_knewracn1_20130302_0719.tar.gz -> logs from operating system
Collecting crs data
/bin/tar: log/knewracn1/cssd/ocssd.log: file changed as we read it
Collecting OCR data
Collecting information from core files
No corefiles found
The following diagnostic archives will be created in the local directory.
acfsData_knewracn1_20130302_0719.tar.gz -> logs from acfs log.
Collecting acfs data
Collecting OS logs
Collecting sysconfig data
The utility creates several .tar.gz archives, such as chmosData_<host>_<timestamp>.tar.gz and osData_<host>_<timestamp>.tar.gz, in the current working directory:
[root@knewracn1 ∼]# ls -l *.gz
-rw-r--r--. 1 root root 1481 Mar 2 07:24 acfsData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root 58813132 Mar 2 07:23 crsData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root 54580 Mar 2 07:24 ocrData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root 18467 Mar 2 07:24 osData_knewracn1_20130302_0719.tar.gz
These .gz files include various log files that can be used for the diagnosis of your cluster issues.
You can also use the OCLUMON command-line tool to query the CHM repository and display node-specific metrics for a specified time period. You can likewise print the durations and states for a resource on a node during a specified time period. The states are based on predefined thresholds for each resource metric and are denoted as red, orange, yellow, and green, in decreasing order of criticality. The OCLUMON command syntax is as follows:
$oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] |
[-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]
-s indicates the start timestamp and -e indicates the end timestamp
For example, we can run the command like this to write the report into a text file:
$GRID_HOME/bin/oclumon dumpnodeview -allnodes -v -s "2013-03-02 06:20:00" -e "2013-03-02 06:30:00" > /home/grid/chm.txt
A segment of /home/grid/chm.txt looks like this:
$less /home/grid/chm.txt
----------------------------------------
Node: knewracn1 Clock: '13-03-02 06.20.04' SerialNo:178224
----------------------------------------
SYSTEM:
#pcpus: 1 #vcpus: 2 cpuht: Y chipname: Intel(R) cpu: 7.97 cpuq: 2 physmemfree: 441396 physmemtotal: 5019920 mcache: 2405048 swapfree: 11625764 swaptotal: 12583912 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 93 iow: 242 ios: 39 swpin: 0 swpout: 0 pgin: 90 pgout: 180 netr: 179.471 netw: 124.380 procs: 305 rtprocs: 16 #fds: 26144 #sysfdlimit: 6815744 #disks: 5 #nics: 4 nicErrors: 0
TOP CONSUMERS:
topcpu: 'gipcd.bin(4205) 5.79' topprivmem: 'ovmd(719) 214072' topshm: 'ora_ppa7_knewdb(27372) 841520' topfd: 'ovmd(719) 1023' topthread: 'crsd.bin(4415) 48'
CPUS:
cpu0: sys-4.94 user-3.10 nice-0.0 usage-8.5 iowait-10.93
cpu1: sys-5.14 user-2.74 nice-0.0 usage-7.88 iowait-4.68
PROCESSES:
name: 'ora_smco_knewdb' pid: 27360 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2092 shm: 17836 #fd: 26 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_gtx0_knewdb' pid: 27366 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2028 shm: 17088 #fd: 26 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_rcbg_knewdb' pid: 27368 #procfdlimit: 65536 cpuusage: 0.00 privmem: ......
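Because the dumpnodeview report is plain text, pulling a single metric out of every node snapshot is straightforward. The sketch below extracts the free physical memory value per node from a heavily trimmed two-snapshot sample standing in for a real chm.txt:

```shell
# Two trimmed node snapshots from a dumpnodeview report (illustration only).
cat > /tmp/chm.txt <<'EOF'
Node: knewracn1 Clock: '13-03-02 06.20.04' SerialNo:178224
SYSTEM:
#pcpus: 1 #vcpus: 2 cpu: 7.97 physmemfree: 441396 physmemtotal: 5019920
Node: knewracn2 Clock: '13-03-02 06.20.05' SerialNo:178225
SYSTEM:
#pcpus: 1 #vcpus: 2 cpu: 3.10 physmemfree: 501200 physmemtotal: 5019920
EOF

# Remember the node name from each 'Node:' header, then print it next to
# the value following the 'physmemfree:' token on the SYSTEM line.
awk '/^Node:/ {node=$2}
     /physmemfree/ {for (i=1;i<=NF;i++) if ($i=="physmemfree:") print node, $(i+1)}' /tmp/chm.txt
```

The token-scanning loop is deliberate: the SYSTEM line is a single long run of key/value pairs, so fixed field positions cannot be relied upon.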
RAC Database Hang Analysis
In this section, we will explore the conceptual basis for invoking and interpreting a hang analysis dump to diagnose a potential RAC database hang, slowdown, or blocking situation. When a database is running unacceptably slowly, is hung because of an internal deadlock or latch contention, or when a prolonged deadlock or blocking situation hurts overall database performance, it is advisable to perform a hang analysis, which helps greatly in identifying the root cause of the problem. The following set of examples explains how to invoke and use the hang analysis:
$ sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all -- enables cluster-wide hang analysis
SQL> oradebug -g all hanganalyze 3 -- level 3 is the most commonly used
<< wait for a couple of minutes >>
SQL> oradebug -g all hanganalyze 3
The hang analysis level can be set to a value between 1 and 5, or to 10. When hanganalyze is invoked, the diagnostic information is written to a dump file under $ORACLE_BASE/diag/rdbms/dbname/instance_name/trace, which can be used to troubleshoot the problem.
We have built the following test case to develop a blocking scenario in a RAC database to demonstrate the procedure practically. We will then interpret the trace file to understand the contents to troubleshoot the issue. The following steps were performed as part of the test scenario:
Create an EMP table:
SQL> create table emp (eno number(3),deptno number(2), sal number(9));
Load a few records in the table.
From instance 1, execute an update statement:
SQL> update emp set sal=sal+100 where eno=101; -- no commit performed
From instance 2, execute an update statement for the same record to develop a blocking scenario:
SQL> update emp set sal=sal+200 where eno=101;
At this point, the session on instance 2 is hanging and the cursor doesn’t return to the SQL prompt, as expected.
Now, from another session, run the hang analysis as follows:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug setinst all
Statement processed.
SQL> oradebug -g all hanganalyze 3 << level 3 is most suitable in many circumstances >>
Hang Analysis in /u00/app/oracle/diag/rdbms/rondb/RONDB1/trace/RONDB1_diag_6534.trc
Let’s walk through the contents of the trace file and identify the holder and waiter details in context. Here is an excerpt from the trace file:
Node id: 1
List of nodes: 0, 1, << nodes (instance) count >>
*** 2012-12-16 17:19:18.630
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): rondb.rondb2, rondb.rondb1
oradebug_node_dump_level: 3 << hang analysis level >>
analysis initiated by oradebug
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 17:19:17 ]
NOTE: scheduling delay has not been sampled for 0.977894 secs
0.000000 secs from [ 17:19:14 - 17:19:18 ], 5 sec avg
0.000323 secs from [ 17:18:18 - 17:19:18 ], 1 min avg
0.000496 secs from [ 17:14:19 - 17:19:18 ], 5 min avg
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by: << waiter >>
{
instance: 2 (rondb.rondb2)
os id: 12250
process id: 40, oracle@hostname (TNS V1-V3)
session id: 103
session serial #: 1243
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x20001b
p3: 'sequence'=0x101fc
time in wait: 21.489450 sec
timeout after: never
wait id: 33
blocking: 0 sessions
current sql: update emp set sal=sal+100 where eno=1
and is blocked by
=> Oracle session identified by: << holder >>
{
instance: 1 (imcruat.imcruat1)
os id: 8047
process id: 42, oracle@usdbt42 (TNS V1-V3)
session id: 14
session serial #: 125
}
which is waiting for 'SQL*Net message from client' with wait info:
{
p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
time in wait: 27.311965 sec
timeout after: never
wait id: 131
blocking: 1 session
*** 2012-12-16 17:19:18.725
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[102]/2/103/1243/c0000000e4ae9518/12250/NLEAF/[262]
[262]/1/14/125/c0000000d4a03f90/8047/LEAF/
*** 2012-12-16 17:19:47.303
===============================================================================
HANG ANALYSIS DUMPS:
oradebug_node_dump_level: 3
===============================================================================
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[102]/2/103/1243/c0000000e4ae9518/12250/NLEAF/[262]
===============================================================================
END OF HANG ANALYSIS
===============================================================================
In the preceding example, the session with SID 103 (node number 102) on instance 2 is blocked by the session with SID 14 (node number 262) on instance 1. Upon identifying the holder, either complete the transaction or abort the session to release the lock from the database.
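On a busy system the hang analysis trace can be very large, so it helps to jump straight to the chain summary. The sketch below runs against a captured fragment of a trace (illustration only; point the grep at the real trace file under the diag trace directory):

```shell
# Fragment of a hang analysis trace file (illustration only).
cat > /tmp/hang.trc <<'EOF'
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
 Chain 1 Signature Hash: 0x38c48850
EOF

# The last wait event in the chain signature is usually the event the
# waiter is stuck on -- here a TX row lock.
grep "Chain 1 Signature:" /tmp/hang.trc
```

Reading the signature right to left gives the quickest summary: the waiter is stuck on the row lock, and the holder is idle on a SQL*Net message from the client.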
It is sometimes advisable to generate a SYSTEMSTATE dump along with the HANGANALYZE dump to obtain more detailed diagnostic information for identifying the root cause of the issue. Depending upon the level used to dump the SYSTEMSTATE, the cursor might take a very long time to return to the SQL prompt. The trace file details can also be found in the database alert.log file.
You shouldn’t generate a SYSTEMSTATE dump under normal circumstances; in other words, do so only when you have serious issues in the database or are advised by Oracle Support to troubleshoot a serious database issue. Besides, the SYSTEMSTATE dump tends to generate a vast trace file, and under unpredictable circumstances it can even cause an instance crash.
Finally, Oracle provides the HANGFG tool to automate the collection of SYSTEMSTATE and hang analysis dumps in both RAC and non-RAC database environments. You need to download the tool from My Oracle Support (previously known as MetaLink). Once you invoke the tool, it generates a couple of output files, named hangfiles.out and hangfg.log, under the $ORACLE_BASE/diag/rdbms/database/instance_name/trace location.
Summary
This chapter discussed the architecture and components of the Oracle Clusterware stack, including the updates in Oracle Clusterware 12cR1. We will talk about some other new Oracle Clusterware features introduced in Oracle 12cR1 in Chapter 4.
This chapter also discussed tools and tips for Clusterware management and troubleshooting. Applying the tools, utilities, and guidelines described in this chapter, you can diagnose many serious cluster-related issues and address Clusterware stack startup failures. In addition, you have learned how to modify the default tracing levels of various Clusterware daemon processes and their subcomponents to obtain detailed debugging information to troubleshoot various cluster-related issues. In a nutshell, the chapter has offered you all essential cluster management and troubleshooting concepts and skills that will help you in managing a medium-scale or large-scale cluster environment.