Runtime diagnostics
This chapter describes the enhancements that were implemented for runtime diagnostics (RTD) in z/OS V2R2.
RTD is a z/OS component that helps to find and remove soft failures that might lead to sick but not dead (SBND) situations.
This chapter includes the following topics:
5.1 RTD overview
RTD was originally introduced in z/OS V1R12. It analyzes SBND systems quickly and searches for evidence of soft failures. For more information about soft failures, see “PFA overview” on page 38. Soft failures can occur in the following areas:
Component issues
Global resource contention
Important address space execution issues
You can use RTD when your operations staff report a problem on the system. The benefit of RTD is that it provides a timely, comprehensive analysis at a critical time without the need for a storage dump. This advantage can save you time.
You can use RTD to quickly analyze an ailing system for the following types of problems:
Component problems that are identified as critical messages in OPERLOG
ENQ contention, GRS latch contention for system address spaces, and z/OS UNIX file system contention
Address spaces with high CPU usage
Address spaces that appear to be in a task control block (TCB) enabled loop
Local lock conditions
JES2 health exceptions
Server address space health exceptions
With that information, you can take the next step, including the following tasks:
Cancel the relevant jobs
Further investigate the class of resources or a single address space by using a monitor, such as IBM RMF™ or OMEGAMON XE for z/OS
Use the following z/OS command to start RTD from your console or SDSF:
S HZR,SUB=MSTR
You can then start analyzing your system by entering the following command:
F HZR,ANALYZE
When you enter the analyze command, a report is displayed, as shown in Figure 5-1.
F HZR,ANALYZE
HZR0200I RUNTIME DIAGNOSTICS RESULT 319
SUMMARY: SUCCESS
REQ: 001 TARGET SYSTEM: SC81 HOME: SC81 2015/08/18 - 13:40:11
INTERVAL: 60 MINUTES
EVENTS:
FOUND: 04 - PRIORITIES: HIGH:02 MED:02 LOW:00
TYPES: CF:01 DUMPS:02 ENQ:01
----------------------------------------------------------------------
EVENT 01: HIGH - ENQ - SYSTEM: SC81 2015/08/18 - 13:40:11
ENQ WAITER - ASID:0035 - JOBNAME:HZSPROC - SYSTEM:SC81
ENQ BLOCKER - ASID:0014 - JOBNAME:HZSPROC - SYSTEM:SC81
QNAME: SYSDSN
RNAME: SYS1.SC81.HZSPDATA
ERROR: ADDRESS SPACES MIGHT BE IN ENQ CONTENTION.
ACTION: USE YOUR SOFTWARE MONITORS TO INVESTIGATE BLOCKING JOBS AND
ACTION: ASIDS.
----------------------------------------------------------------------
EVENT 02: HIGH - CF - SYSTEM: SC81 2015/08/18 - 12:42:41
IXC585E STRUCTURE HZS_HEALTHCHKLOG IN COUPLING FACILITY CF8B,
PHYSICAL STRUCTURE VERSION CF61F249 A0D1E082,
IS AT OR ABOVE STRUCTURE FULL MONITORING THRESHOLD OF 80%:
SPACE USAGE IN-USE TOTAL %
ENTRIES: 1645 1954 84
ERROR: INDICATED STRUCTURE IS APPROACHING FULL MONITORING THRESHOLD.
ACTION: D XCF,STR,STRNAME=strname TO GET STRUCTURE INFORMATION.
ACTION: INCREASE STRUCTURE SIZE OR TAKE ACTION AGAINST APPLICATION.
----------------------------------------------------------------------
EVENT 03: MED - DUMPS - SYSTEM: SC81 2015/08/18 - 13:05:10
IEA799I AUTOMATIC ALLOCATION OF SVC DUMP DATASET FAILED
DUMPID=018 REQUESTED BY JOB (CONSOLE )
DYNALLOC FAILED RETURN CODE=04 ERROR RSN CODE=970C INFO RSN CODE=0000
SMS RSN CODE=4379
ERROR: THE SYSTEM WAS UNABLE TO ALLOCATE A DUMP DATA SET FOR A DUMP.
ACTION: D D TO VIEW ALLOCATION STATUS. DD ADD,VOL=volser TO ADD DUMP
ACTION: RESOURCES.
----------------------------------------------------------------------
EVENT 04: MED - DUMPS - SYSTEM: SC81 2015/08/18 - 13:05:20
IEA799I AUTOMATIC ALLOCATION OF SVC DUMP DATASET FAILED
DUMPID=019 REQUESTED BY JOB (HSIBMGR )
DYNALLOC FAILED RETURN CODE=04 ERROR RSN CODE=970C INFO RSN CODE=0000
SMS RSN CODE=4379
ERROR: THE SYSTEM WAS UNABLE TO ALLOCATE A DUMP DATA SET FOR A DUMP.
ACTION: D D TO VIEW ALLOCATION STATUS. DD ADD,VOL=volser TO ADD DUMP
ACTION: RESOURCES.
----------------------------------------------------------------------
 
Figure 5-1 Output of the RTD analyze command
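If you capture the report text (for example, from the system log), the summary fields that are shown in Figure 5-1 can be extracted with a short script. The following Python sketch is illustrative only: the function name parse_rtd_summary and the dictionary layout are assumptions for this example and are not part of RTD, and the regular expressions assume that the report lines look exactly like the lines in Figure 5-1.

```python
import re

def parse_rtd_summary(report: str) -> dict:
    """Extract counts and event headers from HZR0200I report text.

    Hypothetical helper: the line layout is assumed to match Figure 5-1.
    """
    summary = {"found": None, "priorities": {}, "types": {}, "events": []}

    # FOUND: 04 - PRIORITIES: HIGH:02 MED:02 LOW:00
    found = re.search(r"FOUND:\s*(\d+)", report)
    if found:
        summary["found"] = int(found.group(1))

    prio = re.search(r"PRIORITIES:\s*(.*)", report)
    if prio:
        pairs = re.findall(r"(\w+):(\d+)", prio.group(1))
        summary["priorities"] = {name: int(count) for name, count in pairs}

    # TYPES: CF:01 DUMPS:02 ENQ:01
    types = re.search(r"TYPES:\s*(.*)", report)
    if types:
        pairs = re.findall(r"(\w+):(\d+)", types.group(1))
        summary["types"] = {name: int(count) for name, count in pairs}

    # EVENT 01: HIGH - ENQ - SYSTEM: SC81 2015/08/18 - 13:40:11
    header = re.compile(
        r"^\s*EVENT (\d+): (\w+) - (\w+) - SYSTEM: (\S+)", re.MULTILINE
    )
    for number, priority, kind, system in header.findall(report):
        summary["events"].append(
            {"number": int(number), "priority": priority,
             "type": kind, "system": system}
        )
    return summary

# Sample text copied from Figure 5-1 (abbreviated).
sample = """\
HZR0200I RUNTIME DIAGNOSTICS RESULT 319
SUMMARY: SUCCESS
EVENTS:
FOUND: 04 - PRIORITIES: HIGH:02 MED:02 LOW:00
TYPES: CF:01 DUMPS:02 ENQ:01
EVENT 01: HIGH - ENQ - SYSTEM: SC81 2015/08/18 - 13:40:11
EVENT 02: HIGH - CF - SYSTEM: SC81 2015/08/18 - 12:42:41
"""

if __name__ == "__main__":
    print(parse_rtd_summary(sample))
```

A script like this one might be used to flag reports that contain HIGH priority events before an operator reads the full text. It does not replace reading the ERROR and ACTION lines of each event.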
5.2 Health-based routing integration with runtime diagnostics
In z/OS V2R2, health-based routing is an enhancement to Workload Manager (WLM) dynamic workload routing. Its goal is to further reduce the impact of middleware or transaction manager server health issues.
WLM provides a health service that is called IWM4HLTH to enable multiple callers to report on a server’s health. The server identifies itself and can provide reasons for its health ratings.
When you run the F HZR,ANALYZE command, RTD starts a new query service that is called IWM4QHLT. This service obtains server health states. The information is then used for diagnostic and serviceability purposes.
The health indicator is an integer in the range 0 - 100 that shows how well a server is performing. If any server shows a current health value that is less than 100, a SERVERHEALTH event is returned to predictive failure analysis (PFA), and PFA starts RTD for health checks that can indicate that the metric is too low. The event is included in the PFA check exception report.
Benefits of these new functions are improved routing recommendations and diagnostic reporting about server health states.
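The threshold rule that is described in this section (a health value below 100 produces a SERVERHEALTH event) can be modeled in a few lines. The following Python sketch is a conceptual illustration only. It does not call IWM4HLTH or IWM4QHLT, and the ServerHealth class, its field names, and the sample job names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Per the text, health values range from 0 to 100, and any value
# below 100 results in a SERVERHEALTH event.
FULL_HEALTH = 100

@dataclass
class ServerHealth:
    jobname: str  # hypothetical field name for this illustration
    health: int   # integer health indicator, 0 - 100

    def __post_init__(self) -> None:
        if not 0 <= self.health <= FULL_HEALTH:
            raise ValueError(f"health must be 0 - 100, got {self.health}")

def serverhealth_candidates(servers: List[ServerHealth]) -> List[ServerHealth]:
    """Return the servers whose health value would be reported."""
    return [s for s in servers if s.health < FULL_HEALTH]

if __name__ == "__main__":
    # Sample values are invented for the example.
    servers = [ServerHealth("SERVERA", 100), ServerHealth("SERVERB", 60)]
    for s in serverhealth_candidates(servers):
        print(f"SERVERHEALTH candidate: {s.jobname} health={s.health}")
```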