CICS TS for z/OS V5.5
The IBM CICS Transaction Server for z/OS (CICS TS) V5.5 release introduces various technical and operational capabilities. Included in these updates are improvements that provide performance benefits over previous CICS releases. This performance report for CICS TS V5.5 covers the following subject areas:
Key performance benchmarks that are presented as a comparison with the CICS TS V5.4 release.
An outline of improvements made regarding the threadsafe characteristics of the CICS TS run time.
Details of the changes that are made to performance-critical CICS initialization parameters, and the effect of these updates.
A description of all the updated statistics and monitoring fields.
High-level views of new functionality that was introduced in the CICS TS V5.5 release, including performance benchmark results where appropriate.
Studies of several areas that have received frequent performance questions from customers, including the use of:
 – CICS policies
 – encrypted zFS file systems with CICS
 – multiple JVM servers in a single CICS region
 – shared class cache to improve JVM server startup performance
This chapter includes the following topics:
9.1, “Introduction”
9.2, “Release-to-release comparisons”
9.3, “Improvements in threadsafety”
9.4, “Changes to system initialization parameters”
9.5, “Enhanced instrumentation”
9.6, “Virtual storage constraint relief”
9.7, “z/OS WLM Health API”
9.8, “Disabling of VSAM dynamic buffer addition”
9.9, “USS processes associated with L8, L9, X8, and X9 TCBs”
9.10, “Channels performance improvement”
9.11, “Threadsafe Coupling Facility Data Tables”
9.12, “CICS policy rules”
9.13, “Encrypted zFS file systems”
9.14, “Multiple Liberty JVM servers in a single CICS region”
9.15, “Liberty JVM server and application startup times”
9.1 Introduction
When we compiled the results for this chapter, the workloads were executed on an IBM z14™ model M04 (machine type 3906). A maximum of 32 dedicated CPs were available on the measured LPAR, with a maximum of six dedicated CPs available to the LPAR that was used to simulate users. These LPARs were configured as part of a Parallel Sysplex. An internal coupling facility was co-located on the same central processor complex (CPC) as the measurement and driving LPARs, connected by using internal coupling peer (ICP) links. An IBM System Storage DS8870 (machine type 2424) was used to provide external storage.
This chapter presents the results of several performance benchmarks when executed in a CICS TS for z/OS V5.5 environment. Unless otherwise stated in the results, the CICS TS V5.5 environment was the code available at GA time. Several of the performance benchmarks are presented in the context of a comparison against CICS TS V5.4. The CICS TS V5.4 environment contained all PTFs issued before 20 November 2017. All LPARs used z/OS V2.3.
For a definition of performance terms used in this chapter, see Chapter 1, “Performance terminology” on page 3. A description of the test methodology that is used can be found in Chapter 2, “Test methodology” on page 11. For a full description of the workloads used, see Chapter 3, “Workload descriptions” on page 21.
Where reference is made to an LSPR processor equivalent, the indicated machine type and model can be found in the large systems performance reference (LSPR) document. For more information about obtaining and using LSPR data, see 1.3, “Large Systems Performance Reference” on page 6.
9.2 Release-to-release comparisons
This section describes some of the results from a selection of regression workloads that are used to benchmark development releases of CICS TS. For more information about the use of regression workloads, see Chapter 3, “Workload descriptions” on page 21.
9.2.1 Data Systems Workload static routing
The static routing variant of the Data Systems Workload (DSW) is described in 3.2.1, “DSW static routing” on page 22. This section presents the performance figures that were obtained by running this workload. Table 9-1 lists the results of the DSW static routing workload that used the CICS TS V5.4 release.
Table 9-1 CICS TS V5.4 results for DSW static routing workload
ETR        CICS CPU    CPU per transaction (ms)
4180.81    76.37%      0.183
4938.15    89.21%      0.181
6054.32    108.09%     0.179
6582.22    116.78%     0.177
7143.83    126.76%     0.177
Table 9-2 on page 199 lists the same figures for the CICS TS V5.5 release.
Table 9-2 CICS TS V5.5 results for DSW static routing workload
ETR        CICS CPU    CPU per transaction (ms)
4180.34    73.22%      0.175
4948.82    85.95%      0.174
6057.17    103.90%     0.172
6591.39    112.31%     0.170
7151.20    121.70%     0.170
The average CPU per transaction value for CICS TS V5.4 is calculated to be 0.179 ms. The same value for CICS TS V5.5 is calculated to be 0.172 ms. This is a relative performance improvement of 4% for this workload, although it is an absolute improvement of only around 7 µs. Such a small change in CPU consumption is unlikely to be measurable outside of laboratory conditions. We can, however, conclude that performance is not degraded between CICS TS V5.4 and CICS TS V5.5 for this workload.
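The CPU per transaction figures in these tables follow directly from the RMF data: the CICS CPU percentage (of one processor) is converted to milliseconds of CPU consumed per second and divided by the transaction rate. The following Java sketch, which is purely illustrative and not part of any CICS or RMF API, reproduces the first row of Table 9-1:

public class CpuPerTransaction {
    // Derive CPU per transaction (ms) from an RMF CPU percentage and an
    // external transaction rate (ETR, transactions per second).
    static double cpuPerTransactionMs(double cicsCpuPercent, double etr) {
        // CPU% of one processor -> milliseconds of CPU consumed per elapsed second
        double cpuMsPerSecond = (cicsCpuPercent / 100.0) * 1000.0;
        return cpuMsPerSecond / etr;
    }

    public static void main(String[] args) {
        // First row of Table 9-1: 76.37% CICS CPU at 4180.81 transactions per second
        System.out.printf("CPU per transaction: %.3f ms%n",
                cpuPerTransactionMs(76.37, 4180.81));   // prints 0.183
    }
}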
The figures from Table 9-1 and Table 9-2 are plotted in the chart in Figure 9-1.
Figure 9-1 Plot of CICS TS V5.4 and V5.5 performance figures for DSW static routing workload
The measured CPU cost for each transaction rate scales linearly in both cases, with CICS TS V5.5 showing a slight improvement as described above.
9.2.2 Java servlet that uses JDBC and VSAM
The Java servlet application is hosted in a CICS JVM server that uses the embedded WebSphere Liberty server. The workload is driven through HTTP requests by using IBM Workload Simulator for z/OS, as described in 2.4, “Driving the workload” on page 16. The servlet application accesses VSAM data by using the JCICS API and accesses IBM Db2® by using the JDBC API. For more information about the workload, see 3.4, “WebSphere Liberty servlet with JDBC and JCICS access” on page 26.
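For illustration only, the following minimal servlet sketches the kind of work this benchmark performs on each request: one VSAM read through the JCICS API and one Db2 query through JDBC. The file name, data source JNDI name, key, and SQL statement are assumptions and do not reflect the actual benchmark application.

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;
import com.ibm.cics.server.KSDS;
import com.ibm.cics.server.RecordHolder;

@WebServlet("/inquiry")
public class InquiryServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            // Read a VSAM record through the JCICS API (file name is hypothetical)
            KSDS file = new KSDS();
            file.setName("ACCTFILE");
            RecordHolder holder = new RecordHolder();
            file.read("00000001".getBytes(), holder);   // record is returned in the holder

            // Query Db2 through JDBC (JNDI name is hypothetical; defined in server.xml)
            DataSource ds = InitialContext.doLookup("jdbc/sampleDataSource");
            try (Connection conn = ds.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT BALANCE FROM ACCOUNTS WHERE ACCTNO = ?")) {
                ps.setString(1, "00000001");
                try (ResultSet rs = ps.executeQuery()) {
                    resp.getWriter().println(rs.next() ? rs.getString(1) : "not found");
                }
            }
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}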
The hardware that is used for the benchmarks is described in 9.1, “Introduction” on page 198. The measurement LPAR was configured with three GCPs and one zIIP, which resulted in an LSPR equivalent processor of 3906-704.
The CICS TS V5.4 and CICS TS V5.5 releases were compared by using the software levels as described in 9.1, “Introduction” on page 198. The CICS TS V5.4 configuration also had the PTF applied for APAR PI99650 to ensure that both configurations used WebSphere Liberty V18.0.0.2. Both configurations used a single CICS region and the following additional software levels and configuration options:
IBM Db2 V12
Java 8 SR5 FP25 (64-bit)
Single JVMSERVER resource with THREADLIMIT=256
Both JVM servers used the following JVM options:
-Xgcpolicy:gencon
-Xcompressedrefs
-XXnosuballoc32bitmem
-Xmx200M
-Xms200M
-Xmnx60M
-Xmns60M
-Xmox140M
-Xmos140M
As described in 2.3.1, “Repeatability for Java workloads” on page 14, this workload requires a warm-up period of 20 minutes. After this warm-up phase completed, the request injection rate was increased every 10 minutes. CPU usage data was collected by using IBM z/OS Resource Measurement Facility (RMF). An average CPU per request value was calculated by using the last 5 minutes of each 10-minute interval.
Table 9-3 lists the performance results of the Java servlet workload that used the CICS TS V5.4 release. This data is presented in the same format as described in 8.2.5, “The Java servlet that uses JDBC and VSAM” on page 156.
Table 9-3 CICS TS V5.4 results for WebSphere Liberty JDBC and VSAM workload
ETR        CICS CPU not zIIP-eligible    CICS CPU zIIP-eligible    CICS CPU total
837.63     20.56%                        27.55%                    48.11%
1664.88    42.38%                        55.29%                    97.67%
3002.68    85.17%                        94.56%                    179.73%
Table 9-4 lists the performance results of the JDBC and VSAM workload that used the CICS TS V5.5 release, presented in the same format as Table 9-3.
Table 9-4 CICS TS V5.5 results for WebSphere Liberty JDBC and VSAM workload
ETR        CICS CPU not zIIP-eligible    CICS CPU zIIP-eligible    CICS CPU total
838.90     20.95%                        28.66%                    49.61%
1662.18    42.82%                        55.07%                    97.89%
2980.13    85.66%                        95.07%                    180.73%
The CICS CPU total values from Table 9-3 and Table 9-4 are plotted in Figure 9-2 on page 201.
Figure 9-2 Total CPU comparison for CICS TS V5.4 and V5.5 JDBC and VSAM workload
The zIIP-eligibility figures are presented as a chart in Figure 9-3.
Figure 9-3 zIIP-eligibility comparison for JDBC and VSAM workload with CICS TS V5.4 and V5.5
The average CPU per transaction value for the JDBC and VSAM workload using the CICS TS V5.4 release is calculated to be 0.587 ms. The same value for the CICS TS V5.5 configuration is calculated to be 0.596 ms. The method that is used to calculate zIIP eligibility is described in 8.2.5, “The Java servlet that uses JDBC and VSAM” on page 156. The average zIIP eligibility for both the CICS TS V5.4 and CICS TS V5.5 workloads was 55.5%.
The average CPU per transaction and the zIIP eligibility calculations show that the performance for the CICS TS V5.4 and CICS TS V5.5 releases is equivalent within measurable limits. This is true for both the total CPU consumed and the fraction that is eligible for offload to a zIIP engine.
9.2.3 The Java OSGi workload
The Java OSGi workload is composed of several applications, as described in 3.5, “Java OSGi workload” on page 27.
The hardware that is used for the benchmarks is described in 9.1, “Introduction” on page 198. The measurement LPAR was configured with three GCPs and one zIIP, which resulted in an LSPR equivalent processor of 3906-704.
The CICS TS V5.4 and CICS TS V5.5 releases were compared by using the software levels as described in 9.1, “Introduction” on page 198. Both configurations used a single CICS region and the following additional software levels and configuration options:
IBM Db2 V12
Java 8 SR5 FP25 (64-bit)
Single JVMSERVER resource with THREADLIMIT=25
Both JVM servers used the following JVM options:
-Xgcpolicy:gencon
-Xcompressedrefs
-XXnosuballoc32bitmem
-Xmx100M
-Xms100M
-Xmnx70M
-Xmns70M
-Xmox30M
-Xmos30M
As described in 2.3.1, “Repeatability for Java workloads” on page 14, this workload requires a warm-up period of 20 minutes. After this warm-up phase completed, the request injection rate was increased every 5 minutes. CPU usage data was collected by using IBM z/OS Resource Measurement Facility (RMF). An average CPU per request value was calculated using the last minute of each 5-minute interval.
Table 9-5 lists the performance results of the Java OSGi workload that used the CICS TS V5.4 release.
Table 9-5 CICS TS V5.4 performance results for OSGi workload
ETR       CICS CPU not zIIP-eligible    CICS CPU zIIP-eligible    CICS CPU total
233.98    20.07%                        69.89%                    89.96%
467.98    39.29%                        139.95%                   179.24%
831.07    74.65%                        249.21%                   323.86%
The performance results for the CICS TS V5.5 release are shown in Table 9-6 on page 203.
Table 9-6 CICS TS V5.5 performance results for OSGi workload
ETR       CICS CPU not zIIP-eligible    CICS CPU zIIP-eligible    CICS CPU total
233.98    20.42%                        70.22%                    90.64%
467.93    39.61%                        140.40%                   180.01%
822.57    75.10%                        247.61%                   322.71%
The CICS CPU total values from Table 9-5 and Table 9-6 are plotted in Figure 9-4 on page 204.
Figure 9-4 Comparing overall CPU utilization for Java OSGi workload with CICS TS V5.4 and V5.5
The offload eligibility figures are presented as a chart in Figure 9-5 on page 204.
Figure 9-5 Comparing offload-eligible CPU utilization for OSGi workload with CICS TS V5.4 and V5.5
The average CPU per transaction value for this workload using the CICS TS V5.4 release is calculated to be 3.857 ms. The same value for the CICS TS V5.5 release is calculated to be 3.881 ms.
Using the methodology to calculate the zIIP eligibility of the workload described in Chapter 8, the CICS TS V5.4 release had an average zIIP eligibility of 77.6%. See 8.2.5, “The Java servlet that uses JDBC and VSAM” on page 156. The CICS TS V5.5 release had an average zIIP eligibility of 77.4%.
As observed with the Java servlet workload, the performance of Java OSGi applications is similar in CICS TS V5.5 when compared to CICS TS V5.4. This similarity includes both total CPU consumed and the fraction that is eligible for offload to a zIIP engine.
9.2.4 Relational Transactional Workload threadsafe
The Relational Transactional Workload (RTW) is described in 3.3, “Relational Transactional Workload” on page 25. This section presents the performance figures that were obtained by running this workload.
Table 9-7 on page 205 lists the performance results for the RTW threadsafe workload that used the CICS TS V5.4 release.
Table 9-7 CICS TS V5.4 results for the RTW threadsafe workload
ETR        CICS CPU    CPU per transaction (ms)
713.33     89.25%      1.251
996.88     124.43%     1.248
1417.03    177.47%     1.252
1959.66    248.73%     1.269
2401.43    309.99%     1.291
Table 9-8 lists the performance results for the RTW threadsafe workload that used the CICS TS V5.5 release.
Table 9-8 CICS TS V5.5 results for the RTW threadsafe workload
ETR        CICS CPU    CPU per transaction (ms)
713.41     88.59%      1.242
997.00     123.74%     1.241
1417.54    176.81%     1.247
1960.32    248.39%     1.267
2402.72    309.49%     1.288
The figures from Table 9-7 and Table 9-8 are shown in Figure 9-6.
Figure 9-6 Plot of CICS TS V5.4 and V5.5 performance figures for RTW threadsafe workload
The average CPU per transaction value for CICS TS V5.4 is calculated to be 1.262 ms. The same value for CICS TS V5.5 is calculated to be 1.257 ms. This very small absolute performance improvement is in line with that observed for the DSW static routing workload in 9.2.1, “Data Systems Workload static routing” on page 198. The overall effect on the total CPU cost of the workload is negligible, and we can conclude that performance has not degraded between CICS TS V5.4 and CICS TS V5.5 for this workload.
9.3 Improvements in threadsafety
There are three areas in CICS TS V5.5 that have improved performance by reducing the number of TCB switches required for API commands.
9.3.1 QUERY SECURITY API command
The EXEC CICS QUERY SECURITY command has been enhanced such that the number of TCB switches has been reduced if more than one access level is specified on the command.
For more information on the EXEC CICS QUERY SECURITY command, see the “QUERY SECURITY” topic in IBM Knowledge Center at this website:
9.3.2 Coupling Facility Data Tables
Access to coupling facility data tables (CFDTs) is now threadsafe, so CFDTs can be accessed by applications that are running on open TCBs without incurring a TCB switch. Syncpoint processing of CFDTs can also run on an open TCB. However, the open and loading of a CFDT still occurs on the QR TCB. See 9.11, “Threadsafe Coupling Facility Data Tables” on page 217 for a performance study relating to CFDTs.
For more information on CFDTs, see the “Using coupling facility data tables” topic in IBM Knowledge Center at this website:
9.3.3 System subtasking and auxiliary temporary storage
The SUBTSKS SIT parameter controls the use of the CO TCB when performing I/O. APAR PH05298 was released after the general availability of CICS TS V5.5. It removes the switch to the CO TCB if the application is executing on an open TCB when it uses CICS auxiliary temporary storage. If the application is currently executing on the QR TCB, then subtasking is performed as normal.
This removal of TCB switches provides a small performance benefit for applications that execute on an open TCB. The threadsafe characteristics of the relevant API commands are unaffected.
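As a simple illustration, the following JCICS sketch writes an item to a temporary storage queue from a program running on an open TCB; with SUBTSKS=1 and APAR PH05298 applied, this write no longer switches to the CO TCB. The queue name is hypothetical, and whether the queue resides in main or auxiliary storage is determined by the matching TSMODEL definition rather than by the program.

import com.ibm.cics.server.CicsConditionException;
import com.ibm.cics.server.TSQ;

public class AuxTsqWriter {
    // Write one item to a temporary storage queue from an open TCB.
    public static void writeAudit(byte[] record) throws CicsConditionException {
        TSQ queue = new TSQ();
        queue.setName("AUDITQ01");   // hypothetical queue name
        queue.writeItem(record);     // no CO TCB switch when running on an open TCB
    }
}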
 
Note: APAR PH05298 also provides this optimization to all CICS V5 releases.
For more information on the SUBTSKS SIT parameter, see the “SUBTSKS” topic in IBM Knowledge Center at this website:
9.4 Changes to system initialization parameters
Two CICS system initialization table (SIT) parameters that might have a performance impact have been modified in CICS TS V5.5.
For a detailed view of changes to SIT parameters in the CICS TS V5.5 release, see the “Changes to SIT parameters” section of the “Changes to externals in this release” topic in IBM Knowledge Center at this website:
9.4.1 High Performance Option (HPO)
The HPO parameter can now be specified in the PARM parameter on an EXEC PGM=DFHSIP statement or in the SYSIN data set.
For more information, see the “HPO” topic in IBM Knowledge Center at this website:
9.4.2 Minimum TLS level (MINTLSLEVEL)
The default value for the MINTLSLEVEL parameter has changed from TLS10 to TLS12.
For more information, see the “MINTLSLEVEL” topic in IBM Knowledge Center at this website:
9.5 Enhanced instrumentation
The CICS TS V5.5 release continues the expansion of information that is reported by the CICS monitoring and statistics component. This section describes the extra data that is now available in the CICS statistics and monitoring SMF records.
9.5.1 The DFHSOCK performance group
The following new field was added to the DFHSOCK performance group:
New connection indicator (field SOCONMSG)
Indicates whether the task processed the first message for establishing a new connection for a client.
For more information about counters that are available in the DFHSOCK performance group, see the “Performance data in group DFHSOCK” topic in IBM Knowledge Center at the following website:
9.5.2 The DFHWEBB performance group
The following fields were added to the DFHWEBB performance group:
WEB OPEN URIMAP request elapsed time (field WBURIOPN)
The total elapsed time that the user task was processing WEB OPEN URIMAP requests that are issued by the user task.
WEB RECEIVE and WEB CONVERSE receive portion elapsed time (field WBURIRCV)
The total elapsed time that the user task was processing WEB RECEIVE requests and the receiving side of WEB CONVERSE requests that are issued by the user task. The sessions that these requests target are opened by the WEB OPEN URIMAP command.
WEB SEND and WEB CONVERSE send portion elapsed time (field WBURISND)
The total elapsed time that the user task was processing WEB SEND requests and the sending side of WEB CONVERSE requests that are issued by the user task. The sessions that these requests target are opened by the WEB OPEN URIMAP command.
Node.js application name (field NJSAPPNM)
Node.js application name from which the task was started.
For more information about counters that are available in the DFHWEBB performance group, see the “Performance data in group DFHWEBB” topic in IBM Knowledge Center at this website:
9.5.3 The DFHWEBC performance group
A new performance group has been created with the following field:
INVOKE SERVICE request elapsed time (field WBSVINVK)
The total elapsed time that the user task was processing INVOKE SERVICE requests for WEBSERVICEs.
For more information about counters that are available in the DFHWEBC performance group, see the “Performance data in group DFHWEBC” topic in IBM Knowledge Center at this website:
9.5.4 ISC/IRC system entry resource statistics
The following new field was added to the collected ISC/IRC system entry resource statistics:
Peak aids in chain (field A14EAHWM)
The peak number of automatic initiate descriptors (AID) that were present in the AID chain at any one time.
A fragment of a sample DFHSTUP report that shows the new statistics field is shown in Example 9-1.
Example 9-1 Fragment of ISC/IRC system entry resource statistics report produced by CICS TS V5.5 DFHSTUP
Connection name. . . . . . . . . . . . . . . . . : FOR
Connection netname . . . . . . . . . . . . . . . : IYCUZC27
Access Method / Protocol . . . . . . . . . . . . : XM /
Autoinstalled Connection Create Time . . . . . . :
Send session count . . . . . . . . . . . . . . . : 400
Aids in chain. . . . . . . . . . . . . . . . . . : 0
Peak aids in chain . . . . . . . . . . . . . . . : 74
ATIs satisfied by contention losers. . . . . . . : 0
Current contention losers. . . . . . . . . . . . : 0
For more information about ISC/IRC system entry statistics, see the topic “ISC/IRC system and mode entry statistics” in IBM Knowledge Center at this website:
9.5.5 Policy statistics
Statistics are now available for CICS policy rules. CICS collects resource statistics for each rule that is defined in a policy, and supplies a summary report.
Example 9-2 shows a sample DFHSTUP report for an installed policy.
Example 9-2 Extract of sample policy statistics report produced by CICS TS V5.5 DFHSTUP
Policy name. . . . . . . . . . . . : file_v51
Policy user tag. . . . . . . . . . :
Bundle name. . . . . . . . . . . . : PLCY51FC
Bundle directory . . . . . . . . . :  /u/iburnet/git/cics-perf-workload-dsw-lsr/bu
                                   :  ndles/com.ibm.cics.perf.workload.dsw.lsr.pol
                                   :  icy.V51.file/
Rule name. . . . . . . . . . . . . : READ
Rule type. . . . . . . . . . . . . : filerequest
Rule subtype . . . . . . . . . . . : read
Action type. . . . . . . . . . . . : abend
Action count . . . . . . . . . . . : 0
Action time. . . . . . . . . . . . :
For more information about CICS policy statistics, see the “Policy statistics” topic in IBM Knowledge Center at this website:
9.5.6 Transaction resource statistics
The following field was added to the collected transaction resource statistics:
Abend count (field XMRAENDC)
The number of times that this transaction has abended.
For more information about transaction resource statistics, see the topic “Transaction statistics” in IBM Knowledge Center at this website:
9.5.7 Transaction resource class data
CICS monitoring is enhanced with new monitoring records URIMAP and WEBSERVICE in the resource monitoring class. Multiple URIMAP or WEBSERVICE records can be monitored for one task.
The following fields are now available for each URIMAP entry in a transaction resource monitoring record:
MNR_URIMAP_NAME
MNR_URIMAP_CIPHER
MNR_URIMAP_WEBOPEN
MNR_URIMAP_WEBRECV
MNR_URIMAP_WEBSEND
The following fields are now available for each WEBSERVICE entry in a transaction resource monitoring record:
MNR_WEBSVC_NAME
MNR_WEBSVC_PIPE
MNR_WEBSVC_INVK
For more information about fields that are available in the transaction resource class data, see the “Transaction resource class data: Listing of data fields” topic in IBM Knowledge Center at this website:
9.6 Virtual storage constraint relief
The Web domain (WB) now uses internal 64-bit buffer storage when it sends and receives HTTP outbound messages. This change relieves constraint on 31-bit virtual storage and enables more 31-bit application use in a CICS region.
Minor improvements in 24-bit storage usage were also introduced in CICS TS V5.5. The amount of 24-bit storage that is used by the CICS auxiliary trace mechanism was reduced. This change provided a small performance improvement for both the DSW static routing and the RTW single region workloads when running with auxiliary trace enabled.
9.7 z/OS WLM Health API
As described in 8.8, “z/OS WLM Health API” on page 173, CICS TS V5.4 uses the z/OS Workload Manager (WLM) Health API as a means of controlling the flow of work into a CICS region. This awareness of the z/OS WLM health value has been enhanced in CICS TS V5.5.
For more information about how CICS TS V5.5 uses the z/OS WLM Health API to control the flow of work into a CICS region, see the following article in the CICS Developer Center:
9.7.1 CICSPlex SM workload routing
The z/OS WLM health value of a region is now a more effective factor in CICSPlex SM workload routing decisions. When it determines the target region to route workload to, CICSPlex SM workload management assigns penalizing weights in the routing algorithm based on the actual health value of each region. The higher the health value, the lower the penalizing weight that is assigned, so a region with a greater health value becomes more favorable as a target. In addition, a region with a health value of zero is now deemed as ineligible to receive work.
With this enhancement to CICSPlex SM workload routing, you have better control of the flow of work into regions that are in warm-up or cool-down.
 
Note: The refined use of z/OS WLM Health when making routing decisions was also made available in CICS TS V5.4 with APAR PI90147.
For more information on how the z/OS WLM health value affects CICSPlex SM routing decisions, see the “Effect of the z/OS WLM health service on CICSPlex SM workload routing” topic in IBM Knowledge Center at this website:
9.7.2 Throttle on number of MQGET calls issued by an MQMONITOR
In CICS TS V5.4 when the region's z/OS WLM health value is less than 100%, there is a throttle on the number of MQGET calls that an MQMONITOR can issue per second. In this way, the number of trigger tasks that are started is controlled. The throttle affects all started MQMONITORs in the region. When the region's health value reaches 100%, the throttle on MQGET calls is removed. This behavior has been enhanced in CICS TS V5.5 by also reacting to the maximum tasks (MXT) condition.
If CICS encounters an MXT condition, the CICS-MQ Alert Monitor (CKAM) calculates the maximum number of MQGET calls that an MQMONITOR can issue per second while this condition exists. In effect, this action restricts the number of tasks that are started by MQMONITOR resources while CICS is at the MXT limit. While CICS is at the MXT limit, the number of MQGET calls that an MQMONITOR resource can issue per second is given by the calculation MXT + 10%. For example, with an MXT value of 500, each MQMONITOR is limited to 550 MQGET calls per second while the condition persists.
 
Note: The limit applied is a per-MQMONITOR resource limit and not a global limit. Tasks that are not associated with MQMONITOR resources will not be subject to any throttling.
For more information about how the z/OS WLM Health service affects IBM MQ resources in CICS TS V5.5, see the topic “Effect of z/OS Workload Manager health service on MQMONITORs” in IBM Knowledge Center at this website:
9.8 Disabling of VSAM dynamic buffer addition
From z/OS V2.2, VSAM provides a dynamic buffer addition capability that allows for the addition of extra buffers for an LSR pool if no buffer is available for a given VSAM request. For CICS, it is preferable to retry the request rather than allow uncontrolled expansion of an LSR pool, so dynamic buffer addition is not enabled for CICS LSR pools.
 
Note: The disabling of VSAM dynamic buffer addition was provided in all CICS V5 releases by APAR PI92486.
9.9 USS processes associated with L8, L9, X8, and X9 TCBs
CICS TS V5.5 now manages the release of USS (UNIX System Services) processes from X8, X9, L8, and L9 TCBs when the TCB is released from the CICS task and returned to the relevant CICS dispatcher pool of open TCBs.
The performance overhead of this additional USS process management was measured by using a development build of CICS TS V5.5. For each task that uses USS APIs, this overhead was measured to be approximately 410 µs of CPU. Approximately half of the CPU overhead occurs in the CICS address space, and the remainder occurs in the OMVS address space. Of the CPU overhead measured in the CICS address space, approximately half of that is observed in the CICS performance class monitoring records.
A summary of the use of USS processes can be found in the topic “The SYS1.PARMLIB(BPXPRMxx) parameters” in IBM Knowledge Center at this website:
9.10 Channels performance improvement
Containers are named blocks of data that are designed for passing information between programs. Programs can pass any number of containers between each other. Containers are grouped in sets that are called channels. A channel is analogous to a parameter list. The CICS TS V5.5 release introduces a performance improvement that benefits applications where many containers are stored in a single channel.
CICS TS V5.5 improves performance by using a hash table to access containers, rather than searching a list. The performance improvement changes the order in which containers are returned when browsing a channel. Therefore, applications should not rely on the order in which containers are returned from calls to EXEC CICS GETNEXT CONTAINER (CHANNEL) commands. The CICS feature toggle com.ibm.cics.container.hash can be set to false to restore CICS to the previous behavior. For more information see the “Upgrading applications” topic in IBM Knowledge Center at this website:
This section presents a performance comparison of the two containers implementation options.
For more information about developing CICS applications by using channels, see the “Transferring data between programs using channels” topic in IBM Knowledge Center at this website:
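To illustrate the point about browse order, the following JCICS sketch creates several small BIT containers in a channel and then browses them. It is a minimal sketch that assumes the JCICS Channel, Container, and ContainerIterator classes; the channel and container names are hypothetical. Because the CICS TS V5.5 implementation does not define the order in which containers are returned, the processing is keyed off the container name rather than the browse position.

import com.ibm.cics.server.Channel;
import com.ibm.cics.server.CicsConditionException;
import com.ibm.cics.server.Container;
import com.ibm.cics.server.ContainerIterator;
import com.ibm.cics.server.Task;

public class ChannelBrowser {
    public static void run() throws CicsConditionException {
        Channel channel = Task.getTask().createChannel("PERFDATA");

        // Create a set of small BIT containers in the channel
        for (int i = 0; i < 10; i++) {
            Container container = channel.createContainer("CONT" + i);
            container.put(new byte[8]);
        }

        // Browse the channel; do not rely on the order in which containers are returned
        ContainerIterator browse = new ContainerIterator(channel);
        while (browse.hasNext()) {
            Container container = (Container) browse.next();
            byte[] data = container.get();
            // ... process based on container.getName(), not on its position in the browse
        }
    }
}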
9.10.1 Containers performance comparison
The application takes a value as input and creates the specified number of containers in a single channel. These are BIT containers, each 8 bytes in length. Next, the application reads each of these containers in reverse order, and then the transaction completes. The overall response time and CPU per transaction are measured by using RMF. Several scenarios were tested, varying the number of containers from 10 to 750 per transaction.
The application was tested in both non-threadsafe and threadsafe configurations. The application was executed by using a development build of CICS TS V5.5 with the feature toggle com.ibm.cics.container.hash set to false and then set to true (the default). The use of the feature toggle provides the ability to directly compare CICS TS V5.4 and CICS TS V5.5 performance without other differences between the releases affecting the results.
The transactions were initiated from a terminal by using the methodology described in 2.4, “Driving the workload” on page 16 using 500 simulated clients. Where possible, the transaction rate was sustained at around 570 transactions per second.
Table 9-9 details the performance results obtained when running the workload in a non-threadsafe configuration with the com.ibm.cics.container.hash feature toggle set to false. This has the effect of using the CICS TS V5.4 channels and containers implementation.
Table 9-9 Performance data for a non-threadsafe configuration using V5.4 implementation
Number of containers    ETR       CPU per transaction (ms)    Response time (ms)
10                      570.53    0.002                       0.397
100                     570.52    0.014                       2.140
250                     570.27    0.048                       2.751
500                     566.51    0.146                       8.027
750                     327.19    0.302                       498.886
The data in Table 9-9 shows a significantly lower transaction rate and significantly higher response time for the scenario with 750 containers. During this test scenario, the QR TCB was fully utilized and became the primary bottleneck for the workload. From the 750 containers scenario in Table 9-9, multiplying the transaction rate by the CPU per transaction (327.19 × 0.302) indicates a QR TCB utilization of approximately 98.8%.
Table 9-10 details the performance results when running the workload in a non-threadsafe configuration with the com.ibm.cics.container.hash feature toggle set to true. This is the default in CICS TS V5.5 and allows the use of the improved containers implementation.
Table 9-10 Performance data for a non-threadsafe configuration using V5.5 default implementation
Number of containers    ETR       CPU per transaction (ms)    Response time (ms)
10                      570.54    0.002                       0.448
100                     570.54    0.011                       1.659
250                     570.55    0.027                       2.687
500                     570.28    0.057                       2.828
750                     570.21    0.093                       3.093
The results of the non-threadsafe configuration are summarized for CPU time in Figure 9-7. The scenario using 750 containers has been omitted from the charts as this produces an unnecessarily large maximum value for the y-axes when measuring both CPU and response time.
Figure 9-7 Summary of CPU per transaction for non-threadsafe configuration
The results of the non-threadsafe configuration are summarized for response time in Figure 9-8, again omitting the 750 containers scenario.
Figure 9-8 Summary of response time for non-threadsafe configuration
The application was next configured to execute using open TCBs and remove the QR TCB constraint. Table 9-11 on page 215 details the performance results when running the workload in a threadsafe configuration with the com.ibm.cics.container.hash feature toggle set to false — the equivalent of using the CICS TS V5.4 implementation.
Table 9-11 Performance data for a threadsafe configuration using V5.4 implementation
Number of containers    ETR       CPU per transaction (ms)    Response time (ms)
10                      570.60    0.002                       0.500
100                     570.58    0.023                       1.432
250                     570.27    0.079                       2.549
500                     569.87    0.236                       4.322
750                     517.43    0.696                       69.239
Finally, the test was repeated in a threadsafe configuration with the feature toggle set to true in order to use the improved CICS TS V5.5 implementation. The performance data for this test is detailed in Table 9-12.
Table 9-12 Performance data for a threadsafe configuration using V5.5 default implementation
Number of containers
ETR
CPU per transaction
(ms)
Response time
(ms)
10
570.54
0.002
0.523
100
570.50
0.018
1.144
250
570.19
0.055
2.248
500
569.97
0.113
3.207
750
565.04
0.244
10.917
The results of the threadsafe configuration are summarized for CPU time in Figure 9-9.
Figure 9-9 Summary of CPU per transaction for threadsafe configuration
The results of the threadsafe configuration are summarized for response time in Figure 9-10.
Figure 9-10 Summary of response time for threadsafe configuration
9.10.2 Containers performance summary
When using very small numbers of containers per transaction, the CPU consumed was equivalent regardless of internal implementation for both non-threadsafe and threadsafe configurations. Response times increased by a very small amount using the CICS TS V5.5 implementation. However, this increase is not expected to be significant in a real-world application. In all test scenarios with more than 10 containers, both CPU and response times were improved when using the CICS TS V5.5 default implementation.
Scenarios with more than 750 containers were tested; however, for clarity their results are not included in this document. A maximum of 9,000 containers per transaction was tested, and the results continued the trend that is demonstrated by the data in the preceding tables.
9.11 Threadsafe Coupling Facility Data Tables
As described in 9.3.2, “Coupling Facility Data Tables” on page 206, access to Coupling Facility Data Tables (CFDTs) is now threadsafe. Performance tests were executed comparing CICS TS V5.4 and CICS TS V5.5. The primary goal was to ensure that performance did not degrade when upgrading to the newest release, with the secondary goal to demonstrate the improved throughput available when using the threadsafe APIs.
9.11.1 CFDT performance test configuration
The threadsafe VSAM workload described in 3.7, “File control workload” on page 29 was used to validate the performance of CFDTs. Two files with record lengths of 64 bytes were defined as CFDTs, and each specified a value of LOCKING for the UPDATEMODEL attribute. One CICS region was used, and the workload was driven as described in 2.4, “Driving the workload” on page 16 using 1,000 simulated terminals.
The ratio of transactions used was as follows:
70% read only
30% update
Two files were defined, each with a record length of 64 bytes. Every transaction accessed 50 records in one of the defined files, which produced an average of 65 File Control requests per transaction with the following mix (a JCICS sketch of the dominant access pattern follows the list):
EXEC CICS READ - 54%
EXEC CICS READ UPDATE - 23%
EXEC CICS REWRITE - 23%
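The following JCICS sketch shows the read-for-update pattern that dominates this request mix. It is illustrative only: the FILE name is hypothetical, and in the benchmark the FILE definition maps to a coupling facility data table with UPDATEMODEL(LOCKING). With the threadsafe CFDT support in CICS TS V5.5, this sequence can run entirely on an open TCB.

import com.ibm.cics.server.CicsConditionException;
import com.ibm.cics.server.KSDS;
import com.ibm.cics.server.RecordHolder;

public class CfdtUpdater {
    // Read a 64-byte record for update and rewrite it.
    public static void update(byte[] key, byte[] newRecord) throws CicsConditionException {
        KSDS file = new KSDS();
        file.setName("CFDTFIL1");          // hypothetical FILE name that maps to a CFDT
        RecordHolder holder = new RecordHolder();
        file.readForUpdate(key, holder);   // record lock is acquired (UPDATEMODEL(LOCKING))
        file.rewrite(newRecord);           // rewrite the 64-byte record
    }
}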
Eight dedicated CPs were configured on the performance measurement LPAR, plus two dedicated CPs were configured in the Coupling Facility (CF). The LPAR and the CF were connected by using ICP links.
9.11.2 Non-threadsafe CFDT application performance results
The application was configured to run only on the QR TCB. Table 9-13 on page 217 presents the performance results when running the non-threadsafe application in a CICS TS V5.4 environment.
Table 9-13 CICS TS V5.4 results for non-threadsafe CFDT workload
ETR        CICS CPU    CPU per transaction (ms)    Response time (ms)
798.12     36.11%      0.452                       1.312
1597.61    71.95%      0.450                       2.963
2204.22    99.20%      0.450                       131.748
The same non-threadsafe application test was executed using CICS TS V5.5 and the results are presented in Table 9-14.
Table 9-14 CICS TS V5.5 results for non-threadsafe CFDT workload
ETR        CICS CPU    CPU per transaction (ms)    Response time (ms)
798.09     36.21%      0.454                       1.281
1597.65    72.06%      0.451                       2.954
2203.78    99.32%      0.451                       131.900
The CPU results for each CICS release are summarized in Figure 9-11.
Figure 9-11 Plot of CICS TS V5.4 and V5.5 results for CFDT workload in non-threadsafe configuration
When running the benchmark in a non-threadsafe configuration, the throughput of the CICS region was limited by the capacity of the QR TCB. At peak throughput, the QR TCB was dispatched for more than 299.95 seconds of the 5-minute measurement interval (> 99.9%). From the data, it can be seen that the CPU per transaction for the CICS TS V5.4 and CICS TS V5.5 releases is equivalent, with the total CPU scaling linearly up to the throughput limit of approximately 2,200 transactions per second. Response times in both configurations increased dramatically as the transaction rate, and hence the QR TCB utilization, increased.
9.11.3 Threadsafe CFDT application performance results
The application program was configured with the CONCURRENCY attribute set to the value THREADSAFE. In both releases, the application started on an L8 TCB. In CICS TS V5.4 CFDT access is not threadsafe. Therefore, execution switches to the QR TCB at the time of first CFDT access and remains there until task termination. In CICS TS V5.5 CFDT access is threadsafe. Therefore, execution continues on the L8 TCB until the application writes a completion message to the terminal. See Chapter 4, “Open transaction environment” on page 31 for a description of the TCB switching process.
Table 9-15 presents the performance results when running the application in a threadsafe configuration in CICS TS V5.4.
Table 9-15 CICS TS V5.4 results for threadsafe CFDT workload
ETR        CICS CPU    CPU per transaction (ms)    Response time (ms)
798.09     36.92%      0.463                       1.474
1597.80    73.63%      0.461                       2.440
2399.82    110.10%     0.459                       93.556
2399.14    110.14%     0.459                       243.851
2398.59    110.16%     0.459                       293.903
The performance results for the CICS TS V5.5 release with a threadsafe configuration are presented in Table 9-16.
Table 9-16 CICS TS V5.5 results for threadsafe CFDT workload
ETR        CICS CPU    CPU per transaction (ms)    Response time (ms)
798.14     45.73%      0.573                       0.883
1598.78    104.65%     0.655                       1.295
3079.04    217.79%     0.707                       1.881
5708.62    443.08%     0.776                       3.285
7936.61    624.47%     0.787                       4.641
The total CPU performance results are plotted in Figure 9-12 on page 220.
Figure 9-12 Plot of CICS TS V5.4 and V5.5 results for CFDT workload in threadsafe configuration
The response time data is plotted in Figure 9-13 on page 220.
Figure 9-13 CICS TS V5.4 and V5.5 response time results for threadsafe CFDT workload
9.11.4 CFDT performance results summary
From the chart in Figure 9-12 on page 220 the following can be observed:
As noted for the non-threadsafe configuration, the requirement to switch to the QR TCB for CFDT requests in the CICS TS V5.4 release limits peak throughput to around 2,400 transactions per second. This QR TCB contention is removed in the CICS TS V5.5 release.
Total CPU consumed by the CICS region scales linearly in CICS TS V5.4 up to the throughput limit.
Total CPU consumed by the CICS region scales linearly in CICS TS V5.5 up to the tested throughput limit. This testing limit was arbitrary: no CICS constraint existed at this point, and throughput could have increased further. Using this workload, a rate of 8,000 transactions per second correlates to over 500,000 CICS File Control requests per second in a single CICS region.
The CPU cost per transaction in the CICS TS V5.5 release is greater than in the CICS TS V5.4 release. This is due to the significantly increased number of concurrent TCBs executing within the z/OS LPAR. Therefore, the CPU per transaction results cannot be directly compared. Section 9.11.2, “Non-threadsafe CFDT application performance results” on page 217 demonstrated that on a like-for-like per-API call basis, the two CICS releases provided equivalent performance.
The chart in Figure 9-13 on page 220 demonstrates the significant response time improvements that CICS TS V5.5 can provide for threadsafe workloads that access CFDTs. Non-threadsafe CFDT access in CICS TS V5.4 causes QR TCB saturation and extended response times, whereas the removal of the QR TCB constraint in CICS TS V5.5 provides better response times with greater scalability.
9.12 CICS policy rules
The behavior of CICS can be controlled during run time, based on predefined policies. CICS performs the action that is defined for a policy rule when all the conditions that are specified by the rule are met.
Policies define the action that CICS is to take when one of the following conditions is met:
A CICS user task makes excessive use of system resources; for example, a user task consumes too much storage.
A CICS system or user task changes the state of a system resource; for example, a FILE resource is closed.
The overall system health changes; for example, the number of active tasks exceeds the maximum user tasks in the CICS system (the MXT value).
A condition and action pair make up a policy rule, and one or more policy rules can be defined within a policy. A policy is defined in a CICS bundle and a CICS bundle can consist of one or more policies. For more information on CICS policies, see the “CICS policies” topic in IBM Knowledge Center at this website:
This section looks at the performance overhead of enabling policy task rules when running a standard performance benchmark application.
9.12.1 Policy task rules overhead performance study
The standard DSW static routing workload described in 3.2.1, “DSW static routing” on page 22 was used for the performance study. The benchmark was executed twice: once with no policies installed, and again with a set of 19 policies installed. Combined, the installed policies applied a task rule for every threshold type that is supported by CICS TS V5.5.
The aim of this performance study is to measure the overhead of enabling task rules, not the overhead of executing the action. To ensure that no actions were invoked, all rules were coded with threshold values that would never be exceeded by the application. The implementation of the benchmark is such that not all rules can be triggered. For example, the DSW application does not use IBM Db2, so the task rules for EXEC SQL commands are never used. The DSW application exercises at least the following rules:
EXEC CICS requests
 – Total number of EXEC CICS requests issued by the application
File requests
 – DELETE
 – READ
 – READNEXT
 – READ UPDATE
 – REWRITE
 – STARTBR
 – WRITE
Program requests
 – LINK commands
Start requests
 – START commands
Storage allocation
 – Task 24-bit storage
 – Task 31-bit storage
 – Shared 24-bit storage
 – Shared 31-bit storage
Storage requests
 – Task 24-bit storage
 – Task 31-bit storage
 – Shared 24-bit storage
 – Shared 31-bit storage
TD queue requests
 – READQ
 – WRITEQ
Time
 – CPU time
 – Elapsed time
TS queue bytes
 – WRITEQ all TS queue bytes
 – WRITEQ auxiliary TS queue bytes
TS queue requests
 – WRITEQ all TS queue requests
 – WRITEQ auxiliary TS queue requests
RMF was used to obtain the transaction rate and CPU cost for the whole CICS region. Using this data, the average CPU per transaction value can be calculated. Table 9-17 lists the performance results for the configuration where no policies were installed.
Table 9-17 Results for DSW static routing workload with no policies installed
ETR        CICS CPU    CPU per transaction (ms)
4181.98    75.48%      0.180
4948.22    88.90%      0.180
6063.40    107.03%     0.177
6610.11    116.12%     0.176
7165.88    125.24%     0.175
The performance results for the configuration where all policies were installed are shown in Table 9-18.
Table 9-18 Results for DSW static routing workload with all 19 policies installed
ETR        CICS CPU    CPU per transaction (ms)
4176.96    75.36%      0.180
4932.58    88.96%      0.180
6057.89    107.33%     0.177
6602.11    115.99%     0.176
7176.31    126.33%     0.176
Figure 9-14 Plot of CICS TS V5.5 performance data with and without policy task rules installed
9.12.2 Policy task rules overhead performance summary
The performance data shows that there is no measurable overhead when using policy task rules to monitor user tasks. The data that is presented in Table 9-17 on page 223 and Table 9-18 on page 223 demonstrates that the CPU per transaction is equivalent within measurable limits. The data also shows that CICS continues to scale linearly as the transaction rate increases.
9.13 Encrypted zFS file systems
In z/OS V2.3, zFS added support for encrypting file system data by using DFSMS access method encryption. This section presents the results of a performance test that investigated the overhead of using an encrypted zFS file system for a CICS workload. A development build of CICS TS V5.5 was used when testing encrypted zFS file system support; however, any release of CICS that uses the zFS file system can use this functionality.
For more information on encrypting zFS file system data, see the “Encrypting and compressing zFS file system data” topic in IBM Knowledge Center at this website:
9.13.1 zFS file system encryption performance comparison
The WebSphere Liberty workload described in 3.4, “WebSphere Liberty servlet with JDBC and JCICS access” on page 26 was used as the benchmark application. To generate significant quantities of zFS data, CICS tracing was enabled with the SJ domain trace level set to ALL. For both the encryption disabled and the encryption enabled configurations, the transaction rate was sustained at approximately 1,650 requests per second. Enabling this level of trace at the given workload request rate resulted in approximately 30 MB of data being written to zFS per second. Where enabled, the zFS file system used AES-256 encryption.
The performance data that is obtained during the test is listed in Table 9-19. The CPU per request is separated into that consumed by the CICS address space and that consumed by the ZFS address space. The overall zIIP eligibility is also presented for comparison.
Table 9-19 CPU cost comparison when enabling encryption for a zFS file system
Encryption    CICS CPU per request (µs)    ZFS CPU per request (µs)    zIIP eligibility
Disabled      2913.44                      17.28                       72.1%
Enabled       2933.98                      20.18                       72.4%
The CPU per request data that is presented in Table 9-19 is summarized in the chart in Figure 9-15.
Figure 9-15 Summary of performance data for encrypted zFS
The chart in Figure 9-15 demonstrates that the CPU consumed by the ZFS address space is a very small fraction of the overall CPU consumption per request. To more clearly demonstrate the difference in CPU attributed to the ZFS address space when enabling zFS encryption, only the ZFS address space data is plotted in Figure 9-16 on page 225.
Figure 9-16 Summary of performance data for the ZFS address space when enabling zFS encryption
9.13.2 zFS file system encryption performance summary
The total CPU overhead for this workload when writing encrypted data to zFS is very small: approximately 23 µs per request. Although the ZFS address space showed a significant relative increase in CPU cost per request (+17%), the overall total cost to the workload was negligible. As observed in Table 9-19 on page 224, the overall zIIP eligibility of the workload remained unchanged.
The use of zFS encrypted file systems is fully supported in a CICS environment and the CPU overhead is expected to be negligible in a full production workload.
9.14 Multiple Liberty JVM servers in a single CICS region
CICS TS V5.5 introduced the ability to run multiple Liberty JVM servers in a single CICS region. It is no longer necessary to disable angel process security with the (deprecated) JVM server option WLP_ZOS_PLATFORM=FALSE to achieve multi-tenancy of JVM servers in a single region.
 
Note: APAR PI98174 enables the ability to run multiple Liberty JVM servers in a single CICS region for CICS TS V5.4.
This section examines the performance and storage characteristics of running multiple Liberty JVM servers within a single CICS region when compared with JVM servers across multiple CICS regions.
9.14.1 Shared libraries support
The shared library region is a z/OS USS feature that enables address spaces to share dynamic link library (DLL) files. This feature enables CICS regions to share the DLLs that are needed for JVMs, rather than each region loading them individually. A CICS address space utilizes shared library support if any of the USS processes or JVM servers within the CICS region enable shared library support. In CICS TS V5.4 and earlier this feature was enabled by default, but in CICS TS V5.5 (with APAR PH09400) it must be enabled using the _BPXK_DISABLE_SHLIB=NO parameter in the JVM profile.
Using shared libraries support can reduce the amount of real storage that is used by z/OS and the time it takes for the regions to load the DLL files. The disadvantage of using shared libraries is that any address space that uses this feature reserves an area of 31-bit virtual storage that is equal in size to the value of the z/OS SHRLIBRGNSIZE parameter, which is likely to increase the virtual storage footprint of each region.
 
Note: When shared libraries are enabled, the full size of the 31-bit area is allocated, regardless of the utilization achieved by an individual address space. Therefore, it is important to adjust the SHRLIBRGNSIZE parameter to accommodate all the libraries, but to avoid over-allocation, which wastes 31-bit virtual storage.
For more information on the use of the shared library region in JVM servers within a CICS environment, see the “Tuning the z/OS shared library region” topic in IBM Knowledge Center at this website:
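For illustration only, enabling shared library support for a JVM server involves settings in two places; the region size value shown here is an example, not a recommendation:
 – JVM profile: _BPXK_DISABLE_SHLIB=NO
 – BPXPRMxx parmlib member: SHRLIBRGNSIZE(67108864)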
9.14.2 Multiple Liberty JVM servers performance workload configuration
The hardware that is used for the benchmarks is described in 9.1, “Introduction” on page 198. The measurement LPAR was configured with three GCPs and three zIIPs running in SMT mode 1, which resulted in an LSPR equivalent processor of 3906-706. The measurement LPAR had 16 GB of real storage allocated.
The measurement LPAR was running z/OS V2.3 and a development build of CICS TS V5.5. Java 8.0 SR5 was used with the following options:
RMODE64 enabled
From z/OS V2.3, 64-bit residency mode for applications (RMODE64) is enabled by default. This feature allows the JIT to allocate executable code caches above the 2 GB memory bar.
Compressed references enabled
The IBM Java SDK for z/OS can use compressed references on 64-bit platforms to decrease the size of Java objects and make more effective use of the available space. The result is less frequent garbage collection and improved memory cache utilization.
Shared libraries disabled
See section 9.14.1, “Shared libraries support” on page 226 for a discussion of shared libraries.
All CICS regions set the MXT parameter to 150 and all JVM servers specified the value of 64 for the THREADLIMIT parameter.
The application used was the standard servlet workload as described in 3.4, “WebSphere Liberty servlet with JDBC and JCICS access” on page 26. CICS Liberty security was enabled and all Liberty JVM servers were connected to the same WebSphere Liberty angel process. The workload was driven through HTTP requests by using IBM Workload Simulator for z/OS, as described in 2.4, “Driving the workload” on page 16. The workload used 1,000 simulated web browsers, each supplying a username and password via HTTP basic authentication.
The application was cloned to produce five versions that can be deployed in separate Liberty JVM servers that used different TCP/IP ports. The configurations tested were:
1. One CICS region with one Liberty JVM server
2. One CICS region with three Liberty JVM servers
3. Three CICS regions each with one Liberty JVM server
4. One CICS region with five Liberty JVM servers
5. Five CICS regions each with one Liberty JVM server
9.14.3 Comparing CPU costs per request and maximum throughput
The workload was run in all five configurations. The total CPU cost and number of transactions completed was obtained by using IBM Resource Measurement Facility (RMF). Using this data, the CPU per request and the throughput rate was calculated. z/OS storage information was obtained using CICS MVS TCB statistics data.
Table 9-20 on page 228 presents the CPU per request and total throughput data for each of the five configurations.
Table 9-20 CPU per request comparison for multiple JVM server configurations


Scenario                                         Not zIIP-eligible        zIIP-eligible            Throughput
                                                 CPU per request (ms)     CPU per request (ms)     (requests per sec)
1 CICS region with 1 Liberty JVM server          0.74                     0.75                     3,245
1 CICS region with 3 Liberty JVM servers         1.10                     0.93                     2,766
3 CICS regions each with 1 Liberty JVM server    0.99                     1.05                     2,804
1 CICS region with 5 Liberty JVM servers         1.14                     0.96                     2,695
5 CICS regions each with 1 Liberty JVM server    1.02                     1.07                     2,736
The CPU per request data from Table 9-20 is presented in Figure 9-17, separated into non-zIIP-eligible and zIIP-eligible components.
Figure 9-17 Chart showing CPU per request comparison for multiple JVM server configurations
It can be seen from Figure 9-17 that the lowest CPU cost per request was provided by a single JVM server in a single CICS region (configuration 1). This lower cost is because the JVM in configuration 1 processes more requests than each individual JVM used in configurations 2 through 5. The more requests that are processed by a JVM, the more effectively the JIT compiler can optimize the code path, resulting in a lower CPU cost per request.
When running at very high CPU utilization with multiple JVM servers in a single CICS region, there are a large number of TCBs active in the CICS address space. This causes increased z/OS dispatcher activity, which slightly reduces the zIIP eligibility by reducing the zIIP lazy switching benefit. At lower throughput rates with lower CPU utilization — which is more likely in a customer production system — the zIIP eligibility was seen to be similar for all configurations investigated.
zIIP lazy switching is described in the IBM Systems Magazine article Understanding zIIP Usage in CICS:
As an example of the additional JIT optimization, at the end of the test the JVM in configuration 1 (one CICS region with one Liberty JVM server) had optimized a total of 7,079 Java methods. Conversely, one of the JVMs in configuration 5 (five CICS regions each with one Liberty JVM server) had optimized only 1,362 Java methods.
The total throughput data from Table 9-20 is presented in Figure 9-18 on page 229.
Figure 9-18 Chart showing total throughput comparison for multiple JVM server configurations
During all test scenarios, the LPAR was 97% busy and therefore the throughput was limited by the CPU cost per request.
9.14.4 Comparing 31-bit memory usage
The amount of 31-bit storage used was collected from the CICS MVS TCB statistics data and is summarized in Figure 9-19. Where a configuration used multiple CICS regions, the chart presents the average amount of 31-bit storage that was used per CICS region.
Figure 9-19 Summary of 31-bit storage used per CICS region for multiple JVM server configurations
The amount of CICS TCB storage per CICS region is related to the number of concurrent tasks and TCBs used. To restrict the number of concurrent TCBs in a CICS region for a Java workload, use the THREADLIMIT attribute of the JVMSERVER resource definition.
The amount of non-CICS TCB storage that is used per CICS region is related to the number of JVM servers. As documented in 9.14.2, “Multiple Liberty JVM servers performance workload configuration” on page 226, this test used compressed references. Disabling of compressed references reduces the amount of 31-bit storage that is used, at the expense of some CPU and 64-bit storage usage.
Each configuration that only had one JVM per CICS region shows very similar 31-bit storage usage. Where multiple JVM servers are configured per CICS region, the increased storage use is a result of each JVM having its own private copy of runtime data. These copies are mostly held in non-CICS TCB storage.
In contrast to Figure 9-19 on page 230 that presented the storage that is used per CICS region, the chart in Figure 9-20 summarizes the total 31-bit storage that is used across all CICS regions in a given configuration.
Figure 9-20 Summary of total 31-bit storage used for multiple JVM server configurations
The data in Figure 9-20 demonstrates the storage savings that are achieved by using multiple JVM servers in a single CICS region, compared to using one JVM server in multiple CICS regions.
9.14.5 Comparing 64-bit memory usage
To achieve the required concurrency, the JVM used in configuration 1 (one CICS region with one Liberty JVM server) specified a heap size of 1000 MB. All other JVMs specified a heap size of 200 MB.
The amount of 64-bit storage that was used was collected from the CICS storage statistics data and is summarized in Figure 9-21 on page 232. Where a configuration used multiple CICS regions, the chart presents the average amount of 64-bit storage that is used per CICS region. The shared class cache is held in a z/OS shared memory object. A shared class cache will be included in the ‘Bytes Allocated Shared Memory Objects’ data that is reported in the CICS storage overview statistics report, but does not count toward the MEMLIMIT of the CICS region.
Figure 9-21 Summary of 64-bit storage used per CICS region for multiple JVM server configurations
As expected, the amount of 64-bit storage that is used per CICS region is related to the number of JVM servers configured. Each JVM server requires its own copy of 64-bit runtime data areas including heap and JIT caches.
The total 31-bit storage used was presented in Figure 9-20 on page 231 and Figure 9-22 presents a similar view of total 64-bit storage usage across all CICS regions.
Figure 9-22 Summary of total 64-bit storage used for multiple JVM server configurations
Section 9.14.4, “Comparing 31-bit memory usage” on page 229 demonstrated how multiple JVM servers in a single CICS region reduce the overall 31-bit storage that is used, and Figure 9-22 shows that this is also true for 64-bit storage usage.
9.14.6 Multiple Liberty JVM servers performance conclusion
The most efficient configuration is a single large JVM server when you consider the following:
CPU costs
Multiple JVM servers may not JIT methods to the same level of optimization as a more frequently used single JVM server.
Multiple JVM servers will probably use more CICS T8 TCBs and will each have their own set of JVM-related TCBs (such as JIT and GC helpers). Management of these additional TCBs introduces an extra CPU overhead.
Throughput
The increased cost per request of using multiple JVM servers means that the maximum throughput, when all CPU resource is consumed, is lower than for a single JVM server configuration.
Memory
The 31-bit memory usage per CICS region significantly increased when using multiple JVM servers.
The use of 31-bit memory can be minimized by using Java 8.0 SR5 with z/OS v2.3 to place JIT code caches in 64-bit memory. JIT data caches are always in 64-bit memory.
Restricting the number of CICS T8 TCBs by specifying low values for the JVMSERVER THREADLIMIT attribute reduces ECDSA use (each CICS TCB requires 28 KB of kernel stack storage).
Using uncompressed references with the JVM profile setting -Xnocompressedrefs moves all Java class data to 64-bit memory.
As described in 9.14.1, “Shared libraries support” on page 226, the use of shared libraries has an impact on the amount of 31-bit storage that is allocated. The size of the shared library area is controlled by the z/OS SHRLIBRGNSIZE parameter.
This study does not report on response times. However, no significant difference was observed across all of the configurations measured.
Although a single large JVM server can provide the best performance, this does not provide high availability or application separation. When you deploy applications, consideration should also be given to the following requirements:
Protection against the failure of an individual JVM server
Protection against the failure of an individual CICS region
Protection against the failure of a z/OS LPAR
The ability to apply maintenance to the application, CICS or z/OS
9.15 Liberty JVM server and application startup times
After enabling a JVMSERVER resource, the Liberty environment and hosted applications require a finite amount of time to start. This section looks at minimizing this startup time by using shared class cache. This section also examines the time that is taken to start multiple Liberty JVM servers in a single CICS region.
The time taken for a CICS Liberty JVM server to start is measured from enabling the JVMSERVER resource until the CWWKF0011I message is emitted. The time taken for a Liberty application to start is reported in the CWWKZ0001I message.
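For illustration, the application startup time can be extracted from the Liberty messages.log by scanning for the CWWKZ0001I message, which includes the number of seconds the application took to start. The log path and the parsing approach in this sketch are assumptions; they are not part of any CICS-supplied tooling.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppStartupTime {
    // CWWKZ0001I: Application <name> started in <n> seconds.
    private static final Pattern STARTED =
            Pattern.compile("CWWKZ0001I: Application (\\S+) started in ([0-9.]+) seconds");

    public static void main(String[] args) throws IOException {
        // Hypothetical messages.log location for a CICS Liberty JVM server
        String log = "/var/cics/wlp/usr/servers/defaultServer/logs/messages.log";
        for (String line : Files.readAllLines(Paths.get(log))) {
            Matcher m = STARTED.matcher(line);
            if (m.find()) {
                System.out.printf("%s started in %s seconds%n", m.group(1), m.group(2));
            }
        }
    }
}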
9.15.1 Startup times with shared class cache
The class sharing feature offers the transparent and dynamic sharing of data between multiple JVMs. When enabled, JVMs use shared memory to obtain and store data, including loaded classes, Ahead-Of-Time (AOT) compiled code, commonly used UTF-8 strings, and Java Archive (JAR) file indexes. For more information, see the “Class data sharing” topic in IBM Knowledge Center at this website:
Using class data sharing, the time that is required to start a CICS Liberty JVM server and Liberty applications within this server can be reduced. The use of the -Xtune:virtualized JVM option further improves JVM and application startup time. For more information, see the “-Xtune:virtualized” topic in IBM Knowledge Center at this website:
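As an illustration, the shared class cache and the virtualized tuning option can be enabled in the JVM profile of a CICS JVM server with options similar to the following; the cache name and size are examples only, not recommendations:
-Xshareclasses:name=cicsts55,nonfatal
-Xscmx256M
-Xtune:virtualized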
The Liberty JVM server and application startup times are presented for each of four shared class cache configurations in Table 9-21.
Table 9-21 Summary of startup timings with varying shared cache class configurations

Configuration                                       Liberty startup time (s)    Application startup time (s)
No class cache                                      9.726                       0.973
Class cache (first use)                             10.427                      1.147
Class cache (second use)                            3.939                       0.414
Class cache (second use) with -Xtune:virtualized    3.603                       0.378
The time taken for the Liberty JVM server to start is plotted in Figure 9-23.
Figure 9-23 Summary of Liberty JVM server startup time with varying class cache configurations
The time taken for the application to start is plotted in Figure 9-24 on page 235.
Figure 9-24 Summary of application startup time with varying class cache configurations
It can be seen that the first use of a shared class cache slightly increases startup times for both the Liberty JVM server and any applications. However, subsequent starts are significantly improved with shared class cache enabled. The use of the -Xtune:virtualized option slightly reduces startup times in addition to the benefits of a shared class cache.
9.15.2 Application startup times with multiple JVM servers
This section examines application startup times for the configurations that are described in 9.14, “Multiple Liberty JVM servers in a single CICS region” on page 226. The time taken to start each application was recorded, first when running five JVM servers in one CICS region, and then when running one JVM server in each of five CICS regions.
For all JVM configurations, the shared class cache was enabled and had already been populated before the test. The -Xtune:virtualized option was also specified. The time taken to start an application in each instance of a JVM is plotted in the chart in Figure 9-25 on page 236.
Figure 9-25 Summary of application startup times with single and multiple CICS regions
The results plotted in Figure 9-25 show that there is no significant difference in application startup time using one JVM server in multiple CICS regions, when compared to using multiple JVM servers in a single CICS region.
 