CICS TS for z/OS V5.3
The CICS Transaction Server for z/OS (CICS TS) V5.3 release introduces a significant number of performance improvements. Included in the CICS TS V5.3 performance report are the following subject areas:
Key performance benchmarks that are presented as a comparison against the CICS TS V5.2 release.
An outline of improvements made regarding the threadsafe characteristics of the CICS run time.
Details of the changes that are made to performance-critical CICS initialization parameters, and the effect of these updates.
Description of all the updated statistics and monitoring fields.
Benchmarks that document improvements in XML and JavaScript Object Notation (JSON) web services.
A description of how CICS can protect itself from unconstrained resource demand from inbound HTTP requests.
High-level views of new functionality that was introduced in the CICS TS V5.3 release, including performance benchmark results where appropriate.
7.1 Introduction
When the results were compiled for this chapter, the workloads were run on an IBM z13® model NE1 (machine type 2964). A maximum of 32 dedicated central processors (CPs) were available on the measured logical partition (LPAR), with a maximum of 4 dedicated CPs available to the LPAR that was used to simulate users. These LPARs are configured as part of a Parallel Sysplex. An internal coupling facility was co-located on the same central processor complex (CPC) as the measurement and driving LPARs. They were connected by using internal coupling peer (ICP) links. An IBM System Storage DS8870 (machine type 2424) was used to provide external storage.
This chapter presents the results of several performance benchmarks when run in a CICS TS for z/OS V5.3 environment. Unless otherwise stated in the results, the CICS V5.3 environment was the code that was available at general availability (GA) time. Several of the performance benchmarks are presented in the context of a comparison against CICS TS V5.2. The CICS TS V5.2 environment contained all PTFs that were issued before 10 March 2015. All LPARs used z/OS V2.1.
For more information about performance terms that are used in this chapter, see Chapter 1, “Performance terminology” on page 3. For more information about the test methodology that was used, see Chapter 2, “Test methodology” on page 11. For more information about the workloads that were used, see Chapter 3, “Workload descriptions” on page 21.
Where reference is made to an LSPR processor equivalent, the indicated machine type and model can be found in the Large Systems Performance Reference (LSPR) document. For more information about obtaining and using LSPR data, see 1.3, “Large Systems Performance Reference” on page 6.
7.2 Release-to-release comparisons
This section describes some of the results from a selection of regression workloads that are used to benchmark development releases of CICS TS. For more information about the use of regression workloads, see Chapter 3, “Workload descriptions” on page 21.
7.2.1 Data Systems Workload dynamic routing
The Data Systems Workload (DSW) dynamic routing workload is used in 7.6, “Low-level CICS optimizations” on page 118 to demonstrate several performance benefits that combine to reduce the overall CPU cost per transaction. That section also includes a comparison between CICS TS V5.2 and CICS TS V5.3 performance for this workload.
7.2.2 RTW threadsafe
This section presents the performance figures for the threadsafe variant of the Relational Transactional Workload (RTW), as described in 3.3, “Relational Transactional Workload” on page 25.
Table 7-1 lists the results of the RTW threadsafe workload that uses the CICS TS V5.2 release. Table 7-2 lists the same figures for the CICS TS V5.3 release.
Table 7-1 Performance results for CICS TS V5.2 with RTW threadsafe workload
ETR        CICS CPU    CPU per transaction (ms)
333.49     45.83%      1.374
499.64     68.29%      1.367
713.32     98.79%      1.385
996.24     138.84%     1.394
1241.42    173.42%     1.397
Table 7-2 Performance results for CICS TS V5.3 with RTW threadsafe workload
ETR        CICS CPU    CPU per transaction (ms)
334.12     46.29%      1.385
500.50     69.16%      1.382
714.30     98.77%      1.383
997.32     139.06%     1.394
1242.71    175.74%     1.414
The average CPU per transaction figure for CICS TS V5.2 is calculated to be 1.383 ms. The CICS TS V5.3 figure is calculated to be 1.392 ms. The difference between these two figures is 0.6%, which is within our measurement accuracy of ±1%; therefore, the performance of the two releases is considered to be equivalent.
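These averages are derived from the five measurement points in each table: (1.374 + 1.367 + 1.385 + 1.394 + 1.397) ÷ 5 = 1.383 ms for V5.2 and (1.385 + 1.382 + 1.383 + 1.394 + 1.414) ÷ 5 = 1.392 ms for V5.3, giving a relative difference of (1.392 - 1.383) ÷ 1.383 ≈ 0.6%.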
These figures are shown in Figure 7-1.
Figure 7-1 Plot of CICS TS V5.2 and V5.3 performance results for RTW threadsafe workload
As shown in Figure 7-1 on page 111, the lines are straight, which indicates linear scaling as transaction throughput increases. The lines also are overlaid, which indicates equivalent performance when the releases are compared.
7.3 Improvements in threadsafety
All new CICS API commands in CICS V5.3 are threadsafe. Also, some system programming interface (SPI) commands were made threadsafe in this release, and some specific functional areas were improved to reduce task control block (TCB) switches.
7.3.1 Threadsafe API and SPI commands
The following new CICS API commands are threadsafe:
REQUEST PASSTICKET
CHANNEL commands:
 – DELETE CHANNEL
 – QUERY CHANNEL
The WRITE OPERATOR CICS API command was made threadsafe.
For more information about CICS API commands, see the “CICS command summary” topic in IBM Knowledge Center at this website:
The following CICS SPI commands were made threadsafe:
INQUIRE RRMS
INQUIRE STORAGE
INQUIRE STREAMNAME
INQUIRE SUBPOOL
INQUIRE TASK LIST
INQUIRE TSPOOL
INQUIRE UOWENQ
PERFORM SECURITY REBUILD
PERFORM SSL REBUILD
ENQMODEL commands:
 – INQUIRE ENQMODEL
 – SET ENQMODEL
 – DISCARD ENQMODEL
JOURNALMODEL commands:
 – INQUIRE JOURNALMODEL
 – DISCARD JOURNALMODEL
JOURNALNAME commands:
 – INQUIRE JOURNALNAME
 – SET JOURNALNAME
 – DISCARD JOURNALNAME
TCLASS commands:
 – INQUIRE TCLASS
 – SET TCLASS
TCP/IP commands:
 – INQUIRE TCPIP
 – SET TCPIP
TCPIPSERVICE commands:
 – INQUIRE TCPIPSERVICE
 – SET TCPIPSERVICE
 – DISCARD TCPIPSERVICE
TDQUEUE commands:
 – INQUIRE TDQUEUE
 – SET TDQUEUE
 – DISCARD TDQUEUE
TRANCLASS commands:
 – INQUIRE TRANCLASS
 – SET TRANCLASS
 – DISCARD TRANCLASS
TSMODEL commands:
 – INQUIRE TSMODEL
 – DISCARD TSMODEL
TSQUEUE / TSQNAME commands:
 – INQUIRE TSQUEUE / TSQNAME
 – SET TSQUEUE / TSQNAME
UOW commands:
 – INQUIRE UOW
 – SET UOW
WEB commands:
 – INQUIRE WEB
 – SET WEB
For more information about CICS SPI commands, see the “System commands” topic in IBM Knowledge Center at this website:
7.3.2 Optimizations for SSL support
Several TCB switches were removed for inbound requests that use SSL. For more information about this and other improvements in CICS web support, see IBM CICS Performance Series: Web Services Performance in CICS TS V5.3, REDP-5322, which is available at this website:
7.3.3 Offloading authentication requests to open TCBs
RACF APAR OA43999 introduced the Enhanced Password Algorithm, which applies to z/OS V1.12, V1.13, and V2.1. This RACF APAR implements the following support:
Accept more special characters within passwords
Allow stronger encryption of passwords
Define users with a password phrase and no password
Expire a password without changing it
Clean up password history
For more information about the new function APAR, see the following IBM support website:
If this support is installed, CICS calls the new callable service IRRSPW00 for password authentication. This service is used for the following authentication operations:
Basic authentication requests
EXEC CICS VERIFY PASSWORD API command
EXEC CICS VERIFY PHRASE API command
EXEC CICS SIGNON API command
The IRRSPW00 service runs on an open TCB: if the task is not already running on one, CICS switches to an L8 TCB. This behavior reduces contention on the resource-owning (RO) TCB.
 
Note: The ability to perform authentication requests on an open TCB was also made available to CICS TS V4.2 in APAR PI21865, and CICS TS V5.1 and V5.2 in APAR PI21866.
7.4 Changes to system initialization parameters
Several performance-related CICS system initialization (SIT) parameters were changed in the CICS TS V5.3 release. This section describes changes to the SIT parameters that have the most effect on CICS performance. All comparisons to previous limits or default values refer to CICS TS V5.2.
7.4.1 Storage protection (STGPROT)
Storage protection (SIT parameter STGPROT) is now enabled by default. For more information about storage protection, see the “The storage protection global option” topic in IBM Knowledge Center at this website:
7.4.2 Internal trace table size (TRTABSZ)
The default size for the internal trace table (SIT parameter TRTABSZ) increased to 12 MB. For more information about the internal trace facility, see the “Internal trace table” topic in IBM Knowledge Center at this website:
Storage for the internal trace table is allocated outside of any CICS DSA. In CICS releases since CICS TS V4.2, the internal trace table is allocated in 64-bit virtual storage.
7.5 Enhanced instrumentation
The CICS TS V5.3 release continues the expansion of information that is reported by the CICS monitoring and statistics component. This section describes the extra fields that are now available in the CICS statistics SMF records.
For more information about changes in monitoring fields across a range of CICS releases, see the “Changes to CICS monitoring” topic in IBM Knowledge Center at this website:
7.5.1 The DFHCICS performance group
The number of named counter server GET requests (field NCGETCT) field was added to the DFHCICS performance group. This field shows the total number of requests to a named counter server to satisfy EXEC CICS GET COUNTER and EXEC CICS GET DCOUNTER API commands that are issued by the user task.
For more information about counters that are available in the DFHCICS performance group, see the “Performance data in group DFHCICS” topic in IBM Knowledge Center at this website:
7.5.2 The DFHTASK performance group
The dispatcher allocate pthread wait time (field DSAPTHWT) field was added to the DFHTASK performance group. This field shows the elapsed time that the transaction waited for a WebSphere Liberty pthread to be allocated during links to WebSphere Liberty programs.
For more information about counters that are available in the DFHTASK performance group, see the “Performance data in group DFHTASK” topic in IBM Knowledge Center at this website:
7.5.3 The DFHTEMP performance group
The following fields were added to the DFHTEMP performance group:
Number of shared temporary storage GET operations (field TSGETSCT)
Number of temporary storage GET requests from shared temporary storage that are issued by the user task.
Number of shared temporary storage PUT operations (field TSPUTSCT)
Number of temporary storage PUT requests to shared temporary storage that are issued by the user task.
The total temporary storage operations (field TSTOTCT) field in the DFHTEMP performance group was updated. This field is the sum of the temporary storage read queue (TSGETCT), read queue shared (TSGETSCT), write queue auxiliary (TSPUTACT), write queue main (TSPUTMCT), write queue shared (TSPUTSCT), and delete queue requests that are issued by the user task.
For more information about counters that are available in the DFHTEMP performance group, see the “Performance data in group DFHTEMP” topic in IBM Knowledge Center at this website:
7.5.4 The DFHWEBB performance group
The following fields were added to the DFHWEBB performance group:
JSON request body length (field WBJSNRQL)
For JSON web service applications, the JSON message request length.
JSON response body length (field WBJSNRPL)
For JSON web service applications, the JSON message response length.
For more information about counters that are available in the DFHWEBB performance group, see the “Performance data in group DFHWEBB” topic in IBM Knowledge Center at this website:
7.5.5 Monitoring domain global statistics
The following fields were added to the collected monitoring domain statistics:
Total transaction CPU time (field MNGCPUT)
The total transaction CPU time that is accumulated for the CICS dispatcher managed TCB modes that are used by the transactions that completed during the interval.
Total transaction CPU time on CP (field MNGTONCP)
The total transaction CPU time on a standard processor that is accumulated by the CICS dispatcher managed TCB modes that are used by the transactions that completed during the interval.
Total transaction CPU offload on CP (field MNGOFLCP)
The total transaction CPU time that was accumulated on a standard processor, but was eligible for offload to a specialty processor (zIIP or zAAP), for the CICS dispatcher managed TCB modes that are used by the transactions that completed during the interval.
A sample DFHSTUP report that contains the new fields is shown in Example 7-1.
Example 7-1 Sample CICS TS V5.3 DFHSTUP monitoring domain global statistics report fragment
Average user transaction resp time. . : 00:00:00.001256
Peak user transaction resp time . . . : 00:00:00.061583
Peak user transaction resp time at. . : 11/24/2015 22:25:58.7568
Total transaction CPU time. . . . . . : 00:00:14.192698
Total transaction CPU time on CP. . . : 00:00:14.192698
Total transaction CPU offload on CP . : 00:00:00.000000
For more information about monitoring domain statistics, see the “Monitoring domain: global statistics” topic in IBM Knowledge Center at this website:
7.5.6 TCP/IP global statistics
The following fields were added to TCP/IP global statistics:
Performance tuning for HTTP connections (field SOG_SOTUNING)
Indicates whether performance tuning for HTTP connections occurs.
Socket listener has paused listening for HTTP connections (field SOG_PAUSING_HTTP_LISTENING)
Indicates whether the listener paused listening for HTTP connection requests because the number of tasks in the region reached the limit for accepting new HTTP connection requests.
Number of times socket listener notified at task accept limit (field SOG_TIMES_AT_ACCEPT_LIMIT)
The number of times the listener was notified that the number of tasks in the region reached the limit for accepting new HTTP connection requests.
Last time socket listener paused listening for HTTP connections (field SOG_TIME_LAST_PAUSED_HTTP_LISTENING)
The last time the socket listener paused listening for HTTP connection requests because the number of tasks in the region reached the limit for accepting new HTTP connection requests.
Region stopping HTTP connection persistence (field SOG_STOPPING_PERSISTENCE)
Indicates whether the region is stopping HTTP connection persistence because the number of tasks in the region exceeded the limit.
Number of times region stopped HTTP connection persistence (field SOG_TIMES_STOPPED_PERSISTENT)
The number of times the region took action to stop HTTP connection persistence because the number of tasks in the region exceeded the limit.
Last time stopped HTTP connection persistence (field SOG_TIME_LAST_STOPPED_PERSISTENT)
The last time the region took action to stop HTTP connection persistence because the number of tasks in the region exceeded the limit.
Number of persistent connections made non-persistent (field SOG_TIMES_MADE_NON_PERSISTENT)
The number of times a persistent HTTP connection was made non-persistent because the number of tasks in the region exceeded the limit.
Number of times disconnected an HTTP connection at max uses (field SOG_TIMES_CONN_DISC_AT_MAX)
The number of times a persistent HTTP connection was disconnected because the number of uses exceeded the limit.
For more information about performance tuning for HTTP connections and a sample DFHSTUP report, see 7.13, “HTTP flow control” on page 140. For more information about TCP/IP global statistics, see the “TCP/IP: Global statistics” topic in IBM Knowledge Center at this website:
7.5.7 URIMAP global statistics
The direct attach count (field WBG_URIMAP_DIRECT_ATTACH) field was added to URIMAP global statistics. This field shows the number of requests that are processed by a directly attached user task.
The direct attach count statistics field was added in support of the web optimizations, as described in 7.7, “Web support and web service optimization” on page 122. For more information about URIMAP global statistics, see the “URIMAP definitions: Global statistics” topic in IBM Knowledge Center at this website:
7.6 Low-level CICS optimizations
The CICS TS V5.3 release includes the following low-level optimizations that can provide a performance benefit to many workloads:
Use of the store clock fast (STCKF) hardware instruction that was introduced by the IBM System z9 processor.
Storage alignment of some key CICS control blocks to improve the interaction between the CICS TS run time and the hardware cache subsystem.
Use of hardware instructions to pre-fetch data into the processor cache, which reduces the number of CPU cycles that are wasted while waiting for data.
A reduction in lock contention through tuning the CICS Monitoring Facility algorithms.
More efficient algorithms that are used for multiregion operation (MRO) session management.
More tuning of other internal procedures.
These improvements in efficiency have particular benefit for CICS trace, CICS monitoring, and for MRO connections that have high session counts.
The remainder of this section describes the results of performance benchmarks that use the DSW workload. For this performance benchmark, two TOR regions were configured to dynamically route transactions to four AOR regions by using CICSPlex System Manager. Each AOR function shipped file control requests to an FOR, where VSAM data is accessed in Local Shared Resources (LSR) mode. For more information about the workload, see 3.2, “Data Systems Workload” on page 22.
The following configurations were tested to show the relative benefits of the improvements in each of the monitoring, trace, and MRO session management components:
Monitoring and trace enabled
Monitoring disabled, trace enabled
Monitoring enabled, trace disabled
Monitoring and trace disabled
Monitoring and trace disabled with low numbers of MRO sessions
Comparisons are made between CICS TS V5.2 and CICS TS V5.3.
7.6.1 Monitoring and trace enabled
For this scenario, performance class monitoring was enabled by using MN=ON and MNPER=ON. Internal trace was enabled with INTTR=ON. All other trace-related SIT parameters used their default values. Figure 7-2 shows the benchmark results for this configuration that uses CICS TS V5.2 and V5.3.
Figure 7-2 DSW performance results with monitoring and trace enabled
The average CPU per transaction for CICS TS V5.2 was 0.702 ms, and the equivalent value for V5.3 was 0.643 ms. For this workload, a reduction of 0.059 ms per transaction represents a decrease of 8%.
The straight lines in the plot indicate that both configurations scale linearly as the transaction rate increases.
7.6.2 Monitoring disabled, trace enabled
This scenario extends the scenario that is described in 7.6.1, “Monitoring and trace enabled” on page 119 by disabling performance class monitoring. Performance class monitoring was disabled by using the SIT parameter MN=OFF. Internal trace was enabled by using INTTR=ON, and all other trace-related SIT parameters used their default values. Figure 7-3 on page 120 shows the results of the benchmark for CICS TS V5.2 and V5.3.
Figure 7-3 DSW performance results with monitoring disabled and trace enabled
Average CPU per transaction for CICS TS V5.2 was 0.625 ms, and the equivalent value for V5.3 was 0.593 ms. A reduction of 0.032 ms per transaction represents a decrease of 5% for this workload.
7.6.3 Monitoring enabled, trace disabled
In this scenario, the configuration is a mirror of the scenario that is described in 7.6.2, “Monitoring disabled, trace enabled” on page 119. Performance class monitoring was enabled by using MN=ON and MNPER=ON. Internal trace was disabled with INTTR=OFF and all other trace-related SIT parameters used their default values. Figure 7-4 shows the benchmark results for CICS TS V5.2 and V5.3.
Figure 7-4 DSW performance results with monitoring enabled and trace disabled
Average CPU per transaction for CICS TS V5.2 was 0.486 ms, and the equivalent value for V5.3 was 0.440 ms. A reduction of 0.046 ms per transaction represents a decrease of 9% for this workload.
7.6.4 Monitoring and trace disabled
In this scenario, performance class monitoring and trace were disabled. Performance class monitoring was disabled by using MN=OFF. Internal trace was disabled by setting INTTR=OFF and all other trace-related SIT parameters used their default values. Figure 7-5 shows the benchmark results for CICS TS V5.2 and V5.3.
Figure 7-5 DSW performance results with monitoring and trace disabled
Average CPU per transaction for CICS TS V5.2 was 0.447 ms, and the equivalent value for V5.3 was 0.428 ms. A reduction of 0.019 ms per transaction represents a decrease of 4% for this workload.
7.6.5 Monitoring and trace disabled with low numbers of MRO sessions
The final scenario isolates the performance improvements in CICS that are not directly related to monitoring, trace, or MRO session management. Performance class monitoring was disabled by using MN=OFF. Internal trace was disabled with INTTR=OFF and all other trace-related SIT parameters used their default values. All MRO connections were configured to have a minimal number of sessions defined. Figure 7-6 on page 122 shows the benchmark results for CICS TS V5.2 and V5.3.
Figure 7-6 DSW performance results with monitoring and trace disabled and low session count
Average CPU per transaction for CICS TS V5.2 was 0.438 ms, and the equivalent value for V5.3 was 0.431 ms. A reduction of 0.007 ms per transaction represents a decrease of 2% for this workload.
7.6.6 Low-level CICS optimizations conclusions
Each scenario demonstrated a reduction in CPU usage per transaction for this workload. Where a workload uses any combination of performance class monitoring, trace, or many MRO sessions, the benefits that are realized in CICS V5.3 can be significant.
Even workloads that do not use these facilities can achieve a reduction in CPU use, as described in 7.6.5, “Monitoring and trace disabled with low numbers of MRO sessions” on page 121.
7.7 Web support and web service optimization
In CICS TS V5.3, the pipeline processing of HTTP requests is streamlined so that an intermediate web attach task (CWXN transaction) is no longer required in most situations. Removing the intermediate web attach task reduces CPU and memory overheads for most types of SOAP and JSON-based HTTP CICS web services.
The socket listener task (CSOL transaction) is optimized to attach user transactions directly for fast-arriving HTTP requests. The web attach task is bypassed, which reduces the CPU time that is required to process each request.
There also is a benefit for inbound HTTPS requests, where SSL support is provided by the Application Transparent Transport Layer Security (AT-TLS) feature of IBM z/OS Communications Server. In CICS, TCPIPSERVICE resources define the association between ports and CICS services, including CICS web support. These resources can be configured as AT-TLS aware and obtain security information from AT-TLS.
Performance is also improved for HTTPS requests where SSL support is provided by CICS. Although these requests still require the CWXN transaction, the number of TCB change mode operations was reduced.
For more information about the CPU savings that were achieved for an HTTP web services workload in several configuration scenarios, see IBM CICS Performance Series: Web Services Performance in CICS TS V5.3, REDP-5322, which is available at this website:
7.8 Java workloads
Optimizations to the thread and TCB management mechanisms in CICS TS V5.3 provide a benefit to Java applications that are hosted in OSGi JVM servers and WebSphere Liberty JVM servers.
This section presents a comparison between CICS TS V5.2 and V5.3 when Java workloads are run.
7.8.1 Java workload configuration
The hardware and software that was used for the benchmarks is described in 7.1, “Introduction” on page 110. The measurement LPAR was configured with three GCPs and one zIIP, which resulted in an LSPR equivalent processor of 2964-704. The driving LPAR was configured with three GCPs, which resulted in an LSPR equivalent processor of 2964-703.
To minimize variance in the performance results that might be introduced by the Just-In-Time compiler (JIT), the workload was run at a constant transaction rate for 20 minutes to provide a warm-up period. The request rate was increased every 5 minutes, with the mean CPU usage per request calculated by using the final minute of data from the 5-minute interval. CPU usage data was collected by using IBM z/OS Resource Measurement Facility (RMF).
All configurations used a single CICS region with one installed JVMSERVER resource with a configured maximum of 25 threads. CICS TS V5.2 and CICS TS V5.3 used Java 7.1 SR3 (64-bit) and IBM WebSphere Application Server Liberty V8.5.5.7.
 
Note: IBM WebSphere Application Server Liberty V8.5.5.7 support for CICS V5.1 and V5.2 is provided by CICS APAR PI50345.
For database access, all workload configurations accessed DB2 V10 by using the JDBC type 2 driver.
7.8.2 Java servlet workload
The Java servlet application is hosted in a CICS JVM server that uses the embedded WebSphere Liberty server. The workload is driven through HTTP requests by using IBM Workload Simulator for z/OS, as described in section 2.4, “Driving the workload” on page 16. The servlet application accesses VSAM data by using the JCICS API and accesses DB2 by using the JDBC API. For more information about the workload, see 3.4, “WebSphere Liberty servlet with JDBC and JCICS access” on page 26.
Both configurations used the following JVM options:
-Xgcpolicy:gencon
-Xcompressedheap
-XXnosuballoc32bitmem
-Xmx200M
-Xms200M
-Xmnx60M
-Xmns60M
-Xmox140M
-Xmos140M
The results of the benchmark are shown in Figure 7-7.
Figure 7-7 Comparing overall CPU utilization for Java servlet workload with CICS TS V5.2 and V5.3
As shown in Figure 7-7, the new thread management mechanism in CICS WebSphere Liberty provides reduced CPU costs and improved scalability characteristics, with V5.3 maintaining cost per request to higher request rates than V5.2.
The chart in Figure 7-8 presents the same data as Figure 7-7 on page 124, but broken into usage that is non-eligible for offload and usage that is eligible for offload to a zIIP engine.
Figure 7-8 Comparing offload-eligible CPU utilization for Java workload with CICS TS V5.2 and V5.3
The chart in Figure 7-8 shows better scalability for the non-eligible component of the CPU usage. The chart also shows that the overall reduction in CPU usage that is shown in Figure 7-7 on page 124 is achieved by reducing the amount of zIIP-eligible CPU.
7.8.3 Java OSGi workload
The Java OSGi workload is composed of several applications and is described in 3.5, “Java OSGi workload” on page 27. The CICS TS V5.2 and CICS TS V5.3 configurations both used the following JVM options:
-Xgcpolicy:gencon
-Xcompressedheap
-XXnosuballoc32bitmem
-Xmx100M
-Xms100M
-Xmnx70M
-Xmns70M
-Xmox30M
-Xmos30M
The benchmark results are shown in Figure 7-9.
Figure 7-9 Comparing overall CPU utilization for Java OSGi workload with CICS TS V5.2 and V5.3
The chart in Figure 7-9 shows a slight reduction in overall CPU usage per transaction because of the improved TCB management.
The chart in Figure 7-10 shows the same data as Figure 7-9, but broken into usage that is non-eligible for offload and usage that is eligible for offload to a zIIP engine.
Figure 7-10 Comparing offload-eligible CPU utilization for OSGi workload with CICS TS V5.2 and V5.3
Both configurations scale well, with the ratio of eligible to non-eligible work remaining consistent between the V5.2 and V5.3 releases.
7.9 Java 8 performance
Every new release of Java provides more scope for performance improvements, the magnitude of which depends on the application. This section describes the effects of varying the Java release within a CICS environment for various workloads.
A JVM server in CICS TS for z/OS V5.3 can use Java 7.0, Java 7.1, or Java 8 as the runtime environment. A single CICS region can host multiple JVM server instances, with a different Java runtime version used in each instance.
7.9.1 Improvements in Java 7.0, Java 7.1, and Java 8
Java 7.0 uses hardware instructions that were introduced in the IBM zEnterprise 196 (z196) and the IBM zEnterprise EC12 (zEC12) machines. When running on a zEC12, the JVM also uses the new transactional memory capabilities of the hardware.
Java 7.1 extends the zEC12 exploitation by using technologies, such as IBM zEnterprise Data Compression (zEDC) for zip acceleration. Java 7.1 SR3 introduces improved zIIP-offload characteristics, which can reduce the cost of Java applications in CICS.
Java 8 introduces the use of hardware instructions that were introduced in the IBM z13 machine. Also used are technologies, such as single instruction multiple data (SIMD) instructions and improved cryptographic performance that uses Crypto Express5S and CP Assist for Cryptographic Function (CPACF).
The IBM Java Crypto Engine (JCE) in Java 8 SR1 automatically detects and uses an on-core hardware cryptographic accelerator that is available through the CPACF. It also uses the SIMD vector engine that is available in the IBM z13 to provide industry-leading security performance. CPACF instructions are used to accelerate the following cryptographic functions:
Symmetric key algorithms (AES, 3DES, and DES with CBC, CFB, and OFB modes)
Hashing (SHA1 and SHA2)
Optimized routines accelerate the popular P256 NIST Elliptic Curve (ECC) Public Key Agreement. SIMD instructions are used in these routines to further enhance performance.
Java 8 SR2 also introduces the same improved zIIP-offload characteristics as seen in Java 7.1 SR3.
7.9.2 Java performance benchmarks in CICS
The following workloads were used to examine the behavior of Java applications in a CICS environment:
An OSGi JVM server with a mixture of applications that use JDBC and JCICS calls to access DB2, VSAM data, and CICS temporary storage
A WebSphere Liberty servlet application that uses JDBC and JCICS calls to access DB2 and VSAM data
A WebSphere Liberty JSON-based web service that uses z/OS Connect
For performance testing, the following Java runtime environment levels were used:
Java 7.0 SR9
Java 7.1 SR3
Java 8 SR2
7.9.3 Java 8 and OSGi applications
This workload uses the configuration as described in 7.8.3, “Java OSGi workload” on page 125. Several applications provide a mixture of operations, including JDBC access, VSAM access, string manipulation, and mathematical operations.
Figure 7-11 shows the average cost per transaction for each of the Java versions under test when the mixed OSGi application workload is run.
Figure 7-11 Comparing Java versions for OSGi JVM server workload
The chart shows a slight improvement in zIIP eligibility in Java 7.1 when compared to Java 7.0, but with no reduction in overall CPU per transaction.
Java 8 improves the Java 7.1 benchmark result by reducing the overall cost per transaction (from 1.40 ms to 1.29 ms) and reducing the amount of non-eligible CPU (from 0.73 ms to 0.68 ms). The improvements in the Java 8 environment are achieved by improvements to the JIT compiler and Java class library changes.
7.9.4 Java 8 and WebSphere Liberty servlet applications
This workload uses the configuration as described in 3.4, “WebSphere Liberty servlet with JDBC and JCICS access” on page 26. In all, 200 simulated web clients accessed the Java application at a rate of approximately 2,500 requests per second.
Figure 7-12 on page 129 shows the cost per request for each of the Java 7.0, Java 7.1, and Java 8 run times when the CICS WebSphere Liberty servlet application is run.
Figure 7-12 Comparing Java versions for JDBC and JCICS servlet workload
No significant differences in total CPU per request are observed for this workload when comparing Java 7.0, Java 7.1, and Java 8. zIIP eligibility is slightly improved when Java 8 is used.
7.9.5 Java 8 and z/OS Connect applications
The z/OS Connect application that is described in 7.12, “z/OS Connect for CICS” on page 134 was used to compare the effects of the supported Java versions. A small JSON request and response was used, which contained 32 bytes of user data for each HTTP flow. The data was transmitted by using SSL with persistent connections.
The results of the benchmark comparing the three Java versions are shown in Figure 7-13.
Figure 7-13 Comparison of Java versions for a z/OS Connect workload
Java 7.1 provides a reduction in overall CPU per request by reducing the amount of non-eligible CPU that is used.
Java 8 further improves on the Java 7.1 result through a reduction in non-eligible and overall CPU cost for each request. The use of persistent SSL connections means that most of the performance improvements are achieved because of the increased AES performance.
As the transmitted document size increases, the SSL payload size also increases, and the performance benefit of Java 8 over Java 7.0 or Java 7.1 increases accordingly.
7.10 Simultaneous multithreading with Java workloads
The zIIP processors in a z13 system can run up to two threads simultaneously on a single core while sharing certain processor resources, such as execution units and caches. This capability is known as simultaneous multithreading (SMT). The use of SMT to run two threads concurrently is known as SMT mode 2.
This section describes SMT, the methods that are used to measure the effectiveness of the technology, and the results of a Java benchmark in CICS to demonstrate the increased capacity that is available when SMT is enabled.
7.10.1 Introduction to SMT
SMT technology allows instructions from more than one thread to run in any pipeline stage at a time. Each thread has its own unique state information, such as the program status word (PSW) and registers. Threads that run simultaneously cannot always run their instructions immediately; at times, they must compete for certain core resources that are shared between the threads. In other cases, the threads can use shared resources that are not under contention.
Generally, SMT mode 2 can run more threads over the same period on a single core. This increased core usage leads to greater core capacity and a higher throughput of work. Figure 7-14 shows how SMT increases the capacity of a single core by enabling the simultaneous running of two threads.
Figure 7-14 Demonstrating increased capacity by enabling SMT mode 2
Although each of the threads that are shown in Figure 7-14 on page 130 can take longer to run, the capability of SMT to run both simultaneously means that more threads can complete during a specific period, which increases the overall thread execution rate of a single core. Running more threads in a specific time increases the system throughput.
For more information about SMT, see IBM z13 Technical Guide, SG24-8251, which is available at this website:
7.10.2 Measuring SMT performance
IBM z/OS RMF fully supports the extra performance information that is available when operating in SMT mode 2.
The IIP service times that are found in an RMF Workload Activity report are normalized by an SMT capacity factor (CF) when zIIP processors are in SMT mode 2. The CF is the ratio of work performed with SMT mode 2 enabled, when compared to SMT disabled. The normalization process reflects the increased ability of a zIIP in SMT mode 2 to perform more work.
RMF provides key metrics in the Multi-threading Analysis section of a CPU Activity report when zIIP processors are in SMT mode 2. The following terms are used when describing the workload performance:
MAX CF reports the maximum CF: The ratio of the maximum amount of work the zIIPs performed with multithreading enabled compared to disabled.
The MAX CF value can be 0.0 - 2.0, with typical values 1.1 - 1.4.
CF reports the average capacity factor: The ratio of the average amount of work the zIIPs performed with multithreading enabled compared to disabled.
The CF value can be 0.0 - 2.0, with typical values 1.0 - 1.4.
AVG TD reports the average thread density: The average number of running threads while the core is busy.
The AVG TD value can be 1.0 - 2.0.
Figure 7-15 shows an extract of an RMF CPU Activity report. The average CF for the zIIP processors is highlighted for use in a later calculation.
Figure 7-15 Extract of RMF CPU Activity report
In the IBM z13 hardware, SMT mode 2 is available for zIIP processors only; therefore, the MODE and CF values for general CPs are always 1.
Figure 7-16 shows an extract of an RMF Workload Activity report. The IIP service time and IIP APPL% figures are highlighted for use in a later calculation.
Figure 7-16 Extract of RMF Workload Activity report
The APPL% IIP value is the amount of actual zIIP resource used. The APPL% IIP value is not normalized and shows how busy the processors are. The LPAR that was used for this benchmark was configured with three dedicated CPs and two dedicated zIIPs. Therefore, the maximum value for APPL% CP is 300%, and the maximum value for APPL% IIP is 200%.
The SERVICE TIME IIP value is the normalized zIIP time, factored by the CF. Note the relationship between SERVICE TIME IIP and APPL% IIP in the following equation:
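Based on the RMF definitions of the capacity factor and normalized service time, the relationship can be approximated in the following general form:
APPL% IIP ≈ (SERVICE TIME IIP ÷ CF) ÷ (RMF interval length in seconds) × 100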
The reports that are shown in Figure 7-15 on page 131 and Figure 7-16 are extracted from an RMF report with an interval of 60 seconds; therefore, the highlighted values that are shown in Figure 7-15 on page 131 and Figure 7-16 can be used in the previous equation to calculate the APPL% IIP value.
The slight discrepancy between the calculated and reported APPL% IIP values is because other values in the report are rounded.
7.10.3 CICS throughput improvement
A z/OS Connect workload was used to demonstrate the change in CPU utilization when running Java in CICS with SMT mode 2 disabled and enabled. Figure 7-17 on page 133 shows a comparison of a Java-based workload when running with the two SMT configurations.
Figure 7-17 Comparing a z/OS Connect workload with SMT mode disabled and SMT mode 2
The chart that is shown in Figure 7-17 plots the sum of the APPL% CP and APPL% IIP values from the RMF Workload Activity report.
Comparing the SMT-1 total and SMT-2 total lines, it can be seen that the total CPU cost is lower with SMT mode 2 enabled and the maximum throughput is increased.
The plot lines that show the amount of work that was not eligible to be offloaded to a System z Integrated Information Processor (zIIP) remain constant between the two configurations. The performance benefits are achieved through increased zIIP capacity.
7.11 Reporting of CPU time to z/OS Workload Manager
Mobile Workload Pricing is an IBM Software Pricing Option that was announced in May 2014. It offers a discount on MSUs consumed by transactions that originated on a mobile device. To use this discount, customers need a process that is agreed upon by IBM to identify (tag and track) their mobile-sourced transactions and their use.
Before CICS TS V5.3, the identification and accumulation of CPU time for certain transaction types required CICS Performance class monitoring to be active. The collection of high-volume SMF data in a production environment can introduce significant overhead.
z/OS Workload Manager (WLM) APAR OA47042 introduces enhancements to simplify the identification and reporting of mobile-sourced transactions and their processor consumption. For more information about updates to WLM, see the following APAR website:
The associated APAR OA48466 is available for IBM z/OS RMF, which provides support for the new WLM function that is provided by APAR OA47042. For more information about the updates to RMF, see the following APAR website:
The CICS TS V5.3 release introduces support for the new functions that were introduced by WLM APAR OA47042. CPU time is reported to WLM on a per-transaction basis, which enables a granular approach to transaction CPU tracking without the requirement for CMF data.
No configuration changes are required in CICS to use the WLM updates. CPU information is reported to WLM if CICS detects that the WLM support is installed, with no other CPU overhead in the CICS region.
7.12 z/OS Connect for CICS
IBM z/OS Connect is software that enables systems that run on z/OS to better participate in today’s mobile computing environment. z/OS Connect for CICS enables CICS programs to be called with a JSON interface.
z/OS Connect is distributed with CICS to enable connectivity, such as between mobile devices and CICS programs. The CICS embedded version of z/OS Connect is a set of capabilities that are used to enable CICS programs as JSON web services. z/OS Connect is an alternative to the JSON capabilities of the Java-based pipeline. The two technologies are broadly equivalent. Most JSON web services can be redeployed from one environment to the other without application or WSBind file changes. However, the URI and security configuration can be different in each environment.
7.12.1 CICS TS V5.3 performance enhancement
A significant performance enhancement in the CICS TS V5.3 release is the introduction of a JSON parser that is implemented in native (non-Java) code.
The parser implementation that is used by CICS is controlled by the java_parser attribute of the provider_pipeline_json XML element in the pipeline configuration file. For more information about the provider_pipeline_json element, see the “The <provider_pipeline_json> element” topic in IBM Knowledge Center at this website:
A sample XML configuration file for the z/OS Connect pipeline handler is supplied in the following location relative to the CICS installation root directory:
./samples/pipelines/jsonzosconnectprovider.xml
Example 7-2 shows a pipeline file that uses the CICS JVMSERVER resource that is named DFHWLP and specifies the use of the native parser.
Example 7-2 Sample pipeline configuration file that specifies the native parser implementation
<provider_pipeline_json java_parser="no">
<jvmserver>DFHWLP</jvmserver>
</provider_pipeline_json>
This section provides a performance comparison when various JSON request and response sizes for the Java and native parser implementations are used. In all configurations, SSL was used.
The methodology and applications that were used to produce the performance test results for z/OS Connect in CICS were similar to the methodology and applications that were used when testing the JSON support in CICS TS V5.2. For more information, see 6.7, “JSON support” on page 95. To expand the workload, an extra request and response size of 64 KB was added.
7.12.2 Varying payload sizes by using Java parser
By using z/OS Connect for CICS with the default Java parser, CPU usage was measured for a range of payload sizes. The CPU cost per request is shown in Figure 7-18 for a range of request and response size combinations. Total CPU cost per request is broken into non-zIIP-eligible and zIIP-eligible components.
Figure 7-18 CPU comparison for various request and response payloads by using the Java parser
It is clear that the CPU cost per request depends on the size of the JSON documents that were received or transmitted. A significant fraction of the CPU cost incurred for larger JSON documents is zIIP-eligible.
7.12.3 Comparing Java and native parsers
By using a medium-sized JSON request and response, the CPU usage was compared for the Java and native parsers. The scenario used a 4 KB request and 4 KB response. The result of this comparison is shown in Figure 7-19. As per Figure 7-18 on page 135, the CPU usage is broken into non-zIIP-eligible and zIIP-eligible components.
Figure 7-19 Comparing Java and native parsers for a medium-sized request and response
The chart in Figure 7-19 shows that for a medium-size request and response, the overall CPU cost per request is reduced with the native parser. Use of the native parser slightly increases the amount of non-zIIP-eligible CPU time from 0.33 ms to 0.39 ms per request.
7.12.4 Comparing Java and native parsers with varying request sizes
Extending the test scenario that is described in 7.12.3, “Comparing Java and native parsers” on page 136, various request sizes were used. Each request returns a response of 32 bytes. The following request sizes were tested:
32 bytes (as shown in Figure 7-20 on page 137)
4 KB (as shown in Figure 7-21 on page 137)
64 KB (as shown in Figure 7-22 on page 138)
Figure 7-20 Comparing Java and native parsers for 32-byte request with 32-byte response
For the 32-byte request with 32-byte response scenario, there is no significant difference in CPU usage or zIIP eligibility between the Java and native parsers.
Figure 7-21 Comparing Java and native parsers for 4 KB request with 32-byte response
For the 4 KB request with 32-byte response scenario, the use of the native parser results in a reduction in total CPU per request, from 1.41 ms to 1.09 ms. However, the native parser uses slightly more non-zIIP-eligible CPU time, which increases from 0.29 ms to 0.33 ms.
Figure 7-22 Comparing Java and native parsers for 64 KB request with 32-byte response
The 64 KB request with 32-byte response scenario shows a significant reduction in total CPU usage per request. Overall CPU usage per request reduces from 18.62 ms to 13.31 ms, but non-zIIP-eligible usage increases from 0.41 ms to 1.39 ms per request.
The charts that are shown in Figure 7-20 on page 137, Figure 7-21 on page 137, and Figure 7-22 show that the native parser reduces overall CPU usage for each JSON request. As expected, the largest performance gains are realized with large request sizes. Also as expected, the native parser uses more non-zIIP-eligible CPU than the Java parser.
7.12.5 Comparing Java and native parsers with varying response sizes
The benchmark was further modified such that various response sizes were used. Each request was 64 KB and the performance of the Java and native parsers were compared. The following response sizes were tested:
32 bytes (as shown in Figure 7-22 on page 138)
4 KB (as shown in Figure 7-23 on page 139)
64 KB (as shown in Figure 7-24 on page 139)
Unlike the scenarios that are described in 7.12.4, “Comparing Java and native parsers with varying request sizes” on page 136, this section describes scenarios in which the response sizes were modified. With an invariant request size, the benefit of the native parser is expected to remain constant across all scenarios because the parser operates on the incoming request only.
Performance results for a 64 KB request with 32-byte response are described in 7.12.4, “Comparing Java and native parsers with varying request sizes” on page 136. Figure 7-22 on page 138 shows the native parser reducing the overall CPU by 5.31 ms, but increasing the non-zIIP-eligible CPU by 0.98 ms per request.
Figure 7-23 Comparing Java and native parsers for 64 KB request with a 4 KB response
With a 64 KB request and a 4 KB response, the native parser reduces overall CPU usage by 5.34 ms (as shown in Figure 7-23). The native parser uses 0.98 ms more non-zIIP-eligible CPU.
Figure 7-24 Comparing Java and native parsers for 64 KB request with a 64 KB response
For the 64 KB request and 64 KB response scenario, the native parser again reduces overall CPU usage by 5.11 ms. The native parser again increases non-zIIP-eligible CPU usage by 0.98 ms per request.
7.12.6 Native parser conclusion
The native parser can provide a significant reduction in overall CPU usage per request. The potential reduction in CPU usage is determined by the size of the inbound request. The native parser is not implemented by using Java; therefore, CPU usage by the native parser cannot be offloaded to a specialty engine.
The performance improvements in CICS TS V5.3 apply to the JSON parser component only; therefore, they have no effect on the CPU costs that are involved in producing a JSON response.
7.13 HTTP flow control
CICS TS V5.3 introduces the ability to enable performance tuning for HTTP to protect CICS from unconstrained resource demand. TCP/IP flow control in CICS TS V5.3 is for HTTP connections only. If enabled, it addresses the following situations:
A pacing mechanism that prevents HTTP requests from continuing to be accepted by a CICS region after the region reaches its throughput capacity.
An opportunity to rebalance persistent connections on a periodic basis.
If HTTP flow control is enabled and the region becomes overloaded, CICS temporarily stops listening for new HTTP connection requests. If overloading continues, CICS closes HTTP persistent connections and marks all new HTTP connections as non-persistent. These actions prevent an oversupply of new HTTP work from being received and queued within CICS, and they provide feedback to TCP/IP port sharing and Sysplex Distributor. This behavior promotes a balanced sharing of workload with other regions that share the IP endpoint and allows the overloaded CICS region to recover more quickly.
7.13.1 Server accept efficiency fraction
CICS HTTP flow control is implemented by queuing new HTTP connection requests in the TCP/IP socket backlog. Queuing requests in the TCP/IP socket backlog affects the server accept efficiency fraction (SEF).
 
Note: When ports that are managed by CICS are used, it is the CICS address space that is accepting connections; therefore, CICS is the server application (in IBM z/OS Communications Server terminology).
SEF is a measure (calculated at intervals of approximately 1 minute) of the efficiency of the server application in accepting new connection setup requests and managing its backlog queue. The SEF value is reported as a percentage. A value of 100% indicates that the server application is successfully accepting all its new connection setup requests. A value of 0% indicates that the server application is not responding to new connection set up requests. The SEF field is only available for a connection that is in listen state.
The netstat command can display the SEF value for an IP socket. This command is available in the TSO and z/OS UNIX System Services environments. The following sample commands produce the same output when inquiring about the state of port 4025:
TSO environment
NETSTAT ALL (PORT 4025
z/OS UNIX System Services environment
netstat -A -P 4025
Example 7-3 shows a fragment of the output that is produced by the netstat command.
Example 7-3 Sample fragment of the output of the netstat command
ReceiveBufferSize: 0000065536 SendBufferSize: 0000065536
ConnectionsIn: 0000008574 ConnectionsDropped: 0000000000
MaximumBacklog: 0000001024 ConnectionFlood: No
CurrentBacklog: 0000000000
ServerBacklog: 0000000000 FRCABacklog: 0000000000
CurrentConnections: 0000001464 SEF: 100
When the SHAREPORTWLM option in a port definition is used, the SEF value is used to modify the IBM Workload Manager for z/OS server-specific weights, which influences how new connection setup requests are distributed to the servers sharing this port.
When the SHAREPORT option in a port definition is used, the SEF value is used to weight the distribution of new connection setup requests among the SHAREPORT servers.
Whether SHAREPORT or SHAREPORTWLM is specified, the SEF value is reported back to the sysplex distributor to be used as part of the target server responsiveness fraction calculation, which influences how new connection setup requests are distributed to the target servers.
For more information about the configuration of ports in IBM z/OS Communications Server, see the “PORT statement” topic in IBM Knowledge Center at this website:
7.13.2 Flow control configuration
The behavior of HTTP flow control is specified by using the new system initialization parameter SOTUNING, which can be set to one of the following values:
YES
Performance tuning for HTTP connections occurs to protect CICS from unconstrained resource demand. YES is the default value.
520
No performance tuning occurs.
 
Note: If sharing IP endpoints, ensure that all regions have the same SOTUNING value or uneven loading might occur.
For more information about the SOTUNING SIT parameter, see the “SOTUNING” topic in IBM Knowledge Center at this website:
7.13.3 Flow control operation
When a CICS region reaches the maximum task limit (MXT), it stops accepting new HTTP connections and incoming requests are queued in the backlog for the TCP/IP socket. When the MXT condition is relieved, CICS starts accepting new connections again.
In the case of persistent connections (that is, connections that were already accepted and remain open), work can continue to be received even after MXT is reached. In this situation, when the number of active transactions in CICS plus the number of queued requests in the IP socket reaches 110% of the MXT value, client connections are actively disconnected to route work away from the overloaded CICS region.
When actively disconnecting clients, the current request is permitted to complete and then the connection is closed. New connection requests are made non-persistent until the region drops below 95% of the MXT value.
In addition to these mechanisms, CICS also disconnects a client connection every 1,000 requests. This disconnection rate gives an opportunity for rebalancing the connection when the client reconnects.
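To make the interaction of these thresholds easier to follow, the following Java fragment sketches the decision logic that is described in this section. It is a schematic illustration only: the class, method, and constant names are invented for this example, and the code is not CICS source code.

public class HttpFlowControlSketch {

    private static final int MXT = 150;                     // maximum task limit (MXT)
    private static final double STOP_PERSISTENCE = 1.10;    // 110% of MXT
    private static final double RESUME_PERSISTENCE = 0.95;  // 95% of MXT
    static final int MAX_USES_PER_CONNECTION = 1000;        // periodic rebalance point

    private boolean listening = true;          // accepting new HTTP connections
    private boolean persistenceAllowed = true; // allowing persistent connections

    // Apply the documented thresholds to a snapshot of the region's load.
    public void evaluate(int activeTasks, int queuedRequests) {
        // At MXT, stop accepting new connections; they queue in the socket backlog.
        listening = activeTasks < MXT;

        if (activeTasks + queuedRequests >= MXT * STOP_PERSISTENCE) {
            // Overloaded: make new connections non-persistent and shed existing ones.
            persistenceAllowed = false;
        } else if (activeTasks < MXT * RESUME_PERSISTENCE) {
            // Load relieved: allow persistent connections again.
            persistenceAllowed = true;
        }
        System.out.printf("listening=%b, persistenceAllowed=%b%n",
                listening, persistenceAllowed);
    }

    public static void main(String[] args) {
        HttpFlowControlSketch region = new HttpFlowControlSketch();
        region.evaluate(150, 20);  // at MXT with queued work: pause and stop persistence
        region.evaluate(130, 0);   // below 95% of MXT: resume normal behavior
    }
}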
7.13.4 CICS statistics enhancements
The CICS TCP/IP global statistics were enhanced to provide information about how incoming work is processed and the effect that flow control has on HTTP requests. Example 7-4 shows an extract of a sample DFHSTUP report for an HTTP workload where flow control is enabled.
Example 7-4 Extract of sample TCP/IP global statistics report produced by CICS TS V5.3 DFHSTUP
Performance tuning for HTTP connections . . . . . . . . . . . . . : Yes
Socket listener has paused listening for HTTP connections . . . . : Yes
Number of times socket listener notified at task accept limit . . : 25672
Last time socket listener paused listening for HTTP connections . : 10/15/2015 11:13:26.3862
Region stopping HTTP connection persistence . . . . . . . . . . . : Yes
Number of times region stopped HTTP connection persistence. . . . : 0
Last time stopped HTTP connection persistence. . . . . . . . . . : --/--/---- --:--:--:----
Number of persistent HTTP connections made non-persistent . . . . : 52554
Number of times disconnected an HTTP connection at max uses . . . : 0
For more information about available CICS statistics fields, see 7.5.6, “TCP/IP global statistics” on page 117.
7.13.5 Comparison of SOTUNING options
Table 7-3 lists CICS statistics reports, comparing a workload that is running in CICS with SOTUNING=YES to the same workload that is running in a CICS system with SOTUNING=520 for the same period. The workload in this case consisted of a simple HTTP web application where each inbound request made a new TCP/IP connection.
Table 7-3 Extract of CICS statistics reported values with SOTUNING=520 and SOTUNING=YES
CICS statistic                         SOTUNING=520       SOTUNING=YES
Number of completed transactions       101,538            105,117
Peak queued transactions               2,193              3
Peak active transactions               150                150
Times stopped accepting new sockets    n/a                26,674
Number of times at MXT limit           1 (continuously)   29,418
CPU used                               62.11 s            60.86 s
CPU per transaction                    0.611 ms           0.578 ms
EDSA used                              121 MB             60 MB
Although this example is an extreme case, it demonstrates that, in terms of CPU and EDSA usage, it is more effective to queue work outside of CICS by preventing new connections from being accepted.
In the SOTUNING=520 case, MXT was reached and the CICS region remained at that level of concurrent tasks for the entire measurement interval. In the SOTUNING=YES case, the CICS region repeatedly dropped in and out of the MXT condition as it stopped new work from arriving and then accepted work again when it dropped below MXT.
7.14 High transaction rate performance study
To demonstrate many of the performance improvements that were introduced in the CICS V5.3 release, a performance study was undertaken to drive a high rate of transactions through a CICS configuration. The study consisted of the following workloads:
The first workload runs on a single z13 LPAR with 18 CPs up to a rate of 174,000 CICS transactions per second.
The second workload runs on a single z13 LPAR with 26 CPs up to a rate of 227,000 CICS transactions per second.
For more information about the full results of this study, see IBM CICS Performance Series: CICS TS V5.3 Benchmark on IBM z13, REDP-5320, which is available at this website:
7.15 WebSphere Liberty zIIP eligibility
As described in 7.8.2, “Java servlet workload” on page 123, a new thread management mechanism was introduced in CICS TS V5.3 for WebSphere Liberty workloads. As part of the CICS continuous delivery strategy, this thread management support is further enhanced by APAR PI54263, which increases the zIIP eligibility of WebSphere Liberty workloads in CICS. This enhancement to CICS TS V5.3 can reduce the total general purpose (GP) CPU consumed for a given workload.
Prior to APAR PI54263, WebSphere Liberty assigned an HTTP request to a T8 TCB by using a thread from its thread pool. Execution on the T8 TCB was then suspended while the CICS transaction context was built for the task on the QR TCB. On completion of the transaction initialization, the T8 TCB resumed, but it resumed in a state that was not eligible for execution on a zIIP. Although zIIP eligibility was regained quickly, the T8 TCB remained on a GP CPU until it was redispatched by z/OS.
APAR PI54263 improves the zIIP eligibility of WebSphere Liberty workloads by allowing the transaction context to be built on a T8 TCB. This approach improves performance by removing the overhead of two TCB switches and by ensuring that the T8 TCB is never suspended or resumed while it is not zIIP eligible.
For more information about the changes in APAR PI54263, see the following article in the CICS Developer Center:
To measure the performance improvements introduced by this change, a WebSphere Liberty workload was executed. You can find details about the performance results in the CICS Developer Center article referenced previously, and Table 7-4 presents a summary.
Table 7-4 Summary of performance improvements introduced by APAR PI54263

                   GP per request (ms)   zIIP per request (ms)   CPU per request (ms)
CICS TS V5.2       0.039                 0.421                   0.460
CICS TS V5.3       0.018                 0.322                   0.340
Table 7-4 shows that the total CPU consumed per request is reduced by 26%, from 0.460 ms to 0.340 ms. The GP CPU time is also reduced, from 0.039 ms to 0.018 ms per request, a reduction of 53%. The overall zIIP eligibility of the workload is increased from 91.6% to 94.7%.
7.16 Link to WebSphere Liberty
A further continuous delivery enhancement to CICS TS V5.3 is provided by APAR PI63005, which enables any CICS program to link to a Java Platform, Enterprise Edition (Java EE) application, running in a WebSphere Liberty JVM server inside CICS.
To be invoked by a CICS program, the Java EE application is required to contain a plain old Java object (POJO), packaged as a web archive (WAR) or enterprise archive (EAR) file. A method in the Java EE application can be made a CICS program by use of the @CICSProgram annotation.
CICS creates the program resources defined by the @CICSProgram annotations when the application is started in a WebSphere Liberty JVM server. The WebSphere Liberty instance must be configured with the feature cicsts:link-1.0.
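The cicsts:link-1.0 feature is enabled in the server.xml configuration file of the WebSphere Liberty JVM server. The following fragment is a minimal sketch that shows only the featureManager element; any other features required by the application are omitted:

<featureManager>
    <feature>cicsts:link-1.0</feature>
</featureManager>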
Data can be passed between non-Java and Java programs using the channels and containers interface; COMMAREA and INPUTMSG are not supported.
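The following code is a minimal sketch of a POJO method that is exposed as a CICS program by using the @CICSProgram annotation and that exchanges data through channel containers. The class name, method name, and error handling are illustrative assumptions; the program and container names match those used by the benchmark in 7.16.1, “Performance comparison”, and the JCICS classes shown are assumed to be available from the com.ibm.cics.server packages:

import com.ibm.cics.server.Channel;
import com.ibm.cics.server.CicsConditionException;
import com.ibm.cics.server.Container;
import com.ibm.cics.server.Task;
import com.ibm.cics.server.invocation.CICSProgram;

public class LinkTargetBean {

    // Exposes this method to CICS as program PROGRAMB.
    @CICSProgram("PROGRAMB")
    public void processLink() {
        try {
            // The channel supplied by the linking program is the current channel.
            Channel channel = Task.getTask().getCurrentChannel();

            // Read the inbound container that was created by the caller.
            Container input = channel.getContainer("CONTAINER1");
            byte[] requestData = input.get();

            // Build the response container that is returned to the caller.
            // Echoing the request length back is illustrative only.
            Container response = channel.createContainer("RESPONSE");
            response.put(("Received " + requestData.length + " bytes").getBytes());
        } catch (CicsConditionException e) {
            throw new RuntimeException(e);
        }
    }
}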
For more details about the Link to WebSphere Liberty functionality, see the topic “Invoking a Java EE application from a CICS program” in IBM Knowledge Center at this website:
7.16.1 Performance comparison
A benchmark was created to understand the relative CPU consumption when a COBOL application links to a program that is implemented in one of the following ways:
COBOL
Java, hosted in a WebSphere Liberty JVM server
Java, hosted in an OSGi JVM server
The benchmark consists of two programs, with the logic flow indicated:
1. PROGRAMA
 – Create a CICS container (CONTAINER1) that is used to pass data to PROGRAMB.
 – Issue an EXEC CICS LINK command to PROGRAMB.
 – Extract and validate the contents of the CICS container RESPONSE.
 – Write a success message.
 – End the transaction.
2. PROGRAMB
 – Read the container, CONTAINER1.
 – Build the container, RESPONSE.
 – Return to the caller.
PROGRAMA was implemented only in COBOL. PROGRAMB was implemented in both COBOL and Java, with two separate Java versions that are packaged for deployment in an OSGi JVM server and in a WebSphere Liberty JVM server. The COBOL version of PROGRAMB was defined with CONCURRENCY(REQUIRED) so that the program executed on an open TCB.
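For illustration, the CONCURRENCY(REQUIRED) setting for the COBOL version corresponds to a CSD definition along the following lines; the group name is a hypothetical placeholder and the remaining attributes take their default values:

DEFINE PROGRAM(PROGRAMB) GROUP(BENCHGRP) LANGUAGE(COBOL) CONCURRENCY(REQUIRED)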
The workload was executed on an IBM z13 with an LPAR configuration equivalent to a 2964-703, with three dedicated general-purpose CPs, and one dedicated zIIP in SMT=1 mode. The workload used z/OS V2.2, CICS TS V5.3 with APAR PI63005, and Java 8.0 SR3.
7.16.2 Link to WebSphere Liberty performance results
RMF was used to accurately measure the CPU cost per transaction. Response time information was obtained using CICS monitoring data. Table 7-5 lists the results of the workload when executed at a steady rate of approximately 5,000 transactions per second.
Table 7-5 CPU per transaction and response time comparison for COBOL and Java implementations

Implementation language                  Total CPU (ms)   zIIP-eligible CPU (ms)   Response time (ms)
COBOL                                    0.0455           0.0000                   1.3130
Java (WebSphere Liberty JVM server)      0.1014           0.0290                   1.3810
Java (OSGi JVM server)                   0.1491           0.0777                   2.0310
The results from Table 7-5 are summarized in the chart in Figure 7-25 on page 146.
Figure 7-25 Plot of link to WebSphere Liberty benchmark results
7.16.3 Link to WebSphere Liberty performance conclusion
The programs used for this study are simple, with little application logic. They demonstrate the difference in CPU cost of the EXEC CICS LINK infrastructure within CICS that enables calls to COBOL or Java CICS programs.
Linking to COBOL is the lowest cost, both in terms of total CPU and of CPU that is not zIIP eligible. Linking to the Java EE version of the application running in a WebSphere Liberty JVM server costs just over twice the total CPU of the COBOL version, and about half of this extra time can be offloaded to run on a zIIP processor. Either version of the Java program costs more total CPU, and also more general processor time, than the COBOL version.
Much of the cost of running Java is zIIP eligible, but profiling analysis has shown that management of the extra TCBs needed in the Java cases requires increased z/OS dispatcher time, and the time spent in z/OS dispatching routines is not zIIP eligible. Using JCICS also generates more calls to the CICS EXEC interface (such as ASSIGN and container management calls), which leads to a further increase in general processor time compared to the COBOL case.
Compared to the OSGi JVM server case, Java and UNIX threads are managed more efficiently in the WebSphere Liberty JVM server case by employing thread reuse techniques. This improved efficiency makes the WebSphere Liberty JVM server case significantly cheaper in terms of total CPU cost than using an OSGi JVM server.
The response times for the COBOL and WebSphere Liberty JVM server versions were similar and CICS suspend time was the main contributor to this time. At the transaction rate used, waiting for first CICS dispatch (recorded in the CICS monitor data field DSPDELAY) accounted for most of the time the transactions were suspended. A longer response time was observed for the OSGi JVM server version. This was due to a longer dispatch time caused by the T8 TCB being placed into an operating wait state for much longer times by the less efficient Java and UNIX thread management of the OSGi JVM server.
For more details about the performance testing described previously, see the following article in the CICS Developer Center:
7.16.4 Comparison of CICS monitoring and RMF data
The CPU data presented in Table 7-5 on page 145 was obtained using RMF because CICS monitoring facility (CMF) data does not account for all the CPU time consumed by a CICS region. Time spent on non-CICS TCBs, the SRB time for networking, or the overhead to initialize and terminate a transaction is not included in monitoring data.
Figure 7-26 plots the CPU per transaction for the Link to WebSphere Liberty workload, obtained using both RMF and CMF data.
Figure 7-26 Comparison of RMF and CMF data for the Link to WebSphere Liberty workload
Comparing the total CPU cost obtained from the CMF data with that obtained from the RMF data, the COBOL example shows a difference of about 0.017 ms.
For the OSGi JVM server example, this difference is about 0.022 ms. The difference includes the additional overhead of running in a JVM environment on non-CICS managed TCBs, such as JIT compilation and some garbage collection.
The WebSphere Liberty JVM server example shows a much greater difference of about 0.034 ms. This greater difference is attributed to the additional overhead of running a WebSphere Liberty server, with time spent on extra non-CICS managed TCBs (UNIX pthreads) used by WebSphere Liberty support functions.
In all these cases, the percentage difference between RMF and CICS monitoring data is exaggerated by the trivial nature of the application; for a more complex application, the relative difference would be smaller. When measuring the total cost of a workload, it is important to note this discrepancy between CICS monitoring data and RMF data for any application type. Java applications typically show a greater delta between RMF and CMF data because of the non-CICS TCBs used for normal JVM operation.