20. Dealing with XA Distributed Transaction Issues

In today’s complex Oracle database environments, it is not uncommon to encounter database application architectures that leverage Java components that connect to two or more different databases using connection pools and require processing a single transaction in a coordinated fashion. This multiconnection, multidatabase transaction processing method is typically known as a distributed transaction, but it is sometimes also termed a global transaction or XA (short for X/Open XA). The concept of distributed transactions is not particularly new—it was created back in 1991—but as Java has come to the fore as an application language in recent years, the use of distributed transactions has accelerated in many Oracle database environments.

This chapter therefore focuses on some well-known issues that can occasionally arise when an Oracle database is using distributed transactions. It demonstrates just how quickly an XA distributed transaction problem might arise in a complex database environment, how to identify and diagnose the problem, and of course how to quickly repair the issue to restore the corresponding database application to peak performance.


Note

Detailed information about the architecture, components, error handling, and optimization of distributed transactions can be found in the web-based Oracle Database documentation portal. The corresponding JDBC Developer’s Guide for each Oracle database release is especially valuable for gaining a better understanding of the key concepts of distributed transactions.


Repairing Common Distributed Transaction Issues

Imagine an e-commerce database application environment comprising several different types of applications executing against numerous Oracle databases, including at least one whose data sources are leveraging distributed transaction processing. Perhaps a short-lived network interruption occurs; perhaps one of the databases that’s participating in a distributed transaction environment crashes; or perhaps even the application software itself crashes.

In any of these situations, it is not unlikely that the corresponding distributed transaction may be left in an indeterminate state because the transaction may not be able to reconnect to the database with which it lost connectivity. In this situation, the transaction will become what is typically called a lost distributed transaction. Each distributed transaction generates a lock on a resource, and this resource is not released until the transaction coordinator receives either a COMMIT or ROLLBACK. Therefore, if one distributed transaction should hang, it is crucial to fix this pending transaction immediately; otherwise, it is possible that the entire production environment could hang as well.

Fortunately, solving this particular issue requires just a few simple steps:

1. Query the DBA_2PC_PENDING view to locate any pending distributed transactions:

SQL> SELECT COUNT(*) FROM dba_2pc_pending;

Of course, this SQL script can be encapsulated within a customized shell script, embedded within an application monitoring tool such as Oracle APP Manager, or even used to create a user-defined metric (UDM) in Oracle Enterprise Manager Grid Control 11g or Cloud Control 12c.

2. If a pending transaction is indeed detected, it must either be committed or rolled back. Use the following script to create a series of ready-to-run SQL commands that will roll back all pending transactions in your database returned by the prior query:

SQL> SET HEADING OFF
SQL> SELECT 'ROLLBACK FORCE '''||local_tran_id||''';' FROM dba_2pc_pending;

3. In an extreme case, the distributed transaction may not disappear from the DBA_2PC_PENDING view even after committing or rolling back all pending transactions using the commands just generated. In this situation, purge the pending transaction using the PURGE_LOST_DB_ENTRY procedure of package DBMS_TRANSACTION:

SQL> SELECT 'EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('''||local_tran_id||''');COMMIT;'
  FROM dba_2pc_pending;

Querying the DBA_2PC_PENDING view will verify that no pending transactions exist anymore, thus proving that all in-doubt transactions have been repaired.

Repairing Ghost Distributed Transactions

In isolated, unexpected situations, a distributed transaction may simply be lost and become what is known as a ghost transaction. For example, a distributed transaction may appear to be pending in the DBA_2PC_* views, but it is impossible to either commit or roll back the transaction, and it also cannot be purged. Another scenario may involve application users receiving an ORA-01591 error message as follows, but the specific transaction doesn’t appear at all in any of the DBA_2PC_* views:

ORA-01591: lock held by in-doubt distributed transaction 1.21.17

This could happen in cases where the databases involved in the distributed transaction cannot be accessed simultaneously. If the RECO background process responsible for recovering these transactions cannot handle this situation—perhaps because of a failure in network communication between the databases processing the transaction—it is possible that the transaction was unable to be added to the DBA_2PC_* views.

Ghost transactions typically require some rather tricky commands to repair them. There are three special scenarios that require some relatively dangerous commands to repair them, and we discuss each of them in the following sections:

Image There are entries in the DBA_2PC_* views for a distributed transaction, but that transaction has ceased to exist.

Image A distributed transaction is hanging, but there is no corresponding information for that transaction in the DBA_2PC_* views.

Image A COMMIT or ROLLBACK is performed against a distributed transaction that is in a PREPARED state, but the COMMIT or ROLLBACK command hangs.

Information Exists, but Transaction Missing

A transaction exists in the DBA_2PC_* views, but that transaction no longer can be found in the corresponding database. In this situation—when the status of the transaction is COMMITTED, ROLLBACK FORCE, or COMMIT FORCED—the transaction(s) just need to be purged. However, if the transaction is in the PREPARED state and no entry is located in the transaction table for this transaction, then the following steps are required to solve this issue:

1. Identify the transaction in question:

SQL> SELECT local_tran_id, state FROM dba_2pc_pending;

LOCAL_TRAN_ID          STATE
---------------------- ----------------
1704.34.52393          prepared

2. The local_tran_id is <RBS#>.<SLOT#>.<WRAP#>, which means that the rollback segment number is 1704. Locate the active transactions on this rollback segment using this query:

SQL> SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
KTUXESTA Status,
KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
AND ktuxeusn= 1704; <== this is the rollback segment#

In this particular example, the query did not return any row, and so it is impossible to issue a COMMIT FORCE or ROLLBACK FORCE for this transaction:

SQL> rollback force '1704.34.52393';
ORA-02058: no prepared transaction found with ID 1704.34.52393

3. Unfortunately, the only way to clean up this transaction is to manually delete it from internal tables. Be sure to first issue the SET TRANSACTION command so that it uses the rollback segment of the SYSTEM tablespace:

SQL> SET TRANSACTION USE ROLLBACK SEGMENT SYSTEM;

4. Delete the transaction using the transaction ID (local_tran_id) directly. Since it was carefully deleted manually from the appropriate internal transaction tables, the transaction will cease to exist:

SQL> DELETE FROM sys.pending_trans$ WHERE local_tran_id = '1704.34.52393';
SQL> DELETE FROM sys.pending_sessions$ WHERE local_tran_id = '1704.34.52393';
SQL> DELETE FROM sys.pending_sub_sessions$ WHERE local_tran_id = '1704.34.52393';
SQL> COMMIT;

ORA-1591 Has No Corresponding Information

In this next scenario, a database application user session is receiving an ORA-1591 error message, but a search of the DBA_2PC_* views reveals no such transaction exists.

An application user’s attempt to change a row in a table receives the following error:

ORA-1591: lock held by in-doubt distributed transaction '1704.34.52393'

Resolving this situation requires a slightly different approach:

1. Search the DBA_2PC* views for the local transaction ID. This search returns no rows, which indicates that the transaction seems to have disappeared:

SQL> SELECT *
       FROM dba_2pc_pending
      WHERE local_tran_id='1704.34.52393';

   No rows returned

2. Use the following query to identify the transaction in its rollback segment; it is apparent that it is in the PREPARED state:

SQL> SELECT
       KTUXEUSN,
       KTUXESLT,
       KTUXESQN,
       KTUXESTA Status,
       KTUXECFL Flags
       FROM x$ktuxe
      WHERE ktuxesta != 'INACTIVE'
        AND ktuxeusn= 1704;

KTUXEUSN   KTUXESLT   KTUXESQN   STATUS           FLAGS
---------- ---------- ---------- ---------------- ------------------------
1704       34         52393      PREPARED         SCO|COL|REV|DEAD

However, any attempt to commit or roll back this transaction returns an ORA-2058 error:

SQL> COMMIT FORCE '1704.34.52393';
ORA-02058: no prepared transaction found with ID 1704.34.52393

3. The ORA-02058 error occurs because there are no entries in the DBA_2PC_* views with the specified local_id. Insert a dummy row into the PENDING_TRAN$ and PENDING_SESSION$ tables to essentially re-establish the transaction with a status of PENDING, and then reattempt to commit this transaction:

-- Disable distributed recovery to prevent the RECO background process
-- from handling this dummy row
SQL> Alter System Disable Distributed Recovery;

SQL> INSERT INTO pending_trans$ (
local_tran_id,
global_tran_fmt,
global_oracle_id,
state,
status,
session_vector,
reco_vector,
type#,
fail_time,
reco_time)
VALUES(
'1704.34.52393', /* ONLY THIS ROW SHOULD BE MODIFIED WITH YOUR LOCAL TRAN ID*/
306206,
'XXXXXXX.12345.1.2.3',
'prepared',
'P',
hextoraw( '00000001' ),
hextoraw( '00000000' ),
0,
sysdate,
sysdate );

SQL> INSERT INTO pending_sessions$
VALUES(
'1704.34.52393',
1,
hextoraw('05004F003A1500000104'),
'C',
0,
30258592,
'',
146
);

SQL> COMMIT;

SQL> COMMIT FORCE '1704.34.52393';

4. Should this solution fail, clean out these dummy pending transaction entries and then re-enable recovery of distributed transactions:

SQL> DELETE FROM pending_trans$ WHERE local_tran_id='1704.34.52393';
SQL> DELETE FROM pending_sessions$ WHERE local_tran_id='1704.34.52393';
SQL> COMMIT;
SQL> ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;

5. Use the following command to purge the lost distributed transaction:

SQL> EXEC DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('69.24.6020053');

Transaction Hangs after COMMIT or ROLLBACK

In this final situation, a distributed transaction is in the PREPARED state; however, any attempt to COMMIT or ROLLBACK this transaction causes the command to hang, and the free global transaction table entry wait event appears in dynamic view V$SESSION_WAIT. Issuing the following commands will typically resolve this problem:

1. Check the transaction ID and its status; if it is indeed in a PREPARED state, it fits this scenario:

SQL> SELECT local_tran_id, state FROM dba_2pc_pending;

LOCAL_TRAN_ID     STATE
----------------- -----------
7.12.102          prepared

2. Verify the x$ktuxe table to see if the transaction really exists in database:

SQL> SELECT
 KTUXEUSN,
 KTUXESLT,
 KTUXESQN,
 KTUXESTA Status,
 KTUXECFL Flags
  FROM x$ktuxe
 WHERE ktuxesta!='INACTIVE'
   AND ktuxeusn= 7; --HERE WE MUST CHANGE TO THE UNDO BLOCK. IN THIS CASE IT'S NUMBER 7

KTUXEUSN   KTUXESLT   KTUXESQN   STATUS           FLAGS
---------- ---------- ---------- ---------------- ------------------------
7          12         102        PREPARED         SCO|COL|REV|DEAD

3. Try to COMMIT or ROLLBACK this transaction; however, issuing either of these commands results in a hung session:

SQL> rollback force '7.12.102';

. . .
{ Session HANGs }
. . .

SQL> commit force '7.12.102';
. . .
{ Session HANGs }
. . .

4. Review the contents of dynamic view GV$SESSION_WAIT for this session. It will reveal that the session is now waiting on the free global transaction table entry event:

SQL> SELECT event
       FROM gv$session_wait
      WHERE event LIKE '%free%global%entry%';

5. Use the following query to isolate the transaction causing the issue:

SQL> SELECT a.sql_text, s.osuser, s.username,s.sid
       FROM v$transaction t, v$session s, v$sqlarea a
      WHERE s.taddr = t.addr
        AND a.address = s.prev_sql_addr
        AND t.xidusn = 7
        AND t.xidslot =12
        AND t.xidsqn = 102;

6. Attempt to purge this transaction using procedure PURGE_LOST_DB_ENTRY of package DBMS_TRANSACTION. This results in a different and unexpected error:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('7.12.102');COMMIT;
*
ERROR at line 1:
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.DBMS_TRANSACTION", line 94
ORA-06512: at line 1

7. Here is where things get a little complicated! To fix this issue, first delete the transaction from internal Oracle database tables:

SQL> DELETE FROM sys.pending_sub_sessions$ WHERE local_tran_id ='7.12.102';
SQL> DELETE FROM sys.pending_sessions$ WHERE local_tran_id = '7.12.102';
SQL> DELETE FROM sys.pending_trans$ WHERE local_tran_id = '7.12.102';
SQL> COMMIT;

8. Once the transaction has been successfully deleted, insert a fake record into PENDING_TRANS, as done in the previous scenario:

--Disable distributed recovery to not let the RECO background process work on this dummy row
SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

SQL> INSERT INTO pending_trans$ (
local_tran_id,
global_tran_fmt,
global_oracle_id,
state,
status,
session_vector,
reco_vector,
type#,
fail_time,
reco_time)
VALUES(
'7.12.102', /* ONLY THIS ROWS SHOULD BE MODIFIED WITH YOUR LOCAL TRAN ID*/
306206,
'XXXXXXX.12345.1.2.3',
'prepared',
'P',
hextoraw( '00000001' ),
hextoraw( '00000000' ),
0,
sysdate,
sysdate );

SQL> INSERT INTO pending_sessions$
VALUES(
'7.12.102',
1,
hextoraw('05004F003A1500000104'),
'C',
0,
30258592,
'',
146
);

SQL> COMMIT;

9. After inserting these fake records, commit the transaction:

SQL> COMMIT FORCE '7.12.102';

10. Finally, purge the transaction from the database:

SQL> EXEC DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('7.12.102');

Monitoring Distributed Transactions

Some useful queries that help to quickly locate and assist in the repair of distributed transaction problems are shown in Listings 20.1, 20.2, 20.3, 20.4, and 20.5.

Listing 20.1 Locating Pending Transactions


SET LINES 200
COL global_tran_id FORMAT A40
COL host FORMAT A40
SELECT
     local_tran_id
    ,global_tran_id
    ,state
    ,mixed
    ,host
    ,commit#
  FROM dba_2pc_pending;

SET LINES 200
COL GLOBAL_TRAN_ID FORMAT A26
COL HOST FORMAT A30
COL OS_TERMINAL FORMAT A30
COL OS_USER FORMAT A20
COL TRAN_COMMENT FORMAT A10
SELECT
     local_tran_id
    ,global_tran_id
    ,state
    ,mixed
    ,advice
    ,tran_comment
    ,os_user
    ,os_terminal
    ,host
    ,db_user
  FROM dba_2pc_pending;


Listing 20.2 Isolating DML for a Distributed Transaction


SELECT a.sql_text, s.osuser, s.username
  FROM v$transaction t, v$session s, v$sqlarea a
 WHERE s.taddr = t.addr
   AND a.address = s.prev_sql_addr
   AND t.xidusn = (select local_tran_id from dba_2pc_pending)
   AND t.xidslot = <second-part-of -transaction-ID>
   AND t.xidsqn = <third-part-of -transaction-ID>;


Listing 20.3 Constructing ROLLBACK FORCE Commands for All Distributed Transactions


SET HEADING OFF
SELECT 'ROLLBACK FORCE '''||local_tran_id||''';'
  FROM dba_2pc_pending;


Listing 20.4 Constructing Purge Commands for All Distributed Transactions


SELECT
'EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('''||local_tran_id||''');COMMIT;'
  FROM dba_2pc_pending;


Listing 20.5 Verifying User and SQL Text for a Single Distributed Transaction


SELECT a.sql_text, s.osuser, s.username
  FROM v$transaction t, v$session s, v$sqlarea a
 WHERE s.taddr = t.addr
   AND a.address = s.prev_sql_addr
   AND t.xidusn = 3335
   AND t.xidslot =14
   AND t.xidsqn = 632719;


Summary

This chapter illustrated some of the most common critical problems related to distributed transactions and how to resolve those problems, especially for database releases prior to Oracle Database 12c. (Oracle 12c offers new features that can help mitigate some of the challenges of distributed transactions.) It addressed such problems as ghost transactions, missing transactions, and hanging transactions and offered step-by-step resolutions. Finally, this chapter presented queries that can help you monitor, locate, and repair distributed transaction problems.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset