Expect the unexpected. Whether the failure occurs after the Streams environment has been up and running or occurs while you are attempting to implement the environment, your best defense is knowing the failure points in a Streamed environment, and the most likely causes for the failures. In this section, we will address this very thing, as well as present a methodical approach to troubleshooting errors. We will also discuss various COTS and custom tools that you may find helpful in monitoring and troubleshooting the environment.
The following diagram shows the overall flow of DML/DDL change through the Streams processes from source to destination. Each transition constitutes a potential point of failure (notated by the circled number). We will start at the beginning and work our way through the flow describing the failure point and the most likely causes. The following section is not intended to be an "All Inclusive" list of every issue you may encounter. Rather, it provides some basic, well known issues at specific points that are intended to help you begin to associate where to look when issues are encountered.
At this FP, we want to make sure that committed changes are being recorded in the redo. If the changes are not in the redo, they cannot be captured.
Make sure the statement was successfully committed.
DML/DDL statements must execute and commit successfully to the redo to be eligible for capture. If the DML/DDL is not successfully executed and committed, it will not be captured. If you are using Synchronous Capture, make sure triggers are enabled and valid. Again, the DML/DDL must be successfully committed to "kick off" the Synchronous Capture triggering.
Mitigate potential use of the NOLOGGING option. A session issuing DML/DDL statements with the NOLOGGING option will keep the changes from being entered into the redo log at all. To avoid this, all capture databases should be configured with FORCE LOGGING. This ensures that DML/DDL changes are logged in the redo regardless of any NOLOGGING specification.
You can either create the database or tablespaces with the FORCE LOGGING option or use the alter database or alter tablespace commands to enable FORCE LOGGING.
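The checks and commands above can be sketched as follows; the tablespace name is only an example:

```sql
-- Check whether FORCE LOGGING is already enabled at the database level
select force_logging from v$database;

-- Enable FORCE LOGGING for the whole database (recommended for capture databases)
alter database force logging;

-- Or enable it for a specific tablespace (USERS here is illustrative)
alter tablespace users force logging;
```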
At this FP, we want to make sure that LogMiner has the configuration and resources it needs.
Make sure supplemental logging has been configured for each table replicated.
select supplemental_log_data_pk pk_logging,
       supplemental_log_data_ui ui_logging
from v$database;

PK_LOGGING UI_LOGGING
---------- ----------
NO         NO
select lg.log_group_name, lg.table_name, lg.always, lgc.column_name, lgc.position
from dba_log_groups lg,
dba_log_group_columns lgc
where lg.log_group_name = lgc.log_group_name(+);
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE INDEX) COLUMNS;
ALTER TABLE <table_name> ADD SUPPLEMENTAL LOG GROUP <log_group_name> (col1, col2) ALWAYS;
This parameter sets the redo allocation concurrency level. If one or more Capture processes are run on the database, this parameter should be set to 1 (Note: this does not affect capture parallelism). Attempting to set the value greater than 1 could result in an ORA-01374: Log_parallelism_max greater than 1 not supported in this release.
Make sure this parameter is set equal to, or greater than, the number of Capture processes configured on the database (otherwise you may see ORA-01353: Existing LogMiner session).
Each Capture process requires its own persistent LogMiner mining session. If the number of Capture processes exceeds the number of persistent sessions, then those Capture processes will not be able to start.
This FP focuses on making sure the Capture process has the resources, configuration, and proper rules in place to capture changes. If you've confirmed the Capture process is up and running but changes are not being captured, you will want to look at the rules defined for the process. One thing to keep in mind: changes in data handling via applications can impact the behavior of a Capture rule if unexpected or new values are not handled in rule conditions that use those values. If your Capture process starts "missing" changes, take a good hard look at the data values and compare them to the rule conditions.
Make sure Streams has adequate memory allocation. Either implement Automatic Memory Management (10g and higher), or increase the shared pool and streams pool by at least 10MB per Capture process. In the event you see an ORA-01341: LogMiner out-of-memory, you can increase LogMiner memory allocation by increasing the Capture process parameter _sga_size. The default for this memory allocation is 10MB. Note that if you increase _sga_size, you will most likely want to make a corresponding increase in the streams pool size as well to accommodate the additional memory allocation for the capture. The following command allocates 50 MB to LogMiner memory:

exec dbms_capture_adm.set_parameter('STRMADMIN_CAPTURE','_SGA_SIZE','50');
Make sure the Capture process (and queue) is enabled.
select capture_name, queue_name, status, error_number, error_message
from dba_capture;

CAPTURE_NAME       QUEUE_NAME           STATUS   ERROR_NUMBER ERROR_MESSAGE
------------------ -------------------- -------- ------------ -------------
SCHEMA_HR_CAPTURE  SCHEMA_HR_CAPTURE_Q  ENABLED
If the Capture process has a status of DISABLED, attempt to start the process.
If the Capture process has a status of ABORTED, you should see accompanying errors. Address the errors and attempt to restart the process.
If the Capture process is ENABLED, but not capturing recent transactions, it may have fallen behind.
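A disabled Capture process can be started (and stopped) with DBMS_CAPTURE_ADM; the process name below matches the example query output:

```sql
-- Start a Capture process that shows STATUS = DISABLED in DBA_CAPTURE
exec dbms_capture_adm.start_capture('SCHEMA_HR_CAPTURE');

-- Stop it again if needed (for example, before maintenance)
exec dbms_capture_adm.stop_capture('SCHEMA_HR_CAPTURE');
```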
select to_char(enqueue_message_create_time,'DD-MON-YY HH24:MI:SS') last_enqueue,
       to_char(sysdate,'DD-MON-YY HH24:MI:SS') cur_time
from v$streams_capture;

LAST_ENQUEUE        CUR_TIME
------------------- -------------------
                    23-AUG-09 20:09:12

Notice the lack of enqueued time. Even though our process is enabled, it is not capturing.
In this case, the next step is to check the state of the Capture process.

select capture_name, startup_time, state from v$streams_capture;

CAPTURE_NAME       STARTUP_TIME            STATE
------------------ ----------------------- ---------------------------------------------
SCHEMA_HR_CAPTURE  Aug 22, 2009 4:01:00 PM WAITING FOR DICTIONARY REDO: FIRST SCN 960867
In this case, the Capture process is expecting to find the archive log with the noted SCN to access the Data Dictionary info in the redo log to build the LogMiner Data Dictionary. However, the archive log has been deleted. If possible, restore the archive log with the FIRST_SCN
to the archive log destination folder and re-register it with the Capture process.
alter database register logfile '<path/filename>' for '<capture_name>';
If the required archive log is not available, you will need to recreate the Capture process.
If a Capture process appears to be "stuck" in the INITIALIZING state, it could be a result of one of the following:
The START_SCN is significantly higher than the REQUIRED_CHECKPOINT_SCN/FIRST_SCN and the process is still scanning the redo.
Check for issues in the DBA_2PC_PENDING and DBA_2PC_NEIGHBORS views, and also reference the Viewing Information about Distributed Transactions chapter in the Oracle Database Administrator's Guide 11g.
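A quick look at in-doubt distributed transactions might use queries such as these (column lists abbreviated):

```sql
-- Pending (in-doubt) distributed transactions
select local_tran_id, global_tran_id, state, mixed
from dba_2pc_pending;

-- The remote connections involved in those transactions
select local_tran_id, dbid, sess#, interface
from dba_2pc_neighbors;
```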
Verify that the Capture process is assigned to the correct queue.
select capture_name, queue_name, queue_owner
from dba_capture;
CAPTURE_NAME QUEUE_NAME QUEUE_OWNER
--------------- --------------- -------------
HR_CAPTURE HR_CAPTURE_Q STRM_ADMIN
While most expectations are to capture changes, it is quite possible to not want to capture changes. Keep in mind that if any rule in a rule set evaluates to TRUE, the whole rule set evaluates to TRUE.
The obvious verification is to make sure that the capture rule set contains rules (an empty rule set can throw an ORA-01405: Fetched column value is null). There should be at least one rule (DML) in the rule set, and an additional rule for DDL if include_DDL is set to true for the capture.
If the rules were created with the DBMS_STREAMS_ADM package, you can query the following views:

Select * from dba_streams_<level>_rules;

Where <level> is table, schema, or global. Add an appropriate where clause to filter on the rules of interest.
Otherwise, query dba_rules:

Select * from dba_rules;

Add an appropriate where clause to filter on the rules of interest.
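For example, to narrow the listing to the capture rules for a particular schema, a filter along these lines can be used (the schema name is from this chapter's examples):

```sql
-- Capture rules defined at the schema level for the HR schema
select rule_name, rule_type, streams_type, rule_condition
from dba_streams_schema_rules
where schema_name = 'HR'
  and streams_type = 'CAPTURE';
```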
Verify that rule conditions referencing source_database and :dml.get_source_database_name() are the same as the capture database global_name. If the global_name has been changed, the Capture process will no longer capture changes. Compare the rule evaluations to the values of the change that was, or was not, captured. Rules behave the opposite of rule sets: if one or more conditions evaluates to FALSE, the rule returns FALSE.
Is it a negative rule set? Remember, this is a double negative: if the negative rule evaluates to TRUE, the change is NOT captured.
If the rule set has multiple rules, make sure they do not cancel each other out, especially if you mix positive and negative rules in the rule set.
Make sure you are, or are not including tags as expected. Verify the tag value in the rule to the tag value being set when the change is originated if possible (most of the time this is done via code, so you can do this. However, if a user manually set the tag you may or may not be able to determine the tag value used).
select r.rule_name, rac.nvn_name ac_name,
       rac.nvn_value.accessvarchar2() ac_value
from dba_rules r,
     table(r.rule_action_context.actx_list) rac;
Add the appropriate where clause to filter on the rules of interest.
Transformations can affect the capture of an event. If a transformation exists, verify the logic and outcome of the PL/SQL code. If the transformation fails, the Capture process is aborted (see Chapter 6).
Check the alert.log. Also look for any trace files. Check EM for failed capture operation alerts.
This FP focuses on the ability of the Capture process to enqueue LCRs to its assigned Capture queue.
Make sure the capture user either owns the queue or has been granted enqueue privileges on the queue.
select capture_name, capture_user, queue_name, queue_owner
from dba_capture;
CAPTURE_NAME CAPTURE_USER QUEUE_NAME QUEUE_OWNER
--------------- --------------- --------------- ------------
HR_CAPTURE STRM_ADMIN HR_CAPTURE_Q STRM_ADMIN
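If the capture user does not own the queue, enqueue privilege can be granted with DBMS_AQADM; the grantee name below is a hypothetical capture user:

```sql
-- Grant enqueue privilege on the capture queue to the capture user
begin
  dbms_aqadm.grant_queue_privilege(
    privilege    => 'ENQUEUE',
    queue_name   => 'STRM_ADMIN.HR_CAPTURE_Q',
    grantee      => 'CAPTURE_USER',   -- hypothetical capture user
    grant_option => false);
end;
/
```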
At this FP, we want to make sure the Propagation process is able to dequeue LCRs from the Capture queue.
alter session set nls_date_format='DD-MON-YY HH24:MI:SS';

select propagation_name, status, error_date, error_message
from dba_propagation;
PROPAGATION_NAME STATUS ERROR_DATE ERROR_MESSAGE
------------------ -------- ------------------ ---------------
HR_PROPAGATION ENABLED 25-AUG-09 15:47:58
You can also view information in the DBA_QUEUE_SCHEDULES view.
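For example, failure counts and the last propagation error can be pulled from DBA_QUEUE_SCHEDULES like this (the qname filter is illustrative):

```sql
-- Propagation schedule health: consecutive failures and last error
select qname, destination, failures, last_error_msg, schedule_disabled
from dba_queue_schedules
where qname like '%CAPTURE_Q%';
```

Keep an eye on the failures column: a propagation schedule that fails repeatedly will eventually be disabled by the database.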
If you see an error date but no error message, check the propagation site alert log. If the destination site is down, you will see entries similar to the following:
***********************************************************************
Fatal NI connect error 12518, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=STRM2_HOST)(PORT=1521)))(CONNECT_DATA=(SID=STRM2)(server=DEDICATED)(CID=(PROGRAM=c:\oracle\11gr2\product\11.2.0\db_1\bin\ORACLE.EXE)(HOST=STRM2_HOST)(USER=SYSTEM))))
VERSION INFORMATION:
Time: 25-AUG-2009 17:29:49
Tracing not turned on.
Tns error struct:
ns main err code: 12564
TNS-12564: TNS:connection refused
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
select propagation_name, rule_set_owner,
       source_queue_owner srcq_owner,
       source_queue_name srcq_name
from dba_propagation;
PROPAGATION_NAME RULE_SET_OWNER SRCQ_OWNER SRCQ_NAME
------------------ -------------------- ------------ -------------
HR_PROPAGATION STRM_ADMIN STRM_ADMIN HR_CAPTURE_Q
These are pretty much the same as for Capture Rules (See Failure Point 3 for more detail on rules). You just want to focus on the propagation rule types (via filter).
If the rules were created with the DBMS_STREAMS_ADM package, you can query the following views:

Select * from dba_streams_<level>_rules;

Where <level> is table, schema, or global. Add an appropriate where clause to filter on the rules of interest.
Otherwise, query DBA_RULES:

Select * from dba_rules;

Add an appropriate where clause to filter on the rules of interest.
Verify that rule conditions referencing source_database and :dml.get_source_database_name() are the same as the capture database global_name.
Check the alert.log. Also check Propagation process trace files. Check EM for failed propagation operation alerts.
The database link (dblink) name must be the same as the global_name of the database to which it connects. If the global_name of the destination database has been changed, the dblink will need to be dropped and recreated (you may see ORA-02082 or other associated errors indicating that db global naming is not configured correctly).
The user connection information is correct (the Propagation process will be aborted with an invalid username/password error).
The "using" parameter is a valid sqlnet connection description: either a valid tnsalias or the full connection description string.
Make sure the tnsnames.ora configuration is correct/current and in the correct Oracle home. TNS_ADMIN and PATH settings can unexpectedly redirect the OracleNet connection to use a tnsnames.ora file that does not contain the tnsalias used in the db link "using" clause. If you are in an environment where these may change, consider using the full connection description string in the database link "using" parameter rather than a tnsalias.
Create database link STRM2
connect to strm_admin identified by strm_admin
using '(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = STRM2_HOST)
(PORT = 1521)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = STRM2)
)
)';
We need to make sure that the Propagation process is able to enqueue the LCRs to its assigned Apply queue.
select propagation_name, destination_dblink dst_dblink, username connected_user,
destination_queue_owner dstq_owner,
destination_queue_name dstrcq_name
from dba_propagation,
dba_db_links
where destination_dblink=db_link;
PROPAGATION_NAME DST_DBLINK CONNECTED_USER DSTQ_OWNER DSTRCQ_NAME
------------------ --------------- --------------- ------------ -------------
HR_PROPAGATION STRM2 STRM_ADMIN STRM_ADMIN HR_APPLY_Q
Here, we want to make sure the Apply process is able to dequeue the LCRs from the Apply queue.
If the status is DISABLED, attempt to restart the process.
If the status is ABORTED, check the accompanying error information, address the errors, and attempt to restart the process.
If APPLY_CAPTURED is NO, the Apply process will only apply user-enqueued events for the rule set.
select apply_name, status, apply_captured apply_capt,
error_number err_num, error_message err_msg
from dba_apply;
APPLY_NAME STATUS APPLY_CAPT ERR_NUM ERR_MSG
--------------- -------- ---------- ---------- -----------------
HR_APPLY ENABLED YES
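A disabled or aborted Apply process can be restarted with DBMS_APPLY_ADM once the underlying errors are addressed; the process name matches the example query output:

```sql
-- Restart the Apply process from the example above
exec dbms_apply_adm.start_apply('HR_APPLY');

-- Stop it again if needed
exec dbms_apply_adm.stop_apply('HR_APPLY');
```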
These are pretty much the same as for Capture Rules (See Failure Point 3 for more detail on rules). You just want to focus on the Apply rule types (via filter).
If the rules were created with the DBMS_STREAMS_ADM package, you can query the following views:
Select * from dba_streams_<level>_rules;
Where <level> is table, schema, or global. Add appropriate where clause to filter on the rules of interest.
Otherwise, query DBA_RULES:
Select * from dba_rules;
Add appropriate where clause to filter on the rules of interest.
Verify that rule conditions referencing source_database and :dml.get_source_database_name() are the same as the capture database global_name.
Check the alert.log. Also look for any trace files. Check EM for failed Apply operation alerts.
select hwm_message_number message_id,
       hwm_message_create_time create_time,
       hwm_time apply_time,
       ((hwm_time-hwm_message_create_time) * 86400) apply_lag_secs
from v$streams_apply_coordinator;

MESSAGE_ID CREATE_TIME         APPLY_TIME          APPLY_LAG_SECS
---------- ------------------- ------------------- --------------
   3103329 25-AUG-09 13:01:46  25-AUG-09 13:01:56              10
The APPLY reader will keep dequeuing the large transaction until it reaches the transaction end marker. During this time, the coordinator and APPLY slaves will be IDLE. To determine if there is a large transaction in progress, you can check the capture site alert log for large transaction messages and match them with the transaction id in the APPLY views.
If the number of messages in a transaction exceeds the txn_lcr_spill_threshold the Apply process will begin to spill messages to disk. This can slow down the Apply process. Use the DBA_APPLY_SPILL_TXN view to see information on spilled messages.
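Spilled transactions can be inspected, and the spill threshold adjusted, roughly as follows; the column list and threshold value are illustrative:

```sql
-- Transactions the Apply process has spilled to disk
select apply_name, xidusn, xidslt, xidsqn, message_count, first_scn
from dba_apply_spill_txn;

-- Raise the spill threshold for a given Apply process (example value)
exec dbms_apply_adm.set_parameter('HR_APPLY','TXN_LCR_SPILL_THRESHOLD','20000');
```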
Oracle's conflict detection determines if old values or data structure in the event LCR do not match the existing data in the destination table. When this happens, it checks to see if there is any conflict resolution defined for the Apply ruleset. If none is found, the LCR is placed in the Apply Error Queue (this is a persistent queue which allows us to query it at will).
If conflict resolution methods are defined for the apply rule they are used to continue processing the LCR. If the conflict is still not resolved after applying all the conflict resolution methods to the LCR, the LCR is put in the Apply Error queue.
Conflict resolution methods are a type of transformation and they can change the data. Make sure the resolution and data change logic are correct and yield the expected result with the values of the LCR.
Check Supplemental Logging at the Database level:
select supplemental_log_data_pk pk_logging,
       supplemental_log_data_ui ui_logging
from v$database;

PK_LOGGING UI_LOGGING
---------- ----------
NO         NO

select lg.log_group_name, lg.table_name, lg.always, lgc.column_name, lgc.position
from dba_log_groups lg,
     dba_log_group_columns lgc
where lg.log_group_name = lgc.log_group_name(+);
To enable Database level Supplemental Logging issue the following command:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE INDEX) COLUMNS;
To enable table level Supplemental Logging issue the following command:
ALTER TABLE <table_name> ADD SUPPLEMENTAL LOG GROUP <log_group_name> (col1, col2) ALWAYS;
If there are errors in the apply error queue, it means that a conflict was encountered that could not be resolved. These errors must be resolved manually. It may be helpful to extract the LCR of the event to determine the old and new values and change type for the change event. This can be done via Enterprise Manager DB Console/Grid Control Streams Apply Error drill down screens, or you can use scripts to extract the LCR like the one below in the Troubleshooting Tools section. One thing to understand at this failure point is that change transactions that reach the Apply Error queue have not been handled by the conflict resolution or transformation design and should not be "fixed and forgotten". The following tells you how to address errors quickly so that the Apply process can be fixed when these errors occur, allowing Streaming to function. However, a record of the errors and causes should be kept and reviewed with the appropriate parties so that appropriate change handling design modifications can be implemented to circumvent the cause of the errors and avoid them in the future.
It is extremely important to monitor the Apply Error queue. If an LCR fails, all subsequent LCRs with a direct or indirect dependency on that LCR will fail as well. This can quickly lead to the Apply site becoming totally out of sync with its master site(s).
When you address the events in the Apply Error queue, you will want to sort them so that you address the oldest error first.
select * from dba_apply_error order by local_transaction_id;
Address the first error. Determine what the change was and what the original values of the row were by extracting the LCR. Compare these values to the existing row in the table at the Apply site.
It may be possible that changes from another master were applied after the erred change you are addressing failed. This being the case, you may not wish to apply the change at all. If so, you can delete the change using the DBMS_APPLY_ADM.DELETE_ERROR procedure. If you wish to attempt to reapply the change, update the existing row in the table to match the old values so the Apply process can "find" the old record and it can pass conflict detection (remember, conflict detection compares the old values of the LCR to the current values in the destination table). You can then reapply the change using DBMS_APPLY_ADM.EXECUTE_ERROR. Verify that the change was applied as expected. Note: capitalization in text fields makes a difference.
You may wish to just manually update the existing data in the destination table to match the new values of the change. This is also acceptable.
If you choose to apply the change to the destination table, and it has been successfully applied, you may want to run DBMS_APPLY_ADM.EXECUTE_ALL_ERRORS('APPLY'). The reason being that the first error may have been what kept the rest of the errors from being applied. Now that you have fixed the first error, the Apply process may be able to reprocess the rest of the errors without issue. This could drastically reduce the size of your error queue and the number of errors you need to manually address.
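The cleanup procedures discussed above look roughly like this; the transaction ID is a placeholder you would take from DBA_APPLY_ERROR:

```sql
-- Discard an erred transaction you do not want to apply
exec dbms_apply_adm.delete_error('<local_transaction_id>');

-- Re-execute a single corrected transaction
exec dbms_apply_adm.execute_error('<local_transaction_id>');

-- Re-execute everything left in the error queue for the Apply process
exec dbms_apply_adm.execute_all_errors('HR_APPLY');
```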
The same method can be accomplished via the EM Streams Management GUI if you prefer that to command line. See the next section for more on troubleshooting Tools.
The following section addresses some useful and recommended tools that will help you monitor and troubleshoot your Streamed environment.
Your new best friend! The following screenshots highlight the EM DBConsole/Grid Control UIs that allow you to monitor, troubleshoot, and administer the Streams processes in each database. If nothing else worked in EM, with the exception of these UIs, this tool would still be worth its weight in gold. Keep in mind however, that as a DBA, you should always know what is going on under the buttons and links. Meaning, you should understand the database level commands that the UI issues to accomplish tasks and displays. If things do not work as expected from the UI, you will want to attempt the operation from command line (a.k.a. SQLPlus) to determine if the problem is with the tool or the database. As you go through the screenshots, see what you recognize in the UIs from our previous failure point section.
To get to Streams Management, go to the Database tab. Click on the Data Movement link. Click on the Management link under the Streams section. This will bring you to the Streams Overview page.
This page gives a summary view of Streams Capture, Propagate, Apply, and User Messaging. Notice we have an Error in our Apply Process.
By clicking on the Error link we go to a summary page showing the error.
Or we can go directly to the Apply process home page by clicking on the Apply Tab.
The status of the Apply process shows ABORTED. This usually happens when the Apply process encounters an error and DISABLE_ON_ERROR is set to Yes.
You can verify the Apply process configurations by clicking on the View and Edit buttons. To see the errors, click on the Error button.
You now see the records that are in the local DBA_APPLY_ERROR queue. You can drill down into the LCR by clicking on the icon in the View Error LCRs column.
The LCR information lists all messages contained in the LCR.
Select a message via its radio button and click on Compare Values to extract the LCR message and display the actual change values.
Return to the Apply Errors page, when you have resolved the issue that caused the LCR to fail, you can retry or delete the error by clicking on the appropriate buttons.
Now let's look at the Capture Process. Notice that even though the status is ENABLED the state is n/a. This is a good indication that one or more of the Apply sites associated to the Capture process (via propagation) is not available. When the Apply returns to operations, you should see the state of the Capture process go to CAPTURING CHANGES.
Our next stop is the propagation page (hey! stop, prop, one-hop prop — feeling a hit single coming on!).
You can check statistics for any process by clicking on the Statistics button, and view or edit process configuration by clicking on the View and Edit buttons respectively, as you see on the Propagation screen.
In the event that you do not have access to the EM console, you can still efficiently monitor your Streamed environment using a number of command line PL/SQL packages and scripts. This chapter provides a few examples that the authors have found most useful over the years. However, do not limit yourself. A number of Streams command line monitoring scripts can be found on the Oracle Technology Network, Metalink, and the Internet. Play with them. Find out what works best for you. Keep in mind one rule of thumb that we've seen trip up even seasoned DBAs: role level privileges and PL/SQL execution do not mix. The context user of the script (whoever the script runs as; by default, the owner of the script) must be explicitly (directly) granted privileges on certain objects and database packages, procedures, and functions to access them via PL/SQL script. Privilege assignment via a role is not recognized by the PL/SQL execution engine by design.
If you encounter "object does not exist" errors when you know the object does exist, or "insufficient privilege" errors during PL/SQL execution, log on as a SYSDBA and grant the package owner privileges on the object directly.
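For example, if a monitoring procedure owned by strm_admin fails against DBA_APPLY_ERROR even though strm_admin holds the DBA role, a direct grant resolves it:

```sql
-- Run as SYSDBA: role-based privileges are not visible to definer-rights
-- PL/SQL, so grant the object privilege directly to the procedure owner
grant select on dba_apply_error to strm_admin;
```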
Depending on the circumstance, it may not be feasible to manually find and fix divergent data between sites. Oracle supplies a package to help us do just this: the DBMS_COMPARISON package. This handy-dandy little package can be used to automate the comparison of the data between two objects (a lot faster than we can), and even converge (fix and synchronize) the data. The object itself must be table-based. Meaning, the object stores data in a record format, such as a table type, an index type, or a materialized view type.
You first must define what is to be compared: the schema and object, and the dblink to the remote object (the object is assumed to be in the local database). You use the CREATE_COMPARISON procedure to do this. This procedure also allows you to compare objects that may be in a different schema or have a different name at the remote site, set a limit on when to stop comparing, replace null values with a specific value, just determine how many rows do not match, or show actual rows that do not match, to name a few.
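A minimal CREATE_COMPARISON call might look like this; the comparison name, schema, table, and dblink are illustrative:

```sql
-- Define a comparison of HR.EMPLOYEES between the local site and STRM2
begin
  dbms_comparison.create_comparison(
    comparison_name => 'CMP_HR_EMPLOYEES',
    schema_name     => 'HR',
    object_name     => 'EMPLOYEES',
    dblink_name     => 'STRM2');
end;
/
```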
Once you have created the "Comparison", use the DBMS_COMPARISON.COMPARE function to compare a shared object between two databases.
The function returns a BOOLEAN, but also has an out parameter (scan_info) that is of the COMPARISON_TYPE record type, of which the SCAN_ID value can then be used to find scan results in the following views:
DBA/USER_COMPARISON_SCAN
DBA/USER_COMPARISON_SCAN_VALUES
DBA/USER_COMPARISON_ROW_DIF
Each execution of the COMPARE function will generate a new SCAN_ID for the scan_info.
The COMPARISON_TYPE record has the following attributes:
scan_id (number): The ID of the scan
loc_rows_merged (number): The number of rows in the local database object that differ from matching rows in the remote database object
rmt_rows_merged (number): The number of rows in the remote database object that differ from matching rows in the local database object
loc_rows_deleted (number): The number of rows found in the remote database object not found in the local database object
rmt_rows_deleted (number): The number of rows found in the local database object not found in the remote database object
The function returns TRUE if no differences are found; it returns FALSE if differences are found.
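Putting the pieces together, a compare run might be sketched as follows; perform_row_dif is requested so that row-level differences are recorded for later use:

```sql
declare
  scan_info  dbms_comparison.comparison_type;
  consistent boolean;
begin
  consistent := dbms_comparison.compare(
                  comparison_name => 'CMP_HR_EMPLOYEES',  -- example name
                  scan_info       => scan_info,
                  perform_row_dif => true);
  if consistent then
    dbms_output.put_line('No differences found.');
  else
    dbms_output.put_line('Differences found; scan_id = ' || scan_info.scan_id);
  end if;
end;
/
```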
Once you determine if and what row differences there are, you can use the DBMS_COMPARISON.CONVERGE procedure to either make the remote site look like the local site, or the local site look like the remote site. This is specified via the CONVERGE_OPTIONS parameter. The converge option choices are either CMP_CONVERGE_LOCAL_WINS or CMP_CONVERGE_REMOTE_WINS.
Note: The option is all-inclusive. It is an all or nothing, either/or. There is no "this one from the local and that one from the remote". If you need this level of granularity for a converge, you can run the compare, and then create a procedure to loop through the records in the DBA_COMPARISON_ROW_DIF view and handle each record separately as needed.
Once you have completed your compare and converge operations, you can recheck the compare (this only rechecks the differing rows for the identified compare scan_id), purge the scan information from the comparison views, and drop the compare definition as you wish.
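A converge-and-cleanup sequence might be sketched like this; the scan_id is a hypothetical value taken from an earlier compare run:

```sql
declare
  scan_info dbms_comparison.comparison_type;
begin
  -- Make the remote site match the local site for the differing rows
  dbms_comparison.converge(
    comparison_name  => 'CMP_HR_EMPLOYEES',
    scan_id          => 1234,  -- hypothetical scan_id from the compare
    scan_info        => scan_info,
    converge_options => dbms_comparison.cmp_converge_local_wins);
end;
/

-- Purge the scan information and drop the comparison when finished
exec dbms_comparison.purge_comparison('CMP_HR_EMPLOYEES');
exec dbms_comparison.drop_comparison('CMP_HR_EMPLOYEES');
```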
For each major release, Oracle provides a Streams HealthCheck script that you can download from Oracle Metalink. This script runs at a SQLPlus prompt, connected as a SYSDBA, and queries various Streams views and tables to generate a comprehensive report detailing the Streams configuration in the instance and the health of the configuration. This script is exceedingly useful to Oracle Support if you require their assistance to resolve issues. To access these scripts, log on to Metalink and search on Streams Health Check Scripts.
If you just wish to run a quick check on your Streams processes, the following script provides a basic report for all sites designated in the script. The script uses union all queries to pull information from multiple dblinked sites into the summary report. Of course, the expectation here is that you run the script from a database that has the dblinks to the sites you wish to monitor. To use the script, edit the union queries to include a select from each database link you wish to monitor (our example shows 3 sites). If you are handy with PL/SQL, you can even auto generate this script by creating a procedure that loops through the Stream Administrator's database links and "builds" the union queries for each link. A quick hint if you do not already know: you can mimic a loopback database link on the site that you are running the script on by using the global_name of the local database as the dblink name. This causes the database to "connect" to itself and lets us standardize the query segment.
The script does the following:
Reports capture progress (the last enqueued SCN) for each site
Reports Apply progress (the oldest and last applied SCN) for each site
Reports errors recorded in DBA_APPLY_ERROR at the destination site
Reports Propagation status and the acknowledged SCN (ACKED_SCN) for each site
The ACKED_SCN is described as being the last "dequeued" SCN acknowledged by the Apply process. If the Apply queue is a buffered queue, then "dequeue" means both dequeued and successfully processed. This is because if the buffered apply queue is flushed from memory, it needs to be reloaded from the source Capture process. To avoid potential event/LCR loss, the Apply process must not acknowledge the SCN until it has completed processing it.
We like to call this script check_rep_status.sql. It should be run as the stream admin user from SQLPlus, and it goes like this:
--CHECK_REP_STATUS.sql
set echo off
set serveroutput on
set pagesize 150
set linesize 100
column capture_site format a45
column apply_site format a45
column propagation_site format a45
column error_message format a45
alter session set nls_date_format='DD-MON-YY HH24:MI:SS';

exec dbms_output.put_line('**********************************');
exec dbms_output.put_line('System Capture Progress');
exec dbms_output.put_line('**********************************');

select c.capture_site, c.capture_queue, c.last_captured_scn
from (select source_database capture_site,
             queue_name capture_queue,
             last_enqueued_scn last_captured_scn
      from dba_capture@STRM1
      union all
      select source_database, queue_name, last_enqueued_scn
      from dba_capture@STRM2
      union all
      select source_database, queue_name, last_enqueued_scn
      from dba_capture@STRM3) c
order by c.capture_site;

exec dbms_output.put_line('**********************************');
exec dbms_output.put_line('System Apply Progress');
exec dbms_output.put_line('**********************************');

select a.apply_site, a.apply_name, a.oldest_scn_num, a.last_apply_scn
from (select (select global_name from global_name@STRM1) apply_site,
             apply_name, oldest_scn_num,
             dequeued_message_number last_apply_scn
      from v$streams_apply_reader@STRM1
      union all
      select (select global_name from global_name@STRM2) db_name,
             apply_name, oldest_scn_num, dequeued_message_number
      from v$streams_apply_reader@STRM2
      union all
      select (select global_name from global_name@STRM3) db_name,
             apply_name, oldest_scn_num, dequeued_message_number
      from v$streams_apply_reader@STRM3) a
order by a.apply_name, a.apply_site;

exec dbms_output.put_line('**********************************');
exec dbms_output.put_line('Apply Progress Errors');
exec dbms_output.put_line('**********************************');

select c.apply_site, c.apply_name, c.source_commit_scn, c.message_number,
       c.error_number, c.error_message, c.error_creation_time
from (select (select global_name from global_name@STRM1) apply_site,
             apply_name, source_commit_scn, message_number,
             error_number, error_message, error_creation_time
      from dba_apply_error@STRM1
      union all
      select (select global_name from global_name@STRM2) apply_site,
             apply_name, source_commit_scn, message_number,
             error_number, error_message, error_creation_time
      from dba_apply_error@STRM2
      union all
      select (select global_name from global_name@STRM3) apply_site,
             apply_name, source_commit_scn, message_number,
             error_number, error_message, error_creation_time
      from dba_apply_error@STRM3) c
order by c.apply_site, c.apply_name;

exec dbms_output.put_line('**********************************');
exec dbms_output.put_line('System Propagation status');
exec dbms_output.put_line('**********************************');

select p.propagation_site, p.propagation_name, p.acked_scn acknowledged_scn,
       p.status, p.error_message, p.error_date
from (select (select global_name from global_name@STRM1) propagation_site,
             propagation_name, acked_scn, status, error_message, error_date
      from dba_propagation@STRM1
      union all
      select (select global_name from global_name@STRM2) db_name,
             propagation_name, acked_scn, status, error_message, error_date
      from dba_propagation@STRM2
      union all
      select (select global_name from global_name@STRM3) db_name,
             propagation_name, acked_scn, status, error_message, error_date
      from dba_propagation@STRM3) p;
While the Enterprise Manager DB Console/Grid Control UI, with its Streams Apply Error drill-down into the LCR values, is the best thing since sliced bread, there may be times when you only have command-line access available to you, or you wish to extract an LCR to a spool file for analysis. Oracle supplies scripts (found in the "Displaying Detailed Information About Apply Errors" section of the Monitoring Oracle Streams Apply Processes chapter in the Oracle Streams Concepts and Administration Guide 11g) to help you "unpack" an LCR associated with an erred transaction for analysis. We have provided a PL/SQL script below to create these procedures for you, with one or two enhancements added (like the ability to print a specific erred transaction). You will need to make sure the schema used to build the procedures has explicit privileges on the necessary objects and procedures/functions referenced in the scripts (you'll quickly find out which ones you need the first time you execute them). As you become familiar with how these scripts work, you can adjust the code to format the output and expand the functionality as you wish. As with the quick check scripts, it is recommended that you create these in the Streams admin schema.
--LCR_EXTRACT.sql
SET heading OFF
--SET feedback OFF
--SET echo OFF
--SET verify OFF
SET pagesize 0
SET linesize 10000
SET serveroutput on
set echo ON
set feedback ON
exec dbms_output.put_line('create print_any');
-- *****************************************************************
-- Description: Print the contents of the sys.anydata payload
--
-- Input Parameters: Data, sys.anydata
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE
--
-- Notes: This procedure is called by:
--        print_lcr
--        SET SERVEROUTPUT ON before calling the procedure
--
-- *****************************************************************
create or replace PROCEDURE print_any (DATA IN SYS.ANYDATA)
IS
  tn   VARCHAR2 (61);
  str  VARCHAR2 (4000);
  chr1 CHAR (255);
  num  NUMBER;
  dat  DATE;
  rw   RAW (4000);
  res  NUMBER;
BEGIN
  IF DATA IS NULL THEN
    dbms_output.put_line ('NULL value');
    RETURN;
  END IF;
  tn := DATA.gettypename ();
  IF tn = 'SYS.VARCHAR2' THEN
    res := DATA.getvarchar2 (str);
    dbms_output.put_line (str);
  ELSIF tn = 'SYS.CHAR' THEN
    res := DATA.getchar (chr1);
    dbms_output.put_line (chr1);
  ELSIF tn = 'SYS.VARCHAR' THEN
    res := DATA.getvarchar (chr1);
    dbms_output.put_line (chr1);
  ELSIF tn = 'SYS.NUMBER' THEN
    res := DATA.getnumber (num);
    dbms_output.put_line (num);
  ELSIF tn = 'SYS.DATE' THEN
    res := DATA.getdate (dat);
    dbms_output.put_line (dat);
  ELSIF tn = 'SYS.RAW' THEN
    res := DATA.getraw (rw);
    dbms_output.put_line (RAWTOHEX (rw));
  ELSE
    dbms_output.put_line ('typename is ' || tn);
  END IF;
END print_any;
/
show errors
exec dbms_output.put_line('create print_lcr');
-- *****************************************************************
-- Description: Print the contents of the sys.anydata payload
--
-- Input Parameters: lcr payload as sys.anydata
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE
--
-- Notes: This procedure is called by:
--        print_errors
--        print_transaction
--        Set serveroutput on before running this procedure
--
-- *****************************************************************
create or replace PROCEDURE print_lcr (lcr IN SYS.ANYDATA)
IS
  typenm   VARCHAR2 (61);
  ddllcr   SYS.lcr$_ddl_record;
  proclcr  SYS.lcr$_procedure_record;
  rowlcr   SYS.lcr$_row_record;
  res      NUMBER;
  newlist  SYS.lcr$_row_list;
  oldlist  SYS.lcr$_row_list;
  ddl_text CLOB;
BEGIN
  typenm := lcr.gettypename ();
  dbms_output.put_line ('type name: ' || typenm);
  IF (typenm = 'SYS.LCR$_DDL_RECORD') THEN
    res := lcr.getobject (ddllcr);
    dbms_output.put_line ('source database: ' || ddllcr.get_source_database_name);
    dbms_output.put_line ('owner: ' || ddllcr.get_object_owner);
    dbms_output.put_line ('object: ' || ddllcr.get_object_name);
    dbms_output.put_line ('is tag null: ' || ddllcr.is_null_tag);
    DBMS_LOB.createtemporary (ddl_text, TRUE);
    ddllcr.get_ddl_text (ddl_text);
    dbms_output.put_line ('ddl: ' || ddl_text);
    DBMS_LOB.freetemporary (ddl_text);
  ELSIF (typenm = 'SYS.LCR$_ROW_RECORD') THEN
    res := lcr.getobject (rowlcr);
    dbms_output.put_line ('source database: ' || rowlcr.get_source_database_name);
    dbms_output.put_line ('owner: ' || rowlcr.get_object_owner);
    dbms_output.put_line ('object: ' || rowlcr.get_object_name);
    dbms_output.put_line ('is tag null: ' || rowlcr.is_null_tag);
    dbms_output.put_line ('command_type: ' || rowlcr.get_command_type);
    oldlist := rowlcr.get_values ('OLD');
    FOR i IN 1 .. oldlist.COUNT LOOP
      IF oldlist (i) IS NOT NULL THEN
        dbms_output.put_line ('old(' || i || '): ' || oldlist (i).column_name);
        print_any (oldlist (i).DATA);
      END IF;
    END LOOP;
    newlist := rowlcr.get_values ('NEW');
    FOR i IN 1 .. newlist.COUNT LOOP
      IF newlist (i) IS NOT NULL THEN
        dbms_output.put_line ('new(' || i || '): ' || newlist (i).column_name);
        print_any (newlist (i).DATA);
      END IF;
    END LOOP;
  ELSE
    dbms_output.put_line ('Non-LCR Message with type ' || typenm);
  END IF;
END print_lcr;
/
show error
exec dbms_output.put_line('create print_errors');
-- *****************************************************************
-- Description: Print the contents of the DBA_APPLY_ERROR queue
--
-- Input Parameters: NONE
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE
--
-- Notes: None
--
-- *****************************************************************
create or replace PROCEDURE print_errors
IS
  CURSOR c IS
    SELECT local_transaction_id, source_database,
           message_count, error_number, error_message
      FROM dba_apply_error
     ORDER BY source_database, source_commit_scn;
  i        NUMBER;
  txnid    VARCHAR2 (30);
  sourcedb VARCHAR2 (128);
  msgcnt   NUMBER;
  errnum   NUMBER := 0;
  errno    NUMBER;
  errmsg   VARCHAR2 (500);
  lcr      SYS.ANYDATA;
BEGIN
  FOR r IN c LOOP
    errnum := errnum + 1;
    msgcnt := r.message_count;
    txnid := r.local_transaction_id;
    sourcedb := r.source_database;
    errmsg := r.error_message;
    errno := r.error_number;
    dbms_output.put_line ('***************************************');
    dbms_output.put_line ('----- ERROR #' || errnum);
    dbms_output.put_line ('----- Local Transaction ID: ' || txnid);
    dbms_output.put_line ('----- Source Database: ' || sourcedb);
    dbms_output.put_line ('----Error Number: ' || errno);
    dbms_output.put_line ('----Message Text: ' || errmsg);
    FOR i IN 1 .. msgcnt LOOP
      dbms_output.put_line ('--message: ' || i);
      lcr := DBMS_APPLY_ADM.get_error_message (i, txnid);
      print_lcr (lcr);
    END LOOP;
  END LOOP;
END print_errors;
/
show error
exec dbms_output.put_line('create print_error_id');
-- *****************************************************************
-- Description: Print error information for a specific transaction id
--              in DBA_APPLY_ERROR
--
-- Input Parameters: Ltran_id: the local transaction id of
--                   the erred transaction
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE
--
-- Notes: None
--
-- *****************************************************************
create or replace PROCEDURE print_error_id (ltran_id in varchar2)
IS
  CURSOR c IS
    SELECT local_transaction_id, source_database,
           message_count, error_number, error_message
      FROM dba_apply_error
     WHERE local_transaction_id = ltran_id
     ORDER BY source_database, source_commit_scn;
  i        NUMBER;
  txnid    VARCHAR2 (30);
  sourcedb VARCHAR2 (128);
  msgcnt   NUMBER;
  errnum   NUMBER := 0;
  errno    NUMBER;
  errmsg   VARCHAR2 (500);
  lcr      SYS.ANYDATA;
BEGIN
  FOR r IN c LOOP
    errnum := errnum + 1;
    msgcnt := r.message_count;
    txnid := r.local_transaction_id;
    sourcedb := r.source_database;
    errmsg := r.error_message;
    errno := r.error_number;
    dbms_output.put_line ('****************************************');
    dbms_output.put_line ('----- ERROR #' || errnum);
    dbms_output.put_line ('----- Local Transaction ID: ' || txnid);
    dbms_output.put_line ('----- Source Database: ' || sourcedb);
    dbms_output.put_line ('----Error Number: ' || errno);
    dbms_output.put_line ('----Message Text: ' || errmsg);
    FOR i IN 1 .. msgcnt LOOP
      dbms_output.put_line ('--message: ' || i);
      lcr := DBMS_APPLY_ADM.get_error_message (i, txnid);
      print_lcr (lcr);
    END LOOP;
  END LOOP;
END print_error_id;
/
show error
exec dbms_output.put_line('create print_transaction');
-- *****************************************************************
-- Description: Print the lcr transaction metadata for the local
--              transaction id passed in
--
-- Input Parameters: Ltxnid: local transaction id of the erred
--                   transaction
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE
--
-- Notes: None
--
-- *****************************************************************
create or replace PROCEDURE print_transaction (ltxnid IN VARCHAR2)
IS
  i        NUMBER;
  txnid    VARCHAR2 (30);
  sourcedb VARCHAR2 (128);
  msgcnt   NUMBER;
  errno    NUMBER;
  errmsg   VARCHAR2 (128);
  lcr      SYS.ANYDATA;
BEGIN
  SELECT local_transaction_id, source_database,
         message_count, error_number, error_message
    INTO txnid, sourcedb, msgcnt, errno, errmsg
    FROM dba_apply_error
   WHERE local_transaction_id = ltxnid;
  dbms_output.put_line ('----- Local Transaction ID: ' || txnid);
  dbms_output.put_line ('----- Source Database: ' || sourcedb);
  dbms_output.put_line ('----Error Number: ' || errno);
  dbms_output.put_line ('----Message Text: ' || errmsg);
  FOR i IN 1 .. msgcnt LOOP
    dbms_output.put_line ('--message: ' || i);
    --gets the LCR
    lcr := DBMS_APPLY_ADM.get_error_message (i, txnid);
    print_lcr (lcr);
  END LOOP;
END print_transaction;
/
show error
spool off
/
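Once the procedures are compiled, a typical command-line session might look like the following sketch. The transaction ID shown here is purely illustrative; use a value reported by print_errors or found in dba_apply_error.

```sql
-- Run from SQL*Plus as the Streams admin user
set serveroutput on size unlimited

-- Summarize all erred transactions and unpack their LCRs
exec print_errors

-- Drill into a single erred transaction (hypothetical transaction ID)
exec print_error_id('12.31.456')

-- Print just the LCR metadata for that same transaction
exec print_transaction('12.31.456')
```

Spooling the session output to a file before running these gives you a shareable record of the erred LCRs for offline analysis.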
In this section we offer some examples of how to "get around" a couple of troublesome situations. While we have been offering advice and best practice recommendations throughout this book, this section is dedicated to out-of-the-mainstream techniques. As this publication matures, we hope to expand this section to include tricks and tips suggested by our readers. We would love to see this section grow into its own chapter!
If your Propagation is plagued by constant network disconnections or interruptions that cause the process to become disabled or abort throughout the day, you can automate a job to check the status of the Propagation process and attempt to restart it if it is found stopped. The following script does this on a 15-minute interval. It also creates an audit table that the job populates when it runs, allowing you to review the down history of the Propagation and the associated errors.
--AUTOFIX_PROP.sql
set serveroutput on
spool c:\create_autofix_prop.txt
exec dbms_output.put_line('create table prop audit');
--*********************************
create table strmadmin.propagation_audit
( PROPAGATION_NAME       VARCHAR2(30)
 ,SOURCE_QUEUE_NAME      VARCHAR2(30)
 ,DESTINATION_QUEUE_NAME VARCHAR2(30)
 ,DESTINATION_DBLINK     VARCHAR2(128)
 ,STATUS                 VARCHAR2(8)
 ,ERROR_MESSAGE          VARCHAR2(4000)
 ,ERROR_DATE             DATE
);
exec dbms_output.put_line('create ckprop proc');
-- *****************************************************************
-- Description: Query the status for all propagation jobs
--              scheduled in dba_propagation.
--              Insert a record of the status for each job into
--              the propagation_audit table.
--              If the job is disabled, attempt to enable it.
--              If an error occurs on the enable attempt, it is
--              logged in the propagation_audit table.
--
-- Input Parameters: None
--
-- Output/Returned Parameters: NONE
--
-- Error Conditions Raised: NONE. Errors are recorded in the
--                          propagation_audit table.
--
-- Notes: This procedure is called by a scheduled
--        job that runs every 15 mins
--
-- Author: A. McKinnell
-- ********************************
create or replace procedure strmadmin.ckprop_enable
as
  errnum number;
  errmsg varchar2(4000);
  cursor prop_status is
    select propagation_name, destination_dblink, status,
           source_queue_name, destination_queue_name,
           error_message, error_date
      from dba_propagation
     where status != 'ENABLED';
begin
  for rec in prop_status loop
    insert into propagation_audit
    values (rec.propagation_name, rec.source_queue_name,
            rec.destination_queue_name, rec.destination_dblink,
            rec.status, rec.error_message, rec.error_date);
    commit;
    begin
      dbms_aqadm.enable_propagation_schedule(rec.source_queue_name,
                                             rec.destination_dblink);
    exception
      when others then
        errnum := SQLCODE;
        errmsg := SQLERRM;
        insert into propagation_audit
        values ('CKPROP_ENABLE', rec.source_queue_name,
                're-enable propagation for', rec.destination_dblink,
                'ERROR', errnum ||': '||errmsg, sysdate);
        commit;
    end;
  end loop;
exception
  when others then
    errnum := SQLCODE;
    errmsg := SQLERRM;
    insert into propagation_audit
    values ('CKPROP_ENABLE', 'Exception handler', null, null,
            'ERROR', errnum ||': '||errmsg, sysdate);
    commit;
end ckprop_enable;
/
show error
-------------------------------
--Schedule a job to run every 15 mins to re-enable
--any disabled propagation.
-------------------------------
exec dbms_output.put_line('schedule chk prop job');
exec DBMS_SCHEDULER.CREATE_JOB ( -
  job_name => 'propagation_check', -
  job_type => 'STORED_PROCEDURE', -
  job_action => 'ckprop_enable', -
  number_of_arguments => 0, -
  start_date => sysdate, -
  repeat_interval => 'FREQ=MINUTELY;INTERVAL=15', -
  end_date => null, -
  enabled => TRUE, -
  auto_drop => FALSE, -
  comments => 'This job kicks off every 15 minutes and checks dba_propagation for any disabled propagation schedules and attempts to re-enable. Audits of this job are captured in table strmadmin.propagation_audit');
spool off
/
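With the job in place, the propagation_audit table becomes your outage history. A quick review query might look like this sketch:

```sql
-- Review Propagation down-time history captured by ckprop_enable.
-- Rows named CKPROP_ENABLE record failed re-enable attempts;
-- other rows record the disabled status found at check time.
select propagation_name, status, error_date, error_message
  from strmadmin.propagation_audit
 order by error_date desc;
```

If the same Propagation shows up repeatedly, that is your cue to investigate the underlying network link rather than relying on the automated restart.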
Based on what we reviewed in Chapter 6 on Rules, we can modify a rule condition directly. Use this power with caution. Changes made via this method may not be reflected in existing rule metadata. If you make a change, document it to avoid confusion down the road.
An example: an Enterprise Manager Streams Creation Wizard created process with INCLUDE_TAGGED_LCR = TRUE, but you want it to be FALSE.
As mentioned earlier, if you use the EM Streams Setup wizard to configure your Streams, be aware that it graciously sets INCLUDE_TAGGED_LCR to TRUE for all processes.
select streams_name, streams_type, include_tagged_lcr include_tag
from dba_streams_rules;
STREAMS_NAME                   STREAMS_TYPE INCLUDE_TAGGED_LCR
------------------------------ ------------ ------------------
HR_APPLY APPLY YES
HR_APPLY APPLY YES
HR_PROPAGATION PROPAGATION YES
HR_PROPAGATION PROPAGATION YES
HR_PROPAGATION PROPAGATION YES
HR_PROPAGATION PROPAGATION YES
HR_CAPTURE CAPTURE YES
HR_CAPTURE CAPTURE YES
8 rows selected.
To change the behavior without dropping the Capture process and recreating it manually with the DBMS_STREAMS_ADM.ADD_SCHEMA_RULES
setting INCLUDE_TAGGED_LCR => FALSE
, you can alter the existing Rule for the capture.
Using SQLPlus:
First determine the Rule name and condition text for the Capture Rule.
set long 4000
select streams_type, streams_name, rule_owner, rule_name, rule_condition from dba_streams_rules where streams_type = 'CAPTURE' and streams_name = 'HR_CAPTURE';
STREAMS_TYPE STREAMS_NAME RULE_OWNER
------------ ---------------------------- ------------------------------
RULE_NAME
------------------------------
RULE_CONDITION
------------------------------------------------------------------------
CAPTURE HR_CAPTURE STRM_ADMIN
HR19
((((:ddl.get_object_owner() = 'HR' or :ddl.get_base_table_owner() = 'HR') and :ddl.get_source_database_name() = 'STRM1' )) and (:ddl.get_compatible() <= dbms_streams.compatible_11_1))
CAPTURE HR_CAPTURE STRM_ADMIN
HR18
((((:dml.get_object_owner() = 'HR') and :dml.get_source_database_name() = 'STRM1 POSITIVE SCHEMA')) and (:dml.get_compatible() <= dbms_streams.compatible_11_1))
You can also find the condition in the dba_rules view, as shown:
select rule_name, rule_condition from dba_rules where rule_name = 'HR18';
RULE_NAME
------------------------------
RULE_CONDITION
--------------------------------------------------------------------
HR18
((((:dml.get_object_owner() = 'HR') and :dml.get_source_database_name()
= 'STRM1 POSITIVE SCHEMA')) and (:dml.get_compatible() <= dbms_streams.compatible_11_1))
In our case we are only going to change the DML Rule. The Rule name is HR18
. We copy and edit the text to include an evaluation for a null tag (the equivalent of setting INCLUDE_TAGGED_LCR = FALSE
when adding Rules via DBMS_STREAMS_ADM)
.
((((:dml.get_object_owner() = 'HR') and :dml.is_null_tag() = 'Y' and :dml.get_source_database_name()
= 'STRM1 POSITIVE SCHEMA')) and (:dml.get_compatible() <= dbms_streams.compatible_11_1))
Next, alter the Rule to use the new condition (make sure to escape the single quotes in the condition string).
The syntax is dbms_rule_adm.alter_rule('<rulename>','<condition>')
;
begin
  dbms_rule_adm.alter_rule('HR18',
    '((((:dml.get_object_owner() = ''HR'') ' ||
    'and :dml.is_null_tag() = ''Y'' ' ||
    'and :dml.get_source_database_name() = ''STRM1 POSITIVE SCHEMA'')) ' ||
    'and (:dml.get_compatible() <= dbms_streams.compatible_11_1))');
end;
/
select streams_name, rule_name, rule_condition from dba_streams_rules where rule_name = 'HR18';
STREAMS_NAME                   RULE_NAME
------------------------------ ------------------------------
RULE_CONDITION
--------------------------------------------------------------------
HR_CAPTURE                     HR18
((((:dml.get_object_owner() = 'HR') and :dml.is_null_tag() = 'Y' and
:dml.get_source_database_name() = 'STRM1 POSITIVE SCHEMA')) and
(:dml.get_compatible() <= dbms_streams.compatible_11_1))
One thing to be aware of when using this method is that changing the Rule condition itself to exclude tagged LCRs does not change the INCLUDE_TAGGED_LCR value. That is OK; it is the rule_condition that is used for the actual evaluation, not the INCLUDE_TAGGED_LCR value. You will also notice the columns original_rule_condition and same_rule_condition. These show the original Rule condition for the Streams name, and whether it is the same as the current Rule condition (YES/NO), respectively. If the change does not work as expected, you can use the original_rule_condition value to quickly fall back to the original condition. In our example, we see the original condition that we started with, and the NO indicates that the Rule condition for the Capture has been changed.
select streams_name, rule_name,include_tagged_lcr, original_rule_condition, same_rule_condition from dba_streams_rules where rule_name = 'HR18';
STREAMS_NAME         RULE_NAME   INCLUDE_TAGGED_LCR
-------------------- ----------- --------------------
ORIGINAL_RULE_CONDITION
---------------------------------------------------
SAME_RULE_CONDITION
-----------------------------
HR_CAPTURE           HR18        YES
((((:dml.get_object_owner() = 'HR') and
:dml.get_source_database_name() = 'STRM1' )) and
(:dml.get_compatible() <= dbms_streams.compatible_11_1))
NO
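Since the original condition is preserved in dba_streams_rules, the fallback itself can be scripted. The following is a sketch, not a tested production procedure; it assumes a single matching row and that the original condition fits in a VARCHAR2(4000), so adjust to your environment:

```sql
-- Sketch: fall back to the original Rule condition preserved
-- in dba_streams_rules (assumes one row and a condition <= 4000 chars)
declare
  v_orig_condition varchar2(4000);
begin
  select original_rule_condition
    into v_orig_condition
    from dba_streams_rules
   where rule_name = 'HR18'
     and rownum = 1;
  dbms_rule_adm.alter_rule('HR18', v_orig_condition);
end;
/
```

Because the condition is passed as a variable rather than a hand-typed literal, there is no quote-escaping to get wrong on the way back.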
While the above method is directed at changing the Rule condition to not capture tagged LCRs, the same method can be used to add, change, or delete any condition in the Rule.
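For instance, the same alter_rule technique could append a condition excluding a single table from capture. The table name EMP_AUDIT below is hypothetical, used only to illustrate the shape of the change:

```sql
-- Sketch: extend HR18 to also skip a hypothetical table HR.EMP_AUDIT
begin
  dbms_rule_adm.alter_rule('HR18',
    '((((:dml.get_object_owner() = ''HR'') ' ||
    'and :dml.get_object_name() != ''EMP_AUDIT'' ' ||
    'and :dml.is_null_tag() = ''Y'' ' ||
    'and :dml.get_source_database_name() = ''STRM1 POSITIVE SCHEMA'')) ' ||
    'and (:dml.get_compatible() <= dbms_streams.compatible_11_1))');
end;
/
```

As always, verify the new condition in dba_streams_rules and test with a small DML change before trusting it in production.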