Troubleshooting the Database
This chapter contains several recipes that show how to use the database’s built-in diagnostic infrastructure to resolve database performance issues. You’ll learn how to use ADRCI, the Automatic Diagnostic Repository Command Interpreter, to perform various tasks such as checking the database alert log, creating a diagnostic package for sending to Oracle Support engineers, and running a proactive health check of the database.
Many common Oracle database performance-related issues occur when you have space issues with the temporary tablespace or when you’re creating a large index or a large table with the create table as select (CTAS) technique. Undo tablespace space issues are another common source of trouble for many DBAs. This chapter has several recipes that help you proactively monitor, diagnose, and resolve temporary tablespace and undo tablespace–related issues. When a production database seems to hang, there are ways to collect critical diagnostic data for analyzing the causes, and this chapter shows you how to log in to an unresponsive database to collect diagnostic data.
7-1. Determining the Optimal Undo Retention Period
Solution
You control how long Oracle retains undo data after a transaction commits by setting the UNDO_RETENTION parameter. To determine the optimal value for the UNDO_RETENTION parameter, you must first calculate the actual amount of undo that the database is generating. Once you know approximately how much undo the database is generating, you can calculate a more precise value for the UNDO_RETENTION parameter. Use the following formula to calculate the optimal value of the UNDO_RETENTION parameter:
OPTIMAL UNDO_RETENTION = UNDO SIZE/(DB_BLOCK_SIZE*UNDO_BLOCK_PER_SEC)
You can calculate the space allocated for undo in your database by issuing the following query:
SQL> select sum(d.bytes) "undo"
2 from v$datafile d,
3 v$tablespace t,
4 dba_tablespaces s
5 where s.contents = 'UNDO'
6 and s.status = 'ONLINE'
7 and t.name = s.tablespace_name
8 and d.ts# = t.ts#;
UNDO
----------
104857600
SQL>
You can calculate the value of UNDO_BLOCK_PER_SEC with the following query:
SQL> select max(undoblks/((end_time-begin_time)*3600*24))
2 "UNDO_BLOCK_PER_SEC"
3 FROM v$undostat;
UNDO_BLOCK_PER_SEC
------------------
7.625
SQL>
You most likely remember the block size for your database; if not, you can look it up in the SPFILE or find it by issuing the command show parameter db_block_size. Let’s say the db_block_size is 8 KB (8,192 bytes) for your database. You can then calculate the optimal value for the UNDO_RETENTION parameter using the formula shown earlier in this recipe, giving a result in seconds: 1,678.69 = 104,857,600/(7.625 * 8,192). In this case, assigning a value of 1,800 seconds to the UNDO_RETENTION parameter is appropriate, because it’s slightly higher than the value the formula computes.
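If you’d rather let the database do the arithmetic, the two queries above can be combined into a single statement. The following is just a convenience sketch: it reads the block size from V$PARAMETER instead of assuming 8 KB, and it mirrors the structure of the combined query shown later in this recipe.

SQL> select d.undo_size /
     (to_number(p.value) * g.undo_block_per_sec) "OPTIMAL UNDO_RETENTION"
     from (
     select sum(a.bytes) undo_size
     from v$datafile a,
     v$tablespace b,
     dba_tablespaces c
     where c.contents = 'UNDO'
     and c.status = 'ONLINE'
     and b.name = c.tablespace_name
     and a.ts# = b.ts#
     ) d,
     v$parameter p,
     (
     select max(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec
     from v$undostat
     ) g
     where p.name = 'db_block_size';

The result is the optimal UNDO_RETENTION value in seconds; round it up (as we did to 1,800) before setting the parameter.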
The previous example is predicated on the assumption that you have a limited amount of disk space available. If disk space isn’t an issue for you, you can instead choose the ideal value for the UNDO_RETENTION parameter for your instance. In this case, you choose a value for the UNDO_RETENTION parameter first and use that to calculate the size of your undo tablespace. The formula now becomes:
size of undo = undo_retention * db_block_size * undo_block_per_sec
You can use the following query to do all the calculations for you:
select d.undo_size/(1024*1024) "Current UNDO SIZE",
SUBSTR(e.value,1,25) "UNDO RETENTION",
(to_number(e.value) * to_number(f.value) *
g.undo_block_per_sec) / (1024*1024)
"Necessary UNDO SIZE"
from (
select sum(a.bytes) undo_size
from v$datafile a,
v$tablespace b,
dba_tablespaces c
where c.contents = 'UNDO'
and c.status = 'ONLINE'
and b.name = c.tablespace_name
and a.ts# = b.ts#
) d,
v$parameter e,
v$parameter f,
(
Select max(undoblks/((end_time-begin_time)*3600*24))
undo_block_per_sec
from v$undostat
) g
where e.name = 'undo_retention'
and f.name = 'db_block_size';
Current UNDO SIZE
-----------------
500
UNDO RETENTION
--------------
10800
Necessary UNDO SIZE
-------------------
674.611200
The query shows that you should set the size of the undo tablespace in this database to around 675 MB: the computed value, 674.6112 MB, rounds up to 675 MB.
Here is how to set the undo retention to 30 minutes for an instance by updating the value of the UNDO_RETENTION parameter in the SPFILE:
SQL> alter system set undo_retention=1800 scope=both;
System altered.
SQL>
How It Works
Oracle retains undo data, both for read consistency purposes and to support Oracle Flashback operations, for the duration you specify with the UNDO_RETENTION parameter. Automatic undo management is the default mode for undo management starting with release 11g. If you create a database with the Database Configuration Assistant (DBCA), Oracle automatically creates an auto-extending undo tablespace named UNDOTBS1. If you’re manually creating a database, you specify the undo tablespace in the database creation statement, or you can add the undo tablespace at any point. If a database doesn’t have an explicit undo tablespace, Oracle will store the undo records in the SYSTEM tablespace.
Once you set the UNDO_TABLESPACE initialization parameter, Oracle automatically manages undo retention for you. Optionally, you can set the UNDO_RETENTION parameter to specify how long Oracle retains older undo data before overwriting it with newer undo data.
The first formula in the Solution section shows how to set the UNDO_RETENTION parameter for a given undo tablespace size. The second query calculates the size of the undo tablespace for a specific value of the UNDO_RETENTION parameter, based on your instance’s undo needs, which in turn depend on current database activity. Note that we rely on the dynamic view V$UNDOSTAT to calculate the value for the undo retention period. Therefore, it’s essential that you execute these queries after the database has been running for some time, thus ensuring that it has had the chance to process a typical workload.
If you configure the UNDO_RETENTION parameter, the undo tablespace must be large enough to hold the undo generated by the database within the time you specify with the UNDO_RETENTION parameter. When a transaction commits, the database may overwrite its undo data with newer undo data. The undo retention period is the minimum time for which the database will attempt to retain older undo data. After it saves the undo data for the period you specified for the UNDO_RETENTION parameter, the database marks that undo data as expired and makes the space occupied by that data available to write undo data for new transactions.
It’s somewhat difficult to understand how the Oracle database handles the retention of undo data. Here’s a brief summary of how things work:
The key to figuring out the right size for the undo tablespace, or the correct setting for the UNDO_RETENTION parameter, is to understand the nature of the current database workload. To understand the workload characteristics, examine the V$UNDOSTAT view: it contains statistics showing how the database is utilizing the undo space, as well as information such as the length of the longest-running queries. You can use this information to calculate the size of the undo space for the current workload your database is processing. Each row in the V$UNDOSTAT view shows undo statistics for a 10-minute interval, and the view retains a maximum of 576 rows, so you can review undo usage for up to four days in the past.
Here are the key columns you should monitor in the V$UNDOSTAT view for the time period you’re interested in—ideally, the time period should include the time when your longest-running queries are executing. You can use these statistics to size both the UNDO_TABLESPACE as well as the UNDO_RETENTION initialization parameters.
begin_time: Beginning of the time interval.
end_time: End of the time interval.
undoblks: Number of undo blocks the database consumed in a 10-minute interval; this is what we used in our formula for the estimation of the size of the undo tablespace.
txncount: Number of transactions executed in a 10-minute interval.
maxquerylen: This shows the length of the longest query (in seconds) executed in this instance during a 10-minute interval. You can estimate the size of the UNDO_RETENTION parameter based on the maximum value of the MAXQUERYLEN column.
maxqueryid: Identifier for the longest-running SQL statement in this interval.
nospaceerrcnt: The number of times the database didn’t have enough free space available in the undo tablespace for new undo data, because the entire undo tablespace was being used by active transactions; of course, this means that you need to add space to the undo tablespace.
tuned_undoretention: The time, in seconds, for which the database will retain the undo data after the database commits the transaction to which the undo belongs.
The following query based on the V$UNDOSTAT view shows how Oracle automatically tunes undo retention (check the TUNED_UNDORETENTION column) based on the length of the longest-running query (MAXQUERYLEN column) in the current instance workload.
SQL> select to_char(begin_time,'hh24:mi:ss') BEGIN_TIME,
2 to_char(end_time,'hh24:mi:ss') END_TIME,
3 maxquerylen,nospaceerrcnt,tuned_undoretention
4 from v$undostat;
BEGIN_TI END_TIME MAXQUERYLEN NOSPACEERRCNT TUNED_UNDORETENTION
-------- -------- ----------- ------------- -------------------
12:15:35 12:25:35 592 0 1492
12:05:35 12:15:35 1194 0 2094
11:55:35 12:05:35 592 0 1493
11:45:35 11:55:35 1195 0 2095
11:35:35 11:45:35 593 0 1494
11:25:35 11:35:35 1196 0 2097
11:15:35 11:25:35 594 0 1495
11:05:35 11:15:35 1195 0 2096
10:55:35 11:05:35 593 0 1495
10:45:35 10:55:35 1198 0 2098
...
SQL>
Note that the value of the TUNED_UNDORETENTION column fluctuates continuously, based on the value of the maximum query length (MAXQUERYLEN) during any interval. You can see that the two columns are directly related to each other, with Oracle raising or lowering the tuned undo retention based on the maximum query length during a given interval (of 10 minutes). The following query shows the usage of undo blocks and the transaction count during each 10-minute interval.
SQL> select to_char(begin_time,'hh24:mi:ss'),to_char(end_time,'hh24:mi:ss'),
2 maxquerylen,ssolderrcnt,nospaceerrcnt,undoblks,txncount from v$undostat
3 order by undoblks
4 /
TO_CHAR( TO_CHAR( MAXQUERYLEN SSOLDERRCNT NOSPACEERRCNT UNDOBLKS TXNCOUNT
-------- -------- ----------- ----------- ------------- ---------- ----------
17:33:51 17:36:49 550 0 0 1 18
17:23:51 17:33:51 249 0 0 33 166
17:13:51 17:23:51 856 0 0 39 520
17:03:51 17:13:51 250 0 0 63 171
16:53:51 17:03:51 850 0 0 191 702
16:43:51 16:53:51 245 0 0 429 561
6 rows selected.
SQL>
Oracle provides an easy way to help set the size of the undo tablespace as well as the undo retention period, through the OEM Undo Advisor interface. You can specify the length of time for the advisor’s analysis, for a period going back up to a week; the advisor uses the hourly AWR snapshots to perform its analysis. You can specify the undo retention period required to support a flashback transaction query. Alternatively, you can let the database determine the desired undo retention based on the longest query in the analysis period.
7-2. Finding What’s Consuming the Most Undo
Problem
Often, one or two user sessions seem to be hogging the undo tablespaces. You’d like to identify the user and the SQL statement that are using up all that undo space.
Solution
Use the following query to find out which SQL statement has run for the longest time in your database.
SQL> select s.sql_text from v$sql s, v$undostat u
where u.maxqueryid=s.sql_id;
You can join the V$TRANSACTION and V$SESSION views to find the sessions whose currently executing transactions are using the most undo, as shown here:
SQL> select s.sid, s.username, t.used_urec, t.used_ublk
from v$session s, v$transaction t
where s.saddr = t.ses_addr
order by t.used_ublk desc;
You can also issue the following query to find out which session is currently using the most undo in an instance:
SQL>select s.sid, t.name, s.value
from v$sesstat s, v$statname t
where s.statistic# = t.statistic#
and t.name = 'undo change vector size'
order by s.value desc;
The query’s output relies on the statistic undo change vector size in the V$STATNAME view, to show the SID for the sessions consuming the most undo right now. The V$TRANSACTION view shows details about active transactions. Here’s a query that joins the V$TRANSACTION, V$SQL and V$SESSION views and shows you the user and the SQL statement together with the amount of undo space that’s consumed by the SQL statement:
SQL> select sql.sql_text sql_text, t.USED_UREC Records,
t.USED_UBLK Blocks,
(t.USED_UBLK*8192/1024) KBytes from v$transaction t,
v$session s,
v$sql sql
where t.addr = s.taddr
and s.sql_id = sql.sql_id
and s.username ='&USERNAME';
The column USED_UREC shows the number of undo records used, and the USED_UBLK column shows the undo blocks used by a transaction.
How It Works
You can issue the queries described in the “Solution” section to identify the sessions that are responsible for the most undo usage in your database, as well as the users that are responsible for those sessions. You can query the V$UNDOSTAT with the appropriate begin_time and end_time values to get the SQL identifier of the longest-running SQL statement during a time interval. The MAXQUERYID column captures the SQL identifier. You can use this ID to query the V$SQL view in order to find out the actual SQL statement. Similarly, the V$TRANSACTION and the V$SESSION views together help identify the users that are consuming the most undo space. If excessive undo usage is affecting performance, you might want to look at the application to see why the queries are using so much undo.
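For example, to retrieve the longest-running statement for a particular window, you can restrict V$UNDOSTAT by its BEGIN_TIME and END_TIME columns before joining to V$SQL. The time window below (1 A.M. to 6 A.M. today) is purely hypothetical; substitute your own batch window.

SQL> select u.begin_time, u.maxqueryid, s.sql_text
     from v$undostat u, v$sql s
     where u.maxqueryid = s.sql_id
     and u.begin_time >= trunc(sysdate) + 1/24
     and u.end_time <= trunc(sysdate) + 6/24;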
If you are experiencing excessive undo usage, the following query helps you find the users and instances responsible in an Oracle RAC database.
SQL> select a.inst_id, a.sid, c.username, c.osuser, c.program, b.name,
a.value, d.used_urec, d.used_ublk
from gv$sesstat a, v$statname b, gv$session c, gv$transaction d
where a.statistic# = b.statistic#
and a.inst_id = c.inst_id
and a.sid = c.sid
and c.inst_id = d.inst_id
and c.saddr = d.ses_addr
and b.name = 'undo change vector size'
and a.value > 0
order by a.value;
Once you know the users and the instances responsible, you can take a closer look at just what those users and instances are doing that is causing such a high rate of undo usage.
7-3. Resolving an ORA-01555 Error
Problem
You’re receiving the ORA-01555 (snapshot too old) errors during nightly runs of key production batch jobs. You want to eliminate these errors.
Solution
While setting a high value for the UNDO_RETENTION parameter can potentially minimize the possibility of receiving “snapshot too old” errors, it doesn’t guarantee that the database won’t overwrite older undo data that may be needed by a running transaction. You can move long-running batch jobs to a separate time interval when other programs aren’t running in the database, to avoid these errors. Remember, though, that you don’t always need several concurrent programs to generate a “snapshot too old” error; a single poorly written query can do it, for example one that opens a cursor on a big table and updates that same table inside the fetch loop.
Regardless, while you can minimize the occurrence of “snapshot too old” errors with these approaches, you can’t completely eliminate them without the guaranteed undo retention feature. When you configure guaranteed undo retention in a database, no SELECT statement can fail because of the “snapshot too old” error. The trade-off is that Oracle may keep new DML statements from executing when the undo tablespace can’t accommodate both the guaranteed undo and the undo from new transactions. Implementing the guaranteed undo feature is simple. Suppose you want to ensure that the database retains undo for at least an hour (3,600 seconds). First set the undo retention threshold with the alter system command shown here, and then set up guaranteed undo retention by specifying the retention guarantee clause to alter the undo tablespace.
SQL> alter system set undo_retention=3600;
System altered.
SQL> alter tablespace undotbs1 retention guarantee;
Tablespace altered.
SQL>
You can switch off guaranteed undo retention by executing the alter tablespace command with the retention noguarantee clause.
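Here is what switching the guarantee off looks like, together with a query against DBA_TABLESPACES to confirm the current setting (the RETENTION column shows GUARANTEE, NOGUARANTEE, or NOT APPLY):

SQL> alter tablespace undotbs1 retention noguarantee;

Tablespace altered.

SQL> select tablespace_name, retention
     from dba_tablespaces
     where contents = 'UNDO';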
Tip You can enable guaranteed undo retention by using the alter system command as shown in this recipe, as well as with the create database and create undo tablespace statements.
How It Works
Oracle uses the undo records stored in the undo tablespace to roll back transactions, provide read consistency, and help recover the database. In addition, the database uses undo records to read data from a past point in time using Oracle Flashback Query. Undo data serves as the underpinning for several Oracle Flashback features that help you recover from logical errors.
Occurrence of the Error
The ORA-01555 error (snapshot too old) may occur in various situations. The following is a case where the error occurs during an export.
EXP-00008: ORACLE error 1555 encountered
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10$" too small
EXP-00000: Export terminated unsuccessfully
And you can receive the same error when performing a flashback transaction:
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number with name "" too small
ORA-06512: at "SYS.DBMS_FLASHBACK", line 37
ORA-06512: at "SYS.DBMS_FLASHBACK", line 70
ORA-06512: at li
The “snapshot too old” error occurs when Oracle overwrites undo data that’s needed by another query. The error is a direct result of how Oracle’s read consistency mechanism works. The error occurs during the execution of a long-running query when Oracle tries to read the “before image” of any changed rows from the undo segments. For example, if a long-running query starts at 1 A.M. and runs until 6 A.M., it’s possible for the database to change the data that’s part of this query during the period in which the query executes. When Oracle tries to read the data as it appeared at 1 A.M., the query may fail if that data is no longer present in the undo segments.
If your database is experiencing a lot of updates, Oracle may not be able to fetch the changed rows, because the before changes recorded in the undo segments may have been overwritten. The transactions that changed the rows will have already committed, and the undo segments don’t have a record of the before change row values because the database overwrote the relevant undo data. Since Oracle fails to return consistent data for the current query, it issues the ORA-01555 error. The query that’s currently running requires the before image to construct read-consistent data, but the before image isn’t available.
The ORA-01555 error may be the result of one or both of the following: heavy query activity during a time of intense changes in the database, or too small an undo tablespace. You can increase the size of the undo tablespace, but that doesn’t ensure that the error won’t occur again.
Influence of Extents
The database stores undo data in undo extents, and there are three distinct types of undo extents:
Active: Transactions are currently using these extents.
Unexpired: These are extents that contain undo that’s required to satisfy the undo retention time specified by the UNDO_RETENTION initialization parameter.
Expired: These are extents with undo that’s been retained longer than the duration specified by the UNDO_RETENTION parameter.
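You can watch the mix of these three extent types in your own undo tablespace by querying the DBA_UNDO_EXTENTS view, whose STATUS column carries exactly these values:

SQL> select status, count(*) extent_count,
     round(sum(bytes)/1024/1024,1) mb
     from dba_undo_extents
     group by status;

A steadily shrinking EXPIRED count under heavy load is a warning sign that the database may soon start reusing unexpired extents.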
If the database doesn’t find enough expired extents in the undo tablespace and can’t get new undo extents, it’ll reuse unexpired extents (but never active ones), and this leaves the door open for an ORA-01555, “snapshot too old” error. By default, the database essentially stops honoring the undo retention period you specify if it comes under space pressure to accommodate the undo from new transactions. Since the unexpired undo extents contain undo records needed to satisfy the undo retention period, overwriting those extents in reality means that the database is lowering the undo retention period you’ve set. Enabling the undo retention guarantee helps assure the success of long-running queries as well as Oracle Flashback operations. The “guarantee” part of the undo retention guarantee is real: Oracle will retain undo for at least the time you specify and will never overwrite any of the unexpired undo extents that contain the undo required to satisfy the undo retention period. However, there’s a stiff price attached to this guarantee: Oracle will honor it even if that means DML transactions fail because the database can’t find space to record their undo. Therefore, you must exercise great caution when enabling guaranteed undo retention.
7-4. Monitoring Temporary Tablespace Usage
Solution
Execute the following query to find out the used and free space in a temporary tablespace.
SQL> select * from (select a.tablespace_name,
sum(a.bytes/1024/1024) allocated_mb
from dba_temp_files a
where a.tablespace_name = upper('&&temp_tsname') group by a.tablespace_name) x,
(select sum(b.bytes_used/1024/1024) used_mb,
sum(b.bytes_free/1024/1024) free_mb
from v$temp_space_header b
where b.tablespace_name=upper('&&temp_tsname') group by b.tablespace_name);
Enter value for temp_tsname: TEMP
...
TABLESPACE_NAME ALLOCATED_MB USED_MB FREE_MB
--------------- ------------ ------- -------
TEMP 60 12 48
SQL>
How It Works
Oracle uses temporary tablespaces for storing intermediate results from sort operations as well as any temporary tables, temporary LOBs, and temporary B-trees. You can create multiple temporary tablespaces, but only one of them can be the default temporary tablespace. If you don’t explicitly assign a temporary tablespace to a user, that user is assigned the default temporary tablespace.
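To see which tablespace is currently the database default temporary tablespace, you can query DATABASE_PROPERTIES:

SQL> select property_value
     from database_properties
     where property_name = 'DEFAULT_TEMP_TABLESPACE';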
You won’t find information about temporary tablespaces in the DBA_FREE_SPACE view. Instead, use the V$TEMP_SPACE_HEADER view, as shown in this example, to find how much free and used space there is in any temporary tablespace.
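From 11g onward you can also query the DBA_TEMP_FREE_SPACE view, which summarizes the same information at the tablespace level:

SQL> select tablespace_name,
     tablespace_size/1024/1024 total_mb,
     allocated_space/1024/1024 allocated_mb,
     free_space/1024/1024 free_mb
     from dba_temp_free_space;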
7-5. Identifying Who Is Using the Temporary Tablespace
Problem
You notice that the temporary tablespace is filling up fast, and you want to identify the user and the SQL statements responsible for the high temporary tablespace usage.
Solution
Issue the following query to find out which SQL statement is using up space in a sort segment.
SQL> select se.sid, se.username,
su.blocks * ts.block_size / 1024 / 1024 mb_used, su.tablespace,
su.sqladdr address, sq.hash_value, sq.sql_text
from v$sort_usage su, v$session se, v$sqlarea sq, dba_tablespaces ts
where su.session_addr = se.saddr
and su.sqladdr = sq.address (+)
and su.tablespace = ts.tablespace_name;
The preceding query shows information about the session that issued the SQL statement, as well as the name of the temporary tablespace and the amount of space the SQL statement is using in that tablespace.
You can use the following query to find out which sessions are using space in the temporary tablespace. Note that the information is in summary form: it doesn’t separate the various sort operations being run by a session; it simply gives the total temporary tablespace usage by each session.
SQL> select se.sid,
se.username, se.osuser, pr.spid,
se.module,se.program,
sum (su.blocks) * ts.block_size / 1024 / 1024 mb_used, su.tablespace,
count(*) sorts
from v$sort_usage su, v$session se, dba_tablespaces ts, v$process pr
where su.session_addr = se.saddr
and se.paddr = pr.addr
and su.tablespace = ts.tablespace_name
group by se.sid, se.serial#, se.username, se.osuser, pr.spid, se.module,
se.program, ts.block_size, su.tablespace;
The output of this query will show you the space that each session is using in the temporary tablespace, as well as the number of sort operations that session is performing right now.
How It Works
Oracle tries to perform sort and hash operations in memory (PGA), but if a sort operation is too large to fit into memory, it uses the temporary tablespace to do the work. It’s important to understand that even a single large sort operation has the potential to use up an entire temporary tablespace. Since all database sessions share the temporary tablespace, a session running a large sort operation can cause other sessions to receive errors due to lack of room in that tablespace, or can fail itself. Once the temporary tablespace fills up, all SQL statements that seek to use the temporary tablespace fail with the ORA-1652: unable to extend temp segment error. New sessions may be unable to connect, queries can hang, and users may be unable to issue new queries. You look for blocking locks, but find none. If the temporary tablespace fills up, transactions won’t complete, and the alert log will show that the temporary tablespace ran out of space.
Operations that use an ORDER BY or GROUP BY clause frequently use the temporary tablespace to do their work. Large hash joins also need the temp space while they’re executing. Remember, too, that creating or rebuilding an index makes use of the temporary tablespace for sorting the index data.
Oracle uses PGA memory for performing sort and hash operations. Thus, if you see a lot of I/O occurring in the temporary tablespace, one of the first things to do is review the current value of the PGA_AGGREGATE_TARGET initialization parameter and see whether bumping it up will help. Nevertheless, even a larger setting for the PGA_AGGREGATE_TARGET parameter doesn’t guarantee that Oracle will perform a huge sort entirely in memory. Oracle allocates each session a certain amount of PGA memory, with the amount determined internally based on the value of the PGA_AGGREGATE_TARGET parameter. Once a large operation uses up its share of the PGA memory, Oracle writes intermediate results to disk in the temporary tablespace. These are called one-pass or multi-pass operations, and because they are performed on disk, they are much slower than an operation performed entirely in the PGA.
If your database is running out of space in the temporary tablespace, you must increase its size by adding a tempfile. Enabling autoextend for a temporary tablespace will also help prevent “out of space” errors. Since Oracle allocates space in a temporary tablespace that you have assigned for the user performing the sort operation, you can assign users that need to perform heavy sorting a temporary tablespace that’s different from that used by the rest of the users, thus minimizing the effect of the heavy sorting activity on overall database performance.
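Here is a sketch of adding an auto-extending tempfile; the file name and sizes are hypothetical, so adjust them to your environment:

SQL> alter tablespace temp
     add tempfile '/u01/app/oracle/oradata/prod1/temp02.dbf'
     size 1000m autoextend on next 100m maxsize 4g;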
Note that unlike tables and indexes, each of which has its own segments, a temporary tablespace has just one segment, called the sort segment. All sessions share this sort segment. A single SQL statement can use multiple sort and hash operations. In addition, the same session can have multiple SQL statements executing simultaneously, with each statement using multiple sort and hash operations. Once a sort operation completes, the database immediately marks the blocks used by the operation as free and allocates them to another sort operation. The database adds extents to the sort segment as the sort operation gets larger, but if there’s no more free space in the temporary tablespace to allocate additional extents, it issues the ORA-1652: unable to extend temp segment error. The SQL statement that’s using the sort operation will fail as a result.
Note Although you’ll receive an ORA-1652 error when a SQL statement performing a huge sort fails due to lack of space in the temporary tablespace, that’s not the only reason you’ll get this error. You’ll also receive it when performing a table move operation (alter table ... move), if the tablespace to which you’re moving the table doesn’t have room to hold the table. The same can sometimes happen when you’re creating a large index. Please see Recipe 7-6 for an explanation of this error.
7-6. Resolving the “Unable to Extend Temp Segment” Error
Problem
While creating a large index, you receive an Oracle error indicating that the database is unable to extend a TEMP segment. However, you have plenty of free space in the temporary tablespace.
Solution
When you get an error such as the following, your first inclination may be to think that there’s no free space in the temporary tablespace.
ORA-01652: unable to extend temp segment by 1024 in tablespace INDX_01
You cannot fix this problem by adding space to the temporary tablespace. The error message clearly indicates the tablespace that ran out of space. In this case, the offending tablespace is INDX_01, and not the TEMP tablespace. Obviously, an index creation process failed because there was insufficient space in the INDX_01 tablespace. You can fix the problem by adding a datafile to the INDX_01 tablespace, as shown here:
SQL>alter tablespace INDX_01 add datafile '/u01/app/oracle/data/indx_01_02.dbf'
2 size 1000m;
How It Works
When you receive the ORA-01652 error, your normal tendency is to check the temporary tablespace. You check the DBA_TEMP_FREE_SPACE view, and there’s plenty of free space in the default temporary tablespace, TEMP. Well, if you look at the error message carefully, it tells you that the database is unable to extend the temp segment in the INDX_01 tablespace. When you create an index, as in this case, you provide the name of the permanent tablespace in which the database must create the new index. Oracle starts the creation of the new index by putting the new index structure into a temporary segment in the tablespace you specify (INDX_01 in our example) for the index. The reason is that if your index creation process fails, Oracle (to be more specific, the SMON process) will remove the temporary segment from the tablespace you specified for creating the new index. Once the index is successfully created (or rebuilt), Oracle converts the temporary segment into a permanent segment within the INDX_01 tablespace. However, as long as Oracle is still creating the index, the database deems it a temporary segment and thus when an index creation fails, the database issues the ORA-01652 error, which is also the error code for an “out of space” error for a temporary tablespace. The TEMP segment the error refers to is the segment that was holding the new index while it was being built. Once you increase the size of the INDX_01 tablespace, the error will go away.
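While the index build is still running, you can actually observe this temporary segment in the permanent tablespace. Here is an illustrative query (the tablespace name matches our example):

SQL> select segment_name, segment_type, bytes/1024/1024 mb
     from dba_segments
     where tablespace_name = 'INDX_01'
     and segment_type = 'TEMPORARY';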
Tip The temporary segment in an ORA-1652 error message may not be referring to a temporary segment in a temporary tablespace.
The key to resolving the ORA-01652 error is to understand that Oracle uses temporary segments in places other than a temporary tablespace. While a temporary segment in the temporary tablespace is for activities such as sorting, a permanent tablespace can also use temporary segments when performing temporary actions necessary during the creation of a table (CTAS) or an index.
Tip When you create an index, the creation process uses two different temporary segments. One temporary segment in the TEMP tablespace is used to sort the index data. Another temporary segment in the permanent tablespace holds the index while it is being created. After creating the index, Oracle changes the temporary segment in the index’s tablespace into a permanent segment. The same is the case when you create a table with the CREATE TABLE...AS SELECT (CTAS) option.
As the “Solution” section explains, the ORA-01652 error refers to the tablespace in which you’re creating or rebuilding an index. If you are creating a new index, Oracle also uses the temporary tablespace for sorting the index data. When creating a large index, it may be wise to create a large dedicated temporary tablespace and assign it to the user who’s creating the index. Once the index is created, you can reassign the original temporary tablespace to the user and remove the large temporary tablespace. This strategy helps you avoid enlarging the default temporary tablespace to a very large size just to accommodate the creation of one large index.
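Here is one way that strategy might look; the tablespace name, file path, sizes, and user name are all hypothetical:

```sql
-- Create a large, dedicated temporary tablespace for the index build.
create temporary tablespace temp_big
  tempfile '/u01/app/oracle/oradata/prod1/temp_big01.dbf' size 20g;

-- Point the index-building user at it.
alter user index_owner temporary tablespace temp_big;

-- ... build the index, then revert the user and drop the big tablespace.
alter user index_owner temporary tablespace temp;
drop tablespace temp_big including contents and datafiles;
```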
If you specify autoextend for a temporary tablespace, the temp files may grow very large on the basis of just one or two large sorts in the database. When you later try to reclaim space from the TEMP tablespace, you may get the following error.
SQL> alter database tempfile '/u01/app/oracle/oradata/prod1/temp01.dbf'
resize 500M;
alter database tempfile '/u01/app/oracle/oradata/prod1/temp01.dbf'
resize 500M
*
ERROR at line 1:
ORA-03297: file contains used data beyond requested RESIZE value
One solution is to create a new temporary tablespace, make that the default temporary tablespace, and then drop the larger temporary tablespace (provided no other sessions are using it, of course). You can simplify matters by using the following alter tablespace command to shrink the temporary tablespace:
SQL> alter tablespace temp shrink space;
Tablespace altered.
SQL>
In this example, we shrank the entire temporary tablespace, but you can shrink a specific tempfile by issuing the command alter tablespace temp shrink tempfile <file_name>. The command will shrink the tempfile to the smallest size possible.
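For example, here is a hedged sketch of both forms (the tempfile path is hypothetical); the optional keep clause sets a floor below which the shrink won’t go:

```sql
-- Shrink a single tempfile to the smallest size possible.
alter tablespace temp
  shrink tempfile '/u01/app/oracle/oradata/prod1/temp01.dbf';

-- Shrink the whole tablespace, but keep at least 500 MB allocated.
alter tablespace temp shrink space keep 500m;
```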
7-7. Resolving Open Cursor Errors
Problem
You are frequently getting the Maximum Open Cursors exceeded error, and you want to resolve the error.
Solution
One of the first things you need to do when you receive the ORA-01000: “maximum open cursors exceeded” error is to check the value of the initialization parameter open_cursors. You can view the current limit for open cursors by issuing the following command:
SQL> sho parameter open_cursors
NAME TYPE VALUE
------------ ------- -----
open_cursors integer 300
SQL>
The OPEN_CURSORS parameter sets the maximum number of cursors a session can have open at once. Keeping the parameter’s value too low will result in sessions receiving the ORA-01000 error. There’s usually no harm in specifying a large value for the OPEN_CURSORS parameter (unless you expect all sessions to simultaneously max out their cursors, which is unlikely), so you can often resolve cursor-related errors simply by raising the parameter value. However, you may sometimes find that raising the value of the OPEN_CURSORS parameter doesn’t “fix” the problem. In such cases, investigate which sessions are holding the open cursors by issuing the following query:
SQL> select a.value, s.username,s.sid,s.serial#,s.program,s.inst_id
from gv$sesstat a,gv$statname b,gv$session s
where a.statistic# = b.statistic# and s.sid=a.sid
and b.name='opened cursors current'
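To raise the limit itself, OPEN_CURSORS can be changed dynamically; a minimal sketch, assuming the instance uses an spfile:

```sql
-- Takes effect immediately and persists across restarts.
alter system set open_cursors = 1000 scope=both;
```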
The GV$OPEN_CURSOR (or the V$OPEN_CURSOR) view shows all the cursors that each user session has currently opened and parsed, or cached. You can issue the following query to identify the sessions with a high number of opened and parsed or cached cursors.
SQL> select saddr, sid, user_name, address,hash_value,sql_id, sql_text
from gv$open_cursor
where sid in
(select sid from v$open_cursor
group by sid having count(*) > &threshold);
The query lists all sessions with an open cursor count greater than the threshold you specify. This way, you can limit the query’s output and focus just on the sessions that have opened, parsed, or cached a large number of cursors.
You can get the actual SQL code and the open cursor count for a specific session by issuing the following query:
SQL> select sql_id,substr(sql_text,1,50) sql_text, count(*)
from gv$open_cursor where sid=81
group by sql_id,substr(sql_text,1,50)
order by sql_id;
The output shows the SQL code for all open cursors in the session with the SID 81. You can examine all SQL statements with a high open cursor count to see why the session was keeping a large number of cursors open.
How It Works
If your application is not closing open cursors, then setting the OPEN_CURSORS parameter to a higher value won’t really help you. You may momentarily resolve the issue, but you’re likely to run into the same issue a little later. If the application layer never closes the ref cursors created by the PL/SQL code, the database will simply hang on to the server resources for the used cursors. You must fix the application logic so it closes the cursors—the problem isn’t really in the database.
If you’re using a Java application deployed on an application server such as the Oracle WebLogic Server, the WebLogic Server’s JDBC connection pools provide open database connections for applications. Any prepared statements in each of these connections will use a cursor. Multiple application server instances and multiple JDBC connection pools will mean that the database needs to support all the cursors. If multiple requests share the same session ID, the open cursor problem may be due to implicit cursors. The only solution then is to close the connection after each request, unless developers are using the close() method of the PreparedStatement and ResultSet objects.
A cursor leak occurs when an application opens cursors but doesn’t close them.
You can use the SESSION_CACHED_CURSORS initialization parameter to set the maximum number of cached closed cursors for each session. The default setting is 50. You can use this parameter to prevent a session from opening an excessive number of cursors, thereby filling the library cache or forcing excessive hard parses. Repeated parse calls for a SQL statement lead Oracle to move the session cursor for that statement into the session cursor cache. The database satisfies subsequent parse calls by using the cached cursor instead of reopening the cursor.
When you re-execute a SQL statement, Oracle will first try to find a parsed version of that statement in the shared pool—if it finds the parsed version in the shared pool, a soft parse occurs. Oracle is forced to perform the much more expensive hard parse if it doesn’t find the parsed version of the statement in the shared pool. While a soft parse is much less expensive than a hard parse, a large number of soft parses can affect performance, because they do require CPU usage and library cache latches. To reduce the number of soft parses, Oracle caches the recent closed cursors of each session in a local session cache for that session—Oracle stores any cursor for which a minimum of three parse calls were made, thus avoiding having to cache every single session cursor, which will fill up the cursor cache.
The default value of 50 for the SESSION_CACHED_CURSORS initialization parameter may be too low for many databases. You can check if the database is bumping against the maximum limit for session-cached cursors by issuing the following statement:
SQL> select max(value) from v$sesstat
2 where statistic# in (select statistic# from v$statname
3* where name = 'session cursor cache count');
MAX(VALUE)
----------
49
SQL>
The query shows the maximum number of session cursors that have been cached in the past. Since this number (49) is virtually the same as the default value (or the value you’ve set) for the SESSION_CACHED_CURSORS parameter, you should set the parameter to a larger value. Session cursor caches use the shared pool. If you’re using automatic memory management, there’s nothing more for you to do after you raise the SESSION_CACHED_CURSORS parameter; the database will bump up the shared pool size if necessary. You can find out how many cursors each session has in its session cursor cache by issuing the following query:
SQL> select a.value,s.username,s.sid,s.serial#
2 from v$sesstat a, v$statname b,v$session s
3 where a.statistic#=b.statistic# and s.sid=a.sid
4* and b.name='session cursor cache count';
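To raise the cache size, a hedged sketch follows (the value 100 is illustrative, not a recommendation):

```sql
-- For the current session only:
alter session set session_cached_cursors = 100;

-- Instance-wide, assuming an spfile; takes effect at the next restart:
alter system set session_cached_cursors = 100 scope=spfile;
```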
7-8. Resolving a Hung Database
Problem
Your database is hung. Users aren’t able to log in, and existing users can’t complete their transactions. The DBAs with SYSDBA privileges may also be unable to log in to the database. You need to find out what is causing the database to hang, and fix the problem.
Solution
Follow these general steps when facing a database that appears to be hung:
If you’re still unable to resolve the reasons for the hung database, you most likely have a truly hung database. While you’re investigating such a database, you may sometimes find yourself unable to connect and log in. In that case, use the “prelim” option to log in to the database. The prelim option doesn’t require a real database connection. Here's an example that shows how to use the prelim option to log into a database:
$ sqlplus /nolog
SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
SQL>
Note You can’t execute the set _prelim on command if you’re already connected to the database. Alternatively, you can use the command sqlplus -prelim "/ as sysdba" to log in with the -prelim option; note that you use the nolog option to open the SQL*Plus session first. Once you establish a prelim connection as shown here, you can execute the oradebug hanganalyze command to analyze the hung database, for example:
SQL> oradebug hanganalyze 3
Statement processed.
SQL>
The integer 3 in the oradebug command specifies the level at which you want to run the command. We explain levels in more detail in the “How It Works” section of this recipe.
In an Oracle RAC environment, specify the oradebug hanganalyze command with additional options, as shown here:
SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3
You can repeat the oradebug hanganalyze command a couple of times to generate dump files for varying process states.
In addition to the dump files generated by the hanganalyze command, Oracle Support may often also request a process state dump, also called a systemstate dump, to analyze hung database conditions. The systemstate dump will report on what the processes are doing and the resources they’re currently holding. You can get a systemstate dump from a non-RAC system by executing the following set of commands.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
SQL>
Issue the following commands to get a systemstate dump in a RAC environment:
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266
Note that unlike the oradebug hanganalyze command, a systemstate dump requires you to connect to a process first. The setmypid option specifies the process, in this case your own process. You can also specify a process other than your own, in which case you issue the command oradebug setospid <pid> before issuing the dump systemstate command. If you try to issue the dump systemstate command without setting the PID, you’ll receive an error:
SQL> oradebug dump systemstate 10
ORA-00074: no process has been specified
SQL>
You must take the systemstate dumps a few times, with an interval of about a minute or so in between the dumps. Oracle Support usually requests several systemstate dumps along with the trace files generated by the hanganalyze command.
How It Works
The key thing you must ascertain when dealing with a “hung” database is whether the database is really hung, or just slow. If one or two users complain about a slow-running query, you need to analyze their sessions, using the techniques described in Chapter 5, to see if the slowness is due to a blocking session or to an Oracle wait event. If several users report that their work is going slowly, it could be due to various reasons, including CPU, memory (SGA or PGA), or other system resource issues.
Check the server’s CPU usage as one of your first steps in troubleshooting a hung database. If your server is showing 100% CPU utilization, or if it’s swapping or paging, the problem may not lie in the database at all. As for memory, if the server doesn’t have enough free memory, new sessions can’t connect to the database.
Tip The “prelim” option shown in the “Solution” section lets you connect to the SGA without opening a session. You can thus “log” in to the hung database even when normal SQL*Plus logins don’t work. The oradebug session you start once you connect to the SGA actually analyzes what’s in the SGA and dumps it into a trace file.
A true database hang can be due to a variety of reasons, including a system that has exhausted resources such as CPU or memory, or several sessions stuck waiting for a resource such as a lock. In fact, a true database hang is more likely due to an Oracle bug than to any other reason. Oracle is sometimes unable to automatically detect and resolve internal deadlocks, and this leads to what Oracle Support calls a “true database hang.” A true database hang is thus an internal deadlock or a cyclical dependency among multiple processes. Oracle Support will usually ask you to provide the hanganalyze trace files and multiple systemstate dumps to enable them to diagnose the root cause of the hang. At times like this, you may not even be able to log in to the database. Your first instinct when you realize you can’t even log in is to shut down and restart, often referred to as bouncing the database. Unfortunately, while bouncing the database may “resolve” the issue, it also disconnects all users, and you’re no wiser as to what exactly caused the problem. If you do decide to bounce your database, quickly generate a few hanganalyze and systemstate dumps first.
Tip As unpleasant as it may be at times, if you find that you simply can’t connect to a hung database, then collect any trace dumps you may need, and quickly bounce the database so that users can access their applications. Especially when you’re dealing with a database that’s hanging because of memory issues, bouncing the instance may get things going again quickly.
If you find that the database is completely unresponsive, and you can’t even log in to the database with the SYSDBA privilege, you can use the prelim option to log in. The prelim option stands for preliminary connection; it starts an Oracle process and attaches that process to the SGA shared memory. This is not a full connection, but a limited connection in which the structures for query execution are not set up, so you cannot even query the V$ views. However, the prelim option does let you run oradebug commands to get error dump stacks for diagnostic purposes. The output of the hanganalyze command can tell Oracle Support engineers whether your database is really hanging because sessions are waiting for some resource. The command makes internal kernel calls to find all sessions that are waiting for a resource and shows the relationship between the blocking and waiting sessions. The hanganalyze option, which you can specify with either the oradebug command or an alter session statement, produces details about hung sessions. Once you get the dump file, Oracle Support personnel can analyze it and let you know the reasons for the database hang.
You can invoke the hanganalyze command at various levels ranging from 1 to 10. Level 3 dumps processes that are in a hanging (IN_HANG) state. You normally don’t need to specify a level higher than 3, because higher levels will produce voluminous reports with too many details about the processes.
Note The dump files you create with the hanganalyze and the systemstate commands are created in ADR’s trace directory.
Note that we issued the oradebug command to get a systemstate dump with a level of 266. Level 266 (combination of Level 256, which produces short stack information, and Level 10) is for Oracle releases 9.2.0.6 and onward (earlier releases used systemstate level 10). Level 266 allows you to dump the short stacks for each process, which are Oracle function calls that help Oracle development teams determine which Oracle function is causing the problem. The short stack information also helps in matching known bugs in the code. On Solaris and Linux systems, you can safely specify level 266, but on other systems, it may take a long time to dump the short stacks. Therefore, you may want to stick with level 10 for the other operating systems.
If you can find out the blocking session, you can also take a dump just for that session, by using the command oradebug setospid nnnn, where nnnn is the blocking session’s PID, and then invoking the oradebug command, as shown here:
SQL> oradebug setospid 9999
SQL> oradebug unlimit
SQL> oradebug dump errorstack 3
Note that you can generate the hanganalyze and systemstate dumps in a normal session (as well as in a prelim session), without using the oradebug command. You can invoke the hanganalyze command with an alter session command, as shown here.
SQL>alter session set events 'immediate trace name hanganalyze level 3';
Similarly, you can get a systemstate dump with the following command:
SQL> alter session set events 'immediate trace name SYSTEMSTATE level 10';
Session altered.
SQL>
The oradebug and systemstate dumps are just two of the many dumps you can collect. Use the oradebug dumplist command to view the various error dumps you can collect.
SQL> oradebug dumplist
TRACE_BUFFER_ON
TRACE_BUFFER_OFF
LATCHES
PROCESSSTATE
SYSTEMSTATE
INSTANTIATIONSTATE
REFRESH_OS_STATS
CROSSIC
CONTEXTAREA
HANGDIAG_HEADER
HEAPDUMP
...
Note that while you can read some of the dump files in an editor, these files are mainly for helping Oracle Support professionals troubleshoot a database hang situation. There’s not much you can do with the dump files, especially when a database hang situation is due to an Oracle bug or a kernel-level lock, except to send them along to Oracle Support for analysis.
7-9. Invoking the Automatic Diagnostic Repository Command Interpreter
Problem
You’d like to invoke the Automatic Diagnostic Repository Command Interpreter (ADRCI) and work with various components of the Automatic Diagnostic Repository (ADR).
Solution
ADRCI is a tool to help you manage Oracle diagnostic data. You can use ADRCI commands in both an interactive as well as a batch mode.
To start ADRCI in the interactive mode, type adrci at the command line, as shown here:
$ adrci
ADR base = "c:\app\ora"
adrci>
You can issue the adrci command from any directory, so long as the PATH environment variable includes ORACLE_HOME/bin/. You can enter each command at the adrci prompt, and when you’re done using the utility, you can type EXIT or QUIT to exit. You can view all the ADRCI commands available to you by typing HELP at the ADRCI command line, as shown here:
adrci> HELP
HELP [topic]
Available Topics:
CREATE REPORT
ECHO
EXIT
HELP
HOST
IPS
...
SHOW HOMES | HOME | HOMEPATH
SHOW INCDIR
SHOW INCIDENT
SHOW PROBLEM
SHOW REPORT
SHOW TRACEFILE
SPOOL
There are other commands intended to be used directly by Oracle, type
"HELP EXTENDED" to see the list
adrci>
You can get detailed information for an individual ADRCI command by adding the name of the command as an attribute to the HELP command. For example, here is how to get the syntax for the show tracefile command:
adrci> help show tracefile
Usage: SHOW TRACEFILE [file1 file2 ...] [-rt | -t]
[-i inc1 inc2 ...] [-path path1 path2 ...]
Purpose: List the qualified trace filenames.
...
Options:
Examples:
...
adrci>
You can also execute ADRCI commands in the batch mode by incorporating the commands in a script or batch file. For example, if you want to run the ADRCI commands SET HOMEPATH and SHOW ALERT from within an operating system script, include the following line inside a shell script:
SET HOMEPATH diag/rdbms/orcl/orcl; SHOW ALERT -term
Let’s say your script name is myscript.txt. You can then execute this script by issuing the following command inside an operating system shell script or batch file:
$ adrci script=myscript.txt
Note that the parameter SCRIPT tells ADRCI that it must execute the commands within the text file myscript.txt. If the text file is not within the same directory from where the shell script or batch file is running, you must provide the path for the directory where you saved the text file.
To execute an ADRCI command directly at the command line instead of invoking ADRCI first and working interactively with the ADR interface, specify the parameter EXEC with the ADRCI command, as shown here:
$ adrci EXEC="SHOW HOMES; SHOW INCIDENT"
This example shows how to include two ADRCI commands—SHOW HOMES and SHOW INCIDENT—by executing the ADRCI command at the command line.
How It Works
The Automatic Diagnostic Repository is a directory structure that you can access even when the database is down, because it’s stored outside the database. The root directory for the ADR is called the ADR base and is set by the DIAGNOSTIC_DEST initialization parameter. Each Oracle product or component has its own ADR home under the ADR base. The location of each of the ADR homes follows the path diag/product_type/product_id/instance_id. Thus, the ADR home for a database named orcl1 with the instance name orcl1 and the ADR base set to /app/oracle will be /app/oracle/diag/rdbms/orcl1/orcl1. Under each of the ADR homes are the diagnostic data, such as the trace and dump files and the alert log, as well as other diagnostic files, for that instance of an Oracle product.
The ADRCI utility helps you manage the diagnostic data in the ADR. ADRCI lets you perform the following types of diagnostic tasks:
You can view all ADR locations for the current database instance by querying the V$DIAG_INFO view, as shown here.
SQL> select * from v$diag_info;
INST_ID NAME VALUE CON_ID
------- ------------- ----------------------------------------- ------
1 Diag Enabled TRUE 0
1 ADR Base /u01/app/oracle 0
1 ADR Home /u01/app/oracle/diag/rdbms/orcl/orcl 0
1 Diag Trace /u01/app/oracle/diag/rdbms/orcl/orcl/trace 0
1 Diag Alert /u01/app/oracle/diag/rdbms/orcl/orcl/alert 0
1 Diag Incident /u01/app/oracle/diag/rdbms/orcl/orcl/incident 0
1 Diag Cdump /u01/app/oracle/diag/rdbms/orcl/orcl/cdump 0
1 Health Monitor /u01/app/oracle/diag/rdbms/orcl/orcl/hm 0
1 Default Trace File /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3263.trc 0
1 Active Problem Count 0 0
1 Active Incident Count 0 0
11 rows selected.
SQL>
The ADR home is the root directory for a database’s diagnostic data. All diagnostic files such as the alert log and the various trace files are located under the ADR home. The ADR home is located directly underneath the ADR base, which you specify with the DIAGNOSTIC_DEST initialization parameter. Here’s how to find out the location of the ADR base directory:
adrci> show base
ADR base is "/u01/app/oracle"
adrci>
You can view all the ADR homes under the ADR base by issuing the following command:
adrci> show homes
ADR Homes:
diag/rdbms/orcl/orcl
diag/tnslsnr/localhost/listener
adrci>
You can have multiple ADR homes under the ADR base, and multiple ADR homes can be current at any given time. Your ADRCI commands will work only with diagnostic data in the current ADR home. How do you know which ADR home is current at any point in time? The ADRCI homepath helps determine the ADR homes that are current, by pointing to the ADR home directory under the ADR base hierarchy of directories.
Note Some ADRCI commands require only one ADR home to be current—these commands will issue an error if multiple ADR homes are current.
You can use either the show homes or the show homepath command to view all ADR homes that are current:
adrci> show homepath
ADR Homes:
diag/rdbms/orcl/orcl
diag/tnslsnr/localhost/listener
adrci>
If you want to work with diagnostic data from multiple database instances or components, you must ensure that all the relevant ADR homes are current. Most of the time, however, you’ll be dealing with a single database instance or a single Oracle product or component, such as the listener. An ADR homepath is always relative to the ADR base. If you specify /u01/app/oracle/ as the value for the ADR base directory, for example, all ADR homes will be under the ADR_Base/diag directory. Issue the set homepath command to set an ADR home directory to a single home, as shown here:
adrci> set homepath diag\rdbms\orcl1\orcl1
adrci> show homepath
ADR Homes:
diag\rdbms\orcl1\orcl1
adrci>
Note Diagnostic data includes descriptions of incidents and problems, health monitoring reports, and traditional diagnostic files such as trace files, dump files, and alert logs.
Note that before you set the homepath with the set homepath command, the show homepath command shows all ADR homepaths. However, once you set the homepath to a specific home, the show homepath command shows just a single homepath. It’s important to set the homepath before you execute several ADRCI commands, as they are applicable to only a single ADR home. For example, if you don’t set the homepath before issuing the following command, you’ll receive an error:
adrci> ips create package
DIA-48448: This command does not support multiple ADR homes
adrci>
The error occurs because the ips create package command is not valid with multiple ADR homes. The command will work fine after you issue the set homepath command to set the homepath to a single ADR home. Commands such as the one shown here work only with a single current ADR home, but others work with multiple current ADR homes—there are also commands that don’t need a current ADR home. The bottom line is that all ADRCI commands will work with a single current ADR home.
7-10. Viewing an Alert Log from ADRCI
Problem
You want to view an alert log by using ADRCI commands.
Solution
To view an alert log with ADRCI, follow these steps:
1. Invoke ADRCI:
$ adrci
2. Set the ADR home with the set homepath command:
adrci> set homepath diag/rdbms/orcl/orcl
3. Enter the following command to view the alert log:
adrci> show alert
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl:
*************************************************************************
Output the results to file: /tmp/alert_3573_13986_orcl_1.ado
adrci>
The alert log will pop up in your default editor. The ADRCI prompt will return once you close the text file in the editor.
You can also query the V$DIAG_INFO view to find the path that corresponds to the Diag Trace entry. You can change the directory to that path and open the alert_<db_name>.log file with a text editor.
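For example, a minimal query to pull just that path:

```sql
-- Returns the directory that holds the text alert log and trace files.
select value
from v$diag_info
where name = 'Diag Trace';
```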
How It Works
The alert log holds runtime information for an Oracle instance and provides information such as the initialization parameters the instance is using, as well as a record of key changes such as redo log file switches and, most importantly, messages that show Oracle errors and their details. The alert log is critical for troubleshooting purposes, and is usually the first place you’ll look when a problem occurs. Oracle provides the alert log as both a text file and an XML-formatted file.
The show alert command brings up the XML-formatted alert log without displaying the XML tags. You can set the default editor with the SET EDITOR command, as shown here:
adrci> set editor notepad.exe
The previous command changes the default editor to Notepad. The show alert -term command shows the alert log contents in the terminal window. If you want to examine just the latest events in the alert log, issue the following command:
adrci> show alert -tail 50
The tail option shows you a set of the most recent lines from the alert log in the command window. In this example, it shows the last 50 lines from the alert log. If you don’t specify a value for the tail parameter, by default, it shows the last 10 lines from the alert log.
The following command shows a “live” alert log, in the sense that it will show changes to the alert log as the entries are added to the log.
adrci> show alert -tail -f
The previous command shows the last 10 lines of the alert log and prints all new messages to the screen, thus offering a “live” display of ongoing additions to the alert log. The CTRL+C sequence will take you back to the ADRCI prompt.
When troubleshooting, it is very useful to see if the database issued any ORA-600 errors. You can issue the following command to trap the ORA-600 errors.
adrci> show alert -p "MESSAGE_TEXT LIKE '%ORA-600%'"
Although you can view the alert log directly by going to the file system location where it’s stored, you can also do so through the ADRCI tool. ADRCI is especially useful for working with the trace files of an instance. The SHOW TRACEFILE command shows all the trace files in the trace directory of the instance. You can issue the SHOW TRACEFILE command with various filters—the following example looks for trace files that reference the background process mmon:
adrci> show tracefile %mmon%
diag\rdbms\orcl1\orcl1\trace\orcl1_mmon_1792.trc
diag\rdbms\orcl1\orcl1\trace\orcl1_mmon_2340.trc
adrci>
This command lists all trace files with the string mmon in their file names. You can apply filters to restrict the output to just the trace files associated with a specific incident number (the next recipe, Recipe 7-11, explains how to get the incident number), as shown here:
adrci> show tracefile -i 43417
diag\rdbms\orcl1\orcl1\incident\incdir_43417\orcl1_ora_4276_i43417.trc
adrci>
The previous command lists the trace files related to the incident number 43417.
7-11. Viewing Incidents with ADRCI
Problem
You want to use ADRCI to view incidents.
Solution
You can view all incidents in the ADR with the show incident command (be sure to set the homepath first):
$ adrci
adrci> set homepath diag\rdbms\orcl1\orcl1
adrci> show incident
ADR Home = c:\app\ora\diag\rdbms\orcl1\orcl1:
*******************************************************************************
INCIDENT_ID PROBLEM_KEY
CREATE_TIME
-------------------- ---------------------------------------------------
43417 ORA 600 [kkqctinvvm(2): Inconsistent state space!]
2013-08-17 09:26:15.091000 -05:00
43369 ORA 600 [kkqctinvvm(2): Inconsistent state space!]
2013-08-17 11:08:40.589000 -05:00
79451 ORA 445
2013-03-04 03:00:39.246000 -05:00
84243 ORA 445
2013-03-14 19:12:27.434000 -04:00
84244 ORA 445
2013-03-20 16:55:54.501000 -04:00
5 rows fetched
You can specify the detail mode to view details about a specific incident, as shown here:
adrci> show incident -mode detail -p "incident_id=43369"
ADR Home = c:\app\ora\diag\rdbms\orcl1\orcl1:
*******************************************************************************
INCIDENT INFO RECORD 1
*******************************************************************************
INCIDENT_ID 43369
STATUS ready
CREATE_TIME 2013-08-17 11:08:40.589000 -05:00
PROBLEM_ID 1
CLOSE_TIME <NULL>
FLOOD_CONTROLLED none
ERROR_FACILITY ORA
ERROR_NUMBER 600
ERROR_ARG1 kkqctinvvm(2): Inconsistent state space!
SIGNALLING_COMPONENT SQL_Transform
PROBLEM_KEY ORA 600 [kkqctin: Inconsistent state space!]
FIRST_INCIDENT 43417
FIRSTINC_TIME 2013-08-17 09:26:15.091000 -05:00
LAST_INCIDENT 43369
LASTINC_TIME 2013-08-17 11:08:40.589000 -05:00
KEY_VALUE ORACLE.EXE.3760_3548
KEY_NAME PQ
KEY_NAME SID
KEY_VALUE 71.304
OWNER_ID 1
INCIDENT_FILE c:\app\ora\diag\rdbms\orcl1\orcl1\trace\orcl1_ora_3548.trc
adrci>
How It Works
The show incident command reports on all open incidents in the database. For each incident, the output for this command shows the problem key, incident ID, and the time when the incident occurred. In this example, we first set the ADRCI homepath, so the command shows incidents from just this ADR home. If you don’t set the homepath, you’ll see incidents from all the current ADR homes.
As mentioned earlier, an incident is a single occurrence of a problem. A problem is a critical error such as an ORA-600 (internal error) or an ORA-07445 error relating to operating system exceptions. The problem key is a text string that shows the problem details. For example, the problem key ORA 600 [kkqctinvvm(2): Inconsistent state space!] shows that the problem is due to an internal error.
When a problem occurs several times, the database creates an incident for each occurrence, each with a unique incident ID. The database logs each incident in the alert log and sends an alert to Oracle Enterprise Manager, where the incidents show up on the Home page. The database automatically gathers diagnostic data for each incident, called incident dumps, and stores them in the ADR trace directory.
Since a critical error can potentially generate numerous identical incidents, the fault diagnosability infrastructure applies a “flood control” mechanism to limit the generation of incidents. For a given problem key, the database allows only 5 incidents within 1 hour and a maximum of 25 incidents in 1 day. Once a problem triggers incidents beyond these thresholds, the database merely logs the incidents in the alert log and the Oracle Enterprise Manager but stops generating new incident dumps for them. You can’t alter the default threshold settings for the incident flood control mechanism.
7-12. Packaging Incidents for Oracle Support
Problem
You want to send the diagnostic files related to a specific problem to Oracle Support.
Solution
You can package the diagnostic information for one or more incidents either through Database Control or through commands that you execute from the ADRCI interface. In this solution, we show you how to package incidents through ADRCI. You can use various IPS commands to package all diagnostic files related to a specific problem in a zipped format and send the file to Oracle Support. Here are the steps to create an incident package.
1. Create an empty logical package, as shown here:
adrci> ips create package
Created package 1 without any contents, correlation level typical
adrci>
In this example, we created an empty package, but you can also create a package based on an incident number, a problem number, a problem key or a time interval. In all these cases, the package won't be empty—it'll include the diagnostic information for the incident or problem that you specify. Since we created an empty package, we need to add diagnostic information to that package in the next step.
2. Add diagnostic information to the logical package with the ips add incident command:
adrci> ips add incident 43369 package 1
Added incident 43369 to package 1
adrci>
At this point, the incident 43369 is associated with package 1, but there’s no diagnostic data in it yet.
3. Generate the physical package.
adrci> ips generate package 1 in c:\app\ora\diagnostics
Generated package 1 in file c:\app\ora\diagnostics\IPSPKG_20110419131046_COM_1.zip, mode complete
adrci>
When you issue the generate package command, ADRCI gathers all relevant diagnostic files and adds them to a zip file in the directory you designate.
4. Send the resulting zip file to Oracle Support.
If you decide to add supplemental diagnostic data to an existing physical package (zipped file), you can do so by specifying the incremental option with the generate package command:
adrci> ips generate package 1 in c:\app\ora\diagnostics incremental
The incremental zip file created by this command will have the term INC in the file name, indicating that it is an incremental zip file.
How It Works
A physical package is the zip file that you can send to Oracle Support for diagnosing a problem in your database. Since an incident is a single occurrence of a problem, adding the incident number to the logical package and generating the physical package rolls up all the diagnostic data for that problem (incident) into a single zipped file. In this example, we showed you how to first create an empty logical package and then associate it with an incident number. However, the ips create package command has several options: you can specify the incident number or a problem number directly when you create the logical package, and skip the add incident command. You can also create a package that contains all incidents between two points in time, as shown here:
adrci> ips create package time '2013-09-12 10:00:00.00 -06:00' to '2013-09-12 23
:00:00.00 -06:00'
Created package 2 based on time range 2013-09-12 10:00:00.000000 -06:00 to 2013-
09-12 23:00:00.000000 -06:00, correlation level typical
adrci>
The package generated by the previous command contains all incidents that occurred between 10 A.M. and 11 P.M. on September 12, 2013.
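The incident-based and problem-based variants of the command work the same way. As a sketch (using the incident ID and problem ID from the earlier examples in this recipe), pre-populated packages could be created like this:

```text
adrci> ips create package incident 43369
adrci> ips create package problem 1
```

Packages created this way already contain the diagnostic information for the specified incident or problem, so you can go straight to the ips generate package step.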
Note that you can also manually add a specific diagnostic file to an existing package. To add a file, you specify the file name in the ips add file command—you are limited to adding only those diagnostic files that are within the ADR base directory. Here is an example:
adrci> ips add file <ADR_BASE>/diag/rdbms/orcl1/orcl1/trace/orcl_ora12345.trc package 1
By default, the ips generate package command generates a zip file that includes all files for a package. The incremental option will limit the files to those that the database has generated since you originally generated the zipped file for that package. The ips show files command shows all the files in a package and the ips show incidents command shows all the incidents in a package. You can issue the ips remove file command to remove a diagnostic file from a package.
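For example, to inspect and trim the contents of package 1 before regenerating it, the command sequence might look like the following (the trace file name here is purely illustrative):

```text
adrci> ips show files package 1
adrci> ips show incidents package 1
adrci> ips remove file <ADR_HOME>/trace/orcl1_ora_3548.trc package 1
```

After removing a file, regenerate the physical package so the zip file reflects the change.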
7-13. Running a Database Health Check
Problem
You’d like to run a comprehensive diagnostic health check on your database. You’d like to find out if there’s any data dictionary or file corruption, as well as any other potential problems in the database.
Solution
You can use the database health monitoring infrastructure to run a health check of your database. You can run various integrity checks, such as transaction integrity checks and dictionary integrity checks. You can get a list of all the health checks you can run by querying the V$HM_CHECK view:
SQL> select name from v$hm_check where internal_check='N';
Once you decide on the type of check, specify the name of the check in the DBMS_HM package's RUN_CHECK procedure, as shown here:
SQL> begin
2 dbms_hm.run_check('Dictionary Integrity Check','testrun1');
3 end;
4 /
PL/SQL procedure successfully completed.
SQL>
You can also run a health check from Enterprise Manager. Go to Advisor Central ➤ Checkers, and select the specific checker from the Checkers subpage to run a health check.
How It Works
Oracle automatically runs a health check when it encounters a critical error. You can run a manual check using the procedure shown in the “Solution” section. The database stores all health check findings in the ADR.
You can run most of the health checks while the database is open. You can run the Redo Integrity Check and the DB Structure Integrity Check only when the database is closed—you must place the database in the NOMOUNT state to run these two checks.
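You can see which checks can run in which database state by querying V$HM_CHECK; its OFFLINE_CAPABLE column shows whether a given check can run when the database is not open:

```sql
SQL> select name, offline_capable
  2  from v$hm_check
  3  where internal_check = 'N';
```

Checks with OFFLINE_CAPABLE set to Y can run while the database is closed; the rest require an open database.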
You can view a health check’s findings using either the DBMS_HM package or through the Enterprise Manager. Here is how to get a health check using the DBMS_HM package:
SQL> set long 100000
SQL> set longchunksize 1000
SQL> set pagesize 1000
SQL> set linesize 512
SQL> select dbms_hm.get_run_report('testrun1') from dual;
DBMS_HM.GET_RUN_REPORT('TESTRUN1')
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
------------------------------------------------------
Basic Run Information
Run Name : testrun1
Run Id : 141
Check Name : Dictionary Integrity Check
Mode : MANUAL
Status : COMPLETED
Start Time : 2013-09-09 15:11:22.779917 -04:00
End Time : 2013-09-09 15:11:40.046797 -04:00
Error Encountered : 0
Source Incident Id : 0
Number of Incidents Created : 0
Input Paramters for the Run
TABLE_NAME=ALL_CORE_TABLES
CHECK_MASK=ALL
Run Findings And Recommendations
Finding
Finding Name : Dictionary Inconsistency
Finding ID : 142
Type : FAILURE
Status : OPEN
Priority : CRITICAL
Message : SQL dictionary health check: seg$.type# 31 on object SEG$
Failed
Message : Damaged rowid is AIAABAAAUUjAAf - description: Ts# 1
File# 3 Block# 83328 is referenced
...
SQL>
You can also go to Advisor Central ➤ Checkers and run a report from the Run Detail page for any health check you have run. Use the show hm_run, create report, and show report commands to view health check reports with the ADRCI utility. You can use the V$HM_FINDING and V$HM_RECOMMENDATION views to investigate the findings, as well as the recommendations pursuant to a health check.
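For instance, to view the testrun1 report from within ADRCI, the command sequence looks like this:

```text
adrci> show hm_run
adrci> create report hm_run testrun1
adrci> show report hm_run testrun1
```

The show hm_run command lists all health check runs, create report generates the report in the ADR, and show report displays it.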
7-14. Creating a SQL Test Case
Problem
You need to create a SQL test case in order to reproduce a SQL failure on a different machine, either to support your own diagnostic efforts, or to enable Oracle Support to reproduce the failure.
Solution
In order to create a SQL test case, first you must export the SQL statement along with several bits of useful information about the statement. The following example shows how to capture the SQL statement that is throwing an error. In this example, the user SH is doing the export (you can’t do the export as the user SYS).
First, connect to the database as SYSDBA and create a directory to hold the test case:
SQL> conn / as sysdba
Connected.
SQL> create or replace directory TEST_DIR1 as 'c:\myora\diagnostics\incidents';
Directory created.
SQL> grant read,write on directory TEST_DIR1 to sh;
Grant succeeded.
SQL>
Then grant the DBA role to the user through which you will create the test case, and connect as that user:
SQL> grant dba to sh;
Grant succeeded.
SQL> conn sh/sh
Connected.
Issue the SQL command that’s throwing the error:
SQL> select * from my_mv where max_amount_sold >100000 order by 1;
Now you’re ready to export the SQL statement and relevant information, which you can import to a different system later on. Use the EXPORT_SQL_TESTCASE procedure to export the data, as shown here:
SQL> set serveroutput on
SQL> declare mycase clob;
2 begin
3 dbms_sqldiag.export_sql_testcase
4 (directory =>'TEST_DIR1',
5 sql_text => 'select * from my_mv where max_amount_sold >100000 order by 1',
6 user_name => 'SH',
7 exportData => TRUE,
8 testcase => mycase
9 );
10 end;
11 /
PL/SQL procedure successfully completed.
SQL>
Once the export procedure completes, you are ready to perform the import, either on the same or on a different server. The following example creates a new user named TEST, and imports the test case into that user’s schema. Here are the steps for importing the SQL statement and associated information into a different schema.
SQL> conn /as sysdba
Connected.
SQL> create or replace directory TEST_DIR2 as 'c:\myora\diagnostics\incidents';
Directory created.
SQL> grant read,write on directory TEST_dir2 to test;
Transfer all the files in the TEST_DIR1 directory to the TEST_DIR2 directory. Then grant the DBA role to user TEST, and connect as that user:
SQL> grant dba to test;
Grant succeeded.
SQL> conn test/test
Connected.
Perform the import of the SQL data as the user TEST, by invoking the IMPORT_SQL_TESTCASE procedure, as shown here:
SQL> begin
2 dbms_sqldiag.import_sql_testcase
3 (directory=>'TEST_DIR2',
4 filename=>'oratcb1_008602000001main.xml',
5 importData=>TRUE
6 );
7 end;
8 /
PL/SQL procedure successfully completed.
SQL>
The user TEST will now have all the objects to execute the SQL statement that you want to investigate. You can verify this by issuing the original select statement. It should give you the same output as under the SH schema.
How It Works
Oracle offers the SQL Test Case Builder (TCB) to reproduce a SQL failure. You can create a test case through Enterprise Manager or through a PL/SQL package. The “Solution” section of this recipe shows how to create a test case using the EXPORT_SQL_TESTCASE procedure of the DBMS_SQLDIAG package. There are several overloaded versions of this procedure, and our example shows how to use a SQL statement as the source for creating a SQL test case. Please review the DBMS_SQLDIAG.EXPORT_SQL_TESTCASE procedure in Oracle’s PL/SQL Packages manual for details about other options to create test cases.
Note You can use the Test Case Builder without needing to license either the Oracle Diagnostics Pack or the Oracle Tuning Pack. Often, you’ll find yourself trying to provide a test case for Oracle, without which the Oracle Support personnel won’t be able to investigate a particular problem they are helping you with. The SQL Test Case Builder is a tool that is part of Oracle Database 11g, and its primary purpose is to help you quickly obtain a reproducible test case. The SQL Test Case Builder helps you easily capture pertinent information relating to a failed SQL statement and package it in a format that either a developer or an Oracle support person can use to reproduce the problem in a different environment.
You access the SQL Test Case Builder through the DBMS_SQLDIAG package. To create a test case, you must first export the SQL statement, including all the objects that are part of the statement, and all other related information. The export process is very similar to an Oracle export with the EXPDP command and thus uses a directory, just as EXPDP does. Oracle creates the SQL test case as a script that contains the statements that will re-create the necessary database objects, along with associated runtime information such as statistics, which enable you to reproduce the error. The information captured and exported as part of the test case creation process includes the SQL text, the definitions of the referenced objects, optimizer statistics, and the compilation environment.
In the DBMS_SQLDIAG package, the EXPORT_SQL_TESTCASE procedure exports a SQL test case for a SQL statement to a directory. The IMPORT_SQL_TESTCASE procedure imports the test case from a directory.
In the EXPORT_SQL_TESTCASE procedure, here is what the attributes stand for:
DIRECTORY: The directory where you want to store the test case files
SQL_TEXT: The actual SQL statement that’s throwing the error
TESTCASE: The name of the test case
EXPORTDATA: By default, Oracle doesn’t export the data. You can set this parameter to TRUE in order to export the data. You can optionally limit the amount of data you export by specifying a value for the samplingPercent parameter. The default value is 100.
The Test Case Builder automatically exports the PL/SQL package specifications but not the package body. However, you can specify that the TCB export the package body as well. The export process creates several files in the directory you specify. Of these files, the file in the format oratcb1_008602000001main.xml contains the metadata for the test case.
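If you want only a subset of the data, the export call might look like the following sketch, which samples roughly 20 percent of the rows (samplingPercent is the documented parameter of DBMS_SQLDIAG.EXPORT_SQL_TESTCASE; the directory and SQL statement are carried over from the earlier example):

```sql
SQL> declare mycase clob;
  2  begin
  3    dbms_sqldiag.export_sql_testcase
  4      (directory       => 'TEST_DIR1',
  5       sql_text        => 'select * from my_mv where max_amount_sold >100000 order by 1',
  6       user_name       => 'SH',
  7       exportData      => TRUE,
  8       samplingPercent => 20,   -- export only ~20% of the rows
  9       testcase        => mycase);
 10  end;
 11  /
```

Sampling is useful when the full data set is too large to ship to Oracle Support but a representative subset still reproduces the failure.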
7-15. Generating an AWR Report
Problem
You’d like to generate an AWR report to analyze performance problems in your database.
Solution
The database automatically takes an AWR snapshot every hour and saves the statistics in the AWR for 8 days (note that AWR requires additional licensing). An AWR report contains data captured between two snapshots, which need not be consecutive. Thus, an AWR report lets you examine instance performance between two points in time. You can generate an AWR report through Oracle Enterprise Manager; however, here we show you how to create an AWR report using the Oracle-provided scripts.
To generate an AWR report for a single instance database, execute the awrrpt.sql script as shown here.
SQL> @?/rdbms/admin/awrrpt.sql
Current Instance
~~~~~~~~~~~~~~~~
DB Id DB Name Inst Num Instance
----------- ----------- -------- --------
1118243965 ORCL1 1 orcl1
Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text
Type Specified:
Select a text- or an HTML-based report. The HTML report is the default report type; it provides a nicely formatted, easy-to-read report, and you can select it by simply pressing Enter. In this example, we entered text to generate a plain text report.
Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DB Id Inst Num DB Name Instance Host
------------ -------- ------- -------- --------
* 1118243965 1 ORCL1 orcl1 MIROPC61
Using 1118243965 for database Id
Using 1 for instance number
You must specify the DBID for the database at this point. In our example, however, there’s only one database and therefore one DBID, so there’s no need to enter the DBID.
Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed. Pressing <return> without
specifying a number lists all completed snapshots.
Enter value for num_days: 1
Enter the number of days for which you want the database to list the snapshot IDs. In this example, we chose 1 because we want to generate an AWR for a time period that falls in the last day.
Listing the last day's Completed Snapshots
Snap
Instance DB Name Snap Id Snap Started Level
-------- ------- -------- ----------------- -----
orcl1 ORCL1 1877 09 Sep 2013 00:00 1
1878 09 Sep 2013 07:47 1
Specify a beginning and an ending snapshot for the AWR report.
Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 1877
Begin Snapshot Id specified: 1877
Enter value for end_snap: 1878
End Snapshot Id specified: 1878
You can either accept the default name for the AWR report by pressing Enter, or enter a name for the report.
Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_1877_1878.txt. To use this name,
press <return> to continue, otherwise enter an alternative.
Enter value for report_name:
Using the report name awrrpt_1_1877_1878.txt
The database generates an AWR report in the same directory from which you invoked the awrrpt.sql script. For example, if you choose an HTML-based report, the AWR report will be in the following format: awrrpt_1_1881_1882.html.
Tip You can generate an AWR report to analyze the performance of a single SQL statement by executing the awrsqrpt.sql script.
How It Works
The AWR reports that you generate show performance statistics captured between two points in time, each of which is called a snapshot. You can gain significant insights into your database performance by reading the AWR reports carefully. An AWR report takes less than a minute to run in most cases, and holds a treasure trove of performance information. The report consists of multiple sections. Walking through an AWR report usually shows you the reason that your database isn’t performing at peak levels.
To generate an AWR report for all instances in an Oracle RAC environment, use the awrgrpt.sql script instead. You can generate an AWR report for a specific instance in a RAC environment by using the awrrpti.sql script. You can also generate an AWR report for a single SQL statement by invoking the awrsqrpt.sql script and providing the SQL_ID of the SQL statement.
You can generate an AWR report to span any length of time, as long as the AWR has snapshots covering that period. By default, the AWR retains its snapshots for a period of eight days.
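You can change both the snapshot interval and the retention period with the DBMS_WORKLOAD_REPOSITORY package; both settings are expressed in minutes. For example, to retain snapshots for 15 days and take one every 30 minutes:

```sql
SQL> begin
  2    dbms_workload_repository.modify_snapshot_settings(
  3      retention => 21600,   -- 15 days x 24 hours x 60 minutes
  4      interval  => 30);     -- take a snapshot every 30 minutes
  5  end;
  6  /
```

A longer retention period lets you compare current performance against baselines from further back, at the cost of more space in the SYSAUX tablespace.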
You can generate an AWR snapshot any time you want, either through Oracle Enterprise Manager or by using the DBMS_WORKLOAD_REPOSITORY package. That ability is useful when, for example, you want to investigate a performance issue from the past 20 minutes and the next automatic snapshot is 40 minutes away. In that case, rather than wait 40 minutes, you can create a manual snapshot. Here’s an example that shows how to create an AWR snapshot manually:
SQL> exec dbms_workload_repository.create_snapshot();
PL/SQL procedure successfully completed.
SQL>
Once you create the snapshot, you can run the awrrpt.sql script. Then select the previous two snapshots to generate an up-to-date AWR report.
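To identify the snapshot IDs to feed to awrrpt.sql, you can also query the DBA_HIST_SNAPSHOT view directly. For example, to list the most recent snapshots first:

```sql
SQL> select snap_id, begin_interval_time, end_interval_time
  2  from dba_hist_snapshot
  3  order by snap_id desc;
```

The manual snapshot you just created will appear at the top of this listing.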
You can generate an AWR report when your user response time increases suddenly, say from 1 to 10 seconds during peak hours. An AWR report can also be helpful if a key batch job is suddenly taking much longer to complete. Of course, you must check the system CPU, I/O, and memory usage during the period as well, with the help of operating system tools such as sar, vmstat, and iostat.
If the system CPU usage is high, that doesn’t necessarily mean that it’s the CPU that’s the culprit. CPU usage percentages are not always a true measure of throughput, nor is CPU usage always useful as a database workload metric. Make sure to generate the AWR report for the precise period that encompasses the time during which you noticed the performance deterioration. An AWR report that spans a 24-hour period is of little use in diagnosing a performance dip that occurred 2 hours ago for only 30 minutes. Match your report to the time period of the performance problem.
7-16. Comparing Database Performance Between Two Periods
Problem
You want to examine and compare how the database performed during two different periods.
Solution
Use the awrddrpt.sql script (located in the $ORACLE_HOME/rdbms/admin directory) to generate an AWR Compare Periods Report that compares performance between two periods. Here are the steps.
1. Invoke the awrddrpt.sql script.
SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql
2. Select the report type; in this example, we chose html.
Enter value for report_type: html
Type Specified: html
3. Specify the number of days of snapshots from which you want to select the beginning and ending snapshots for the first time period.
Specify the number of days of snapshots to choose from
Enter value for num_days: 4
4. Choose a pair of snapshots over which you want to analyze the first period’s performance.
Specify the First Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 2092
First Begin Snapshot Id specified: 2092
Enter value for end_snap: 2093
First End Snapshot Id specified: 2093
5. Select the number of days of snapshots from which you want to select the pair of snapshots for the second period. Enter the value 4 so you can select a pair of snapshots from the previous 4 days.
Specify the number of days of snapshots to choose from
Enter value for num_days: 4
6. Specify the beginning and ending snapshots for the second period.
Specify the Second Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap2: 2134
Second Begin Snapshot Id specified: 2134
Enter value for end_snap2: 2135
Second End Snapshot Id specified: 2135
7. Specify a report name or accept the default name.
Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrdiff_1_2092_1_2134.html To use this name,
press <return> to continue, otherwise enter an alternative.
Enter value for report_name:
Using the report name awrdiff_1_2092_1_2134.html
Report written to awrdiff_1_2092_1_2134.html
SQL>
How It Works
Generating an AWR Compare Periods report is a process very similar to the one for generating a normal AWR report. The big difference is that the report does not show what happened between two snapshots, as the normal AWR report does. The AWR Compare Periods report compares performance between two different time periods, with each time period involving a different pair of snapshots. If you want to compare the performance of your database between 9 A.M. and 10 A.M. today and the same time period three days ago, you can do it with the AWR Compare Periods report. You can run an AWR Compare Periods report on one or all instances of a RAC database.
The AWR Compare Periods report is organized similarly to the normal AWR report, but it shows each performance statistic for the two periods side by side, so you can quickly see the differences (or similarities) in performance. Here’s a section of the report showing how you can easily review the differences in performance statistics between the first and the second periods.
First Second Diff
----- ------ ------
% Blocks changed per Read: 61.48 15.44 -46.04
Recursive Call %: 98.03 97.44 -0.59
Rollback per transaction %: 0.00 0.00 0.00
Rows per Sort: 2.51 2.07 -0.44
Avg DB time per Call (sec): 1.01 0.03 -0.98
7-17. Analyzing an AWR Report
Problem
You’ve generated an AWR report that covers a period when the database was exhibiting performance problems. You want to analyze the report.
Solution
An AWR report summarizes its performance-related statistics under various sections. The following is a quick summary of the most important sections in an AWR report.
Session Information
You can find out the number of sessions from the section at the very top of the AWR report, as shown here:
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------ -------- ---------
Begin Snap: 1878 12-Sep-13 07:47:33 38 1.7
End Snap: 1879 12-Sep-13 09:00:48 34 3.7
Elapsed: 73.25 (mins)
DB Time: 33.87 (mins)
Be sure to check the Begin Snap and End Snap times to confirm that the period encompasses the time when the performance problem occurred. If you notice a very high number of sessions, investigate whether extra shadow processes are being spawned. For example, if the number of sessions goes up by 200 between the Begin Snap and End Snap times when you expect it to remain the same at both times, the most likely cause is an application start-up issue that is spawning all those sessions.
Load Profile
The load profile section shows the per-second and per-transaction statistics for various indicators of database load such as hard parses and the number of transactions.
Load Profile Per Second Per Transaction
~~~~~~~~~~~~~~~~~ ---------- ---------------
DB Time(s): 0.5 1.4
DB CPU(s): 0.1 0.3
Redo size: 6,165.3 19,028.7
Logical reads: 876.6 2,705.6
Block changes: 99.2 306.0
Physical reads: 10.3 31.8
Physical writes: 1.9 5.9
User calls: 3.4 10.4
Parses: 10.2 31.5
Hard parses: 1.0 3.0
Logons: 0.1 0.2
Transactions: 0.3
The Load Profile section is one of the most important parts of an AWR report. Of particular significance are the physical I/O rates and hard parses. In an efficiently performing database, you should see mostly soft parses and very few hard parses. A high hard parse rate usually is a result of not using bind variables. If you see a high per second value for logons, it usually means that your applications aren’t using persistent connections. A high number of logons or an unusually high number of transactions tells you something unusual is happening in your database. However, the only way you’ll know the numbers are unusual is if you regularly check the AWR reports and know what the various statistics look like in a normally functioning database, at various times of the day!
Instance Efficiency Percentages
The instance efficiency section shows several hit ratios as well as the “execute to parse” and “latch hit” percentages.
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.10 In-memory Sort %: 100.00
Library Hit %: 95.13 Soft Parse %: 90.35
Execute to Parse %: 70.71 Latch Hit %: 99.97
Parse CPU to Parse Elapsd %: 36.71 % Non-Parse CPU: 83.60
Shared Pool Statistics Begin End
----- -----
Memory Usage %: 81.08 88.82
% SQL with executions>1: 70.41 86.92
% Memory for SQL w/exec>1: 69.60 91.98
The execute to parse ratio should be very high in a well-running instance. A low value for the % SQL with exec>1 statistic means that the database is not re-using shared SQL statements, usually because the SQL is not using bind variables.
Top 5 Foreground Events
The Top 5 Timed Foreground Events section shows the events that were responsible for the most waits during the time spanned by the AWR report.
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg Wait % DB
Event Waits Time(s) (ms) time Wait Class
---------------------------- ------ ------- ---- ---- ----------
db file sequential read 13,735 475 35 23.4 User I/O
DB CPU 429 21.1
latch: shared pool 801 96 120 4.7 Concurren
db file scattered read 998 49 49 2.4 User I/O
control file sequential read 9,785 31 3 1.5 System I/O
The Top 5 Timed Foreground Events section is where you can usually spot the problem, by showing you why the sessions are “waiting.” The Top 5 Events information shows the total waits for all sessions, but usually one or two sessions are responsible for most of the waits. Make sure to analyze the total waits and average waits (ms) separately, in order to determine if the waits are significant. Merely looking at the total number of waits or the total wait time for a wait event could give you a misleading idea about its importance. You must pay close attention to the average wait times for an event as well. In a nicely performing database, you should see CPU and I/O as the top wait events, as is the case here. If any wait events from the concurrent wait class such as latches show up at the top, investigate those waits further. For example, if you see events such as enq: TX - row lock contention, gc_buffer_busy (RAC), or latch free, it usually indicates contention in the database. If you see an average wait of more than 2 ms for the log file sync event, investigate the wait event further (Chapter 5 shows how to analyze various wait events). If you see a high amount of waits due to the db file sequential read or the db file scattered read wait events, there are heavy indexed reads (this is normal) or full table scans going on. You can find out the SQL statement and the tables involved in these read events in the AWR report.
Time Model Statistics
Time model statistics give you an idea about how the database has spent its time, including the time it spent on executing SQL statements as against parsing statements. If parsing time is very high, or if hard parsing is significant, you must investigate further.
Time Model Statistics DB/Inst: ORCL1/orcl1 Snaps: 1878-1879
Statistic Name Time (s) % of DB Time
------------------------ -------- ------------
sql execute elapsed time 1,791.5 88.2
parse time elapsed 700.1 34.5
hard parse elapsed time 653.7 32.2
Top SQL Statements
This section of the AWR report lets you quickly identify the most expensive SQL statements.
SQL ordered by Elapsed Time DB/Inst: ORCL1/orcl1 Snaps: 1878-1879
-> Captured SQL account for 13.7% of Total DB Time (s): 2,032
-> Captured PL/SQL account for 19.8% of Total DB Time (s): 2,032
Elapsed Elapsed Time
Time (s) Executions per Exec (s) %Total %CPU %IO SQL Id
-------- ---------- ------------ ------ ---- ---- -------------
292.4 1 292.41 14.4 8.1 61.2 b6usrg82hwsas
...
You can generate an explain plan for the expensive SQL statements using the SQL ID from this part of the report.
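For example, to pull the stored execution plan for the top statement in the listing (SQL ID b6usrg82hwsas) out of the AWR, you can use the DBMS_XPLAN.DISPLAY_AWR function:

```sql
SQL> select * from table(dbms_xplan.display_awr('b6usrg82hwsas'));
```

This displays every plan the AWR has captured for that SQL ID, which is also handy for spotting plan changes between snapshots.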
PGA Histogram
The PGA Aggregate Target Histogram shows how well the database is executing the sort and hash operations. For example:
PGA Aggr Target Histogram DB/Inst: ORCL1/orcl1 Snaps: 1878-1879
-> Optimal Executions are purely in-memory operations
Low High
Optimal Optimal Total Execs Optimal Execs 1-Pass Execs M-Pass Execs
------- ------- ----------- ------------- ------------ ------------
2K 4K 13,957 13,957 0 0
64K 128K 86 86 0 0
128K 256K 30 30 0 0
In this example, the database is performing all sorts and hashes optimally in the PGA. If you see a high number of one-pass executions and even a few large multi-pass executions, that’s an indication that the PGA is too small and you should consider increasing it.
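When sizing the PGA, you can consult the V$PGA_TARGET_ADVICE view, which estimates the cache hit percentage at various candidate PGA_AGGREGATE_TARGET sizes:

```sql
SQL> select round(pga_target_for_estimate/1024/1024) target_mb,
  2         estd_pga_cache_hit_percentage cache_hit_pct,
  3         estd_overalloc_count
  4  from v$pga_target_advice;
```

Look for the smallest target size at which the estimated cache hit percentage levels off and the estimated over-allocation count drops to zero.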
How It Works
Analyzing an AWR report should be your first step when troubleshooting database performance issues such as a slow-running query. An AWR report lets you quickly find out things such as the number of connections, transactions per second, cache-hit rates, wait event information, and the SQL statements that are using the most CPU and I/O. It shows you which of your SQL statements are using the most resources, and which wait events are slowing down the database. Perhaps most importantly, the report tells you whether database performance is unusually different from its typical performance during a given time of the day (or night). The AWR report sections summarized in the “Solution” section are only a small part of the AWR report. Other key sections to review when troubleshooting performance issues include the foreground wait event statistics, the remaining SQL statistics listings (such as SQL ordered by Gets and SQL ordered by Reads), the instance activity statistics, the tablespace and file I/O statistics, the segment statistics, and the memory advisory sections.
Depending on the nature of the performance issue you’re investigating, several of these sections in the report may turn out to be useful. In addition to the performance and wait statistics, the AWR report also offers advisories for both the PGA and the SGA. The AWR report is truly your best friend when you are troubleshooting just about any database performance issue. In a matter of minutes, you can usually find the underlying cause of the issue and figure out a potential fix. AWR does most of the work for you—all you need to do is know what to look for!