CHAPTER 7

Troubleshooting the Database

This chapter contains several recipes that show how to use the database’s built-in diagnostic infrastructure to resolve database performance issues. You’ll learn how to use ADRCI, the Automatic Diagnostic Repository Command Interpreter, to perform various tasks such as checking the database alert log, creating a diagnostic package for sending to Oracle Support engineers, and running a proactive health check of the database.

Many common Oracle database performance-related issues occur when you have space issues with the temporary tablespace or when you’re creating a large index or a large table with the create table as select (CTAS) technique. Undo tablespace space issues are another common source of trouble for many DBAs. This chapter has several recipes that help you proactively monitor, diagnose, and resolve temporary tablespace and undo tablespace–related issues. When a production database seems to hang, there are ways to collect critical diagnostic data for analyzing the causes, and this chapter shows you how to log in to an unresponsive database to collect diagnostic data.

7-1. Determining the Optimal Undo Retention Period

Problem

You need to determine the optimal length of time for undo retention in your database.

Solution

You can specify the length of time Oracle retains undo data after a transaction commits by setting the UNDO_RETENTION parameter. To determine the optimal value for the UNDO_RETENTION parameter, you must first calculate the actual amount of undo that the database is generating. Once you know approximately how much undo the database is generating, you can calculate a more precise value for the UNDO_RETENTION parameter. Use the following formula to calculate the optimal value of the UNDO_RETENTION parameter:

OPTIMAL UNDO_RETENTION = UNDO_SIZE / (DB_BLOCK_SIZE * UNDO_BLOCK_PER_SEC)

You can calculate the space allocated for undo in your database by issuing the following query:

SQL> select sum(d.bytes) "undo"
  2  from v$datafile d,
  3  v$tablespace t,
  4  dba_tablespaces s
  5  where s.contents = 'UNDO'
  6  and s.status = 'ONLINE'
  7  and t.name = s.tablespace_name
  8  and d.ts# = t.ts#;
 
 UNDO
----------
 104857600
SQL>

You can calculate the value of UNDO_BLOCK_PER_SEC with the following query:

SQL> select max(undoblks/((end_time-begin_time)*3600*24))
  2  "UNDO_BLOCK_PER_SEC"
  3  FROM v$undostat;
 
UNDO_BLOCK_PER_SEC
------------------
             7.625
SQL>

You most likely remember the block size for your database. If not, you can look it up in the SPFILE or find it by issuing the command show parameter db_block_size. Let’s say the db_block_size is 8 KB (8,192 bytes) for your database. You can then calculate the optimal value for the UNDO_RETENTION parameter using the formula shown earlier in this recipe: 104,857,600 / (8,192 * 7.625) = 1,678.69 seconds. In this case, assigning a value of 1,800 seconds to the UNDO_RETENTION parameter is appropriate, because it’s a bit more than the value the formula yields.
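
If you prefer, you can let the database do this arithmetic for you. The following query is simply a combination of the two queries shown earlier, joined with the db_block_size parameter, and it returns the optimal undo retention in seconds:

select d.undo_size/(to_number(p.value) * g.undo_block_per_sec)
       "OPTIMAL UNDO_RETENTION"
  from (select sum(a.bytes) undo_size
          from v$datafile a,
               v$tablespace b,
               dba_tablespaces c
         where c.contents = 'UNDO'
           and c.status = 'ONLINE'
           and b.name = c.tablespace_name
           and a.ts# = b.ts#) d,
       v$parameter p,
       (select max(undoblks/((end_time-begin_time)*3600*24))
               undo_block_per_sec
          from v$undostat) g
 where p.name = 'db_block_size';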

The previous example is predicated on the assumption that you have a limited amount of disk space available. If disk space isn’t an issue for you, you can instead choose the ideal value for the UNDO_RETENTION parameter for your instance. In this case, you choose a value for the UNDO_RETENTION parameter first and use that to calculate the size of your undo tablespace. The formula now becomes:

size of undo = undo_retention * db_block_size * undo_block_per_sec

You can use the following query to do all the calculations for you:

select d.undo_size/(1024*1024) "Current UNDO SIZE",
       SUBSTR(e.value,1,25) "UNDO RETENTION",
       (to_number(e.value) * to_number(f.value) *
       g.undo_block_per_sec) / (1024*1024)
      "Necessary UNDO SIZE"
     from (
       select sum(a.bytes) undo_size
         from v$datafile a,
              v$tablespace b,
              dba_tablespaces c
        where c.contents = 'UNDO'
          and c.status = 'ONLINE'
          and b.name = c.tablespace_name
          and a.ts# = b.ts#
       ) d,
      v$parameter e,
       v$parameter f,
       (
       Select max(undoblks/((end_time-begin_time)*3600*24))
         undo_block_per_sec
         from v$undostat
       ) g
     where e.name = 'undo_retention'
     and f.name = 'db_block_size';
 
Current UNDO SIZE
-----------------
500
UNDO RETENTION
--------------
10800
Necessary UNDO SIZE
-------------------
674.611200

The query shows that you should set the size of the undo tablespace in this database to around 675 MB; the computed value, 674.611200, rounds up to 675 MB.
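
Based on this estimate, you can resize the undo tablespace’s datafile accordingly. For example (the datafile name here is hypothetical; substitute the name of your own undo datafile):

SQL> alter database datafile '/u01/app/oracle/data/undotbs01.dbf' resize 700m;
 
Database altered.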

Here is how to set the undo retention to 30 minutes for an instance by updating the value of the UNDO_RETENTION parameter in the SPFILE:

SQL> alter system set undo_retention=1800 scope=both;
 
System altered.
 
SQL>

How It Works

Oracle retains undo data, for the duration you specify with the UNDO_RETENTION parameter, both for read consistency and to support Oracle Flashback operations. Automatic undo management is the default mode for undo management starting with release 11g. If you create a database with the Database Configuration Assistant (DBCA), Oracle automatically creates an auto-extending undo tablespace named UNDOTBS1. If you’re manually creating a database, you specify the undo tablespace in the database creation statement, or you can add the undo tablespace at any point. If a database doesn’t have an explicit undo tablespace, Oracle will store the undo records in the SYSTEM tablespace.

Once you set the UNDO_TABLESPACE initialization parameter, Oracle automatically manages undo retention for you. Optionally, you can set the UNDO_RETENTION parameter to specify how long Oracle retains older undo data before overwriting it with newer undo data.
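
You can quickly verify the current undo configuration of your instance with a query such as the following:

SQL> select name, value from v$parameter
  2  where name in ('undo_management','undo_tablespace','undo_retention');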

The first formula in the Solution section shows how to set the length of the UNDO_RETENTION parameter for a specific undo tablespace size. The second query calculates the size of the undo tablespace for a specific value of the UNDO_RETENTION parameter, based on your instance’s undo needs, which in turn depend on current database activity. Note that we rely on the dynamic view V$UNDOSTAT to calculate the value for the undo retention period. Therefore, it’s essential that you execute your queries after the database has been running for some time, thus ensuring that it has had the chance to process a typical workload.

If you configure the UNDO_RETENTION parameter, the undo tablespace must be large enough to hold the undo generated by the database within the time you specify with the UNDO_RETENTION parameter. When a transaction commits, the database may overwrite its undo data with newer undo data. The undo retention period is the minimum time for which the database will attempt to retain older undo data. After it saves the undo data for the period you specified for the UNDO_RETENTION parameter, the database marks that undo data as expired and makes the space occupied by that data available to write undo data for new transactions.

By default, the database uses the following criteria to determine how long it needs to retain undo data:

  • Length of the longest-running query
  • Length of the longest-running transaction
  • Longest flashback duration

It’s somewhat difficult to understand how the Oracle database handles the retention of undo data. Here’s a brief summary of how things work:

  • If you don’t configure the undo tablespace with the AUTOEXTEND option, the database simply ignores the value you set for the UNDO_RETENTION parameter. The database will automatically tune the undo retention period based on database workload and the size of the undo tablespace. So, make sure you set the undo tablespace to a large value if you’re receiving errors (ORA-01555: snapshot too old) indicating that the database is not retaining undo for a long enough time. Typically, the undo retention in this case is for a duration significantly longer than the longest-running active query in the database.
  • If you want the database to try to honor the settings you specify for the UNDO_RETENTION parameter, make sure that you enable the AUTOEXTEND option for the undo tablespace. This way, Oracle will automatically extend the size of the undo tablespace to make room for undo from new transactions, instead of overwriting the older undo data. However, if you’re receiving ORA-01555 (snapshot too old) errors (say, due to Oracle Flashback operations), it means that the database isn’t able to dynamically tune the undo retention period effectively. In a case such as this, try increasing the value of the UNDO_RETENTION parameter to match the length of the longest Oracle Flashback operation. Alternatively, you can try switching to a larger fixed-size undo tablespace (without the AUTOEXTEND option).

The key to figuring out the right size for the undo tablespace or the correct setting for the UNDO_RETENTION parameter is to understand the nature of the current database workload. In order to understand the workload characteristics, it’s important to examine the V$UNDOSTAT view, because it contains statistics showing how the database is utilizing the undo space, as well as information such as the length of the longest-running queries. You can use this information to calculate the size of the undo space for the current workload your database is processing. Note that each row in the V$UNDOSTAT view shows undo statistics for a 10-minute interval. The view contains a maximum of 576 rows, so you can review undo usage for up to 4 days in the past.

Here are the key columns you should monitor in the V$UNDOSTAT view for the time period you’re interested in—ideally, the time period should include the time when your longest-running queries are executing. You can use these statistics to size both the UNDO_TABLESPACE as well as the UNDO_RETENTION initialization parameters.

begin_time: Beginning of the time interval.

end_time: End of the time interval.

undoblks: Number of undo blocks the database consumed in a 10-minute interval; this is what we used in our formula for the estimation of the size of the undo tablespace.

txncount: Number of transactions executed in a 10-minute interval.

maxquerylen: This shows the length of the longest query (in seconds) executed in this instance during a 10-minute interval. You can estimate the size of the UNDO_RETENTION parameter based on the maximum value of the MAXQUERYLEN column.

maxqueryid: Identifier for the longest-running SQL statement in this interval.

nospaceerrcnt: The number of times the database didn’t have enough free space available in the undo tablespace for new undo data, because the entire undo tablespace was being used by active transactions. A nonzero value here means that you need to add space to the undo tablespace.

tuned_undoretention: The time, in seconds, for which the database will retain the undo data after the database commits the transaction to which the undo belongs.
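
Because the MAXQUERYLEN column records the longest query in each interval, its maximum over the view’s retained history gives you a reasonable lower bound for the UNDO_RETENTION setting:

SQL> select max(maxquerylen) "LONGEST QUERY (SECONDS)"
  2  from v$undostat;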

The following query based on the V$UNDOSTAT view shows how Oracle automatically tunes undo retention (check the TUNED_UNDORETENTION column) based on the length of the longest-running query (MAXQUERYLEN column) in the current instance workload.

SQL> select to_char(begin_time,'hh24:mi:ss') BEGIN_TIME,
  2  to_char(end_time,'hh24:mi:ss') END_TIME,
  3  maxquerylen,nospaceerrcnt,tuned_undoretention
  4  from v$undostat;
 
BEGIN_TI END_TIME MAXQUERYLEN NOSPACEERRCNT TUNED_UNDORETENTION
-------- -------- ----------- ------------- -------------------
12:15:35 12:25:35         592             0                1492
12:05:35 12:15:35        1194             0                2094
11:55:35 12:05:35         592             0                1493
11:45:35 11:55:35        1195             0                2095
11:35:35 11:45:35         593             0                1494
11:25:35 11:35:35        1196             0                2097
11:15:35 11:25:35         594             0                1495
11:05:35 11:15:35        1195             0                2096
10:55:35 11:05:35         593             0                1495
10:45:35 10:55:35        1198             0                2098
...
SQL>

Note that the value of the TUNED_UNDORETENTION column fluctuates continuously, based on the value of the maximum query length (MAXQUERYLEN) during any interval. You can see that the two columns are directly related to each other, with Oracle raising or lowering the tuned undo retention based on the maximum query length during a given interval (of 10 minutes). The following query shows the usage of undo blocks and the transaction count during each 10-minute interval.

SQL> select to_char(begin_time,'hh24:mi:ss'),to_char(end_time,'hh24:mi:ss'),
  2  maxquerylen,ssolderrcnt,nospaceerrcnt,undoblks,txncount from v$undostat
  3  order by undoblks
  4  /
 
TO_CHAR( TO_CHAR( MAXQUERYLEN SSOLDERRCNT NOSPACEERRCNT   UNDOBLKS   TXNCOUNT
-------- -------- ----------- ----------- ------------- ---------- ----------
17:33:51 17:36:49         550           0             0          1         18
17:23:51 17:33:51         249           0             0         33        166
17:13:51 17:23:51         856           0             0         39        520
17:03:51 17:13:51         250           0             0         63        171
16:53:51 17:03:51         850           0             0        191        702
16:43:51 16:53:51         245           0             0        429        561
 
6 rows selected.
 
SQL>

Oracle provides an easy way to help set the size of the undo tablespace as well as the undo retention period, through the OEM Undo Advisor interface. You can specify the length of time for the advisor’s analysis, for a period going back to a week—the advisor uses the AWR hourly snapshots to perform its analysis. You can specify the undo retention period to support a flashback transaction query. Alternatively, you can let the database determine the desired undo retention based on the longest query in the analysis period.

7-2. Finding What’s Consuming the Most Undo

Problem

Often, one or two user sessions seem to be hogging the undo tablespace. You’d like to identify the users and the SQL statements that are using up all that undo space.

Solution

Use the following query to find out which SQL statement has run for the longest time in your database.

SQL> select s.sql_text from v$sql s, v$undostat u
     where u.maxqueryid=s.sql_id;

You can join the V$TRANSACTION and V$SESSION views to find the sessions whose currently executing transactions are using the most undo, as shown here:

SQL> select s.sid, s.username, t.used_urec, t.used_ublk
     from v$session s, v$transaction t
     where s.saddr = t.ses_addr
     order by t.used_ublk desc;

You can also issue the following query to find out which session is currently using the most undo in an instance:

SQL> select s.sid, t.name, s.value
    from v$sesstat s, v$statname t
    where s.statistic# = t.statistic#
    and t.name = 'undo change vector size'
    order by s.value desc;

The query’s output relies on the statistic undo change vector size in the V$STATNAME view, to show the SIDs of the sessions consuming the most undo right now. The V$TRANSACTION view shows details about active transactions. Here’s a query that joins the V$TRANSACTION, V$SQL, and V$SESSION views and shows you the user and the SQL statement, together with the amount of undo space that’s consumed by the SQL statement:

SQL> select sql.sql_text sql_text, t.USED_UREC Records,
     t.USED_UBLK Blocks,
     (t.USED_UBLK*8192/1024) KBytes from v$transaction t,
     v$session s,
     v$sql sql
     where t.addr = s.taddr
     and s.sql_id = sql.sql_id
     and s.username ='&USERNAME';

The column USED_UREC shows the number of undo records used, and the USED_UBLK column shows the undo blocks used by a transaction.

How It Works

You can issue the queries described in the “Solution” section to identify the sessions that are responsible for the most undo usage in your database, as well as the users that own those sessions. You can query V$UNDOSTAT with the appropriate begin_time and end_time values to get the SQL identifier of the longest-running SQL statement during a time interval. The MAXQUERYID column captures the SQL identifier. You can use this ID to query the V$SQL view in order to find out the actual SQL statement. Similarly, the V$TRANSACTION and the V$SESSION views together help identify the users that are consuming the most undo space. If excessive undo usage is affecting performance, you might want to look at the application to see why the queries are using so much undo.
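
For example, you can restrict the search to a recent time window and retrieve the text of the longest-running statements directly, as in the following query (adjust the time window to suit your needs):

SQL> select u.maxqueryid, u.maxquerylen, s.sql_text
  2  from v$undostat u, v$sql s
  3  where u.maxqueryid = s.sql_id
  4  and u.begin_time >= sysdate - 1/24
  5  order by u.maxquerylen desc;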

If you are experiencing excessive undo usage, the following query can help. It identifies the users and the instances responsible for undo usage in an Oracle RAC database.

SQL> select a.inst_id, a.sid, c.username, c.osuser, c.program, b.name,
     a.value, d.used_urec, d.used_ublk
     from gv$sesstat a, v$statname b, gv$session c, gv$transaction d
     where a.statistic# = b.statistic#
     and a.inst_id = c.inst_id
     and a.sid = c.sid
     and c.inst_id = d.inst_id
     and c.saddr = d.ses_addr
     and b.name = 'undo change vector size'
     and a.value > 0
     order by a.value;

Once you know the users and the instances responsible, you can take a closer look at just what those users and instances are doing that is causing such a high rate of undo usage.

7-3. Resolving an ORA-01555 Error

Problem

You’re receiving the ORA-01555 (snapshot too old) errors during nightly runs of key production batch jobs. You want to eliminate these errors.

Solution

While setting a high value for the UNDO_RETENTION parameter can potentially minimize the possibility of receiving “snapshot too old” errors, it doesn’t guarantee that the database won’t overwrite older undo data that may be needed by a running transaction. You can move long-running batch jobs to a separate time interval when other programs aren’t running in the database, to avoid these errors. Remember that you don’t always need several concurrent programs to generate a “snapshot too old” error. A single poorly written program can do it, for example, one that opens a cursor on a large table and then updates that same table inside the cursor loop.

Regardless, while you can minimize the occurrence of “snapshot too old” errors with these approaches, you can’t completely eliminate such errors without enabling the guaranteed undo retention feature. When you configure guaranteed undo retention in a database, no SELECT statement can fail because of the “snapshot too old” error. The trade-off is that Oracle may keep new DML statements from executing when the undo tablespace can’t hold their undo without overwriting unexpired undo data. Implementing the guaranteed undo feature is simple. Suppose you want to ensure that the database retains undo for at least an hour (3,600 seconds). First set the undo retention threshold with the alter system command shown here, and then set up guaranteed undo retention by specifying the retention guarantee clause to alter the undo tablespace.

SQL> alter system set undo_retention=3600;
System altered.
SQL> alter tablespace undotbs1 retention guarantee;
Tablespace altered.
SQL>

You can switch off guaranteed undo retention by executing the alter tablespace command with the retention noguarantee clause.
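
You can check whether an undo tablespace currently guarantees retention by querying the RETENTION column of the DBA_TABLESPACES view:

SQL> select tablespace_name, retention from dba_tablespaces
  2  where contents = 'UNDO';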

Tip  You can enable guaranteed undo retention by using the alter tablespace command as shown in this recipe, as well as by specifying the retention guarantee clause in the create database and create undo tablespace statements.

How It Works

Oracle uses the undo records stored in the undo tablespace to roll back transactions, provide read consistency, and help recover the database. In addition, the database uses undo records to read data from a past point in time with Oracle Flashback Query. Undo data serves as the underpinning for several Oracle Flashback features that help you recover from logical errors.

Occurrence of the Error

The ORA-01555 error (snapshot too old) may occur in various situations. The following is a case where the error occurs during an export.

EXP-00008: ORACLE error 1555 encountered
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10$" too small
EXP-00000: Export terminated unsuccessfully

And you can receive the same error when performing a flashback transaction:

ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number  with name "" too small
ORA-06512: at "SYS.DBMS_FLASHBACK", line 37
ORA-06512: at "SYS.DBMS_FLASHBACK", line 70
ORA-06512: at li

The “snapshot too old” error occurs when Oracle overwrites undo data that’s needed by another query. The error is a direct result of how Oracle’s read consistency mechanism works. The error occurs during the execution of a long-running query when Oracle tries to read the “before image” of any changed rows from the undo segments. For example, if a long-running query starts at 1 A.M. and runs until 6 A.M., it’s possible for the database to change the data that’s part of this query during the period in which the query executes. When Oracle tries to read the data as it appeared at 1 A.M., the query may fail if that data is no longer present in the undo segments.

If your database is experiencing a lot of updates, Oracle may not be able to fetch the changed rows, because the before images recorded in the undo segments may have been overwritten. The transactions that changed the rows will have already committed, and the undo segments no longer hold a record of the before-change row values because the database overwrote the relevant undo data. Since Oracle fails to return consistent data for the current query, it issues the ORA-01555 error. The query that’s currently running requires the before image to construct read-consistent data, but the before image isn’t available.

The ORA-01555 error may be the result of one or both of the following: a heavy query workload during a time of intense changes in the database, or too small an undo tablespace. You can increase the size of the undo tablespace, but that doesn’t ensure that the error won’t occur again.

Influence of Extents

The database stores undo data in undo extents, and there are three distinct types of undo extents:

Active: Transactions are currently using these extents.

Unexpired: These are extents that contain undo that’s required to satisfy the undo retention time specified by the UNDO_RETENTION initialization parameter.

Expired: These are extents with undo that’s been retained longer than the duration specified by the UNDO_RETENTION parameter.
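
You can see how the undo extents in your database are currently distributed among these three states by querying the DBA_UNDO_EXTENTS view:

SQL> select status, count(*) extents, sum(bytes)/1024/1024 mb
  2  from dba_undo_extents
  3  group by status;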

If the database doesn’t find enough expired extents in the undo tablespace and it can’t allocate new undo extents, it’ll reuse the unexpired extents (but never the active ones), and this leaves the door open for an ORA-01555, “snapshot too old” error. By default, the database will essentially stop honoring the undo retention period you specify if it encounters space pressure to accommodate the undo from new transactions. Since the unexpired undo extents contain undo records needed to satisfy the undo retention period, overwriting those extents in reality means that the database is lowering the undo retention period you’ve set. Enabling the undo retention guarantee helps assure the success of long-running queries as well as Oracle Flashback operations. The “guarantee” part of the undo retention guarantee is real: Oracle will retain undo for at least the time you specify and will never overwrite any of the unexpired undo extents that contain undo required to satisfy the undo retention period. However, there’s a stiff price attached to this guarantee: Oracle will honor the retention even if it means that DML transactions fail because the database can’t find space to record the undo for those transactions. Therefore, you must exercise great caution when enabling the guaranteed undo retention capability.

7-4. Monitoring Temporary Tablespace Usage

Problem

You want to monitor the usage of the temporary tablespace.

Solution

Execute the following query to find out the used and free space in a temporary tablespace.

SQL> select * from (select a.tablespace_name,
      sum(a.bytes/1024/1024) allocated_mb
      from dba_temp_files a
      where a.tablespace_name = upper('&&temp_tsname') group by a.tablespace_name) x,
      (select sum(b.bytes_used/1024/1024) used_mb,
      sum(b.bytes_free/1024/1024) free_mb
      from v$temp_space_header b
      where b.tablespace_name=upper('&&temp_tsname') group by b.tablespace_name);
 
Enter value for temp_tsname: TEMP
...
TABLESPACE_NAME                ALLOCATED_MB    USED_MB    FREE_MB
---------------                ------------    -------    -------
TEMP                                     60         12         48
SQL>

How It Works

Oracle uses temporary tablespaces for storing intermediate results from sort operations as well as any temporary tables, temporary LOBs, and temporary B-trees. You can create multiple temporary tablespaces, but only one of them can be the default temporary tablespace. If you don’t explicitly assign a temporary tablespace to a user, that user is assigned the default temporary tablespace.

You won’t find information about temporary tablespaces in the DBA_FREE_SPACE view. Use the V$TEMP_SPACE_HEADER as shown in this example to find how much free and used space there is in any temporary tablespace.
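
Starting with Oracle Database 11g, you can also query the DBA_TEMP_FREE_SPACE view, which summarizes the total, allocated, and free space in each temporary tablespace:

SQL> select tablespace_name, tablespace_size/1024/1024 total_mb,
  2  allocated_space/1024/1024 allocated_mb,
  3  free_space/1024/1024 free_mb
  4  from dba_temp_free_space;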

7-5. Identifying Who Is Using the Temporary Tablespace

Problem

You notice that the temporary tablespace is filling up fast, and you want to identify the user and the SQL statements responsible for the high temporary tablespace usage.

Solution

Issue the following query to find out which SQL statement is using up space in a sort segment.

SQL> select se.sid, se.username,
     su.blocks * ts.block_size / 1024 / 1024 mb_used, su.tablespace,
     su.sqladdr address, sq.hash_value, sq.sql_text
     from v$sort_usage su, v$session se, v$sqlarea sq, dba_tablespaces ts
     where su.session_addr = se.saddr
     and su.sqladdr = sq.address (+)
     and su.tablespace = ts.tablespace_name;

The preceding query shows information about the session that issued the SQL statement, as well as the name of the temporary tablespace and the amount of space the SQL statement is using in that tablespace.

You can use the following query to find out which sessions are using space in the temporary tablespace. Note that the information is in the summary form, meaning it doesn’t separate the various sort operations being run by a session—it simply gives the total temporary tablespace usage by each session.

SQL> select se.sid,
     se.username, se.osuser, pr.spid,
     se.module,se.program,
     sum (su.blocks) * ts.block_size / 1024 / 1024 mb_used, su.tablespace,
     count(*) sorts
     from v$sort_usage su, v$session se, dba_tablespaces ts, v$process pr
     where su.session_addr = se.saddr
     and se.paddr = pr.addr
     and su.tablespace = ts.tablespace_name
     group by se.sid, se.serial#, se.username, se.osuser, pr.spid, se.module,
     se.program, ts.block_size, su.tablespace;

The output of this query will show you the space that each session is using in the temporary tablespace, as well as the number of sort operations that session is performing right now.

How It Works

Oracle tries to perform sort and hash operations in memory (the PGA), but if an operation is too large to fit into memory, it uses the temporary tablespace to do the work. It’s important to understand that even a single large sort operation has the potential to use up an entire temporary tablespace. Since all database sessions share the temporary tablespace, the session that runs the large sort operation could cause other sessions to receive errors due to lack of room in that tablespace, or could itself fail. Once the temporary tablespace fills up, all SQL statements that seek to use the temporary tablespace fail with the ORA-01652: unable to extend temp segment error. New sessions may not be able to connect, queries can hang, and users may not be able to issue new statements. You look for blocking locks, but find none. If the temporary tablespace fills up, transactions won’t complete, and the alert log will show that the temporary tablespace ran out of space.

Operations that use an ORDER BY or GROUP BY clause frequently use the temporary tablespace to do their work. Large hash joins also need temp space while they’re executing. Remember, too, that creating or rebuilding an index makes use of the temporary tablespace for sorting the index.

Oracle uses PGA memory for performing sort and hash operations. Thus, if you see a lot of I/O occurring in the temporary tablespace, one of the first things to do is review the current value of the PGA_AGGREGATE_TARGET initialization parameter and see whether bumping it up will help. Nevertheless, even a larger setting for the PGA_AGGREGATE_TARGET parameter doesn’t guarantee that Oracle will perform a huge sort entirely in memory. Oracle allocates each session a certain amount of PGA memory, with the amount determined internally based on the value of the PGA_AGGREGATE_TARGET parameter. Once a large operation uses up its share of the PGA memory, Oracle writes intermediate results to disk in the temporary tablespace. These types of operations are called one-pass or multi-pass operations, and because they are performed on disk, they are much slower than an operation performed entirely in the PGA.
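
To see how your current PGA target compares with actual PGA usage, you can query the V$PGASTAT view, as shown here:

SQL> select name, round(value/1024/1024) mb from v$pgastat
  2  where name in ('aggregate PGA target parameter',
  3  'total PGA allocated','total PGA used for auto workareas');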

If your database is running out of space in the temporary tablespace, you must increase its size by adding a tempfile. Enabling autoextend for a temporary tablespace will also help prevent “out of space” errors. Since Oracle allocates space in a temporary tablespace that you have assigned for the user performing the sort operation, you can assign users that need to perform heavy sorting a temporary tablespace that’s different from that used by the rest of the users, thus minimizing the effect of the heavy sorting activity on overall database performance.
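
For example, here’s how you might add an autoextending tempfile to the TEMP tablespace (the file name is hypothetical):

SQL> alter tablespace temp add tempfile '/u01/app/oracle/data/temp02.dbf'
  2  size 1000m autoextend on next 100m maxsize 4g;
 
Tablespace altered.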

Note that unlike table or index segments, where each object has its own segment, a temporary tablespace uses just one segment, called the sort segment. All sessions share this sort segment. A single SQL statement can use multiple sort and hash operations. In addition, the same session can have multiple SQL statements executing simultaneously, with each statement using multiple sort and hash operations. Once a sort operation completes, the database immediately marks the blocks used by the operation as free and allocates them to another sort operation. The database adds extents to the sort segment as the sort operation gets larger, but if there’s no more free space in the temporary tablespace to allocate additional extents, it issues the ORA-01652: unable to extend temp segment error. The SQL statement that’s using the sort operation will fail as a result.
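
You can watch the sort segment at work through the V$SORT_SEGMENT view, which shows, for each temporary tablespace, how many sessions are currently using the segment and how its extents are allocated:

SQL> select tablespace_name, current_users, total_extents,
  2  used_extents, free_extents, extent_hits
  3  from v$sort_segment;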

Note  Although you’ll receive an ORA-01652 error when a SQL statement performing a huge sort fails due to lack of space in the temporary tablespace, that’s not the only occasion for this error. You’ll also receive it when performing a table move operation (alter table ... move), if the tablespace to which you’re moving the table doesn’t have room to hold the table. The same can happen when you’re creating a large index. Please see Recipe 7-6 for an explanation of this error.

7-6. Resolving the “Unable to Extend Temp Segment” Error

Problem

While creating a large index, you receive an Oracle error indicating that the database is unable to extend a TEMP segment. However, you have plenty of free space in the temporary tablespace.

Solution

When you get an error such as the following, your first inclination may be to think that there’s no free space in the temporary tablespace.

ORA-01652: unable to extend temp segment by 1024 in tablespace INDX_01

You cannot fix this problem by adding space to the temporary tablespace. The error message clearly indicates the tablespace that ran out of space. In this case, the offending tablespace is INDX_01, and not the TEMP tablespace. Obviously, an index creation process failed because there was insufficient space in the INDX_01 tablespace. You can fix the problem by adding a datafile to the INDX_01 tablespace, as shown here:

SQL>alter tablespace INDX_01 add datafile '/u01/app/oracle/data/indx_01_02.dbf'
  2 size 1000m;

How It Works

When you receive the ORA-01652 error, your normal tendency is to check the temporary tablespace. You check the DBA_TEMP_FREE_SPACE view, and there’s plenty of free space in the default temporary tablespace, TEMP. Well, if you look at the error message carefully, it tells you that the database is unable to extend the temp segment in the INDX_01 tablespace. When you create an index, as in this case, you provide the name of the permanent tablespace in which the database must create the new index. Oracle starts the creation of the new index by putting the new index structure into a temporary segment in the tablespace you specify (INDX_01 in our example) for the index. The reason is that if your index creation process fails, Oracle (to be more specific, the SMON process) will remove the temporary segment from the tablespace you specified for creating the new index. Once the index is successfully created (or rebuilt), Oracle converts the temporary segment into a permanent segment within the INDX_01 tablespace. However, as long as Oracle is still building the index, the database deems the segment temporary, and thus when the index creation runs out of space, the database issues the ORA-01652 error, which is also the error code for an “out of space” error in a temporary tablespace. The temp segment the error refers to is the segment that was holding the new index while it was being built. Once you increase the size of the INDX_01 tablespace, the error will go away.

image Tip  The temporary segment in an ORA-01652 error message may not be referring to a temporary segment in a temporary tablespace.

The key to resolving the ORA-01652 error is to understand that Oracle uses temporary segments in places other than a temporary tablespace. While a temporary segment in the temporary tablespace is for activities such as sorting, a permanent tablespace can also use temporary segments when performing temporary actions necessary during the creation of a table (CTAS) or an index.
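You can actually see these transient segments while a large index build or CTAS operation is running, by querying DBA_SEGMENTS for segments of type TEMPORARY in the permanent tablespace (the tablespace name here is just this recipe’s example):

SQL> select segment_name, segment_type, tablespace_name, bytes
     from dba_segments
     where segment_type = 'TEMPORARY'
     and tablespace_name = 'INDX_01';

Once the build completes, the rows disappear from this query’s output because the segment has been converted to a permanent index segment.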

image Tip  When you create an index, the creation process uses two different temporary segments. One temporary segment in the TEMP tablespace is used to sort the index data. Another temporary segment in the permanent tablespace holds the index while it is being created. After creating the index, Oracle changes the temporary segment in the index’s tablespace into a permanent segment. The same is the case when you create a table with the CREATE TABLE...AS SELECT (CTAS) option.

As the “Solution” section explains, the ORA-01652 error refers to the tablespace where you’re rebuilding an index. If you are creating a new index, Oracle uses the temporary tablespace for sorting the index data. When creating a large index, it may be a smart idea to create a large temporary tablespace and assign it to the user who’s creating the index. Once the index is created, you can re-assign the user the original temporary tablespace and remove the large temporary tablespace. This strategy helps avoid enlarging the default temporary tablespace to a very large size to accommodate the creation of a large index.
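Here is a hedged sketch of that strategy; the tablespace name, file path, sizes, and username are illustrative, not from this example’s database:

SQL> create temporary tablespace temp_build
  2  tempfile '/u01/app/oracle/data/temp_build01.dbf' size 10g;
 
SQL> alter user build_user temporary tablespace temp_build;
 
-- create the large index, then switch the user back and clean up:
SQL> alter user build_user temporary tablespace temp;
SQL> drop tablespace temp_build including contents and datafiles;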

If you specify autoextend for a temporary tablespace, the temp files may get very large, based on one or two large sorts in the database. When you try to reclaim space for the TEMP tablespace, you may get the following error.

SQL> alter database tempfile '/u01/app/oracle/oradata/prod1/temp01.dbf'
     resize 500M;
alter database tempfile '/u01/app/oracle/oradata/prod1/temp01.dbf'
    resize 500M
*
ERROR at line 1:
ORA-03297: file contains used data beyond requested RESIZE value

One solution is to create a new temporary tablespace, make that the default temporary tablespace, and then drop the larger temporary tablespace (provided no other sessions are using it, of course). You can simplify matters by using the following alter tablespace command to shrink the temporary tablespace:

SQL> alter tablespace temp shrink space;
 
Tablespace altered.
 
SQL>

In this example, we shrank the entire temporary tablespace, but you can shrink a specific tempfile by issuing the command alter tablespace temp shrink tempfile <file_name>. The command will shrink the tempfile to the smallest size possible.
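You can also retain a minimum amount of space when shrinking, by adding the keep clause. For example (the file name is illustrative):

SQL> alter tablespace temp shrink tempfile
  2  '/u01/app/oracle/oradata/prod1/temp01.dbf' keep 500m;

Without the keep clause, the database shrinks the tempfile to the smallest size possible.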

7-7. Resolving Open Cursor Errors

Problem

You are frequently getting the ORA-01000: “maximum open cursors exceeded” error, and you want to resolve it.

Solution

One of the first things you need to do when you receive the ORA-01000: “maximum open cursors exceeded” error is to check the value of the initialization parameter open_cursors. You can view the current limit for open cursors by issuing the following command:

SQL> sho parameter open_cursors
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_cursors                         integer     300
SQL>

The parameter OPEN_CURSORS sets the maximum number of cursors a session can have open at once. You specify this parameter to control the number of open cursors. Keeping the parameter’s value too low will result in a session receiving the ORA-01000 error. There’s no harm in specifying a large value for the OPEN_CURSORS parameter (unless you expect all sessions to simultaneously max out their cursors, which is unlikely), so you can usually resolve cursor-related errors simply by raising the parameter value to a large number. However, you may sometimes find that raising the value of the open_cursors parameter doesn’t “fix” the problem. In such cases, investigate which processes are using the open cursors by issuing the following query:

SQL> select a.value, s.username, s.sid, s.serial#, s.program, s.inst_id
     from gv$sesstat a, gv$statname b, gv$session s
     where a.statistic# = b.statistic# and s.sid = a.sid
     and b.name = 'opened cursors current';
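Because OPEN_CURSORS is a dynamic parameter, you can raise it without restarting the instance. The value shown here is arbitrary; size it for your own workload:

SQL> alter system set open_cursors=1000 scope=both;

In a RAC database, add the clause sid='*' to change the value on all instances.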

The GV$OPEN_CURSOR (or the V$OPEN_CURSOR) view shows all the cursors that each user session has currently opened and parsed, or cached. You can issue the following query to identify the sessions with a high number of opened and parsed or cached cursors.

SQL> select saddr, sid, user_name, address, hash_value, sql_id, sql_text
     from gv$open_cursor
     where sid in
     (select sid from gv$open_cursor
     group by sid having count(*) > &threshold);

The query lists all sessions with an open cursor count greater than the threshold you specify. This way, you can limit the query’s output and focus just on the sessions that have opened, parsed, or cached a large number of cursors.

You can get the actual SQL code and the open cursor count for a specific session by issuing the following query:

SQL> select sql_id, substr(sql_text,1,50) sql_text, count(*)
     from gv$open_cursor where sid=81
     group by sql_id,substr(sql_text,1,50)
     order by sql_id;

The output shows the SQL code for all open cursors in the session with the SID 81. You can examine all SQL statements with a high open cursor count to see why the session was keeping a large number of cursors open.

How It Works

If your application is not closing open cursors, then setting the OPEN_CURSORS parameter to a higher value won’t really help you. You may momentarily resolve the issue, but you’re likely to run into the same issue a little later. If the application layer never closes the ref cursors created by the PL/SQL code, the database will simply hang on to the server resources for the used cursors. You must fix the application logic so it closes the cursors—the problem isn’t really in the database.

If you’re using a Java application deployed on an application server such as Oracle WebLogic Server, the WebLogic Server’s JDBC connection pools provide open database connections for applications. Any prepared statement in each of these connections will use a cursor. Multiple application server instances and multiple JDBC connection pools mean that the database needs to support all those cursors. If multiple requests share the same session ID, the open cursor problem may be due to implicit cursors. The only solution then is to close the connection after each request, unless developers close each PreparedStatement and ResultSet object with its close() method.

A cursor leak occurs when an application opens cursors but doesn’t close them.

You can use the SESSION_CACHED_CURSORS initialization parameter to set the maximum number of cached closed cursors for each session. The default setting is 50. You can use this parameter to prevent a session from opening an excessive number of cursors, thereby filling the library cache or forcing excessive hard parses. Repeated parse calls for a SQL statement lead Oracle to move the session cursor for that statement into the session cursor cache. The database satisfies subsequent parse calls by using the cached cursor instead of re-opening the cursor.

When you re-execute a SQL statement, Oracle will first try to find a parsed version of that statement in the shared pool; if it finds the parsed version in the shared pool, a soft parse occurs. Oracle is forced to perform the much more expensive hard parse if it doesn’t find the parsed version of the statement in the shared pool. While a soft parse is much less expensive than a hard parse, a large number of soft parses can still affect performance, because they consume CPU and library cache latches. To reduce the number of soft parses, Oracle caches the recently closed cursors of each session in a local session cursor cache for that session. Oracle stores any cursor for which a minimum of three parse calls were made, thus avoiding having to cache every single session cursor, which would fill up the cursor cache.

The default value of 50 for the SESSION_CACHED_CURSORS initialization parameter may be too low for many databases. You can check if the database is bumping against the maximum limit for session-cached cursors by issuing the following statement:

SQL> select max(value) from v$sesstat
  2  where statistic# in (select statistic# from v$statname
  3* where name = 'session cursor cache count');
 
MAX(VALUE)
----------
        49
 
SQL>

The query shows the maximum number of session cursors that have been cached in the past. Since this number (49) is virtually the same as the default value (or the value you’ve set) for the SESSION_CACHED_CURSORS parameter, you must set the parameter's value  to a larger number. Session cursor caches use the shared pool. If you’re using automatic memory management, there’s nothing for you to do after you reset the SESSION_CACHED_CURSORS parameter—the database will bump up the shared pool size if necessary. You can find out how many cursors each session has in its session cursor cache by issuing the following query:

SQL> select a.value,s.username,s.sid,s.serial#
  2  from v$sesstat a, v$statname b,v$session s
  3  where a.statistic#=b.statistic# and s.sid=a.sid
  4* and b.name='session cursor cache count';
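Note that unlike OPEN_CURSORS, the SESSION_CACHED_CURSORS parameter is static at the instance level, so raising it requires a restart (the value shown is just an example):

SQL> alter system set session_cached_cursors=100 scope=spfile;

The new value takes effect the next time you bounce the instance.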

7-8. Resolving a Hung Database

Problem

Your database is hung. Users aren’t able to log in, and existing users can’t complete their transactions. The DBAs with SYSDBA privileges may also be unable to log in to the database. You need to find out what is causing the database to hang, and fix the problem.

Solution

Follow these general steps when facing a database that appears to be hung:

  1. Check your alert log to see if the database has reported any errors, which may indicate why the database is hanging.
  2. See if you can get an AWR or ASH report or query some of the ASH views, as explained in Chapter 5. You may notice events such as hard parses at the top of the Load Profile section of the AWR report, indicating that this is what is slowing down the database.
  3. A single ad hoc query certainly has the potential to bring an entire database to its knees. See if you can identify one or more very poorly performing SQL statements that may be leading to the hung (or a very poorly performing) database.
  4. Check the database for blocking locks as well as latch contention.
  5. Check the server’s memory usage as well as CPU usage. Make sure the sessions aren’t stalling because you’ve sized the PGA too low, as explained in Chapter 3.
  6. Don’t overlook the fact that a scary-looking database hang may be caused by something as simple as the filling up of all archive log destinations. If the archive destination is full, the database will hang, and new user connections will fail. You can, however, still connect as the SYS user, and once you make room in the archive destination by moving some of the archived redo log files, the database becomes accessible to the users.
  7. Check the Flash Recovery Area (FRA). A database also hangs when it’s unable to write Flashback Database logs to the recovery area. When the FRA fills up, the database won’t process new work and it won’t spawn new database connections. You can fix this problem by making the recovery area larger with the alter system set db_recovery_file_dest_size command.
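To see whether the recovery area is the culprit, you can check its usage before enlarging it. The queries and the new size shown here are illustrative:

SQL> select name, space_limit, space_used, space_reclaimable
     from v$recovery_file_dest;
 
SQL> select file_type, percent_space_used, percent_space_reclaimable
     from v$recovery_area_usage;
 
SQL> alter system set db_recovery_file_dest_size=50g;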

If you’re still unable to resolve the reasons for the hung database, you most likely have a truly hung database. While you’re investigating such a database, you may sometimes find yourself unable to connect and log in. In that case, use the “prelim” option to log in to the database. The prelim option doesn’t require a real database connection.  Here's an example that shows how to use the prelim option to log into a database:

$ sqlplus /nolog
 
SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
SQL>

image Note  You can’t set prelim while connected to a database. Alternatively, you can use the command sqlplus -prelim "/ as sysdba" to log in with the -prelim option. Note that you use the nolog option to open a SQL*Plus session. You can’t execute the set _prelim on command if you’re already connected to the database. Once you establish a prelim connection as shown here, you can execute the oradebug hanganalyze command to analyze a hung database—for example:

SQL> oradebug hanganalyze 3

Statement processed.

SQL>

The integer 3 in the oradebug command specifies the level at which you want to run the command. We explain this in more detail in the “How It Works” section of this recipe.

In an Oracle RAC environment, specify the oradebug hanganalyze command with additional options, as shown here:

SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3

You can repeat the oradebug hanganalyze command a couple of times to generate dump files for varying process states.

In addition to the dump files generated by the hanganalyze command, Oracle Support may often also request a process state dump, also called a systemstate dump, to analyze hung database conditions. The systemstate dump will report on what the processes are doing and the resources they’re currently holding. You can get a systemstate dump from a non-RAC system by executing the following set of commands.

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 266
Statement processed.
SQL>

Issue the following commands to get a systemstate dump in a RAC environment:

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 266

Note that unlike the oradebug hanganalyze command, you must connect to a process. The setmypid option specifies the process, in this case your own process. You can also specify a process ID other than yours, in which case you issue the command oradebug setospid <pid> before issuing the dump systemstate command. If you try to issue the dump systemstate command without setting the PID, you’ll receive an error:

SQL> oradebug dump systemstate 10
ORA-00074: no process has been specified
SQL>

You must take the systemstate dumps a few times, with an interval of about a minute or so in between the dumps. Oracle Support usually requests several systemstate dumps along with the trace files generated by the hanganalyze command.

How It Works

The key thing you must ascertain when dealing with a “hung” database is whether the database is really hung, or just slow. If one or two users complain about a slow-running query, you need to analyze their sessions, using the techniques described in Chapter 5, to see if the slowness is due to a blocking session or to an Oracle wait event. If several users report that their work is going slowly, it could be due to various reasons, including CPU, memory (SGA or PGA), or other system resource issues.

Check the server’s CPU usage as one of your first steps in troubleshooting a hung database. If your server is showing 100% CPU utilization, or if it’s swapping or paging, the problem may not lie in the database at all. As for memory, if the server doesn’t have enough free memory, new sessions can’t connect to the database.

image Tip  The “prelim” option shown in the “Solution” section lets you connect to the SGA without opening a session. You can thus “log” in to the hung database even when normal SQL*Plus logins don’t work. The oradebug session you start once you connect to the SGA actually analyzes what’s in the SGA and dumps it into a trace file.

A true database hang can be due to a variety of reasons, including a system that has exhausted resources such as the CPU or memory, or because several sessions are stuck waiting for a resource such as a lock. In fact, a true database hang is more likely due to an Oracle bug than to any other reason. Oracle is sometimes unable to automatically detect and resolve internal deadlocks, and this leads to what Oracle Support calls a “true database hang.” A true database hang is thus an internal deadlock or a cyclical dependency among multiple processes. Oracle Support will usually ask you to provide them the hanganalyze trace files and multiple systemstate dumps to enable them to diagnose the root cause of your hang. At times like this, you may not even be able to log into the database. Your first instinct when you realize that you can’t even log in to a database is to try shutting down and restarting, often referred to as bouncing the database. Unfortunately, while shutting down and restarting the database may “resolve” the issue, it’ll also disconnect all users, and you’re no wiser as to what exactly caused the problem. If you do decide to bounce your database, quickly generate a few hanganalyze and systemstate dumps first.

image Tip  As unpleasant as it may be at times, if you find that you simply can’t connect to a hung database, then collect any trace dumps you may need, and quickly bounce the database so that users can access their applications. Especially when you’re dealing with a database that’s hanging because of memory issues, bouncing the instance may get things going again quickly.

If you find that the database is completely unresponsive, and you can’t even log in to the database with the SYSDBA privilege, you can use the prelim option to log in to the database. The prelim option stands for preliminary connection; it starts an Oracle process and attaches that process to the SGA shared memory. However, this is not a full or complete connection, but a limited connection in which the structures for query execution are not set up, so you cannot even query the V$ views. However, the prelim option lets you run oradebug commands to get error dump stacks for diagnostic purposes. The output of the hanganalyze command can tell Oracle Support engineers whether your database is really hanging because sessions are waiting for some resource. The command makes internal kernel calls to find all sessions that are waiting for a resource and shows the relationship between the blocking and waiting sessions. The hanganalyze option that you can specify with either the oradebug command or an alter session statement produces details about hung sessions. Once you get the dump file, Oracle Support personnel can analyze it and let you know the reasons for the database hang.

You can invoke the hanganalyze command at various levels ranging from 1 to 10. Level 3 dumps processes that are in a hanging (IN_HANG) state. You normally don’t need to specify a level higher than 3, because higher levels will produce voluminous reports with too many details about the processes.

image Note  The dump files you create with the hanganalyze and the systemstate commands are created in ADR’s trace directory.

Note that we issued the oradebug command to get a systemstate dump with a level of 266. Level 266 (combination of Level 256, which produces short stack information, and Level 10) is for Oracle releases 9.2.0.6 and onward (earlier releases used systemstate level 10). Level 266 allows you to dump the short stacks for each process, which are Oracle function calls that help Oracle development teams determine which Oracle function is causing the problem. The short stack information also helps in matching known bugs in the code. On Solaris and Linux systems, you can safely specify level 266, but on other systems, it may take a long time to dump the short stacks. Therefore, you may want to stick with level 10 for the other operating systems.

If you can find out the blocking session, you can also take a dump just for that session, by using the command oradebug setospid nnnn, where nnnn is the blocking session’s PID, and then invoking the oradebug command, as shown here:

SQL> oradebug setospid  9999
SQL> oradebug unlimit
SQL> oradebug dump errorstack  3

Note that you can generate the hanganalyze and systemstate dumps in a normal session (as well as in a prelim session), without using the oradebug command. You can invoke the hanganalyze command with an alter session command, as shown here.

SQL> alter session set events 'immediate trace name hanganalyze level 3';

Similarly, you can get a systemstate dump with the following command:

SQL> alter session set events 'immediate trace name SYSTEMSTATE level 10';
Session altered.
SQL>

The hanganalyze and systemstate dumps are just two of the many dumps you can collect. Use the oradebug dumplist command to view the various error dumps you can collect.

SQL> oradebug dumplist
TRACE_BUFFER_ON
TRACE_BUFFER_OFF
LATCHES
PROCESSSTATE
SYSTEMSTATE
INSTANTIATIONSTATE
REFRESH_OS_STATS
CROSSIC
CONTEXTAREA
HANGDIAG_HEADER
HEAPDUMP
...

Note that while you can read some of the dump files in an editor, these files are mainly for helping Oracle Support professionals troubleshoot a database hang situation. There’s not much you can do with the dump files, especially when a database hang situation is due to an Oracle bug or a kernel-level lock, except to send them along to Oracle Support for analysis.

7-9. Invoking the Automatic Diagnostic Repository Command Interpreter

Problem

You’d like to invoke the Automatic Diagnostic Repository Command Interpreter (ADRCI) and work with various components of the Automatic Diagnostic Repository (ADR).

Solution

ADRCI is a tool to help you manage Oracle diagnostic data. You can use ADRCI commands in both an interactive as well as a batch mode.

To start ADRCI in the interactive mode, type adrci at the command line, as shown here:

 $ adrci
 
ADR base = "c:\app\ora"
adrci>

You can issue the adrci command from any directory, so long as the PATH environment variable includes ORACLE_HOME/bin/. You can enter each command at the adrci prompt, and when you’re done using the utility, you can type EXIT or QUIT to exit. You can view all the ADRCI commands available to you by typing HELP at the ADRCI command line, as shown here:

adrci> HELP
 
 HELP [topic]
   Available Topics:
        CREATE REPORT
        ECHO
        EXIT
        HELP
        HOST
        IPS
...
        SHOW HOMES | HOME | HOMEPATH
        SHOW INCDIR
        SHOW INCIDENT
        SHOW PROBLEM
        SHOW REPORT
        SHOW TRACEFILE
        SPOOL
 
 There are other commands intended to be used directly by Oracle, type
 "HELP EXTENDED" to see the list
 
adrci>

You can get detailed information for an individual ADRCI command by adding the name of the command as an attribute to the HELP command. For example, here is how to get the syntax for the show tracefile command:

adrci> help show tracefile
 
  Usage: SHOW TRACEFILE [file1 file2 ...] [-rt | -t]
                        [-i inc1 inc2 ...] [-path path1 path2 ...]
 
  Purpose: List the qualified trace filenames.
...
  Options:
  Examples:
...
adrci>

You can also execute ADRCI commands in the batch mode by incorporating the commands in a script or batch file. For example, if you want to run the ADRCI commands SET HOMEPATH and SHOW ALERT from within an operating system script, include the following line inside a shell script:

SET HOMEPATH diag/rdbms/orcl/orcl; SHOW ALERT -term

Let’s say your script name is myscript.txt. You can then execute this script by issuing the following command inside an operating system shell script or batch file:

$ adrci script=myscript.txt

Note that the parameter SCRIPT tells ADRCI that it must execute the commands within the text file myscript.txt. If the text file is not within the same directory from where the shell script or batch file is running, you must provide the path for the directory where you saved the text file.

To execute an ADRCI command directly at the command line instead of invoking ADRCI first and working interactively with the ADR interface, specify the parameter EXEC with the ADRCI command, as shown here:

$ adrci EXEC="SHOW HOMES; SHOW INCIDENT"

This example shows how to include two ADRCI commands—SHOW HOMES and SHOW INCIDENT—by executing the ADRCI command at the command line.

How It Works

The Automatic Diagnostic Repository is a directory structure that you can access even when the database is down, because it’s stored outside the database. The root directory for the ADR is called the ADR base and is set by the DIAGNOSTIC_DEST initialization parameter. Each Oracle product or component has its own ADR home under the ADR base. The location of each of the ADR homes follows the path diag/product_type/product_id/instance_id. Thus, the ADR home for a database named orcl1 with the instance name orcl1 and the ADR base set to /app/oracle will be /app/oracle/diag/rdbms/orcl1/orcl1. Under each of the ADR homes are the diagnostic data, such as the trace and dump files and the alert log, as well as other diagnostic files, for that instance of an Oracle product.

The ADRCI utility helps you manage the diagnostic data in the ADR. ADRCI lets you perform the following types of diagnostic tasks:

  • View diagnostic data in the ADR (Automatic Diagnostic Repository): The ADR stores diagnostic data such as alert logs, dump files, trace files, health check reports, etc.
  • View health check reports: The diagnosability infrastructure automatically runs health checks to capture details about the error and adds them to other diagnostic data it collects for that error. You can also manually invoke a health check.
  • Package incidents and problem information for transmission to Oracle Support: A problem is a critical database error, and an incident is a single occurrence of a specific problem. An incident package is a collection of diagnostic data that you send to Oracle Support for troubleshooting purposes. ADRCI has special commands that enable you to create packages and generate zipped diagnostic packages to send to Oracle Support.

You can view all ADR locations for the current database instance by querying the V$DIAG_INFO view, as shown here.

SQL> select * from v$diag_info;
 
INST_ID  NAME                   VALUE                                                          CON_ID
-------  ---------------------  -------------------------------------------------------------  ------
      1  Diag Enabled           TRUE                                                                0
      1  ADR Base               /u01/app/oracle                                                     0
      1  ADR Home               /u01/app/oracle/diag/rdbms/orcl/orcl                                0
      1  Diag Trace             /u01/app/oracle/diag/rdbms/orcl/orcl/trace                          0
      1  Diag Alert             /u01/app/oracle/diag/rdbms/orcl/orcl/alert                          0
      1  Diag Incident          /u01/app/oracle/diag/rdbms/orcl/orcl/incident                       0
      1  Diag Cdump             /u01/app/oracle/diag/rdbms/orcl/orcl/cdump                          0
      1  Health Monitor         /u01/app/oracle/diag/rdbms/orcl/orcl/hm                             0
      1  Default Trace File     /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3263.trc        0
      1  Active Problem Count   0                                                                   0
      1  Active Incident Count  0                                                                   0
 
11 rows selected.
 
SQL>

The ADR home is the root directory for a database’s diagnostic data. All diagnostic files such as the alert log and the various trace files are located under the ADR home. The ADR home is located directly underneath the ADR base, which you specify with the DIAGNOSTIC_DEST initialization parameter. Here’s how to find out the location of the ADR base directory:

adrci> show base
ADR base is "/u01/app/oracle"
adrci>

You can view all the ADR homes under the ADR base by issuing the following command:

adrci> show homes
ADR Homes:
diag/rdbms/orcl/orcl
diag/tnslsnr/localhost/listener
adrci>

You can have multiple ADR homes under the ADR base, and multiple ADR homes can be current at any given time. Your ADRCI commands will work only with diagnostic data in the current ADR home. How do you know which ADR home is current at any point in time? The ADRCI homepath helps determine the ADR homes that are current, by pointing to the ADR home directory under the ADR base hierarchy of directories.

image Note  Some ADRCI commands require only one ADR home to be current—these commands will issue an error if multiple ADR homes are current.

You can use either the show homes or the show homepath command to view all ADR homes that are current:

adrci> show homepath
ADR Homes:
diag/rdbms/orcl/orcl
diag/tnslsnr/localhost/listener
adrci>
  

If you want to work with diagnostic data from multiple database instances or components, you must ensure that all the relevant ADR homes are current. Most of the time, however, you’ll be dealing with a single database instance or a single Oracle product or component such as the listener, for example. An ADR homepath is always relative to the ADR base. If you specify /u01/app/oracle/ as the value for the ADR base directory, for example, all ADR homes will be under the ADR_Base/diag directory. Issue the set homepath command to set an ADR home directory to a single home, as shown here:

adrci> set homepath diag/rdbms/orcl1/orcl1
 
adrci> show homepath
 
ADR Homes:
diag/rdbms/orcl1/orcl1
adrci>

image Note  Diagnostic data includes descriptions of incidents and problems, health monitoring reports, and traditional diagnostic files such as trace files, dump files, and alert logs.

Note that before you set the homepath with the set homepath command, the show homepath command shows all ADR homepaths. However, once you set the homepath to a specific home, the show homepath command shows just a single homepath. It’s important to set the homepath before you execute several ADRCI commands, as they are applicable to only a single ADR home. For example, if you don’t set the homepath before issuing the following command, you’ll receive an error:

adrci> ips create package
DIA-48448: This command does not support multiple ADR homes
adrci>

The error occurs because the ips create package command is not valid with multiple ADR homes. The command will work fine after you issue the set homepath command to set the homepath to a single ADR home. Commands such as the one shown here work only with a single current ADR home, but others work with multiple current ADR homes—there are also commands that don’t need a current ADR home. The bottom line is that all ADRCI commands will work with a single current ADR home.

7-10. Viewing an Alert Log from ADRCI

Problem

You want to view an alert log by using ADRCI commands.

Solution

To view an alert log with ADRCI, follow these steps:

1.  Invoke ADRCI.

$ adrci

2.  Set the ADR home with the set homepath command.

adrci> set homepath diag/rdbms/orcl/orcl

3.  Enter the following command to view the alert log:

adrci> show alert
 
ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl:
*************************************************************************
Output the results to file: /tmp/alert_3573_13986_orcl_1.ado
adrci>

The alert log will pop up in your default editor. The ADRCI prompt will return once you close the text file in the editor.

You can also query the V$DIAG_INFO view to find the path that corresponds to the Diag Trace entry. You can change the directory to that path and open the alert_<db_name>.log file with a text editor.
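For example, a query along these lines returns the directory that holds the text alert log (the Diag Trace entry):

SQL> select value from v$diag_info where name = 'Diag Trace';

You can then change to that directory and open the alert_<db_name>.log file directly.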

How It Works

The alert log holds runtime information for an Oracle instance and provides information such as the initialization parameters the instance is using, as well as a record of key changes such as redo log file switches and, most importantly, messages that show Oracle errors and their details. The alert log is critical for troubleshooting purposes, and is usually the first place you’ll look when a problem occurs. Oracle provides the alert log as both a text file and an XML-formatted file.

The show alert command brings up the XML-formatted alert log without displaying the XML tags. You can set the default editor with the SET EDITOR command, as shown here:

adrci> set editor notepad.exe

The previous command changes the default editor to Notepad. The show alert -term command shows the alert log contents in the terminal window. If you want to examine just the latest events in the alert log, issue the following command:

adrci> show alert -tail 50

The tail option shows you a set of the most recent lines from the alert log in the command window. In this example, it shows the last 50 lines from the alert log. If you don’t specify a value for the tail parameter, by default, it shows the last 10 lines from the alert log.

The following command shows a “live” alert log, in the sense that it will show changes to the alert log as the entries are added to the log.

adrci> show alert -tail -f

The previous command shows the last 10 lines of the alert log and prints all new messages to the screen, thus offering a “live” display of ongoing additions to the alert log. The CTRL+C sequence will take you back to the ADRCI prompt.
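If you want to capture a slice of the alert log to a file rather than watch it scroll by, ADRCI’s spool command works much like its SQL*Plus counterpart. The file name below is just an example:

adrci> spool /tmp/alert_extract.txt
adrci> show alert -tail 200 -term
adrci> spool off

The -term option is what sends the output to the terminal (and thus to the spool file) instead of launching the editor.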

When troubleshooting, it is very useful to see if the database issued any ORA-600 errors. You can issue the following command to trap the ORA-600 errors.

adrci> show alert -p "MESSAGE_TEXT LIKE '%ORA-600%'"

Although you can view the alert log directly by going to the file system location where it’s stored, you can also do so through the ADRCI tool. ADRCI is especially useful for working with the trace files of an instance. The SHOW TRACEFILE command shows all the trace files in the trace directory of the instance. You can issue the SHOW TRACEFILE command with various filters—the following example looks for trace files that reference the background process mmon:

adrci> show tracefile %mmon%
     diag\rdbms\orcl1\orcl1\trace\orcl1_mmon_1792.trc
     diag\rdbms\orcl1\orcl1\trace\orcl1_mmon_2340.trc
adrci>

This command lists all trace files with the string mmon in their file names. You can apply filters to restrict the output to just the trace files associated with a specific incident number (the next recipe, Recipe 7-11, explains how to get the incident number), as shown here:

adrci> show tracefile -I 43417
     diag\rdbms\orcl1\orcl1\incident\incdir_43417\orcl1_ora_4276_i43417.trc
adrci>

The previous command lists the trace files related to the incident number 43417.

7-11. Viewing Incidents with ADRCI

Problem

You want to use ADRCI to view incidents.

Solution

You can view all incidents in the ADR with the show incident command (be sure to set the homepath first):

$ adrci
adrci> set homepath diag\rdbms\orcl1\orcl1
 
adrci> show incident
 
ADR Home = c:\app\ora\diag\rdbms\orcl1\orcl1:
*******************************************************************************
INCIDENT_ID          PROBLEM_KEY
 CREATE_TIME
-------------------- ---------------------------------------------------
43417                ORA 600 [kkqctinvvm(2): Inconsistent state space!]
 2013-08-17 09:26:15.091000 -05:00
43369                ORA 600 [kkqctinvvm(2): Inconsistent state space!]
 2013-08-17 11:08:40.589000 -05:00
79451                ORA 7445
 2013-03-04 03:00:39.246000 -05:00
84243                ORA 7445
 2013-03-14 19:12:27.434000 -04:00
84244                ORA 7445
 2013-03-20 16:55:54.501000 -04:00
5 rows fetched

You can specify the detail mode to view details about a specific incident, as shown here:

adrci> show incident -mode detail -p "incident_id=43369"
 
ADR Home = c:\app\ora\diag\rdbms\orcl1\orcl1:
*******************************************************************************
 
INCIDENT INFO RECORD 1
*******************************************************************************
   INCIDENT_ID                   43369
   STATUS                        ready
   CREATE_TIME                   2013-08-17 11:08:40.589000 -05:00
   PROBLEM_ID                    1
   CLOSE_TIME                    <NULL>
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  600
   ERROR_ARG1                    kkqctinvvm(2): Inconsistent state space!
   SIGNALLING_COMPONENT          SQL_Transform
   PROBLEM_KEY                   ORA 600 [kkqctin: Inconsistent state space!]
   FIRST_INCIDENT                43417
   FIRSTINC_TIME                 2013-08-17 09:26:15.091000 -05:00
   LAST_INCIDENT                 43369
   LASTINC_TIME                  2013-08-17 11:08:40.589000 -05:00
   KEY_VALUE                     ORACLE.EXE.3760_3548
   KEY_NAME                      PQ
   KEY_NAME                      SID
   KEY_VALUE                     71.304
   OWNER_ID                      1
   INCIDENT_FILE                 c:\app\ora\diag\rdbms\orcl1\orcl1\trace\orcl1_ora_3548.trc
 
adrci>

How It Works

The show incident command reports on all open incidents in the database. For each incident, the output for this command shows the problem key, incident ID, and the time when the incident occurred. In this example, we first set the ADRCI homepath, so the command shows incidents from just this ADR home. If you don’t set the homepath, you’ll see incidents from all the current ADR homes.

As mentioned earlier, an incident is a single occurrence of a problem. A problem is a critical error such as an ORA-600 (internal error) or an ORA-07445 error relating to operating system exceptions. The problem key is a text string that shows the problem details. For example, the problem key ORA 600 [kkqctinvvm(2): Inconsistent state space!] shows that the problem is due to an internal error.

When a problem occurs several times, the database creates an incident for each occurrence of the problem, each with a unique incident ID. The database logs the incident in the alert log and sends an alert to Oracle Enterprise Manager, where incidents show up on the Home page. The database automatically gathers diagnostic data for the incident, called incident dumps, and stores it in the ADR trace directory.

Since a critical error can potentially generate numerous identical incidents, the fault diagnosability infrastructure applies a “flood control” mechanism to limit the generation of incidents. For a given problem key, the database allows only 5 incidents within 1 hour and a maximum of 25 incidents in 1 day. Once a problem triggers incidents beyond these thresholds, the database merely logs the incidents in the alert log and the Oracle Enterprise Manager but stops generating new incident dumps for them. You can’t alter the default threshold settings for the incident flood control mechanism.

7-12. Packaging Incidents for Oracle Support

Problem

You want to send the diagnostic files related to a specific problem to Oracle Support.

Solution

You can package the diagnostic information for one or more incidents either through Database Control or through commands that you can execute from the ADRCI interface. In this solution, we show you how to package incidents through ADRCI. You can use various IPS commands to package all diagnostic files related to a specific problem in a zipped format and send the file to Oracle Support. Here are the steps to create an incident package.

1.  Create an empty logical package, as shown here:

adrci> ips create package
Created package 1 without any contents, correlation level typical
adrci>

In this example, we created an empty package, but you can also create a package based on an incident number, a problem number, a problem key or a time interval. In all these cases, the package won't be empty—it'll include the diagnostic information for the incident or problem that you specify. Since we created an empty package, we need to add diagnostic information to that package in the next step.

2.  Add diagnostic information to the logical package with the ips add incident command:

adrci> ips add incident 43369 package 1
Added incident 43369 to package 1
adrci>

At this point, the incident 43369 is associated with package 1, but there’s no diagnostic data in it yet.

3.  Generate the physical package.

adrci> ips generate package 1 in \app\ora\diagnostics
Generated package 1 in file \app\ora\diagnostics\IPSPKG_20110419131046_COM_1.zip, mode complete
adrci>

When you issue the generate package command, ADRCI gathers all relevant diagnostic files and adds them to a zip file in the directory you designate.

4.  Send the resulting zip file to Oracle Support.

If you decide to add supplemental diagnostic data to an existing physical package (zipped file), you can do so by specifying the incremental option with the generate package command:

adrci> ips generate package 1 in \app\ora\diagnostics incremental

The incremental zip file created by this command will have the term INC in the file name, indicating that it is an incremental zip file.

How It Works

A physical package is the zip file that you can send to Oracle Support for diagnosing a problem in your database. Since an incident is a single occurrence of a problem, adding the incident number to the logical package and generating the physical package rolls up all the diagnostic data for that problem (incident) into a single zipped file. In this example, we showed you how to first create an empty logical package and then associate it with an incident number. However, the ips create package command has several options: you can specify the incident number or a problem number directly when you create the logical package, and skip the add incident command. You can also create a package that contains all incidents between two points in time, as shown here:

adrci> ips create package time '2013-09-12 10:00:00.00 -06:00' to '2013-09-12 23
:00:00.00 -06:00'
Created package 2 based on time range 2013-09-12 10:00:00.000000 -06:00 to 2013-
09-12 23:00:00.000000 -06:00, correlation level typical
adrci>

The package generated by the previous command contains all incidents that occurred between 10 A.M. and 11 P.M. on September 12, 2013.

Note that you can also manually add a specific diagnostic file to an existing  package. To add a file, you specify the file name in the ips add file command—you are limited to adding only those diagnostic files that are within the ADR base directory. Here is an example:

adrci> ips add file <ADR_BASE>/diag/rdbms/orcl1/orcl1/trace/orcl_ora12345.trc package 1

By default, the ips generate package command generates a zip file that includes all files for a package. The incremental option will limit the files to those that the database has generated since you originally generated the zipped file for that package. The ips show files command shows all the files in a package and the ips show incidents command shows all the incidents in a package. You can issue the ips remove file command to remove a diagnostic file from a package.
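As an alternative to the three-step create/add/generate sequence shown in the “Solution” section, the ips pack command creates the package, adds the diagnostic data, and generates the zip file in one shot. The following sketch uses the same incident number as the earlier example, with an assumed target directory:

adrci> ips pack incident 43369 in /tmp

You can similarly pack by problem number or by time range; see the ADRCI documentation for the full IPS PACK syntax.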

7-13. Running a Database Health Check

Problem

You’d like to run a comprehensive diagnostic health check on your database. You’d like to find out if there’s any data dictionary or file corruption, as well as any other potential problems in the database.

Solution

You can use the database health monitoring infrastructure to run a health check of your database. You can run various integrity checks, such as transaction integrity checks and dictionary integrity checks. You can get a list of all the health checks you can run by querying the V$HM_CHECK view:

SQL> select name from v$hm_check where internal_check='N';

Once you decide on the type of check, specify the name of the check in the DBMS_HM package's RUN_CHECK procedure, as shown here:

 SQL> begin
  2   dbms_hm.run_check('Dictionary Integrity Check','testrun1');
  3   end;
  4  /
 
PL/SQL procedure successfully completed.
 
SQL>

You can also run a health check from the Enterprise Manager. Go to Advisor Central ➤ Checkers, and select the specific checker from the Checkers subpage to run a health check.

How It Works

Oracle automatically runs a health check when it encounters a critical error. You can run a manual check using the procedure shown in the “Solution” section. The database stores all health check findings in the ADR.

You can run most of the health checks while the database is open. You can run the Redo Integrity Check and the DB Structure Integrity Check only when the database is closed—you must place the database in the NOMOUNT state to run these two checks.
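You can also pass check-specific parameters through the input_params argument of RUN_CHECK (the parameters each check accepts are listed in the V$HM_CHECK_PARAM view). The following sketch restricts the dictionary check to a single dictionary table; the table name and run name here are arbitrary examples:

SQL> begin
  2    dbms_hm.run_check(
  3      check_name   => 'Dictionary Integrity Check',
  4      run_name     => 'testrun2',
  5      input_params => 'TABLE_NAME=tab$;CHECK_MASK=ALL');
  6  end;
  7  /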

You can view a health check’s findings using either the DBMS_HM package or through the Enterprise Manager. Here is how to get a health check using the DBMS_HM package:

SQL> set long 100000
SQL> set longchunksize 1000
SQL> set pagesize 1000
SQL> set linesize 512
 
SQL> select dbms_hm.get_run_report('testrun1') from dual;
 
DBMS_HM.GET_RUN_REPORT('TESTRUN1')
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
------------------------------------------------------

Basic Run Information
Run Name                      : testrun1
 Run Id                       : 141
 Check Name                   : Dictionary Integrity Check
 Mode                         : MANUAL
 Status                       : COMPLETED
 Start Time                   : 2013-09-09 15:11:22.779917 -04:00
 End Time                     : 2013-09-09 15:11:40.046797 -04:00
 Error Encountered            : 0

Source Incident Id            : 0
 Number of Incidents Created  : 0
 Input Parameters for the Run
 TABLE_NAME=ALL_CORE_TABLES
 CHECK_MASK=ALL
 
Run Findings And Recommendations
 Finding
 Finding Name   : Dictionary Inconsistency
 Finding ID     : 142
 Type           : FAILURE
 Status         : OPEN
 Priority       : CRITICAL
 Message        : SQL dictionary health check: seg$.type# 31 on object SEG$
                  failed
 Message        : Damaged rowid is AIAABAAAUUjAAf - description: Ts# 1
                  File# 3 Block# 83328 is referenced
...
 
SQL>

You can also go to Advisor Central ➤ Checkers and run a report from the Run Detail page for any health check you have run. Use the show hm_run, create report, and show report commands to view health check reports with the ADRCI utility. You can use the views V$HM_FINDING and V$HM_RECOMMENDATION to investigate the findings as well as the recommendations pursuant to a health check.
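For instance, a query along the following lines pulls the findings for the run shown earlier (the run ID of 141 comes from the report above; the column names are per the documented V$HM_FINDING view):

SQL> select finding_id, name, status, description
  2  from   v$hm_finding
  3  where  run_id = 141;

A similar query against V$HM_RECOMMENDATION, joined on RUN_ID, returns any recommendations the checker recorded.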

7-14. Creating a SQL Test Case

Problem

You need to create a SQL test case in order to reproduce a SQL failure on a different machine, either to support your own diagnostic efforts, or to enable Oracle Support to reproduce the failure.

Solution

In order to create a SQL test case, first you must export the SQL statement along with several bits of useful information about the statement. The following example shows how to capture the SQL statement that is throwing an error. In this example, the user SH is doing the export (you can’t do the export as the user SYS).

First, connect to the database as SYSDBA and create a directory to hold the test case:

SQL> conn / as sysdba
Connected.
SQL> create or replace directory TEST_DIR1 as 'c:\myora\diagnostics\incidents';
 
Directory created.
SQL> grant read,write on directory TEST_DIR1 to sh;
 
Grant succeeded.
SQL>

Then grant the DBA role to the user through which you will create the test case, and connect as that user:

SQL> grant dba to sh;
 
Grant succeeded.
 
SQL> conn sh/sh
Connected.

Issue the SQL command that’s throwing the error:

SQL> select * from  my_mv  where max_amount_sold >100000 order by 1;

Now you’re ready to export the SQL statement and relevant information, which you can import to a different system later on. Use the EXPORT_SQL_TESTCASE procedure to export the data, as shown here:

SQL> set serveroutput on
 
SQL> declare mycase clob;
  2  begin
  3  dbms_sqldiag.export_sql_testcase
  4  (directory    =>'TEST_DIR1',
  5  sql_text      => 'select * from my_mv where max_amount_sold >100000 order by 1',
  6  user_name     => 'SH',
  7  exportData    =>  TRUE,
  8  testcase      => mycase
  9  );
 10  end;
 11  /
 
PL/SQL procedure successfully completed.
 
SQL>

Once the export procedure completes, you are ready to perform the import, either on the same or on a different server. The following example creates a new user named TEST, and imports the test case into that user’s schema. Here are the steps for importing the SQL statement and associated information into a different schema.

SQL> conn /as sysdba
Connected.
SQL> create or replace directory TEST_DIR2 as 'c:\myora\diagnostics\incidents';
 
Directory created.
SQL> grant read,write on directory TEST_dir2 to test;

Transfer all the files in the TEST_DIR1 directory to the TEST_DIR2 directory. Then grant the DBA role to user TEST, and connect as that user:

SQL> grant dba to test;
 
Grant succeeded.
 
SQL> conn test/test
Connected.

Perform the import of the SQL data as the user TEST, by invoking the IMPORT_SQL_TESTCASE procedure, as shown here:

SQL> begin
  2  dbms_sqldiag.import_sql_testcase
  3  (directory=>'TEST_DIR2',
  4  filename=>'oratcb1_008602000001main.xml',
  5  importData=>TRUE
  6  );
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
SQL>

The user TEST will now have all the objects to execute the SQL statement that you want to investigate. You can verify this by issuing the original select statement. It should give you the same output as under the SH schema.

How It Works

Oracle offers the SQL Test Case Builder (TCB) to reproduce a SQL failure. You can create a test case through Enterprise Manager or through a PL/SQL package. The “Solution” section of this recipe shows how to create a test case using the EXPORT_SQL_TESTCASE procedure of the DBMS_SQLDIAG package. There are several variants of this procedure, and our example shows how to use a SQL statement as the source for creating a SQL test case. Please review the DBMS_SQLDIAG.EXPORT_SQL_TESTCASE procedure in Oracle’s PL/SQL Packages manual for details about other options to create test cases.

image Note  You can use the Test Case Builder without having to license either the Oracle Diagnostics Pack or the Oracle Tuning Pack. Often, you’ll find yourself trying to provide a test case for Oracle, without which the Oracle Support personnel won’t be able to investigate a particular problem they are helping you with. The SQL Test Case Builder is a tool that is part of Oracle Database 11g, and its primary purpose is to help you quickly obtain a reproducible test case. The SQL Test Case Builder helps you easily capture pertinent information relating to a failed SQL statement and package it in a format that either a developer or an Oracle support person can use to reproduce the problem in a different environment.

You access the SQL Test Case Builder through the DBMS_SQLDIAG package. To create a test case, you must first export the SQL statement, including all the objects that are part of the statement, and all other related information. The export process is very similar to an Oracle export with the EXPDP command and thus uses a directory, just as EXPDP does. Oracle creates the SQL test case as a script that contains the statements that will re-create the necessary database objects, along with associated runtime information such as statistics, which enable you to reproduce the error. The following are the various types of information captured and exported as part of the test case creation process:

  • SQL text for the problem statement
  • Table data—this is optional, and you can export a sample or complete data.
  • The execution plan
  • Optimizer statistics
  • PL/SQL functions, procedures, and packages
  • Bind variables
  • User privileges
  • SQL profiles
  • Metadata for all the objects that are part of the SQL statement
  • Dynamic sampling results
  • Runtime information such as the degree of parallelism, for example

In the DBMS_SQLDIAG package, the EXPORT_SQL_TESTCASE procedure exports a SQL test case for a SQL statement to a directory. The IMPORT_SQL_TESTCASE procedure imports the test case from a directory.

In the EXPORT_SQL_TESTCASE procedure, here is what the attributes stand for:

DIRECTORY: The directory where you want to store the test case files

SQL_TEXT: The actual SQL statement that’s throwing the error

TESTCASE: The name of the test case

EXPORTDATA: By default, Oracle doesn’t export the data. You can set this parameter to TRUE in order to export the data. You can optionally limit the amount of data you want to export, by specifying a value for the Sampling Percent attribute. The default value is 100.

The Test Case Builder automatically exports the PL/SQL package specifications but not the package body. However, you can specify that the TCB export the package body as well. The export process creates several files in the directory you specify. Of these files, the file in the format oratcb1_008602000001main.xml contains the metadata for the test case.
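If exporting the complete data set is impractical, you can export a sample instead via the samplingPercent parameter of EXPORT_SQL_TESTCASE. Here is a sketch, based on the export example from the “Solution” section, that exports roughly 10 percent of the data:

SQL> declare mycase clob;
  2  begin
  3  dbms_sqldiag.export_sql_testcase
  4  (directory       => 'TEST_DIR1',
  5   sql_text        => 'select * from my_mv where max_amount_sold >100000 order by 1',
  6   user_name       => 'SH',
  7   exportData      => TRUE,
  8   samplingPercent => 10,
  9   testcase        => mycase);
 10  end;
 11  /

A sampled export is often sufficient for the optimizer to reproduce a plan-related failure, while keeping the test case small enough to transfer easily.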

7-15. Generating an AWR Report

Problem

You’d like to generate an AWR report to analyze performance problems in your database.

Solution

The database automatically takes an AWR snapshot every hour and saves the statistics in the AWR for 8 days (note that AWR requires additional licensing). An AWR report contains data captured between two snapshots, which need not be consecutive. Thus, an AWR report lets you examine instance performance between two points in time. You can generate an AWR report through Oracle Enterprise Manager. However, we show you how to create an AWR report using Oracle-provided scripts.

To generate an AWR report for a single instance database, execute the awrrpt.sql script as shown here.

SQL> @?/rdbms/admin/awrrpt.sql
 
Current Instance
~~~~~~~~~~~~~~~~
   DB Id    DB Name      Inst Num Instance
----------- -----------  -------- --------
 1118243965 ORCL1               1 orcl1
Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text
 
Type Specified:

Select a text- or an HTML-based report. The HTML report is the default report type, and it provides a nice-looking, well-formatted, easy-to-read report; press Enter to select it. In this example, we entered text to generate a plain text report.

Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
   DB Id     Inst Num DB Name      Instance     Host
------------ -------- -------      --------     --------
* 1118243965        1 ORCL1        orcl1        MIROPC61
 
Using 1118243965 for database Id
Using          1 for instance number

At this point, you can specify the DBID for the database. In our example, however, there’s only one database and therefore one DBID, so there’s no need to enter it.

Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.
Enter value for num_days: 1

Enter the number of days for which you want the database to list the snapshot IDs. In this example, we chose 1 because we want to generate an AWR for a time period that falls in the last day.

Listing the last day's Completed Snapshots
                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
--------     -------       -------- -----------------  -----
orcl1        ORCL1             1877 09 Sep 2013 00:00      1
                               1878 09 Sep 2013 07:47      1

Specify a beginning and an ending snapshot for the AWR report.

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 1877
Begin Snapshot Id specified: 1877
 
Enter value for end_snap: 1878
End Snapshot Id specified: 1878

You can either accept the default name for the AWR report by pressing Enter, or enter a name for the report.

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_1877_1878.txt.  To use this name,
press <return> to continue, otherwise enter an alternative.
 
Enter value for report_name:
Using the report name awrrpt_1_1877_1878.txt

The database generates an AWR report in the same directory from which you invoked the awrrpt.sql script. For example, if you choose an HTML-based report, the AWR report will be in the following format: awrrpt_1_1881_1882.html.

image Tip  You can generate an AWR report to analyze the performance of a single SQL statement by executing the awrsqrpt.sql script.

How It Works

The AWR reports that you generate show performance statistics captured between two points in time, each of which is called a snapshot. You can gain significant insights into your database performance by reading the AWR reports carefully. An AWR report takes less than a minute to run in most cases, and holds a treasure trove of performance information. The report consists of multiple sections. Walking through an AWR report usually shows you the reason that your database isn’t performing at peak levels.

To generate an AWR report for all instances in an Oracle RAC environment, use the awrgrpt.sql script instead. You can generate an AWR report for a specific instance in a RAC environment by using the awrrpti.sql script. You can also generate an AWR report for a single SQL statement by invoking the awrsqrpt.sql script and providing the SQL_ID of the SQL statement.

You can generate an AWR report to span any length of time, as long as the AWR has snapshots covering that period. By default, the AWR retains its snapshots for a period of eight days.
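You can change both the snapshot interval and the retention period with the MODIFY_SNAPSHOT_SETTINGS procedure of the DBMS_WORKLOAD_REPOSITORY package; both values are specified in minutes. For example, to keep 15 days of snapshots taken every 30 minutes (the values here are illustrative):

SQL> exec dbms_workload_repository.modify_snapshot_settings( -
>       retention => 15*24*60, interval => 30);

A longer retention period lets you compare current performance against older baselines, at the cost of more space in the SYSAUX tablespace.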

You can generate an AWR snapshot any time you want, either through the Oracle Enterprise Manager or by using the DBMS_WORKLOAD_REPOSITORY package. That ability is useful when, for example, you want to investigate a performance issue from 20 minutes ago and the next snapshot is 40 minutes away. In that case, you must either wait 40 minutes or create a manual snapshot. Here’s an example that shows how to create an AWR snapshot manually:

SQL> exec dbms_workload_repository.create_snapshot();
 
PL/SQL procedure successfully completed.
SQL>

Once you create the snapshot, you can run the awrrpt.sql script. Then select the previous two snapshots to generate an up-to-date AWR report.
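If you’d rather skip the interactive script altogether, the DBMS_WORKLOAD_REPOSITORY package also exposes the AWR report as a table function. The following sketch uses the DBID, instance number, and snapshot IDs from the example dialog above:

SQL> select output from table(
  2    dbms_workload_repository.awr_report_text(
  3      1118243965, 1, 1877, 1878));

Spooling this query to a file produces the same text report as awrrpt.sql, which makes it easy to automate report generation.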

You can generate an AWR report when your user response time increases suddenly, say from 1 to 10 seconds during peak hours. An AWR report can also be helpful if a key batch job is suddenly taking much longer to complete. Of course, you must check the system CPU, I/O, and memory usage during the period as well, with the help of operating system tools such as sar, vmstat, and iostat.

If the system CPU usage is high, that doesn’t necessarily mean that it’s the CPU that’s the culprit. CPU usage percentages are not always a true measure of throughput, nor is CPU usage always useful as a database workload metric. Make sure to generate the AWR report for the precise period that encompasses the time during which you noticed the performance deterioration. An AWR report that spans a 24-hour period is of little use in diagnosing a performance dip that occurred 2 hours ago for only 30 minutes. Match your report to the time period of the performance problem.

7-16. Comparing Database Performance Between Two Periods

Problem

You want to examine and compare how the database performed during two different periods.

Solution

Use the awrddrpt.sql script (located in the $ORACLE_HOME/rdbms/admin directory) to generate an AWR Compare Periods Report that compares performance between two periods. Here are the steps.

1.  Invoke the awrddrpt.sql script.

SQL> @$ORACLE_HOME/rdbms/admin/awrddrpt.sql

2.  Select the report type (default is text).

Enter value for report_type: html
Type Specified:  html

3.  Specify the number of days of snapshots from which you want to select the beginning and ending snapshots for the first time period.

Specify the number of days of snapshots to choose from
Enter value for num_days: 4

4.  Choose a pair of snapshots over which you want to analyze the first period’s performance.

Specify the First Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 2092
First Begin Snapshot Id specified: 2092
 
Enter value for end_snap: 2093
First End   Snapshot Id specified: 2093

5.  Select the number of days of snapshots from which you want to select the pair of snapshots for the second period. Enter the value 4 so you can select a pair of snapshots from the previous 4 days.

Specify the number of days of snapshots to choose from
Enter value for num_days: 4

6.  Specify the beginning and ending snapshots for the second period.

Specify the Second Pair of Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap2: 2134
Second Begin Snapshot Id specified: 2134
 
Enter value for end_snap2: 2135
Second End   Snapshot Id specified: 2135

7.  Specify a report name or accept the default name.

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrdiff_1_2092_1_2134.html  To use this name,
press <return> to continue, otherwise enter an alternative.
 
Enter value for report_name:
Using the report name awrdiff_1_2092_1_2134.html
Report written to awrdiff_1_2092_1_2134.html
SQL>

How It Works

Generating an AWR Compare Periods report is a process very similar to the one for generating a normal AWR report. The big difference is that the report does not show what happened between two snapshots, as the normal AWR report does. The AWR Compare Periods report compares performance between two different time periods, with each time period involving a different pair of snapshots. If you want to compare the performance of your database between 9 A.M. and 10 A.M. today and the same time period three days ago, you can do it with the AWR Compare Periods report. You can run an AWR Compare Periods report on one or all instances of a RAC database.
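Like the single-period report, the compare-periods report is available as a table function in DBMS_WORKLOAD_REPOSITORY. The sketch below uses the snapshot IDs from the example dialog and assumes both periods come from the same database and instance (the DBID shown is the example value from Recipe 7-15):

SQL> select output from table(
  2    dbms_workload_repository.awr_diff_report_text(
  3      1118243965, 1, 2092, 2093,
  4      1118243965, 1, 2134, 2135));

Because the function takes a separate DBID and instance number for each period, you can also compare two different instances of a RAC database this way.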

The AWR Compare Periods report is organized similarly to the normal AWR report, but it shows each performance statistic for the two periods side by side, so you can quickly see the differences (or similarities) in performance. Here’s a section of the report showing how you can easily review the differences in performance statistics between the first and the second periods.

                                    First         Second         Diff
                                    -----         ------       ------
% Blocks changed per Read:          61.48          15.44       -46.04
Recursive Call %:                   98.03          97.44        -0.59
Rollback per transaction %:          0.00           0.00         0.00
Rows per Sort:                       2.51           2.07        -0.44
Avg DB time per Call (sec):          1.01           0.03        -0.98

7-17. Analyzing an AWR Report

Problem

You’ve generated an AWR report that covers a period when the database was exhibiting performance problems. You want to analyze the report.

Solution

An AWR report summarizes its performance-related statistics under various sections. The following is a quick summary of the most important sections in an AWR report.

Session Information

You can find out the number of sessions from the section at the very top of the AWR report, as shown here:

                 Snap Id    Snap Time             Sessions   Curs/Sess
                 ---------  ------------------    --------   ---------
Begin Snap:      1878       12-Sep-13 07:47:33          38         1.7
  End Snap:      1879       12-Sep-13 09:00:48          34         3.7
   Elapsed:                 73.25 (mins)
   DB Time:                 33.87 (mins)

Be sure to check the Begin Snap and End Snap times, to confirm that the period encompasses the time when the performance problem occurred. If you notice a very high number of sessions, investigate whether extra shadow processes are being created. For example, if the number of sessions rises by 200 between the Begin Snap and End Snap times when you expect it to remain roughly the same at both times, the most likely cause is an application start-up issue that is spawning all those sessions.
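One quick way to see where an unexpected jump in sessions is coming from is to group the current sessions by the program and machine that opened them; a query along these lines against V$SESSION usually identifies the offending application immediately:

```sql
-- Count current user sessions by program and machine, to spot an
-- application that is spawning large numbers of connections
SELECT program, machine, COUNT(*) AS sessions
FROM   v$session
WHERE  type = 'USER'
GROUP  BY program, machine
ORDER  BY sessions DESC;
```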

Load Profile

The load profile section shows the per-second and per-transaction statistics for various indicators of database load such as hard parses and the number of transactions.

Load Profile              Per Second    Per Transaction
~~~~~~~~~~~~~~~~~         ----------    ---------------
      DB Time(s):                0.5                1.4
       DB CPU(s):                0.1                0.3
       Redo size:            6,165.3           19,028.7
   Logical reads:              876.6            2,705.6
   Block changes:               99.2              306.0
  Physical reads:               10.3               31.8
 Physical writes:                1.9                5.9
      User calls:                3.4               10.4
          Parses:               10.2               31.5
     Hard parses:                1.0                3.0
          Logons:                0.1                0.2
    Transactions:                0.3

The Load Profile section is one of the most important parts of an AWR report. Of particular significance are the physical I/O rates and hard parses. In an efficiently performing database, you should see mostly soft parses and very few hard parses. A high hard parse rate is usually the result of not using bind variables. If you see a high per-second value for logons, it usually means that your applications aren't using persistent connections. An unusually high number of logons or transactions tells you something out of the ordinary is happening in your database. However, the only way you'll know the numbers are unusual is if you regularly check AWR reports and know what the various statistics look like in a normally functioning database, at various times of the day!
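You can also check the same parse and logon activity in real time, without generating a report, by querying the cumulative counters in V$SYSSTAT (values accumulate from instance startup, so compare two readings taken a known interval apart to get rates):

```sql
-- Cumulative parse, execute, and logon counters since instance startup;
-- a high ratio of hard parses to total parses points to literal SQL
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)',
                'parse count (hard)',
                'execute count',
                'logons cumulative');
```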

Instance Efficiency Percentages

The instance efficiency section shows several hit ratios as well as the “execute to parse” and “latch hit” percentages.

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   99.10    In-memory Sort %:  100.00
            Library Hit   %:   95.13        Soft Parse %:   90.35
         Execute to Parse %:   70.71         Latch Hit %:   99.97
Parse CPU to Parse Elapsd %:   36.71     % Non-Parse CPU:   83.60
 
 Shared Pool Statistics        Begin    End
                               -----   -----
             Memory Usage %:   81.08   88.82
    % SQL with executions>1:   70.41   86.92
  % Memory for SQL w/exec>1:   69.60   91.98

The execute to parse ratio should be very high in a well-running instance. A low value for the % SQL with executions>1 statistic means that the database is not reusing shared SQL statements, usually because the SQL is not using bind variables.
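One way to confirm that literals rather than bind variables are behind a low soft parse or execute to parse figure is to look for statements in the shared pool that would be identical if their literals were replaced; V$SQL exposes this through the FORCE_MATCHING_SIGNATURE column. The threshold of 10 below is an arbitrary illustration:

```sql
-- Statements that differ only in literal values share the same
-- FORCE_MATCHING_SIGNATURE; many versions flag bind-variable candidates
SELECT force_matching_signature,
       COUNT(*)      AS versions,
       MIN(sql_text) AS sample_sql
FROM   v$sql
WHERE  force_matching_signature > 0
GROUP  BY force_matching_signature
HAVING COUNT(*) > 10
ORDER  BY versions DESC;
```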

Top 5 Foreground Events

The Top 5 Timed Foreground Events section shows the events that were responsible for the most waits during the time spanned by the AWR report.

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg Wait % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
----------------------------         ------     -------   ----   ---- ----------
db file sequential read              13,735         475     35   23.4 User I/O
DB CPU                                              429          21.1
latch: shared pool                      801          96    120    4.7 Concurren
db file scattered read                  998          49     49    2.4 User I/O
control file sequential read          9,785          31      3    1.5 System I/O

The Top 5 Timed Foreground Events section is where you can usually spot the problem, because it shows why the sessions are "waiting." The Top 5 Events information shows the total waits for all sessions, but usually one or two sessions are responsible for most of the waits. Make sure to analyze the total waits and average waits (ms) separately, in order to determine whether the waits are significant: merely looking at the total number of waits or the total wait time for a wait event can give you a misleading idea of its importance, so pay close attention to the average wait times as well. In a well-performing database, you should see CPU and I/O as the top wait events, as is the case here. If any wait events from the Concurrency wait class, such as latches, show up at the top, investigate those waits further. For example, events such as enq: TX - row lock contention, gc buffer busy (RAC), or latch free usually indicate contention in the database. If you see an average wait of more than 2 ms for the log file sync event, investigate that wait event further (Chapter 5 shows how to analyze various wait events). A high amount of waits on the db file sequential read or db file scattered read wait events means there are heavy indexed reads (which is normal) or full table scans going on, respectively. You can find the SQL statements and the tables involved in these read events elsewhere in the AWR report.
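Because the Top 5 list aggregates waits across all sessions, it is often worth checking whether just one or two sessions account for most of a given wait. A query against V$ACTIVE_SESSION_HISTORY (which, like AWR, requires the Diagnostics Pack license) can show this; substitute the event name and time window you are investigating:

```sql
-- Which sessions accumulated the most samples on a given wait event
-- over the last hour (each ASH sample represents roughly one second)
SELECT session_id, session_serial#, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  event = 'db file sequential read'
AND    sample_time > SYSTIMESTAMP - INTERVAL '1' HOUR
GROUP  BY session_id, session_serial#
ORDER  BY samples DESC;
```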

Time Model Statistics

Time model statistics give you an idea of how the database has spent its time, including the time it spent executing SQL statements as opposed to parsing them. If parsing time is very high, or if hard parsing is significant, you must investigate further.

Time Model Statistics                  DB/Inst: ORCL1/orcl1  Snaps: 1878-1879
 
Statistic Name                                       Time (s) % of DB Time
------------------------                             -------- ------------
sql execute elapsed time                              1,791.5         88.2
parse time elapsed                                      700.1         34.5
hard parse elapsed time                                 653.7         32.2
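
The same statistics are available in real time from the V$SYS_TIME_MODEL view, so you can check the current parse overhead without generating a report (the VALUE column is cumulative microseconds since instance startup):

```sql
-- Current time model statistics, converted from microseconds to seconds
SELECT stat_name, value/1e6 AS seconds
FROM   v$sys_time_model
WHERE  stat_name IN ('DB time',
                     'sql execute elapsed time',
                     'parse time elapsed',
                     'hard parse elapsed time');
```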

Top SQL Statements

This section of the AWR report lets you quickly identify the most expensive SQL statements.

SQL ordered by Elapsed Time            DB/Inst: ORCL1/orcl1  Snaps: 1878-1879
-> Captured SQL account for      13.7% of Total DB Time (s):           2,032
-> Captured PL/SQL account for   19.8% of Total DB Time (s):           2,032
 
Elapsed             Elapsed Time
Time (s) Executions  per Exec (s) %Total   %CPU    %IO    SQL Id
-------- ----------  ------------ ------   ----   ---- -------------
292.4      1             292.41     14.4    8.1   61.2 b6usrg82hwsas
...

You can generate an explain plan for the expensive SQL statements using the SQL ID from this part of the report.
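Because an expensive statement may have already aged out of the shared pool, you can retrieve its plan from the AWR itself with the DBMS_XPLAN.DISPLAY_AWR function; the SQL ID here is the one from the sample report output above:

```sql
-- Retrieve the execution plan captured in the AWR for a given SQL ID
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('b6usrg82hwsas'));
```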

PGA Histogram

The PGA Aggregate Target Histogram shows how well the database is executing the sort and hash operations—for example:

 PGA Aggr Target Histogram               DB/Inst: ORCL1/orcl1  Snaps: 1878-1879
-> Optimal Executions are purely in-memory operations
 
  Low     High
Optimal Optimal    Total Execs  Optimal Execs 1-Pass Execs M-Pass Execs
------- -------    -----------  ------------- ------------ ------------
     2K      4K         13,957         13,957            0            0
    64K    128K             86             86            0            0
   128K    256K             30             30            0            0

In this example, the database is performing all sort and hash operations optimally in the PGA. If you see a high number of one-pass executions, or even a few large multi-pass executions, that's an indication that the PGA is too small, and you should consider increasing it.
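Before increasing the PGA, you can check the database's own estimate of how different PGA_AGGREGATE_TARGET values would behave by querying the V$PGA_TARGET_ADVICE view:

```sql
-- Estimated cache hit percentage and overallocation count at various
-- candidate values of PGA_AGGREGATE_TARGET
SELECT ROUND(pga_target_for_estimate/1024/1024) AS target_mb,
       estd_pga_cache_hit_percentage            AS estd_hit_pct,
       estd_overalloc_count
FROM   v$pga_target_advice
ORDER  BY pga_target_for_estimate;
```

Look for the smallest target at which ESTD_OVERALLOC_COUNT drops to zero and the estimated cache hit percentage levels off.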

How It Works

Analyzing an AWR report should be your first step when troubleshooting database performance issues such as a slow-running query. An AWR report lets you quickly find out things such as the number of connections, transactions per second, cache-hit rates, wait event information, and the SQL statements that are using the most CPU and I/O. It shows you which of your SQL statements are using the most resources, and which wait events are slowing down the database. Perhaps most importantly, the report tells you whether the database's performance is unusually different from its typical performance at a given time of the day (or night). The AWR report sections summarized in the "Solution" section are only a small part of the AWR report. Here are some other key sections of the report that you must review when troubleshooting performance issues:

  • Foreground Wait Events
  • SQL Ordered by Gets
  • SQL Ordered by Reads
  • SQL Ordered by Physical Reads
  • Instance Activity Stats
  • Log Switches
  • Enqueue Activity
  • Reads by Tablespace, Datafile, and SQL Statement
  • Segments by Table Scans
  • Segments by Row Lock Waits
  • Undo Segment Summary

Depending on the nature of the performance issue you’re investigating, several of these sections in the report may turn out to be useful. In addition to the performance and wait statistics, the AWR report also offers advisories for both the PGA and the SGA. The AWR report is truly your best friend when you are troubleshooting just about any database performance issue. In a matter of minutes, you can usually find the underlying cause of the issue and figure out a potential fix. AWR does most of the work for you—all you need to do is know what to look for!
