CHAPTER 5

image

Minimizing System Contention

It’s not uncommon for Oracle DBAs to field calls about a user being locked or “blocked” in the database. Oracle’s locking behavior is extremely sophisticated and supports simultaneous use of the database by multiple users. However, on occasion, it’s possible for a user to block another user’s work, mostly because of flaws in application design. This chapter explains how Oracle handles locks and how to identify a session that’s blocking others.

An Oracle database can experience two main types of contention for resources. The first is contention for transaction locks on a table’s rows. The second type of contention is that caused by simultaneous requests for areas of the shared memory (SGA), resulting in latch contention. In addition to showing you how to troubleshoot typical locking issues, we will show how to handle various types of latch contention in your database.

Oracle Wait Interface is Oracle’s internal mechanism for classifying and measuring the different types of waits for resources in an Oracle instance. Understanding Oracle wait events is the key to instance tuning because high waits slow down response time. We will explain the Oracle Wait Interface in this chapter and show you how to reduce the most common Oracle wait events that beguile Oracle DBAs. We will show you how to use various SQL scripts to unravel the mysteries of the Oracle Wait Interface, and we will also show how to use Oracle Enterprise Manager to quickly track down the SQL statements and sessions that are responsible for contention in the database.

5-1. Understanding Response Time

Problem

You want to understand what database response time is and its relationship with wait time.

Solution

The most crucial performance indicator in a database is response time. Response time is the time it takes to get a response from the database for a query that a client sends to the database. Response time is simply the sum of two components:

response time = processing time + wait time

The foregoing relationship is also frequently represented as R=S + W, where R is the response time, S is the service time, and W stands for the wait time. The processing time component is the actual time spent by the database processing the request. Wait time, on the other hand, is time actually wasted by the database—it’s the time the database spends waiting for resources such as a lock on a table’s rows, library cache latch, or any of the numerous resources that a query needs to complete its processing. Oracle has hundreds of official wait events, a dozen or so of which are crucial to troubleshooting slow-running queries.

Do You Have a Wait Problem?

It’s easy to find out the percentage of time the instance has spent waiting for resources instead of actually executing. Issue the following query to find out the relative percentages of wait times and actual CPU processing in the database:

SQL> select metric_name, value
   2 from v$sysmetric
   3 where metric_name in ('Database CPU Time Ratio',
   4 'Database Wait Time Ratio') and
   5 intsize_csec =
   6 (select max(INTSIZE_CSEC) from V$SYSMETRIC);
 
METRIC_NAME                             VALUE
————————————------------              -----------
Database Wait Time Ratio                11.371689
Database CPU Time Ratio                 87.831890
SQL>

In general, if the results of this query show a very high value for the Database Wait Time Ratio or if the Database Wait Time Ratio is much greater than the Database CPU Time Ratio, the database is spending more time waiting than processing and you must dig deeper into the Oracle wait events to identify the specific wait events causing this.

Find Detailed Information

You can use the following Oracle views to find out detailed information of what a wait event indicates that a database instance is actually waiting for and how long it has waited for each resource.

  • V$SESSION: This view shows the specific resource currently being waited for, as well as the event last waited for in each session.
  • V$SESSION_WAIT: This view lists either the event currently being waited for or the event last waited on for each session. It also shows the wait state and the wait time.
  • V$SESSION_WAIT_HISTORY: This view shows the last ten wait events for each current session.
  • V$SESSION_EVENT: This view shows the cumulative history of events waited on for each session. The data in this view is available only so long as a session is active.
  • V$SYSTEM_EVENT: This view shows each wait event and the time the entire instance has waited on that event since you started the instance.
  • V$SYSTEM_WAIT_CLASS: This view shows wait event statistics by wait classes.

How It Works

Your goal in tuning performance is to minimize the total response time. If the Database Wait Time Ratio (in the query shown in the “Solution” section) is high, your response time will also be high because of waits or bottlenecks in your system. On the other hand, high values for the Database CPU Time Ratio indicate a well-running database, with few waits or bottlenecks. The Database CPU Time Ratio is calculated by dividing the total CPU used by the database by the Oracle time model statistic DB Time. You must be cautious, however, and make sure that a high Database CPU Time Ratio isn’t because of a session simply burning CPU cycles through wasteful work such as spinning on a latch, because of excessive parsing (even soft parsing), or because of performing needless, consistent gets because of inefficient execution plans.

Oracle uses time model statisticsto measure the time spent in the database by the type of operation. Database time, or DB Time, is the most important time model statistic; it represents the total time spent in database calls and serves as a measure of total instance workload. Database time is total time spent by user processes either actively working or actively waiting in a database call. The DB Time we’re referring to here (and the one referred to by AWR and ADDM reports) is the sum of DB Time over all sessions. DB Time is computed by adding the CPU time and wait time of all sessions (excluding the waits for idle events). An AWR report shows the total DB Time for the instance (in the section “Time Model System Stats”) during the period covered by the AWR snapshots. If the time model statistic DB CPU consumes most of the database time for the instance, it shows the database was actively processing most of the time. Database time tuning, or understanding how the database is spending its time, is fundamental to understanding performance.

The total time spent by foreground sessions making database calls consists of I/O time, CPU time, and time spent waiting because of nonidle events. Your database time will increase as the system load increases; that is, as more users log on and larger queries are executed, the greater the system load. However, even in the absence of an increase in system load, database time can increase because of deterioration either in I/O or in application performance. As application performance degrades, wait time will increase, and consequently database time will increase.

Database time is captured by internal instrumentation, ASH, AWR, and ADDM, and you can find detailed performance information by querying various views or through Enterprise Manager.

image Note  If the host system is CPU-bound, you’ll see an increase in database time. You must first tune CPU usage before focusing on wait events in that particular case.

The V$SESSION_WAIT view shows more detailed information than the V$SESSION_EVENT and V$SYSTEM_EVENT views. While both the V$SESSION_EVENT and V$SESSION_WAIT views show that there are waits such as the event db file scattered read, for example, only the V$SESSION_WAIT view shows information such as the amount of time waited for the current wait (SECONDS_IN_WAIT) and the wait state. The STATE column in V$SESSION_WAIT captures the wait state, which could be one of the following:

  • WAITING: Session is currently waiting
  • WAITED UNKNOWN TIME: Duration of the last wait is unknown (value when you set the parameter TIMED_STATISTICS to false)
  • WAITED SHORT TIME: Last wait was less than 100th of a second
  • WAITED KNOWN TIME: Duration of the last wait is specified in the WAIT_TIME column

image Note  The Automatic Workload Repository (AWR) queries the V$SYSTEM_EVENT view for its wait event–related analysis.

You can first query the V$SYSTEM_EVENT view to rank the top wait events by total and average time waited for that event. You can then drill down to the wait event level by focusing on the events at the top of the event list. In addition to providing information about blocking and blocked users and the current wait events, the V$SESSION view also shows the objects that are causing the problem by providing the file number and block number for the object.

5-2. Identifying SQL Statements with the Most Waits

Problem

You want to identify the SQL statements responsible for the most waits in your database.

Solution

Execute the following query to identify the SQL statements that are experiencing the most waits in your database:

SQL> select ash.user_id,
  2  u.username,
  3  s.sql_text,
  4  sum(ash.wait_time +
  5  ash.time_waited) ttl_wait_time
  6  from v$active_session_history ash,
  7  v$sqlarea s,
  8  dba_users u
  9  where ash.sample_time between sysdate - 60/2880 and sysdate
 10  and ash.sql_id = s.sql_id
 11  and ash.user_id = u.user_id
 12  group by ash.user_id,s.sql_text, u.username
 13* order by ttl_wait_time
SQL>

The preceding query ranks queries that ran during the past 30 minutes, according to the total time waited by each query.

How It Works

When you’re experiencing a performance problem, it’s a good idea to see which SQL statements are waiting the most. These are the statements that are using most of the database’s resources. To find the queries that are waiting the most, you must sum the values in the wait_time and time_waited columns of V$ACTIVE_SESSION_HISTORY for a specific SQL statement. To do this, you must join the V$SQLAREA view with the V$ACTIVE_SESSION_HISTORY view, using SQL_ID as the join column.

image Note  To query the V$ACTIVE_SESSION_HISTORY view, you must first license the Oracle Database Diagnostic Pack and the Tuning Pack. If you can’t license the packs, all is not lost; you can still examine wait times through views such as V$SQLAREA.

Besides the SQL_ID of the SQL statements, the V$ACTIVE_SESSION_HISTORY view contains information about the execution plans used by the SQL statements. You can use this information to identify why a SQL statement is experiencing a high number of waits. You can also run an active session history (ASH) report, using a SQL script or through Oracle Enterprise Manager, to get details about the top SQL statements in the sampled session activity. The Top SQL section of an ASH report helps you identify the high-load SQL statements that are responsible for performance problems. Examining the Top SQL report may show you, for example, that one bad query is responsible for most of the database activity.

5-3. Analyzing Wait Events

Problem

You want to analyze Oracle wait events while troubleshooting database performance.

Solution

Several recipes in this chapter show you how to analyze the most important Oracle wait events. It’s not uncommon to find that a significant portion of the total wait time in most cases is because of I/O–related waits, such as those caused by either full table scans or indexed reads. While indexed reads may seem to be completely normal on the face of it, too many indexed reads can also slow down performance. Therefore, you must investigate why the database is performing a large number of indexed reads. For example, if you see the db file sequential read event (indicates indexed reads) at the top of the wait event list, you must look a bit further to see how the database is accumulating these read events. If you find that the database is performing hundreds of thousands of query executions, with each query doing only a few indexed reads, that’s fine. However, if you find that just a couple of queries in an OLTP environment are contributing to a high number of logical reads, then, most likely, those queries are reading more data than necessary. You must tune those queries to reduce the db file sequential read events.

How It Works

Wait events are statistics that a server process or thread increments when it waits for an operation to complete in order to continue its processing. For example, a SQL statement may be modifying data, but the server process may have to wait for a data block to be read from disk because it’s not available in the SGA. Although there’s a large number of wait events, some the most common wait events are the following:

  • Buffer busy waits: These occur when multiple processes attempt to concurrently access the same buffers in the buffer cache.
  • Free buffer waits: These waits occur when a server process posts the database writer process to write out dirty buffers in order to make free buffers available.
  • Db file scattered read: These are waits incurred by an user process for a physical I/O to return when reading buffers into the SGA buffer cache. The scattered reads are multiblock reads and can occur because of a full table scan or a fast full scan of an index.
  • Db file sequential read: These are waits incurred by an user process for a physical I/O to return when reading buffers into the SGA buffer cache. The reads are single block reads and are usually because of indexed reads.
  • Enqueue waits: These are waits on Oracle locking mechanisms that control access to internal database resources and that occur when a session waits on a lock held by another session. You actually won’t see a wait event named enqueue (or enq) because the enqueue wait event types are always held for a specific purpose, such as enq: TX – contention, enq:TX – index contention, and enq:TX – row lock contention.
  • Log buffer space: These are waits caused by server processes waiting for free space in the log buffer.
  • Log file sync: These are waits by server processes that are performing a commit (or rollback) for the LGWR to complete its write to the redo log file.

Analyzing Oracle wait events is the starting point when troubleshooting a slow-running query. When a query is running slow, it usually means that there are excessive waits of one type or another. Some of the waits may be because of excessive I/O due to missing indexes. Other waits may be caused by a latch or a locking event. Several recipes in this chapter show you how to identify and fix various types of Oracle wait-related performance problems. In general, wait events that account for the most wait time warrant further investigation.

image Tip  It’s important to understand that wait events show only the symptoms of underlying problems. Thus, you should view a wait event as a window into a particular problem, and not the problem itself.

When Oracle encounters a problem such as buffer contention or latch contention, it simply increments a specific type of wait event relating to that latch or buffer. By doing this, the database is showing where it had to wait for a specific resource and was thus unable to continue processing. The buffer or latch contention can often be traced to faulty application logic, but some wait events could also emanate from system issues such as a misconfigured RAID system. Missing indexes, inappropriate initialization parameters, inadequate values for initialization parameters that relate to memory, and inadequate sizing of redo log files are just some of the things that can lead to excessive waits in a database. The great benefit of analyzing Oracle wait events is that it takes the guesswork out of performance tuning—you can see exactly what is causing a performance slowdown so you can immediately focus on fixing the problem. The bottom line is that you won’t be spending inordinate amounts of time on marginal improvements because you can clearly see the contribution of each type of wait.

In most cases, you probably will notice that events such as “rdbms ipc message,” “pmon timer,” and “SQL*Net message from client” might be the most common wait events in your database. However, all of these are considered “idle waits,” and you can safely ignore them. For example, if the client program is busy computing something after having retrieved data, the DBMS is “idle” until the program sends another statement. Similarly, when performing backups with RMAN, you may notice a high number of “RMAN backup and recovery I/O” wait events. As you can tell, these are wait events associated with the RMAN backups and don’t usually signify anything important.

5-4. Understanding Wait Class Events

Problem

You want to understand how Oracle classifies wait events into various classes.

Solution

Every Oracle wait event belongs to a specific wait event class. Oracle groups wait events into classes, such as Administrative, Application, Cluster, Commit, Concurrency, Configuration, Scheduler, System I/O, and User I/O, to facilitate the analysis of wait events. Here are the characteristics of typical waits in some of these wait classes:

  • Application: Waits resulting from application code issues such as lock waits because of row-level locking
  • Commit: Waits for confirmation of a redo log write after committing a transaction
  • Network: Waits caused by delays in sending data over the network
  • User I/O: Waits for reading blocks from disk

Two key wait classes are the Application and User I/O wait classes. The Application wait class contains waits because of row and table locks caused by an application. The User I/O class includes the db file scattered read, db file sequential read, direct path read, and direct path write events. The System I/O class includes redo log–related wait events among other waits. The Commit class contains just the log file sync wait information. There’s also an “idle” class of wait events such as SQL*Net message from client, for example, that merely indicate an inactive session. You can ignore the idle waits.

How It Works

Classes of wait events help you quickly find out what type of activity is affecting database performance. For example, the Administrative wait class may show a high number of waits because you’re rebuilding an index. Concurrency waits point to waits for internal database resources such as latches. If the Cluster wait class shows the most wait events, then your RAC instances may be experiencing contention for global cache resources (gc cr block busy event). Note that the System I/O wait class includes waits for background process I/O such as the database writer (DBWR) wait event db file parallel write.

The Application wait class contains waits that result from user application code—most of your enqueue waits fall in this wait class. The only wait event in the Commit class is the log file sync event, which we examine in detail later in this chapter. The Configuration class waits include waits such as those caused by log files that are sized too small.

5-5. Examining Session Waits

Problem

You want to find out the wait events in a session.

Solution

You can use the V$SESSION_WAIT view to get a quick idea about what a particular session is waiting for, as shown here:

SQL> select event, count(*) from v$session_wait
     group by event;
 
EVENT                                                        COUNT(*)
---------------------------------------------                --------
SQL*Net message from client                                        11
Streams AQ: waiting for messages in the queue                       1
enq: TX - row lock contention                                       1
...
15 rows selected.
 
SQL>

The output of the query indicates that one session is waiting for an enqueue lock, possibly because of a blocking lock held by another session. If you see a large number of sessions experiencing row lock contention, you must investigate further and identify the blocking session.

Here’s one more way you can query the V$SESSION_WAIT view to find out what’s slowing down a particular session:

SQL> select event, state, seconds_in_wait siw
     from   v$session_wait
     where  sid = 81;
 
 
EVENT                                         STATE            SIW
-----------------------------                 -----------      ------
enq: TX - row lock contention                    WAITING          976

The preceding query shows that the session with SID 81 has been waiting for an enqueue event because the row (or rows) it wants to update is locked by another transaction.

image Note  Since Oracle Database 11g, the database counts each resource wait as just one wait, even if the session experiences many internal time-outs caused by the wait. For example, a wait for an enqueue for 15 seconds may include 5 different 3-second wait calls—the database considers these as just a single enqueue wait.

How It Works

The first query shown in the “Solution” section offers an easy way to find out which wait events, if any, are slowing down user sessions. When you issue the query without specifying a SID, it displays the current and last waits for all sessions in the database. If you encounter a locking situation in the database, for example, you can issue the query periodically to see whether the total number of enqueue waits is coming down. If the number of enqueue waits across the instance is growing, that means more sessions are encountering slowdowns because of blocked locks.

The V$SESSION_WAIT view shows the current or last wait for each session. The STATE column in this view tells you whether a session is currently waiting. Here are the possible values for the STATE column:

  • WAITING: The session is currently waiting for a resource.
  • WAITED UNKNOWN TIME: The duration of the last wait is unknown. (This value is shown only if you set the TIMED_STATISTICS parameter to false, so in effect this depends on the value set for the STATISTICS_LEVEL parameter. If you set STATISTICS_LEVEL to TYPICAL or ALL, the TIMED_STATISTICS parameter will be TRUE by default. If the STATISTICS_LEVEL parameter is set to BASIC, TIMED_STATISTICS will be FALSE by default.)
  • WAITED SHORT TIME: The most recent wait was less than a 100th of a second long.
  • WAITED KNOWN TIME: The WAIT_TIME column shows the duration of the last wait.

Note that the query utilizes the seconds_in_wait column to find out how long this session has been waiting. Oracle has deprecated this column in favor of the wait_time_micro and TIME_SINCE_LAST_WAIT_MICRO columns. The WAIT_TIME_MICRO column shows the amount of time waited in microseconds. If the session is currently waiting, the value is the time spent in the current wait. If the session is currently not in a wait, then the value is the amount of time waited during the last wait. The TIME_SINCE_LAST_WAIT_MIRCO column shows time elapsed since the end of the last wait (in microseconds). All three columns show the amount of time waited for the current wait if the session is currently waiting. If the session is not currently waiting, the wait_time_micro column shows the amount of time waited during the last wait. We chose to use the deprecated column SECONDS_IN_WAIT simply because we wanted to show the wait in seconds.

image Note  We want to mention here that since Oracle Database 10g Release 1, several of the wait columns from the V$SESSION_WAIT view are made available in the V$SESSION view as well.

5-6. Examining Wait Events by Class

Problem

Your database is exhibiting poor performance and you want to quickly find out which Oracle wait event class could be responsible for the performance deterioration.

Solution

The following query shows the different types of wait classes and the wait events associated with each wait class:

SQL> select  wait_class, name
  2  from v$event_name
  3  where name LIKE 'enq%'
  4  and wait_class <> 'Other'
  5* order by wait_class
SQL> /
 
WAIT_CLASS                                 NAME
--------------                     --------------------------
Administrative                     enq: TW - contention
Concurrency                        enq: TX - index contention
...
SQL>

To view the current waits grouped into various wait classes, issue the following query:

SQL>
SQL> select wait_class, sum(time_waited), sum(time_waited)/sum(total_waits)
  2  sum_waits
  3  from v$system_wait_class
  4  group by wait_class
  5* order by 3 desc;
 
WAIT_CLASS            SUM(TIME_WAITED)               SUM_WAITS
-----------           ---------------               ----------
Idle                        249659211               347.489249
Commit                      1318006               236.795904
Concurrency                   16126                 4.818046
User I/O                     135279                 2.228869
Application                     912                 .0928055
Network                         139                 .0011209
...
SQL>

Do not worry if you see a very high sum of waits for the Idle wait class. You should actually expect to see a high number of Idle waits in any healthy database. In a typical production environment, however, you’ll certainly see more waits under the User I/O and Application wait classes. If you notice that the database has accumulated a very large wait time for the Application wait class or the User I/O wait class, for example, it’s time to investigate those two wait classes further. In the following example, we drill down into a couple of wait classes to find out which specific waits are causing the high sum of total wait time under the Application and Concurrency classes. To do this, we use the V$SYSTEM_EVENT and $EVENT_NAME views in addition to the V$SYSTEM_WAIT_CLASS view. Focus not just on the total time waited but also on the average wait to gauge the effect of the wait event.

SQL>  select sea.event, sea.total_waits, sea.time_waited, sea.average_wait
  2   from v$system_event sea, v$event_name enb, v$system_wait_class swc
  3   where sea.event_id=enb.event_id
  4   and enb.wait_class#=swc.wait_class#
  5   and swc.wait_class in ('Application','Concurrency')
  6*  order by average_wait desc
SQL> /
 
EVENT                       TOTAL_WAITS     TIME_WAITED    AVERAGE_WAIT
----------- ------------    -----------     ----------     ----------
enq: TX - index contention          2               36          17.8
library cache load lock            76              800         10.53
buffer busy waits                   9               89          9.87
row cache lock                     26              100          3.84
cursor: pin S wait on X           484              1211          2.5
SQL*Net break/reset to client       2                2          1.16
library cache: mutex X             12               13          1.10
latch: row cache objects          183              158           .86
latch: cache buffers chains         5                3           .69
enq: RO - fast object reuse       147               70           .47
library cache lock                  4                1           .27
cursor: pin S                      20                5           .27
latch: shared pool                297               74           .25
 
13 rows selected.
 
SQL>

image Tip  Two of the most common Oracle wait events are the db file scattered read and db file sequential read events. The db file scattered read wait event is because of full table scans of large tables. If you experience this wait event, investigate the possibility of adding indexes to the table or tables. Sometimes, an index may already exist, but the instance is unable to use it because of the existence of a function. Make sure that this isn’t the case. The db file sequential read wait event is because of indexed reads. While an indexed read may seem like it’s a good thing, a high amount of indexed reads could potentially indicate an inefficient query that you must tune. If high values for the db file sequential read wait event are because of a large number of small indexed reads, it’s not really a problem in most cases—this is natural in a database. You should be concerned if a handful of queries are responsible for most of the waits.

You can see that the enqueue waits caused by the row lock contention are what’s causing the most waits under these two classes. Now you know exactly what’s slowing down the queries in your database! To get at the session whose performance is being affected by the contention for the row lock, drill down to the session level using the following query:

SQL> select se.sid, se.event, se.total_waits, se.time_waited, se.average_wait
     from v$session_event se, v$session ss
     where time_waited > 0
     and se.sid=ss.sid
     and ss.username is not NULL
     and se.event='enq: TX - row lock contention';
SID   EVENT                       TOTAL_WAITS  time_waited    average_wait
----  --------------------------- -----------  ------------   -----------
68    enq: TX - row lock content          24           8018           298
SQL>

The output shows that the session with SID 68 had waited (or still might be waiting) for a row lock that’s held by another transaction.

How It Works

Understanding the various Oracle wait event classes enhances your ability to quickly diagnose Oracle wait-related problems. Analyzing wait events by classes lets you know whether contention, user I/O, or a configuration issue is responsible for high waits. The examples in the “Solution” section show you how to start analyzing the waits based on the wait event classes. This helps identify the source of the waits, such as concurrency issues, for example. Once you identify the wait event class responsible for most of the waits, you can drill down into that wait event class to find out the specific wait events that are contributing to high total waits for that wait event class. You can then identify the user sessions waiting for those wait events using the final query shown in the “Solution” section.

5-7. Resolving Buffer Busy Waits

Problem

Your database is experiencing a high number of buffer busy waits, based on the output from the AWR report. You want to resolve those waits.

Solution

Oracle has several types of buffer classes, such as data block, segment header, undo header, and undo block. How you fix a buffer busy wait situation will depend on the types of buffer classes that are causing the problem. You can find out the type of buffer causing the buffer waits by issuing the following two queries. Note that you first get the value of row_wait_obj# from the first query and use it as the value for object_id in the second query.

SQL> select row_wait_obj#
     from v$session
     where event = 'buffer busy waits';
 
SQL> select owner, object_name, subobject_name, object_type
     from dba_objects
     where object_id = &row_wait_obj;

The preceding queries will reveal the specific type of buffer causing the high buffer waits. How you resolve the problem depends on which buffer class causes the buffer waits, as summarized in the following subsections.

Segment Header

If your queries show that the buffer waits are being caused by contention on the segment header, there’s free list contention in the database because of several processes attempting to insert into the same data block; each of these processes needs to obtain a free list before it can insert data into that block. If you aren’t already using it, an automatic segment space management (ASSM) is recommended—under ASSM, the database doesn’t use free lists. In cases where you can’t implement ASSM, you must increase the free lists for the segment in question. You can also try increasing the free list groups. Partitioning the segment can also help in some cases.

Data Block

Data block buffer contention could be related to a table or an index. This type of contention is often caused by right-hand indexes, that is, indexes that result in several processes inserting into the same point, such as when you use sequence number generators to produce the key values. Again, if you’re using manual segment management, move to ASSM or increase free lists for the segment. A good solution would be to implement a reverse key index if possible.

Undo Header and Undo Block

If you’re using automatic undo management, few or none of the buffer waits will be because of contention for an undo segment header or an undo segment block. If you do see one of these buffer classes as the culprit, however, you may increase the size of your undo tablespace to resolve the buffer busy waits.

How It Works

A buffer busy wait occurs when a session tries to access a block in the buffer cache but can’t do so because the buffer is busy. Another session is modifying the required data block, and the contents of the block are going through a change. To provide the requester with a consistent image of the data block (with all of the changes or none of the changes), the session that’s changing the data block will mark the block header with a flag to indicate the block is being modified and that the other session needs to wait until all the changes are applied to the block. During the time the block is being modified, the block is marked unreadable by other sessions.

Buffer busy waits occur in two important types of cases:

  • Another session is reading the block into the buffer
  • Another session is holding the buffer in a mode that’s incompatible to our request

High concurrent inserts into a hot block where multiple users are inserting into the same block at the same time leads to high buffer busy waits. You also see these waits when several users are running full table scans simultaneously on the same table. As the first user reads the data blocks off the disk storage, the rest of the sessions will wait on the Buffer Busy Wait for the physical I/O to finish.

If your investigation of buffer busy waits reveals that the same block or set of blocks is involved most of the time, a good strategy would be to delete some of these rows and insert them back into the table, thus forcing them onto different data blocks.

Check your current memory allocation to the buffer cache, and, if necessary, increase it. A larger buffer cache can reduce the waiting by sessions to read data from disk, since more of the data will already be in the buffer cache. You can also place the offending table in memory by using the keep pool in the buffer cache (please see Recipe 3-7). By making the hot block always available in memory, you’ll avoid the high buffer busy waits.

Indexes that have a very low number of unique values are called low cardinality indexes. Low cardinality indexes generally result in too many block reads. Thus, if several DML operations are occurring concurrently, some of the index blocks could become “hot” and lead to high buffer busy waits. As a long-term solution, you can try to reduce the number of the low cardinality indexes in your database.

Each Oracle data segment such as a table or an index contains a header block that records information such as free blocks available. When multiple sessions are trying to insert or delete rows from the same segment, you could end up with contention for the data segment’s header block.

Buffer busy waits are also caused by a contention for free lists. A session that’s inserting data into a segment needs to first examine the free list information for the segment to find blocks with free space into which the session can insert data.

You can identify the query that’s involved by executing the following two queries:

SQL> select sql_id from v$session
     where sid in (SELECT sid FROM v$session_wait WHERE event = 'buffer
                   busy waits'),
SQL> SELECT sql_text FROM v$sqlarea WHERE sql_id = <sql_id>;

If you use ASSM in your database, you shouldn’t see any waits due to contention for a free list.

5-8. Resolving Log File Sync Waits

Problem

You’re seeing a high amount of log file sync wait events, which are at the top of all wait events in your database. You want to reduce these wait events.

Solution

The following are two strategies for dealing with high log file sync waits in your database:

  • If you notice a large number of waits with a short average wait time per wait, that’s an indication that too many commit statements are being issued by the database. You must change the commit behavior by batching the commits. Instead of committing after each row, for example, you can specify that the commits occur after a large number of inserts or updates.
  • If you notice that the large amount of wait time accumulated because of the redo log file sync event was caused by long waits for writing to the redo log file (high average time waited for this event), it’s more a matter of how fast your I/O subsystem is. You can alternate the redo log files on various disks to reduce contention. You can also see whether you can dedicate disks entirely for the redo logs instead of allowing other files on those disks—this will reduce I/O contention when the LGWR is writing the buffers to disk. Finally, as a long-term solution, you can look into placing redo logs on faster devices, say, by moving them from a RAID 5 device to a RAID 1 device or even by moving to solid-state (SSD) disks.

How It Works

image Note  Oracle (actually the LGWR background process) flushes a session’s redo information to the redo log file whenever a session issues a COMMIT statement. The database writes commit records to disk before it returns control to the client. The server process thus waits for the completion of the write to the redo log. This is the default behavior.

The session will tell the LGWR process to write the session’s redo information from the redo log buffer to the redo log file on disk. The LGWR process posts the user session after it finishes writing the buffer’s contents to disk. The log file sync wait event includes the wait during the writing of the log buffer to disk by LGWR and the posting of that information to the session. The server process will have to wait until it gets confirmation that the LGWR process has completed writing the log buffer contents out to the redo log file.

The log file sync events are caused by contention during the writing of the log buffer contents to the redo log files. Check the V$SESSION_WAIT view to ascertain whether Oracle is incrementing the SEQ# column. If Oracle is incrementing this column, it means that the LGWR process is the culprit, and it may be stuck.

As the log file sync wait event is caused by contention caused by the LGWR process, see if you can use the NOLOGGING option to get rid of these waits. Of course, in a production system, you can’t use the NOLOGGING option when the database is processing user requests, so this option is of limited use in most cases.

The log file sync wait event can also be caused by too large a setting for the LOG_BUFFER initialization parameter. Too large a value for the LOG_BUFFER parameter will lead the LGWR process to write data less frequently to the redo log files. For example, if you set the LOG BUFFER to something like 64 MB, it sets an internal parameter, _log_io_size, to a high value. The _log_io_size parameter acts as a threshold for when the LGWR writes to the redo log files. In the absence of a commit request or a checkpoint, LGWR waits until the _log_io_size threshold is met. Thus, when the database issues a COMMIT statement, the LGWR process would be forced to write a large amount of data to the redo log files at once, resulting in sessions waiting on the log file sync wait event. This happens because each of the waiting sessions is waiting for LGWR to flush the contents of the redo log buffer to the redo log files before these sessions can write to the log buffer. Although the database automatically calculates the value of the _log_io_size parameter, you can specify a value for it by issuing a command such as the following:

SQL> alter system set "_log_io_size"=1024000 scope=spfile;
 
System altered.
 
SQL>

It’s important here to remember that regardless of the value of the _log_io_size parameter, by default the instance writes the contents of the redo log buffer to the log every three seconds. Finally, be sure to check the CPU usage on your server because a CPU starvation condition could result in excessive time spent on the log file sync wait event.

5-9. Minimizing Read by Other Session Wait Events

Problem

Your AWR report shows that the read by other session wait event is responsible for the highest number of waits. You’d like to reduce the high read by other session waits.

Solution

The main reason you’ll see the read by other session wait event is that multiple sessions are seeking to read the same data blocks, whether they are table or index blocks, and are forced to wait behind the session that’s currently reading those blocks. You can find the data blocks a session is waiting for by executing the following command:

SQL> select p1 "file#", p2 "block#", p3 "class#"
     from v$session_wait
     where event = 'read by other session';

You can then take the block# and use it in the following query to identify the exact segments (table or index) that are causing the read by other session waits:

SQL> select relative_fno, owner, segment_name, segment_type
     from dba_extents
     where file_id = &file
     and &block between block_id
     and block_id + blocks - 1;

Once you identify the hot blocks and the segments they belong to, you need to identify the queries that use these data blocks and segments and tune those queries if possible. You can also try deleting and re-inserting the rows inside the hot blocks.

The query shown here using the DBA_EXTENTS view may run quite slow in large databases. You can alternatively query the ROW_WAIT_OBJ# column from the V$SESSION view to quickly retrieve the associated OBJECT_ID and then use the OBJECT_ID to look up the OWNER, OBJECT_NAME, and OJBECT_TYPE column values from the DBA_OBJECTS view. Use this first query that follows to determine the possible causes when a session is waiting for buffered busy waits. This will get you the ROW_WAIT_OBJ# that you can use in the second query.

SQL> select row_wait_obj# from v$session     where event = 'buffer_busy_waits';

You can also query the DBA_OBJECTS view using the value for ROW_WAIT_OBJ# from the previous query. In that way you can identify the object and object type that sessions are contending for. Here’s an example:

SQL> select owner, object_name, subobject_name, object_type
     from dba_objects
     where data_object_id = &row_wait_obj;

To reduce the amount of data in each of the hot blocks and thus reduce these types of waits, you can also try to create a new tablespace with a smaller block size and move the segment to that tablespace. It’s also a good idea to check whether any low cardinality indexes are being used because this type of an index will make the database read a large number of data blocks into the buffer cache, potentially leading to the read by other session wait event. If possible, rewrite queries so that they use an index on a column with a high rather than low cardinality.

How It Works

The read by other session wait event indicates that one or more sessions are waiting for another session to read the same data blocks from disk into the SGA. Your first goal should be to identify the actual data blocks and the objects the blocks belong to. For example, these waits can be caused by multiple sessions trying to read the same index blocks into memory. Multiple sessions can also be trying to execute a full table scan simultaneously on the same table.

We’ve suggested diagnostic and remedial strategies to handle the read by other session wait event. However, as with just about any wait that is related to SQL performance, one of the first things you probably ought to do is to examine the execution plan for the SQL query.

5-10. Reducing Direct Path Read Wait Events

Problem

You notice a high amount of the direct path read wait events and also of the direct path read temp wait events and you want to reduce the occurrence of those events.

Solution

The direct path read and direct path read temp events are related wait events that occur when sessions are reading data directly into the PGA instead of reading it into the SGA. Reading data into the PGA isn’t the problem here—that’s normal behavior for certain operations, such as sorting, for example. The direct path read temp event usually indicates that the sorts being performed are large and that the PGA is unable to accommodate those sorts.

image Note  The direct path read temp wait event may also appear when the instance is performing hash joins.

Issue the following command to get the file ID for the blocks that are being waited for:

SQL> select p1 "file#", p2 "block#", p3 "class#"
     from v$session_wait
     where event = 'direct path read temp';

The column P1 shows the file ID for the read call. Column P2 shows the start BLOCK_ID, and column P3 shows the number of blocks. You can then execute the following statement to check whether this file ID is for a temporary tablespace tempfile:

SQL> select relative_fno, owner, segment_name, segment_type
     from dba_extents
     where file_id = &file
     and &block between block_id and block_id + &blocks - 1;

The direct read type waits can be caused by excessive sorts to disk or full table scans. To find out what the reads are actually for, check the P1 column (file ID for the read call) of the V$SESSION_WAIT view. By doing this, you can find out whether the reads are being caused by reading data from the TEMP tablespace because of disk sorting or whether they’re occurring because of full table scans by parallel slaves.

image Note  It must be understood that direct path read waits are more common in Oracle Database 11.1 and higher releases and are likely to happen during full table scans even when a parallel query isn’t used, that is, unless you disable serial direct path reads by setting event 10949 to a value of 1.

If you determine that sorts to disk are the main culprit in causing high direct read wait events, increase the value of the PGA_AGGREGATE_TARGET parameter (or specify a minimum size for it, if you’re using automatic memory management). Increasing PGA size is also a good strategy when the queries are doing large hash joins, which could result in excessive I/O on disk if the PGA is inadequate for handling the large hash joins. When you set a high degree of parallelism for a table, Oracle tends to go for full table scans, using parallel slaves. If your I/O system can’t handle all the parallel slaves, you’ll notice a high amount of direct path reads. The solution for this is to reduce the degree of parallelism for the table or tables in question. Also investigate whether you can avoid the full table scan by specifying appropriate indexes.

As far as efficient query writing is concerned, a DISTINCT at the top level of a query that returns a lot of data is a wrong idea, especially if it’s there to mask duplicates returned by badly written joins. Some unwanted hash joins are sometimes simply because a poor use of functions in queries prevents Oracle from running the nested loops or correlated subquery that would be appropriate.

How It Works

Normally, during both a sequential database read or a scattered database read operation, the database reads data from disk into the SGA. A direct path read is one where a single or multiblock read is made from disk directly to the PGA, bypassing the SGA. Ideally, the database should perform the entire sorting of the data in the PGA. When a huge sort doesn’t fit into the available PGA, Oracle writes part of the sort data directly to disk. A direct read occurs when the server process reads this data from disk (instead of the PGA).

You’re most likely to encounter significant direct path read waits when the I/O subsystem is overloaded, most likely because of full table scans caused by setting a high degree of parallelism for tables, causing the database to return buffers slower than what the processing speed of the server process requires. Direct path read waits will occur even when the I/O system isn’t overloaded, but the duration of these waits is likely to increase significantly when you face an overloaded I/O system. A good disk striping strategy would help out here. Oracle’s Automatic Storage Management (ASM) automatically stripes data for you. If you aren’t already using ASM, consider implementing it in your database.

The direct path write and direct path write temp wait events are analogous to the direct path read and direct path read temp waits. Normally, it’s the DBWR that writes data from the buffer cache. Oracle uses a direct path write when a process writes data buffers directly from the PGA. If your database is performing heavy sorts that spill onto disk or parallel DML operations, you can on occasion expect to encounter the direct path write events. You may also see this wait event when you execute direct path load events such as a parallel CTAS (create table as select) or a direct path INSERT operation. As with the direct path read events, the solution for direct path write events depends on what’s causing the waits. If the waits are being mainly caused by large sorts, then you may think about increasing the value of the PGA_AGGREGATE_TARGET parameter. If operations such as parallel DML are causing the waits, you must look into the proper spreading of I/O across all disks and also ensure that your I/O subsystem can handle the high degree of parallelism during DML operations.

5-11. Minimizing Recovery Writer Waits

Problem

You’ve turned on the Oracle Flashback Database feature in your database. You’re now seeing a large number of wait events because of a slow recovery writer (RVWR) process. You want to reduce the recovery writer waits.

Solution

Oracle writes all changed blocks from memory to the flashback logs on disk. You may encounter the flashback buf free byRVWR wait event as a top wait event when the database is writing to the flashback logs. To reduce these recovery writer waits, you must tune the flash recovery area file system and storage. Specifically, you must do the following:

  • Since flashback logs tend to be quite large, operating system caching may not only be ineffectual, but the instance is likely to incur some CPU overhead when writing to these files. One of the things you may consider is moving the flash recovery area to a faster file system. Also, Oracle recommends that you use file systems based on ASM because they won’t be subject to operating system file caching, which tends to slow down I/O.
  • Increase the disk throughput for the file system where you store the flash recovery area by configuring multiple disk spindles for that file system. This will speed up the writing of the flashback logs.
  • Stripe the storage volumes, ideally with small stripe sizes (for example, 128 KB).
  • Set the LOG_BUFFER initialization parameter to a minimum value of 8 MB—the memory allocated for writing to the flashback database logs depends on the setting of the LOG_BUFFER parameter.

How It Works

Unlike in the case of the redo log buffer, Oracle writes flashback buffers to the flashback logs at infrequent intervals to keep overhead low for the Oracle Flashback Database. The flashback buf free by RVWR wait event occurs when sessions are waiting on the RVWR process. The RVWR process writes the contents of the flashback buffers to the flashback logs on disk. When the RVWR falls behind during this process, the flashback buffer is full and free buffers aren’t available to sessions that are making changes to data through DML operations. The sessions will continue to wait until RVWR frees up buffers by writing their contents to the flashback logs. High RVWR waits indicate that your I/O system is unable to support the rate at which the RVWR needs to flush flashback buffers to the flashback logs on disk.

5-12. Finding Out Who’s Holding a Blocking Lock

Problem

Your users are complaining that some of their sessions are very slow. You suspect that those sessions may be locked by Oracle for some reason and want to find the best way to go about figuring out who is holding up these sessions.

Solution

As we’ve explained in the introduction to this chapter, Oracle uses several types of locks to control transactions being executed by multiple sessions to prevent destructive behavior in the database. A blocking lock could “slow” down a session; in fact, the session is merely waiting on another session that is holding a lock on an object (such as a row or a set of rows or even an entire table). Or, in a development scenario, a developer might have started multiple sessions, some of which are blocking each other.

When analyzing Oracle locks, some of the key database views you must examine are the V$LOCK and V$SESSION views. The V$LOCKED_OBJECT and DBA_OBJECTS views are also useful in identifying the locked objects. To find out whether a session is being blocked by the locks being applied by another session, you can execute the following query:

SQL> select s1.username || '@' || s1.machine
  2  || ' ( SID=' || s1.sid || ' )  is blocking '
  3  || s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status
  4  from v$lock l1, v$session s1, v$lock l2, v$session s2
  5  where s1.sid=l1.sid and s2.sid=l2.sid
  6  and l1.BLOCK=1 and l2.request > 0
  7  and l1.id1 = l2.id1
  8  and l2.id2 = l2.id2 ;
 
 
BLOCKING_STATUS
--------------------------------------------------------------------
HR@MIROMIROPC61 ( SID=68 )  is blocking SH@MIROMIROPC61 ( SID=81 )
 
SQL>

Note that the BLOCK column can have the value 2 in an Oracle RAC environment. The output of the query shows the blocking session as well as all the blocked sessions.

A quick way to find out whether you have any blocking locks in your instance at all, for any user, is to simply run the following query:

SQL> select * from V$lock where block > 0;

If you don’t get any rows back from this query, good—you don’t have any blocking locks in the instance right now! We’ll explain this view in more detail in the explanation section.

You can also issue a SELECT statement on the V$SESSION view with the clause where blocking_session is not null to identify blocking locks in the instance. Here’s an example:

SQL> select process,sid, blocking_session
     from v$session
     where blocking_session is not null;
 
PROCESS             SID              BLOCKING_SESSION
-------             ====             ---------------
6789                 123                         456
SQL>

You can use the SID to find the SERIAL# value for the blocking session and kill the blocking session.

How It Works

Oracle uses two types of locks to prevent destructive behavior: exclusive and shared locks. Only one transaction can obtain an exclusive lock on a row or a table, while multiple shared locks can be obtained on the same object. Oracle uses locks at two levels: the row and table levels. Row locks, indicated by the symbol TX, lock just a single row of a table for each row that’ll be modified by a DML statement such as INSERT, UPDATE, and DELETE. This is true also for a MERGE or a SELECT ... FOR UPDATE statement. The transaction that includes one of these statements grabs an exclusive row lock as well as a row share table lock. Note that each time a transaction intends to modify a row or rows of a table, it holds a table lock (TM) as well on that table to prevent the database from allowing any DDL operations (such as DROP TABLE) on that table while the transaction is trying to modify some of its rows. The transaction (and the session) will hold these locks until it commits or rolls back the statement. Until it does one of these two things, all other sessions that intend to modify that particular row are blocked.

In an Oracle database, locking works this way:

  • A reader won’t block another reader.
  • A reader won’t block a writer.
  • A writer won’t block a reader of the same data.
  • A writer will block another writer that wants to modify the same data.

It’s the last case in the list, where two sessions intend to modify the same data in a table, that Oracle’s automatic locking kicks in to prevent destructive behavior. The first transaction that contains the statement that updates an existing row will get an exclusive lock on that row. While the first session that locks a row continues to hold that lock (until it issues a COMMIT or ROLLBACK statement), other sessions can modify any other rows in that table other than the locked row. The concomitant table lock held by the first session is merely intended to prevent any other session from issuing a DDL statement to alter the table’s structure. Oracle uses a sophisticated locking mechanism whereby a row-level lock isn’t automatically escalated to the table or even the block level.

5-13. Identifying Blocked and Blocking Sessions

Problem

You notice enqueue locks in your database and suspect that a blocking lock may be holding up other sessions. You’d like to identify the blocking and blocked sessions.

Solution

When you see an enqueue wait event in an Oracle database, chances are that it’s a locking phenomenon that’s holding up some sessions from executing their SQL statements. When a session waits on an “enqueue” wait event, that session is waiting for a lock that’s held by a different session. The blocking session is holding the lock in a mode that’s incompatible with the lock mode that’s being requested by the blocked session. You can issue the following command to view information about the blocked and blocking sessions:

SQL> select decode(request,0,'Holder: ','Waiter: ')||sid sess,
     id1, id2, lmode, request, type
     from v$lock
     where (id1, id2, type) in
     (select id1, id2, type from v$lock where request>0)
     order by id1, request;

The V$LOCK view shows whether there are any blocking locks in the instance. If there are blocking locks, it also shows the blocking session(s) and the blocked session(s). Note that a blocking session can block multiple sessions simultaneously, if all of them need the same object that’s being blocked. Here’s an example that shows there are locks present:

 SQL> select sid,type,lmode,request,ctime,block from v$lock;
 
SID            TY           LMODE      REQUEST          CTIME       BLOCK
----      --------     -----------    --------       --------     -------
 127           MR              4             0         102870          0
  81           TX              0             6            778          0
 191           AE              4             0            758          0
 205           AE              4             0            579          0
 140           AE              4             0          11655          0
  68           TM              3             0            826          0
  68           TX              6             0            826          1
...
SQL>

The key column to watch is the BLOCK column—the blocking session will have the value 1 for this column. In our example, session 68 is the blocking session because it shows the value 1 under the BLOCK column. Thus, the V$LOCK view confirms our initial finding in the “Solution” section of this recipe. The blocking session, with a SID of 68, also shows a lock mode 6 under the LMODE column, indicating that it’s holding this lock in the exclusive mode; this is the reason session 81 is “hanging,” unable to perform its update operation. The blocked session, of course, is the victim, so it shows a value of 0 in the BLOCK column. It also shows a value of 6 under the REQUEST column because it’s requesting a lock in the exclusive mode to perform its update of the column. The blocking session, in turn, will show a value of 0 for the REQUEST column because it isn’t requesting any locks; it’s already holding it.

If you want to find out the wait class and for how long a blocking session has been blocking others, you can do so by querying the V$SESSION view, as shown here:

SQL> select  blocking_session, sid,  wait_class,
     seconds_in_wait
     from     v$session
     where blocking_session is not NULL
     order by blocking_session;
 
BLOCKING_SESSION        SID        WAIT_CLASS        SECONDS_IN_WAIT
-----------------       -----      ------------      ----------------
      68                81          Application                  7069
 
SQL>

The query shows that the session with SID=68 is blocking the session with SID=81 and that it started blocking the session 7,069 seconds ago. You can replace the WAIT_CLASS clause from the V$SESSION view with the EVENT column if you want to find the exact wait event instead of the wait class. The EVENT column shows the resource or event for which the session is waiting, while the WAIT_CLASS column shows the name of the class for the wait event.

How It Works

The following are the most common types of enqueue locks you’ll see in an Oracle database:

  • TX: These are because of a transaction lock and usually caused by faulty application logic.
  • TM: These are table-level DML locks, and the most common cause is that you haven’t indexed foreign key constraints in a child table and you’re modifying the parent table.

In addition, you are also likely to notice ST enqueue locks on occasion. These indicate sessions that are waiting while Oracle is performing space management operations, such as the allocation of temporary segments for performing a sort.

5-14. Dealing with a Blocking Lock

Problem

You’ve identified blocking locks in your database. You want to know how to deal with those locks.

Solution

There are two basic strategies when dealing with a blocking lock: a short-term strategy and a long-term strategy. The short-term solution is to quickly get rid of the blocking locks so they don’t hurt the performance of your database. You get rid of them by simply killing the blocking session. If you see a long queue of blocked sessions waiting behind a blocking session, kill the blocking session so that the other sessions can get going.

The first thing you need to do is get rid of the blocking lock so the sessions don’t keep queuing up—it’s not at all uncommon for a single blocking lock to result in dozens and even hundreds of sessions, all waiting for the blocked object. Since you already know the SID of the blocking session (session 68 in our example), just kill the session in this way after first querying the V$SESSION view for the corresponding serial# for the session:

SQL> select serial# from v$session where sid=68;
SQL> alter system kill session '68, 1234';

image Tip  If you want and if things aren’t in a panic mode, you can query the V$SESSION view’s active column before proceeding to kill the session.

For the long run, though, you must investigate why the blocking session is behaving the way it is. Usually, you’ll find a flaw in the application logic. You may, though, need to dig deep into the SQL code that the blocking session is executing.

How It Works

In this example, obviously the blocking lock is a DML lock. However, even if you didn’t know this ahead of time, you can figure out the type of lock by examining the TYPE (TY) column of the V$LOCK view. Oracle uses several types of internal “system” latches to maintain the library cache and other instance-related components, but those locks are normal, and you won’t find anything related to those locks in the V$LOCK view.

image Note  In more recent versions of the database, most of the library cache–related caches are termed mutexes. Mutexes, which is short for mutual exclusion algorithms, are a much lighter and finer-grained concurrency management mechanism than latches. Mutexes are typically used in concurrent programming to control access to common resources. Mutexes protect single structures whereas latches typically protect access to multiple structures. In Oracle Database 10g Oracle introduced the use of mutexes for certain library cache operation and in Oracle Database 11g replaced all library cache latches with mutexes. A typical concurrency wait involving mutexes is the library cache: mutex X concurrency wait event.

For DML operations, Oracle uses two basic types of locks: transaction locks (TX) and DML locks (TM). There is also a third type of lock, a user lock (UL), but it doesn’t play a role in troubleshooting general locking issues. Transaction locks are the most frequent type of locks you’ll encounter when troubleshooting Oracle locking issues. Each time a transaction modifies data, it invokes a TX lock, which is a row transaction lock. The DML lock, TM, on the other hand, is acquired once for each object that’s being changed by a DML statement.

The LMODE column shows the lock mode, with a value of 6 indicating an exclusive lock. The REQUEST column shows the requested lock mode. The session that first modifies a row will hold an exclusive lock with LMODE=6. This session’s REQUEST column will show a value of 0, since it’s not requesting a lock—it already has one! The blocked session needs but can’t obtain an exclusive lock on the same rows, so it requests a TX in the exclusive mode (MODE=6) as well. So, the blocked session’s REQUEST column will show a value of 6 and its LMODE column a value of 0 (a blocked session has no lock at all in any mode).

The preceding discussion applies to row locks, which are always taken in the exclusive mode. A session may attempt to acquire a TX-type lock in a mode other than the exclusive mode (mode 6), as is the case when a session enqueues because of a potential primary key violation, where the session will attempt to acquire a TX lock in share mode (mode 4). A TM lock is normally acquired in mode 3, which is a Shared Row Exclusive mode, whereas a DDL statement will need a TM exclusive lock.

5-15. Identifying a Locked Object

Problem

You are aware of a locking situation, and you want to find out the object that’s being locked.

Solution

You can find the locked object’s identity by looking at the value of the ID1 (LockIdentifier) column in the V$LOCK view (see Recipe 5-13). The value of the ID1 column where the TYPE column is TM (DML enqueue) identifies the locked object. Let’s say you’ve ascertained that the value of the ID1 column is 99999. You can then issue the following query to identify the locked table:

SQL> select object_name from dba_objects where object_id=99999;
 
OBJECT_NAME
------------
TEST
SQL>

An even easier way is to use the V$LOCKED_OBJECT view to find out the locked object, the object type, and the owner of the object.

SQL> select lpad(' ',decode(l.xidusn,0,3,0)) || l.oracle_username "User",
     o.owner, o.object_name, o.object_type
     from v$locked_object l, dba_objects o
     where l.object_id = o.object_id
     order by o.object_id, 1 desc;
 
User       OWNER      OBJECT_NAME      OBJECT_TYPE
------     ------     ------------     ------------
HR         HR         TEST             TABLE
SH         HR         TEST             TABLE
 
SQL>

Note that the query shows both the blocking and blocked users.

How It Works

As the “Solution” section shows, it’s rather easy to identify a locked object. You can certainly use Oracle Enterprise Manager (Cloud Control 12c or the new Enterprise Manager Database Express) to quickly identify a locked object, the ROWID of the object involved in the lock, and the SQL statement that’s responsible for the locks. However, it’s always important to understand the underlying Oracle views that contain the locking information, and that’s what this recipe demonstrates. Using the queries shown in this recipe, you can easily identify a locked object without recourse to a monitoring tool such as Oracle Enterprise Manager, for example.

In the example shown in the solution, the locked object was a table, but it could be any other type of object, including a PL/SQL package. Often, it turns out that the reason a query is just hanging is that one of the objects the query needs is locked. You may have to kill the session holding the lock on the object before other users can access the object.

5-16. Resolving enq: TM Lock Contention

Problem

Several sessions in your database are taking a long time to process some insert statements. As a result, the “active” sessions count is very high, and the database is unable to accept new session connections. Upon checking, you find that the database is experiencing a lot of enq: TM – contention wait events.

Solution

The enq: TM – contention event is usually because of indexes on foreign key columns on a table that’s part of an Oracle DML operation. Once you fix the problem by indexing the foreign key constraint, the enq:TM – contention event will go away.

The waits on the enq: TM – contention event for the sessions that are waiting to perform insert operations are almost always because of an unindexed foreign key constraint. This happens when a dependent or child table’s foreign key constraint that references a parent table is missing an index on the associated key. Oracle acquires a table lock on a child table if it’s performing modifications on the primary key column in the parent table that’s referenced by the foreign key of the child table. Note that these are full table locks (TM) and not row-level locks (TX); thus, these locks aren’t restricted to a row but to the entire table. Naturally, once this table lock is acquired, Oracle will block all other sessions that seek to modify the child table’s data. Once you create an index in the child table performing on the column that references the parent table, the waits due to the TM contention will go away.

How It Works

Oracle takes out an exclusive lock on a child table if you don’t index the foreign key constraints in that table, when a SQL statement modifies a table that is referenced by another table. To illustrate how an unindexed foreign key will result in contention because of locking, we use the following example. Create two tables, STORES and PRODUCTS, as shown here:

SQL> create table stores
     (store_id     number(10)     not null,
     supplier_name     varchar2(40)     not null,
     constraint stores_pk PRIMARY KEY (store_id));
SQL>create table products
    (product_id     number(10)     not null,
    product_name    varchar2(30)    not null,
    supplier_id     number(10)     not null,
    store_id     number(10)     not null,
    constraint fk_stores
    foreign key (store_id)
    references stores(store_id)
    on delete cascade);

If you now delete any rows in the STORES table, you’ll notice waits because of locking. You can get rid of these waits by simply creating an index on the column you’ve specified as the foreign key in the PRODUCTS table:

create index fk_stores on products(store_id);

You can find all unindexed foreign key constraints in a specific schema by issuing the following query. (Recipe 2-5 shows a query to get all such constraints in the entire database.)

SQL> select * from (
     select ct.table_name, co.column_name, co.position column_position
     from   user_constraints ct, user_cons_columns co
     where  ct.constraint_name = co.constraint_name
     and   ct.constraint_type = 'R'
     minus
     select ui.table_name, uic.column_name, uic.column_position
     from   user_indexes ui, user_ind_columns uic
     where  ui.index_name = uic.index_name
     )
     order by table_name, column_position;

If you don’t index a foreign key column, you’ll notice the child table is often locked, thus leading to contention-related waits. Oracle recommends that you always index your foreign keys.

image Tip  If the matching unique or primary key for a child table’s foreign key never gets updated or deleted, you don’t have to index the foreign key column in the child table.

As mentioned earlier, Oracle tends to acquire a table lock on the child table if you don’t index the foreign key column. If you insert a row into the parent table, the parent table doesn’t acquire a lock on the child table; however, if you update or delete a row in the parent table, the database will acquire a full table lock on the child table. That is, any modifications to the primary key in the parent table will result in a full table lock (TM) on the child table. In our example, the STORES table is a parent of the PRODUCTS table, which contains the foreign key STORE_ID. The table PRODUCTS being a dependent table, the values of the STORE_ID column in that table must match the values of the unique or primary key of the parent table, STORES. In this case, the STORE_ID column in the STORES table is the primary key of that table.

Whenever you modify the parent table’s (STORES) primary key, the database acquires a full table lock on the PRODUCTS table. Other sessions can’t change any values in the PRODUCTS table, including the columns other than the foreign key column. The sessions can query but not modify the PRODUCTS table. During this time, any sessions attempting to modify any column in the PRODUCTS table will have to wait (TM: enq contention wait). Oracle will release this lock on the child table PRODUCTS only after the transaction has finished the modifying the primary key in the parent table, STORES. If you have a bunch of sessions waiting to modify data in the PRODUCTS table, they’ll all have to wait, and the active session count naturally will go up very fast if you have an online transaction processing–type database that has many users who perform short DML operations. Note that any DML operations you perform on the child table don’t require a table lock on the parent table.

5-17. Identifying Recently Locked Sessions

Problem

A session is experiencing severe waits in the database, most likely because of a blocking lock placed by another session. You’ve tried to use V$LOCK and other views to drill deeper into the locking issue but are unable to “capture” the lock while it’s in place. You want to use a different view to “see” the older locking data that you might have missed while the locking was going on.

Solution

You can execute the following statement based on ASH (needs licensing) to find out information about all locks held in the database during the previous ten minutes. Of course, you can vary the time interval to a smaller or larger period, so long as there’s ASH data covering that time period.

SQL> select to_char(h.sample_time, 'HH24:MI:SS') TIME,ash.session_id,
     decode(ash.session_state, 'WAITING' ,ash.event, ash.session_state) STATE,
     ash.sql_id,
     ash.blocking_session BLOCKER
     from v$active_session_history ash, dba_users du
     where du.user_id = ash.user_id
     and ash.sample_time > SYSTIMESTAMP-(10/1440);
 
TIME        SID      STATE                           SQL_ID          BLOCKER
---------   -----   -----------------------------    -------------   -------
17:00:52    197     116 enq: TX - row lock conten    094w6n53tnywr      191
17:00:51    197     116 enq: TX - row lock conten    094w6n53tnywr      191
17:00:50    197     116 enq: TX - row lock conten    094w6n53tnywr      191
 
...
 
SQL>

You can see that ASH has recorded all the blocks placed by session 1, the blocking session (SID=191) that led to a “hanging” situation for session 2, the blocked session (SID=197). Be aware that the data in the V$ACTIVE_SESSION_HISTORY view doesn’t show all enqueues. It shows only those enqueues that the sessions were waiting on at the instant that the ASH sample was collected. And ASH samples are collected every second.

How It Works

Often, when your database users complain about a performance problem, you may query the V$SESSION or V$LOCK view, but you may not find anything useful there because the wait issue may have been already resolved by then. In these circumstances, you can query the V$ACTIVE_SESSION_HISTORY view to find out what transpired in the database during the previous 60 minutes. This view offers a window into the active session history (ASH), which is a memory buffer that collects information about all active sessions, every second. V$ACTIVE_SESSION_HISTORY contains one row for each active session, and newer information continuously overwrites older data, since ASH is a rolling buffer.

We can best demonstrate the solution by creating the scenario that we’re discussing and then working through that scenario. Begin by creating a test table with a couple of columns:

SQL>  create table test (name varchar(20), id number (4));
Table created.
SQL>

Insert some data into the test table:

SQL> insert into test values ('alapati',9999);
1 row created.
SQL> insert into test values ('sam', 1111);
1 row created.
SQL> commit;
Commit complete.
SQL>

In session 1 (the current session), execute a SELECT * FOR UPDATE statement on the table TEST—this will place a lock on that table.

SQL> select * from test for update;
 
SQL>

In a different session, session 2, execute the following UPDATE statement:

SQL> update test set name='Jackson' where id = 9999;

Session 2 will hang now because it’s being blocked by the SELECT FOR UPDATE statement issued by session 1. Go ahead now and issue either a ROLLBACK or a COMMIT from session 1.

SQL> rollback;
Rollback complete.
SQL>

When you issue the ROLLBACK statement, session 1 releases all locks it’s currently holding on table TEST. You’ll notice that session 2, which has been blocked thus far, immediately processes the UPDATE statement, which was previously “hanging,” waiting for the lock held by session 2.

Therefore, we know for sure that there was a blocking lock in your database for a brief period, with session 1 the blocking session and session 2 the blocked session. You can’t find any evidence of this in the V$LOCK view, though, because that and all other lock-related views show you details only about currently held locks. Here’s where the ASH views shine—they can provide you with information about locks that have been held recently but are gone already before you can view them with a query on the V$LOCK or V$SESSION view.

image Caution  Be careful when executing the ASH query shown in the “Solution” section of this recipe. As the first column (SAMPLE_TIME) shows, ASH will record session information every second. If you execute this query over a long time frame, you may get a large amount of output just repeating the same locking information. To deal with that output, you may specify the SET PAUSE ON option in SQL*Plus. That will pause the output every page, enabling you to scroll through a few rows of the output to identify the problem.

Use the following query to find out the wait events that occurred in this session during the past hour.

SQL> select sample_time, event, wait_time
     from v$active_session_history
     where session_id = 81
     and session_serial# = 422;

The column SAMPLE_TIME lets you know precisely when this session suffered a performance hit because of a specific wait event. You can identify the actual SQL statement that was being executed by this session during that period by using the V$SQL view along with the V$ACTIVE_SESSION_HISTORY view, as shown here:

SQL> select sql_text, application_wait_time
     from v$sql
     where sql_id in ( select sql_id from v$active_session_history
     where sample_time =  '08-MAR-14 05.00.52.00 PM'
     and session_id = 68 and session_serial# = 422);

Alternatively, if you have the SQL_ID already from the V$ACTIVE_SESSION_HISTORY view, you can get the value for the SQL_TEXT column from the V$SQLAREA view, as shown here:

SQL> select sql_text FROM v$sqlarea WHERE sql_id = '7zfmhtu327zm0';

Once you have the SQL_ID, it’s also easy to extract the SQL plan for this SQL statement, by executing the following query based on the DBMS_XPLAN package:

SQL> select * FROM table(dbms_xplan.display_cursor('7zfmhtu327zm0'));

image Tip  Since a SQL statement can have multiple execution plans, you may want to add the SQL_PLAN_HASH_VALUE column’s value from the V$ACTIVE_SESSION_HISTORY view when you execute the DBMS_XPLAN.DISPLAY_CURSOR procedure.

The background process MMON flushes ASH data to disk every hour, when the AWR snapshot is created. What happens when MMON flushes ASH data to disk? Well, you won’t be able to query older data any longer with the V$ACTIVE_SESSION_HISTORY view. No worry, because you can still use the DBA_HIST_ACTIVE_SESS_HISTORY view to query the older data. The structure of this view is similar to that of the V$ACTIVE_SESSION_HISTORY view. The DBA_HIST_ACTIVE_SESS_HISTORY view shows the history of the contents of the in-memory active session history of recent system activity. You can also query the V$SESSION_WAIT_HISTORY view to examine the last ten wait events for a session while it’s still active. This view offers more reliable information for very recent wait events than the V$SESSION and V$SESSION_WAIT views, both of which show wait information for only the most recent wait. Here’s a typical query using the V$SESSION_WAIT_HISTORY view:

SQL> select sid from v$session_wait_history
     where wait_time = (select max(wait_time) from v$session_wait_history);

Any nonzero values under the WAIT_TIME column represent the time waited by this session for the last wait event. A zero value for this column means that the session is currently waiting for a resource.

5-18. Analyzing Recent Wait Events in a Database

Problem

You want to find out the most important waits in your database in the recent past, as well as the users, SQL statements, and objects that are responsible for most of those waits.

Solution

Query the V$ACTIVE_SESSION_HISTORY view to get information about the most common wait events and the SQL statements, database objects, and users responsible for those waits. The following are some useful queries you can use.

To find the most important wait events in the last 15 minutes, issue the following query:

SQL> select event,
     sum(wait_time +
     time_waited) total_wait_time
     from v$active_session_history
     where sample_time between
     sysdate –15/1440 and sysdate
     group by event
     order by total_wait_time desc

To find out which of your users experienced the most waits in the past 15 minutes, issue the following query:

SQL> select s.sid, s.username,
     sum(a.wait_time +
     a.time_waited) total_wait_time
     from v$active_session_history a,
     v$session s
     where a.sample_time between sysdate – 15/1440 and sysdate
     and a.session_id=s.sid
     group by s.sid, s.username
     order by total_wait_time desc;

You can identify the SQL statements that have been waiting the most during the last 15 minutes with this query:

SQL> select a.user_id,u.username,s.sql_text,
     sum(a.wait_time + a.time_waited) total_wait_time
     from v$active_session_history a,
     v$sqlarea s,
     dba_users u
     where a.sample_time between sysdate – 15/1440 and sysdate
     and a.sql_id = s.sql_id
     and a.user_id = u.user_id
     group by a.user_id,s.sql_text, u.username
     order by 4;

How It Works

The “Solution” section shows how to join the V$ACTIVE_SESSION_HISTORY view with other views, such as the V$SESSION, V$SQLAREA, DBA_USERS, and DBA_OBJECTS views, to find out exactly what’s causing the highest number of wait events or who’s waiting the most, in the past few minutes. This information is valuable when troubleshooting “live” database performance issues.

5-19. Identifying Time Spent Waiting Because of Locking

Problem

You want to identify the total time spent waiting by sessions because of locking issues.

Solution

You can use the following query to identify (and quantify) waits caused by the locking of a table’s rows. Since the query orders the wait events by time waited, you can quickly see which type of wait events accounts for most of the waits in your instance.

SQL> select wait_class, event, time_waited / 100 time_secs
  2  from v$system_event e
  3  where e.wait_class <> 'Idle' AND time_waited > 0
  4  union
  5  select 'Time Model', stat_name NAME,
  6  round ((value / 1000000), 2) time_secs
  7  from v$sys_time_model
  8  where stat_name NOT IN ('background elapsed time', 'background cpu time')
  9*  order by 3 desc;
 
WAIT_CLASS             EVENT                                TIME_SECS
-----------            ------------------------------       ----------
System I/O             log file parallel write                45066.32
System I/O             control file sequential read           23254.41
Time Model             DB time                                11083.91
Time Model             sql execute elapsed time                7660.04
Concurrency            latch: shared pool                      5928.73
Application            enq: TX - row lock contention           3182.06
...
SQL>

In this example, the wait event enq: TX - row lock contention reveals the total time because of row lock enqueue wait events. Note that the shared pool latch events are classified under the Concurrency wait class, while the enqueue TX - row lock contention event is classified as an Application class wait event.

How It Works

The query in the “Solution” section joins the V$SYSTEM_EVENT and V$SYS_TIME_MODEL views to show you the total time waited because of various wait events. In our case, we’re interested in the total time waited because of enqueue locking. If you’re interested in the total time waited by a specific session, you can use a couple of different V$ views to find out how long sessions have been in a wait state, but we recommend using the V$SESSION view because it shows you various useful attributes of the blocking and blocked sessions. Here’s an example showing how to find out how long a session has been blocked by another session:

SQL>select sid, username, event, blocking_session,
    seconds_in_wait, wait_time
    from v$session where state in ('WAITING'),

The query reveals the following about the session with SID 81, which is in a WAITING state:

SID  : 81 (this is the blocked session)
username: SH (user who's being blocked right now)
event: TX - row lock contention (shows the exact type of lock contention)
blocking session: 68 (this is the "blocker")
seconds_in_wait: 3692 (how long the blocked session is in this state)

The query reveals that the user SH, with a SID of 81, has been blocked for more than an hour (3,692 seconds). User SH is shown as waiting for a lock on a table that is currently locked by session 68. While the V$SESSION view is highly useful for identifying the blocking and blocked sessions, it can’t tell you the SQL statement that’s involved in the blocking of the table. Often, identifying the SQL statement that’s involved in a blocking situation helps in finding out exactly why the statement is leading to the locking behavior. To find out the actual SQL statement that’s involved, you must join the V$SESSION and V$SQL views, as shown here:

SQL> select sid, sql_text
     from v$session s, v$sql q
     where sid in (68,81)
     and (
     q.sql_id = s.sql_id or  q.sql_id = s.prev_sql_id)
SQL> /
 
     SID                       SQL_TEXT
     ----         -----------------------------------------------------
     68           select * from test for update
     81           update hr.test set name='nalapati' where user_id=1111
 
SQL>

image Note  Developers can also use Oracle’s built-in DBMS_APPLICATION_INFO package to Oracle Trace and the SQL trace facility to record the names of executing modules or transactions in the database. They can later use this information for tracking the performance of the modules and for debugging.

The output of the query shows that session 81 is being blocked because it’s trying to update a row in a table that has been locked by session 68, using the SELECT ... FOR UPDATE statement. In cases such as this, if you find a long queue of user sessions being blocked by another session, you must kill the blocking session so the other sessions can process their work. You’ll also see a high active user count in the database during these situations—killing the blocking session offers you an immediate solution to resolving contention caused by enqueue locks. Later, you can investigate why the blocks are occurring so as to prevent these situations.

For any session, you can identify the total time waited by a session for each wait class by issuing the following query:

SQL> select wait_class_id, wait_class,
     total_waits, time_waited
     from v$session_wait_class
     where sid = <SID>;

If you find, for example, that this session endured a high number of waits in the application wait class (the wait class ID for this class is 4217450380), you can issue the following query using the V$SYSTEM_EVENT view to find out exactly which waits are responsible:

SQL> select event, total_waits, time_waited
     from v$system_event e, v$event_name n
     where n.event_id = e.event_id
     and e.wait_class_id = 4217450380;
 
EVENT                                    TOTAL_WAITS             TIME_WAITED
--------------------                     ------------            -------------
enq: TM - contention                              82                      475
...
SQL>

In our example, the waits in the Application class (ID 4217450380) are because of locking contention as revealed by the wait event enq:TM - contention. You can further use the V$EVENT_HISTOGRAM view to find out how many times and for how long sessions have waited for a specific wait event since you started the instance. Here’s the query you need to execute to find out the wait time pattern for enqueue lock waits:

SQL> select wait_time_milli bucket, wait_count
     from v$event_histogram
     where event = 'enq: TX - row lock contention';

A high number of enqueue waits because of locking behavior is usually because of faulty application design. You’ll sometimes encounter this when an application executes many updates against the same row or a set of rows. Since this type of high waits due to locking is because of inappropriately designed applications, there’s not much you can do by yourself to reduce these waits. Let your application team know why these waits are occurring, and ask them to consider modifying the application logic to avoid the waits.

Any of the following four DML statements can cause locking contention: INSERT, UPDATE, DELETE, and SELECT FOR UPDATE. INSERT statements wait for a lock because another session is attempting to insert a row with an identical value. This usually happens when you have a table that has a primary key or unique constraint, with the application generating the keys. Use an Oracle sequence instead to generate the key values to avoid these types of locking situations. You can specify the NOWAIT option with a SELECT FOR UPDATE statement to eliminate session blocking because of locks. You can also use the SELECT FOR UPDATE NOWAIT statement to avoid waiting by sessions for locks when they issue an UPDATE or DELETE statement. The SELECT FOR UPDATE NOWAIT statement locks the row without waiting. You can specify SELECT FOR UPDATE SKIP UNLOCKED to skip all rows in the candidate result set that have already been locked and return the rest of the rows.

5-20. Minimizing Latch Contention

Problem

You’re seeing a high number of latch waits, and you want to reduce the latch contention.

Solution

Severe latch contention can slow your database down noticeably. When you’re dealing with a latch contention issue, start by executing the following query to find out the specific types of latches and the total wait time caused by each wait:

SQL> select event, sum(P3), sum(seconds_in_wait) seconds_in_wait
     from v$session_wait
     where event like 'latch%'
     group by event;

The previous query shows the latches that are currently being waited for by this session. To find out the amount of time the entire instance has waited for various latches, execute the following SQL statement:

SQL> select wait_class, event, time_waited / 100 time_secs
     from v$system_event e
     where e.wait_class <> 'Idle' AND time_waited > 0
     union
     select 'Time Model', stat_name NAME,
     round ((value / 1000000), 2) time_secs
     from v$sys_time_model
     where stat_name not in ('background elapsed time', 'background cpu time')
     order by 3 desc;
 
WAIT_CLASS                                EVENT                     TIME_SECS
-----------                    -------------------------            ----------
Concurrency                            library cache pin               622.24
Concurrency                         latch: library cache               428.23
Concurrency                    latch: library cache lock                93.24
Concurrency                           library cache lock                24.20
Concurrency                     latch: library cache pin                60.28
...

The partial output from the query shows the latch-related wait events, which are part of the Concurrency wait class.

You can also view the Top 5 Timed Events in the AWR report to see whether latch contention is an issue, as shown here:

Event                         Waits           Time (s)     (ms)      Time      Wait Class
------------------------   ------------    ----------    -------   --------   ------------
db file sequential read      42,005,780       232,838          6       73.8      User I/O
CPU time                                      124,672                  39.5      Other
latch free                   11,592,952        76,746          7       24.3      Other
wait list latch free            107,553         2,088         19        0.7      Other
latch: library cache          1,135,976         1,862          2        0.6      Concurrency

Here are the most common Oracle latch wait types and how you can reduce them:

  • Shared pool and library latches: These are caused mostly by the database repeatedly executing the same SQL statement that varies slightly each time. For example, a database may execute a SQL statement 10,000 times, each time with a different value for a variable. The solution in all such cases is to use bind variables. Too small a shared pool may also contribute to the latch problem, so check your SGA size.
  • Cache buffers LRU chain: These latch events are usually because of excessive buffer cache usage and may be caused both by excessive physical reads and by logical reads. Either the database is performing large full table scans or it’s performing large index range scans. The usual cause for these types of latch waits is either the lack of an index or the presence of an unselective index. Also check to see whether you need to increase the size of the buffer cache.
  • Cache buffer chains: These waits are because of one or more hot blocks that are being repeatedly accessed. Application code that updates a table’s rows to generate sequence numbers, rather than using an Oracle sequence, can result in such hot blocks. You might also see the cache buffer chains wait event when too many processes are scanning an unselective index with similar predicates.

Also, if you’re using Oracle sequences, re-create them with a larger cache size setting and try to avoid using the ORDER clause. The CACHE clause for a sequence determines the number of sequence values the database must cache in the SGA. If your database is processing a large number of inserts and updates, consider increasing the cache size to avoid contention for sequence values. By default, the cache is set to 20 values. Contention can result if values are being requested fast enough to frequently deplete the cache. If you’re dealing with a RAC environment, using the NOORDER clause will prevent enqueue contention because of the forced ordering of queued sequence values.

How It Works

Oracle uses internal locks called latches to protect various memory structures. When a server process attempts to get a latch but fails to do so, that attempt is counted as a latch-free wait event. Oracle doesn’t group all latch waits into a single latch-free wait event. Oracle does use a generic latch-free wait event, but this is only for the minor latch-related wait events. For the latches that are most common, Oracle uses various subgroups of latch wait events, with the name of the wait event type. You can identify the exact type of latch by looking at the latch event name. For example, the latch event latch: library cache indicates contention for library cache latches. Similarly, the latch: cache buffer chains event indicates contention for the buffer cache.

Oracle uses various types of latches to prevent multiple sessions from updating the same area of the SGA. Various database operations require sessions to read or update the SGA. For example, when a session reads a data block into the SGA from disk, it must modify the buffer cache least recently used chain. Similarly, when the database parses a SQL statement, that statement has to be added to the library cache component of the SGA. Oracle uses latches to prevent database operations from stepping on each other and corrupting the SGA.

A database operation needs to acquire and hold a latch for very brief periods, typically lasting a few nanoseconds. If a session fails to acquire a latch at first because the latch is already in use, the session will try a few times before going to “sleep.” The session will re-awaken and try a few more times, before going into the sleep mode again if it still can’t acquire the latch it needs. Each time the session goes into the sleep mode, it stays longer there, thus increasing the time interval between subsequent attempts to acquire a latch. Thus, if there’s a severe contention for latches in your database, it results in a severe degradation of response times and throughput.

Don’t be surprised to see latch contention even in a well-designed database running on very fast hardware. Some amount of latch contention, especially the cache buffers chain latch events, is pretty much unavoidable. You should be concerned only if the latch waits are extremely high and are slowing down database performance.

Contention because of the library cache latches as well as shared pool latches is usually because of applications not using bind variables. If your application can’t be recoded to incorporate bind variables, all is not lost. You can set the CURSOR_SHARING parameter to force Oracle to use bind variables, even if your application hasn’t specified them in the code. You must set the CURSOR_SHARING parameter to FORCE to force the substituting of bind variables for hard-coded values of variables.

The default setting for the CURSOR_SHARING parameter is EXACT, which means the database won’t substitute bind variables for literal values. Rather, Oracle only allows statements with identical text to share cursors. When you set the CURSOR_SHARING parameter to FORCE, Oracle converts all literals to bind variables. The setting of the parameter to FORCE allows the creation a new cursor if an existing cursor is being shared or if the cursor’s execution plan isn’t optimal.

Although there are some concerns about the safety of setting the CURSOR_SHARING parameter to FORCE, we haven’t seen any real issues with using this setting. The library cache contention usually disappears once you set the CURSOR_SHARING parameter to FORCE.

While the CURSOR_SHARING parameter does seem to help in a number of instances when dealing with library cache latch contention, be sure to review the bug reports in My Oracle Support to make sure your applications don’t fall prey to various reported issues with setting the CURSOR_SHARING parameter to FORCE. These issues include wrong results being returned at times, instance crashes, excessively long parse times, problems with function-based indexes, and so forth. Also, when the distribution of values is everything but uniform, systematically binding constants may result in much worse performance.

The cache buffer chains latch contention is usually because of a session repeatedly reading the same data blocks. First identify the SQL statement that’s responsible for the highest number of the cache buffers chain latches and see whether you can tune it. If this doesn’t reduce the latch contention, you must identify the actual hot blocks that are being repeatedly read.

If a hot block belongs to an index segment, you may consider partitioning the table and using local indexes. For example, a hash partitioning scheme lets you spread the load among multiple partitioned indexes. You can also consider converting the table to a hash cluster based on the indexed columns. This way, you can avoid the index altogether. If the hot blocks belong to a table segment instead, you can still consider partitioning the table to spread the load across the partitions. You may also want to reconsider the application design to see why the same blocks are being repeatedly accessed, thus rendering them “hot.”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset