The Oracle database is a very well instrumented piece of code, and it has been so for quite a while. It keeps track of the amount of time spent in discrete operations via the use of wait events. While the database software is quite complex, wait event analysis allows performance analysts to determine precisely where the database is spending its time. Many difficult performance problems can be resolved by analyzing these wait events. The introduction of Exadata has resulted in the creation of several new wait events to support the unique operations that are performed on the platform. This chapter will focus on describing these new events and how they relate to the activities actually being performed, while contrasting them with the wait events used by the database on non-Exadata platforms. It will also describe a few wait events that aren't specific to Exadata but play an important role on Exadata platforms.
A wait event is actually a section of code that is timed and assigned a name. The pieces of code covered by these events are quite often discrete operating system calls such as I/O requests, but some wait events cover considerably larger portions of code. The events may even include sections of code that contain other wait events. The naming of wait events has been somewhat inconsistent over the years, and many events have somewhat misleading names. Even though some of the event names are acknowledged to be potentially misleading, Oracle has been understandably reluctant to change them. The Exadata platform has provided an excuse to rename some of the I/O-related events, and as you'll see shortly, the developers took the opportunity to do just that.
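To see where an instance has been spending its time in terms of these named events, you can query the cumulative wait interface directly. Here is a minimal sketch (column formatting omitted) using the standard V$SYSTEM_EVENT view:

```sql
-- Top non-idle wait events for the instance since startup
select event, total_waits, time_waited_micro/1000000 seconds
from   v$system_event
where  wait_class <> 'Idle'
order  by time_waited_micro desc;
```

The same breakdown is available per session in V$SESSION_EVENT, which is often more useful when chasing a specific performance problem.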
There are actually no events that exist exclusively on the Exadata platform. Wait events are built into the database code, and since the compute nodes on Exadata run standard Oracle Database software, all the wait events that are used when invoking Exadata-specific features are available in databases that are running on non-Exadata platforms as well. But because the Exadata features are only available on the Exadata platform, no time is ever allocated to those events on other platforms. By way of proof, consider the following example, which compares the events from V$EVENT_NAME
(which exposes the valid wait events) first on an Exadata Database Machine and then on a standard 11gR2 database on a non-Exadata platform:
SYS@EXDB> select count(*) from v$event_name;
COUNT(*)
----------
1118
SYS@EXDB> select count(*) from v$event_name@LAB112;
COUNT(*)
----------
1118
SYS@EXDB> select name event from v$event_name
2 minus
3* select name from v$event_name@LAB112;
no rows selected
So there are no differences in the events. This does make it a little difficult to come up with a list of “Exadata Only” events.
The Oracle Exadata Storage Server Software User's Guide 11g Release 2 provides a table of wait events, all of which start with the word “cell.” The manual lists nine such events. One of them (cell interconnect retransmit during physical read) doesn't actually exist.
There are also another nine events having the word “cell” in their names that are not included in the documentation. So we'll use the complete list of “cell” events as a starting point. You can query V$EVENT_NAME for that list, and you'll get the following results. Notice that most of the events are in one of the I/O classes. There are only 17 rows, because one of the documented events does not exist in the version of Oracle that we're running (11.2.0.2).
SYS@POC2> @valid_events
Enter value for event_name: cell%
EVENT CLASS
--------------------------------------------- -------------------------
cell list of blocks physical read User I/O
cell multiblock physical read User I/O
cell single block physical read User I/O
cell smart file creation User I/O
cell smart index scan User I/O
cell smart table scan User I/O
cell statistics gather User I/O
cell manager closing cell System I/O
cell manager discovering disks System I/O
cell manager opening cell System I/O
cell smart incremental backup System I/O
cell smart restore from backup System I/O
cell manager cancel work request Other
cell smart flash unkeep Other
cell worker online completion Other
cell worker retry Other
cell worker idle Idle
17 rows selected.
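The valid_events script itself is not shown in the text; something like the following query (a simplified sketch, not necessarily the actual script — the ordering of the output may differ from what is shown above) produces equivalent results:

```sql
-- List wait events matching a pattern, along with their wait class
select name event, wait_class class
from   v$event_name
where  name like '&event_name'
order  by wait_class, name;
```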
The following sections will cover all of these events, along with a couple of additional events that have special applicability to Exadata.
First, though, it might be interesting to see what operations (plan steps) cause the “cell” wait events to occur. Here is a query against DBA_HIST_ACTIVE_SESS_HISTORY on an active production system running on an Exadata V2 that shows cell events and the operations that caused them:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell %')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
---------------------------------- ---------------------------------------- ----------
cell list of blocks physical read 62
DDL STATEMENT 2
INDEX FAST FULL SCAN 1
INDEX RANGE SCAN 3060
INDEX STORAGE FAST FULL SCAN 7
INDEX STORAGE SAMPLE FAST FULL SCAN 10
INDEX UNIQUE SCAN 1580
INSERT STATEMENT 6
TABLE ACCESS BY GLOBAL INDEX ROWID 151
TABLE ACCESS BY INDEX ROWID 5458
TABLE ACCESS BY LOCAL INDEX ROWID 131
TABLE ACCESS STORAGE FULL 183
TABLE ACCESS STORAGE SAMPLE 2
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 1
cell multiblock physical read 3220
DDL STATEMENT 157
INDEX FAST FULL SCAN 94
INDEX RANGE SCAN 2
INDEX STORAGE FAST FULL SCAN 6334
INDEX STORAGE SAMPLE FAST FULL SCAN 429
UNIQUE SCAN 2
VIEW ACCESS STORAGE FULL 634
MAT_VIEW ACCESS STORAGE SAMPLE 56
TABLE ACCESS BY GLOBAL INDEX ROWID 5
TABLE ACCESS BY INDEX ROWID 484
TABLE ACCESS BY LOCAL INDEX ROWID 3
TABLE ACCESS STORAGE FULL 41559
TABLE ACCESS STORAGE SAMPLE 1763
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 78
UPDATE 4
cell single block physical read 181186
BUFFER SORT 1
CREATE TABLE STATEMENT 67
DDL STATEMENT 985
DELETE 11204
DELETE STATEMENT 6
FIXED TABLE FIXED INDEX 352
FOR UPDATE 27
HASH GROUP BY 3
HASH JOIN 14
HASH JOIN RIGHT OUTER 1
INDEX BUILD NON UNIQUE 80
INDEX BUILD UNIQUE 6
INDEX FAST FULL SCAN 9
INDEX FULL SCAN 1101
INDEX RANGE SCAN 17597
INDEX RANGE SCAN (MIN/MAX) 1
INDEX RANGE SCAN DESCENDING 6
INDEX SKIP SCAN 691
INDEX STORAGE FAST FULL SCAN 313
INDEX STORAGE SAMPLE FAST FULL SCAN 72
INDEX UNIQUE SCAN 30901
INSERT STATEMENT 5174
LOAD AS SELECT 120
LOAD TABLE CONVENTIONAL 5827
MAT_VIEW ACCESS STORAGE FULL 3
MAT_VIEW ACCESS STORAGE SAMPLE 1
MERGE 12
PX COORDINATOR 1
SELECT STATEMENT 978
SORT CREATE INDEX 1
SORT GROUP BY 1
SORT JOIN 5
SORT ORDER BY 2
TABLE ACCESS BY GLOBAL INDEX ROWID 5812
TABLE ACCESS BY INDEX ROWID 65799
TABLE ACCESS BY LOCAL INDEX ROWID 4591
TABLE ACCESS BY USER ROWID 464
TABLE ACCESS CLUSTER 57
TABLE ACCESS STORAGE FULL 7168
TABLE ACCESS STORAGE SAMPLE 205
TABLE ACCESS STORAGE SAMPLE BY ROWID RAN 24
UNION-ALL 7
UPDATE 89353
UPDATE STATEMENT 367
WINDOW CHILD PUSHED RANK 2
WINDOW SORT 1
WINDOW SORT PUSHED RANK 1
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
cell smart incremental backup 714
cell smart index scan 14
INDEX STORAGE FAST FULL SCAN 42
INDEX STORAGE SAMPLE FAST FULL SCAN 32
cell smart table scan 163
MAT_VIEW ACCESS STORAGE FULL 1
TABLE ACCESS STORAGE FULL 12504
Obviously this output does not show all the possible combinations, but it should give you an idea of the relative frequency of events and which operations generally motivate them.
The User I/O class is far and away the most important for Exadata. The most interesting events in this category are of course the two Smart Scan events (cell smart table scan and cell smart index scan). These are the events that record time for the primary Offloading optimizations provided by Exadata, which include predicate filtering, column projection, and storage index usage. The User I/O class also contains three events described as physical I/O events. These three events actually measure time for physical I/O using the more familiar multi-block and single-block read mechanisms we're used to seeing on non-Exadata platforms, although their names have been changed to something a little more meaningful. Finally, there are two events that don't really seem to belong in the User I/O category at all. One has to do with initialization of blocks when file space is allocated. The other has to do with gathering statistical information from the storage cells. We'll cover each of these wait events in turn in the next several sections, starting with the Smart Scan events.
The cell smart table scan event is what Oracle uses to account for time spent waiting for Full Table Scans that are Offloaded. It is the most important new event on the Exadata platform. Its presence or absence can be used to verify whether a statement benefited from Offloading or not. As discussed in Chapter 2, Offloading only occurs when Oracle is able to do direct path reads. Consequently, this event replaces the direct path read event in most cases on Exadata. As with normal direct path reads, data is returned directly to the PGA of the requesting process on the database server (either the user's shadow process or a parallel slave process). Blocks are not returned to the buffer cache.
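Besides watching for this wait event, one way to check whether a statement benefited from Offloading is to look at the offload-related columns of V$SQL (available in 11.2). This is a sketch; substitute the SQL_ID of the statement you are investigating:

```sql
-- Compare bytes eligible for offload with bytes actually shipped
-- over the interconnect for a given statement
select sql_id,
       io_cell_offload_eligible_bytes eligible_bytes,
       io_interconnect_bytes          interconnect_bytes
from   v$sql
where  sql_id = '&sql_id';
```

A nonzero IO_CELL_OFFLOAD_ELIGIBLE_BYTES, combined with an interconnect byte count well below the eligible bytes, indicates that the cells filtered and projected data before returning it.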
Although the mechanism for performing reads across the InfiniBand network is very different from that for normal reads on non-Exadata platforms, the code path driving the smart scans is actually very similar to a direct path read on a non-Exadata platform. The main difference lies in the fact that each request to a storage cell contains a reference to the metadata of the statement, which in the case of Exadata includes the predicates and the list of columns to be returned. Since the storage cells have access to this information, they can apply the filters and do the column projection before returning the data to the requesting process. These optimizations are applied to each set of blocks as they are requested. The processes on the database servers that request the data have access to the ASM extent map and can therefore request the required allocation units (AUs) from each storage cell. The storage cells read the requested AU and apply the predicate filters and, if any rows satisfy the filters, return the projected columns to the requesting process. The process then requests the next AU and the entire routine is repeated until all the data has been scanned. So this event will occur repeatedly in a large scan.
Note Column Projection is one of the major optimizations provided by Smart Scans. The feature is slightly misunderstood. It does not pass only columns in the select list back to the database servers; it also passes back some columns from the WHERE clause. Older versions of cellsrv passed all columns specified in a WHERE clause back to the database tier. Later versions have corrected this behavior to include only columns that are involved in join predicates.
Here's a portion of a 10046 trace file showing some of the events. The query generating this trace file did a full table scan which was performed via the direct read mechanism. (Note that we've shortened some of the lines to get them to fit, primarily by stripping off the time values):
PARSING IN CURSOR #2 len=32 dep=0 uid=0 oct=3 lid=0 hv=123 ad='196' sqlid='162wjnvwyybhn'
select sum(pk_col) from kso.skew
END OF STMT
PARSE #2:c=2000,e=2424,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=568322376
EXEC #2:c=0,e=34,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='KJC: Wait for msg sends to complete' ela= 10 msg=6674450368 dest|rcvr=65536
WAIT #2: nam='reliable message' ela= 1107 channel context=6712270872 channel handle=66967991
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 104 name|mode=126 2=65575 0=1 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 103 name|mode=126 2=65575 0=2 obj#=-1
WAIT #2: nam='cell smart table scan' ela= 162 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 244 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 181 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1285 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1327 cellhash#=379339958 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 1310 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 19755 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 39 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 24 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 847 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 44 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 674 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 32 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 39 cellhash#=2133459483 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 928 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 37 cellhash#=3176594409 p2=0 p3=0 obj#=66849
WAIT #2: nam='cell smart table scan' ela= 97 cellhash#=379339958 p2=0 p3=0 obj#=66849
This portion of the trace file also shows the enq: KO - fast object checkpoint event, which is used to ensure that any dirty blocks for the scanned object are flushed to disk prior to beginning the scan. By the way, the direct path read event is not completely eliminated on Exadata platforms. In fact, we can use a hint to disable Offloading and see how the same statement behaves without Offloading.
PARSING IN CURSOR #2 len=84 dep=0 uid=0 oct=3 lid=0 hv=123 ad='196' sqlid='cczz6y6rtz4bn'
select /*+ OPT_PARAM('cell_offload_processing' 'false') */ sum(pk_col) from kso.skew
END OF STMT
PARSE #2:c=26995,e=33582,p=1,cr=100,cu=0,mis=1,r=0,dep=0,og=1,plh=568322376
EXEC #2:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=568322376
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 3 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='KJC: Wait for msg sends to complete' ela= 13 msg=66 dest|rcvr=65536 mtype=12
WAIT #2: nam='reliable message' ela= 1355 channel context=6712270872 channel
handle=6696795216 broadcast message=6741335856 obj#=-1 tim=1285455676264108
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 172 name|mode=126 2=65573 0=1 obj#=-1
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 221 name|mode=126 2=65573 0=2 obj#=-
WAIT #2: nam='kfk: async disk IO' ela= 14 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='direct path read' ela= 23746 file number=7 first dba=76322 block cnt=94
obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 28 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 20 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 11 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='direct path read' ela= 18566 file number=7 first dba=76800 block cnt=128
obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 19 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 20 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='direct path read' ela= 35083 file number=7 first dba=77184 block cnt=128
obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 19 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 22 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=66849
WAIT #2: nam='kfk: async disk IO' ela= 5 count=1 intr=0 timeout=4294967295 obj#=66849
Note that we still have the enq: KO - fast object checkpoint events for flushing dirty blocks. So it is clear that the cell smart table scan event replaces the direct path read event (along with the associated kfk: async disk IO events), not the checkpoint-related events.
The parameters for this event are not particularly informative. Only the cell hash number and the object ID of the table being scanned are provided:

P1 - Cell hash number
P2 - Not used
P3 - Not used
obj# - The object number of the table being scanned
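You can confirm the parameter names for any wait event by querying V$EVENT_NAME directly; for example:

```sql
-- Show the documented parameter names for the smart table scan event
select name, parameter1, parameter2, parameter3
from   v$event_name
where  name = 'cell smart table scan';
```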
You'll notice that the direct path read event (which cell smart table scan replaces) provides additional information, including the file number, the offset into the file (first dba), and the number of contiguous blocks read (block cnt). On the other hand, with the direct path read event, there is no indication of how the read requests are routed to the individual cells.
Note The cell hash number reported with many of the wait events can be found in the V$CELL view. This view has only two columns, CELL_PATH and CELL_HASHVAL. The CELL_PATH column actually contains the IP address of the storage cell.
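For example, to translate a cellhash# value from a trace file into the address of the storage cell that serviced the request:

```sql
-- Map a cellhash# from a 10046 trace to the storage cell's address
-- (hash value taken from the trace excerpt above)
select cell_path, cell_hashval
from   v$cell
where  cell_hashval = 2133459483;
```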
Time is clocked to the cell smart index scan event when Offloaded fast full index scans are performed. This event is analogous to cell smart table scan, except that the object being scanned is an index. It replaces the direct path read event and returns data directly to the PGA of the requesting process as opposed to the buffer cache.
This event does not show up very often on the systems we have observed.
Kevin says: A Smart Scan on an index is possible without parallelism, but the rules are strict. If you alter the session to disable parallel query and set _serial_direct_read=TRUE in the session, and then run your test, you should get a smart, serial, index fast full scan (IFFS). If not, then you're looking at a bug. In short, to get a serial IFFS the object needs to be more than five times the size of the SGA.
Gluing range scans to smart index scans can be done, but it's very messy. There likely isn't much to be gained there in terms of completing a query.
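Based on Kevin's comments, a serial Smart Scan test could be set up as follows. Note that _serial_direct_read is a hidden parameter, so this sketch is appropriate for test environments only:

```sql
-- Force serial direct path reads in the current session (test systems only)
alter session set "_serial_direct_read" = true;
alter session disable parallel query;
-- Now run the index fast full scan and check the session's wait
-- events for 'cell smart index scan'
```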
As a result of these issues, cell smart index scans are fairly rare compared to cell smart table scans. It is, of course, possible to encourage the feature with hints (such as parallel_index) or by decorating specific indexes with a parallel degree setting greater than 1. Here's an excerpt from a 10046 trace file showing the event:
PARSING IN CURSOR #1 len=112 dep=1 uid=0 oct=3 lid=0 hv=123 ad='197' sqlid='dnmumhk9599p7'
select /*+ parallel_index (a skew2_all 8) index_ffs (a skew2_all) */ sum(pk_col) from kso.skew2 a where col1 > 1
END OF STMT
PARSE #1:c=0,e=89,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2801817111,tim=1285372448705192
WAIT #1: nam='PX Deq: Execution Msg' ela= 3124 sleeptime/senderid=268 passes=1 p3=0 obj#=-1
WAIT #1: nam='cell smart index scan' ela= 133 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 522 cellhash#=2133459483 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 277 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 1910 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 330 cellhash#=2133459483 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 348 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 45162 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 13 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 207 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 30 cellhash#=379339958 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 779 cellhash#=3176594409 p2=0 p3=0 obj#=66856
WAIT #1: nam='cell smart index scan' ela= 30 cellhash#=2133459483 p2=0 p3=0 obj#=66856
Note that this trace file was produced by one of the parallel slave processes and not the requesting process. The trace produced for the same statement when Offloading is disabled should look familiar. Here's an excerpt:
PARSING IN CURSOR #1 len=158 dep=1 uid=0 oct=3 lid=0 hv=338 ad='196' sqlid='g2966dwa2xfm8'
select /*+ OPT_PARAM('cell_offload_processing' 'false') parallel_index (a skew2_all 8) index_ffs (a skew2_all) */ sum(pk_col) from kso.skew2 a where col1 > 1
END OF STMT
PARSE #1:c=0,e=94,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2801817111,tim=1285546990415744
WAIT #1: nam='PX Deq: Execution Msg' ela= 2377 sleeptime/senderid=268566527 passes=1 p3=0
WAIT #1: nam='latch: cache buffers chains' ela= 44 address=6636639304 number=150 tries=0
WAIT #1: nam='kfk: async disk IO' ela= 12 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 28957 file number=6 first dba=13 block cnt=124 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 32 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 8 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 1410 file number=6 first dba=1304064 block cnt=128 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 21 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 23 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 32 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 28 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 21 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 25 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 31 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='direct path read' ela= 16725 file number=6 first dba=1303171 block cnt=125 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 24 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 27 count=1 intr=0 timeout=4294967295 obj#=66856
WAIT #1: nam='kfk: async disk IO' ela= 8 count=1 intr=0 timeout=4294967295 obj#=66856
Again we see the repeating pattern of a single direct path read event followed by several kfk: async disk IO events. The cell smart index scan event replaces both of these events. The enq: KO - fast object checkpoint events for flushing dirty blocks prior to starting the direct path reads are still present (although not shown in this excerpt, because they occur in the query coordinator process, not in the parallel slave processes).
Just as with the cell smart table scan event, the parameters for cell smart index scan do not contain a lot of detail. The cell hash number and the object ID of the segment being scanned are the only information provided.
P1 - Cell hash number
P2 - Not used
P3 - Not used
obj# - The object number of the index being scanned
The cell single block physical read event is equivalent to the db file sequential read event used on non-Exadata platforms. Single-block reads are used most often for index access paths (both the index block reads and the table block reads via rowids from the index lookups). They can also be used for a wide variety of other operations where it makes sense to read a single block.
Here is the output of a query that shows the operations that resulted in the cell single block physical read wait event on an active production system:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell single%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
--------------------------------- ---------------------------------------- ----------
cell single block physical read 13321
CREATE TABLE STATEMENT 35
DDL STATEMENT 118
DELETE 269
FIXED TABLE FIXED INDEX 3
FOR UPDATE 2
HASH JOIN 4
HASH JOIN RIGHT OUTER 8
INDEX FULL SCAN 9283
INDEX FULL SCAN (MIN/MAX) 1
INDEX RANGE SCAN 2763
INDEX STORAGE FAST FULL SCAN 6
INDEX STORAGE SAMPLE FAST FULL SCAN 13
INDEX UNIQUE SCAN 1676
INSERT STATEMENT 1181
LOAD AS SELECT 6
LOAD TABLE CONVENTIONAL 92
MERGE 106
SELECT STATEMENT 41
SORT ORDER BY 6
TABLE ACCESS BY GLOBAL INDEX ROWID 10638
TABLE ACCESS BY INDEX ROWID 8714
TABLE ACCESS BY LOCAL INDEX ROWID 10446
TABLE ACCESS CLUSTER 12
TABLE ACCESS STORAGE FULL 776
TABLE ACCESS STORAGE SAMPLE 40
UPDATE 8116
As you can see, row access via an index is the most common operation that generates this event. You should also be aware that Exadata provides a large amount of flash cache (384GB) on each storage cell. For that reason, physical reads (both multi-block and single-block) are considerably faster than on most disk-based storage systems. Here is an excerpt from an AWR report showing a histogram of single-block read wait times for the instance:
                                           % of Waits
                                  -----------------------------------------
                           Total
Event                      Waits  <1ms <2ms <4ms <8ms <16ms <32ms <=1s  >1s
-------------------------- ------ ---- ---- ---- ---- ----- ----- ---- ----
cell single block physical  2940K 94.4  3.2   .3   .6    .9    .5   .2   .0
Notice that about 95% of the cell single block physical read events completed in less than 1ms. This is fairly representative of several production systems that we've observed.
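Similar latency information is available without generating an AWR report; the V$EVENT_HISTOGRAM view shows the same breakdown accumulated since instance startup:

```sql
-- Wait time histogram for single block reads since instance startup
select wait_time_milli, wait_count
from   v$event_histogram
where  event = 'cell single block physical read'
order  by wait_time_milli;
```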
The cell single block physical read event provides more information than most cell events. The parameters allow you to tell exactly which object was read, along with the disk and cell where the block was stored.
P1 - Cell hash number
P2 - Disk hash number
P3 - Total bytes passed during the read operation (always 8192, assuming an 8K block size)
obj# - The object number of the object being read
The cell multiblock physical read event is another renamed event. It is equivalent to the less clearly named db file scattered read event. On non-Exadata platforms, Oracle Database 11gR2 still uses the db file scattered read event whenever it issues a contiguous multi-block read to the operating system.
This event is generally used with Full Table Scans and Fast Full Index Scans, although it can show up with many other operations. The new name on the Exadata platform is much more descriptive than the older name. This wait event is not nearly as prevalent on Exadata platforms as on non-Exadata platforms, because Exadata handles many full scan operations with Smart Scans, which have their own wait events (cell smart table scan and cell smart index scan). The cell multiblock physical read event on Exadata platforms is used for serial full scan operations on tables that are below the threshold for serial direct path reads. That is to say, you will see this event most often on full scans of relatively small tables. It is also used for Fast Full Index Scans that are not executed with direct path reads. Here is the output of a query that shows the operations that resulted in the cell multiblock physical read wait event on an active production system:
EVENT OPERATION COUNT(*)
-------------------------------- ---------------------------------------- ----------
cell multiblock physical read 764
DDL STATEMENT 28
INDEX FAST FULL SCAN 2
INDEX STORAGE FAST FULL SCAN 657
INDEX STORAGE SAMPLE FAST FULL SCAN 133
TABLE ACCESS BY INDEX ROWID 74
TABLE ACCESS BY LOCAL INDEX ROWID 1428
TABLE ACCESS STORAGE FULL 5046
TABLE ACCESS STORAGE SAMPLE 916
The cell multiblock physical read event also provides more information than most cell events. The parameters in the following list allow you to tell which object was read and identify the disk and cell where the blocks were stored. The total bytes passed should be a multiple of the block size.
P1 - Cell hash number
P2 - Disk hash number
P3 - Total bytes passed during the read operation
obj# - The object number of the object being read
The cell list of blocks physical read event is a replacement for the db file parallel read event. It appears that the developers took the opportunity to rename some of the events related to disk operations, and this is one of them. The new name is actually much more descriptive than the old one, since the wait event has nothing whatsoever to do with parallel query or parallel DML.
This event is used for a multi-block read of noncontiguous blocks. This is more effective with async I/O, which is enabled on Exadata by default. This event can be provoked by several operations. The most common are Index Range Scans, Index Unique Scans and Table Access By Index Rowid. The most common reason for the event is index pre-fetching. Here is the output of a query that shows the operations that resulted in this wait event on an active production system:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell list%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
----------------------------------- ---------------------------------------- --------
cell list of blocks physical read 2
INDEX RANGE SCAN 156
INDEX STORAGE FAST FULL SCAN 1
INDEX UNIQUE SCAN 66
TABLE ACCESS BY GLOBAL INDEX ROWID 90
TABLE ACCESS BY INDEX ROWID 1273
TABLE ACCESS BY LOCAL INDEX ROWID 2593
TABLE ACCESS STORAGE FULL 20
TABLE ACCESS STORAGE SAMPLE 1
As you can see, the vast majority of these events were motivated by index access paths. By the way, on non-Exadata platforms, noncontiguous multi-block reads still clock time to the old db file parallel read event. It's also possible for this older wait event name to show up on an Exadata platform for some operations.
The cell list of blocks physical read event provides more information than most cell events. The following parameters allow you to tell exactly which object was read, along with identifying the disk and cell where the block was stored.
P1 - Cell hash number
P2 - Disk hash number
P3 - Number of blocks read
obj# - The object number of the object being read
Exadata has an optimization technique that allows the storage cells to do the initialization of blocks when a data file is created or extended. This occurs when a tablespace is created or a data file is manually added to a tablespace. However, it can also occur when a data file is automatically extended during DML operations.
We previously stated that this event seemed out of place in the User I/O class. However, since it can occur because of DML operations, it makes sense to have it in this category. At any rate, offloading the block formatting eliminates CPU usage and I/O on the database servers and moves the work to the storage tier. When this occurs, time is collected in the cell smart file creation event. This event replaces the Data file init write event that is still used on non-Exadata platforms. Here is the output of a query from a busy production system showing operations that generated this event:
SYS@EXDB1> select event, operation, count(*) from (
2 select sql_id, event, sql_plan_operation||' '||sql_plan_options operation
3 from DBA_HIST_ACTIVE_SESS_HISTORY
4 where event like 'cell smart file%')
5 group by operation, event
6 order by 1,2,3
7 /
EVENT OPERATION COUNT(*)
-------------------------- ------------------------- --------
cell smart file creation 35
DELETE 3
INDEX BUILD NON UNIQUE 5
LOAD AS SELECT 3
LOAD TABLE CONVENTIONAL 1
UPDATE 1
You'll notice that on this particular system, the cell smart file creation
event was occasionally generated by a DELETE
statement. The fact that a DELETE
could cause this event might be a little surprising. But remember that this event is actually timing a section of code that does block formatting, not file creation.
The only parameter of interest for this event is P1
, which shows which cell was being accessed when this event was generated.
P1 - Cell hash number
P2 - Not used
P3 - Not used
The cell statistics gather
event records time spent reading from various V$
and X$
tables. Although the event is grouped in the User I/O category, it does not refer to I/O in the sense of reading and writing to and from disk.
Time is clocked to this event when a session is reading from V$CELL
, V$CELL_THREAD_HISTORY (X$KCFISOSSN)
, V$CELL_REQUEST_TOTALS (X$KCFISOSST)
, and a few other X$
tables in the same family. The event is miscategorized in our opinion and does not really belong in the User I/O category.
The parameters for this event provide no additional information. In fact, values are not even set for the parameters in this event. Here are the parameter definitions:
P1 - Cell hash number (always 0)
P2 - Not used
P3 - Not used
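One easy way to see this event for yourself is to trace your own session while querying one of the cell-related views. A minimal sketch using DBMS_MONITOR:

```sql
-- Enable wait tracing for the current session (10046-style trace)
exec dbms_monitor.session_trace_enable(waits => TRUE)

-- Reading a cell-related V$ view clocks time to 'cell statistics gather'
select count(*) from v$cell_thread_history;

exec dbms_monitor.session_trace_disable
-- The trace file should now contain lines like:
--   WAIT #...: nam='cell statistics gather' ela= ... cellhash#=0 ...
```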
The Exadata wait events that are assigned to the System I/O class are of less importance and do not generally show up as major consumers of time. The backup events are the most interesting as they record time for sections of code that have been optimized on the Exadata platform. The others are simply housekeeping events. The non-backup events are listed in Table 12.1, while the backup events are detailed in the following sections.
This event measures time spent waiting on RMAN during an incremental Level 1 backup. Exadata optimizes incremental backups by offloading much of the processing to the storage tier, and this wait event was added to account for time spent waiting on that offloaded processing.
Interestingly, an Incremental Level 0 backup does not result in this wait event even though the word “incremental” is in the RMAN command. That is because the Level 0 backup doesn't actually do an incremental backup at all. It generates a full backup that is flagged as a baseline for future incremental backups. Here's an excerpt from a 10046 trace file generated on a process that was doing an incremental Level 1 backup:
WAIT #0: nam='cell smart incremental backup' ela= 27 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 54 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 119 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 603 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 120 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='control file sequential read' ela= 378 file#=0 block#=1 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 378 file#=1 block#=1 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 254 file#=0 block#=15 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 332 file#=0 block#=17 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 268 file#=0 block#=309 blocks=1 obj#=-1
WAIT #0: nam='control file sequential read' ela= 15794 file#=0 block#=307 blocks=1 obj#=-1
WAIT #0: nam='control file parallel write' ela= 351 files=2 block#=310 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 404 files=2 block#=18 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 211 files=2 block#=16 requests=2 obj#=-1
WAIT #0: nam='control file parallel write' ela= 294 files=2 block#=1 requests=2 obj#=-1
WAIT #0: nam='control file sequential read' ela= 944 file#=0 block#=1 blocks=1 obj#=-1
WAIT #0: nam='KSV master wait' ela= 13352 p1=0 p2=0 p3=0 obj#=-1 tim=1285518586442775
WAIT #0: nam='ASM file metadata operation' ela= 32 msgop=41 locn=0 p3=0 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 333 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='Disk file operations I/O' ela= 67 FileOperation=3 fileno=1 filetype=10 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 392 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 95 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 169 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 104 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 128 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 115 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart incremental backup' ela= 722 cellhash#=3176594409 p2=0 p3=0 obj#=-1
The only parameter used for this event is P1
, which shows which cell was responsible for generating the event.
P1 - Cell hash number
P2 - Not used
P3 - Not used
obj# - Not used
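A quick way to gauge how much time an instance has spent on offloaded incremental backups is to query the cumulative counters in V$SYSTEM_EVENT. A simple sketch:

```sql
-- Cumulative time clocked to offloaded incremental backup processing
-- since instance startup
select event, total_waits,
       round(time_waited_micro / 1000000, 1) as seconds
  from v$system_event
 where event = 'cell smart incremental backup';
```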
Note The obj# field is a part of many wait events, even some that are not specifically related to an individual object. Be aware that in some cases, the value may be set by one event and then not cleared appropriately when the wait ends, resulting in meaningless values left in place for the next wait event. In the previous example, the obj# was cleared (set to a value of -1).
This event is used to measure time spent waiting on RMAN when doing a restore. Exadata optimizes RMAN restores by offloading processing to the storage cells.
The event actually records time related to file initialization during a restore. Here's an excerpt from a 10046 trace file taken while a restore was in progress:
The input backup piece +RECO/test/backupset/2010_09_26/nnndn0_tag20100926t111118_0.1263.730725079 is in BASIC 11.2 compressed format.
WAIT #0: nam='kfk: async disk IO' ela= 98 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 757 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 94 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 20 count=1 intr=256 timeout=214748364 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 22 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 15 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 10 count=1 intr=256 timeout=2147483647 obj#=-
WAIT #0: nam='RMAN backup & recovery I/O' ela= 14 count=1 intr=256 timeout=2147483647 obj#=-
. . .
WAIT #0: nam='RMAN backup & recovery I/O' ela= 1364 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 16 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 30 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 4783 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 5 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='RMAN backup & recovery I/O' ela= 2952 count=1 intr=256 timeout=2147483647
WAIT #0: nam='kfk: async disk IO' ela= 14 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 33 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 62 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 92 count=1 intr=0 timeout=4294967295 obj#=-1
WAIT #0: nam='kfk: async disk IO' ela= 12 count=1 intr=0 timeout=4294967295 obj#=-1
. . .
WAIT #0: nam='cell smart restore from backup' ela= 43249 cellhash#=379339958 p2=0 p3=0 obj#=
WAIT #0: nam='cell smart restore from backup' ela= 18141 cellhash#=213345948 p2=0 p3=0 obj#=
WAIT #0: nam='cell smart restore from backup' ela= 430 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 378 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 15 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 1355 cellhash#=213345948 p2=0 p3=0 obj#=-
WAIT #0: nam='cell smart restore from backup' ela= 766 cellhash#=2133459483 p2=0 p3=0 obj#=-
WAIT #0: nam='cell smart restore from backup' ela= 167 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 199 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 19 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 226 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 127 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 110 cellhash#=213345948 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 177 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #0: nam='cell smart restore from backup' ela= 160 cellhash#=213345948 p2=0 p3=0 obj#=-1
The only parameter used for this event is P1, which shows which cell was responsible for generating the event.
P1 - Cell hash number
P2 - Not used
P3 - Not used
These are relatively minor events that occur primarily during startup and shutdown of storage cells and under fault conditions. You will probably not see them on normally functioning systems. There is one exception: the cell smart flash unkeep
event. Table 12.2 lists the “cell” wait events in the Other class along with their parameters. A separate section covers cell smart flash unkeep
.
This event records the time spent waiting when Oracle must flush blocks out of Exadata Smart Flash Cache. This can occur when a table that has a storage clause designating that it be pinned in Exadata Smart Flash Cache is truncated or dropped.
Truncating and dropping tables generally doesn't happen very often in most production systems (with some notable exceptions), so this wait event will probably not be seen very often. Here's an excerpt from a 10046 trace file showing the event:
WAIT #4: nam='enq: RO - fast object reuse' ela= 393 name|mode=1380909062 2=65581 0=1 obj#=-1
WAIT #4: nam='reliable message' ela= 1548 channel context=6712270872 channel handle=6696807856
broadcast message=6741342984 obj#=-1 tim=1286218926420916
WAIT #4: nam='enq: RO - fast object reuse' ela= 21027 name|mode=1380909 2=65581 0=1 obj#=-1
WAIT #4: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1
WAIT #4: nam='enq: RO - fast object reuse' ela= 159 name|mode=1380909057 2=65581 0=2 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 336 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 291 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 319 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 418 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 1 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 216 cellhash#=3176594409 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 222 cellhash#=379339958 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 196 cellhash#=2133459483 p2=0 p3=0 obj#=-1
WAIT #4: nam='cell smart flash unkeep' ela= 216 cellhash#=3176594409 p2=0 p3=0 obj#=-1
Note that the cell smart flash unkeep
event is preceded by a handful of enq: RO – fast object reuse
events, which mark time associated with cleaning up the buffer cache after a drop or truncate. The cell smart flash unkeep
event is essentially an extension of that cleanup to the Exadata Smart Flash Cache on the storage servers as well.
The only parameter used for this event is P1
, which shows which cell was responsible for generating the event.
P1 - Cell hash number
P2 - Not used
P3 - Not used
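If you want to know in advance which segments can trigger this event, the KEEP setting is visible in the data dictionary. A minimal sketch against DBA_TABLES (the CELL_FLASH_CACHE column exists as of 11.2):

```sql
-- Tables pinned in Exadata Smart Flash Cache; dropping or truncating
-- one of these is what triggers 'cell smart flash unkeep'
select owner, table_name, cell_flash_cache
  from dba_tables
 where cell_flash_cache = 'KEEP';
```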
In addition to the new cell events, there are a few non-Exadata-specific wait events that you should be aware of. You may already be familiar with them from managing Oracle on other platforms. They are also important in an Exadata environment, so they represent cases in which your existing knowledge and skills carry over and stand you in good stead as you move into managing Exadata.
Direct path reads are used by Oracle to read data directly into PGA memory (instead of into the buffer cache). They are an integral part of Exadata Offloading because SQL processing can only be offloaded to the storage cells when the direct path read mechanism is used. The direct path read
wait event is actually replaced by the cell smart table scan
and cell smart index scan
wait events when a query is offloaded. However, the direct path read mechanism is still used by the code covered by those new wait events. That is, either the plan must include a parallel scan or Oracle must decide to use the serial direct path read mechanism.
This event records time that Oracle spends waiting on a direct path read to complete. You should know that the direct path read wait event can be very misleading. Both the number of events recorded and the timings associated with them can appear to be inaccurate. This is due to the fact that direct path reads are done in an asynchronous and overlapping fashion. See MOS note 50415.1 for further details about why time recorded by this event can be misleading. It also bears mentioning that 11gR2 contains an enhancement that causes serial direct path reads to occur more frequently than in previous releases. See MOS Note 793845.1, which briefly mentions this change.
By the way, the direct path read
wait event does still show up on the Exadata platform for various operations, but generally not for full table scans unless the table (or partition) is relatively small.
The parameters for this event show you exactly which segment (obj) is scanned and which set of blocks were scanned during this event.
P1 - File number
P2 - First DBA
P3 - Block count
obj# - The object number of the table being scanned
As mentioned in the Cell Smart Table Scan section, the parameters contain specific information about which file and object are being accessed. The offset into the file is also provided in the P2 parameter, along with the number of contiguous blocks read in the P3 parameter.
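Since offloaded scans clock time to the cell smart events while non-offloaded scans clock time to direct path read, comparing the two in V$SYSTEM_EVENT gives a rough indication of whether full scans are being offloaded. A simple sketch:

```sql
-- On Exadata, most full-scan time should appear under the cell smart
-- events if Smart Scans are occurring as expected
select event, total_waits,
       round(time_waited_micro / 1000000, 1) as seconds
  from v$system_event
 where event in ('direct path read',
                 'cell smart table scan',
                 'cell smart index scan')
 order by time_waited_micro desc;
```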
The enq:KO
event has a very strange name. Don't be put off by that. The event is essentially an object checkpoint
event. The V$LOCK_TYPE
view describes the KO lock as follows:
SYS@EXDB1> select type, name, description from v$lock_type
2* where type = 'KO';
TYPE NAME DESCRIPTION
----- ------------------------------ ---------------------------------------------
KO Multiple Object Checkpoint Coordinates checkpointing of multiple objects
This event is used when a session is waiting for all dirty blocks to be flushed from the buffer cache for an object prior to starting a direct path read
or cell smart table scan
or cell smart index scan
. This event is important because the time required to do the checkpoint may outweigh the benefit of the direct path reads. This is unlikely on Exadata storage, though, where the additional Smart Scan benefits are only enabled by the direct path read mechanism.
The parameters for this event are not overly helpful but the event does show which object is scanned. Here are the parameter definitions:
P1 - Name/Mode
P2 - Not used
P3 - Not used
obj# - The object number of the object being checkpointed
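Because the obj# is set for this event, Active Session History can show which objects were being checkpointed ahead of direct reads. A sketch using V$ACTIVE_SESSION_HISTORY (the CURRENT_OBJ# column carries the object number):

```sql
-- Which objects have recently required a checkpoint before a
-- direct path read or Smart Scan could begin
select o.owner, o.object_name, count(*) as samples
  from v$active_session_history a, dba_objects o
 where a.event = 'enq: KO - fast object checkpoint'
   and a.current_obj# = o.object_id
 group by o.owner, o.object_name
 order by samples desc;
```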
The reliable message
event is used to record time spent communicating with background processes, like the checkpoint process (CKPT). We have included it here because of its close association with the enq: KO – fast object checkpoint
event.
This event is the precursor to the enq: KO – fast object checkpoint
event (among others). The communication is done using an inter-process communication channel rather than the more usual post mechanism. This method allows the sender to request an acknowledgment before it continues, hence the name reliable message. It is generally a very short-duration event, as it records only the time spent communicating between processes. Both the user's foreground process and the CKPT process will wait on this event as they communicate with each other. Here's an excerpt of a 10046 trace file showing a complete reliable message event:
PARSING IN CURSOR #46963980936744 len=50 dep=0 uid=0 oct=3 lid=0 tim=1301885220743528 hv=3032626544 ad='2cf675ff0' sqlid='7y09dtyuc4dbh'
select avg(pk_col) from kso.skew2 a where col1 > 0
END OF STMT
PARSE #46963980936744:c=1999,e=2122,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2117817910,tim=1301885220743527
WAIT #46963980936744: nam='reliable message' ela= 1360 channel context=11888341784 channel handle=12088585896 broadcast message=12089037216 obj#=75759 tim=1301885220745397
WAIT #469: nam='enq: KO - fast object checkpoint' ela= 183 ... obj#=75759 tim=130
WAIT #469: nam='enq: KO - fast object checkpoint' ela= 144 ... obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 200 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 58333 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 101705 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 276 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102988 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 103495 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 237 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102655 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102329 p1=0 p2=0 p3=0 obj#=75759 tim=130
WAIT #469: nam='enq: PS - contention' ela= 313 ... obj#=75759 tim=130
WAIT #469: nam='os thread startup' ela= 102673 p1=0 p2=0 p3=0 obj#=75759 tim=130
Here are the parameters for the reliable message event.
P1 - Channel context
P2 - Channel handle
P3 - Broadcast message
obj# - The object number of the object of interest (not always set)
Finally, there are a few Resource Manager events that you should be aware of. While these are not specific to Exadata, Resource Manager provides key functionality for combining mixed workloads on Exadata. There are actually eight separate events as of release 11.2.0.2. The following query against V$EVENT_NAME shows these events and their parameters.
SYS@SANDBOX1> @valid_events
Enter value for event_name: resmgr%
EVENT CLASS P1 P2 P3
------------------------------ --------------- -------- -------------------- ----
resmgr:internal state change Concurrency location
resmgr:sessions to exit location
resmgr:internal state cleanup Other location
resmgr:become active Scheduler location
resmgr:cpu quantum location consumer group id
resmgr:large I/O queued location
resmgr:pq queued location
resmgr:small I/O queued location
8 rows selected.
Only two of these events are of much interest.
This event is used to record forced idle time imposed by Database Resource Manager (DBRM) due to competition with higher priority work. Said another way, it is the time a process spent waiting for DBRM to allocate it a time slice.
DBRM behaves analogously to CPU scheduling algorithms in that it divides time into units (quanta) and either allows a process to run or not, depending on the other workload on the system. Unlike OS scheduling algorithms, though, DBRM throttling is interjected at key locations in the Oracle code to eliminate the possibility of a process being kicked off the CPU while it is holding a shared resource such as a latch. This prevents some nasty behavior that can occur on heavily loaded systems, such as priority inversion. In effect, processes voluntarily go to sleep when they are not holding these shared resources. There are multiple locations in the code where these checks are implemented. Here's an excerpt of a 10046 trace file showing the resmgr:cpu quantum
event:
PARSING IN CURSOR #47046073104400 len=34 dep=0 uid=85 oct=3 lid=85 tim=1301275393323414
hv=3308960238 ad='b3c9faf8' sqlid='c4js15z2mpfgf'
select count(*)
from kso.skew
END OF STMT
PARSE #470460:c=0,e=77,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=937687140,tim=130
EXEC #470460:c=0,e=49,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=937687140,tim=130
WAIT #470460: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 3457 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 68126 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 68046 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 108941 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 38234 location=3 consumer group id=75525 ...
WAIT #470460: nam='resmgr:cpu quantum' ela= 103282 location=3 consumer group id=75525 ...
Here are the parameters for this event. Note that the obj# parameter exists but is not used.
P1 - Location
P2 - Consumer group id
P3 - Not used
obj# - NA
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code. We have observed at least five distinct locations. Unfortunately, Oracle does not publicly document where in the kernel these checks are performed.
The consumer group number in the P2 parameter is pretty self-explanatory. It maps to the CONSUMER_GROUP_ID column in the DBA_RSRC_CONSUMER_GROUPS view. This parameter allows you to tell what consumer group a process was assigned to when its CPU usage was curtailed.
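Putting the two parameters together, the following sketch decodes the consumer group for sessions currently being throttled, joining P2 to DBA_RSRC_CONSUMER_GROUPS as described above:

```sql
-- Sessions currently forced idle by DBRM, with their consumer group
-- (P2) decoded and the code location (P1) shown
select s.sid, g.consumer_group, s.p1 as location
  from v$session s, dba_rsrc_consumer_groups g
 where s.event = 'resmgr:cpu quantum'
   and s.p2 = g.consumer_group_id;
```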
This event is used to record time spent waiting in the parallel query queue.
The parallel statement queuing feature comes with its own wait event. Statements that are queued due to insufficient parallel server processes or other directives clock time to this event. Here's an excerpt of a 10046 trace file showing the resmgr:pq queued
event:
PARSING IN CURSOR #47898436021000 len=73 dep=0 uid=0 oct=3 lid=0 tim=1301966072332694 hv=3186480617 ad='2cd2d1cb8' sqlid='bgcmmcyyyvpg9'
select /*+ parallel (a 16) */ avg(pk_col) from kso.skew3 a where col1 > 0
END OF STMT
PARSE #478:c=2000,e=1572,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1404581711,tim=130
WAIT #478: nam='resmgr:pq queued' ela= 65102047 location=1 =0 =0 obj#=523 tim=130
WAIT #478: nam='enq: KO - fast object checkpoint' ela= 258 ... obj#=523
WAIT #478: nam='reliable message' ela= 2005 channel context=118 channel handle=120 ... obj#=523
WAIT #478: nam='enq: KO - fast object checkpoint' ela= 156 ... obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 772 sleeptime/senderid=268 passes=1 p3=120 obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 932 sleeptime/senderid=268 passes=8 p3=120 obj#=523 ...
WAIT #478: nam='PX Deq: Join ACK' ela= 2 sleeptime/senderid=0 passes=0 p3=0 obj#=523 ...
Here are the parameters for the resmgr:pq queued
event.
P1 - Location
P2 - Not used
P3 - Not used
obj# - NA
The location parameter is a numeric value that most likely refers to a location (function) in the Oracle code as described above in the resmgr:cpu quantum
event.
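Sessions held in the parallel statement queue can also be seen without tracing. A sketch, assuming 11.2 Real-Time SQL Monitoring is available (queued statements show a STATUS of QUEUED in V$SQL_MONITOR):

```sql
-- Statements currently waiting in the parallel statement queue;
-- these same sessions will be waiting on 'resmgr:pq queued'
select sid, sql_id, status
  from v$sql_monitor
 where status = 'QUEUED';
```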
The wait interface has been expanded to cover several Exadata-specific features. In this chapter we've covered the new wait events. By far the most interesting of the new events are cell smart table scan
and cell smart index scan
. These events cover the time spent waiting on an offloadable I/O request to a storage cell; a great deal of storage-layer processing is lumped together under them. It's important to understand that these events replace the direct path read
event, and that the Smart Scan events return data directly to the process PGA in the same way that direct path read
does.