There is an old saying among carpenters: “If the only tool you have is a hammer, everything starts to look like a nail.” Likewise, a properly chosen and deployed index can be one of the most flexible and sharpest tools in an Oracle DBA’s toolbox, as long as the DBA realizes that an index is not the only tool that may be appropriate for solving query performance issues.
This chapter discusses the different types of indexes that are available for Oracle databases, as well as each index type’s strengths and weaknesses. This chapter also looks at issues that can typically arise when indexes are misused or overused in an Oracle database and how to detect when this is the case. New features in Oracle Database Releases 11g and 12c that may alleviate the dependency on indexes for improved query performance are also covered in this chapter.
Indexes are very much like a double-edged razor blade: they are often excellent at cutting through the Gordian knot of query performance tuning when the only other alternative is to retrieve all of the data in a table via a full table scan. However, like any balanced, dual-edged blade, an index tends to cut both ways: it can become quite expensive to maintain, especially when implemented against a table that is updated frequently via an OLTP application workload. And it can become especially deleterious to application performance when deletions are permitted against the index’s underlying data segments.
The Oracle database offers two types of indexes—B-tree indexes and bitmap indexes—and each index type has distinct advantages and drawbacks.
A B-tree index (balanced tree index) is typically the most common type of index found in Oracle databases. A B-tree index is extremely efficient at organizing information about the values stored in one or more columns of a table. Moreover, it generally remains in a balanced state even after its underlying data has been modified via data manipulation language (DML) statements—hence its name—and does not require any maintenance to retain that balance except in the most extreme of circumstances.
Figure 18.1 illustrates the basic concepts behind a B-tree index storage structure. As data values are added to the first block (also sometimes called a page) of a B-tree index, it continues to fill up until it’s about 50 percent full, at which point it splits into a root page and two leaf pages. Each leaf contains roughly one-half of the total entries in the index, with the split based on the median values of those present in the index. The root page retains just two entries about where to find the data in the two leaf pages. As the number of entries in the index continues to grow, this page splitting will continue, and the appropriate entries will be stored initially in the root page until it fills up and subsequently in a new branch page that’s created when the initial root page’s space has been exhausted.
This storage structure usually enables the index to be traversed extremely quickly to locate the particular values being sought after, as is generally the purpose of indexing columns in a table. Subsequently, the ROWID
s of the desired values are captured to access the required data from the table.
B-tree indexes are, of course, useful for other purposes as well:
They are excellent structures for guaranteeing uniqueness of a primary key or unique key constraint.
If the columns of two row sets being joined are indexed, then it may be much faster to use the indexes themselves as the row sources to look for common members in each set.
If indexed values are sorted in the same order as the ORDER BY
or GROUP BY
clause in a query, the index can return the data in sorted order, thereby eliminating the expensive sorting process.
If all the columns needed to answer a query can be found in the index itself, retrieval from the underlying table may be completely bypassed.
Consider a simple table, AP.RANDOMIZED_SORTED
, with just four columns that have been populated with 10 million rows using DBMS_RANDOM
to generate anonymous data, as shown in Listing 18.1.
DROP TABLE ap.randomized_sorted PURGE;
CREATE TABLE ap.randomized_sorted (
key_id NUMBER(8)
,key_date DATE
,key_desc VARCHAR2(32)
,key_sts NUMBER(2) NOT NULL
)
TABLESPACE ado_cold_data
NOLOGGING;
TRUNCATE TABLE ap.randomized_sorted;
SET TIMING ON
DECLARE
TYPE tnb_KeyID IS
TABLE OF ap.randomized_sorted.key_id%TYPE
INDEX BY PLS_INTEGER;
TYPE tdt_KeyDate IS
TABLE OF ap.randomized_sorted.key_date%TYPE
INDEX BY PLS_INTEGER;
TYPE tvc_KeyDesc IS
TABLE OF ap.randomized_sorted.key_desc%TYPE
INDEX BY PLS_INTEGER;
TYPE tnb_KeySts IS
TABLE OF ap.randomized_sorted.key_sts%TYPE
INDEX BY PLS_INTEGER;
tb_KeyId tnb_KeyID;
tb_KeyDate tdt_KeyDate;
tb_KeyDesc tvc_KeyDesc;
tb_KeySts tnb_KeySts;
ctr PLS_INTEGER;
nMaxIterations CONSTANT PLS_INTEGER := 10000000;
BEGIN
-- Populate collections
FOR ctr IN 1..nMaxIterations
LOOP
tb_KeyID(ctr) := ctr;
tb_KeyDate(ctr) :=
(TO_DATE('12/31/2014','mm/dd/yyyy') - DBMS_RANDOM.VALUE(1,3650));
tb_KeyDesc(ctr) := LPAD(' ',DBMS_RANDOM.VALUE(1,32),
SUBSTR('abcdefghijklmnopqrstuvwxyz',DBMS_RANDOM.VALUE(1,26), 1));
CASE MOD(ROUND(DBMS_RANDOM.VALUE(1,nMaxIterations),0),50)
WHEN 0 THEN tb_KeySts(ctr) := 30;
WHEN 1 THEN tb_KeySts(ctr) :=40;
WHEN 2 THEN tb_KeySts(ctr) :=40;
WHEN 3 THEN tb_KeySts(ctr) :=40;
WHEN 4 THEN tb_KeySts(ctr) :=40;
WHEN 5 THEN tb_KeySts(ctr) :=20;
WHEN 6 THEN tb_KeySts(ctr) :=40;
WHEN 7 THEN tb_KeySts(ctr) :=40;
WHEN 8 THEN tb_KeySts(ctr) :=40;
WHEN 9 THEN tb_KeySts(ctr) :=30;
WHEN 10 THEN tb_KeySts(ctr) :=40;
WHEN 11 THEN tb_KeySts(ctr) :=20;
WHEN 12 THEN tb_KeySts(ctr) :=10;
WHEN 13 THEN tb_KeySts(ctr) :=15;
WHEN 14 THEN tb_KeySts(ctr) :=30;
ELSE tb_KeySts(ctr) :=50;
END CASE;
END LOOP;
-- Load table from collection items
FORALL ctr IN 1..nMaxIterations
INSERT INTO ap.randomized_sorted(key_id, key_date, key_desc, key_sts)
VALUES (tb_KeyID(ctr), tb_KeyDate(ctr), tb_KeyDesc(ctr), tb_KeySts(ctr));
COMMIT;
EXCEPTION
WHEN OTHERS THEN DBMS_OUTPUT.PUT_LINE('Fatal error detected!')
END;
/
To enforce a primary key constraint for this table’s primary key column (KEY_ID
), a B-tree index, as shown in Listing 18.2, is an excellent choice because each row has a unique value in the column KEY_ID
.
ALTER TABLE ap.randomized_sorted
ADD CONSTRAINT randomized_sorted_pk
PRIMARY KEY (key_id)
USING INDEX (
CREATE UNIQUE INDEX ap.randomized_sorted_pk
ON ap.randomized_sorted(key_id)
TABLESPACE ado_cold_idx
NOLOGGING
PARALLEL 4);
ALTER INDEX ap.randomized_sorted_pk LOGGING;
Note the use of parallelism as well as the NOLOGGING
clause to avoid writing out any online redo entries during its creation so that this index can be built as quickly as possible. Don’t forget to re-enable online redo log generation once the index has been created successfully.
Though they are typically used less frequently than B-tree indexes, the unique features of bitmap indexes are especially valuable for data warehousing application workloads. The easiest way to understand how a bitmap index works is to imagine all the ROWID
s of the corresponding table, laid on their side, along with a series of bitmaps—one for each unique value in the indexed column—arrayed below those ROWID
s. Each bit in a bitmap maps to a ROWID
. In each bitmap, the bits mapping to those ROWID
s will be set, which have the value corresponding to the bitmap in the indexed column, as shown in Figure 18.2.
Listing 18.3 shows an example of a bitmap index created on the KEY_STS
column in table AP.RANDOMIZED_SORTED
.
CREATE BITMAP INDEX ap.randomized_sorted_sts_bix
ON ap.randomized_sorted (key_sts)
TABLESPACE ado_cold_idx
NOLOGGING PARALLEL(4);
ALTER INDEX ap.randomized_sorted_sts_bix LOGGING;
Because they essentially reference multiple rows simultaneously, bitmap indexes tend to be especially useful for a database that’s servicing “read-mostly” data primarily for data warehouse application workloads:
A bitmap index works best for a column that comprises a small set of relatively static column values because filtering the rows that match selection criteria against that column requires searching the bitmaps only for the corresponding value.
A table with a very large number of rows—say, 500,000 or more—tends to benefit from a bitmap index because multiple rows can be searched for matching column values with significantly fewer reads of index blocks than with a B-tree index.
When individual bitmap indexes are present on two or more columns, selection criteria against those columns can leverage bitmap XAND
and XOR
operations to quickly filter matching column values.
Unlike B-tree indexes, a bitmap index can record the presence of a NULL
value in its corresponding column; therefore, the IS NOT NULL
operator can leverage a bitmap index without having to read all values in the underlying table.
Bitmap indexes can take advantage of parallel DML during data loading, so they are perfect for data warehouses that are typically loaded during longer-running batch operations.
However, the same features that make the bitmap index so powerful for data warehouses are also its Achilles heel, especially for heavily updated data as well as online transaction processing (OLTP) application workloads:
First and foremost, because they reference multiple rows simultaneously, bitmap index segments must be updated simultaneously too. Unfortunately, this means that the bitmap segments as well as the table rows they index must be locked while the index is reconstructed.
Bitmap indexes are not optimal for enforcing a PRIMARY KEY
or UNIQUE KEY
referential integrity constraint.
Likewise, a column that contains constantly changing values or whose domain grows constantly would be a poor choice for a bitmap index.
A table with a relatively small number of rows—say, less than 500,000—will not likely benefit enough from a bitmap index to justify the maintenance costs, which are much higher compared to that of a B-tree index.
Eventually, every bitmap entry must be reconverted into its corresponding ROWID
. Although this takes virtually no time to complete, it obviously impacts performance when millions of entries need to undergo conversion.
When thinking about partitioned tables in an Oracle database, it’s also important to understand the difference between local partitioned indexes and global partitioned indexes. Although these aren’t really different index types—they can only be either B-tree or bitmap indexes—each flavor of partitioned index offers distinct advantages and drawbacks.
As its name suggests, a local partitioned index exists only within the scope of a single table partition and indexes only the column values within that partition. Since the data values to be retrieved exist only within a few partitions, local indexes can be used to filter, sort, and retrieve data only within those partitions whenever the Oracle Optimizer chooses to leverage partition pruning. Using a local partitioned index usually results in much fewer physical reads to locate data.
Unlike a local partitioned index, a global partitioned index spans all existing partitions of a partitioned table. It is therefore useful in situations when the Oracle Optimizer determines that a majority of the table partitions must be scanned to retrieve pertinent data values, and thus the global index is a better choice than searching all the local partitioned indexes to locate data.
Listing 18.4 shows the creation of table AP.RANDOMIZED_PARTED
, which comprises four partitions.
DROP TABLE ap.randomized_parted PURGE;
CREATE TABLE ap.randomized_parted (
key_id NUMBER(8)
,key_date DATE
,key_desc VARCHAR2(32)
,key_sts NUMBER(2) NOT NULL
)
PARTITION BY RANGE(key_date) (
PARTITION p1_frigid
VALUES LESS THAN (TO_DATE('2010-01-01','yyyy-mm-dd'))
TABLESPACE ado_cold_data
,PARTITION p2_cool
VALUES LESS THAN (TO_DATE('2013-01-01','yyyy-mm-dd'))
TABLESPACE ado_cool_data
,PARTITION p3_warm
VALUES LESS THAN (TO_DATE('2014-01-01','yyyy-mm-dd'))
TABLESPACE ado_warm_data
,PARTITION p4_hot
VALUES LESS THAN (TO_DATE('2014-07-01','yyyy-mm-dd'))
TABLESPACE ado_hot_data
,PARTITION p5_radiant
VALUES LESS THAN (MAXVALUE)
TABLESPACE ap_data
)
NOLOGGING
PARALLEL 4
;
INSERT /*+ APPEND NOLOGGING PARALLEL(4) */ INTO ap.randomized_parted
SELECT * FROM ap.randomized_sorted;
COMMIT;
ALTER TABLE ap.randomized_parted
DROP CONSTRAINT randomized_parted_pk;
DROP INDEX ap.randomized_parted_pk;
ALTER TABLE ap.randomized_parted
ADD CONSTRAINT randomized_parted_pk
PRIMARY KEY (key_id)
USING INDEX (
CREATE UNIQUE INDEX ap.randomized_parted_pk
ON ap.randomized_parted(key_id)
TABLESPACE ap_idx
NOLOGGING
PARALLEL 4
);
Table created.
Listing 18.5 illustrates how to create a series of local indexes on column KEY_STS
as well as a single global partitioned index on column KEY_DATE
.
-----
-- Create local partitioned indexes on KEY_STS
-----
DROP INDEX ap.randomized_parted_loc_sts;
CREATE INDEX ap.randomized_parted_loc_sts
ON ap.randomized_parted (key_sts)
STORAGE (INITIAL 10M)
LOCAL (
PARTITION lx1_frigid TABLESPACE ado_cold_idx
,PARTITION lx2_cool TABLESPACE ado_cool_idx
,PARTITION lx3_warm TABLESPACE ado_warm_idx
,PARTITION lx4_hot TABLESPACE ado_hot_idx
,PARTITION lx5_radiant TABLESPACE ap_idx
);
-----
-- Create a global partitioned index on KEY_DESC
-----
CREATE INDEX ap.randomized_parted_glb_desc
ON ap.randomized_parted (key_desc)
STORAGE (INITIAL 10M)
GLOBAL;
Starting with Oracle Database Release 12.1.0.1, it’s now possible to create indexes for only some of the partitions of a partitioned table. For example, consider table AP.RANDOMIZED_PARTED
, shown earlier in Listing 18.5. There may be very little advantage to creating local indexes on column KEY_STS
in the oldest partitions because queries hardly ever reference these partitions; additionally, even when the oldest data is referenced, it’s usually going to be accessed via a full table partition scan anyway because the majority of the data is likely to be in an active state. The new INDEXING {ON|OFF}
clause of the CREATE TABLE
and ALTER TABLE
statements makes this possible, as shown in the reconstructed data definition language (DDL) for this table in Listing 18.6.
CREATE TABLE ap.randomized_parted (
key_id NUMBER(8)
,key_date DATE
,key_desc VARCHAR2(32)
,key_sts NUMBER(2) NOT NULL
)
INDEXING OFF
PARTITION BY RANGE(key_date) (
PARTITION p1_frigid
VALUES LESS THAN (TO_DATE('2010-01-01','yyyy-mm-dd'))
TABLESPACE ado_cold_data
,PARTITION p2_cool
VALUES LESS THAN (TO_DATE('2013-01-01','yyyy-mm-dd'))
TABLESPACE ado_cool_data
,PARTITION p3_warm
VALUES LESS THAN (TO_DATE('2014-01-01','yyyy-mm-dd'))
TABLESPACE ado_warm_data
INDEXING ON
,PARTITION p4_hot
VALUES LESS THAN (TO_DATE('2014-07-01','yyyy-mm-dd'))
TABLESPACE ado_hot_data
INDEXING ON
,PARTITION p5_radiant
VALUES LESS THAN (MAXVALUE)
TABLESPACE ap_data
INDEXING ON
)
NOLOGGING
PARALLEL 4;
In this case, local indexes would not be created by default for any table partitions at all unless the INDEXING
attribute is set to ON
for each individual partition. In addition, if a global index were to be created on the KEY_STS
column for this table, by default that index would not include any values from the first two partitions.
There are a few other index types and subvariants that are beyond the bounds of this discussion, but they definitely deserve mention, as you will almost certainly encounter them during your Oracle DBA career.
Index-organized tables (IOTs) are truly indexes, but they have many of the same attributes of a standard heap-organized table. When a majority of a table’s columns need to be indexed to enforce a PRIMARY KEY
constraint, an IOT fulfills this need quite nicely.
Another advantage to an IOT is that it eliminates the required retrieval of rows from the table it would normally index, so it’s not uncommon to see the physical I/O required to retrieve data dropping by a significant margin; IOTs are therefore not uncommon in OLTP applications. Data within IOTs can be indexed with secondary indexes as well.
When a table’s PRIMARY KEY
constraint is based on a single column that contains a monotonically increasing value—for example, ORDER_ID
in the OE.ORDERS
table (part of the standard Oracle sample schemas)—index entries for similar values tend to be located within just a few index blocks. While this arrangement does help the clustering factor of the index, it can have an extremely deleterious effect on OLTP application performance, especially in an Oracle Real Application Cluster (RAC) database.
A reverse-key index simply stores the values in reverse byte order—for example, 1001 is stored as 1001, but 1002 is stored as 2001, 1003 is stored as 3001, and so forth—so this tends to spread out the values among multiple index blocks. The end result is that two different user sessions will not contend with each other when attempting to add a new entry into the same index simultaneously during heavy OLTP application activity, thus alleviating a TX
wait enqueue for that index.
If an index contains more than one column, it’s termed a composite or concatenated index. If a query’s selection criteria includes all or leading columns of the columns in a composite index, it’s likely to speed the execution of that query. However, it’s also possible to construct a composite index so that the columns with the lowest number of unique values are listed first in the index—for example, COUNTRY
, STATE
, and CITY
—so that a query that specifies only values for STATE
and CITY
in its selection criteria could still take advantage of that index via a skip-scan operation, which essentially bypasses the leading column in the index (COUNTRY
) and accesses the secondary columns.
The child table in a parent–child relationship typically requires a multicolumn PRIMARY KEY
constraint to enforce referential integrity. For example, the OE.ORDER_ITEMS
table that’s part of the standard Oracle sample schemas has a two-column primary key for columns ORDER_ID
and LINE_ITEM_ID
. It’s not uncommon for the child table to have a multiplicity of LINE_ITEM_ID
values per ORDER_ID
; therefore, to save space in an index for this constraint, the index can be compressed so that the value for ORDER_ID
is stored just once as a prefix within the index page for all LINE_ITEM_ID
values, as shown in Listing 18.7.
ALTER TABLE oe.order_items DROP CONSTRAINT order_items_pk;
ALTER TABLE oe.order_items
ADD CONSTRAINT order_items_pk
PRIMARY KEY (order_id, line_item_id)
USING INDEX (
CREATE UNIQUE INDEX oe.order_items_pk_idx
ON oe.order_items (order_id, line_item_id)
TABLESPACE example
COMPRESS);
The bitmap join index is a special subtype of bitmap index that is useful when the columns that are going to be used for joins within a subset of tables—for example, a fact table and several of its corresponding dimension tables—are already known. A bitmap join index essentially stores the ROWID
s of all the values that already matched between the fact and dimension tables (see Listing 18.8). The advantage of this flavor of bitmap index is that it’s extremely quick to search for matched sets of column values, as it essentially offloads the processing of the joined values to the index itself.
CREATE BITMAP INDEX oe.order_item_qty_bjx
ON oe.order_items (quantity)
FROM oe.order_items OI, oe.product_information P
WHERE OI.product_id = P.product_id
TABLESPACE example;
Prior to Oracle Database 12c Release 1, only one index could be created on the same column or combination of columns. But starting with Release 12.1.0.1, it’s now possible to create multiple indexes on the same column or set of columns, so that the statements that create the new bitmap index shown in Listing 18.3 would actually execute correctly without having to drop the existing B-tree index first. Using multiple indexes on identical columns offers the following possibilities:
A unique index as well as a nonunique index can be created for the same set of columns.
A B-tree and a bitmap index can be created on the same columns.
Indexes with different partition schemes (for example, a local partitioned index and a global partitioned index) can co-exist for the same columns.
Partitioned indexes are also permitted on the same columns, but with different partitioning methods (for example, one that is hash-partitioned and another that is range-partitioned).
Note that there are some restrictions on this new feature, however:
A B-tree non-clustered index and a B-tree clustered index cannot be created on the same column.
A B-tree index and an IOT cannot co-exist on the same column.
When multiple indexes are created on the same column, only one of the indexes can be visible at a time, and the other indexes must have a status of INVISIBLE
. However, it’s possible to inform the optimizer that it is permitted to leverage the non-default index by setting the OPTIMIZER_USE_INVISIBLE_INDEX
initialization parameter to TRUE
at either the session or system level.
Even when an Oracle DBA has done her homework and has chosen the proper type of index for the job, indexes can present some unique challenges for tuning application performance.
The Oracle query optimizer leverages index statistics when it makes its decision about whether to use an index instead of a full table scan. If there are multiple indexes to choose from, the optimizer also uses these statistics to determine which index is most efficient. A simple query against the DBA_INDEXES
data dictionary view reveals the most crucial statistics, as Listing 18.9 shows.
SET LINESIZE 170
SET PAGESIZE 20000
COL owner FORMAT A08 HEADING "Owner"
COL index_owner FORMAT A08 HEADING "Owner"
COL table_name FORMAT A24 HEADING "Table Name"
COL index_name FORMAT A30 HEADING "Index Name"
COL partition_name FORMAT A12 HEADING "Partition Name"
COL tsp_name FORMAT A12 HEADING "Tablespace|Name"
COL status FORMAT A08 HEADING "Status"
COL num_rows FORMAT 999,999,999 HEADING "Row|Count"
COL distinct_keys FORMAT 999,999,999 HEADING "Distinct|Keys"
COL clusfctr FORMAT 99,999,999 HEADING "Clust|Factor"
COL blevel FORMAT 99999 HEADING "Index|Depth"
COL leaf_blocks FORMAT 9,999,999 HEADING "Leaf|Blocks"
COL avg_lbpk FORMAT 999,999 HEADING "Avg Leaf|Blks/Key"
COL avg_dbpk FORMAT 999,999 HEADING "Avg Data|Blks/Key"
TTITLE "Index Performance Metadata|(from DBA_INDEXES)"
SELECT
owner
,table_name
,index_name
,tablespace_name tsp_name
,status
,num_rows
,distinct_keys
,clustering_factor clusfctr
,blevel
,leaf_blocks
,avg_leaf_blocks_per_key avg_lbpk
,avg_data_blocks_per_key avg_dbpk
FROM dba_indexes
WHERE owner = 'AP'
ORDER BY 1,2,3
;
TTITLE OFF
Index Performance Metadata
(from DBA_INDEXES)
Tablespace Row Distinct Clust Index Leaf Avg Leaf Avg Data
Owner Table Name Index Name Name Status Count Keys Factor Depth Blocks Blks/Key Blks/Key
----- ------------------- ----------------------- ------------ ------ ------------ ------------ ----------- ------ ---------- -------- --------
AP EST_SALES EST_SALES_CITYST_IDX ADO_COLD_IDX VALID 680,352 29,768 526,867 2 1,124 1 17
AP EST_SALES EST_SALES_CITY_IDX ADO_COLD_IDX VALID 680,352 18,792 526,845 2 1,948 1 28
AP EST_SALES EST_SALES_PK_IDX ADO_COLD_IDX VALID 680,352 680,352 74,988 2 1,519 1 1
AP EST_SALES EST_SALES_STATE_IDX ADO_COLD_IDX VALID 680,352 62 89,808 2 1,047 16 1,448
AP EST_SALES EST_SALES_TIMEZONE_IDX ADO_COLD_IDX VALID 649,680 26 82,138 2 1,001 38 3,159
AP EST_SALES EST_SALES_ZIPCITYST_IDX ADO_COLD_IDX VALID 680,352 42,522 680,352 2 1,193 1 16
AP EST_SALES EST_SALES_ZIPCODE_IDX ADO_COLD_IDX VALID 680,352 42,264 680,352 2 1,118 1 16
AP INVOICES INVOICES_CUST_IDX AP_IDX VALID 500 1 3 0 1 1 3
AP INVOICES INVOICES_PK_IDX AP_IDX VALID 500 500 3 0 1 1 1
AP INVOICE_ITEMS INVOICE_ITEMS_PK_IDX AP_IDX VALID 47,500 47,500 191 1 118 1 1
AP INVOICE_ITEMS INVOICE_ITEMS_PROD_IDX AP_IDX VALID 47,500 5 955 1 140 28 191
AP RANDOMIZED_PARTED RANDOMIZED_PARTED_PK ADO_COLD_IDX VALID 10,365,139 10,365,139 6,722,752 2 28,733 1 1
AP RANDOMIZED_SORTED RANDOMIZED_SORTED_PK ADO_COLD_IDX VALID 10,194,228 10,194,228 76,739 2 22,556 1 1
AP RANDOMIZED_UNSORTED RANDOMIZED_UNSORTED_PK ADO_COLD_IDX VALID 10,395,465 10,395,465 10,379,934 2 23,004 1 1
AP VENDORS VENDORS_PK_IDX AP_IDX VALID 164 164 3 0 1 1 1
Here’s a breakout of what five of the most crucial statistics reveal:
Depth: An index’s depth refers to the number of root versus node versus leaf pages it comprises. A value of zero (0) means there is only a root page, while any number greater than zero indicates how many levels have been constructed to accommodate all the values.
Leaf blocks: Since this count reflects the total number of leaf blocks that make up the index, the optimizer uses this number to determine if a full table scan is cheaper than a full index scan.
Distinct keys: This column reflects the number of unique values stored in the index; for an index that’s enforcing either a PRIMARY KEY
or UNIQUE
constraint, the number of distinct keys will equal the number of rows in the table.
Average leaf blocks per key: The average number of leaf blocks per distinct key value indicates the average number of leaf blocks in which each distinct value in the index appears. For an index that supports either a PRIMARY KEY
or UNIQUE
constraint, this value will be equal to one (1); for a nonunique index, a larger value could be there in case the number of entries per distinct key value is more than what can fit into one leaf block.
Average data blocks per key: Likewise, this column records the average number of data blocks per distinct value in the index; it gives an approximate measure of how many rows can be retrieved from the corresponding table for one key value.
Clustering factor: This is probably the most crucial index statistic to understand because the optimizer uses it to determine how efficient the index is for retrieving all the necessary ROWID
s from a B-tree index in the fewest number of index blocks.
When data has been loaded into a table and the rows in the table are not sorted by the indexed column, then the index on the column will have a low clustering factor. For example, consider the table AP.RANDOMIZED_UNSORTED
, which has been loaded using data from table AP.RANDOMIZED_SORTED
, as shown in Listing 18.10. Note that the ORDER BY
clause in the statement that loads data into AP.RANDOMIZED_SORTED
forces the data to be sorted in descending order based on the values for the KEY_DESC
column, whereas the index has been created on the KEY_ID
column.
DROP TABLE ap.randomized_unsorted PURGE;
CREATE TABLE ap.randomized_unsorted (
key_id NUMBER(8)
,key_date DATE
,key_desc VARCHAR2(32)
,key_sts NUMBER(2) NOT NULL
)
TABLESPACE ado_cold_data
NOLOGGING
PCTFREE 0;
TRUNCATE TABLE ap.randomized_unsorted;
INSERT /*+ APPEND NOLOGGING PARALLEL(4) */ INTO ap.randomized_unsorted
SELECT /*+ PARALLEL(4) */ * FROM ap.randomized_sorted
ORDER BY key_desc DESC;
COMMIT;
ALTER TABLE ap.randomized_unsorted
DROP CONSTRAINT randomized_unsorted_pk;
DROP INDEX ap.randomized_unsorted_pk;
ALTER TABLE ap.randomized_unsorted
ADD CONSTRAINT randomized_unsorted_pk
PRIMARY KEY (key_id)
USING INDEX (
CREATE UNIQUE INDEX ap.randomized_unsorted_pk
ON ap.randomized_unsorted(key_id)
TABLESPACE ado_cold_idx
NOLOGGING
PARALLEL 4
);
ALTER INDEX ap.randomized_unsorted_pk LOGGING;
When index AP.RANDOMIZED_UNSORTED_PK_IDX
is created on column KEY_ID
, the index’s clustering factor is dramatically different than that of AP.RANDOMIZED_SORTED_PK_IDX
, as the query and output in Listing 18.9 shows. As a result, whenever a query filters data using a range scan against KEY_ID
for this table, it will use dramatically larger amounts of physical I/O because the data values are not tightly clustered within a few index pages.
Note
An index’s clustering factor is only meaningful to the query optimizer for B-tree indexes; it is meaningless for bitmap indexes because of their structure and how they are used.
Sometimes, an unexpectedly poor performance from an index might have nothing to do with the index at all. Its ill effect on a decision support system (DSS) query’s execution speed or an OLTP application workload’s execution rate per transaction could occur because the Oracle DBA employed an inappropriate index type; he might have ignored best practices, or he could simply be encountering a performance “headwind” because of the application’s inability to scale up as expected.
As we’ve already discussed, when implementing a new index, it’s important to remember which index type is the most appropriate:
Generally speaking, remember that bitmap indexes are most effective when they are used to index an extremely large number of rows for a column that has relatively few distinct values that very rarely change.
Conversely, B-tree indexes will generally outperform a bitmap index when the column being indexed has an extremely large number of frequently updated distinct values, and especially when the domain of unique values is constantly expanding over time.
Unfortunately, forcing an index via a hint happens more frequently than it should. For example, suppose an application’s user complains about the performance of a query. An overzealous developer or DBA reviews the query’s execution plan to determine the cause of the perceived poor performance. When she discovers that the query is not using the supposedly “appropriate” index, she might choose to force the use of the appropriate index by modifying the application code to incorporate a +INDEX
hint.
That strategy may solve the query’s performance issue (for now), but in due course of time when that selected index ceases to be the most appropriate choice—say, after the table has grown to such a size that a table scan with several degrees of parallelism is actually a more efficient access path to data—the optimizer will have no choice but to continue to use the now less-than-desirable index.
It was not uncommon in earlier releases of Oracle to override the default values for the two initialization parameters discussed in this section in an attempt to influence the cost-based optimizer to favor indexes over table scans. We’ve often seen that these settings continue to be propagated forward into future releases of the Oracle database long after they might have outlived their usefulness.
The OPTIMIZER_INDEX_CACHING
initialization parameter controls the costing of an index probe in conjunction with a nested loop or an INLIST
iterator. The range of values 0 to 100 for this parameter indicates percentage of index blocks available in the buffer cache, which modifies the optimizer’s estimate of the cost of an index probe for nested loops and INLIST
iterators. A value of 100 infers that 100 percent of the index blocks are likely to be found in the buffer cache, and the optimizer adjusts the cost of an index probe or nested loop accordingly.
For good reason, the default value of this parameter is zero (0), or in other words, index blocks are no more likely than table blocks to be found in the database buffer cache, which results in the default optimizer behavior. Unless you know precisely how often this is likely to occur, Oracle suggests using caution when setting this parameter to any value other than zero.
The OPTIMIZER_INDEX_COST_ADJ
initialization parameter tweaks optimizer behavior to be either less or more inclined to choose an index versus a table scan to retrieve data. The minimum value of one (1) is the lowest setting, indicating the optimizer should consider indexes about 100 times more costly than table scans; conversely, the maximum setting of 10000 forces the optimizer to consider that index scans are 100 times less costly than table scans.
The default for this parameter is 100, at which the optimizer evaluates index access paths at the regular cost. Any other value makes the optimizer evaluate the access path at that percentage of the regular cost. Again, unless you know your application workload intimately and can accurately predict the implications of this setting after extensive testing, Oracle suggests leaving this parameter at its default value.
If an application permits deletion of rows from its tables, after a mass deletion has occurred, it is possible that many index pages may have only a few entries that correspond to active, non-deleted rows. Unfortunately, Oracle cannot reclaim space in an index page until all entries have been deleted; as a result, it is possible that many more index pages are being scanned than necessary to retrieve index row pieces during indexed lookups. It’s therefore recommended in this state to either rebuild an index or, if the database release is at least Oracle 10g Release 2, to compact the index segment with the ALTER INDEX <
index name> COMPACT;
command.
Finally, when it comes to tuning application performance, a prudent Oracle DBA should use indexes like a gourmet chef uses saffron or some other rare and costly spice: with great care, and not in every dish! Put simply, not every poorly performing query will suddenly start performing faster just because a new index has been added, and every index adds the specter of additional physical I/O whenever DML affecting the indexed column is performed on the underlying table.
If your current Oracle Database release is at least 11g Release 1, then you are fortunate to have a powerful tool to battle unnecessary indexes on your tool belt: the ability to hide an index from the Oracle Optimizer for consideration as a possible access path during statement parsing.
An existing index can be made invisible by simply altering its state to INVISIBLE
, and a new index can even be created in an initially INVISIBLE
state, as shown in Listing 18.11. Even though the optimizer will not be allowed to use this index as an access path, all DML operations against this index will continue to be processed.
ALTER INDEX sh.zm_customers_marital_bix INVISIBLE;
DROP INDEX sh.zm_customers_marital_bix;
CREATE BITMAP INDEX sh.zm_customers_marital_bix
ON sh.zm_customers (cust_marital_status)
TABLESPACE example
INVISIBLE;
Note that even if the index is specified directly in a +INDEX
hint, the optimizer will ignore it as long as it has been marked INVISIBLE
... unless, that is, the OPTIMIZER_USE_INVISIBLE_INDEX
initialization parameter is set to TRUE
at the session level, as Listing 18.12 shows.
SELECT /*+ INDEX(sh.zm_customers_marital_bix) */
cust_state_province, COUNT(*)
FROM sh.zm_customers
WHERE cust_marital_status = 'widow'
GROUP BY cust_state_province
ORDER BY cust_state_province;
Plan hash value: 2056234527
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 145 | 2465 | 424 (1)| 00:00:01 |
| 1 | SORT GROUP BY | | 145 | 2465 | 424 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| ZM_CUSTOMERS | 3461 | 58837 | 423 (1)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
2 - filter("CUST_MARITAL_STATUS"='widow')
ALTER SESSION SET optimizer_use_invisible_index = TRUE;
SELECT /* +INDEX(sh.zm_customers_marital_idx) */
cust_city, COUNT(*)
FROM sh.zm_customers
WHERE marital_status = 'S';
Plan hash value: 2120557835
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 58 | 986 | 19 (6)| 00:00:01 |
| 1 | SORT GROUP BY | | 58 | 986 | 19 (6)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| ZM_CUSTOMERS | 75 | 1275 | 18 (0)| 00:00:01 |
| 3 | BITMAP CONVERSION TO ROWIDS | | | | | |
|* 4 | BITMAP INDEX SINGLE VALUE | ZM_CUSTOMERS_MARITAL_BIX | | | | |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CUST_MARITAL_STATUS"='widow')
If the Oracle DBA determines that the index should indeed remain visible, it can simply be rendered visible to the optimizer again by issuing the ALTER INDEX
<index name> VISIBLE;
command.
RAC databases present their own unique challenges in regard to index performance, as discussed in the following sections.
Indexes that are nonselective—that is, that don’t dramatically improve query performance—can have an especially deleterious effect on DML in RAC because they may be subject to interinstance contention. For example, consider a table with five indexes: one primary key index and four other indexes that were created in anticipation of particular query workloads. If those four additional indexes provide very little benefit for speeding query execution times, they may actually be hindering the RAC database’s ability to quickly update these nonselective indexes. Even worse, if these indexes need to be accessed by queries while OLTP activity is occurring, it is likely that this concurrent query and OLTP activity will cause severe cache transfer contention between RAC database instances because both a current image and at least one CR image must be maintained for each of the index pages.
Indexes that grow monotonically because of ever-increasing values (for example, OE.ORDERS.ORDER_ID
) often become hot spots in a RAC application workload. Consider a typical primary key strategy that simply increments the next key value by +1 each time a new entry is requested. In this situation, the corresponding primary key index will tend to cluster all of the like values in a very few index blocks; this clustering will result in frequent leaf block splits, the index will end up with an index tree structure that is relatively low in depth, and the index’s root block will tend to be accessed much more frequently than other leaf pages as compared to the case when the index was not monotonic. In a RAC database, this strategy is likely to become a serious performance bottleneck for OLTP applications, especially if multiple instances are hosting the OLTP application: the instances will essentially play a constant game of tug-of-war as they exchange holder status of the current primary index block.
One approach to alleviating this issue is to implement global index hash partitioning, a new feature in Oracle 10g. Other approaches include using a natural key based on existing database column values (that is, instead of a “dumb-numbering” scheme), or even using a reverse-key index as described previously. (Remember, reverse-key indexes cannot be used for a range scan.)
If sequences are used to obtain a table’s next value for its primary key column, it’s important to verify that the sequence’s CACHE
value is set much higher than the default of 20—high enough to ensure that two or more instances will never access the same index page at the same time—and also to specify the NOORDER
directive. (Recall that the NOORDER
clause means that if Session 1 requests a NEXTVAL
from a sequence before Session 2 does, it doesn’t matter that Session 2’s value is higher than Session 1’s.)
This means that each instance in the cluster will get a much higher range of sequence values to choose from, eliminating index page contention as a factor; however, this approach might result in large gaps between sequenced values because each instance has reserved a widely disjoint set of values.
Finally, sometimes problems related to index difficulties are simply the symptom of a set of application operations colliding unexpectedly at the same time. Consider this situation: an OLTP application has recently added new rows to an indexed table on one RAC node, but the changes haven’t been committed yet. There is also a read-mostly workload (for example, batch reporting) executing on another RAC node that is actively attempting to read from the same table and indexes. Because the changes haven’t been committed on the first node, the second node has no choice but to request consistent read (CR) images be generated from undo data on the first node for both the tables and the indexes. If we now imagine that both nodes are servicing both the OLTP and the batch reporting workload, it’s easy to understand how quickly the RAC database’s interconnect will tend to spend most of its time processing requests for CR images.
If the Oracle DBA cannot get the application developers to commit changes more frequently, then it’s very likely that the CR images of the data and index blocks will take up most of the RAC database’s interconnect bandwidth. Likewise, if the indexes are significantly unselective, all of this extra work to maintain CR images for the index blocks will turn out to be essentially counterproductive.
An index may not be the panacea to cure a query’s performance ills, so don’t be afraid to strenuously question the need for a new index if the intent is merely to speed a query’s execution. Remember that an index may actually contribute significantly to the execution time of DML against the index’s base table or partition, so adding indexes, especially to often-updated data segments, may actually cause more trouble than they are worth.
Invest time to choose the right index type. A bitmap index may be excellent for a column with a small domain of indexed values—until a single brand-new value needs to be added to that column’s domain, forcing a complete rebuild of all bitmap index segments to incorporate that value.
Remember that the B in B-tree stands for “balanced”—and that means that except for the most extreme circumstances, there is little need to rebuild a B-tree index because it remains balanced in almost all situations.
If deletions are performed against a table with large numbers of indexed columns, remember that space allocated to an index block cannot be reclaimed until all values within that block are marked as deleted, and it’s possible that poor application performance may result because many more “almost empty” index blocks may have to be accessed.
The onset of engineered systems such as Exadata and the innovation of In-Memory Column Store (IMCS) in Oracle Database 12.1.0.2 often completely negate the need for indexes for analytical queries. So when in doubt of an index’s benefit, simply consider making that index invisible—and telling no one. If there are no significant complaints about application performance with an invisible index, you have probably found an opportunity to reduce the demands of keeping that index updated by removing it entirely.
If you’re fortunate enough to have made the transition to Oracle Database 12c Release 1 for your data warehouse, consider leveraging the new PARTIAL
index feature to limit the creation of local and/or global index partitions against the table partitions that can least benefit from them while retaining index partitions for the table partitions that could most benefit from speedy data access.
Real Application Cluster (RAC) databases are often persnickety about indexes. Remember that applications with too many unselective indexes, improperly sequenced monotonically increasing indexes, or colliding OLTP versus read-mostly workloads can often trace their unexpectedly poor performance or lack of scalability directly to indexes.