13.3. Understanding Special Table Types

Two special table types—index-organized tables and clustered tables—provide a number of performance-related and space-related benefits compared to the traditional heap-based tables. In the sections that follow, we will show you how to create, maintain, and use both of these table types in your database to maximize your disk space usage.

13.3.1. Index-Organized Tables

An index-organized table (IOT) is one special type of table that is most effective when the access of the table is primarily by the primary key and the primary key columns constitute a large part of the table's columns. In the following sections, we will present an overview of IOTs, show you how to create an IOT, manage the non-primary key columns of an IOT by using an overflow area, and show you how to retrieve the metadata about IOTs from the data dictionary.

13.3.1.1. An Overview of Index-Organized Tables

You can store index and table data together in an IOT. IOTs are suitable for tables in which the data access is mostly through its primary key, such as lookup tables. An IOT is a b-tree index, much like a traditional stand-alone index. However, instead of storing the ROWID of the table where the row belongs, the entire row is stored as part of the index. In addition, you can also build additional indexes on the columns of an IOT. Accessing an IOT is identical to accessing a heap-based table, and no special syntax is required in your SQL statements.

Because the row is stored along with the b-tree index, there is no physical ROWID for each row. The primary key identifies the rows in an IOT. Oracle "guesses" the location of the row and assigns a logical ROWID for each row, which permits the creation of secondary indexes. As with heap-based tables, you can partition an IOT, but the partition columns must be the same as or a subset of the primary key columns.

13.3.1.2. Creating Index-Organized Tables

To create an IOT, you use the CREATE TABLE command, adding the ORGANIZATION INDEX keyword as part of the command. A primary key must be specified as part of the table definition. In this example, you create an index-organized table that stores sales summaries by division for a given sales date; the primary key is a composite primary key. Here is an example:

create table sales_summary_by_date_division
   (sales_date        date,
   division_id        number,
   total_sales        number(20,2),
   constraint ssum_dt_div_pk primary key
      (sales_date, division_id))
organization index
tablespace users;

Each entry in the IOT contains a date, a division number, and a total sales amount for the day. All three of these column values are stored in each IOT row, but the IOT is built based on only the date and division number. Only one segment is used to store an IOT; if you build a secondary index on this IOT, then a new segment is created.

13.3.1.3. IOT Row Overflow

To further improve the performance benefits provided by an IOT, you can create an IOT overflow area. If an IOT row's data exceeds the threshold of available space in a block, the row's data will be dynamically and automatically moved to the overflow area. When you create an IOT, you specify a tablespace to hold the IOT overflow segment. Expanding on the previous example, you will re-create the IOT and specify an overflow area in this example:

create table sales_summary_by_date_division
   (sales_date        date,
    division_id       number,
    total_sales       number(20,2),
    constraint ssum_dt_div_pk primary key
        (sales_date, division_id))
organization index
tablespace users
pctthreshold 25
overflow tablespace users2;

NOTE

The default value for PCTTHRESHOLD is 50.

Notice that you are also including the PCTTHRESHOLD clause. This is the percentage of space reserved in the index block of an IOT; this value must be high enough to at least store the primary key. Any columns that cause this threshold to be exceeded are stored in the overflow area. In this example, 25 percent of each block is reserved to hold the key and the row data for the row.

13.3.1.4. IOT Mapping Tables

To create bitmap indexes on an IOT, you must use an IOT mapping table. In a heap-based table, the index stores the bitmaps with the physical ROWID. Because an IOT does not have physical ROWIDs, you must create a mapping table. The mapping table maps the IOT's physical ROWIDs to the corresponding logical ROWIDs used by the IOT. IOTs use logical ROW-IDs to manage index access to their rows because the physical ROWIDs can change as data is added to, or removed from, the IOT.

NOTE

Only one mapping table is required per IOT, regardless of the number of bitmap indexes defined on the IOT.

In this example, you will further refine the IOT that you created previously by adding a mapping table:

create table sales_summary_by_date_division
    (sales_date       date,
     division_id      number,

total_sales      number(20,2),
     constraint ssum_dt_div_pk primary key
          (sales_date, division_id))
organization index
tablespace users
pctthreshold 25
overflow tablespace users2
mapping table;

Oracle creates the mapping table in the same tablespace as its parent IOT. You cannot query, perform DML operations on, or modify the storage characteristics of the mapping table—it is created and maintained automatically and is dropped automatically when the IOT is dropped.

13.3.1.5. Data Dictionary Views for IOTs

A number of data dictionary views contain metadata concerning an IOT, its overflow segment, and its mapping table. Each of these views—DBA_INDEXES, DBA_SEGMENTS, and DBA_TABLES— reveal different parts of the structure of an IOT, as you will see in the following examples.

13.3.1.5.1. DBA_INDEXES

The view DBA_INDEXES contains most of the information you need about IOTs. Note that because an IOT is an index and a table at the same time, you will see an entry in DBA_INDEXES and in DBA_TABLES for an IOT. Here is a query where you can retrieve information about the table named SALES_SUMMARY_BY_DATE_DIVISION that you created earlier:

SQL> select index_name, index_type, tablespace_name
  2  from dba_indexes
  3  where table_name = 'SALES_SUMMARY_BY_DATE_DIVISION';

INDEX_NAME             INDEX_TYPE       TABLESPACE_NAME
---------------------- ---------------- ------------------
SSUM_DT_DIV_PK         IOT - TOP        USERS

1 row selected.

13.3.1.5.2. DBA_SEGMENTS

The view DBA_SEGMENTS helps you identify an IOT's related segments: the IOT itself and the overflow segment. You can see both of these segments in the following query against DBA_SEGMENTS:

SQL> select segment_name, segment_type, tablespace_name
  2  from dba_segments
  3  where segment_name LIKE '%_IOT_%' or
  4        segment_name = 'SSUM_DT_DIV_PK';

SEGMENT_NAME         SEGMENT_TYPE     TABLESPACE_NAME
-------------------- ---------------- --------------------
SYS_IOT_MAP_59990    TABLE            USERS
SYS_IOT_OVER_59990   TABLE            USERS2
SSUM_DT_DIV_PK       INDEX            USERS

3 rows selected.

Notice that the mapping table resides in the same tablespace as the IOT itself and that the overflow segment resides in a different tablespace, as you specified when you created the IOT.

13.3.1.5.3. DBA_TABLES

The DBA_TABLES view, similar to the DBA_SEGMENTS view, provides you with the information about the IOT itself, the mapping table, and the associated overflow segment, as in this example:

SQL> select table_name, iot_name, iot_type, tablespace_name
  2  from dba_tables
  3  where table_name = 'SALES_SUMMARY_BY_DATE_DIVISION'
  4     or iot_name = 'SALES_SUMMARY_BY_DATE_DIVISION';

TABLE_NAME     IOT_NAME       IOT_TYPE     TABLESPACE_NAME
-------------- -------------- ------------ ---------------
SALES_SUMMARY_               IOT
BY_DATE_DIVISI
ON

SYS_IOT_MAP_59 SALES_SUMMARY_ IOT_MAPPING USERS
990            BY_DATE_DIVISI
ON

SYS_IOT_OVER_5 SALES_SUMMARY_ IOT_OVERFLOW USERS2
9990           BY_DATE_DIVISI
ON

3 rows selected.

13.3.1.6. Clustered Tables

Clustered tables provide another alternative to the traditional heap-based table to provide performance benefits as well as optimizing the usage of your disk space. In this section, we will provide an architectural overview of clusters and describe the different types of clusters and in which situations each is appropriate.

13.3.1.7. Clusters Overview

A cluster consists of a group of two or more tables that share the same data blocks. If you are using hash clusters, as you will see later in this section, you can have only one table in the cluster. In other words, rows from two different tables in the cluster may reside in the same block on disk. The tables are logically grouped together because they share one or more common columns and are often used together in a join, such as an ORDER table and an ORDER_ITEM table. In most cases, the retrieval of a customer's order will include rows from both tables when they are joined on the ORDER_NUMBER column.

An alternative to creating a cluster for tables that are frequently or always queried together is to create a single table.


Before you add tables to a cluster, you specify a cluster key containing the common columns for all tables that will reside in the cluster. When you add a table to a cluster, you specify the column in the table that corresponds to the cluster key.

A cluster provides two primary benefits:

  • Because rows from both tables on the join column are stored in the same block, disk I/O is dramatically reduced since a given query may need to read only a single block instead of at least two blocks for a join using traditional heap-based tables.

  • The disk space required to store the cluster is significantly reduced because each cluster key value is stored only once each in the cluster and in the cluster index, no matter how many rows of each table in the cluster contain the value.

Creating a clustered table requires three steps:

  1. Create the cluster definition.

  2. Create the index on the cluster. (This step is skipped for hash clusters.)

  3. Create tables within the cluster.

Once the cluster is created, you may add and remove indexes on tables in the cluster as with any heap-based table.

13.3.1.8. Cluster Types

Clusters can be divided into two broad categories: index clusters and hash clusters. Index clusters use a traditional b-tree index to find rows in the cluster, and hash clusters use a hashing function to determine the physical location where a given row is stored. Each has their own benefits depending on the type of data stored in the cluster; we will show you how each of these clusters is created and show you a special type of hash cluster called a sorted hash cluster.

13.3.1.8.1. Index Clusters

An index cluster performs much like a b-tree index to ensure quick access to the rows in each cluster table. Consider an index cluster in these situations:

  • The tables in the cluster are always queried together and only infrequently on their own.

  • The tables have little or no insert, update, or delete activity performed on them after the initial load of the cluster.

  • The child tables have roughly equal numbers of rows for each parent key in the parent table.

The following example shows the SQL commands necessary to create a cluster called ORD_ITEM, an index on the cluster called ORD_ITEM_IDX, and tables ORD and ITEM in the cluster:

SQL> create cluster ord_item
  2  (ord_number number) size 500
  3  tablespace app_data;

Cluster created.

The preceding command creates a cluster called ORD_ITEM and specifies that the column ORD_NUMBER is the cluster key or column around which the two tables' data will be clustered; in this case, the ORD_NUMBER column is the common column for all tables that will be stored in this cluster. The parameter SIZE specifies how many cluster keys you expect to have per Oracle block. In this example, you create the index for all tables in the cluster:

SQL> create index ord_item_idx
  2  on cluster ord_item
  3  tablespace app_idx;

Index created.

Notice that the indexed column is not specified when creating an index on a cluster segment. The index is automatically created on the cluster key column, ORD_NUMBER.

Now that the cluster structure exists, you can create the ORD and ITEM tables in the cluster:

SQL> create table ord
  2  (ord_number number,
  3  cust_last_name varchar2(40),
  4  cust_first_name varchar2(30),
  5  ord_date date,
  6  ord_total number(15,2))
  7  cluster ord_item (ord_number);

Table created.

SQL> create table item
  2  (item_id number,
  3  ord_number number,
  4  catalog_number number,
  5  item_price number(7,2))
  6  cluster ord_item (ord_number);

Table created.

Note that no tablespace specification is allowed when creating the tables in the cluster: The cluster table is stored in the tablespace specified in the cluster definition. Storing each table in a different tablespace is logically incorrect; it also does not make sense, because that would defeat the purpose of storing each table's row in the same data block to improve I/O performance.

13.3.1.8.2. Hash Clusters

A hash cluster is similar to an index cluster, except that a hash cluster uses a hashing algorithm on the row's cluster key to find the physical location of the row in the table. Hash clusters work best for queries with equivalence operators, as in this example:

select cust_last_name from ord
       where ord_number = 681107;

Here are the attributes of a table that make it suitable for use in a hash cluster:

  • The tables in the cluster have a relatively uniform distribution of values in the indexed column.

  • The tables have little or no insert, update, or delete activity performed on them after the initial load of the cluster.

  • The tables in the cluster have a predictable number of values in the indexed column.

  • The queries against the clustered table almost exclusively use an equality operator to retrieve the desired row.

Using a hash cluster, the desired row can usually be found with one I/O read, in contrast to a traditional indexed heap-based table, which may require several I/O operations on the index itself and another I/O on the table to retrieve the desired row.

Because you are using a hashing function to find rows in a cluster's table, you cannot create a cluster index on a hash cluster. To create a hash cluster, you add the HASH IS clause when creating the cluster. To create an ORD_ITEM2 table using hashing, you can use a CREATE CLUSTER command similar to the following:

create cluster ord_item2
  (ord_number number(6,0)) size 500
  hash is ord_number hashkeys 100000
  tablespace users;

The HASHKEYS keyword specifies the number of unique hash values that can be generated by the cluster's hash function, in this case, 100,000. Note that if a cluster's table has more rows than the maximum number of hash keys, performance may begin to suffer because more than one row may map to the same location, potentially increasing the number of I/O reads necessary to retrieve the desired row.

13.3.1.9. Sorted Hash Clusters

Sorted hash clusters, introduced in Oracle 10g, extend the functionality of hash clusters that have been available since Oracle 8i by maintaining a sort order for rows that are retrieved by the same cluster key. In heap-organized tables and traditional hash clusters, the order in which rows are returned is not under user control and depends on internal algorithms and the relative physical location of data blocks on disk. For each hash cluster key, Oracle maintains a list of rows sorted by one or more sort columns.

Maintaining the sort order of rows upon insert incurs minimal overhead but provides a tangible benefit when the data is updated or queried: CPU time and private memory requirements are reduced because no additional sorts are required, as long as the ORDER BY clause references the sort key columns or the sort key columns prefix. In fact, the ORDER BY clause is not required if you are retrieving rows only for a single hash cluster key and want to order the rows by the sort key columns. This processing implies another valuable benefit of sorted hash clusters in that it supports FIFO processing: The sort order within each cluster key guarantees that rows are returned in the same order in which they were inserted.

For queries with an ORDER BY clause using non-prefixed sort key columns, you can use a traditional index to maintain the performance of queries on the table in the cluster.


A couple of examples will help demonstrate the value of sorted hash clusters. In the sample order entry system, you want to make sure to process the customer orders for a given customer in the order in which the orders were received without the extra overhead of sorting on the timestamp of the order.

The first step is to create a single table sorted hash cluster, as follows:

create cluster order_cluster
   (customer_number        number,
    order_timestamp        timestamp sort)
hashkeys 10000000
single table hash is customer_number
size 500;

You expect at most 10 million unique customer numbers, and the average size of the row in your cluster will be 500 bytes. The next step is to create the order table itself:

create table orders
   (cust_number      number,

order_date       timestamp,
    order_number     number,
    spec_instr       varchar2(1000))
cluster order_cluster(cust_number, order_date);

Note that the names of the cluster keys do not have to match as long as the relative positions match and the datatypes are compatible.

Next, add a few orders with the following INSERT statements. Depending on when the orders were submitted and the locations where the orders are placed, the orders may not necessarily be inserted in chronological order:

insert into orders values(3045,
       timestamp'2004-05-05 15:04:14',
       405584,'Reorder from last month'),
insert into orders values(1958,
       timestamp'2004-05-05 15:05:01',
       348857,'New customer'),
insert into orders values(3045,
       timestamp'2004-05-04 9:26:59',
       938477,'GGT Promotion'),
insert into orders values(3045,
       timestamp'2004-05-07 12:33:23',
       703749,''),
insert into orders values(3045,
       timestamp'2004-05-02 19:47:09',
       389233,'Needs order in time for Mothers Day'),

However, because you are storing the orders in a sorted hash cluster, they are automatically maintained in the order of the sort key columns for each customer without specifying an ORDER BY clause:

SQL> select cust_number,
  2     to_char(order_date,'yyyy-mm-dd hh:mi pm')
  3               order_date,
  4     order_number, spec_instr
  5  from orders where cust_number = 3045;

CUST_NUMBER ORDER_DATE          ORDER_NUMBER SPEC_INSTR
----------- ------------------- ------------ -------------
       3045 2004-05-02 07:47 pm       389233  Needs order i
                                              n time for Mo
                                              thers Day

3045 2004-05-04 09:26 am        938477 GGT Promotion
       3045 2004-05-05 03:04 pm        405584 Reorder from
                                              last month
       3045 2004-05-07 12:33 pm        703749

4 rows selected.

Execution Plan
---------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS
            (Cost=0 Card=4 Bytes=2164)
   1     0 TABLE ACCESS (HASH) OF 'ORDERS'
                      (CLUSTER (HASH))

Even though you had no ORDER BY clause, all rows selected using a specific customer number (in this case, customer number 3045) will automatically return the rows ordered by the sort keys, because the sorted hash cluster maintains the order within the customer number cluster key.

To make sure the new access path is used, you must ensure that the cost-based optimizer is enabled and statistics are gathered for the table. An EXPLAIN PLAN of any query on a sorted hash cluster will show an access method of TABLE ACCESS HASH without a sort operation. Also, any queries must use an equality predicate; if the previous query was instead written as follows, then an ORDER BY clause would be necessary to keep the rows in the desired sequence:

SQL> select cust_number,
  2     to_char(order_date,'yyyy-mm-dd hh:mi pm')
  3         order_date,
  4     order_number, spec_instr
  5  from orders where cust_number >= 3045;

CUST_NUMBER ORDER_DATE          ORDER_NUMBER SPEC_INSTR
----------- ------------------- ------------ -------------
       3045 2004-05-05 03:04 pm       405584 Reorder from
                                             last month
       3045 2004-05-04 09:26 am       938477 GGT Promotion
       3045 2004-05-07 12:33 pm       703749
       3045 2004-05-02 07:47 pm       389233 Needs order i
                                             n time for Mo
                                             thers Day
4 rows selected.

Similarly, if you accessed the table using a reference to only the SPEC_INSTR column in the WHERE clause, a sort would be necessary to return the rows in the desired order.

To make further improvements in performance, you may consider creating a multi-table hash cluster to hold both the orders and the order items, but for now the improvements in processing orders alone will help you avoid new hardware acquisitions for a few months.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset