6. DDL Optimization Tips, Techniques, and Tricks

Prior to Oracle Database 11g, adding a column with a default value to a real-life production table was painful, particularly for a huge table where a substantial number of records would be enriched with the new column and physically updated with its default value. Such a table modification was far from being a simple operation. Fortunately, starting with Oracle Database 11g, a new optimization technique for improving the performance of data definition language (DDL) operations has been implemented. Adding a non-null column with a default value to an existing table can be done almost instantaneously. This chapter explains how Oracle has internally transformed such a painful table modification into a simple metadata operation. It also shows how to check the optimizer’s cardinality estimate for such added columns and the ability of such columns to serve as virtual columns and to be used in a column group extension. The chapter investigates the performance penalty this new column might introduce when it is selected from a table. It also explores the effect of this optimizing technique against the creation of an index and, finally, examines the effect of the Oracle Database 12c extension of this technique to nullable columns as well.

DDL Optimization Concept

Consider the following table having 3 million rows:

SQL> create table t1
     as select
        rownum n1
      , trunc ((rownum-1)/3) n2
      , trunc(dbms_random.value(rownum, rownum*10)) n3
      , dbms_random.string('U', 10) c1
    from dual
    connect by level <= 3e6;

SQL> desc t1
           Name              Null? Type
           ----------------- ----- ---------
    1      N1                      NUMBER
    2      N2                      NUMBER
    3      N3                      NUMBER
    4      C1                      VARCHAR2(4000 CHAR)

The following SQL statement adds an extra non-null column with a default value of 42 to this table:

SQL> alter table t1 add C_DDL number DEFAULT 42 NOT NULL;

The two crucial keywords DEFAULT and NOT NULL represent the keys driving this new feature.

To appreciate the difference in the execution time of the preceding alter table command, let’s execute it in two different Oracle Database releases: 10.2.0.4.0 and 11.2.0.3.0:

10.2.0.4.0> alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:48.53

11.2.0.3.0> alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:00.04

Notice the difference in the execution times. The C_DDL column was added instantaneously in Oracle Database 11g Release 2, whereas it took almost 49 seconds in Oracle Database 10g Release 2. What is this new mechanism that allows such a rapid execution time when adding a NOT NULL column with a DEFAULT value to an existing table? How could 3 million rows be updated in just 4 hundredths of a second?

Let’s verify that the update has really been carried out (from here on, this chapter will refer to Oracle Database version 11.2.0.3 unless specified otherwise):

SQL> select count(1) from t1;

  COUNT(1)
----------
   3000000

SQL> select count(1) from t1 where c_ddl = 42;

  COUNT(1)
----------
   3000000

Although Oracle altered the table T1 instantaneously, the query shows that all the C_DDL column values appear to have been set to the default value 42. How is this possible? Will the following execution plan be of any help here?

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  3736 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  2712K|    33M|  3736   (1)| 00:00:45 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL("C_DDL",42)=42)

Note
-----
   - dynamic sampling used for this statement (level=2)

Notice the presence of the NVL function in the predicate part of this execution plan; it reveals what is really happening. Behind the scenes, Oracle still considers the C_DDL column to be nullable (which means it has not been physically updated), and consequently Oracle protects queries against null values by replacing them with the DEFAULT value 42.

Let’s have a look at the execution plan in the earlier release and locate the difference:

10.2.0.4.0 > select count(1) from t1 where c_ddl = 42;

  COUNT(1)
----------
   3000000

10.2.0.4.0> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  4001 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  3000K|  8789K|  4001   (8)| 00:00:09 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C_DDL"=42)

The absence of the NVL function in the predicate part, together with the long time (00:00:48.53) it took to add the column in Oracle Database 10g Release 2, explains the working concept introduced in Oracle Database 11g Release 1 to optimize the addition of a non-null column with a default value to an existing table.

Another clue to what is happening in this technique in Oracle Database 11.2.0.3 can be found by comparing the table size before and after the ALTER TABLE statement:

SQL> SELECT
        segment_name
        ,bytes/1024/1024 mb
    FROM
        dba_segments
    WHERE
        segment_type = 'TABLE'
    AND segment_name = 'T1';

SEGMENT_NAME   MB
------------- ----
T1            112

SQL> alter table t1 add C_DDL number default 42 not null;

SQL> SELECT
        segment_name
        ,bytes/1024/1024 mb
    FROM
        dba_segments
    WHERE
        segment_type = 'TABLE'
    AND segment_name = 'T1';

SEGMENT_NAME   MB
------------- ----
T1            112

Even after the addition of a column, the table size remains unchanged, indicating that the table has not been physically updated.

Simply put, in Oracle Database 11g Release 1 and later, when you add a non-null column with a default value, Oracle will not update all existing rows with this default value. Instead, Oracle will store metadata for this new column (NOT NULL constraint and DEFAULT value) and allow the table to be modified almost instantaneously irrespective of the size of the altered table. Of course, this is possible at the cost of adding an NVL function when retrieving the added column from a table.
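The mechanism can be pictured with a small sketch. The following Python snippet is purely illustrative (a dictionary stands in for a table row, and the names are invented): rows that predate the ALTER TABLE carry no physical value, so the metadata default is substituted at query time, which is exactly what NVL("C_DDL",42) does:

```python
# Sketch of the DDL-optimized read path: rows created before the ALTER TABLE
# carry no physical C_DDL value; the metadata default is substituted at read
# time, mimicking NVL("C_DDL", 42).

METADATA_DEFAULT = 42  # recorded once in the data dictionary, not in each row

def read_c_ddl(row):
    """Return the C_DDL value for a row, falling back to the metadata default."""
    value = row.get("C_DDL")           # None for rows that predate the ALTER
    return METADATA_DEFAULT if value is None else value

old_row = {"N1": 1, "N2": 0}                # existed before the ALTER TABLE
new_row = {"N1": 2, "N2": 0, "C_DDL": 25}   # physically updated afterward

print(read_c_ddl(old_row))  # 42 -- comes from metadata; no table update needed
print(read_c_ddl(new_row))  # 25 -- a physically stored value wins
```

This is why the ALTER TABLE completes in constant time regardless of the table size: only the dictionary entry is written.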

The DDL Optimization Mechanism

Now that we understand the wonderful concept of DDL optimization, let’s explore further how Oracle manages this feature to ensure rapidity of DDL while still guaranteeing the correctness of results during data retrieval. This section investigates whether the Oracle Optimizer can accurately estimate the cardinality of such columns and whether this altered column can be used in a column group extension and/or serve as a virtual column.

Table Cardinality Estimation

It’s fairly well known that if the Oracle Optimizer can estimate cardinality accurately, it will almost always generate an optimal execution plan. Hence, it is worth investigating whether the selectivity of a column added via this new technique can be accurately estimated using the following instructions:

SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1', no_invalidate => false);

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

COUNT(1)
--------
3000000

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.60 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.60 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   3000K|   3000K|00:00:00.45 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

The Oracle optimizer does a perfect single-table cardinality estimation when using the C_DDL column, since Starts*E-Rows = A-Rows. This was, by the way, expected given the available statistics, as shown in the following:

SQL>  select
             ut.num_rows*us.density as card
      from
            user_tables ut
           ,user_tab_col_statistics us
      where
            ut.table_name  = us.table_name
      and ut.table_name  = 'T1'
      and us.column_name = 'C_DDL';

 CARD
-------
3000000

Notice that the cost-based optimizer (CBO) is clever enough to distinguish between a function applied to a column behind the scenes and a function applied by the end user directly in the query. In other words, the selectivity calculated by the CBO on a predicate of the form function(column) = constant is 1 percent of the number of rows in the table (30,000 in this case). However, the CBO produced a perfect estimate (3,000,000) despite the hidden NVL function applied by Oracle while retrieving values of the NOT NULL C_DDL column.
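To make the arithmetic concrete, here is the comparison in a short, purely illustrative Python sketch (the figures come from the example above; the 1 percent value is the generic fallback selectivity for a function(column) = constant predicate):

```python
# Single-table cardinality estimates for the predicate on C_DDL
# (figures from the example: 3M rows, one distinct value, density = 1/NDV).

num_rows = 3_000_000
ndv = 1                       # C_DDL holds only the value 42 at this point
density = 1 / ndv

column_estimate = int(num_rows * density)  # what the CBO actually computed
fallback_guess = int(num_rows * 0.01)      # 1% rule for function(column) = constant

print(column_estimate)  # 3000000 -- matches E-Rows in the plan
print(fallback_guess)   # 30000   -- the estimate the hidden NVL could have caused
```

The gap between the two figures is exactly why it matters that the CBO sees through its own hidden NVL wrapper.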

Let’s now disturb the uniformly distributed data in C_DDL column a little, gather statistics without a histogram, display the new C_DDL distribution, and check whether the Oracle Optimizer is still able to get an accurate estimate:

SQL> update t1
    set c_ddl = 25
    where
       mod(n1,100) = 0;

30000 rows updated.

SQL> commit;

SQL> begin
dbms_stats.gather_table_stats(user ,'t1'
                          , method_opt => 'for all columns size 1'
                        , no_invalidate => false);
end;
/

SQL> select
        c_ddl
       ,count(1)
    from t1
    group by
        c_ddl;

 C_DDL  COUNT(1)
------- --------
 42     2970000
 25     30000

The update skews the C_DDL column data. Let’s check the cardinality estimate of the optimizer when the C_DDL column is used in the WHERE clause predicate, as follows:

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

  COUNT(1)
----------
   2970000

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.40 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.40 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   1500K|   2970K|00:00:00.50 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

The optimizer is no longer doing a perfect estimation (Starts*E-Rows <> A-Rows), as shown via the operation in line 2. However, this incorrect estimation is not because of the added column. It is common behavior for all columns with an uneven distribution resulting in a large skew.

C_DDL Column in a Virtual Column

The issue of inaccurate cardinality estimates can be resolved by collecting a histogram for the C_DDL column so that the Oracle CBO is aware of the nonuniform distribution. However, in such a situation, it’s often better to create a virtual column instead of collecting a histogram. The C_DDL column can be used in a virtual column, which saves us from the wrong cardinality estimation:

SQL> alter table t1 add c_ddl_virt number
    generated always as (case when c_ddl = 42 then c_ddl else null end)
    virtual;

SQL> exec dbms_stats.gather_table_stats (user, 't1', method_opt => 'for columns c_ddl_virt size 1', no_invalidate => false);
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where c_ddl_virt = 42;

  COUNT(1)
----------
2970000

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.85 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.85 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2970K|   2970K|00:00:00.70 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(CASE NVL("C_DDL",42) WHEN 42 THEN NVL("C_DDL",42) ELSE NULL
              END =42)

Thanks to the virtual column on top of the C_DDL column, the CBO now gets a perfect estimate and hence an optimal execution plan. Notice in the predicate part that the NVL function is wrapped around the C_DDL column. Refer to the definition of the virtual column, and you will see that it doesn’t use the NVL function. It is Oracle that, recognizing the C_DDL column through its internal metadata, protects it against null values.

You might also have noticed that to take advantage of the virtual column, the original query was modified and the C_DDL column was replaced by the C_DDL_VIRT column. Several Oracle experts have written that the virtual column has an advantage over a function-based index because it doesn’t require a rewrite of the original query in order for the CBO to be able to use it at least for its estimation. However, this is true only for virtual columns defined using the SQL TRUNC function. Had the original query gone unchanged, the Oracle optimizer would have made a wrong cardinality estimate, as shown in the following corresponding execution plan:

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

  COUNT(1)
----------
   2970000
---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.41 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.41 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   1500K|   2970K|00:00:00.51 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

You can see that the CBO could not detect the presence of the virtual column and hence could not use it in its cardinality estimation.

C_DDL Column in a Column Group Extension

One very important and grossly underused technique for enhancing the optimizer estimation is to use a column group extension. Although the CBO can accurately estimate the cardinality of a single column predicate, it cannot spot a correlation between two columns used in an equality predicate. Let’s investigate whether the C_DDL column, when used in a conjunction predicate with another normal column, presents any anomalies:

SQL> select /*+ gather_plan_statistics */ count(1)
    from t1
    where C_DDL = 42
    and   n2    = 22;

  COUNT(1)
----------
         3

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

----------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.15 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.15 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      3 |00:00:00.15 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N2"=22 AND NVL("C_DDL",42)=42))

Notice that the cardinality estimation at operation 2 is wrong. A cardinality estimate of 1 is very often a symptom of an incorrect CBO estimation.

Let’s then create a column group extension to help the CBO in its cardinality estimation task:

SQL> SELECT
       dbms_stats.create_extended_stats
        (ownname=>user
        ,tabname=>'t1'
        ,extension=>'(c_ddl,n2)')
    FROM dual;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME)
----------------------------------------
SYS_STU5FUD4#KCC03#_TZNTSTN5AV

As you might have already guessed, creating a column group extension results in the creation of the virtual column SYS_STU5FUD4#KCC03#_TZNTSTN5AV, on which we can gather statistics without a histogram and issue again the same query with the two initial conjunction predicates:

SQL> begin
      dbms_stats.gather_table_stats
            (user
            ,'t1'
            ,method_opt    => 'for columns SYS_STU5FUD4#KCC03#_TZNTSTN5AV size 1'
            ,cascade       => true
            ,no_invalidate => false
            );
   end;
  /

SQL> select /*+ gather_plan_statistics */ count(1)
     from t1
     where C_DDL = 42
     and   n2    = 22;

  COUNT(1)
----------
         3

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.14 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.14 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      3 |      3 |00:00:00.14 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N2"=22 AND NVL("C_DDL",42)=42))

The optimizer, with the help of the new column group extension, can now do a perfect estimation. You might be wondering how we can be sure this perfect estimation results from the presence of the column group extension. There are two ways to confirm it: the first comes directly from the available statistics, and the second from the corresponding 10053 trace file, as the following shows:

SQL> select
        round (ut.num_rows*us.density) as card
    from
        user_tables ut
       ,user_tab_col_statistics us
     where
         ut.table_name  = us.table_name
     and ut.table_name  = 'T1'
     and us.column_name = 'SYS_STU5FUD4#KCC03#_TZNTSTN5AV';

CARD
----
3

Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
SPD: Return code in qosdDSDirSetup: NODIR, estType = TABLE
  Column (#5): C_DDL(NUMBER)
    AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 25.000000
  Column (#2): N2(NUMBER)
    AvgLen: 5 NDV: 1000000 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 17.000000
  Column (#6): SYS_STU5FUD4#KCC03#_TZNTSTN5AV(NUMBER)
    AvgLen: 12 NDV: 1030000 Nulls: 0 Density: 0.000000
  ColGroup (#1, VC) SYS_STU5FUD4#KCC03#_TZNTSTN5AV
    Col#: 2 5    CorStregth: 1.94
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0000
  Table: T1  Alias: T1
    Card: Original: 3000000.000000  Rounded: 3  Computed: 2.91  Non Adjusted: 2.91

It is clear from this trace file that the optimizer uses the column group extension to get the cardinality estimation using the following formula:

card = t1(num_rows)/ NDV(SYS_STU5FUD4#KCC03#_TZNTSTN5AV)
card = 3000000.000000 / 1030000 = 2.912621, rounded to 3
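The arithmetic can be checked in a couple of lines (an illustrative sketch; the figures are taken from the 10053 trace shown above):

```python
# Column group cardinality: num_rows / NDV of the extension virtual column.

num_rows = 3_000_000
ndv_column_group = 1_030_000   # NDV of SYS_STU5FUD4#KCC03#_TZNTSTN5AV

card = num_rows / ndv_column_group
print(round(card, 6))  # 2.912621
print(round(card))     # 3 -- the "Rounded" figure reported in the trace
```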

When the Default Value of C_DDL Changes

Remember that we have defined the C_DDL column to default to 42 when it is not supplied a specific value. Also remember that this has been recorded in the corresponding metadata column without updating the entire table, resulting in the instantaneous change to the structure of the table. But what if during the life cycle of this table, we decide to change the C_DDL column default value to 0, for example? Will this change the existing metadata?

Before executing this second alter table, and for the sake of clarity, take a look at the actual data distribution of the C_DDL column:

SQL> select
       c_ddl
      ,count(1)
    from t1
    group by
        c_ddl;

 C_DDL   COUNT(1)
------- --------
42      2970000
25      30000

Remember also that the only manual update concerns the C_DDL value 25 because the value 42 comes from Oracle using its internal recorded metadata against the C_DDL column. Now let’s modify the default value:

SQL> alter table t1 modify c_ddl default 0;

Elapsed: 00:00:00.03

Again, this second alter table operation was almost instantaneous. What happens now to the initial query involving the first default value 42? Let’s select rows from table t1 that have the initial default value of C_DDL column:

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

COUNT(1)
--------
2970000

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.63 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.63 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   1500K|   2970K|00:00:00.48 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

The predicate part still uses the NVL function, which was, by the way, expected. But what wasn’t foreseen is that the old initial default value of 42 is still referenced in the predicate part. Have we not changed the default value to 0?

In fact, Oracle considers that the records present in table t1 at the time of the first alter table have been physically updated with the default value 42 (even though this update is only a metadata change), and only a physical update could set them to another value. Altering the t1 table to change the default value to 0 therefore does not affect the existing records. It affects only records inserted after the second alter table modification. This is perfectly correct and conforms to the behavior of earlier releases, where this optimization technique was not used, as shown in the following example:

SQL> create table t2
     as select
        rownum n1
      , trunc((rownum-1)/3) n2
    from dual
    connect by level <= 10;

SQL> alter table t2 add n3 number default 10;

This alter table was executed in Oracle Database 11.2.0.3, and the keyword NOT NULL was omitted so that the DDL optimization technique would not kick in. Selecting from this table shows that the N3 column has been assigned the default value 10, as follows:

SQL> select * from t2;

        N1         N2         N3
---------- ---------- ----------
         1          0         10
         2          0         10
         3          0         10
         4          1         10
         5          1         10
         6          1         10
         7          2         10
         8          2         10
         9          2         10
        10          3         10

Now, if we change the default value from 10 to 20 and insert a new row without supplying an explicit value for the N3 column, this is what will happen:

SQL> alter table t2 modify n3 default 20;

SQL> select * from t2;

        N1         N2         N3
---------- ---------- ----------
         1          0         10
         2          0         10
         3          0         10
         4          1         10
         5          1         10
         6          1         10
         7          2         10
         8          2         10
         9          2         10
        10          3         10

SQL> insert into t2(n1,n2) values (10,5);

SQL> select * from t2 where n1 = 10;

        N1         N2         N3
---------- ---------- ----------
        10          3         10
        10          5         20

This example demonstrates that changing the default value of an existing column follows exactly the same algorithm irrespective of the method used to add the column, whether the conventional method or the Oracle Database 11g DDL optimization feature. The default value resulting from the second alter table affects only newly inserted rows.
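That algorithm can be sketched in a few lines of Python (an illustrative model, not Oracle code: the class freezes the default in force at insert time into each row, while modify_default touches only metadata):

```python
# Sketch of default-value semantics: each row is stamped with the default in
# force at insert time; changing the default later affects only new rows.

class Table:
    def __init__(self, default):
        self.default = default   # metadata: current column default
        self.rows = []

    def insert(self, n3=None):
        # A missing value picks up the current default, frozen into the row.
        self.rows.append(self.default if n3 is None else n3)

    def modify_default(self, new_default):
        self.default = new_default   # pure metadata change; rows untouched

t2 = Table(default=10)
for _ in range(3):
    t2.insert()            # three rows stamped with 10

t2.modify_default(20)      # like: alter table t2 modify n3 default 20
t2.insert()                # only this new row gets 20

print(t2.rows)  # [10, 10, 10, 20]
```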

C_DDL Column and Indexes

When a function is applied to a column in the predicate part, it precludes the use of any index that might exist on this column. In this particular case, will the hidden NVL function applied to the C_DDL column prevent the CBO from using an index on this column? That’s what we are going to see next.

Consider the following index:

SQL> create index i1_c_ddl on t1(c_ddl);

Index created.

Elapsed: 00:00:02.14

And query the table again:

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

  COUNT(1)
----------
 2970000

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |      1 |00:00:00.84 |
|   1 |  SORT AGGREGATE       |          |      1 |      1 |      1 |00:00:00.84 |
|*  2 |   INDEX FAST FULL SCAN| I1_C_DDL |      1 |   1500K|   2970K|00:00:00.70 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C_DDL"=42)

There is good news to emphasize here: the index is used. The hidden NVL function is not applied to the C_DDL column when accessing it via the index, which explains why the CBO can use the index. If the index sees the column without its underlying NVL function, this means that, at index creation time, the index leaf blocks, in contrast to the table blocks, were entirely filled with the default value 42 rather than with its corresponding stored metadata.

Another way of confirming that the DDL optimization technique does not extend to indexes is the size of the index itself. Had only a metadata change been made, the size of the index would have been zero. This is obviously not the case, as shown here:

SQL>SELECT
         segment_name
        ,bytes/1024/1024 mb
    FROM
        dba_segments
    WHERE
        segment_type = 'INDEX'
    AND segment_name = 'I1_C_DDL';

SEGMENT_NAME  MB
------------ ---
I1_C_DDL     47

But you could argue that this is normal behavior: an index cannot contain null column values. Let’s then create a composite index including a NOT NULL column (N1), which guarantees that every row, and hence every C_DDL value, appears in the index:

SQL> drop index i1_c_ddl;

Index dropped.

SQL> alter table t1 modify n1 not null;

Table altered.

SQL> create index i2_n1_c_ddl on t1(n1,c_ddl);

Index created.

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 101 and C_DDL = 42;

  COUNT(1)
----------
  1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

--------------------------------------------------------------------
| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| I2_N1_C_DDL |      1 |      1 |      1 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=101 AND "C_DDL"=42)

Even when the added C_DDL column is protected against null values by its presence in a composite index, there is no trace of the hidden NVL function applied to the C_DDL column. This example clearly demonstrates that, in contrast to the table blocks where there is no update of the C_DDL column, an index that is created on the same column will see its leaf blocks immediately being populated by the default value of the C_DDL column.

Before finishing this section, let’s look at one more interesting issue. We have seen so far that each time the CBO visits a table block, it applies the NVL function to the C_DDL column to ensure a non-null C_DDL value is retrieved. But this filter has so far always appeared when the table is fully scanned (TABLE ACCESS FULL). Will the CBO also apply the NVL function when the t1 table is accessed via an index (TABLE ACCESS BY INDEX ROWID)? Let’s engineer a simple case and observe the CBO’s reaction in this particular situation:

SQL> drop index i2_n1_c_ddl;

SQL> create index i2_n1_c_ddl on t1(n1);

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 101 and C_DDL = 42;

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

-------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |      1 |        |      1 |
|   1 |  SORT AGGREGATE              |             |      1 |      1 |      1 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1          |      1 |      1 |      1 |
|*  3 |    INDEX RANGE SCAN          | I2_N1_C_DDL |      1 |      1 |      1 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)
   3 - access("N1"=101)

Notice that the NVL function is also applied to the C_DDL column even when the table t1 is visited via an index rowid.

We can now say that each time the CBO visits a table block, via a single-block or a multiblock read, it will apply the NVL function to any DDL-optimized column it has to filter from those visited table blocks. However, the CBO will not apply the NVL function to a DDL-optimized column whose value is acquired from an index leaf block, simply because, in contrast to table blocks, index leaf blocks are physically populated with the column default value at index creation time.
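The two read paths can be summed up in a tiny sketch (illustrative Python with invented names, not Oracle internals): a table block may still hold no physical value, so NVL substitutes the metadata default on every read, whereas the index leaf entry was materialized with the literal default when the index was built:

```python
METADATA_DEFAULT = 42   # recorded in the dictionary by the ALTER TABLE

# A row inserted before the ALTER TABLE: its table block holds no C_DDL value.
table_block_value = None

def read_via_table(value):
    # Mimics filter(NVL("C_DDL",42)=42) on TABLE ACCESS FULL / BY INDEX ROWID.
    return METADATA_DEFAULT if value is None else value

# CREATE INDEX physically writes the resolved value into the leaf blocks,
# so index access paths use the stored key directly, with no NVL wrapper.
index_leaf_key = read_via_table(table_block_value)   # materialized once, at build time

print(read_via_table(table_block_value))  # 42 -- computed on every table-block read
print(index_leaf_key)                     # 42 -- stored literally in the index
```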

DDL Optimization for NULL Columns

With the arrival of Oracle Database 12c, it is legitimate to ask whether DDL optimization is still available. Let’s repeat the same experiment in this release:

12c> alter table t1 add C_DDL number default 42 not null;

Elapsed: 00:00:00.02

This alter table operation completes almost instantaneously. The DDL optimization technique kicks in here too, as proven again by the presence of the NVL function in the predicate part of the following query:

12c> select count(1) from t1 where c_ddl=42;

  COUNT(1)
----------
   3000000

12c> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  3802 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  3538K|    43M|  3802   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

But there is a small addition to DDL optimization in Oracle Database 12c compared to Oracle Database 11g Release 2: in release 12c, DDL optimization has been extended to nullable columns having a default value. Consider the following alter table statements, executed in Oracle Database 11g Release 2 and Oracle Database 12c respectively, to clearly appreciate the difference:

11.2.0.3.0> alter table t1 add C_DDL_2 number default 84;

Table altered.

Elapsed: 00:00:58.25

12c> alter table t1 add C_DDL_2 number default 84;

Elapsed: 00:00:00.02

Adding the nullable C_DDL_2 column took 58 seconds in Oracle Database 11g Release 2, but it completes almost instantaneously in Oracle Database 12c.

The unchanged Oracle Database 12c table size before and after the alter table operation demonstrates that the DDL optimization technique now covers nullable columns having a default value:

12c before alter> SELECT
        segment_name
        ,bytes/1024/1024 mb
    FROM
        dba_segments
    WHERE
        segment_type = 'TABLE'
    AND segment_name = 'T1';

SEGMENT_NAME         MB
-------------------- ----
T1                   112

12c after alter> SELECT
        segment_name
        ,bytes/1024/1024 mb
    FROM
        dba_segments
    WHERE
        segment_type = 'TABLE'
    AND segment_name = 'T1';

SEGMENT_NAME         MB
-------------------- ----
T1                   112

This example clearly demonstrates that in Oracle Database 12c, DDL optimization has been extended to include nullable columns having default values. When you query table t1 for the distinct values of the newly added column (C_DDL_2), you will see that all table rows appear to carry the default value (84), even though only the table’s metadata was updated, as shown via the following queries:

12c> select c_ddl_2, count(1) from t1 group by c_ddl_2;

C_DDL_2 COUNT(1)
------- ----------
84      3000000

SQL> select count(1) from t1 where c_ddl_2=84;

  COUNT(1)
----------
   3000000

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |  3803 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  3538K|    43M|  3803   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
              C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

However, ensuring DDL optimization for nullable columns with a default value is more complex than it was for non-null columns in the preceding release. We went from a simple implicit use of the NVL function to a complex and exotic predicate involving the undocumented SYS_OP_VECBIT function and a new internal column, SYS_NC00006$, in order to honor the default value, which has not been physically written into the table rows.

In contrast to what you might immediately think, the SYS_NC00006$ column is not a virtual column. It is instead a hidden, system-generated column, as shown in the following query:

12c> SELECT
         column_name
        ,virtual_column
        ,hidden_column
        ,user_generated
     FROM
         user_tab_cols
     WHERE
         table_name = 'T1'
     AND column_name = 'SYS_NC00006$';

COLUMN_NAME          VIR HID USE
-------------------- --- --- ---
SYS_NC00006$         NO  YES NO

Even though this column is hidden, nothing prevents us from selecting it:

12c> select
        a.c_ddl_2
       ,a.SYS_NC00006$
     from t1 a
     where c_ddl_2 =84
     and rownum <=5;

C_DDL_2 SYS_NC00006$
------- ------------
     84
     84
     84
     84
     84

The SYS_NC00006$ column remains null until the C_DDL_2 column is given a value other than the default value of 84. Consider the following inserts:

12c> insert into t1(n1,n2,n3,c1,c_ddl,c_ddl_2) values (0,0,0,'xxxxx',110,130);

12c> insert into t1(n1,n2,n3,c1,c_ddl,c_ddl_2)  values (1,1,1,'xxxxx',140,150);

12c> insert into t1(n1,n2,n3,c1,c_ddl, c_ddl_2)  values (1,1,1,'xxxxx',200,null);

12c> select
         a.c_ddl_2
        ,a.SYS_NC00006$
    from t1 a
    where a.c_ddl_2 in (130,150);

C_DDL_2 SYS_NC00006$
------- ------------
    130 01
    150 01

SQL> select
        a.c_ddl_2
       ,a.SYS_NC00006$
     from t1 a
     where a.c_ddl_2 is null;

C_DDL_2 SYS_NC00006$
------- ------------
        01

Notice that the SYS_NC00006$ column value is no longer NULL once we insert a non-default value into the C_DDL_2 column, including an explicit NULL.

Putting the pieces of the puzzle together, we can now understand exactly what the exotic-looking predicate reproduced here is doing:

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
              C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)

Oracle simply checks, via the SYS_OP_VECBIT function applied to its system-generated column, whether to report the default value of the C_DDL_2 column or the real value introduced by an end user via an explicit insert statement.

Let’s mimic what Oracle is doing with the SYS_NC00006$ column values, which are 01 and NULL:

12c> SELECT
          a.c_ddl_2
          ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
     FROM t1 a
     WHERE a.c_ddl_2 IN (130,150) --  supplied values
   UNION ALL
     SELECT
          a.c_ddl_2
          ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
     FROM t1 a
     WHERE a.c_ddl_2 IS NULL       --  supplied value
   UNION ALL
     SELECT
          a.c_ddl_2
         ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
     FROM t1 a
     WHERE c_ddl_2 =84             --  default value
     AND rownum   <=1
    order by c_ddl_2 nulls last;

   C_DDL_2 CBO_DDL
---------- ---------
        84 {null}
       130 1
       150 1
    {null} 1

The C_DDL_2 column has four distinct values: the default (84) and three explicitly inserted values, 130, 150, and NULL. When you use a predicate against the C_DDL_2 column to retrieve a row from a table block, Oracle transforms C_DDL_2 into a new value (CBO_DDL in this example) based on the hidden SYS_NC00006$ column, and compares that value against your bind (or literal) value. In this way it correctly reproduces all values of the C_DDL_2 column, including the default (84) for rows that were never physically updated to reflect it.
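The decoding rule above can be sketched in a few lines of Python. This is a conceptual model of the DECODE(TO_CHAR(SYS_OP_VECBIT(...))) predicate, assuming (as the experiments suggest) that bit 0 of SYS_NC00006$ records whether the column was ever explicitly written; it is not Oracle’s actual implementation.

```python
# Conceptual model of the 12c predicate
#   DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),
#          NULL, NVL("C_DDL_2",84),
#          '0',  NVL("C_DDL_2",84),
#          '1',  "C_DDL_2")
# Names and bit semantics are illustrative assumptions, not Oracle internals.

DEFAULT = 84  # default recorded in the data dictionary

def sys_op_vecbit(vec, bit):
    # Model of SYS_OP_VECBIT: return the requested bit of the hidden byte
    # vector, or None when the vector itself is NULL.
    if vec is None:
        return None
    return (vec >> bit) & 1

def effective_c_ddl_2(stored_value, sys_nc):
    bit = sys_op_vecbit(sys_nc, 0)
    if bit is None or bit == 0:
        # Hidden column NULL (or bit clear): the row predates any explicit
        # write, so substitute the dictionary default via NVL.
        return DEFAULT if stored_value is None else stored_value
    # Bit set: the column was explicitly written; trust the stored value,
    # even when it is an explicit NULL.
    return stored_value

rows = [(None, None),   # pre-ALTER row        -> default 84
        (130,  0x01),   # explicit insert 130  -> 130
        (150,  0x01),   # explicit insert 150  -> 150
        (None, 0x01)]   # explicit NULL insert -> NULL

print([effective_c_ddl_2(v, b) for v, b in rows])  # [84, 130, 150, None]
```

The four rows reproduce the four cases observed above: the metadata-only default, two explicitly supplied values, and an explicit NULL that must not be mistaken for the default.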

Summary

Oracle Database 11g introduced a wonderful feature that eliminates the worry about application continuity when adding, online, a non-null column with a default value to a big, real-life production table. This feature, known as DDL optimization, allows such alter table operations to complete not only almost instantaneously but also without locking the table. We are aware of no restrictions on such an added column that might discourage its use. Oracle Database 12c extended the feature to nullable columns with a default value. The icing on the cake is that there seem to be no noticeable performance side effects when the altered column is used in a where-clause predicate. The exotic but inoffensive Oracle Database 12c predicate that ensures correct values for altered columns also appears to be free of side effects. It would have been dramatically beneficial had this technique been extended to indexes as well; however, given the complexity of index structures compared with table structures, Oracle is not ready to create an index whose columns point only to metadata defined in the data dictionary.
