Prior to Oracle Database 11g, adding a column with a default value to a real-life production table was painful, particularly for a huge table where a substantial number of records would be enriched with the new column and physically updated with its default value. Such a table modification was far from being a simple operation. Fortunately, starting with Oracle Database 11g, a new optimization technique for improving the performance of data definition language (DDL) operations has been implemented. Adding a non-null column with a default value to an existing table can be done almost instantaneously. This chapter explains how Oracle has internally transformed such a painful table modification into a simple metadata operation. It also shows how to check the optimizer’s cardinality estimate for such added columns and the ability of such columns to serve as virtual columns and to be used in a column group extension. The chapter investigates the performance penalty this new column might introduce when it is selected from a table. It also explores the effect of this optimizing technique against the creation of an index and, finally, examines the effect of the Oracle Database 12c extension of this technique to nullable columns as well.
Consider the following table having 3 million rows:
SQL> create table t1
as select
rownum n1
, trunc ((rownum-1)/3) n2
, trunc(dbms_random.value(rownum, rownum*10)) n3
, dbms_random.string('U', 10) c1
from dual
connect by level <= 3e6;
SQL> desc t1
Name Null? Type
----------------- ----- ---------
1 N1 NUMBER
2 N2 NUMBER
3 N3 NUMBER
4 C1 VARCHAR2(4000 CHAR)
The following SQL statement adds an extra non-null column with a default value of 42 to this table:
SQL> alter table t1 add C_DDL number DEFAULT 42 NOT NULL;
The two crucial keywords DEFAULT and NOT NULL are the keys that drive this new feature.
To appreciate the difference in the execution time of the preceding alter table command, let's execute it in two different Oracle Database releases: 10.2.0.4.0 and 11.2.0.3.0:
10.2.0.4.0> alter table t1 add C_DDL number default 42 not null;
Table altered.
Elapsed: 00:00:48.53
11.2.0.3.0> alter table t1 add C_DDL number default 42 not null;
Table altered.
Elapsed: 00:00:00.04
Notice the difference in the execution times. The C_DDL column was added instantaneously in Oracle Database 11g Release 2, whereas it took almost 49 seconds in Oracle Database 10g Release 2. What is this new mechanism that allows such a rapid execution time when adding a NOT NULL column with a DEFAULT value to an existing table? How could 3 million rows be updated in just 0.04 seconds?
Let’s verify that the update has really been carried out (from here on, this chapter refers to Oracle Database version 11.2.0.3 unless specified otherwise):
SQL> select count(1) from t1;
COUNT(1)
----------
3000000
SQL> select count(1) from t1 where c_ddl = 42;
COUNT(1)
----------
3000000
Although Oracle altered the table T1 instantaneously, the query shows that all the C_DDL column values carry the default value of 42. How is this possible? Will the following execution plan be of any help here?
SQL> select * from table(dbms_xplan.display_cursor);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3736 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T1 | 2712K| 33M| 3736 (1)| 00:00:45 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
Note
-----
- dynamic sampling used for this statement (level=2)
Notice the presence of the NVL function in the predicate part of this execution plan. It reveals how the optimization technique works. Behind the scenes, Oracle still considers the C_DDL column to be nullable (which means it has not been physically updated), and consequently Oracle protects it from the null threat by replacing it with its DEFAULT value of 42.
Let’s have a look at the execution plan in the earlier release and locate the difference:
10.2.0.4.0 > select count(1) from t1 where c_ddl = 42;
COUNT(1)
----------
3000000
10.2.0.4.0> select * from table(dbms_xplan.display_cursor);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4001 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3000K| 8789K| 4001 (8)| 00:00:09 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C_DDL"=42)
The absence of the NVL function in the predicate part, together with the long time (00:00:48.53) it took to add the column in Oracle Database 10g Release 2, illustrates the working concept introduced in Oracle Database 11g Release 1 to optimize the addition of a non-null column with a default value to an existing table.
Another clue to what is happening in this technique in Oracle Database 11.2.0.3 can be found by comparing the table size before and after the ALTER TABLE statement:
SQL> SELECT
segment_name
,bytes/1024/1024 mb
FROM
dba_segments
WHERE
segment_type = 'TABLE'
AND segment_name = 'T1';
SEGMENT_NAME MB
------------- ----
T1 112
SQL> alter table t1 add C_DDL number default 42 not null;
SQL> SELECT
segment_name
,bytes/1024/1024 mb
FROM
dba_segments
WHERE
segment_type = 'TABLE'
AND segment_name = 'T1';
SEGMENT_NAME MB
------------- ----
T1 112
Even after the addition of a column, the table size remains unchanged, indicating that the table has not been physically updated.
Simply put, in Oracle Database 11g Release 1 and later, when you add a non-null column with a default value, Oracle will not update all existing rows with this default value. Instead, Oracle will store metadata for this new column (the NOT NULL constraint and the DEFAULT value) and allow the table to be modified almost instantaneously, irrespective of the size of the altered table. Of course, this is possible at the cost of adding an NVL function when retrieving the added column from the table.
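As a sketch of where this recorded metadata can be observed, the documented dictionary views expose both the default value and the NOT NULL constraint of the added column (output shape depends on your environment):

```sql
-- A hedged sketch: the DEFAULT value and NOT NULL flag of C_DDL are
-- visible as metadata in the data dictionary, even though no row of T1
-- was physically updated.
SELECT column_name
       ,nullable          -- 'N' because of the NOT NULL constraint
       ,data_default      -- the recorded default value (42)
FROM   user_tab_columns
WHERE  table_name  = 'T1'
AND    column_name = 'C_DDL';
```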
Now that we understand the wonderful concept of DDL optimization, let’s explore further how Oracle manages this feature to ensure rapidity of DDL while still guaranteeing the correctness of results during data retrieval. This section investigates whether the Oracle Optimizer can accurately estimate the cardinality of such columns and whether this altered column can be used in a column group extension and/or serve as a virtual column.
It’s fairly well known that if the Oracle Optimizer can estimate cardinality accurately, it will almost always generate an optimal execution plan. Hence, it is worth investigating whether the selectivity of a column added via this new technique can be accurately estimated using the following instructions:
SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1', no_invalidate => false);
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;
COUNT(1)
--------
3000000
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.60 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.60 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 3000K| 3000K|00:00:00.45 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
The Oracle optimizer does a perfect single-table cardinality estimation when using the C_DDL column, since Starts * E-Rows = A-Rows. This was expected given the available statistics, as shown in the following:
SQL> select
ut.num_rows*us.density as card
from
user_tables ut
,user_tab_col_statistics us
where
ut.table_name = us.table_name
and ut.table_name = 'T1'
and us.column_name = 'C_DDL';
CARD
-------
3000000
Notice that the cost-based optimizer (CBO) is clever enough to distinguish between a function applied to a column behind the scenes and a function applied by the end user directly in the query. In other words, we all know that the selectivity calculated by the CBO for a predicate of the form function(column) = constant defaults to 1 percent of the number of rows in the table (30,000 in this case). However, the CBO got a perfect estimation (3,000,000) despite the hidden NVL function applied by Oracle while retrieving values of the NOT NULL C_DDL column.
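By way of contrast, here is a hypothetical query in which a function is applied to the column explicitly by the user. With no extended or function-based statistics, the CBO would typically fall back to its default 1 percent selectivity (about 30,000 rows here) rather than the perfect estimate seen above. This is only an illustrative sketch; the exact figure depends on the release and the available statistics:

```sql
-- Hypothetical contrast: a user-applied function on C_DDL.
-- Without statistics on abs(c_ddl), the CBO has no choice but to guess,
-- unlike the hidden NVL function it injects and recognizes itself.
select count(1) from t1 where abs(c_ddl) = 42;
```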
Let’s now disturb the uniformly distributed data in the C_DDL column a little, gather statistics without a histogram, display the new C_DDL distribution, and check whether the Oracle optimizer is still able to get an accurate estimate:
SQL> update t1
set c_ddl = 25
where
mod(n1,100) = 0;
30000 rows updated.
SQL> commit;
SQL> begin
dbms_stats.gather_table_stats(user ,'t1'
, method_opt => 'for all columns size 1'
, no_invalidate => false);
end;
/
SQL> select
c_ddl
,count(1)
from t1
group by
c_ddl;
C_DDL COUNT(1)
------- --------
42 2970000
25 30000
The update skews the C_DDL column so that we can check the optimizer's cardinality estimate when the C_DDL column is used in a WHERE clause predicate, as follows:
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;
COUNT(1)
----------
2970000
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.40 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.40 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1500K| 2970K|00:00:00.50 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
The optimizer is no longer doing a perfect estimation (Starts * E-Rows ≠ A-Rows), as shown by the operation at Id 2. However, this incorrect estimation is not caused by the added column. It is common behavior for any column with an uneven distribution, that is, a large skew.
The issue of inaccurate cardinality estimates can be resolved by collecting a histogram for the C_DDL column so that the CBO becomes aware of the nonuniform distribution. However, in such a situation it is often better to create a virtual column instead of collecting a histogram. The C_DDL column can be used as the basis of a virtual column and save us from the wrong cardinality estimate:
SQL> alter table t1 add c_ddl_virt number
generated always as (case when c_ddl = 42 then c_ddl else null end)
virtual;
SQL> exec dbms_stats.gather_table_stats (user, 't1', method_opt => 'for columns c_ddl_virt size 1', no_invalidate => false);
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where c_ddl_virt = 42;
COUNT(1)
----------
2970000
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.85 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.85 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 2970K| 2970K|00:00:00.70 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(CASE NVL("C_DDL",42) WHEN 42 THEN NVL("C_DDL",42) ELSE NULL
END =42)
Thanks to the virtual column on top of the C_DDL column, the CBO now gets a perfect estimate and hence an optimal execution plan. Notice in the predicate part that the NVL function is wrapped around the C_DDL column. Refer back to the definition of the virtual column and you will find that it doesn't use the NVL function. It is Oracle that, recognizing the C_DDL column through its internal metadata, protects it against null values.
You might also have noticed that, to take advantage of the virtual column, the original query was modified: the C_DDL column was replaced by the C_DDL_VIRT column. Several Oracle experts have written that a virtual column has an advantage over a function-based index because it doesn't require a rewrite of the original query for the CBO to be able to use it, at least for its estimation. However, this is true only for virtual columns defined using the SQL TRUNC function. Had the original query gone unchanged, the Oracle optimizer would have made a wrong cardinality estimate, as shown in the following corresponding execution plan:
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;
COUNT(1)
----------
2970000
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.41 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.41 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1500K| 2970K|00:00:00.51 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
You can see that the CBO could not detect the presence of the virtual column and hence could not use it in its cardinality estimation.
One very important and grossly underused technique for enhancing the optimizer's estimations is the column group extension. Although the CBO can accurately estimate the cardinality of a single-column predicate, it cannot spot a correlation between two columns used in conjunctive equality predicates. Let's investigate whether the C_DDL column, when used in a conjunctive predicate with another, normal column, presents any anomalies:
SQL> select /*+ gather_plan_statistics */ count(1)
from t1
where C_DDL = 42
and n2 = 22;
COUNT(1)
----------
3
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
----------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.15 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.15 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 3 |00:00:00.15 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("N2"=22 AND NVL("C_DDL",42)=42))
Notice that the cardinality estimation at operation 2 is wrong. A cardinality estimate of 1 is very often a symptom of an incorrect CBO estimation.
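The low estimate is a direct consequence of the CBO's independence assumption: without a column group, it multiplies the individual column selectivities. Assuming density is close to 1/NDV for both columns, that arithmetic can be sketched from the dictionary as follows:

```sql
-- Sketch of the independent-columns estimate the CBO computed:
-- card ~= num_rows * selectivity(C_DDL) * selectivity(N2)
select round(ut.num_rows * cd.density * n2.density) as independent_card
from   user_tables ut
       ,user_tab_col_statistics cd
       ,user_tab_col_statistics n2
where  ut.table_name = 'T1'
and    cd.table_name = ut.table_name and cd.column_name = 'C_DDL'
and    n2.table_name = ut.table_name and n2.column_name = 'N2';
```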
Let’s then create a column group extension to help the CBO in its cardinality estimation task:
SQL> SELECT
dbms_stats.create_extended_stats
(ownname=>user
,tabname=>'t1'
,extension=>'(c_ddl,n2)')
FROM dual;
DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME)
----------------------------------------
SYS_STU5FUD4#KCC03#_TZNTSTN5AV
As you might have already guessed, creating a column group extension results in the creation of the virtual column SYS_STU5FUD4#KCC03#_TZNTSTN5AV, on which we can gather statistics without a histogram, and then issue the same query with the two initial conjunctive predicates again:
SQL> begin
dbms_stats.gather_table_stats
(user
,'t1'
,method_opt => 'for columns SYS_STU5FUD4#KCC03#_TZNTSTN5AV size 1'
,cascade => true
,no_invalidate => false
);
end;
/
SQL> select /*+ gather_plan_statistics */ count(1)
from t1
where C_DDL = 42
and n2 = 22;
COUNT(1)
----------
3
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.14 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.14 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 3 | 3 |00:00:00.14 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("N2"=22 AND NVL("C_DDL",42)=42))
The optimizer, with the help of the new extended column group, can now do a perfect estimation. You might be wondering how we can be sure this perfect estimation results from the presence of the column group extension. There are two ways to confirm it: the first comes directly from the available statistics, and the second from the corresponding 10053 trace file, as the following shows:
SQL> select
round (ut.num_rows*us.density) as card
from
user_tables ut
,user_tab_col_statistics us
where
ut.table_name = us.table_name
and ut.table_name = 'T1'
and us.column_name = 'SYS_STU5FUD4#KCC03#_TZNTSTN5AV';
CARD
----
3
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T1[T1]
SPD: Return code in qosdDSDirSetup: NODIR, estType = TABLE
Column (#5): C_DDL(NUMBER)
AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 25.000000
Column (#2): N2(NUMBER)
AvgLen: 5 NDV: 1000000 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 17.000000
Column (#6): SYS_STU5FUD4#KCC03#_TZNTSTN5AV(NUMBER)
AvgLen: 12 NDV: 1030000 Nulls: 0 Density: 0.000000
ColGroup (#1, VC) SYS_STU5FUD4#KCC03#_TZNTSTN5AV
Col#: 2 5 CorStregth: 1.94
ColGroup Usage:: PredCnt: 2 Matches Full: #1 Partial: Sel: 0.0000
Table: T1 Alias: T1
Card: Original: 3000000.000000 Rounded: 3 Computed: 2.91 Non Adjusted: 2.91
It is clear from this trace file that the optimizer uses the column group extension to get the cardinality estimate using the following formula:
card = num_rows(t1) / NDV(SYS_STU5FUD4#KCC03#_TZNTSTN5AV)
card = 3000000 / 1030000 = 2.912621, rounded to 3
Remember that we defined the C_DDL column to default to 42 when it is not supplied a specific value. Also remember that this has been recorded as metadata without updating the entire table, resulting in the instantaneous change to the structure of the table. But what if, during the life cycle of this table, we decide to change the C_DDL column's default value to 0, for example? Will this change the existing metadata?
Before executing this second alter table, and for the sake of clarity, take a look at the actual data distribution of the C_DDL column:
SQL> select
c_ddl
,count(1)
from t1
group by
c_ddl;
C_DDL COUNT(1)
------- --------
42 2970000
25 30000
Remember also that the only physical update concerns the C_DDL value 25; the value 42 comes from Oracle using its internally recorded metadata for the C_DDL column. Now let's modify the default value:
SQL> alter table t1 modify c_ddl default 0;
Elapsed: 00:00:00.03
Again, this second alter table operation was almost instantaneous. What happens now to the initial query involving the first default value of 42? Let's select the rows of table t1 that have the initial default value of the C_DDL column:
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;
COUNT(1)
--------
2970000
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.63 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.63 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1500K| 2970K|00:00:00.48 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
The predicate part still uses the NVL function, which was expected. But what wasn't foreseen is that the old default value of 42 is still referenced in the predicate part. Haven't we changed the default value to 0?
In fact, Oracle considers that the records present in table t1 at the time of the first alter table have been updated with the default value 42 (although this update is only a metadata change), and only a physical update could set them to another value. Altering the t1 table to change the default value to 0 does not affect the existing records. It affects only records inserted after the second alter table modification. This is perfectly correct and conforms to the behavior of earlier releases, where this optimization technique was not used, as shown via the following example:
SQL> create table t2 as select
rownum n1
, trunc ((rownum -1)/3) n2
from dual
connect by level<=10;
SQL> alter table t2 add n3 number default 10;
This alter table was executed in Oracle Database 11.2.0.3, and the keyword NOT NULL was omitted so that the DDL optimization technique would not kick in. Selecting from this table shows that the N3 column has been assigned the default value 10, as follows:
SQL> select * from t2;
N1 N2 N3
---------- ---------- ----------
1 0 10
2 0 10
3 0 10
4 1 10
5 1 10
6 1 10
7 2 10
8 2 10
9 2 10
10 3 10
Now, if we change the default value from 10 to 20 and insert a new row without supplying an explicit value for the N3 column, this is what happens:
SQL> alter table t2 modify n3 default 20;
SQL> select * from t2;
N1 N2 N3
---------- ---------- ----------
1 0 10
2 0 10
3 0 10
4 1 10
5 1 10
6 1 10
7 2 10
8 2 10
9 2 10
10 3 10
SQL> insert into t2(n1,n2) values (10,5);
SQL> select * from t2 where n1 = 10;
N1 N2 N3
---------- ---------- ----------
10 3 10
10 5 20
This example demonstrates that changing the default value of an existing column follows exactly the same algorithm irrespective of the method used to add the column, whether the conventional method or the Oracle Database 11g DDL optimization feature. The default value resulting from the second alter table affects only newly inserted rows.
When a function is applied to a column that figures in the predicate part, it precludes the use of any index that might exist on this column. In this particular case, will the NVL function that Oracle applies to the C_DDL column prevent the CBO from using an index created on this column? That's what we are going to see next.
Consider the following index:
SQL> create index i1_c_ddl on t1(c_ddl);
Index created.
Elapsed: 00:00:02.14
And query this again:
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;
COUNT(1)
----------
2970000
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
----------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.84 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.84 |
|* 2 | INDEX FAST FULL SCAN| I1_C_DDL | 1 | 1500K| 2970K|00:00:00.70 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C_DDL"=42)
There is good news to emphasize here: the index is used. The hidden NVL function is not applied to the C_DDL column when it is accessed via the index, which explains why the CBO can use the index. If the index sees the column without its underlying NVL function, this means that, at index creation time, the index leaf blocks, in contrast to the table blocks, were entirely filled with the default value 42 rather than with the corresponding stored metadata.
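One hedged way to cross-check that the leaf blocks really contain one entry per row is to compare the index row count with the table row count after gathering statistics (figures are environment dependent):

```sql
-- Sketch: if the index were a pure metadata artifact, it could not
-- report roughly 3 million entries; a physically built index does.
select index_name, num_rows
from   user_indexes
where  index_name = 'I1_C_DDL';
```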
Another way to confirm that the DDL optimization technique does not extend to indexes is to look at the size of the index itself. Had only a metadata change been made, the size of the index would have been close to zero. This is obviously not the case, as shown here:
SQL>SELECT
segment_name
,bytes/1024/1024 mb
FROM
dba_segments
WHERE
segment_type = 'INDEX'
AND segment_name = 'I1_C_DDL';
SEGMENT_NAME MB
------------ ---
I1_C_DDL 47
But you could argue that this is normal behavior: an index does not contain entries for rows whose indexed columns are all null. Let's then create a composite index that includes a NOT NULL column alongside the C_DDL column:
SQL> drop index i1_c_ddl;
Index dropped.
SQL> alter table t1 modify n1 not null;
Table altered.
SQL> create index i2_n1_c_ddl on t1(n1,c_ddl);
Index created.
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 101 and C_DDL = 42;
COUNT(1)
----------
1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
--------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | INDEX RANGE SCAN| I2_N1_C_DDL | 1 | 1 | 1 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N1"=101 AND "C_DDL"=42)
Even when the added C_DDL column is protected against null values by its presence in a composite index, there is no trace of the hidden NVL function applied to the C_DDL column. This example clearly demonstrates that, in contrast to the table blocks, where the C_DDL column is not physically updated, an index created on this column sees its leaf blocks immediately populated with the default value of the C_DDL column.
Before finishing this section, let's look at one more interesting issue. We have seen so far that each time the CBO decides to visit a table block, it applies the NVL function to the C_DDL column to ensure it retrieves a non-null C_DDL value. We have also seen that this filter is always applied when the table is fully scanned (TABLE ACCESS FULL). Will the CBO apply the NVL function when the t1 table is accessed via an index (TABLE ACCESS BY INDEX ROWID)? Let's engineer a simple case and observe the CBO's reaction in this particular situation:
SQL> drop index i2_n1_c_ddl;
SQL> create index i2_n1_c_ddl on t1(n1);
SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 101 and C_DDL = 42;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
-------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 1 |
|* 3 | INDEX RANGE SCAN | I2_N1_C_DDL | 1 | 1 | 1 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
3 - access("N1"=101)
Notice that the NVL function is applied to the C_DDL column even when the table t1 is visited via index rowid.
We can now say that each time the CBO visits a table block, via a single-block or a multiblock read, it applies the NVL function to any DDL-optimized column it has to filter from those table blocks. However, the CBO does not apply the NVL function to a DDL-optimized column when the value is acquired from an index leaf block, simply because, in contrast to table blocks, index leaf blocks are physically populated with the column's default value at index creation time.
With the arrival of Oracle Database 12c, it is legitimate to ask whether the DDL optimization is still available. Let’s repeat the same experiment for this release:
12c> alter table t1 add C_DDL number default 42 not null;
Elapsed: 00:00:00.02
This alter table operation happens almost instantaneously. The DDL optimization technique kicks in here too, as proven again by the NVL function in the predicate part of the following query:
12c> select count(1) from t1 where c_ddl=42;
COUNT(1)
----------
3000000
12c> select * from table(dbms_xplan.display_cursor);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3802 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3538K| 43M| 3802 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(NVL("C_DDL",42)=42)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
But there is a small addition to the DDL optimization in Oracle Database 12c compared to Oracle Database 11g Release 2: the optimization has been extended to nullable columns having a default value. Consider the following alter table statements, executed in Oracle Database 11g Release 2 and Oracle Database 12c respectively, to clearly appreciate the difference:
11.2.0.3.0> alter table t1 add C_DDL_2 number default 84;
Table altered.
Elapsed: 00:00:58.25
12c> alter table t1 add C_DDL_2 number default 84;
Elapsed: 00:00:00.02
Adding the nullable C_DDL_2 column took almost 58 seconds in Oracle Database 11g Release 2, but it is done instantaneously in Oracle Database 12c.
The unchanged Oracle Database 12c table size before and after the alter table operation demonstrates that the DDL optimization technique now covers nullable columns having a default value:
12c before alter> SELECT
segment_name
,bytes/1024/1024 mb
FROM
dba_segments
WHERE
segment_type = 'TABLE'
AND segment_name = 'T1';
SEGMENT_NAME MB
-------------------- ----
T1 112
12c after alter> SELECT
segment_name
,bytes/1024/1024 mb
FROM
dba_segments
WHERE
segment_type = 'TABLE'
AND segment_name = 'T1';
SEGMENT_NAME MB
-------------------- ----
T1 112
This example clearly demonstrates that in Oracle Database 12c, DDL optimization has been extended to include nullable columns having default values. When you query table t1 for the distinct values of the newly added column (C_DDL_2), you will see that all rows return the default value of 84, courtesy of the recorded metadata rather than a physical update, as shown via the following query:
12c> select c_ddl_2, count(1) from t1 group by c_ddl_2;
C_DDL_2 COUNT(1)
------- ----------
84 3000000
SQL> select count(1) from t1 where c_ddl_2=84;
COUNT(1)
----------
3000000
SQL> select * from table(dbms_xplan.display_cursor);
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3803 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3538K| 43M| 3803 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
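So where does the default live if it is not stored in the rows? One plausible place to look (an assumption on our part; it requires SYS access, and its layout is undocumented and release dependent) is the SYS.ECOL$ dictionary table, which appears to record the binary default for columns added via DDL optimization:

```sql
-- Assumption: SYS.ECOL$ records the metadata default for columns added
-- via DDL optimization; requires SYS privileges, and the column layout
-- may vary between releases.
SELECT e.tabobj#, e.colnum, e.binarydefval
FROM   sys.ecol$ e
WHERE  e.tabobj# = (SELECT object_id
                    FROM   dba_objects
                    WHERE  object_name = 'T1'
                    AND    object_type = 'TABLE');
```

If this assumption holds, the alter table command reduces to writing one such dictionary row, which explains the near-instantaneous elapsed time.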
However, ensuring DDL optimization for nullable columns with a default value is more complex than it was for non-null columns in the preceding release. We have gone from a simple implicit use of the NVL
function to a complex and exotic predicate involving the undocumented SYS_OP_VECBIT
function and a new internal column, SYS_NC00006$
, which together honor the default value, since that value has not been physically stored in the rows.
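Based on the predicate reported by dbms_xplan, the internal rewrite can be reproduced by hand. The following is an illustrative sketch only, since SYS_OP_VECBIT is undocumented and not intended for application code:

```sql
-- Manually applying the rewrite Oracle generates for c_ddl_2 = 84
-- (illustrative only; SYS_OP_VECBIT is an undocumented function).
SELECT count(*)
FROM   t1
WHERE  DECODE(TO_CHAR(SYS_OP_VECBIT(SYS_NC00006$, 0)),
              NULL, NVL(c_ddl_2, 84),   -- row never touched: default applies
              '0',  NVL(c_ddl_2, 84),   -- bit clear: default still applies
              '1',  c_ddl_2) = 84;      -- bit set: use the stored value
```

Run against our table, this should return the same count as the plain c_ddl_2 = 84 predicate shown earlier.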
Contrary to what you might first think, the SYS_NC00006$
column is not a virtual column. It is instead a hidden, system-generated column, as shown in the following:
12c> SELECT
column_name
,virtual_column
,hidden_column
,user_generated
FROM
user_tab_cols
WHERE
table_name = 'T1'
AND column_name = 'SYS_NC00006$';
COLUMN_NAME VIR HID USE
-------------------- --- --- ---
SYS_NC00006$ NO YES NO
Even though this column is hidden, nothing prevents us from selecting it:
12c> select
a.c_ddl_2
,a.SYS_NC00006$
from t1 a
where c_ddl_2 =84
and rownum <=5;
C_DDL_2 SYS_NC00006$
------- ------------
84
84
84
84
84
The SYS_NC00006$
column remains null until the C_DDL_2
column is explicitly given a value other than the default value 84. Consider the following inserts:
12c> insert into t1(n1,n2,n3,c1,c_ddl,c_ddl_2) values (0,0,0,'xxxxx',110,130);
12c> insert into t1(n1,n2,n3,c1,c_ddl,c_ddl_2) values (1,1,1,'xxxxx',140,150);
12c> insert into t1(n1,n2,n3,c1,c_ddl, c_ddl_2) values (1,1,1,'xxxxx',200,null);
12c> select
a.c_ddl_2
,a.SYS_NC00006$
from t1 a
where a.c_ddl_2 in (130,150);
C_DDL_2 SYS_NC00006$
------- ------------
130 01
150 01
SQL> select
a.c_ddl_2
,a.SYS_NC00006$
from t1 a
where a.c_ddl_2 is null;
C_DDL_2 SYS_NC00006$
------- ------------
01
Notice that the SYS_NC00006$
column value is no longer NULL
once we insert a non-default value into the C_DDL_2
column (and an explicit NULL
counts as a non-default value).
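An update should behave like an insert in this respect. The following sketch (an assumption on our part, not demonstrated in the output above) illustrates that explicitly assigning a value to C_DDL_2, even the default itself, would populate the hidden column for that row:

```sql
-- Assumption: an explicit update also sets the hidden bit-vector
-- column, even when the assigned value equals the default (84).
update t1 set c_ddl_2 = 84 where n1 = 2;
commit;

select a.c_ddl_2
      ,a.SYS_NC00006$
from   t1 a
where  a.n1 = 2;
```

If so, SYS_NC00006$ distinguishes rows that were ever explicitly assigned a value from rows that still rely on the dictionary default.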
Putting the pieces of the puzzle together, we can now understand exactly what that exotic but simple predicate, reproduced here, is doing:
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)
Oracle simply checks its system-generated column and, via the SYS_OP_VECBIT
function, decides whether to return the default value of the C_DDL_2
column or the actual value supplied by an end user via an explicit insert statement.
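Written as a searched CASE expression, the logic of that predicate is easier to read. This is an equivalent sketch for illustration, not Oracle's own generated text:

```sql
-- Equivalent reading of the internal DECODE (illustrative sketch).
SELECT c_ddl_2,
       CASE
         WHEN sys_nc00006$ IS NULL                           -- pre-alter row
           THEN NVL(c_ddl_2, 84)                             -- serve the default
         WHEN TO_CHAR(sys_op_vecbit(sys_nc00006$, 0)) = '0'  -- bit clear
           THEN NVL(c_ddl_2, 84)                             -- default still applies
         ELSE c_ddl_2                                        -- bit set: stored value
       END AS resolved_value
FROM   t1
WHERE  rownum <= 5;
```

The three branches correspond exactly to the three DECODE cases: never-touched rows, rows with the bit clear, and rows with the bit set.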
Let’s mimic what Oracle is doing with the SYS_NC00006$
column values, which are 01
and NULL
:
12c> SELECT
a.c_ddl_2
,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
FROM t1 a
WHERE a.c_ddl_2 IN (130,150) -- supplied values
UNION ALL
SELECT
a.c_ddl_2
,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
FROM t1 a
WHERE a.c_ddl_2 IS NULL -- supplied value
UNION ALL
SELECT
a.c_ddl_2
,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
FROM t1 a
WHERE c_ddl_2 =84 -- default value
AND rownum <=1
order by c_ddl_2 nulls last;
C_DDL_2 CBO_DDL
---------- ---------
84 {null}
130 1
150 1
{null} 1
The C_DDL_2
column has four distinct values: the default (84
) and the three explicitly inserted values 130
, 150
, and NULL
. When you use a predicate against the C_DDL_2
column to retrieve a row from a table block, Oracle derives the actual column value (CBO_DDL
in this example) from the hidden SYS_NC00006$
column and compares it against your bind (or literal) value. In this way it correctly reproduces all values of the C_DDL_2
column, including the default value (84
) for rows that have never been physically updated to reflect it.
Oracle Database 11g introduced a wonderful feature that frees us from worrying about application availability when adding, online, a non-null column with a default value to a large production table. This feature, called DDL optimization, allows such alter table
operations to complete not only almost instantaneously but also without any need to lock the table. We are aware of no restrictions on such an added column that might discourage its use. Oracle Database 12c extends this alter table
feature to nullable columns with a default value. The icing on the cake is that there seem to be no noticeable performance side effects when the altered column is used in a where
clause predicate. The exotic but inoffensive Oracle Database 12c predicate that guarantees correct values for such columns also appears to be free of side effects. It would have been attractive to extend this technique to indexes as well; however, given the complexity of index structures compared to table structures, Oracle is not ready to create an index whose columns point only to metadata defined in the table's data dictionary.