CHAPTER 13

Configuring the Optimizer

The cost optimizer determines the most efficient execution plan for a SQL statement. The optimizer depends heavily on the statistics that you (or the database) gather. This chapter explains how to set the optimizer goal and how to control the behavior of the optimizer. You’ll learn how to enable and disable automatic statistics collection by the database and when to collect statistics manually. You’ll learn how to set preferences for statistics collection as well as how to validate new statistics before making them available to the optimizer. The chapter explains how to lock statistics, export statistics, gather system statistics, restore older versions of statistics, and handle missing statistics.

Bind peeking behavior, wherein the optimizer looks at the bind variable values when parsing a SQL statement, can have unpredictable effects on execution plans. The chapter explains adaptive cursor sharing, which is designed to produce execution plans based on the specific values of bind variables.

Collecting statistics on large tables is always problematic, so the chapter shows how to use the incremental statistics gathering feature to speed up statistics collection for large partitioned tables. You’ll also learn how to use the new concurrent statistics collection feature to optimize statistics collection for large tables.

Collecting extension statistics for expressions and column groups improves optimizer performance, and you’ll learn how to collect these types of statistics. The chapter also explains how to let the database tell you which columns in a table are candidates for creating a column group.

13-1. Choosing an Optimizer Goal

Problem

You want to set the cost optimizer goal for your database.

Solution

You can influence the behavior of the cost optimizer by setting an optimizer goal. The optimizer will collect appropriate statistics based on the goal you set. You set the optimizer goal with the optimizer_mode initialization parameter. You can set the parameter to the value ALL_ROWS or FIRST_ROWS_n, as shown here:

optimizer_mode=all_rows
optimizer_mode=first_rows_n         /* n can be 1, 10, 100, or 1000 */

The default value for the optimizer_mode parameter is ALL_ROWS.
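You can confirm the current setting from SQL*Plus and, if necessary, change it database-wide; here's a minimal sketch (the scope=both clause assumes your instance uses an spfile):

SQL> show parameter optimizer_mode
SQL> alter system set optimizer_mode=first_rows_10 scope=both;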

How It Works

The default value for the optimizer_mode parameter, ALL_ROWS, has the goal of maximizing throughput—it minimizes resource use to complete the processing of the entire statement and get all the requested rows. The alternate value of FIRST_ROWS_n uses the goal of response time, which is the time it takes to return the first n number of rows.

If you set the optimizer_mode parameter to FIRST_ROWS_n, all sessions will use the optimizer goal of best response time. However, you can change the optimizer goal just at the session level by executing a SQL statement such as the following:

SQL> alter session set optimizer_mode=first_rows_1;

Note that the ALL_ROWS optimizer mode setting has built-in bias toward full table scans because its goal is to minimize resource usage. The FIRST_ROWS_n setting, on the other hand, favors index accesses because its goal is minimizing response time and thus returns the requested number of rows as fast as possible.

In addition to the optimizer_mode parameter, you can also set the following parameters to influence the behavior of the optimizer:

  • optimizer_index_caching
  • optimizer_index_cost_adj
  • db_file_multiblock_read_count

In general, changing these parameters at the database level can lead to unexpected optimizer behavior, including potential performance deterioration for some queries. The recommended practice is to leave these parameters at their default values. However, Recipe 13-11 does show how to use one of these parameters (optimizer_index_cost_adj) at the session level to improve the performance of a long-running query by forcing the optimizer to use an index.

13-2. Enabling Automatic Statistics Gathering

Problem

You want to enable automatic statistics gathering in your database.

Tip  Oracle recommends enabling automatic optimizer statistics collection.

Solution

You enable automatic statistics collection by using the enable procedure in the DBMS_AUTO_TASK_ADMIN package. Check the status of the auto optimizer stats collection task in the following way:

SQL> select client_name,status from dba_autotask_client;
 
CLIENT_NAME                                 STATUS
-------------------------------           --------
auto optimizer stats collection           DISABLED
auto space advisor                         ENABLED
sql tuning advisor                         ENABLED
SQL>

Execute the dbms_auto_task_admin.enable procedure to enable the automatic statistics collection task:

SQL> begin dbms_auto_task_admin.enable(
  2  client_name=>'auto optimizer stats collection',
  3  operation=>NULL,
  4  window_name=>NULL);
  5  end;
  6  /
 
PL/SQL procedure successfully completed.

Check the status of the auto optimizer stats collection task:

SQL>  SELECT client_name,status from dba_autotask_client;
 
CLIENT_NAME                              STATUS
-------------------------------        --------
auto optimizer stats collection         ENABLED
auto space advisor                      ENABLED
sql tuning advisor                      ENABLED
 
SQL>

You can disable the statistics collection task by using the dbms_auto_task_admin.disable procedure:

SQL> begin
  2  dbms_auto_task_admin.disable(
  3  client_name=> 'auto optimizer stats collection',
  4  operation=> NULL,
  5  window_name=> NULL);
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
 
SQL>

How It Works

Automatic optimizer statistics collection is enabled by default when you create a database with the DBCA. If you’ve disabled automatic statistics collection, you can enable it by executing the procedure shown in the “Solution” section. Once you enable automatic statistics collection, the database collects statistics whenever they get stale; the database determines this based on the changes made to the tables and indexes. Automating statistics collection eliminates all the work involved in collecting statistics yourself.

When you enable automatic optimizer statistics collection, the auto optimizer stats collection task calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. This procedure is virtually identical to the DBMS_STATS.GATHER_DATABASE_STATS procedure—the big difference is that the database will collect the statistics only during the maintenance window you specify. If you don’t specify a maintenance window, the database uses the default maintenance window, which opens every night from 10 p.m. to 6 a.m. and all day on weekends. The auto optimizer stats collection job collects statistics first for those objects that need the new statistics the most. Thus, the auto statistics collection job will first collect statistics for objects that don’t have any statistics or objects that underwent substantial modifications (usually about 10 percent of the rows). This way, if the statistics collection job doesn’t complete before the end of the maintenance window, the database ensures that objects with the stalest statistics are refreshed.

You can query the DBA_OPTSTAT_OPERATIONS view to find out the beginning and ending times for the automatic statistics collection job, as shown here:

SQL> select operation, start_time, end_time from dba_optstat_operations
  2  where operation = 'gather_database_stats(auto)';
 
 
OPERATION                          START_TIME            END_TIME
---------------------  ----------------------------   ---------------------------
gather_database_stats  26-APR-13 10.00.02.970000 PM   26-APR-13 10.03.13.671000 PM
...
SQL>

Tip  Automatic statistics collection works very well with OLTP databases whose data changes moderately on a day-to-day basis but not for most data warehouses that perform nightly data loading from ETL jobs.

The DBA_TAB_MODIFICATIONS view stores information about the inserts, deletes, and updates to a table. By default, the OPTIONS parameter for the GATHER_DATABASE_STATS procedure is set to the value GATHER AUTO. This means that once you enable automatic statistics collection, the database will collect statistics for all tables where more than 10 percent of the rows have been affected by insert, delete, and update operations.
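You can inspect this change tracking yourself. The following sketch flushes the in-memory monitoring data first so the view is current and then lists the tracked DML volumes for a schema (SH is just an illustration):

SQL> execute dbms_stats.flush_database_monitoring_info;
SQL> select table_name, inserts, updates, deletes
  2  from dba_tab_modifications
  3  where table_owner = 'SH';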

Automatic statistics collection by the database works well for most OLTP databases. However, in a data warehouse environment, you may run into issues because the automatic statistics collection job runs during the nightly maintenance window. If your ETL or ELT jobs load data into a table after the auto job has already collected statistics for that table, you could end up with unrepresentative statistics for that table. In a data warehouse environment, it’s a good idea to collect statistics manually right after the load process completes and disable the default automatic statistics collection job.

Note  In Oracle Database 12c, the automatic statistics gathering job uses concurrency when gathering statistics.

13-3. Setting Preferences for Statistics Collection

Problem

You want to set default values for the parameters used by the DBMS_STATS procedures that gather various types of statistics.

Solution

Use the appropriate DBMS_STATS.SET_*_PREFS procedure to change the default values for parameters that control statistics collection. Use the following procedures for changing the default values of the parameters used at various levels of statistics collection:

  • SET_TABLE_PREFS: Lets you specify default parameters to be used by the DBMS_STATS.GATHER_*_STATS procedures for a specific table.
  • SET_SCHEMA_PREFS: Lets you change the default parameters to be used by the DBMS_STATS.GATHER_*_STATS procedures for all objects in a specific schema.
  • SET_DATABASE_PREFS: Lets you change the default parameters to be used by the DBMS_STATS.GATHER_*_STATS procedures for the entire database, including all user schemas and system schemas such as SYS and SYSTEM.
  • SET_GLOBAL_PREFS: Sets global statistics preferences; this procedure lets you change the default statistic collection parameters for any object in the database that doesn’t have an existing preference at the table level. If you don’t set table-level preferences or you don’t set any parameter explicitly in the DBMS_STATS.GATHER_*_STATS procedure, the parameters default to their global settings.

Here’s an example that shows how to set default preferences at the database level by invoking the SET_DATABASE_PREFS procedure:

SQL> execute dbms_stats.set_database_prefs('ESTIMATE_PERCENT','20');

Once you set a preference for a parameter at the database level, it applies to all tables in the database. Note that the SET_*_PREFS procedures accept three parameters:

  • pname refers to the name of the preference, such as the ESTIMATE_PERCENT preference used in the previous example.
  • pvalue lets you specify a value for the preference. If you specify NULL as the value for the pvalue parameter, the preference’s value will be set to the Oracle default values.
  • add_sys is an optional parameter that, if set to TRUE, will also include all Oracle-owned tables in the statistics collection process.
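For example, to override a preference for a single table rather than the entire database, you can set it at the table level; here's a minimal sketch using SH.SALES:

SQL> execute dbms_stats.set_table_prefs('SH','SALES','ESTIMATE_PERCENT','30');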

How It Works

All DBMS_STATS.GATHER_*_STATS procedures use default values for the various parameters. There are two ways you can handle the specification of values for the various parameters that are part of the procedures in the DBMS_STATS package such as the GATHER_TABLE_STATS procedure. You can specify the preference values when you execute a procedure such as GATHER_TABLE_STATS to collect statistics. Alternatively, you can change the default values of the preferences at the table, schema, database, or global level with the appropriate DBMS_STATS.SET_*_PREFS procedure, as shown in the “Solution” section. If you don’t specify a value for any of the statistics gathering parameters, the database uses the default value for that parameter.

You can find the default value of any preference by executing the DBMS_STATS.GET_PREFS procedure. The following example shows how to find the value of the current setting of the STALE_PERCENT parameter:

SQL> select dbms_stats.get_prefs ('STALE_PERCENT','SH') stale_percent from dual;
 
STALE_PERCENT
-------------
 
10
 
SQL>

You specify similar preferences when you collect statistics at the table, schema, or database level. Here is a description of the various preferences you can specify to control the way the database gathers statistics.

CASCADE

This specifies whether the database should collect index statistics along with the table statistics. If you set CASCADE=TRUE, this is the same as executing the GATHER_INDEX_STATS procedure on all the indexes in the database, besides collecting table and column statistics. The default is the constant DBMS_STATS.AUTO_CASCADE, which means that Oracle will determine whether to collect any index statistics.
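For instance, here's a sketch of a single gathering run that explicitly requests index statistics along with the table statistics (SH.SALES is illustrative):

SQL> execute dbms_stats.gather_table_stats('SH','SALES',cascade=>true);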

DEGREE

This specifies the degree of parallelism the database must use when gathering statistics. Oracle recommends using the default setting of the constant DBMS_STATS.AUTO_DEGREE. Oracle chooses the correct degree of parallelism based on the object and the parallelism-related initialization parameters. When you use the default DBMS_STATS.AUTO_DEGREE setting, Oracle determines the degree of parallelism based on the size of the object. If the object is small enough, Oracle collects statistics serially, and if the object is large, Oracle uses the default degree of parallelism based on the number of CPUs. Note that the default degree is NULL, which means that the database collects statistics using parallelism only if you set the degree of parallelism at the table level with the DEGREE clause.
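If you want to pass the recommended constant explicitly for a single gathering run, the call looks like this (a sketch, again using SH.SALES):

SQL> execute dbms_stats.gather_table_stats('SH','SALES',degree=>dbms_stats.auto_degree);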

ESTIMATE_PERCENT

This specifies the percentage of rows the database must use to estimate the statistics. For large tables, estimation is the only way to complete the statistics collection process within a reasonable time. Statistics collection is not a trivial process—it’s resource-intensive and consumes a lot of time for large tables. You can set a value between 0 and 100 for the estimate_percent parameter. A rule of thumb here is that the more uniform a table’s data, the smaller the sample size can be. On the other hand, if a table’s data is highly skewed, you should use a higher sample size to capture the variations in the data distribution. Of course, setting this parameter to 100 means that the database isn’t doing an estimation—it will collect statistics for each row in a table. Often, DBAs set the estimate_percent parameter too high because they’ve had bad experiences with a table when they set a small sample size. If you think the data is uniformly distributed, even a 1 or 2 percent sample will get you very accurate statistics and save you a bunch of time and processing overhead.

Tip  In Oracle Database 11g, the NDV (number of distinct values) count, which is a key statistic calculated by setting the estimate_percent parameter to DBMS_STATS.AUTO_SAMPLE_SIZE, is statistically identical to the NDV count calculated by a 100 percent complete statistics collection. The best practice is to start with the AUTO_SAMPLE_SIZE and set your own sample size only if you must.

It’s not easy to select the best size for the estimate_percent parameter. If you set it too high, it’ll take a long time to collect statistics. If you set it too low, you can gather the statistics quickly all right, but those statistics can very well be inaccurate. By default, the database uses the constant DBMS_STATS.AUTO_SAMPLE_SIZE to determine the best sample size. You specify the AUTO value for the estimate_percent parameter in the following way:

SQL> exec dbms_stats.gather_table_stats(NULL, 'MASSIVE_TABLE', estimate_percent=>dbms_stats.auto_sample_size)

When you set the AUTO value for the estimate_percent parameter, not only does the database automatically determine the sampling size, but it also adjusts the size of the sample as the data distribution changes. NDV is a good criterion for gauging the accuracy of statistics collected with varying sample sizes. The accuracy rate of the NDV estimate for a column is defined as follows:

accuracy rate = 1 - (estimated NDV - actual NDV) / actual NDV

The accuracy rate can range from 0 to 100 percent. A 100 percent sample size will always give you a 100 percent accuracy rate. What is significant is that in Oracle 11g, auto sampling provides accuracy rates that are very close to 100 percent while taking a fraction of the time it takes to collect complete statistics for a large table.

METHOD_OPT

You can specify two things with the METHOD_OPT parameter: the columns for which the database will collect statistics and the columns on which the database will create histograms. You can also specify the number of buckets in the histograms. You can specify one of the following options, or both options in combination, for this parameter:

FOR ALL [INDEXED  |  HIDDEN]  COLUMNS  [size_clause]
FOR COLUMNS [size_clause]  column  [size_clause]

Note that the HIDDEN option is available only in Oracle Database 12c and not in earlier releases. The FOR ALL option lets you specify that the database must collect statistics for all columns or only for the indexed columns. If you specify the INDEXED COLUMNS option, the database will collect statistics only for those columns that have an index on them. Be careful with this option because the database will not collect statistics on the table’s columns, instead using basic default statistics for the columns. Using FOR ALL INDEXED COLUMNS in a data warehouse environment could be especially problematic because indexes aren’t heavily used in that environment.

The FOR COLUMNS option lets you specify one or more columns on which the database must gather statistics instead of on all columns in a table, which is the default behavior. Here’s how to specify the column clause in this context:

column:= column_name  |  extension name |  extension

The column_name clause refers to the name of the column, and extension can be either a column group (in the format column_name, column_name [,...]) or an expression.

The key clause for both the FOR ALL and FOR COLUMNS options is size_clause, which determines whether the database should collect histograms for a column and under what conditions. One option is to supply an integer value indicating the number of histogram buckets—in the range 1 through 254—that you would like. Here’s an example:

SQL> exec dbms_stats.gather_table_stats('HR','EMPLOYEES',method_opt=> 'for columns size 254 job_id')
 
PL/SQL procedure successfully completed.
SQL>

When you execute this procedure, the database collects a histogram on the JOB_ID column, with up to 254 histogram buckets.

Note  A value of 1 for the integer clause (for example, 'FOR ALL COLUMNS SIZE 1') won’t really create any histograms on the columns because all the data is placed into a single bucket. Also, if there’s already a histogram on a column, setting the value 1 for the integer clause will remove it.

Another option for the clause size_clause is to specify one of the following three values:

  • REPEAT: Specifies that the database must collect histograms on only those columns that already have histograms collected for them; setting the value repeat instead of the integer 1 value ensures that you retain any useful histograms
  • AUTO: Lets the database determine for which columns it should collect histograms, based on each column’s data distribution (whether it’s uniform or skewed) and the actual column usage statistics
  • SKEWONLY: Lets the database determine for which columns it should collect histograms, based on each column’s data distribution

Here is an example that specifies SKEWONLY:

SQL> exec dbms_stats.gather_table_stats('HR','EMPLOYEES',method_opt=> 'for all columns size skewonly')
PL/SQL procedure successfully completed.
SQL>

When you specify SKEWONLY, the database will look at the data distribution for each column to determine whether the data is skewed enough to warrant the creation of a histogram.

The default value for the METHOD_OPT parameter is FOR ALL COLUMNS SIZE AUTO. That is, the database will collect statistics for all columns in a table, and it automatically selects the columns for which it should create histograms. Oracle uses column usage metrics to determine which columns need histograms.

NO_INVALIDATE

You can set three different values for this parameter. The value TRUE means that the database doesn’t invalidate the dependent cursors of the table for which it’s collecting statistics. The value FALSE means that the database immediately invalidates the dependent cursors. Finally, you can set this parameter to the value DBMS_STATS.AUTO_INVALIDATE to let Oracle decide to invalidate the cursors—this is also the default value for the NO_INVALIDATE parameter.
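For example, here's a sketch that defers the invalidation decision to Oracle for one gathering run:

SQL> execute dbms_stats.gather_table_stats('SH','SALES',no_invalidate=>dbms_stats.auto_invalidate);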

GRANULARITY

This parameter determines how the database handles statistics gathering for partitioned tables. Here are the various options you can specify for the GRANULARITY parameter:

  • ALL: Gathers subpartition-, partition-, and global-level statistics; this setting provides an accurate set of table statistics but is extremely resource-intensive and takes much longer to complete than a statistics collection job with the other options.
  • GLOBAL: Gathers just global statistics for a table.
  • PARTITION: Gathers only partition-level statistics. The partition-level statistics are rolled up at the table level and may not be very accurate at the table level.
  • GLOBAL AND PARTITION: Gathers the global- and partition-level statistics but not the subpartition-level statistics.
  • SUBPARTITION: Gathers only subpartition statistics.
  • AUTO: This is the default value for the GRANULARITY parameter and determines the granularity based on the partitioning type.

Note that the ALL setting could take a long time to complete besides using up a lot of resources. It’s not really necessary to gather statistics at the subpartition level for composite partitioned tables. In most cases, the default setting of AUTO works well. Before specifying the ALL setting, consider whether the subpartition level statistics are really useful, since they aren’t used by the optimizer unless you specify the subpartition columns in the predicate. Consider using the INCREMENTAL option instead, which may be a better strategy when dealing with partitioned and subpartitioned tables.
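As an illustration, the following sketch gathers global- and partition-level statistics while skipping the subpartition level for a composite-partitioned table:

SQL> execute dbms_stats.gather_table_stats('SH','SALES',granularity=>'GLOBAL AND PARTITION');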

PUBLISH

By default, the database publishes all statistics right after it completes the statistics gathering process. You can specify that the database keep newly collected statistics as pending statistics by setting the PUBLISH parameter to FALSE.

INCREMENTAL

The INCREMENTAL preference determines whether the database maintains a partitioned table’s statistics without having to perform a full table scan. The default value of this parameter is FALSE, meaning the database does a full table scan to maintain global statistics. Recipe 13-19 discusses incremental statistics collection in detail.

STALE_PERCENT

The STALE_PERCENT preference determines the proportion of a table’s rows that must change before the database considers the table’s statistics as “stale” and starts gathering fresh statistics. By default, the STALE_PERCENT parameter is set to 10 percent. Don’t collect statistics on tables that haven’t changed at all, or have changed very little, because you’d be collecting unnecessary statistics.
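For a volatile table, you can tighten the threshold and then check which tables the database currently flags as stale; here's a sketch using the SH schema:

SQL> execute dbms_stats.set_table_prefs('SH','SALES','STALE_PERCENT','5');
SQL> select table_name, stale_stats from dba_tab_statistics
  2  where owner = 'SH';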

AUTOSTATS_TARGET

This preference is valid only for auto stats collection, and you specify it when setting global statistics preferences with the SET_GLOBAL_PREFS procedure. You can set the following values for this preference:

  • ALL: This collects statistics for all objects in the database.
  • ORACLE: This collects statistics for all Oracle-owned objects (for example, the SYSMAN user).
  • AUTO: The database determines for which objects it should collect statistics.

The default value for the AUTOSTATS_TARGET parameter is AUTO. Note that currently the ALL and AUTO (default) settings work the same way. We’ve incorporated several Oracle best practices for statistics collection in this recipe. Try to stick with the default settings for the preferences unless you have strong reasons to do otherwise. Remember that if you’re creating a new table, you can load the data first and collect statistics just for the table. Create the indexes afterward because the database automatically computes statistics for the indexes during index creation time.

13-4. Manually Generating Statistics

Problem

You’re trying to determine whether you should let the database automatically collect the optimizer statistics or whether you must manually collect the statistics.

Solution

In most cases, the automatic statistics collection task is good enough to collect the optimizer statistics. In fact, there are many production databases that automate statistics collection as shown in Recipe 13-2 and never use a manual statistic collection process. However, there are cases where manual statistic collection may be necessary. Here are two cases when you must manually collect statistics.

Volatile Tables

If your database contains volatile tables that experience numerous deletes (or even truncates) throughout the day, then an automatic stats collection job that runs during the nightly maintenance window isn’t adequate. There are a couple of strategies you can use in this type of situation, where a table’s data keeps fluctuating throughout the day:

  • Collect the statistics when the table has the number of rows that represent its “typical” state. Once you do this, lock the statistics to prevent the automatic statistics collection job from collecting fresh statistics for this table during the nightly maintenance window.
  • The other option is to make the statistics of the table null.

You make the statistics of a table null by deleting the statistics first and then locking the table’s statistics right after that, as shown in the following example:

SQL> execute dbms_stats.delete_table_stats('OE','ORDERS');
 
PL/SQL procedure successfully completed.
 
SQL> execute dbms_stats.lock_table_stats('OE','ORDERS');
 
PL/SQL procedure successfully completed.
 
SQL>

Bulk Loaded Tables

For tables that you bulk load data into, you must collect statistics immediately after loading the data. If you don’t collect statistics right after bulk loading data into a table, the database can still use dynamic statistics to estimate the statistics, but these statistics aren’t as comprehensive as the statistics that you collect.

In releases prior to Oracle Database 12c, you’d use the capability to lock statistics when a large table’s data was frequently refreshed, so you’d have statistics available immediately after data loading. Locking statistics in this case prevents new statistics from being collected. The optimizer uses the old statistics once the data is loaded and available. In Oracle Database 12c, the database automatically gathers table statistics during the following two types of bulk data loads:

  • CREATE TABLE ... AS SELECT statements that create a table and load it in one step
  • INSERT INTO ... SELECT statements into an empty table using the direct-path insert technique

In both of these cases, there’s no need to collect any table statistics because the database automatically collects them during the load. This new feature is especially useful in cases where a large table, such as a data warehouse’s SALES table, is loaded nightly.

It’s important to understand that the online statistics feature gathers only table statistics and not index statistics. You can therefore run DBMS_STATS.GATHER_INDEX_STATS after the bulk load is completed to gather the index statistics and histograms, if you need them.
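For example, after bulk loading you might gather statistics for a specific index like this (the index name here is hypothetical):

SQL> execute dbms_stats.gather_index_stats('HR','EMPLOYEES2_IDX');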

By default, the database always gathers statistics for a table when you bulk load the table. You can disable the stats gathering by specifying the NO_GATHER_OPTIMIZER_STATISTICS hint, as shown here:

CREATE TABLE employees2 AS
  SELECT /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * FROM employees;

You can also enable the stats gathering at the statement level by specifying the GATHER_OPTIMIZER_STATISTICS hint.

Automatic statistics gathering during bulk loads doesn’t happen when a table has the following characteristics:

  • The table is in an Oracle-owned schema such as SYS.
  • It’s a nested table.
  • It’s an index-organized table (IOT).
  • It’s an external table.
  • It’s a global temporary table defined as ON COMMIT DELETE ROWS.
  • It’s a table with virtual columns.
  • It’s a table whose PUBLISH preference is set to FALSE.
  • It’s a partitioned table where INCREMENTAL is set to true and extended syntax is not used.

How It Works

Before Oracle introduced the automatic optimizer statistics collection feature in the Oracle Database 10g release, every DBA collected statistics with scripts built around the recommended DBMS_STATS package (or even the older ANALYZE TABLE command). With automatic statistics collection, DBAs don’t have to collect optimizer statistics by scheduling DBMS_STATS jobs. However, you may still run into situations where automatic statistics collection isn’t appropriate. The “Solution” section describes two such cases and how to handle them by manually collecting the statistics.

You can manually collect statistics at the table, schema, or database level by using the appropriate DBMS_STATS.GATHER_*_STATS procedure.

When you’re manually gathering the optimizer statistics, it’s a good idea to stick with the default settings for the various parameters that control how the database collects the statistics. Often, both the speed of the statistics gathering job and the quality of the statistics themselves improve when you revert to the default settings. For example, many DBAs set too high a sample size with the estimate_percent parameter, rather than letting the database use the appropriate sample size based on the DBMS_STATS.AUTO_SAMPLE_SIZE constant.

13-5. Locking Statistics

Problem

You want to lock the statistics for a table or a schema to freeze the statistics.

Solution

You can lock a table or a schema’s statistics by executing the appropriate DBMS_STATS.LOCK_* procedures. For example, you can lock a table’s statistics with the LOCK_TABLE_STATS procedure in the DBMS_STATS package, as shown here:

SQL> execute dbms_stats.lock_table_stats(ownname=>'SH',tabname=>'SALES');
 
PL/SQL procedure successfully completed.
SQL>

You can unlock the table’s statistics by executing the following procedure:

SQL> execute dbms_stats.unlock_table_stats(ownname=>'SH',tabname=>'SALES');
 
PL/SQL procedure successfully completed.
 
SQL>

You can lock a schema’s statistics with the DBMS_STATS.LOCK_SCHEMA_STATS procedure, as shown here:

SQL> execute dbms_stats.lock_schema_stats('SH');
 
PL/SQL procedure successfully completed.
SQL>

Unlock the statistics with the UNLOCK_SCHEMA_STATS procedure:

SQL> execute dbms_stats.unlock_schema_stats('SH');
 
PL/SQL procedure successfully completed.
 
SQL>

How It Works

You may want to lock a table’s statistics to freeze the current set of statistics. You may also lock the statistics after first deleting the existing statistics; in this case, you force the database to use dynamic statistics to estimate the table’s statistics. Deleting a table’s statistics and then locking the statistics is in effect the same as setting the statistics on a table to null. You can set the force argument of the GATHER_TABLE_STATS procedure to true to override a table’s lock on its statistics.
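Here's a sketch of the force option, which gathers fresh statistics for SH.SALES even though its statistics are locked:

SQL> execute dbms_stats.gather_table_stats('SH','SALES',force=>true);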

Note  Locking a table’s statistics also locks all statistics that depend on that table, such as index, histogram, and column statistics.

13-6. Handling Missing Statistics

Problem

Certain tables in your database are missing statistics because the tables have had data loaded into them outside the nightly batch window. You can’t collect statistics on the table during the day when the database is handling other workload.

Solution

Oracle uses dynamic statistics to compensate for missing statistics. In earlier releases, the database always gathered dynamic statistics by default when it was confronted with a table that was missing optimizer statistics. In Oracle Database 12c, the optimizer determines whether dynamic statistics collection is useful in a particular case, as well as the statistics level to use. In Oracle Database 12c, not only missing statistics but also insufficient statistics can trigger dynamic statistics collection by the optimizer. By default, when optimizer statistics are either missing or insufficient, the database automatically collects dynamic statistics.

The database will scan a random sample of data blocks in a table when you enable dynamic sampling. You enable/disable dynamic sampling in the database by setting the optimizer_dynamic_sampling initialization parameter. Dynamic sampling is enabled by default, as you can see from the following:

SQL> show parameter dynamic
 
NAME                           TYPE        VALUE
--------------------------     -------     -----
optimizer_dynamic_sampling     integer         2
SQL>

The default level of dynamic sampling is 2; setting it to 0 disables dynamic sampling. You can modify the default value by setting a different sampling level as shown here:

SQL> alter system set optimizer_dynamic_sampling=4 scope=both;
System altered.
 
SQL>

How It Works

Ideally, you should gather optimizer statistics with the DBMS_STATS package (manually or through automatic statistics collection). In cases where you don’t have a chance to collect statistics for a newly created or newly loaded table, the table won’t have any statistics until the database automatically generates the statistics through its automatic stats collection job or when you schedule a manual statistics collection job. Even if you don’t collect any statistics, the database uses some basic statistics, such as table and index block counts, estimated number of rows, join column, and GROUP BY statistics, to estimate the selectivity of the predicates in a query. Dynamic statistics go a step further, augmenting these basic statistics by dynamically gathering additional statistics at compile time. Dynamic sampling is of particular help when dealing with frequently executed queries that involve tables with no statistics.

Note  Oracle doesn’t collect dynamic statistics for external tables.

There’s a cost to collecting dynamic statistics because the database uses resources to gather statistics during query compilation. If you don’t execute these queries many times, the database incurs an overhead each time it executes a query involving table(s) for which it must dynamically collect statistics. For dynamic sampling to really pay off, it must help queries that are executed frequently.

It’s important to understand the significance of the dynamic statistics levels, which can range from 0 to 11. The dynamic statistics level controls when Oracle collects dynamic statistics as well as the size of the sample. You can set the dynamic statistics level with the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter or with a SQL hint.
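Here's a sketch of the hint form, which raises the sampling level for a single statement only (the table, alias, and level are illustrative):

SQL> select /*+ dynamic_sampling(e 4) */ count(*)
  2  from hr.employees e
  3  where salary > 10000;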

Unlike in previous releases when the database collected dynamic statistics only when tables had no statistics, in Oracle Database 12c the optimizer decides automatically whether dynamic statistics are useful and which statistics level to use. So, the main determinant of whether the database collects dynamic statistics isn’t the presence or absence of statistics; it is whether the available statistics are sufficient to generate optimal execution plans. Whenever the optimizer deems that the existing statistics are insufficient, it collects and uses dynamic statistics.

Automatic dynamic statistics are enabled when any of the following is true:

  • The OPTIMIZER_DYNAMIC_SAMPLING initialization parameter is set to its default value.
  • You set the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter to 11.
  • You invoke the gathering of dynamic statistics through a SQL hint.

The optimizer automatically collects dynamic statistics when there are missing statistics, stale statistics, or insufficient statistics. The optimizer decides whether to use dynamic statistics based on factors such as the following:

  • SQL statements use parallel execution.
  • You’ve created a SQL plan directive.
  • The SQL statement was captured in SQL Plan Management or Automatic Workload Repository or is currently in the shared SQL area.

Note that the sample size used for dynamic statistics at various sampling levels is in terms of data blocks, not rows. Here is a brief description of the various dynamic statistics levels:

  • Level 0: Doesn’t use dynamic statistics.
  • Level 1: Uses dynamic statistics for all tables that don’t have statistics, provided there’s at least one nonpartitioned table without statistics in the query; the table must also not have any indexes, and it must be larger than 32 blocks, which is the sample size for this level.
  • Level 2: Uses dynamic statistics if at least one table in the query has no statistics; the sample size is 64 blocks.
  • Level 3: Uses dynamic statistics if the query meets the level 2 criteria and it has one or more expressions in a WHERE clause predicate; the sample size is 64 blocks.
  • Level 4: Uses dynamic statistics if the query meets all level 3 criteria, and in addition, it uses complex predicates such as an OR/AND operator between multiple predicates; the sample size is 64 blocks.
  • Levels 5–9: Use dynamic statistics if the statement meets the level 4 criteria; each of these levels differs only in the sample size, which ranges from 128 to 4,096 blocks.
  • Level 10: This level uses dynamic statistics for all statements, and the sample it uses isn’t really a sample because it checks all data blocks to get the statistics.
  • Level 11: Uses dynamic statistics automatically when the optimizer deems it necessary. The statistics collected by setting this level are stored and made available to other queries.

You can disable dynamic statistics collection by setting the dynamic statistics level to 0:

SQL> alter session set optimizer_dynamic_sampling=0;

Dynamic statistics are a complement to the statistics collected by the DBMS_STATS package’s procedures. Oracle doesn’t expect you to use this in general because of the additional overhead for gathering optimizer statistics during the generation of an execution plan. Dynamic statistics do help in getting better cardinality estimates but are more suitable for longer-running queries in a data warehouse or a decision support system, rather than for queries in an OLTP database, because of the overhead involved. You must also keep in mind that the statistics collected through dynamic sampling are by no means the same as the statistics collected through the DBMS_STATS procedures. Dynamic statistics merely collect rudimentary statistics such as the number of data blocks and the high and low values of columns. If you must set a dynamic statistics level, do so at the session level with an alter session statement, rather than setting it database-wide.

13-7. Exporting Statistics

Problem

You want to export a set of statistics from your production database to a much smaller test database.

Solution

You can export optimizer statistics from one database to another by using the DBMS_STATS.EXPORT_*_STATS procedures. These procedures let you export optimizer statistics from a source table, schema, or database. Once you complete the export of the statistics, you must execute one of the DBMS_STATS.IMPORT_*_STATS procedures to import the statistics into a different database. You can export statistics at the table, schema, or database level. Here’s an example that explains how to export statistics from a table:

  1. Create a table to hold the exported statistics:
    SQL> execute dbms_stats.create_stat_table(ownname=>'SH',stattab=>'mytab',
                                                    tblspace=>'USERS')
     
    PL/SQL procedure successfully completed.
     
    SQL>
  2. Export the statistics for the table SH.SALES from the data dictionary into the mytab table, using the DBMS_STATS.EXPORT_*STATS procedure:
    SQL> exec dbms_stats.export_table_stats(ownname=>'SH',tabname=>'SALES',stattab=>'mytab')
     
    PL/SQL procedure successfully completed.
     
    SQL>
  3. Export the table mytab and import it into the other database (for example, with Data Pump).
  4. In a different database, import the statistics using the DBMS_STATS.IMPORT_*STATS procedure:
    SQL> exec dbms_stats.import_table_stats(ownname=>'SH',tabname=>'SALES',stattab=>'mytab',no_invalidate=>true);
     
    PL/SQL procedure successfully completed.
     
    SQL>

How It Works

The EXPORT_TABLE_STATS procedure exports the current statistics for a table from the data dictionary and stores them in a table that you create. Note that this procedure doesn’t generate fresh statistics, and the database will continue to use the current statistics for the table (SH.SALES in our example). By default, the cascade option is true, meaning the procedure will export statistics for all indexes in the SH.SALES table along with the column statistics.

You can make the optimizer use the exported statistics only after you import them into the data dictionary, in the same or a different database. The IMPORT_TABLE_STATS procedure imports the statistics you’ve exported earlier into the data dictionary. Setting the no_invalidate parameter to true (the default is false) ensures that any dependent cursors aren’t invalidated. By default, you can’t import a table’s statistics when the statistics are locked. You can override this property by setting the force parameter to true. If you’re importing the statistics into a different database from the one from which you exported the statistics, you must export the table in which you stored the statistics to the target database. You must then import the table into the target database before you can execute the IMPORT_TABLE_STATS procedure.

Exporting and importing statistics is an ideal way to present the same statistics to the optimizer in a test system as those in a production system to ensure consistent explain plans. It’s also a good strategy when you want to preserve a known set of good statistics for a longer period than what is allowed by the “restore statistics” feature explained in Recipe 13-8. The ability to export and import statistics enables you to test different sets of statistics before deciding which set of parameters is the best for your database.

In this recipe, we showed you how to export and import table-level statistics. The DBMS_STATS package also contains procedures to export and import statistics at the column, index, schema, and database levels. In addition, there are procedures for exporting and importing dictionary statistics, statistics for fixed objects, and system statistics.
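For example, a schema-level export into the statistics table created earlier looks like this (a sketch that reuses the mytab table):

SQL> execute dbms_stats.export_schema_stats(ownname=>'SH',stattab=>'mytab');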

13-8. Restoring Previous Versions of Statistics

Problem

The performance of certain queries has deteriorated suddenly after collecting fresh statistics. You want to see whether you can use an older set of statistics that you know worked well.

Solution

Use the DBMS_STATS.RESTORE_STATS procedure to revert to an older set of optimizer statistics. Before you restore older statistics, check how far back you can go to restore older statistics:

SQL> select dbms_stats.get_stats_history_availability from dual;
 
GET_STATS_HISTORY_AVAILABILITY
-----------------------------------------
19-APR-13 07.49.26.718000000 AM -04:00
 
SQL>

The output of this query shows that you can restore statistics to any timestamp that’s more recent than the timestamp shown, which is 19-APR-13 07.49.26.718000000 AM -04:00.

Execute the RESTORE_*_STATS procedures of the DBMS_STATS package to revert to statistics from an earlier period. The following example shows how to restore statistics at the schema level:

SQL> exec dbms_stats.restore_schema_stats(ownname=>'SH',as_of_timestamp=>'19-OCT-13 01.30.31.323000 PM -04:00',no_invalidate=>false)
 
PL/SQL procedure successfully completed.
 
SQL>

How It Works

When the database collects fresh statistics, it doesn’t get rid of the previous set of statistics. Instead, it retains the older versions for a set number of days. You have the ability to restore older statistics by executing the DBMS_STATS.RESTORE_*_STATS procedures, which replace the current statistics with the statistics from the time you specify. Restore statistics when you want the optimizer to use the same execution plans as it did when it had access to an older set of statistics. By default, the database manages the retention and purging of historical statistics. Here’s how to find out how many days of statistics the database retains by default:

SQL> select dbms_stats.get_stats_history_retention from dual;
 
GET_STATS_HISTORY_RETENTION
---------------------------
                         31
SQL>

The database automatically purges statistics it has collected more than 31 days ago (provided newer statistics exist!). You can manually purge all old versions of statistics by executing the DBMS_STATS.PURGE_STATS procedure. You can change the number of days the database retains statistics by executing the following command:

SQL> exec dbms_stats.alter_stats_history_retention(retention=>60);
 
PL/SQL procedure successfully completed.
 
SQL>

The command tells the database to save historical statistics for a period of 60 days.
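If you want to purge history older than a specific point yourself, a sketch of the call looks like this (here, everything older than ten days):

SQL> execute dbms_stats.purge_stats(sysdate-10);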

In the example shown in the “Solution” section, we showed how to restore statistics for a schema. You can similarly restore statistics for a database with the RESTORE_DATABASE_STATS procedure or for a table with the RESTORE_TABLE_STATS procedure. You can also restore dictionary stats with the RESTORE_DICTIONARY_STATS procedure and system stats with the RESTORE_SYSTEM_STATS procedure.
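For instance, a table-level restore mirrors the schema-level example shown earlier (the timestamp is illustrative):

SQL> execute dbms_stats.restore_table_stats(ownname=>'SH',tabname=>'SALES',as_of_timestamp=>'19-OCT-13 01.30.31.323000 PM -04:00');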

13-9. Gathering System Statistics

Problem

You know the optimizer uses I/O and CPU characteristics of a system during the selection of an execution plan. You want to ensure that the optimizer is using accurate system statistics.

Solution

You can collect two types of system statistics to capture your system’s I/O and CPU characteristics. You can collect workload statistics or noworkload statistics to enable the optimizer to better estimate the true I/O and CPU costs, which are a critical part of query optimization.

When the database gathers noworkload statistics, it simulates a workload. Here’s how you collect noworkload statistics, using the DBMS_STATS.GATHER_SYSTEM_STATS procedure:

SQL> execute dbms_stats.gather_system_stats()
 
PL/SQL procedure successfully completed.
 
SQL>

You can also gather system statistics while the database is processing a typical workload. These system statistics, called workload statistics, are more representative of actual system I/O and CPU characteristics and present a more accurate system hardware picture to the optimizer. You can collect workload system statistics by executing the DBMS_STATS.GATHER_SYSTEM_STATS procedure with the start and stop options:

SQL> execute dbms_stats.gather_system_stats('start')
 
PL/SQL procedure successfully completed.
SQL>

You can execute the previous command before the beginning of the workload window. Once the workload window ends, stop the system statistics gathering by executing the following command:

SQL> execute dbms_stats.gather_system_stats('stop')
 
PL/SQL procedure successfully completed.
 
SQL>

You can also execute the GATHER_SYSTEM_STATS procedure with an interval parameter to instruct the database to collect workload system statistics over a period of time that you specify and automatically stop the statistics gathering process at the end of the period. Here’s an example:

SQL> execute dbms_stats.gather_system_stats('interval',90);
PL/SQL procedure successfully completed.
SQL>

The previous command collects workload statistics for 90 minutes.

Once you collect noworkload or workload system statistics, you can check the values captured for the various system statistics in the sys.aux_stats$ view, shown in the next section.

Tip  Oracle highly recommends the gathering of system statistics in order to provide more accurate CPU and I/O cost estimates to the optimizer.

How It Works

Accurate system statistics are critical for the optimizer’s evaluation of alternative execution plans. It’s through its estimates of various system performance characteristics such as I/O speed and CPU speed that the optimizer calculates the cost of, say, a full table scan versus an indexed read.

You can pass up to nine optimizer system statistics to the optimizer by collecting system statistics. The database gathers the first three statistics during a noworkload simulated statistics gathering process. It gathers all nine system statistics during a workload mode system statistics collection. Here’s a summary of the nine system statistics:

  • cpuspeedNW: This shows the noworkload CPU speed, in terms of the average number of CPU cycles per second.
  • ioseektim: This is the sum of seek time, latency time, and OS overhead time.
  • iotfrspeed: This stands for I/O transfer speed and tells the optimizer how fast the database can read data in a single read request.
  • cpuspeed: This stands for CPU speed during a workload statistics collection.
  • maxthr: This shows the maximum I/O throughput.
  • slavethr: This shows the average parallel slave I/O throughput.
  • sreadtim: The Single Block Read Time statistic shows the average time for a random single-block read.
  • mreadtim: The Multiblock Read Time statistic shows the average time (in milliseconds) for a sequential multiblock read.
  • mbrc: The Multi Block Read Count statistic shows the average multiblock read count in blocks.

When you collect the noworkload system statistics, the database captures only the cpuspeedNW, ioseektim, and iotfrspeed system statistics. Here’s a query that shows the default system statistics in an Oracle 11g database (on a Windows system):

SQL> select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';
 
PNAME                               PVAL1
-----------------              ----------
CPUSPEED
CPUSPEEDNW                     1183.90219
IOSEEKTIM                              10
IOTFRSPEED                           4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM
 
9 rows selected.
 
SQL>

The database uses noworkload systems statistics by default, with the values of the three noworkload statistics—I/O transfer speed (IOTFRSPEED), I/O seek time (IOSEEKTIM), and CPU speed (CPUSPEEDNW)—initialized to default values when you start the instance. Once you collect the noworkload statistics as shown in the “Solution” section, some or all of the three noworkload system statistics may change. In our case, once we collected the noworkload statistics, the value of CPUSPEEDNW changed to 2039.06 and the value of the IOSEEKTIM statistic changed to 14.756. However, the value of the IOTFRSPEED statistic remained constant at 4096.

Caution  You must ensure that you collect the system statistics during a relatively busy time in your database, a time that’s representative of peak workloads for your database.

If you notice that the sys.aux_stats$ view continues to show the default values for noworkload statistics even after you manually gather the statistics a few times, you can manually set the statistics values to known specifications of your I/O or CPU system by using the DBMS_STATS.SET_SYSTEM_STATS procedure. You can use this procedure to set values for any of the nine system statistics.
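For instance, if you know your storage subsystem's actual transfer rate, you can set that one statistic directly (the value here is purely illustrative):

SQL> execute dbms_stats.set_system_stats('iotfrspeed',10000);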

When you gather system statistics in the workload mode, you’ll see values for some or all of the remaining six system statistics. In our example, these are the system statistics collected by running the GATHER_SYSTEM_STATS procedure in the workload mode.

SQL>  select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';
 
PNAME                               PVAL1
------------------             ----------
CPUSPEED                             2040
CPUSPEEDNW                        2039.06
IOSEEKTIM                          14.756
IOTFRSPEED                           4096
MAXTHR
MBRC                                    7
MREADTIM                        46605.947
SLAVETHR
SREADTIM                        51471.538
 
9 rows selected.
 
SQL>

If the database performs any full table scans during the workload statistics collection period, Oracle uses the value of the mbrc and the mreadtim statistics to estimate the cost of a full table scan. In the absence of these two statistics, the database uses the value of the db_file_multiblock_read_count parameter to estimate the cost of full table scans.

You can delete all system statistics by executing the DELETE_SYSTEM_STATS procedure:

SQL> execute dbms_stats.delete_system_stats()
 
PL/SQL procedure successfully completed.
 
SQL>

According to Oracle, collecting workload statistics doesn’t impose an additional overhead on your system. However, ensure that you specify a specific interval or stop the statistics collection after a brief period to avoid potential overhead.

13-10. Validating New Statistics

Problem

You’re collecting new statistics, but you don’t want the database to use them automatically until you confirm that they won’t degrade the performance you have now.

Solution

In this example, we’ll show how to keep the database from automatically publishing new statistics for a table. Here are the steps you must follow:

  1. Execute the following statement to keep the database from automatically publishing new statistics it collects for the SH.SALES table:
    SQL> execute dbms_stats.set_table_prefs('SH','SALES','PUBLISH','false');
     
    PL/SQL procedure successfully completed.
     
    SQL>

    The statement sets the preference for the PUBLISH parameter to false (default=true) for the SH.SALES table. From here on, the database won’t automatically publish the statistics you collect for the SH.SALES table. Rather, it keeps those statistics in abeyance, pending your approval. These statistics are called pending statistics, because the database hasn’t made them available to the optimizer yet.

  2. Collect new statistics for the SH.SALES table:
    SQL> exec dbms_stats.gather_table_stats('sh','sales');
     
    PL/SQL procedure successfully completed.
    SQL>
  3. Tell the optimizer to use the newly collected pending statistics so you can test your queries with those statistics:
    SQL>  alter session set optimizer_use_pending_statistics=true;
     
    Session altered.
     
    SQL>
  4. Perform your tests by running a workload against the SH.SALES table and checking the performance and the execution plans.
  5. If you’re happy with the new set of (pending) statistics, make them public by executing this statement:
    SQL> execute dbms_stats.publish_pending_stats('SH','SALES');
     
    PL/SQL procedure successfully completed.
     
    SQL>
  6. If you want to delete the new statistics instead, execute the following command:
    SQL> exec dbms_stats.delete_pending_stats('SH','SALES');
     
    PL/SQL procedure successfully completed.
     
    SQL>

How It Works

Cursors are invalidated in a rolling fashion since the Oracle Database 10g release. Before that release, Oracle used to mark a cursor invalid immediately after you collected new statistics. Starting with the Oracle Database 10g release, Oracle marks a cursor for a “rolling cursor invalidation” instead of marking it invalid right away. When you next execute the query behind the cursor, Oracle generates a random number that takes a value between 0 and the value of the undocumented initialization parameter _optimizer_invalidation_period (the default value is 18,000 seconds, which is 5 hours). Oracle will then keep the cursor valid for the same number of seconds as the value of the random number. Oracle will hard-parse the query only after the random timeout expires. This means that after this timeout period is completed, Oracle will hard-parse the query using the new statistics that you’ve collected.

However, you can specify that the database not automatically use the new statistics it collects until you decide that the statistics are going to improve or at least don’t degrade current execution plans. You do this by keeping the new statistics in a pending state. Making the statistics available to the optimizer so it can use them in figuring out execution plans is called publishing the statistics. The database stores published statistics in its data dictionary. If you aren’t sure about the efficacy of a new set of statistics, you can keep the database from automatically publishing statistics until you complete testing them first. When you keep statistics in the pending state, the database won’t store them in the data dictionary; instead, it stores them in a private area and makes those statistics available to the optimizer only if you set the optimizer_use_pending_statistics parameter to true.

After specifying that the database must keep newly collected statistics in the pending status, you can choose to either publish the new statistics or delete them. Use the publish_pending_stats procedure to publish the statistics and the delete_pending_stats procedure to delete the statistics. If you delete the pending statistics for an object, the database will use existing statistics for that object.

In this example, we showed how to change the PUBLISH setting for statistics at the table level. You can also do this at the schema level but not at the database level. If working at the schema level, you need to run the following statements instead (the schema name is SH):

SQL> execute dbms_stats.set_schema_prefs('SH','PUBLISH','false');
SQL> execute dbms_stats.publish_pending_stats(null,null);
SQL> execute dbms_stats.delete_pending_stats('SH');

13-11. Forcing the Optimizer to Use an Index

Problem

You know that using a certain index on a column is going to speed up a query, but the optimizer doesn’t use the index in its execution plans. You want to force the optimizer to use the index.

Solution

You can force the optimizer to use an index when it isn’t doing so by adjusting the optimizer_index_cost_adj initialization parameter. You can set this parameter at the system or session level. Here’s an example that shows how to set this parameter at the session level:

SQL> alter session set optimizer_index_cost_adj=50;
 
Session altered.
SQL>

The default value for the optimizer_index_cost_adj parameter is 100, and you can set the parameter to any value between 1 and 10000. The lower the value of the parameter, the more likely the optimizer is to use an index.

How It Works

The optimizer_index_cost_adj parameter lets you adjust the cost of an index access. The optimizer uses a default value of 100 for this parameter, which means that it evaluates an indexed access path based on the normal costing model. Based on the optimizer’s estimate of the cost of performing an indexed read, it makes the decision of whether to use the index. Usually this works fine. However, in some cases, the optimizer doesn’t use an index even if it leads to a better execution plan because the optimizer’s estimates of the cost of the indexed access path may be off. Since it uses a default value of 100 for the optimizer_index_cost_adj parameter, you make the index cost seem lower to the optimizer by setting this parameter to a smaller value. Any value less than 100 makes the use of an index look cheaper (in terms of the cost of an indexed read) to the optimizer. Often, when you do this, the optimizer starts using the index you want it to use. In our example, we set the optimizer_index_cost_adj parameter to 50, making the cost of an index access path appear half as expensive as its normal cost (100). The lower you set the value of this parameter, the cheaper an index cost access path appears to the optimizer, and the more likely it will be to prefer an index access path to a full table scan.

We recommend that you set the optimizer_index_cost_adj parameter only at the session level for a specific query because it has the potential to change the execution plans for many queries if you set it at the database level. By default, if you set the ALL_ROWS optimizer goal, there's a built-in preference for full table scans by the optimizer. By setting the optimizer_index_cost_adj parameter to a value less than 100, you're inducing the optimizer to prefer an index scan over a full table scan. You can use the parameter with confidence at the session level, especially in an OLTP environment, where you can experiment with low values such as 5 or 10 to nudge the optimizer toward an index.
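Here's a minimal sketch of that session-level pattern (the query, table, and values are hypothetical): lower the parameter just for the problem statement, then restore the default so the rest of the session is unaffected:

SQL> alter session set optimizer_index_cost_adj=10;
 
Session altered.
 
SQL> select order_id, order_total from orders where customer_id = 1001;
 
SQL> alter session set optimizer_index_cost_adj=100;
 
Session altered.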

By default, the optimizer assumes that the cost of a multiblock read I/O associated with a full table scan and the single-block read cost associated with an indexed read are identical. However, a single-block read is likely to be less expensive than a multiblock read. The optimizer_index_cost_adj parameter lets you adjust the cost of a single-block read associated with an index read more accurately to reflect the true cost of an index read vis-à-vis the cost of a full table scan. The default value of 100 means that a single-block read is 100 percent of a multiblock read, so it’s telling the optimizer to treat the cost of an indexed read as identical to the cost of a multiblock I/O full table scan. When you set the parameter to a value of 50, you’re telling the optimizer that the cost of a single-block I/O (index read) is only half the cost of a multiblock I/O. This makes the optimizer choose the indexed read over a full table scan.

Note that accurate system statistics (mbrc, mreadtim, sreadtim, and so on) have a bearing on the use of indexes versus full table scans. Ideally, you should collect workload system statistics and leave the optimizer_index_cost_adj parameter alone. You can also calculate the relative costs of a single-block read and a multiblock read and set the optimizer_index_cost_adj parameter value based on those calculations. However, the best strategy is to simply use the parameter at the session level for a specific statement and not at the database level. Simply experiment with various levels of the parameter until the optimizer starts using the index.

You can also use a more “scientific” way to figure out the correct setting for the optimizer_index_cost_adj parameter by setting it to a value that reflects the “true” difference between single and multiblock reads. You can simply compare the average wait times for the db file sequential read wait event (represents a single-block I/O) and the db file scattered read wait event (represents multiblock I/O) to arrive at an approximate value for the optimizer_index_cost_adj parameter. Issue the following query to view the average wait times for both of the wait events:

 SQL> select event, average_wait from v$system_event
      where event like 'db file s%read';
EVENT                                   AVERAGE_WAIT
-----------------------                 ------------
db file sequential read                          .91
db file scattered read                          1.41
 
SQL>

Based on the output of this query, single-block (sequential) reads take roughly 65 percent of the time it takes to perform a multiblock (scattered) read (0.91/1.41 ≈ 0.65). This indicates that the optimizer_index_cost_adj parameter should be set to somewhere around 65. However, as we mentioned earlier, setting the parameter at the database level isn't recommended; instead, use this parameter sparingly for specific statements where you want to force the use of an index.
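If you'd rather compute the ratio than eyeball it, a query along the following lines (an illustrative sketch) derives a starting value for the parameter directly from the same two wait events:

SQL> select round(100 * seq.average_wait / scat.average_wait) as suggested_value
     from (select average_wait from v$system_event
           where event = 'db file sequential read') seq,
          (select average_wait from v$system_event
           where event = 'db file scattered read') scat;
 
SUGGESTED_VALUE
---------------
             65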

13-12. Enabling Query Optimizer Features

Problem

You’ve upgraded your database, but you want to ensure the query plans don’t change because of new optimizer features in the new release.

Solution

By default, the database enables all query optimizer features in the current database version. You can control the set of optimizer features enabled in a database by setting the optimizer_features_enable initialization parameter. For example, if you're running an Oracle Database 12c Release 1 database, the optimizer features are set to the 12.1 release, as shown here:

SQL> show parameter optimizer_features_enable;
 
NAME                          TYPE       VALUE
-------------------------     ------     --------
optimizer_features_enable     string     12.1.0.1
SQL>

You can set the optimizer features of a database to an earlier release by setting the optimizer_features_enable parameter to a different value from its default value (same as the database release). For example, in a 12.1 release, you can do this:

SQL> alter system set optimizer_features_enable='11.2.0.4';
 
System altered.
 
SQL>

You can now check the current value of the parameter:

SQL> show parameter optimizer_features_enable;
 
NAME                         TYPE      VALUE
-------------------------    ------    --------
optimizer_features_enable    string    11.2.0.4
SQL>

You can set the optimizer_features_enable parameter to any past major release or a point release, all the way back to the Oracle Database 8.0 release.

How It Works

Setting the optimizer_features_enable parameter to the value of the previous database release ensures that when you upgrade the database, the optimizer will behave similarly to the way it did before the upgrade. We say “similarly” and not “exactly the same way” because often some bug fixes and changes in the ways statistics are collected (especially changes in the collection of system statistics) may make the old release perform somewhat differently from how it did earlier. This is a strategy that DBAs commonly use to ensure that query plans don’t suddenly deteriorate following an upgrade. Once you understand the new optimizer features better, you can set the value of the optimizer_features_enable parameter to the same value as the upgraded database release.

Of course, you won’t be able to take advantage of any of the new optimizer features when you set the optimizer_features_enable parameter to a lower value than the current release, but you aren’t going to be surprised by any sudden changes in the execution plans either. Optimizer features don’t change drastically between releases, but it all depends on the database release. For example, there are six major new optimizer features in the 11.1.0.6 release that weren’t in the 10.2.0.2 release. These include the enhanced bind peeking feature and the ability to use extended statistics to estimate selectivity. Different applications will behave differently following the introduction of a new optimizer feature—that’s where the ability to retain the current optimizer feature set during an upgrade provides you a safety net. You get the opportunity to fully test and understand the implications of the new optimizer features before enabling them in a production database.

The example shown in the “Solution” section shows how to set the optimizer features level for an entire database. You can, however, enable it just at the session level (alter session ...) to test for regressions in execution plans following an upgrade. You can also specify the release number with a hint so you can test a query with optimizer features from a specific release, as shown here in an 11.2 release database:

SQL> select /*+ optimizer_features_enable('11.1.0.6') */ product_id, sum(quantity_sold)
     from sales
     group by product_id
     order by product_id;

This SELECT statement was executed in an 11.2 release database but uses optimizer features from the 11.1 release.
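The session-level equivalent mentioned earlier is just as simple and scopes the change to your session only:

SQL> alter session set optimizer_features_enable='11.1.0.6';
 
Session altered.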

13-13. Keeping the Database from Creating Histograms

Problem

You think that the presence of a histogram on a particular column is leading to suboptimal execution plans. You want the database not to use any histograms on that column.

Solution

You need to do two things if you want to keep Oracle from using the histogram it’s automatically collecting on a column:

  1. Drop the histogram by executing the DELETE_COLUMN_STATS procedure:
    SQL> begin
      2  dbms_stats.delete_column_stats(ownname=>'SH',tabname=>'SALES',
      3  colname=>'PROD_ID',col_stat_type=>'HISTOGRAM');
      4  end;
      5  /
     
    PL/SQL procedure successfully completed.
     
    SQL>
  2. Once you drop the histogram, tell Oracle not to create a histogram on the PROD_ID column by executing the following SET_TABLE_PREFS procedure:
    SQL> begin
      2  dbms_stats.set_table_prefs('SH','SALES','METHOD_OPT',
         'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
      3  end;
      4  /
     
    PL/SQL procedure successfully completed.
     
    SQL>

How It Works

For various reasons, DBAs sometimes want to keep the optimizer from using a histogram on a column. If there's already a histogram on a column, you must first get rid of it and then use the dbms_stats.set_table_prefs procedure to keep the database from creating a histogram on that column. In the Oracle Database 10g release, you had to drop the histogram first, freeze the statistics (with the lock_table_stats procedure), and then manually collect statistics on the table, specifying that the database must not collect statistics for the column whose histogram you dropped. Because the statistics were locked, you also had to specify force=>true when executing the dbms_stats.gather_table_stats procedure. As you can see, the dbms_stats.set_table_prefs procedure in the 11g release makes things a lot simpler.
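For reference, a rough sketch of that pre-11g workflow follows (an outline only; the METHOD_OPT string mirrors the one used in the "Solution" section):

SQL> exec dbms_stats.delete_column_stats('SH','SALES','PROD_ID');
SQL> exec dbms_stats.lock_table_stats('SH','SALES');
SQL> exec dbms_stats.gather_table_stats('SH','SALES', -
>      method_opt => 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID', -
>      force => true);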

In the command shown in the "Solution" section, the FOR ALL COLUMNS SIZE AUTO option tells the database to create histograms on any column that Oracle deems skewed. However, FOR COLUMNS SIZE 1 PROD_ID tells the database not to create a histogram for the column PROD_ID in the SH.SALES table. The SIZE clause accepts values from 1 to 254, with the integer you specify representing the number of buckets in the histogram. Telling the database to use just a single bucket (SIZE 1) means that all the data falls into one bucket; in other words, the database won't create a histogram on that column.
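To confirm that the histogram is gone after you regather statistics, check the HISTOGRAM column in DBA_TAB_COL_STATISTICS; a value of NONE for PROD_ID shows the preference took effect (expected output shown):

SQL> select column_name, histogram
     from dba_tab_col_statistics
     where owner = 'SH' and table_name = 'SALES'
     and column_name = 'PROD_ID';
 
COLUMN_NAME     HISTOGRAM
--------------- ---------------
PROD_ID         NONE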

13-14. Improving Performance When Not Using Bind Variables

Problem

For various reasons, your developers didn’t specify bind variables in the code. You notice heavy latch contention and poor response times because of the nonuse of bind variables. You want to improve the performance of the database in a situation like this, where you can’t change existing code.

Solution

If your applications aren’t using bind variables, there will be an increase in expensive hard-parsing in the database. To avoid this, you need to set the cursor_sharing initialization parameter to a nondefault value. The default value for this parameter is EXACT. You can set the cursor_sharing parameter to FORCE to determine which SQL statements can share the same cursors.

Here’s how you can set the cursor_sharing parameter to force:

SQL> alter system set cursor_sharing=force;

Setting the cursor_sharing parameter to a nondefault value has several implications, as the next section explains.

How It Works

The best practice in writing SQL code is to use bind variables so the SQL statements are shareable. During the parse stage, the optimizer will compare a SQL statement’s text with the texts of existing statements that are stored in the shared pool. The database considers the current statement identical to another statement only if it matches the other statement in all respects, including each character, space, and even case. When you leave the cursor_sharing parameter at its default value of EXACT, Oracle will reuse the shared SQL area when it reexecutes a SQL statement that uses bind variables. There’s no need for hard-parsing the new statement because a parsed version already exists in the shared pool. The new statement can use an existing cursor (it’s called a shared cursor) and not create its own parent cursor.

If the code doesn’t use bind variables but the new SQL statement the database is parsing is the same in all respects to a previously parsed statement in the shared pool, the statement is considered similar to the previous statement.

By default, the database shares cursors when SQL statements are identical but not when they are similar. The database will perform a heavy amount of hard-parsing if applications use literal values instead of bind variables, and in a busy system, it could put enormous pressure on the shared pool and the cursor cache. You can make the database share cursors when the new statement is similar (but not identical) to an existing parsed statement by setting the cursor_sharing parameter to FORCE. Setting the cursor_sharing parameter to FORCE lets the database replace the literal values with system-generated bind variables. The goal here is to reduce the number of parent cursors in the shared pool. Sharing cursors even when the application doesn’t use bind variables relieves the pressure on the shared pool by reducing the number of parent cursors in the cursor cache (in the shared pool). Leaving the cursor_sharing parameter at its default value will make the database perform a hard parse if the statement it’s parsing isn’t identical to an existing statement in the shared pool. However, if you set the parameter to FORCE, the database will perform the much cheaper soft parse when it finds a similar statement in the shared pool.
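Here's a quick way to see the literal replacement at work (the employees query is hypothetical, and the exact system-generated bind name can vary by release). After you set cursor_sharing to FORCE, two statements that differ only in a literal collapse into one shared SQL text in V$SQL:

SQL> alter session set cursor_sharing=force;
 
Session altered.
 
SQL> select count(*) from employees where department_id = 10;
SQL> select count(*) from employees where department_id = 20;
SQL> select sql_text, executions from v$sql
     where sql_text like 'select count(*) from employees%';
 
SQL_TEXT                                                          EXECUTIONS
----------------------------------------------------------------- ----------
select count(*) from employees where department_id = :"SYS_B_0"            2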

When to Set CURSOR_SHARING to a Nondefault Value

Ideally, you should leave the cursor_sharing parameter at its default value of EXACT. However, if your response time is suffering because of a heavy amount of library cache misses and the SQL statements aren’t using bind variables, consider setting the cursor_sharing parameter to FORCE. If the application doesn’t use bind variables, your hands are tied—the fixes will be long in coming, and meanwhile, you have a slow-performing database on your hands. Go ahead and change the cursor_sharing parameter from its default setting under these circumstances. There are really no issues with setting the cursor_sharing parameter to a nondefault value, except minor drawbacks such as the nonsupport for star transformation, for example.

image Tip  Oracle recommends using the FORCE setting for the CURSOR_SHARING parameter in an OLTP environment.

Oracle recommends that, if possible, you should leave the cursor_sharing parameter at its default value of EXACT and use shareable SQL by employing bind variables in your code instead of literal values. If you do decide to change the default setting to FORCE because of pressure in the shared pool and latch contention, be aware that there are some performance implications in doing so. If you set the cursor_sharing parameter to FORCE, the database uses system-generated bind values, uses the same execution plan for each execution of a statement, and uses one parent cursor and one child cursor for each distinct SQL statement.

Implications of Setting CURSOR_SHARING to a Nondefault Value

The FORCE setting can help you get around the nonuse of bind variables in an application by letting the database generate bind values (system-generated bind values, as opposed to user-specified ones). However, you should be aware of the differences in the behavior of the optimizer when you set the cursor_sharing parameter to FORCE as opposed to the EXACT setting. The key thing to understand here is that there's a conflict between query performance and the space used in the shared pool by multiple executions of a query. Here is a summary of the performance implications of setting the cursor_sharing parameter to EXACT and FORCE. Let's assume the following query, which contains a literal:

select * from employees where job = 'Clerk'

Note that if the query were to use bind variables instead of literals, it would be of the following form:

select * from employees where job=:b

EXACT: The database doesn’t replace any literals, and the optimizer sees the query as it’s presented to the optimizer. The optimizer generates a different plan for each execution of the statement, based on the literal values in the statement. The plan would thus be an optimal one, but each statement has its own parent cursor, and therefore a statement that’s executed numerous times can use a considerable amount of space in the shared pool. This could potentially lead to latch contention and a slowdown in performance.

FORCE: Regardless of whether there’s a histogram, the optimizer will replace the literal values with a bind value and optimize this query as if it were in the following form:

select * from employees where job=:b

The optimizer uses a single plan for each SQL statement, regardless of the literal values. Thus, the execution plan won’t be optimal, as the plan is generic and not based on the literal values. If a query uses literal values, the optimizer will use those values to find the most efficient execution plan. If there are no literals in the SQL statement, it’s hard for the optimizer to figure out the best execution plan. By “peeking” at the value of the bind variables, the optimizer can get a better idea of the selectivity of the where clause condition—it is almost as if literals had been used in the SQL statement. The optimizer peeks at the bind values during the hard-parse state. Since the execution plan is based on the specific value of the bind variable that the optimizer happened to peek at, the execution plan may not be optimal for all possible values of the bind variable.

In this example, the optimizer uses bind peeking based on the specific value of the JOB column it sees. In this case, the optimizer uses the value Clerk to estimate the cardinality for the query. When it executes the same statement (with a different value in the JOB column, say, Manager), the optimizer will use the same plan that it generated the first time (JOB=Clerk). Since there is only one parent cursor and just child cursors for the distinct statements, there’s less pressure on the shared pool. Note that a child cursor uses far less space in the shared pool than a parent cursor. Often, setting the cursor_sharing parameter to FORCE immediately resolves serious latch contention in the database, making this one of the few magic bullets that can help you quickly reduce latch contention.

The choice among the various settings of the cursor_sharing parameter really boils down to an assessment of what's more critical to database performance: using the default EXACT setting does provide better query performance but leads to the generation of numerous parent cursors. If there's severe pressure in the shared pool and consequent latch contention, the entire database will perform poorly. Under these circumstances, you're better off implementing a system-wide solution by setting the cursor_sharing parameter to FORCE because this guarantees that statements differing only in their literal values share a single parent cursor. If you're concerned about the impact on a single SQL statement, just drop the histogram on the relevant columns used in the SQL statement and set the cursor_sharing parameter to FORCE; this ensures that the optimizer uses system-generated bind values for the column(s) and that the SQL statement uses much less space in the shared pool. As you'll see in the next recipe, Oracle Database 11g's adaptive cursor sharing offers an even better solution if you set the cursor_sharing parameter to FORCE and keep the histograms on the columns.

Oracle recommends as a best practice that you write shareable SQL and use the default setting of EXACT for the cursor_sharing parameter. However, if you're beset with numerous similar SQL statements, you may find that setting the CURSOR_SHARING parameter to FORCE significantly improves performance by improving cursor sharing, reducing memory usage, and reducing latch contention in the database. If you find that response time is poor because of a high number of misses in the library cache, consider changing the setting of the CURSOR_SHARING parameter to FORCE. As mentioned earlier, this setting makes the optimizer act as if the SQL statement contained a bind variable, and it uses bind peeking to estimate cardinality. All this means that statements that differ just in their literal values can and will share the same execution plan.

Don’t ignore the following drawbacks that result from setting the cursor_sharing parameter to EXACT:

  • The database performs more work during the soft parse to find a similar statement in the shared pool.
  • The maximum length of any selected expression that contains literals in a SELECT statement can increase.
  • Star transformation is not supported.

13-15. Understanding Adaptive Cursor Sharing

Problem

Your database uses user-defined bind variables. You want to know whether there’s anything you can do to optimize database behavior so it doesn’t “blindly” use the same execution plan for all bind variable values.

Solution

In prior releases, Oracle used a single execution plan for each execution of a SQL statement, regardless of the values of the bind variables. In Oracle Database 11g, the database feature called adaptive cursor sharing enables a SQL statement with bind variables to use multiple execution plans, with each execution plan based on the values of the bind variables. Adaptive cursor sharing is enabled by default, and you can’t disable it.

How It Works

The adaptive cursor sharing feature is designed to improve the execution plans for SQL queries that contain bind variables. To understand how adaptive cursor sharing helps, it’s important to understand how Oracle’s bind peeking feature works. Bind peeking (introduced in Oracle 9i) lets the optimizer peek at the value of a bind variable when the database invokes the cursor for the first time. The optimizer uses the “peeked value” to determine the selectivity of the WHERE clause.

The problem with using user-defined bind variables is that the execution plan doesn’t have an accurate measure of the selectivity of the WHERE clause. Bind peeking helps improve matters by letting the optimizer act as if it were actually using a literal value instead of the bind variable, thus helping it generate better execution plans for SQL statements with bind variables. Bind peeking works well when the values of the column in the WHERE clause have a uniform distribution. If the column values are skewed, the plan the optimizer chooses by peeking at the value of the user-defined bind variable may not necessarily be good for all possible values of the bind variable. You thus end up with a situation where the execution plan will be very efficient if the SQL statement has the bind variable value that the optimizer has peeked at—and inefficient execution plans for all other possible values of the bind variable.

Let’s learn how adaptive cursor sharing works, with the help of an example that involves a column with skewed data.

Our test table, DEMO, has 78,681 rows. The table has three columns, all with skewed data. Thus, when we gathered statistics for this table, histograms were created on the three columns, as shown here:

SQL> select column_name,table_name,histogram from user_TAB_COLUMNS
     where table_name='DEMO';
 
COLUMN_NAME          TABLE_NAME       HISTOGRAM
--------------      -----------      ---------------
RNUM                 DEMO             HEIGHT BALANCED
RNAME                DEMO             HEIGHT BALANCED
STATUS               DEMO             FREQUENCY

Note that when the optimizer notices that there's a histogram on a column, it marks the data in that column as skewed. The column STATUS has two values: Coarse and Fine. Only 157 rows have the value Coarse, and 78,524 rows have the value Fine, making the data extremely skewed.
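You can see the skew with a simple aggregation (the counts match the figures just quoted):

SQL> select status, count(*) from demo group by status;
 
STATUS       COUNT(*)
------     ----------
Coarse            157
Fine            78524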

Let’s perform a sequence of operations to illustrate how adaptive cursor sharing works. Issue a query with the bind variable set to the value Coarse. Since very few rows in the DEMO table have this value, we expect the database to use an index range scan, which is exactly what the optimizer does. Here is our query and its execution:

SQL> var b varchar2(6)
SQL> exec :b:='Coarse';
 
PL/SQL procedure successfully completed.
 
SQL> select /*+ ACS */ count(*) from demo where status = :b;
  COUNT(*)
----------
       157
 
SQL> select * from table(dbms_xplan.display_cursor);
 
PLAN_TABLE_OUTPUT
------------------------------------------------------
SQL_ID  cxau3vvabpzd0, child number 0
-------------------------------------
select /*+ ACS */ count(*) from demo where status = :b
 
Plan hash value: 3478245284
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX01_DEMO |   157 |   942 |     1   (0)| 00:00:52 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("STATUS"=:B)
 
19 rows selected.

Next issue the following statement to check whether the database has marked the cursor for this query as bind-sensitive, bind-aware, or both:

SQL> select child_number, executions, buffer_gets, is_bind_sensitive as
  2  "BIND_SENSI", is_bind_aware as "BIND_AWARE", is_shareable as "BIND_SHARE"
  3  from v$SQL
  4* where sql_text like 'select /*+ ACS */%'
SQL> /
CHILD_NUMBER  EXECUTIONS  BUFFER_GETS  BIND_SENSI   BIND_AWARE  BIND_SHARE
------------  ----------  -----------  -----------  ----------  -----------
           0           1           43           Y            N            Y
 
SQL>

Note that the database marks the cursor as bind-sensitive because there's a histogram on the STATUS column. Each time you execute the query with a different value for the bind variable, the database compares the execution statistics with those from the prior execution. If the execution statistics differ significantly, it marks the cursor as bind-aware. One of the inputs the database uses in deciding whether to mark a statement as bind-aware is the number of rows processed. Once a cursor is marked bind-aware, the optimizer will choose an execution plan based on the value of the bind variable. Here, the IS_BIND_AWARE column is marked N because there are no prior execution statistics to compare against. The IS_SHAREABLE column (BIND_SHARE in our output) is marked Y.

Issue the query again, with the bind variable set to the value Fine. Since almost all of the rows have the STATUS column set to the value Fine, we expect the optimizer to prefer a full table scan. However, the optimizer picks the same plan as before (INDEX RANGE SCAN). The reason for this is that the database is using the same execution plan from the first execution. Here’s an example:

SQL> exec :b := 'Fine';
 
PL/SQL procedure successfully completed.
 
SQL> select /*+ ACS */ count(*) from demo where status = :b;
 
  COUNT(*)
----------
     78524
 
SQL> select * from table(dbms_xplan.display_cursor);
 
PLAN_TABLE_OUTPUT
------------------------------------------------------
SQL_ID  cxau3vvabpzd0, child number 0
-------------------------------------
select /*+ ACS */ count(*) from demo where status = :b
 
Plan hash value: 3478245284
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX01_DEMO |   157 |   942 |     1   (0)| 00:00:52 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("STATUS"=:B)
 
19 rows selected.

Since the cursor for the SQL statement is marked bind-sensitive, the optimizer uses the same execution plan (INDEX RANGE SCAN) as before. Note in the following example that the BIND_AWARE column is still marked N. The optimizer is using the same cursor as before (child_number 0).

SQL> select child_number, executions, buffer_gets, is_bind_sensitive as
  2  "BIND_SENSI", is_bind_aware as "BIND_AWARE", is_shareable as "BIND_SHARE"
  3  from v$sql
  4  WHERE sql_text like 'select /*+ ACS */%';
 
CHILD_NUMBER  EXECUTIONS  BUFFER_GETS  BIND_SENSI   BIND_AWARE  BIND_SHARE
------------  ----------  -----------  -----------  ----------  ----------
           0           2          220           Y            N           Y
          
SQL>

Execute the query again, with the same value for the STATUS column as in the previous query ('Fine'). Voila! The optimizer now uses an INDEX FAST FULL SCAN, instead of the INDEX RANGE SCAN. The change in execution plans is automatic; it is as if the optimizer is learning as it goes along and modifies the plan when it’s certain that the new plan is more efficient. Here is the execution and the new plan:

SQL>  exec :b := 'Fine';
PL/SQL procedure successfully completed.
 
SQL> select /*+ ACS */ count(*) from demo where status = :b;
  COUNT(*)
----------
     78524
 
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------
SQL_ID  cxau3vvabpzd0, child number 1
-------------------------------------
select /*+ ACS */ count(*) from demo where status = :b
 
Plan hash value: 2683512795
-----------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |       |       |    45 (100)|          |
|   1 |  SORT AGGREGATE       |            |     1 |     6 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IDX01_DEMO | 78524 |   460K|    45   (0)| 00:38:37 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("STATUS"=:B)
 
19 rows selected.

Note that the BIND_AWARE column now shows the value Y. When we execute the query with the same bind variable value (Fine) the second time, since the query is marked as bind-sensitive, the database evaluates the execution statistics from the previous execution. Since the statistics are different, it marks the cursor as bind-aware. The optimizer then decides a new plan is more optimal and thus performs a hard parse and generates a new execution plan that uses an INDEX FAST FULL SCAN instead of an INDEX RANGE SCAN. The following query shows details about the child cursors as well as whether the query is bind-sensitive or bind-aware:

SQL> select child_number, executions, buffer_gets, is_bind_sensitive as
  2  "BIND_SENSI", is_bind_aware as "BIND_AWARE", is_shareable as "BIND_SHARE"
  3  from v$sql
  4  WHERE sql_text like 'select /*+ ACS */%';
 
CHILD_NUMBER  EXECUTIONS  BUFFER_GETS  BIND_SENSI  BIND_AWARE  BIND_SHARE
------------  ----------  -----------  ----------  ----------  ----------
           0           2          220           Y           N           Y
           1           1          184           Y           Y           Y
 
SQL>

Note that the IS_BIND_AWARE column shows the value Y now. Notice also that there is a new child cursor (child_number 1) that represents the new execution plan containing the INDEX FAST FULL SCAN—this new cursor is marked bind-aware.

We execute the query again, but this time with the original bind variable value Coarse. The optimizer will choose the correct execution plan by performing an INDEX RANGE SCAN. Here’s the information about the child cursors, as well as whether the query is bind-sensitive or bind-aware:

SQL> select child_number, executions, buffer_gets, is_bind_sensitive as
  2  "BIND_SENSI", is_bind_aware as "BIND_AWARE", is_shareable as "BIND_SHARE"
  3  from v$sql
  4  where sql_text like 'select /*+ ACS */%';
 
CHILD_NUMBER  EXECUTIONS  BUFFER_GETS  BIND_SENSI   BIND_AWARE  BIND_SHARE
------------  ----------  -----------  ----------   ----------  ----------
           0           2          220           Y            N           N
           1           1          184           Y            Y           Y
           2           1            2           Y            Y           Y
 
 
SQL>

The database creates a new child cursor (child_number=2) for this query and marks the original cursor (child_number=0) as no longer shareable (its BIND_SHARE column now shows N). Eventually the database will remove the nonshareable cursor from the shared pool.

In our example, we used only two values for the bind variable in our tests. What happens if there are dozens of different bind variable values? Oracle doesn’t always perform a hard parse for each distinct bind variable value. Initially it performs a hard parse for some values of the bind variable, during which it determines the relationships between various bind variables and the associated execution plan. After the initial mapping of the bind variable values and the associated execution plans, Oracle is smart enough to simply pick the optimal child cursor from the cache without performing a hard parse for other bind values.

Adaptive cursor sharing is a new feature introduced in the Oracle Database 11g release. In earlier releases, DBAs often flushed the shared pool (and worse, sometimes restarted the database) when confronted with situations where the database apparently started using an inappropriate execution plan for a SQL statement because of the bind peeking effect. In the 11g release, you don't have to do anything—the optimizer automatically changes execution plans when it encounters skewed data. With adaptive cursor sharing, the database uses multiple execution plans for a statement that uses bind variables, ensuring that the best execution plan is always used, depending on the value of the bind variable. Adaptive cursor sharing means that when different bind variable values indicate different amounts of data to be handled by the query, Oracle adapts its behavior by using different execution plans for the query instead of sticking to the same plan for all bind values. Since adaptive cursor sharing requires bind variables (whether user-defined or system-generated through literal replacement), Oracle encourages you to use the FORCE setting for the cursor_sharing parameter in applications that rely on literals. If you set the parameter to SIMILAR and you have a histogram on a column, the optimizer doesn't perform literal replacement with shareable bind variables, and thus adaptive cursor sharing won't take place; for such applications, you must set the cursor_sharing parameter to FORCE for adaptive cursor sharing to work, letting the optimizer select the optimal execution plan for different values of the bind variable.

13-16. Creating Statistics on Expressions

Problem

You want to create statistics on an expression such as a user-created function.

Solution

Execute the GATHER_TABLE_STATS procedure of the DBMS_STATS package in the following way to gather statistics on an expression. In this example, we’re gathering statistics for the lower function, which transforms the cust_state_province column.

SQL> execute dbms_stats.gather_table_stats('sh','customers',-
   > method_opt =>'for all columns size skewonly -
   > for columns(lower(cust_state_province)) size skewonly');
 
PL/SQL procedure successfully completed.
 
SQL>

Alternatively, you can collect expression statistics by invoking the create_extended_stats function. Here’s an example:

SQL> select
     dbms_stats.create_extended_stats(null,'customers',
     '(lower(cust_state_province))')
     from dual;

Note that (lower(cust_state_province)) is called an extension because collecting statistics on functions is a type of Oracle extended statistic. Any statistics you collect for expressions and column groups (see Recipe 13-17) are called extended statistics.

How It Works

The optimizer knows the selectivity of a table’s column and uses the selectivity estimates for creating optimal execution plans. However, applying a function to a column in the WHERE clause of a query throws off the optimizer because it can’t estimate the selectivity of the underlying column. Here’s an example of a function that makes the optimizer’s job harder:

SQL> select count(*) from customers
     where lower(cust_state_province) = 'ca';

Expression statistics on functions enable the optimizer to obtain a vastly more accurate selectivity value for predicates that involve expressions.

You can issue the following query to find details about expression statistics on a table’s columns:

SQL> select extension_name, extension
     from user_stat_extensions
     where table_name='CUSTOMERS';
 
EXTENSION_NAME                                  EXTENSION
------------------------------                 ------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE                 (LOWER("CUST_STATE_PROVINCE"))
SQL>

You can delete expression statistics you’ve collected on a table by using the drop_extended_stats function:

SQL> exec dbms_stats.drop_extended_stats(null,'customers','(lower(cust_state_province))');
 
PL/SQL procedure successfully completed.
 
SQL>

Note that extended statistics include both statistics on expressions such as a function and statistics gathered for a column group that consists of two or more related columns. Recipe 13-17 shows how to collect statistics on column groups.

13-17. Creating Statistics for Related Columns

Problem

You’re aware that certain columns from a table that are part of a join condition are correlated. You want to make the optimizer aware of this relationship.

Solution

To generate statistics for two or more related columns, you must first create a column group and then collect fresh statistics for the table so the optimizer can use the newly generated “extended statistics.” Use the DBMS_STATS.CREATE_EXTENDED_STATS function to define a column group that consists of two or more columns from a table. Here’s how you execute this function to create a column group that consists of the COUNTRY_ID and CUST_STATE_PROVINCE columns in the table SH.CUSTOMERS:

SQL> select dbms_stats.create_extended_stats(null,'CUSTOMERS',
     '(country_id,cust_state_province)') from dual;
 
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'CUSTOMERS','(COUNTRY_ID,CUST_STATE_PROVI
-------------------------------------------------------------------------------
 
SYS_STUJGVLRVH5USVDU$XNV4_IR#4
SQL>

Once you create the column group, gather fresh statistics for the CUSTOMERS table to generate statistics for the new column group.

SQL> exec dbms_stats.gather_table_stats(null,'customers');
 
PL/SQL procedure successfully completed.
 
SQL>

How It Works

Often, values in one column of a table influence the values of another column in that table because of natural relationships that exist between the data stored in the two columns. For example, the values of the CUST_STATE_PROVINCE column in the SH.CUSTOMERS table are influenced by the values of the COUNTRY_ID column. You can find a CUST_STATE_PROVINCE value of Florida only in the United States. The optimizer doesn't know about real-life relationships and thus tends to produce wrong estimates of the all-important cardinality statistic when multiple related columns appear in the WHERE clause of a query or in a GROUP BY key. Column group statistics help the optimizer capture the correlation among a table's columns. If a query includes the predicates CUST_STATE_PROVINCE='Florida' and COUNTRY_ID='U.S.', Oracle can derive a better estimate of the combined selectivity of these two predicates by looking up the statistics for the column group instead of using separate statistics for the two columns.

The statistics the database gathers on the column groups that you create are called extended statistics. These statistics provide much more accurate cardinality estimates to the optimizer, which helps the optimizer produce more efficient execution plans. When you create extended statistics, Oracle maintains a subset of statistics for the column groups you create, including the number of distinct values, nulls, and histograms for the group. Even if a query contains columns in addition to the columns that are part of a column group, the optimizer takes advantage of the extended stats that are available to it. For example, suppose you’ve created a column group as shown in this recipe using the CUST_STATE_PROVINCE and COUNTRY_ID columns. If the WHERE clause for a query includes these two columns as well as the CUST_CITY column, Oracle will still take advantage of the extended statistics on the CUST_STATE_PROVINCE and COUNTRY_ID columns.
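If you later determine that a column group isn't helping, you can remove it with the same drop_extended_stats function used for expression statistics; the extension text must match the one you used when creating the group:

SQL> exec dbms_stats.drop_extended_stats(null,'customers','(country_id,cust_state_province)');
 
PL/SQL procedure successfully completed.
 
SQL>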

13-18. Automatically Creating Column Groups

Problem

You know that creating extended statistics by generating statistics on correlated table columns helps generate better execution plans. You want to find out how to select candidate column groups for creating extended statistics.

Solution

In Oracle Database 11.2.0.2 and newer releases, you can use the Auto Column Group Creation feature to let the database tell you which column groups you must create. You can use this feature for creating column groups but not for collecting extended statistics for columns with expressions (see Recipe 13-16).

To use the Auto Column Group Creation capability and let the database provide advice on which column groups to create in the database, you must let the database monitor the workload in the database. Begin by executing the DBMS_STATS.SEED_COL_USAGE procedure to determine the appropriate column groups that you must create:

SQL> begin
     dbms_stats.seed_col_usage(null,null,900);
     end;
     /

By executing this procedure, you’re telling the database to monitor the workload for 15 minutes (900 seconds) to determine whether you need to create any column groups. The procedure captures the column usage information and stores it in the sys.col_group_usage$ view.

Next run some queries to create the workload. If the queries are long-running, you can just run explain plan statements for the queries so the database can capture the column group information. Once the monitoring period (15 minutes) ends, review the captured column usage information by using the following query:

SQL> select dbms_stats.report_col_usage(user,'customers') from dual;

The REPORT_COL_USAGE function produces a column usage report for the CUSTOMERS table, based on the queries you've executed and the explain plans you ran. The report shows how the database used each column of the CUSTOMERS table and lists column usage in the following format:

  • Equality predicates (EQ): If a column was used in an equality predicate such as in the clause where COUNTRY_ID='US', that column was used independently. No extension statistics are called for in this case.
  • FILTER: If a set of columns was used together in the filter predicates (the WHERE clause) of a SELECT statement, those columns are recorded as a column group filter.
  • GROUP_BY: Shows all the columns used together in a GROUP_BY clause.

Once you view the column usage report, you can let Oracle automatically create the column groups for the columns used in the filter predicates and the columns used in the GROUP_BY clause. Do that by executing the following statement:

SQL> select dbms_stats.create_extended_stats(user,'customers') from dual;

Alternatively, you can create column groups only for columns that you specify by issuing the following command:

SQL> select dbms_stats.create_extended_stats(null,'CUSTOMERS', '(cust_city,cust_state_province,country_id)') from dual
SQL> /
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'CUSTOMERS','(CUST_CITY,CUST_STATE_PROVIN
-------------------------------------------------------------------------------
SYS_STUMZ$C3AIHLPBROI#SKA58H_N
SQL>

At this point, you’ve created the column group, but there are no statistics on the column group. Regather statistics for the CUSTOMERS table to generate statistics for the new column group. Here’s an example:

SQL> exec dbms_stats.gather_table_stats(user,'customers')
PL/SQL procedure successfully completed.
SQL>

How It Works

Letting Oracle point out potential column groups based on the actual column usage during a workload is far more efficient than trying to figure out the appropriate column groups for each table. Once you run the workload, you can view the column usage report and ask the database to create all proposed column groups for an entire schema at the same time by executing the dbms_stats.create_extended_stats function and passing the value NULL for the table_name parameter.
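For example, a schema-wide sweep might look like this sketch (run as the schema owner; passing NULL for the table name asks the function to create every column group recorded during the monitoring window):

SQL> select dbms_stats.create_extended_stats(user, null) from dual;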

13-19. Maintaining Statistics on Partitioned Tables

Problem

You load data into one or more partitions frequently, and the maintenance of global statistics is a problem. You want to collect new global statistics without having to go through a time- and resource-consuming process.

Solution

You can use the incremental statistics maintenance feature in the Oracle 11g release to maintain global statistics after each load into a new partition. For example, if you want to maintain global statistics for the SH.SALES table, here are the steps to follow:

  1. Turn on incremental statistics collection for the SH.SALES table:
    SQL> exec dbms_stats.set_table_prefs('SH','SALES','INCREMENTAL','TRUE');
     
    PL/SQL procedure successfully completed.
     
    SQL>
  2. After each load into a partition, gather global table-level statistics as shown here:
    SQL> exec dbms_stats.gather_table_stats('SH','SALES');
     
    PL/SQL procedure successfully completed.
     
    SQL>

To set the incremental statistics collection feature for partitioned tables, you must specify the AUTO_SAMPLE_SIZE value for the ESTIMATE_PERCENT parameter and the AUTO value for the GRANULARITY parameter.

How It Works

The incremental statistics collection feature is disabled by default. You enable the feature by setting the INCREMENTAL preference. In our example, we showed how to set the INCREMENTAL preference at the table level, but you can also set it at the schema or database level.
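For instance, you can set the same preference more broadly and then confirm the current setting with the get_prefs function (shown here for SH.SALES):

SQL> exec dbms_stats.set_schema_prefs('SH','INCREMENTAL','TRUE');
SQL> exec dbms_stats.set_global_prefs('INCREMENTAL','TRUE');
SQL> select dbms_stats.get_prefs('INCREMENTAL','SH','SALES') from dual;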

When dealing with a partitioned table, the optimizer uses both global statistics (statistics for the entire table) and statistics for the individual partitions to select the optimal execution plan. By default, following a change in a partition’s data, the database uses a two-pass scanning technique to maintain accurate table statistics. Under this two-pass technique, the database will do the following:

  • Scan the entire table to gather the global statistics during the first pass
  • Scan the changed partitions in the second pass to gather the partition statistics

When you load data into (or delete data from) a partition as part of a nightly batch job, for example, the database will scan the partition to gather the partition-level statistics. In addition, it scans the entire table to gather the table-level global statistics. The database scans not only the changed partitions but also all the other partitions in the table. As you can tell, this full scan of the table each time a partition’s data changes is an expensive process, especially when dealing with large tables.

Once you turn on the incremental statistics collection feature, Oracle uses a far more efficient technique to maintain a partitioned table’s statistics. When a partition’s data changes, the database gathers just the statistics for that partition and derives the global table statistics without scanning any of the other partitions. How does the database maintain the global statistics without scanning the entire table? Oracle can derive some global statistics from partition-level statistics; for example, it derives the total number of rows by just adding up the rows in each partition. For deriving the number of distinct values (NDVs), Oracle makes use of a structure called a synopsis, which is something like a sample of the NDVs in a column. Oracle derives the global NDV by merging all partition synopses. In summary, when you implement incremental statistics collection, Oracle skips the default full table scan to gather the table’s statistics and instead does the following:

  1. Gathers statistics for the partition you loaded and creates synopses for that partition
  2. Creates a global synopsis by merging all the partition-level synopses
  3. Computes the global statistics from the partition-level statistics and the global synopses

Incremental statistics collection is extremely efficient and something you must consider using when dealing with large partitioned tables, especially when you’re loading data into one or more empty partitions frequently, as is the case in many data warehouses.

13-20. Concurrent Statistics Collection for Large Tables

Problem

You want to minimize the amount of time it takes to gather statistics by taking advantage of your multiprocessor environment.

Solution

You can specify the concurrent statistics gathering mode to gather statistics on multiple tables and multiple partitions (and subpartitions) within a table concurrently. By doing this, you can take advantage of your multiprocessor environment and complete the statistics collection process much faster.

By default, concurrent statistics gathering is disabled. You enable it by executing the SET_GLOBAL_PREFS procedure. Follow these steps to enable concurrent statistics gathering:

  1. Set the job_queue_processes parameter to at least 4.
    SQL> alter system set job_queue_processes=4;

    If you don’t plan on using parallel execution for gathering statistics (see the following section) but want to fully utilize your system resources, you must set the job_queue_processes parameter to two times the number of CPU cores on the server.

  2. Enable concurrent statistics gathering.
    SQL> begin
         dbms_stats.set_global_prefs('CONCURRENT','TRUE');
         end;
         /

Make sure the user executing this command has the CREATE JOB, MANAGE SCHEDULER, and MANAGE ANY QUEUE privileges.
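Before kicking off a large statistics gathering run, you can confirm the preference took effect (expected output shown):

SQL> select dbms_stats.get_prefs('CONCURRENT') from dual;
 
DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------
TRUE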

How It Works

The goal of concurrent statistics gathering is to reduce the statistics gathering time for large tables and partitions. When you enable concurrent statistics gathering, Oracle uses the job scheduler and advanced queuing capabilities of the database to create multiple concurrent statistics gathering jobs. The job_queue_processes parameter determines the maximum number of concurrent statistics gathering jobs. In a RAC environment, you must set this parameter on each node. Concurrent statistics gathering works somewhat differently depending on the level of statistics gathering (table level or not), as explained here.

If you execute the DBMS_STATS.GATHER_TABLE_STATS procedure to collect statistics on a partitioned table, Oracle will create a separate statistics collection job for each partition (and subpartition) in the table. The scheduler determines how many jobs to run concurrently, and how many jobs it must queue, based on the system capacity.

image Note  The value of the job_queue_processes parameter determines the maximum number of concurrent statistics collection jobs.

If you execute the GATHER_DATABASE_STATS, GATHER_SCHEMA_STATS, or GATHER_DICTIONARY_STATS procedure, Oracle creates a separate statistics collection job for each table and each partition in a partitioned table. To prevent potential deadlocking issues, Oracle won’t process multiple partitioned tables concurrently. Oracle creates a coordinator job for each partitioned table to manage the partition statistics collection jobs. Each job either is a coordinator for a table’s partition-level jobs (if the table is partitioned) or is an actual statistics gathering job. If you have multiple partitioned tables, the database queues all partitioned tables except one; as it finishes gathering statistics for each partitioned table, it dequeues and starts another job for a partitioned table. This queuing behavior doesn’t apply to nonpartitioned tables.

Using a Parallel Execution Strategy

If you’re gathering statistics for very large tables, you can enable parallel execution of the individual statistics gathering jobs. To do this, you must disable the parallel_adaptive_multi_user initialization parameter as shown here:

SQL> alter system set parallel_adaptive_multi_user=false;

Although not necessary, Oracle also recommends that you enable parallel statement queuing by activating the resource manager, creating a temporary resource plan, and enabling queuing for the consumer group OTHER_GROUPS. Here's a simple example that shows how to create a temporary resource plan and enable the resource manager:

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan('parallel_test', 'parallel_test');
  dbms_resource_manager.create_plan_directive(
        'parallel_test',
        'OTHER_GROUPS',
        'OTHER_GROUPS directive for parallel test',
        parallel_target_percentage => 90);
  dbms_resource_manager.submit_pending_area();
end;
/
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'parallel_test' SID='*';

Monitoring Concurrent Stats Collection Jobs

Use the DBA_SCHEDULER_JOBS view to monitor the concurrent statistics gathering jobs. You can view all the concurrent statistics collection jobs in your database by issuing the following statement:

SQL> select job_name,state,comments
     from dba_scheduler_jobs
     where job_class like 'CONC%';

If you want to limit the output to currently executing jobs, add the line and state='RUNNING' to the previous query. Similarly, you can add the line and state='SCHEDULED' to view only the scheduled jobs that are waiting to run. You can check the elapsed time for the currently executing statistics gathering jobs by issuing the following query:

SQL> select job_name,elapsed_time
     from dba_scheduler_running_jobs
     where job_name like 'ST$%';

13-21. Determining When Statistics Are Stale

Problem

You want to identify the objects in your database whose statistics are stale.

Solution

You can query the DBA_TAB_STATISTICS and DBA_IND_STATISTICS views to determine whether statistics are stale for any object. The STALE_STATS column in these two views tells you whether the statistics are stale.

The STALE_STATS column can take one of the following values:

  • YES: The statistics are stale.
  • NO: The statistics aren’t stale.
  • NULL: No statistics were collected for the object.

For the stale/fresh information to be recorded in these two views, table monitoring must be enabled for the object. However, you don’t have to enable monitoring explicitly if you’ve already set the STATISTICS_LEVEL initialization parameter to TYPICAL or ALL.
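
For example, you can verify the current setting (and change it if necessary) as follows:

SQL> show parameter statistics_level
SQL> alter system set statistics_level=typical;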

How It Works

The database uses its table monitoring facility to determine whether a database object’s statistics are fresh or stale. Monitoring tracks the DML operations on a table.
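
You can see the tracked DML activity in the DBA_TAB_MODIFICATIONS view; here’s a sketch:

SQL> select table_name, inserts, updates, deletes
     from dba_tab_modifications
     where table_owner = 'SH';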

If you want to ensure that the two views DBA_TAB_STATISTICS and DBA_IND_STATISTICS have the most up-to-date information, you can execute the following first:

SQL> BEGIN
     DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
     END;
/

Once you flush the database monitoring information as shown here, you can query the STALE_STATS column in the views DBA_TAB_STATISTICS and DBA_IND_STATISTICS to determine whether the statistics are stale.

Here’s an example that shows how to query the DBA_TAB_STATISTICS view to determine whether the statistics for the SH.SALES table are stale:

SQL> SELECT PARTITION_NAME, STALE_STATS
     FROM   DBA_TAB_STATISTICS
     WHERE  TABLE_NAME = 'SALES'
     AND    OWNER = 'SH'
     ORDER BY PARTITION_NAME;
PARTITION_NAME  STALE_STATS
--------------- -----------
SALES_2010      NO
SALES_2011      NO
SALES_H1_2012   NO
SALES_H2_2012   NO
SALES_Q1_2013   NO
SALES_Q1_2014   NO
.
.
SQL>

13-22. Previewing Statistics Gathering Targets

Problem

You want to preview the objects for which the database will gather statistics when you execute a specific statistics gathering function.

Solution

Execute the DBMS_STATS statistics gathering functions in reporting mode to get a report of the objects that the corresponding statistics gathering function will process. The database doesn’t actually gather any statistics when you do this.

Here are the DBMS_STATS.REPORT_GATHER_*_STATS functions that you can use to get a report on statistics collection by the corresponding statistics gathering function:

  • REPORT_GATHER_TABLE_STATS: Runs the GATHER_TABLE_STATS function in reporting mode
  • REPORT_GATHER_SCHEMA_STATS: Runs the GATHER_SCHEMA_STATS function in reporting mode
  • REPORT_GATHER_DICTIONARY_STATS: Runs the GATHER_DICTIONARY_STATS function in reporting mode
  • REPORT_GATHER_DATABASE_STATS: Runs the GATHER_DATABASE_STATS function in reporting mode
  • REPORT_GATHER_FIXED_OBJ_STATS: Runs the GATHER_FIXED_OBJ_STATS function in reporting mode
  • REPORT_GATHER_AUTO_STATS: Runs the automatic statistics gathering job in reporting mode

Here’s an example that shows how to execute the DBMS_STATS.REPORT_GATHER_SCHEMA_STATS function:

VARIABLE my_report CLOB;
BEGIN
  :my_report :=DBMS_STATS.REPORT_GATHER_SCHEMA_STATS(
    ownname      => 'OE'       ,
    detail_level => 'TYPICAL'  ,
    format       => 'HTML'     );
END;
/
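
Because the report is returned in a CLOB bind variable, you can display it in SQL*Plus after the block completes; for example:

SQL> set long 100000
SQL> print my_report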

You can also use the DBMS_STATS functions to get a report on statistics gathering operations that occurred in the past, during a specific period of time. Here are the functions you can use to get a report on operations that occurred between two points in time:

  • REPORT_STATS_OPERATIONS: Generates a report of all statistics gathering operations performed between two points in time
  • REPORT_SINGLE_STATS_OPERATIONS: Generates a report of a specific operation

The following example shows how to get a report on all statistics gathering operations in the past day:

VARIABLE my_report CLOB;
BEGIN
  :my_report := DBMS_STATS.REPORT_STATS_OPERATIONS (
     since        => SYSDATE-1
,    until        => SYSDATE
,    detail_level => 'TYPICAL'
,    format       => 'HTML'
);
END;
/

The following example shows how to generate a report for an individual operation:

VARIABLE my_report CLOB;
BEGIN
  :my_report :=DBMS_STATS.REPORT_SINGLE_STATS_OPERATION (
     OPID    => 999
,    FORMAT  => 'HTML'
);
END;
/

How It Works

Often, when you’re about to collect optimizer statistics, you want to know exactly which objects the database will collect statistics for. The new reporting functions provide a convenient way to find out ahead of time what a particular statistics gathering call will do. You can also use the DBMS_STATS reporting functions to get useful reports on past statistics gathering operations in your database, with far less effort than assembling the same information yourself from the data dictionary.

You can use the statistics gathering reporting functions in a multitenant environment for specific PDBs by providing a set of pluggable database (PDB) IDs.
