Implementing Query Hints
Placing hints in SQL is a common and simple approach to improve performance. Hints influence Oracle’s optimizer to take a specific path to accomplish a given task, overriding the default path the optimizer may have chosen. Hints can also be viewed as a double-edged sword. If not implemented and maintained properly, they can hurt performance in the long run.
The initial goal of any application should be to implement SQL without hints because over time an application and its data can change, the database can be upgraded, hardware can change, and other factors can affect performance. Hints help an application at a specific point in time, but over time the usefulness of hints is unpredictable. All that said, they can be a useful tool if used in the right way to gain performance benefits for an application.
The most popular reason to use hints is simply to get data out of the database faster, and many of the available hints are geared for that purpose. The Oracle 12c version of the database supports more than 80 documented hints plus many undocumented hints. Hints can be placed in SQL for a multitude of reasons. The purpose of this chapter is to categorize these hints into subsets and then to show specific examples of some of the popular and most performance-impacting hints.
Some of the reasons to place hints in SQL are to change the access path to the database, change the join order or join type for queries that are doing joins, enhance performance for DML, and enhance performance for data warehouse–specific operations, to name a few. In addition, there are a few new hints to take advantage of some of the new features of Oracle 12c.
14-1. Writing a Hint
Solution
Place your hint into the statement using the /*+ hint */ syntax. Here’s an example:
SELECT /*+ full(emp) */ * FROM emp;
Be sure to leave a space following the plus sign. The /*+ sequence is exactly three characters long, with no spaces. Generally, you want to place your hint immediately following the SQL verb beginning the statement. While it is not required to place this sequence of characters after the SQL verb, it is customary to do this.
How It Works
Hints are delimited by special characters placed within your SQL statement. Each hint starts with a forward slash, followed by the star and plus characters. They end with a star and forward slash:
SELECT /*+ full(emp) */ * FROM emp;
Table 14-1 breaks down many of the most popular hints into specific categories. This table is meant to make it easier to zero in on a hint based on your specific need, so keep in mind that some of these hints actually can fit into multiple categories. Another thing to remember about hints is that for many of them, you can enable a particular feature or aspect, and you can disable that same feature or aspect. For example, there is an INDEX hint to enable the use of an index. There is also a NO_INDEX hint, which disables the use of an index. This is true for many of the available hints within the Oracle database.
Table 14-1. Hints by Category
Access path (table) |
FULL |
Access path (index) |
INDEX / NO_INDEX |
Join order |
ORDERED |
Join method |
USE_HASH |
Data warehousing |
STAR_TRANSFORMATION |
Optimizer hints |
FIRST_ROWS |
Parallelism |
PARALLEL / NO_PARALLEL |
DML-related hints |
APPEND |
Miscellaneous hints |
CACHE |
For a complete listing of hints, refer to the Oracle Database Performance Tuning Guide for your version of the database.
14-2. Changing the Access Path
Problem
You have a query that you have determined is not taking the access path you desire.
Solution
You can change the access path of your SQL statement by placing an access path hint in your query. The two most common access path hints to place in a query tell the Oracle optimizer to do a full table scan or use an index. Often, the optimizer does a good job of choosing the best or at least a reasonable path to the data needed for a query. Sometimes, though, because of the specific makeup of data in a table, the statistics for the objects, or the specific configuration of a given database, the optimizer doesn’t necessarily make the best choice. In these cases, you can influence the optimizer by placing a hint in your query.
By the time you decide to place a hint in a query, you should already know that the optimizer isn’t making the choice you want. Let’s say you want to place a hint in your query to tell the optimizer to modify the access path to either perform a full table scan or change how the optimizer will access the data from the table. Full table scans are appropriate if your query will be returning a large number of rows. For example, if you want to perform a full table scan on your table, your hint will appear as follows:
SELECT /*+ full(emp) */ empno, ename
FROM emp
WHERE DEPTNO = 20;
The foregoing hint instructs the optimizer to bypass the use of any possible indexes on the EMP table and simply scan the entire table in order to retrieve the data for the query.
Conversely, let’s say you are retrieving a much smaller subset of data from the EMP table, and you want to get the average salary for those employees in department 20. You can tell the optimizer to use an index on a given table in the query:
SELECT /*+ index(emp emp_i2) */ avg(sal)
FROM emp
WHERE deptno = 20;
Tip Hints with incorrect or improper syntax are ignored by the optimizer. Hints with index names that have been dropped or renamed will also be ignored or produce unpredictable execution plans.
How It Works
Access path hints, like many hints, are placed in your query because you already know what access path the optimizer is going to take for your query, and you believe it will be more efficient using the method you specify with the hint. It is important that before you use a hint, you validate that you are not getting the access path you desire or think you should be getting. You can also gauge the potential performance gain by analyzing the optimizer’s cost of the query with and without the hint.
For example, you want to compare your salary to other jobs in your company, so you write the following query to get, by job title, the minimum, average, and maximum salaries:
SELECT job, min(sal), avg(sal), max(sal)
FROM emp
WHERE deptno=20
GROUP BY job;
-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP |
| 3 | INDEX RANGE SCAN | EMP_I2 |
-----------------------------------------------
If you want to bypass the use of the index in the query, placing the FULL hint in the query will instruct the optimizer to bypass the use of the index:
SELECT /*+ full(emp) */ job, min(sal), avg(sal), max(sal)
FROM emp
WHERE deptno=20
GROUP BY job;
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | TABLE ACCESS FULL| EMP |
-----------------------------------
Another way you can tell the optimizer to bypass the use of an index is by telling the optimizer to not use indexes to retrieve the data for a given query. In this particular case, it has the same effect as the FULL hint:
SELECT /*+ no_index(emp) */ job, min(sal), avg(sal), max(sal)
FROM emp
WHERE deptno=20
GROUP BY job;
You can also explicitly state the name of the index you want to bypass:
SELECT /*+ no_index(emp emp_i2) */ job, min(sal), avg(sal), max(sal)
FROM emp
WHERE deptno=20
GROUP BY job;
In both of the foregoing cases, the result is a full table scan. In a different case, you may have a query that could possibly use different indexes. For instance, on our EMP table, we have an index on the DEPTNO column, and we also have an index on the HIREDATE column. If we wanted to execute a query to get the employees who started in 1980 for department 20, our query would look like this:
SELECT empno, ename
FROM emp
WHERE DEPTNO = 20
AND hiredate
BETWEEN to_date('1980-01-01','yyyy-mm-dd')
AND to_date('1980-12-31','yyyy-mm-dd'),
----------------------------------------------
| Id | Operation | Name |
----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP |
| 2 | INDEX RANGE SCAN | EMP_I1 |
----------------------------------------------
In this case, the optimizer chose the EMP_I1 index, which is the index on the HIREDATE column. We can instruct the optimizer to bypass the use of that index:
SELECT /*+ no_index(emp emp_i1) */ job, min(sal), avg(sal), max(sal)
FROM emp
WHERE deptno=20
GROUP BY job;
In this case, we don’t necessarily know what the optimizer is going to do next. It may decide to use our other index on the DEPTNO column, or it could choose to perform a full table scan. When using index hints, it is good practice to be as specific as possible when instructing the optimizer what to do. Therefore, if we place an index hint to tell the optimizer to use the index on the DEPTNO column, we can see that the optimizer now uses that index:
SELECT /*+ index(emp emp_i2) */ empno, ename
FROM emp
WHERE DEPTNO = 20
AND hiredate
BETWEEN to_date('1980-01-01','yyyy-mm-dd')
AND to_date('1980-12-31','yyyy-mm-dd'),
----------------------------------------------
| Id | Operation | Name |
----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP |
| 2 | INDEX RANGE SCAN | EMP_I2 |
----------------------------------------------
Other examples of index hints are the INDEX_FFS hint for an index fast full scan and the INDEX SS hint for an index skip scan. The INDEX_SS hint is appropriate if you have a table with composite, multicolumn indexes. It is possible to have Oracle use the index, even if the query does not use the leading column of the index. At times, the INDEX_SS hint can be beneficial to retrieve data fast, even if the column noted in the WHERE clause isn’t the leading column of an index. For example, if we want to get the names of all employees who received a commission, our query would look like this:
SELECT ename, comm FROM emp
WHERE comm > 0;
----------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMP |
----------------------------------
As noted in the explain plan, no index is used. We happen to know there is a composite index on the SAL and COMM columns of our EMP table. We can add a hint to use this index to gain the benefit of having an index on the COMM column, even though it is not the leading column of the index:
SELECT /*+ index_ss(emp emp_i3) */ ename, comm FROM emp
WHERE comm > 0;
----------------------------------------------
| Id | Operation | Name |
----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP |
| 2 | INDEX SKIP SCAN | EMP_I3 |
----------------------------------------------
Note Historically within Oracle documentation, hints are stated as instructions to the optimizer. In the Oracle 12c Performance Tuning Guide, the documentation implies that hints may be ignored. The guide states that hints only influence the optimizer. If you are not seeing the behavior you expect from a hint, contact Oracle support for direction.
Problem
You have a performance issue with a query where you are joining multiple tables, and the Oracle optimizer is not choosing the join order you desire.
Solution
There are two hints—the ORDERED hint and the LEADING hint—that can be used to influence the join order used within a query.
Using the ORDERED Hint
You are running a query to join two tables, EMP and DEPT, because you want to get the department names for each employee. By placing an ORDERED hint into the query, you can see how the hint alters the execution access path. Here’s an example:
SELECT ename, dname
FROM emp JOIN dept USING(deptno);
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | MERGE JOIN | |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT |
| 3 | INDEX FULL SCAN | PK_DEPT |
| 4 | SORT JOIN | |
| 5 | TABLE ACCESS FULL | EMP |
------------------------------------------------
SELECT /*+ ordered */ ename, dname
FROM emp JOIN dept USING(deptno);
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| EMP |
| 3 | TABLE ACCESS FULL| DEPT |
-----------------------------------
Using the LEADING Hint
As with the example using the ORDERED hint, you have the same control to specify the join order of the query. The difference with the LEADING hint is that you specify the join order from within the hint itself, while with the ORDERED hint, it is specified in the FROM clause of the query. Here’s an example:
SELECT /*+ leading(emp dept) */ ename, dname
FROM emp JOIN dept USING(deptno);
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| EMP |
| 3 | TABLE ACCESS FULL| DEPT |
-----------------------------------
From the foregoing query, we can see that the table order specified in the FROM clause is irrelevant because the order specified in the LEADING hint specifies the join order for the query.
How It Works
The main purpose of specifying either of these hints is for multitable joins where the optimal join order is known. This is usually known from past experience with a given query, based on the makeup of the data and the tables. In these cases, specifying either of these hints will save the optimizer the time of having to process all of the possible join orders in determining the optimal join order. This can improve query performance, especially as the number of tables to join within a query increases.
When using either of these hints, you instruct the optimizer about the join order of the tables. Because of this, it is critically important that you know that the hint will improve the query’s performance. Oracle recommends, where possible, to use the LEADING hint over the ORDERED hint because the LEADING hint has more versatility built in. When specifying the ORDERED hint, you specify the join order from the list of tables in the FROM clause, while with the LEADING hint, you specify the join order within the hint.
14-4. Changing the Join Method
Problem
You have a query where the optimizer is choosing a nonoptimal join type for your query, and you want to override the join type by placing the appropriate hint in the query.
Solution
There are three possible types of joins: nested loops, hash, and sort merge. Depending on the size of your tables, certain join types perform better than others. You can use hints to specify the join order that you prefer.
Nested Loops Join Hint
To invoke a nested loops join, use the USE_NL hint, and place both tables needing the join within parentheses inside the USE_NL hint:
SELECT /*+ use_nl(emp dept) */ ename, dname
FROM emp JOIN dept USING (deptno);
-----------------------------------------------
| Id | Operation | Name |
-----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | TABLE ACCESS FULL | DEPT |
| 4 | INDEX RANGE SCAN | EMP_I2 |
| 5 | TABLE ACCESS BY INDEX ROWID| EMP |
-----------------------------------------------
The nested loops join is usually best when joining smaller tables. In a nested loops join, one table is considered the “driving” table. This is the outer table in the join. For each row in the outer, driving table, each row in the inner table is searched for matching rows. In the execution plan for the foregoing statement, the EMP table is the driving, outer table, and it is seen in the execution plan as the outermost part of the plan. The DEPT table is the inner table and is shown as the innermost part of the execution plan.
Hash Join Hint
To invoke a hash join, use the USE_HASH hint, and place both tables needing the join within parentheses inside the USE_HASH hint:
SELECT /*+ use_hash(emp_all dept) */ ename, dname
FROM emp_all JOIN dept USING (deptno);
----------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | 1037K|
| 1 | HASH JOIN | | 1037K|
| 2 | TABLE ACCESS FULL| DEPT | 4 |
| 3 | TABLE ACCESS FULL| EMP_ALL | 1037K|
----------------------------------------------
For the optimizer to use a hash join, it must be an equijoin condition. Hash joins are best used when joining large amounts of data or where a large percentage of rows from a table is needed. The smaller of the two tables is used by the optimizer to build a hash table on the join key between the two tables. In the foregoing example, the DEPT table is the smaller table and will be used to build the hash table. For best performance, the hash table completely resides in memory.
Sort Merge Join Hint
To invoke a sort merge join, use the USE_MERGE hint, and place both tables needing the join within parentheses inside the USE_MERGE hint:
SELECT /*+ use_merge(emp dept) */ ename, dname
FROM emp JOIN dept USING (deptno)
WHERE deptno != 20;
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | MERGE JOIN | |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT |
| 3 | INDEX FULL SCAN | PK_DEPT |
| 4 | SORT JOIN | |
| 5 | TABLE ACCESS FULL | EMP |
------------------------------------------------
Sort merge joins, like hash joins, are used to join a large volume of data. Unlike the hash join, the sort merge join is used when the join condition between the tables is not an equijoin. The hash join will generally perform better than the sort merge join, unless the data is already sorted on the two tables. During this operation, the input data from both tables is sorted on the join key and then merged.
Join Hints When Querying Multiple Tables
If you are joining several tables and want to invoke a specific join method between all of the associated tables in the query, you must add a hint for each join condition. Here’s an example:
SELECT /*+ use_hash(employees department) use_hash(departments locations) */
last_name, first_name, department_name, city, state_province
FROM employees JOIN departments USING (department_id)
JOIN locations USING (location_id);
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL | LOCATIONS |
| 4 | TABLE ACCESS FULL | DEPARTMENTS |
| 5 | VIEW | index$_join$_001 |
| 6 | HASH JOIN | |
| 7 | INDEX FAST FULL SCAN| EMP_NAME_IX |
| 8 | INDEX FAST FULL SCAN| EMP_DEPARTMENT_IX |
-----------------------------------------------------
How It Works
Table 14-2 summarizes the hints available for each of the different join methods. Hints to instruct the optimizer to choose a join method are sometimes necessary because of several factors:
Table 14-2. Join Methods and Their Hints
Oracle advises against using hints as much as possible because over time what was optimal at one moment under one circumstance and one version of the database software may not be optimal the next time. However, sometimes these hints can simply be helpful in fulfilling the short-term need or simply may be the only way to get the optimizer to do what you want it to do.
Tip The size of your PGA can affect which join method the optimizer uses for your query.
14-5. Changing the Optimizer Version
Problem
You have upgraded to a newer version of Oracle, and you are having query performance problems related to the newer version of Oracle. The problem is isolated to a small number of queries, so you want to place a hint in these queries to use the previous version of the optimizer’s rules and features.
Solution
To specify a version of the optimizer for a given query, you specify the version of the optimizer you desire within the optimizer_features_enable hint. Within parentheses, place the desired version of the database within single quotes.
SELECT /*+ optimizer_features_enable('11.2.0.4') */ *
FROM EMP JOIN DEPT USING(DEPTNO);
This method is mostly used as an interim measure to improve performance immediately following an upgrade, until analysis can be done and a resolution is found with the query and the upgraded version of the database.
How It Works
You can modify the version of the optimizer for a given query. This can be done via the optimizer_features_enable hint and will be in effect only for a given query. The primary reason this hint is used is that a query that performed well under one specific version of Oracle has seen performance degrade immediately following an Oracle database version upgrade.
There is an Oracle initialization parameter, optimizer_features_enable, that can be changed for the entire database instance, and it is an option when widespread performance problems occur within queries immediately after you’ve upgraded your database. Often, however, changing this parameter at the database instance level is not feasible, nor even desired, because the primary reason for upgrading is to take advantage of new features. So, unless there are significant and widespread performance issues, it is not recommended to change the optimizer_features_enable parameter for an entire database instance.
You can, however, change this parameter at the session level as well, which may be much more feasible for a given application. See the following example of changing the parameter at the session level:
alter session set optimizer_features_enable='11.2.0.4';
If you enter an invalid value for the parameter, the resulting error message will display all the valid values, which is helpful, especially because there are so many intermediate patch set levels for a given database version:
alter session set optimizer_features_enable='7';
ERROR:
ORA-00096: invalid value 7 for parameter optimizer_features_enable, must be from
among 12.1.0.1.1, 12.1.0.1, 11.2.0.4,
11.2.0.3, 11.2.0.2, 11.2.0.1, 11.1.0.7, 11.1.0.6, 10.2.0.5, 10.2.0.4, 10.2.0.3,
10.2.0.2, 10.2.0.1, 10.1.0.5, 10.1.0.4,
10.1.0.3, 10.1.0, 9.2.0.8, 9.2.0, 9.0.1, 9.0.0, 8.1.7, 8.1.6, 8.1.5, 8.1.4,
8.1.3, 8.1.0, 8.0.7, 8.0.6, 8.0.5, 8.0.4, 8.0.3, 8.0.0
If you have a given query or a small subset of critical queries that are performing at a substandard performance level after an upgrade, a quick method to return to the preupgrade performance is to use the optimizer_features_enable hint to point to a specific version of the optimizer for a given query.
14-6. Choosing Between a Fast Response and Overall Optimization
Problem
When you execute a query, you can choose between two goals:
Your instance will have a default goal configured for it. You can specify hints on a query-by-query basis to override the default goal and get the behavior you want for a given query.
Solution
There are hints that can be used to override the optimization goal of your database instance. Before using any of the hints related to the optimizer_mode parameter, you first want to validate what your database instance is currently set to. If you have the SELECT ANY DICTIONARY system privilege, you can see what value is set for the optimizer_mode parameter.
SQL> show parameter optimizer_mode
NAME TYPE VALUE
-------------- ------ --------
optimizer_mode string ALL_ROWS
If we run an explain plan for an example query, we can see what the execution plan is by using the default optimizer_mode setting for our database instance.
SELECT *
FROM employees NATURAL JOIN departments;
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| DEPARTMENTS |
| 3 | TABLE ACCESS FULL| EMPLOYEES |
------------------------------------------
Since the foregoing query is doing full table scans against the tables and we want to see some rows as soon as possible but not necessarily the full result set, we can pass in a FIRST_ROWS hint to accomplish this task. It is apparent that this changes the optimizer’s execution plan in order to provide results as soon as possible. You also have to pass in an integer value within the hint, specifying how many rows you want to see. The following example demonstrates the use of the FIRST_ROWS hint:
SELECT /*+ first_rows(10) */ *
FROM employees NATURAL JOIN departments;
----------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | TABLE ACCESS FULL | EMPLOYEES |
| 4 | INDEX UNIQUE SCAN | DEPT_ID_PK |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |
----------------------------------------------------
The integer value supplied within the hint will influence the optimizer’s decision on determining the execution plan for the query. For instance, for the following query, if we increase the number of rows we want to see quickly, it can change the execution plan for the query:
SELECT /*+ first_rows(100) */ *
FROM employees NATURAL JOIN departments;
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| DEPARTMENTS |
| 3 | TABLE ACCESS FULL| EMPLOYEES |
------------------------------------------
Therefore, be as precise as possible when determining how many rows you want to see using the FIRST_ROWS hint.
If we needed the reverse situation and the database’s default optimizer_modewas set to FIRST_ROWS, we can supply an ALL_ROWS hint to tell the optimizer to use that mode when determining the execution plan:
SQL> alter system set optimizer_mode=first_rows scope=both;
System altered.
SQL> show parameter optimizer_mode
NAME TYPE VALUE
-------------- ------ ----------
optimizer_mode string FIRST_ROWS
SELECT /*+ all_rows */ *
FROM employees NATURAL JOIN departments;
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| DEPARTMENTS |
| 3 | TABLE ACCESS FULL| EMPLOYEES |
------------------------------------------
As with many Oracle parameters, you can also change the value at the session level, which is also true for optimizer_mode:
alter session set optimizer_mode=first_rows;
SELECT *
FROM employees NATURAL JOIN departments;
----------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | TABLE ACCESS FULL | EMPLOYEES |
| 4 | INDEX UNIQUE SCAN | DEPT_ID_PK |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |
----------------------------------------------------
The valid values for optimizer_mode at the session level are FIRST_ROWS, FIRST_ROWS_1, FIRST_ROWS_10, FIRST_ROWS_100, FIRST_ROWS_1000, ALL_ROWS, CHOOSE, and RULE.
How It Works
The fast, initial response goal is often a good choice for queries when a user is awaiting results. It causes the database engine to make choices that allow rows to begin coming back from the query almost immediately. For example, optimizing for initial response often results in a nested loops join because such a join can begin returning rows from the beginning. The trade-off is possibly a longer overall execution time.
The goal of reducing overall query cost is usually a good choice for batch processes. No live, human user is awaiting results, so it is acceptable to spend more time on up-front processing in order to reduce overall query cost. An example might be to execute a hash join, which can’t begin returning rows until the join is done but which might execute in less overall time than a nested loops join.
You can use either the FIRST_ROWS or ALL_ROWS hint in your query to change the optimizer mode, which controls which of the preceding two goals applies to a given query.
To check what the current optimizer mode is for your database instance, check the value of the optimizer_mode initialization parameter. By specifying an optimizer goal hint, you override the optimizer mode set at the database instance level, as well as any settings at the session level.
The FIRST_ROWS hint is popular because it can quickly return the first possible rows back from a query. The FIRST_ROWS hint is also common because ALL_ROWS is the default value for the optimizer_mode parameter. It’s thus unusual to need to specify ALL_ROWS.
14-7. Performing a Direct-Path Insert
Problem
You are doing a DML INSERT statement, and it is performing slower than needed. You want to optimize the INSERT statement to use a direct-path insert technique.
Solution
By using the APPEND or APPEND_VALUES hint, you can significantly speed up the process of performing an insert operation on the database. Here is an example of the performance savings using the APPEND hint. First, we have a query that does a conventional insert between two tables:
INSERT INTO emp_dept
SELECT * FROM emp_ctas_new;
19753072 rows created.
Elapsed: 00:01:17.86
-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD TABLE CONVENTIONAL | EMP_DEPT |
| 2 | TABLE ACCESS FULL | EMP_CTAS_NEW |
-------------------------------------------------
Then, if we place the APPEND hint inside the same INSERT statement, we see a considerable gain in performance:
INSERT /*+ append */ INTO emp_dept
SELECT * FROM emp_ctas_new;
19753072 rows created.
Elapsed: 00:00:12.15
-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD AS SELECT | EMP_DEPT |
| 2 | TABLE ACCESS FULL| EMP_CTAS_NEW |
-------------------------------------------
The APPEND hint works with an INSERT statement only with a subquery; it does not work with an INSERT statement with a VALUES clause. For that, you need to use the APPEND_VALUES hint. Here are two examples of an INSERT statement with a VALUES clause, and we can see the effect the hint has on the execution plan:
INSERT INTO emp_dept
VALUES (15867234,'Smith, JR','Sales',1359,'2010-01-01',200,5,20);
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD TABLE CONVENTIONAL | EMP_DEPT |
---------------------------------------------
INSERT /*+ append_values */ INTO emp_dept
VALUES (15867234,'Smith, JR','Sales',1359,'2010-01-01',200,5,20);
-------------------------------------
| Id | Operation | Name |
-------------------------------------
| 0 | INSERT STATEMENT | |
| 1 | LOAD AS SELECT | EMP_DEPT |
| 2 | BULK BINDS GET | |
-------------------------------------
How It Works
The APPEND hint works within statements performing DML insert operations from another table, that is, using a subquery from within an INSERT SQL statement. This is appropriate for when you need to copy a large volume of rows between tables. By bypassing the Oracle database buffer cache blocks and appending the data directly to the segment above the high-water mark, you save significant overhead. This is a popular method for inserting rows into a table very quickly.
When you specify one of these hints, Oracle will perform a direct-path insert. In a direct-path insert, the data is appended at the end of a table, rather than using free space that is found within current allocated blocks for that table. The APPEND and APPEND_VALUES hints, when used, automatically convert a conventional insert operation into a direct-path insert operation. In addition, if you are using parallel operations during an insert, the default mode of operation is to use the direct-path mode. If you want to bypass performing direct-path operations, you can use the NOAPPEND hint.
Keep in mind that if you are running with either of these hints, there is a risk of contention if you have multiple application processes inserting rows into the same table. If two append operations are inserting rows at the same time, performance will suffer: since the insert append operation appends the data above the high water mark for a segment, only one operation should be done at one time. However, if you have partitioned objects, you can still run several concurrent append operations, as long as each insert operates on separate partitions for a given table.
There are trade-offs in using either append hint. First, direct-path inserts require more space than normal insert operations because they are applied above the high-water mark of the table. This is even more true when running direct-path operations in parallel because then an extent is needed for each parallel insert process. Second, append hints will be ignored if performing them on tables with referential integrity constraints or on tables with triggers.
Starting with Oracle 12.1, statistics are automatically gathered when doing certain direct-path operations. If using either the CREATE TABLE ... AS SELECT or INSERT operation using direct-path operations, statistics are automatically gathered. The following example of a CREATE TABLE ... AS SELECT demonstrates that statistics are gathered automatically:
create table cre_test as select * from dual;
SQL> select owner, table_name, last_analyzed from dba_tables
2 where table_name = 'CRE_TEST';
OWNER TABLE_NAME LAST_ANALY
----- ---------- ----------
HR CRE_TEST 2013-10-21
In the following example, we show two tables to be used to demonstrate statistics gathering for insert statements. Note that the table must be loaded in direct-path mode and be empty for statistics to be gathered. Here is the same query used in the CREATE TABLE example, but in this case used for two empty tables that will be used for INSERT operations:
SQL> select owner, table_name, last_analyzed from dba_tables
2 where table_name in ('INS_CONV_TEST','INS_BULK_TEST'),
OWNER TABLE_NAME LAST_ANALY
----- ------------- ----------
HR INS_BULK_TEST
HR INS_CONV_TEST
2 rows selected.
Let’s insert a row into a table using a conventional INSERT statement:
SQL>
SQL> insert into ins_conv_test select * from dual;
1 row created.
Then, we will do the same for a table using the APPEND hint, which is a direct-path load operation:
SQL> insert/*+ append */into ins_bulk_test select * from dual;
1 row created.
SQL> commit;
Commit complete.
If we then rerun our query against the data dictionary to check the statistics for our two tables, we can see that statistics have been gathered for our table on which we performed a direct-path insert using APPEND:
SQL>
SQL> select owner, table_name, last_analyzed from dba_tables
2 where table_name in ('INS_CONV_TEST','INS_BULK_TEST'),
OWNER TABLE_NAME LAST_ANALY
----- ------------- ----------
HR INS_BULK_TEST 2013-10-21
HR INS_CONV_TEST
2 rows selected.
If, for whatever reason, you want to defer statistics gathering when using these operations, you can use the NO_GATHER_OPTIMIZER_STATISTICS hint:, which will prevent statistics gathering on the affected table. See the following example:
SQL> insert /*+ append no_gather_optimizer_statistics */ into ins_nostat_test
2 select * from dual;
1 row created.
SQL> commit;
Commit complete.
SQL>
SQL> select owner, table_name, last_analyzed from dba_tables
2 where table_name = 'INS_NOSTAT_TEST';
OWNER TABLE_NAME LAST_ANALY
----- --------------- ----------
HR INS_NOSTAT_TEST
Problem
You are creating a view and want to place a hint in the view’s query in order to improve performance on any queries that access the view.
Solution
Hints can be placed in views because a view is simply a stored query in the database.Depending on the type of hint used, as well as the type of view that is being queried, you can determine whether your hint will be used. It is important to understand what type of view you have so you can determine what impact hints will have on that view. To understand this, you first need to determine which of the following describes your view:
A simple view is a view that references only one table, and there are not any grouping functions or expressions:
CREATE view emp_high_sal
AS SELECT /*+ use_index(employees) */ employee_id, first_name, last_name, salary
FROM employees
WHERE salary > 10000;
A complex view can reference multiple tables, or it will have grouping clauses or use functions and expressions:
CREATE or replace view dept_sal
AS SELECT /*+ full(employees) */ department_id, department_name,
departments.manager_id, SUM(salary) total_salary, AVG(salary) avg_salary
FROM employees JOIN departments USING(department_id)
GROUP BY department_id, department_name, departments.manager_id;
A mergeable view is simply one in which the optimizer can replace the query calling the view with the query within the view definition. For example, we simply want to query all the rows from our EMP_HIGH_SAL view. The optimizer simply has to go directly to the EMPLOYEES table:
SELECT * FROM emp_high_sal;
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMPLOYEES |
---------------------------------------
The optimizer has simply replaced the query with the query that defined the view:
SELECT /*+ full(employees) */ department_id, department_name,
departments.manager_id, SUM(salary) total_salary, AVG(salary) avg_salary
FROM employees JOIN departments USING(department_id)
GROUP BY department_id, department_name, departments.manager_id;
With a mergeable view, the hint inside the view is preserved because the essential structure of the view definition is intact based on the query calling the view. See Table 14-3 for the guidelines for hints regarding mergeable views.
Table 14-3. Rules for Using Hints in Mergeable Views
Hint Category |
Placing Hints Inside Views |
Placing Hints in Queries Accessing a View |
---|---|---|
Access path/join |
Used only if the query referencing the view does not reference any other tables or views |
Ignored unless single-table view; if so, hint applied to the single table inside the view |
Optimizer mode hints |
Used unless there are conflicting hints inside the view, in which case they are all ignored |
Hints used regardless of hints inside the views |
For a nonmergeable view, the optimizer must break up the work into two pieces. It first must execute the query that defines the view and then must execute the top-level query. Because of this, the hints defined within the view are preserved. For instance, we are querying our DEPT_SAL view. We can see from the explain plan that the query is broken up into pieces:
SELECT manager_id, sum(total_salary)
FROM dept_sal
GROUP BY manager_id;
-------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | VIEW | DEPT_SAL |
| 3 | HASH GROUP BY | |
| 4 | MERGE JOIN | |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |
| 6 | INDEX FULL SCAN | DEPT_ID_PK |
| 7 | SORT JOIN | |
| 8 | TABLE ACCESS FULL | EMPLOYEES |
-------------------------------------------------------
See Table 14-4 for the guidelines for hints regarding nonmergeable views.
Table 14-4. Rules for Using Hints in Nonmergeable Views
Hint Category |
Placing Hints Inside Views |
Placing Hints in Queries Accessing a View |
---|---|---|
Access path |
Preserved |
Ignored |
Join |
Preserved |
Preserved |
Optimizer mode hints |
Ignored |
Used, if present |
It can be confusing to understand all the possible scenarios with hints and views, so they need to be used sparingly and only when other means of tuning have not met the needed requirements.
How It Works
Since a view, as mentioned, is simply a stored query, hints can be placed easily inside the view as they would be inside any query. The type of hint placed in the view will determine how and if a hint can be used within the view. Much like performing DML on a view, there are limitations on when hints are used or ignored.
As a rule of thumb, the simpler a view is, the more likely hints can be effective. Because of the uniqueness of each application, query, and view, the only true way to know whether a hint will be used is to simply try the hint and perform an explain plan to validate whether a given hint is used.
Oracle does not recommend placing hints in views; because the underlying objects can change over time, you can expect unpredictable execution plans. Also, views can be created for one specific use but could be used for other purposes later, and any hints in the views may not help every scenario. In addition, hints placed within views are managed differently than if you were simply executing the query itself. Before any hint placed inside a view is used, the optimizer needs to determine whether the view can be merged with the query calling the view.
You can also consider placing hints within queries that access views. It is important to understand the rules of precedence when hints are placed within queries that access the views that have hints within themselves. This especially underlines the need for caution before placing a hint within a view.
Tip Hints placed in queries that select against complex views are ignored.
Problem
You want to improve the performance on a given set of often-used queries and want to use Oracle’s result cache to store the query results so they can be retrieved quickly for future use when the same query has been executed.
Solution
The result cache was created in order to store results from often-used queries in memory for quick and easy retrieval. If you run an explain plan on a given query, you can see whether the results will be stored in the result cache:
SELECT /*+ result_cache */
job_id, min_salary, avg(salary) avg_salary, max_salary
FROM employees JOIN jobs USING (job_id)
GROUP BY job_id, min_salary, max_salary;
---------------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | RESULT CACHE | 1kvbu5w68ng3y8x745w5k2bap6 |
| 2 | HASH GROUP BY | |
| 3 | MERGE JOIN | |
| 4 | TABLE ACCESS BY INDEX ROWID| JOBS |
| 5 | INDEX FULL SCAN | JOB_ID_PK |
| 6 | SORT JOIN | |
| 7 | TABLE ACCESS FULL | EMPLOYEES |
---------------------------------------------------------------------
Result Cache Information (identified by operation id):
------------------------------------------------------
1 –
If you then query the V$RESULT_CACHE_OBJECTS view, you can validate whether the results of a query are stored in the result cache by looking at the cache ID value from the explain plan.
SELECT ID, TYPE, to_char(CREATION_TIMESTAMP,'yyyy-mm-dd:hh24:mi:ss') cr_date,
BLOCK_COUNT blocks, COLUMN_COUNT columns, PIN_COUNT pins, ROW_COUNT "ROWS"
FROM V$RESULT_CACHE_OBJECTS
WHERE CACHE_ID = ‘1kvbu5w68ng3y8x745w5k2bap6’;
ID TYPE CR_DATE BLOCKS COLUMNS PINS ROWS
--- ---------- ------------------- ------ ------- ---- ----
5 Result 2013-10-22:23:19:12 1 4 0 19
If for some reason the RESULT_CACHE_MODE parameter in your database is set with a default mode of FORCE at the database or table level, you can use the NO_RESULT_CACHE hint to bypass the result cache. If we run our previous query with the result cache mode set to FORCE, it is evident that the result cache is used automatically.
SQL> show parameter result_cache_mode
NAME TYPE VALUE
----------------- ------ -----
result_cache_mode string FORCE
select job_id, min_salary, avg(salary) avg_salary, max_salary
from employees join jobs using (job_id)
group by job_id, min_salary, max_salary;
---------------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | RESULT CACHE | 1kvbu5w68ng3y8x745w5k2bap6 |
| 2 | HASH GROUP BY | |
| 3 | MERGE JOIN | |
| 4 | TABLE ACCESS BY INDEX ROWID| JOBS |
| 5 | INDEX FULL SCAN | JOB_ID_PK |
| 6 | SORT JOIN | |
| 7 | TABLE ACCESS FULL | EMPLOYEES |
---------------------------------------------------------------------
If we then rerun with the NO_RESULT_CACHE hint, the result cache is not used, and the statement is executed:
SELECT /*+ no_result_cache */ job_id, min_salary, avg(salary) avg_salary, max_salary
FROM employees JOIN jobs USING (job_id)
GROUP BY job_id, min_salary, max_salary;
---------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | MERGE JOIN | |
| 3 | TABLE ACCESS BY INDEX ROWID| JOBS |
| 4 | INDEX FULL SCAN | JOB_ID_PK |
| 5 | SORT JOIN | |
| 6 | TABLE ACCESS FULL | EMPLOYEES |
---------------------------------------------------
The following query was run twice in order to show consistent results of not using the result cache (results were suppressed for the second run):
SELECT /*+ no_result_cache */
j.job_id, min_salary, avg(salary) avg_salary, max_salary, department_name
FROM employees_big e, jobs j, departments d
WHERE e.department_id = d.department_id
AND e.job_id = j.job_id
AND salary BETWEEN 5000 AND 9000
GROUP BY j.job_id, min_salary, max_salary, department_name;
JOB_ID MIN_SALARY AVG_SALARY MAX_SALARY DEPARTMENT_NAME
------- ---------- ---------- ---------- ---------------
HR_REP 4000 6500 9000 Human Resources
ST_MAN 5500 8000 8500 Shipping
IT_PROG 4000 9000 10000 IT
MK_REP 4000 6000 9000 Marketing
4 rows selected.
Elapsed: 00:00:07.42
SQL> /
Elapsed: 00:00:07.36
Then, the same query with a result_cache hint was run three times for comparison:
SELECT /*+ result_cache */
j.job_id, min_salary, avg(salary) avg_salary, max_salary, department_name
FROM employees_big e, jobs j, departments d
WHERE e.department_id = d.department_id
AND e.job_id = j.job_id
AND salary BETWEEN 5000 AND 9000
GROUP BY j.job_id, min_salary, max_salary, department_name;
Elapsed: 00:00:07.56
SQL> /
Elapsed: 00:00:00.01
SQL> /
Elapsed: 00:00:00.00
By using the result_cache hint, you can see that the first time running with the hint showed no improvement in the run time. This can be explained because the results have not yet been placed in the result cache. With each subsequent execution, however, we can see that the results are returned immediately because they are now stored in memory within the result cache.
How It Works
The result cache hint, if placed in a query, will override any database-level, table-level, or session-level result cache settings. Before using hints in your queries, you need to determine the configuration of the result cache on your database. There are two separate result caches to look at: the server-side result cache and the client-side result cache. The server-side result cache is part of the shared pool of the SGA and stores SQL query results and PL/SQL function results. Query time can be improved significantly because query results are checked within the result cache first, and if the results exist, they are simply pulled from memory, and the query is not executed. The result cache is most appropriately used for often-run queries that produce the same results.
The result cache can be configured at several levels. As Table 14-5 indicates, it can be configured at the database level, the session level, the table level, or the statement level. The statement level is where hints are specified. If you decide to configure the result cache in your database, there are several initialization parameters that need to be configured. Table 14-6 reviews these parameters. Some are specific parameters for the result cache, while the remaining memory-related PL/SQL package parameters need to be analyzed to see whether they need to be changed to accommodate the result cache. The DBMS_RESULT CACHE can also be used to configure, flush, and retrieve information for the result cache.
Table 14-5. Result Cache Configuration Hierarchy
Configuration Level |
How to Configure Result Cache |
---|---|
Database level |
Configured via initialization parameters (see Table 14-6) |
Table level |
Configured with the CREATE TABLE or ALTER TABLE statement—for example, ALTER TABLE EMPLOYEES RESULT_CACHE (MODE FORCE) |
Session level |
Configured via the ALTER SESSION statement—for example, ALTER SESSION SET RESULT_CACHE_MODE=FORCE |
Statement level |
Configured via the RESULT_CACHE or NO_RESULT_CACHE hint |
Table 14-6. Result Cache Initialization Parameters
Key Result Cache Initialization Parameters |
Description |
---|---|
RESULT_CACHE_MODE |
Indicates whether the result cache is active for all activity or only for manually run activities; MANUAL is the default, which means the result cache is not used unless specified at the table, session, or statement level. FORCE means it will be enabled for all queries for a database instance. |
RESULT_CACHE_MAX_SIZE |
Determines memory allocated for server-side result cache for database. |
RESULT_CACHE_MAX_RESULT |
Determines maximum size for single result for server-side result cache. The default is 5 percent. |
CLIENT_RESULT_CACHE_SIZE |
Determines maximum size for each client-side session result cache. |
MEMORY_TARGET |
By default, 0.25 percent of total is allocated for result cache if this parameter is configured. |
SGA_TARGET |
By default, 0.5 percent of total is allocated for result cache if this parameter is configured. |
SHARED_POOL_SIZE |
By default, 1 percent of total is allocated for result cache if this parameter is configured. |
RESULT_CACHE_REMOTE_EXPIRATION |
Number of minutes that a result dependent on remote database objects will remain valid. The default is 0, which means results dependent on remote objects will not be stored in the result cache. In most circumstances, the default is recommended because remote results can become stale. |
14-10. Directing a Distributed Query to a Specific Database
Problem
You are joining two or more tables that exist on different databases and want to direct the work to take place on a particular database because the remote database is where most of the data resides.
Solution
By default, when you are joining tables that exist on different databases, the database where the query originated is where the majority of the work takes place. You can change this behavior and tell the optimizer which database will do the work:
SELECT /*+ driving_site(employees) */ first_name, last_name, department_name
FROM employees@to_emp_link JOIN departments USING(department_id);
Specifying the remote site as the driver is most appropriate if the volume on the remote site is large or if you are querying many tables on the remote site. To process a distributed query, the optimizer first has to bring rows from remote tables over to the local site, before processing the overall query. This can be very resource-intensive on the temporary tablespaces on the local database. Therefore, by instructing the optimizer to perform the work at the site where the biggest percentage of the data resides, you can drastically improve your query performance.
When specifying the hint, you simply need to specify the remote table or table alias within your hint to direct the optimizer to the site that will do the work. There is no need to specify any hint if you want the optimizer to do the work on the local database; the hint needs to be specified when you want to direct the work to a remote database.
How It Works
Distributed queries can be a blessing and a curse. By being able to join tables from remote databases, it gives users the impression of data transparency (that is, that the data they need to retrieve appears to be in one place because they can assemble a single query to retrieve data when in fact the data may reside on two or more databases). This simplicity in assembling queries is a key advantage of being able to perform distributed queries. The key disadvantage is optimizing distributed queries is difficult. Essentially, the originating or local database where the query is initiated becomes the “driver” database by default. The optimizer at the local site has no knowledge of the makeup or volume of data at the remote site, and therefore the work is split up into pieces, and the query is not, by default, optimized as a single unit. Therefore, it is important to understand the makeup of the data on each database in order to attempt to best optimize the query. The key decision you need to make with a distributed query is which database you want to be the “driving” site. The biggest factors in determining which site should be the driving site are as follows:
In essence, if a majority of tables or a large volume of data resides remotely, it may be beneficial to use a remote database as the driving site. Let’s say we are joining three tables together, and we want to get employee information along with the department they work in and their work address. In this scenario, the employee table, being the largest, resides on one database, while two smaller tables, the department and location tables, reside on our local database:
SELECT first_name, last_name, department_name, street_address, city
FROM employees@to_emp_link JOIN departments USING(department_id)
JOIN locations USING (location_id);
-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH JOIN | |
| 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL| LOCATIONS |
| 4 | TABLE ACCESS FULL| DEPARTMENTS |
| 5 | REMOTE | EMPLOYEES |
-------------------------------------------
From the execution plan, we can see that the EMPLOYEES table is the remote table. What this means is before the join to the employee data can occur, all of that employee data must be brought over to the local database before the query can be completed. In this case, the employee data is by far the largest table of the three. There are far more employees than there are departments or locations, and therefore a large volume of data will be brought over to the local database before the remainder of the query can be processed. So, in this case, performance many improve by having the work done on the database where the employee data resides:
SELECT /*+ driving_site(employees) */
first_name, last_name, department_name, street_address, city
FROM employees@to_emp_link JOIN departments USING(department_id)
JOIN locations USING (location_id);
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE | |
| 1 | HASH JOIN | |
| 2 | VIEW | index$_join$_001 |
| 3 | HASH JOIN | |
| 4 | INDEX FAST FULL SCAN| EMP_DEPARTMENT_IX |
| 5 | INDEX FAST FULL SCAN| EMP_NAME_IX |
| 6 | HASH JOIN | |
| 7 | REMOTE | DEPARTMENTS |
| 8 | REMOTE | LOCATIONS |
---------------------------------------------------------------
Now the explain plan shows the two smaller tables as remote tables, since the database where the EMPLOYEES table resides is now the driving site for the query. Sometimes you may simply need to determine this by trial and error when it is not obvious which site should be the driving site.
Another easy way to determine this is simply by determining which query returns faster. If we want to get the average salary for each department and location, the query would look like the following:
SELECT department_name, city, avg(salary)
FROM employees_big@to_emp_link JOIN departments USING(department_id)
JOIN locations USING (location_id)
GROUP BY department_name, city
ORDER BY 2,1;
DEPARTMENT_NAME CITY AVG(SALARY)
---------------- ------------------- -----------
Human Resources London 6500
Public Relations Munich 10000
Sales Oxford 8955.88235
Accounting Seattle 10150
Administration Seattle 4400
Executive Seattle 19333.3333
Finance Seattle 8600
Purchasing Seattle 4150
Shipping South San Francisco 3475.55556
IT Southlake 5760
Marketing Toronto 9500
11 rows selected.
Elapsed: 00:00:42.87
Since no driving site hint is specified, the local site is the driving site. If we issue the same query specifying the remote and larger table to be the driving site, we see a benefit simply from the time the query takes to execute:
SELECT /*+ driving_site(employees_big) */ department_name, city, avg(salary)
FROM employees_big@to_emp_link JOIN departments USING(department_id)
JOIN locations USING (location_id)
GROUP BY department_name, city
ORDER BY 2,1;
Elapsed: 00:00:22.24
One more way you can try to determine which site should be the driving site is by figuring out exactly what work is being performed on each site. For example, using the foregoing query as an example, if we do not use the hint, perform the following:
First, we can see the execution plan of our query. Again, we are not using the driving_site hint.
----------------------------------------------
| Id | Operation | Name |
----------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT GROUP BY | |
| 2 | HASH JOIN | |
| 3 | HASH JOIN | |
| 4 | TABLE ACCESS FULL| LOCATIONS |
| 5 | TABLE ACCESS FULL| DEPARTMENTS |
| 6 | REMOTE | EMPLOYEES_BIG |
----------------------------------------------
Second, we can determine that the operation occurring on the remote database is the SELECT statement returning columns from the remote EMPLOYEES_BIG table. You can retrieve this information directly from the data dictionary on the remote database or from a tool such as Enterprise Manager.
SELECT "SALARY","DEPARTMENT_ID"
FROM "EMPLOYEES_BIG" "EMPLOYEES_BIG"
If we repeat the foregoing two steps with the same query but this time we insert a driving_site hint for the EMPLOYEES table, we get the following results. First, we can get the execution plan of our query with the driving_site hint:
-------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE| |
| 1 | RESULT CACHE | 326m75n1yb5kt2qysx7f37cy2y |
| 2 | SORT GROUP BY | |
| 3 | HASH JOIN | |
| 4 | HASH JOIN | |
| 5 | REMOTE | LOCATIONS |
| 6 | REMOTE | DEPARTMENTS |
| 7 | TABLE ACCESS FULL | EMPLOYEES_BIG |
-------------------------------------------------------------
Second, we can see which part of the query is being performed on the remote site. In this case, the data was retrieved from Enterprise Manager:
SELECT "A2"."DEPARTMENT_NAME","A1"."CITY",AVG("A3"."SALARY")
FROM "EMPLOYEES_BIG" "A3","DEPARTMENTS"@! "A2","LOCATIONS"@! "A1"
WHERE "A2"."LOCATION_ID"="A1"."LOCATION_ID" AND "A3"."DEPARTMENT_ID"="A2"."DEPARTMENT_ID"
GROUP BY "A2"."DEPARTMENT_NAME","A1"."CITY" ORDER BY "A1"."CITY","A2"."DEPARTMENT_NAME"
Without the driving site hint, we had to move all rows for the EMPLOYEES_BIG table for the SALARY and DEPARTMENT_ID columns. After transporting this data, the query results could be processed.
With the driving_site hint, we had to move all rows for all columns of the DEPARTMENTS and LOCATIONS table to the remote database. Then, the query results could be processed. And, because we used the driving_site hint, after the query results were compiled, the complete result set had to be transported to the local database. Therefore, you need to factor in not only the data moving between databases for the query itself but also, if you are using the driving_site hint, the results themselves being transported back to the local database where the query originated.
14-11. Gathering Extended Query Execution Statistics
Problem
You don’t know whether the cardinality estimates in your explain plan are accurate for a specific query, and you want to gather extended explain plan statistics to validate those estimates.
Solution
You can use the GATHER_PLAN_STATISTICS hint, which, if placed within a query at runtime, will generate extended runtime statistics. It is a two-step process:
See the following example:
SELECT /*+ gather_plan_statistics */
city, round(avg(salary)) avg, min(salary) min, max(salary) max
FROM employees JOIN departments USING (department_id)
JOIN locations USING (location_id)
GROUP BY city;
CITY AVG MIN MAX
------------------- ----- ----- -----
London 6500 6500 6500
Seattle 8844 2500 24000
Munich 10000 10000 10000
South San Francisco 3476 2100 8200
Toronto 9500 6000 13000
Southlake 5760 4200 9000
Oxford 8956 6100 14000
Then, you can use dbms_xplanto display the extended query statistics. Ensure that the SQL Plus setting SERVEROUTPUT is set to OFF, or else the results will not be properly displayed.
SELECT * FROM table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));
---------------------------------------------------------------------------
| Id | Operation | Name | Starts| E-Rows| A-Rows|
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1| | 7|
| 1| HASH GROUP BY | | 1| 7| 7|
|* 2| HASH JOIN | | 1| 106| 107|
|* 3| HASH JOIN | | 1| 27| 27|
| 4| VIEW | index$_join$_004| 1| 23| 23|
|* 5| HASH JOIN | | 1| | 23|
| 6| INDEX FAST FULL SCAN| LOC_CITY_IX | 1| 23| 23|
| 7| INDEX FAST FULL SCAN| LOC_ID_PK | 1| 23| 23|
| 8| VIEW | index$_join$_002| 1| 27| 27|
|* 9| HASH JOIN | | 1| | 27|
| 10| INDEX FAST FULL SCAN| DEPT_ID_PK | 1| 27| 27|
| 11| INDEX FAST FULL SCAN| DEPT_LOCATION_IX| 1| 27| 27|
| 12| TABLE ACCESS FULL | EMPLOYEES | 1| 107| 111|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPLOYEES"."DEPARTMENT_ID"="DEPARTMENTS"."DEPARTMENT_ID")
3 - access("DEPARTMENTS"."LOCATION_ID"="LOCATIONS"."LOCATION_ID")
5 - access(ROWID=ROWID)
9 - access(ROWID=ROWID)
Note
-----
- statistics feedback used for this statement
- this is an adaptive plan
Many other options are available using the DISPLAY_CURSOR procedure; ,refer to the Oracle PL/SQL Packages and Types Reference Guide for a more complete listing of these options.
How It Works
The GATHER_PLAN_STATISTICS hint gathers runtime statistics; therefore, the query needs to be executed in order to gather these statistics. If you already have a query that is performing at a substandard optimization level, it may be useful to run your query with the GATHER_PLAN_STATISTICS hint. This can quickly give you information that you simply do not have with a normal explain plan because it shows you estimated and actual information regarding query statistics. From this, you can determine whether the optimizer is optimally executing the SQL, and you can determine whether any optimization is needed.
Keep in mind that it does take some resources in order to gather these extra runtime statistics, so use this option with care. It may even be worthwhile to test the runtime differences in some cases. One key benefit of this hint is that the extra statistics are gathered only for the specific query. That way, the scope is limited and has little effect on other processes in the database, or even a particular session. If you wanted a more global setting to gather extended statistics, you can set STATISTICS_LEVEL=ALL at the session or instance level. One quick set of columns to review are the E-Rows and A-Rows columns. The E-Rows column shows the original optimizer estimates, and the A-Rows column shows the actual statistics gathered during execution. By looking at these columns, you can quickly tell whether the optimizer is executing the query based on accurate statistics. If there is a large discrepancy between these columns, it is a sign of an inefficient execution plan. The one needed calculation for an accurate analysis is for the E-Rows column. You need to multiply the Starts column with E-Rows to accurately compare the total with A-Rows.
Problem
You have materialized views in your database environment and want to have queries that access the source tables that make up the materialized views go against the materialized views directly to retrieve the results.
Solution
You can use the REWRITE hint to direct the optimizer to use a materialized view. The materialized view must have query rewrite enabled, and statistics for the materialized view and the associated objects should be current to increase the likelihood for a query to be rewritten.
To demonstrate how this works, first observe the following materialized view DDL. This materialized view calculates the total compensation for each department for a company. Let’s say the base query embedded in this materialized view is often used by executives of this company to determine how their particular department is doing in terms of distributing compensation to its employees:
CREATE MATERIALIZED VIEW DEPT_SAL_MV
ENABLE QUERY REWRITE
AS
SELECT department_id,
sum(nvl(salary+(salary*commission_pct),salary)) total_compensation
FROM employees
GROUP BY department_id;
Then, let’s see an example of a query that does the same calculation as that in the materialized view, for which the results are already stored:
SELECT /*+ rewrite(dept_sal_mv) */ department_id,
sum(nvl(salary+(salary*commission_pct),salary)) total_compensation
FROM employees
GROUP BY department_id
having sum(nvl(salary+(salary*commission_pct),salary)) > 10000
ORDER by 2;
In the following explain plan we can see that the optimizer used the materialized view in the execution plan, rather than processing the entire query and recalculating the summary:
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | MAT_VIEW REWRITE ACCESS FULL| DEPT_SAL_MV |
-----------------------------------------------------
How It Works
Materialized views are commonly used to store the result set for often-executed queries. While regular views are simply stored queries in the data dictionary, materialized views are essentially tables that store the result for these queries. Usually, they are created when there are complex joins, summaries, or aggregations occurring within a query. Since the results are stored in the database, there is no need for the optimizer to reprocess the query to retrieve the data. The end-user community does not have to execute a complex join or aggregation over and over, so it is a considerable performance benefit. Some users may not be aware of the materialized views in your environment and may be executing the raw queries against the star schema or other tables. It is here that the REWRITE hint may help in improving performance on queries that could use a materialized view.
If you enable query rewrite for a materialized view and if a query executes where the results can be found in that materialized view, the optimizer may choose to “rewrite” the query to go directly against the materialized view, rather than process the query itself. In these cases, no hint is usually required because the optimizer will automatically attempt to rewrite the query. However, it’s possible the optimizer may not choose to rewrite the query to use the materialized view, even though that is the desired outcome. In those instances, you can place a hint within your query to have the optimizer use the materialized view, regardless of the execution cost. You can place the actual view name within the hint or place the hint without the view name:
SELECT /*+ rewrite */ department_id,
sum(nvl(salary+(salary*commission_pct),salary)) total_compensation
FROM employees
GROUP BY department_id
having sum(nvl(salary+(salary*commission_pct),salary)) > 10000
ORDER by 2;
Conversely, you can also use a NOREWRITE hint if, for some reason, you do not want the optimizer to use the materialized view. One possible reason is that the data in the materialized view is stale compared to the source tables, and you want to ensure you are getting the most current data. Here we can see that the optimizer bypassed the use of the materialized view and resummarized the data directly from the EMPLOYEES table:
SELECT /*+ norewrite */ department_id,
sum(nvl(salary+(salary*commission_pct),salary)) total_compensation
FROM employees
GROUP BY department_id
having sum(nvl(salary+(salary*commission_pct),salary)) > 10000
ORDER by 2;
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | FILTER | |
| 3 | HASH GROUP BY | |
| 4 | TABLE ACCESS FULL| EMPLOYEES |
------------------------------------------
14-13. Improving Star Schema Query Performance
Problem
You work in a data warehouse environment that contains star schemas, and you want to improve the performance of queries.
Solution
Oracle has a specific solution called star transformation, which was designed to help improve performance against star schemas in the data warehouse environment. Oracle has the STAR_TRANSFORMATION and FACT hints to help improve query performance using star schemas. In your queries, you can use the STAR_TRANSFORMATION or FACT hint, or you can use both. The following query is an example of how to use these hints:
SELECT /*+ star_transformation */ pr.prod_category, c.country_id,
t.calendar_year, sum(s.quantity_sold), SUM(s.amount_sold)
FROM sales s, times t, customers c, products pr
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND pr.prod_id = s.prod_id
AND t.calendar_year = '2011'
GROUP BY pr.prod_category, c.country_id, t.calendar_year;
To use just the FACT hint, simply place the fact table name or alias within parentheses in the hint:
SELECT /*+ fact(s) */ pr.prod_category, c.country_id,
At times, the optimizer will be more likely to perform star transformation when both hints are present:
SELECT /*+ star_transformation fact(s) */ pr.prod_category, c.country_id,
Here is a typical explain plan that has undergone star transformation:
-----------------------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | HASH JOIN | |
| 3 | HASH JOIN | |
| 4 | HASH JOIN | |
| 5 | PARTITION RANGE ALL | |
| 6 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES |
| 7 | BITMAP CONVERSION TO ROWIDS | |
| 8 | BITMAP AND | |
| 9 | BITMAP MERGE | |
| 10 | BITMAP KEY ITERATION | |
| 11 | BUFFER SORT | |
| 12 | TABLE ACCESS FULL | CUSTOMERS |
| 13 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX |
| 14 | BITMAP MERGE | |
| 15 | BITMAP KEY ITERATION | |
| 16 | BUFFER SORT | |
| 17 | VIEW | index$_join$_016 |
| 18 | HASH JOIN | |
| 19 | INDEX FAST FULL SCAN | PRODUCTS_PK |
| 20 | INDEX FAST FULL SCAN | PRODUCTS_PROD_CAT_IX |
| 21 | BITMAP INDEX RANGE SCAN | SALES_PROD_BIX |
| 22 | TABLE ACCESS FULL | TIMES |
| 23 | TABLE ACCESS FULL | CUSTOMERS |
| 24 | VIEW | index$_join$_004 |
| 25 | HASH JOIN | |
| 26 | INDEX FAST FULL SCAN | PRODUCTS_PK |
| 27 | INDEX FAST FULL SCAN | PRODUCTS_PROD_CAT_IX |
-----------------------------------------------------------------------
Note
-----
- star transformation used for this statement
How It Works
Before you start running star queries, there are two key configuration elements that need to be taken care of before star transformation can occur:
If you are at a point to want to use a hint within a star schema, be it the FACT hint or the STAR_TRANSFORMATION hint, it is assumed you have a properly configured environment, or else these hints will not be used by the optimizer. These hints are not required for star transformation, but by using either of these hints, the optimizer will look to do transformation. Even with the hint, however, the optimizer may choose to ignore the request, based on what it thinks the best execution plan will be for the query. Star queries are efficient and perform well because the transformation is designed to operate specifically with star schemas.
If, for some reason, you want to avoid the use of star transformation for your query, simply use the no_star_transformation hint, and the optimizer will bypass the use of star transformation:
SELECT /*+ no_star_transformation */ pr.prod_category, c.country_id,
t.calendar_year, sum(s.quantity_sold), SUM(s.amount_sold)
FROM sales s, times t, customers c, products pr
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND pr.prod_id = s.prod_id
AND t.calendar_year = '2011'
GROUP BY pr.prod_category, c.country_id, t.calendar_year;
From the explain plan, we can see that the optimizer did not transform our query:
---------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH GROUP BY | |
| 2 | NESTED LOOPS | |
| 3 | NESTED LOOPS | |
| 4 | NESTED LOOPS | |
| 5 | NESTED LOOPS | |
| 6 | PARTITION RANGE ALL | |
| 7 | TABLE ACCESS FULL | SALES |
| 8 | TABLE ACCESS BY INDEX ROWID| PRODUCTS |
| 9 | INDEX UNIQUE SCAN | PRODUCTS_PK |
| 10 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS |
| 11 | INDEX UNIQUE SCAN | CUSTOMERS_PK |
| 12 | INDEX UNIQUE SCAN | TIMES_PK |
| 13 | TABLE ACCESS BY INDEX ROWID | TIMES |
---------------------------------------------------------
At times, it can be tricky to get the star transformation to take place. It is critically important that you have properly configured the star schema with all the appropriate bitmap indexes. Even having one missing bitmap index can affect the ability to have star transformation occur for your queries, so it is important to be thorough and validate the configuration, especially regarding the bitmap indexes. Some star schemas also employ the use of bitmap join indexes between the fact and dimension tables to aid in achieving star transformation.