Parallelism can help improve performance on particular operations simply by assigning multiple resources to a task. Parallelism is best used on systems with multiple CPUs because the multiple processes used (that is, the parallel processes) will use those extra system resources to more quickly complete a given task.
As a general rule, parallelism is also best used on large tables or indexes and on databases with large volumes of data. It is ideal for use in data warehouse environments, which are large by their nature. Parallelism is not well suited for OLTP environments just because of the transactional nature of those systems.
To use parallelism properly, there are several important factors to understand:
One of the most common pitfalls of parallelism is overuse. It is sometimes seen as a magic bullet to tune and speed up SQL operations. In turn, parallelism can actually lead to poorer rather than better performance. Therefore, it is critically important for the DBA to understand the physical configuration of a system and configure parallelism-related parameters to best suit the system. Educating developers and users of your database about basic questions will increase the success rate of parallel operations. When is it appropriate to use parallelism? How do you properly enable parallelism in SQL operations? What type of operations can be parallelized? Parallelism is a powerful tool to aid in drastically improving the performance of database operations, but with that power comes responsibility.
This chapter focuses on the methods to properly configure your database for parallelism, key operations that can be parallelized, how to induce parallelism in your SQL, and some tools to use to see whether parallel operations are running optimally.
15-1. Enabling Parallelism for a Specific Query
Problem
You have a slow-running query accessing data from a large table. You want to see whether you can speed up the query by instructing Oracle to use multiple processes to retrieve the data.
Solution
There are two distinct types of hints to place in your SQL to try to speed up your query by using multiple processes, or parallelism. One type of hint is for data retrieval itself, and the other is to help speed the process of reading the indexes on a table.
Parallel Hints for Tables
First, you need to determine the degree of parallelism (DOP) desired for the query. This instructs Oracle how many processes it will use to retrieve the data. Second, place a parallel hint inside the query specifying the table(s) on which to execute parallel SQL, as well the degree of parallelism to use for the query. Here’s an example:
SELECT /*+ parallel(emp,4) */ empno, ename
FROM emp;
If you use a table alias in your query, you must use it in your hint, or else the Oracle optimizer will ignore the hint.
SELECT/*+ parallel(e,4) */ empno, ename
FROM emp e;
The hints in the preceding two queries result in four processes dividing the work of reading rows from the EMP table. Four processes working together will get the job done faster in terms of wall-clock time than one process doing all the work by itself.
Optionally, you can omit specifying a degree of parallelism within the hint. If you specify only the table name or alias in the hint, Oracle will derive the degree of parallelism based on the database initialization parameters, which may or may not give you the desired degree of parallelism:
SELECT/*+ parallel(e) */ empno, ename
FROM emp e;
Parallel Hints for Indexes
Specify the parallel_index hint to control parallel access to indexes. You can generally access an index in parallel only when the index is a locally partitioned index. In that case, you can apply the parallel_index hint. Here’s an example:
SELECT /*+ parallel_index(emp, emp_i4 ,4) */ empno, ename
FROM emp
WHERE deptno = 10;
There are two arguments to the parallel_index hint: table name and index name. As with specifying the degree of parallelism on tables, if you omit the degree of parallelism from within an index hint, the database itself will compute the degree of parallelism for the query.
If you alias your tables, then you must use the alias names in your hints. See the preceding section, “Parallel Hints for Tables,” for an example.
How It Works
To effectively use parallel hints, you need to take the following items into consideration:
You also must analyze and understand three key components of your system prior to using parallel hints in queries:
Parallel SQL must be used with caution because it is common to overuse and can cause an overutilization of both CPU and I/O resources, which ultimately results in slower rather than faster performance. Overuse is a common mistake in the use of parallelism.
Depending on the number of tables in your query, you may want to place parallelism on one or more of the tables, depending on their size. A general rule of thumb is that if a table contains more than 10 million rows, or is at least 2 GB in size, it may be a viable candidate for using parallelism.
The degree of parallelism (DOP) should be directly related to the number of CPUs on your system. If you have a single-CPU system, there is little, if any, benefit of using parallel SQL, and the result could very well be returned slower than if no parallelism was used at all.
To help determine whether you can use parallelism on any indexes, you need to first determine whether any of the filtering columns in your WHERE clause are indexed. If so, check to see whether the table is partitioned. Typically, then, for a query on a large table, a parallel_index hint may help the speed of your query. Overall, when trying to determine whether to use parallelism for your query, it’s helpful to perform an explain plan to determine whether parallelism will be used. Also, there may be parallelism already specified for an object within your query, so it is also a good idea to check the DEGREE column in the USER_TABLES or USER_INDEXES view prior to checking the degree of parallelism within a hint.
Table 15-1 shows the different parallel hints that can be used.
Table 15-1. Types of Parallel Hints
Table Head |
Parameters |
---|---|
PARALLEL |
Table name, DOP |
PARALLEL_INDEX |
Table name, index name, DOP |
NO_PARALLEL |
-- |
NO_PARALLEL_INDEX |
-- |
PQ_DISTRIBUTE |
Table name, distribution value |
PQ_CONCURRENT_UNION |
For concurrent processing of UNION and UNION ALL operations |
NO_PQ_CONCURRENT_UNION |
Inhibits concurrent processing of UNION and UNION ALL operations |
PQ_FILTER |
Enables special instructions to the optimizer for correlated subqueries |
PQ_SKEW |
Advises the optimizer that data distribution of join keys for a parallel join operation is highly skewed |
NO_PQ_SKEW |
Asserts the lack of skew |
Oracle gives you many options to help you determine a proper DOP and whether you want to specify it yourself or you want Oracle to determine the DOP for your query. Table 15-2 briefly describes these options.
Table 15-2. Degree of Parallelism Options
Hint Name |
Description |
---|---|
PARALLEL |
Statement always runs in parallel. |
PARALLEL (DEFAULT) |
Same as PARALLEL. |
PARALLEL (AUTO) |
Optimizer computes DOP to be used. |
PARALLEL (MANUAL) |
Parallelism is based on object parallelism. |
PARALLEL (integer) |
The DOP used is specified by the integer. |
Parallel Hints for Tables
To determine whether parallelism is being used in your query, first perform an explain plan on your query. The following are a simple query and its associated execution plan:
select * from emp;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 1218 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| EMP | 14 | 1218 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
If parallelism isn’t being used, you can insert the parallel hint and then rerun the explain plan to verify that the optimizer will use parallelism in the execution plan. Here’s an example:
select /*+ parallel(emp,4) */ * from emp;
----------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 4 | TABLE ACCESS FULL| EMP | Q1,00 | PCWP | |
----------------------------------------------------------------------
Note The proper database initialization parameters need to be properly set in order for parallelism to be enabled and used via the use of hints.
Parallel Hints for Indexes
Although it is far less common to parallelize index-based queries, it may be of benefit in certain circumstances. For example, you may want to parallelize the query against a local index that is part of a partitioned table. The following is an example query and the resulting execution plan:
SELECT /*+ parallel_index(emp, emp_i3) */ empno, ename
FROM emp
WHERE hiredate between '2010-01-01' and '2010-12-31';
-------------------------------------------------------------------------
|Id|Operation |Name | TQ |IN-OUT| PQ Dist|
-------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | |
| 1| PX COORDINATOR | | | | |
| 2| PX SEND QC (RANDOM) |:TQ10000|Q1,00| P->S | |
| 3| PX PARTITION RANGE ITERATOR | |Q1,00| PCWC | |
| 4| TABLE ACCESS BY LOCAL INDEX ROWID|EMP |Q1,00| PCWP | |
| 5| INDEX RANGE SCAN |EMP_I3 |Q1,00| PCWP | |
-------------------------------------------------------------------------
When formatting the hint, you can specify all the parameters that tell the optimizer exactly which index to use and what DOP you desire. In the following query, we’re telling the optimizer that we want to use the EMP_I3 index, with a DOP of 4.
SELECT /*+ parallel_index(emp, emp_i3, 4) */ empno, ename
FROM emp
WHERE hiredate between '2010-01-01' and '2010-12-31';
If you omit the DOP from the hint, the optimizer determines the DOP based on the initialization parameter settings. For instance, in the following example, the table name and index name are specified in the hint, but the DOP is not specified. Oracle will compute the DOP for us in these cases.
SELECT /*+ parallel_index(emp, emp_i3) */ empno, ename
FROM emp
WHERE hiredate between '2010-01-01' and '2010-12-31';
You can also simply place the table name in the hint, and the optimizer will determine which index, if any, can be used. If the optimizer determines that no index is suitable, then no index will be used. In the following example, only the table name is used in the hint:
SELECT /*+ parallel_index(emp) */ empno, ename
FROM emp
WHERE hiredate between '2010-01-01' and '2010-12-31';
15-2. Enabling Parallelism at Object Creation
Problem
You have new tables to create in your database that will be growing to a very large size, and you want to speed up the queries against those tables.
Solution
Having a higher than default DOP on a table or index is an easy way to set a more consistent and fixed method of enabling multiple processes on tables and indexes. Enabling parallelism on tables or indexes is done within DDL commands. You can enable parallelism within the CREATE statement when creating a table or an index.
For a new table, if you are expecting to have consistent queries that can take advantage of multiple processes, it may be easier to set a fixed DOP on your object, rather than having to place hints in your SQL or let Oracle set the DOP for you. In the following example, we’ve specified a DOP of 4 on the EMP table:
CREATE TABLE EMP
(
EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2) CONSTRAINT FK_DEPTNO REFERENCES DEPT
)
PARALLEL(DEGREE 4);
By placing a static DOP of 4 on the table, any user accessing the EMP table will get a DOP of 4 for each query executed.
select * from emp;
----------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 4 | TABLE ACCESS FULL| EMP | Q1,00 | PCWP | |
----------------------------------------------------------------------
You can also specify a default DOP when creating an index. There are circumstances where it may be beneficial to create an index with a higher DOP. With large partitioned tables, it is common to have secondary locally partitioned indexes on often-used columns in the WHERE clause. Some queries that use these indexes may benefit from increasing the DOP. In the following DDL, we’ve created this index with a DOP of 4:
CREATE INDEX EMP_I1
ON EMP (HIREDATE)
LOCAL
PARALLEL(DEGREE 4);
How It Works
Placing parallelism on objects themselves helps multiple processes complete the task at hand sooner—whether it be to speed up queries or to help speed up the creation of an index. To be able to assess the proper DOP to place on an object, you should know the access patterns of the data. If less information is known about the objects, the more conservative the DOP should be. Placing a high DOP on a series of objects can hurt performance just as easily as it can help, so enabling DOP on objects needs to be done with careful planning and consideration.
Tip If automatic DOP is enabled and configured properly (PARALLEL_DEGREE_POLICY=AUTO), then the parallelism that you set on objects is ignored, and the optimizer chooses the degree of parallelism to be used. See Recipe 15-10 for details on enabling automatic DOP.
15-3. Enabling Parallelism for an Existing Object
Problem
You have a series of slow-running queries accessing a set of existing database tables, and you want to take steps to reduce the execution time of the queries.
Solution
Setting a higher DOP on an existing table or index is an easy way to have a more consistent and fixed method of enabling multiple processes on tables and indexes. Setting the DOP for tables or indexes is done within DDL commands. You can change the DOP on a table or index by using the ALTER statement. For instance, if you have an existing table that needs to have the DOP changed to accommodate user queries that want to take advantage of multiple processes, they can be added easily to the table, which takes effect immediately. The following example alters the default DOP for a table:
ALTER TABLE EMP
PARALLEL(DEGREE 4);
If, after a time, you want to reset the DOP on your table, you can also do that with the ALTER statement. See the following two examples on how to reset the DOP for a table:
ALTER TABLE EMP
PARALLEL(DEGREE 1);
ALTER TABLE EMP
NOPARALLEL;
If you have an already existing index that you think will benefit from a higher DOP, it can also easily be changed. As with tables, the change takes effect immediately. The following example shows how to change the default DOP for an index:
ALTER INDEX EMP_I1
PARALLEL(DEGREE 4);
As with tables, you can reset the DOP on an index in either of the following two ways:
ALTER INDEX EMP_I4
PARALLEL(DEGREE 1);
ALTER INDEX EMP_I4
NOPARALLEL;
How It Works
Increasing the DOP on an existing object is a sign that you already have a performance issue for queries accessing tables within your database. Monitoring parallelism performance is a key factor in knowing whether the DOP set for an object or set of objects is appropriate. Examine data in V$PQ_TQSTAT to assist in determining the DOP that has been used, or examine data in V$SYSSTAT to assist in determining the extent that parallelism is being used on your database. Refer to Recipe 15-12 for some examples of using these data dictionary views.
15-4. Implementing Parallel DML
Problem
You want to induce parallelism when performing DML operations (INSERT, UPDATE, MERGE, DELETE) in order to speed performance and reduce transaction time.
Solution
If operating within a data warehouse environment or an environment with large tables that require a high volume of bulk transactions, parallel DML can help speed up processing and reduce the time it takes to perform these operations. Parallel DML is disabled by default on a database and must be explicitly enabled with the following statement:
ALTER SESSION ENABLE PARALLEL DML;
By specifying the foregoing statement, it truly enables parallel DML to be possible in a session but does not guarantee it. Parallel DML operations will occur only under certain conditions:
You may desire, in certain circumstances, to force parallel behavior, regardless of the parallel degree you have placed on an object or regardless of any hints you’ve placed in your DML. So, alternatively, you can force parallel DML with one of the following statements:
ALTER SESSION FORCE PARALLEL DML;
ALTER SESSION FORCE PARALLEL DML PARALLEL 4;
From the foregoing examples, it is worth noting that it is always best to specify an explicit DOP. You can see in the first example that no DOP is shown, while the second one shows a DOP of 4. If you omit DOP, Oracle automatically calculates the DOP and, often, can consume a large amount of system resources, without regard to other processes running on the database. Because of the unpredictability of the DOP Oracle automatically derives, it is best to always specify the DOP to be used.
As a general rule, use the FORCE option for exception-based DML processing. It is really an override and should not be regarded as a best practice to use for regularly scheduled processing. It is best used sparingly and can help with occasional large DML operations.
How It Works
Parallel DML can work for any DML operation—INSERT, UPDATE, MERGE, and DELETE. The rules vary slightly depending on which DML operation you are running. If you want to run an INSERT statement in parallel, for instance, first enable parallelism for your session and then execute your INSERT statement with the appropriate mechanism in order for the DML to run in parallel:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL(DEPT,4) */ INTO DEPT
SELECT /*+ PARALLEL(DEPT_COPY,4) */ * FROM DEPT_COPY;
With the foregoing statement, we put a parallel hint into the INSERT statement and also put a parallel HINT into the SELECT portion of the statement. It’s important to remember that even if parallelism is in effect for your DML statement, it does not directly impact any parallelism on a related query within the same statement. For instance, the following statement’s DML operation can run in parallel, but the corresponding SELECT statement will run in serial mode because no parallelism is specified on the query itself.
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL(DEPT,4) */ INTO DEPT
SELECT * FROM DEPT_COPY;
To take full advantage of parallel capabilities, try to parallelize all portions of a statement. If you parallelize the INSERT but not the SELECT, the SELECT portion becomes a bottleneck for the INSERT, and the INSERT performance suffers.
Parallel DML operations can also occur on UPDATE, MERGE, and DELETE statements. Let’s say your company was generous and decided to give everyone in the accounting department a 1 percent raise:
UPDATE /*+ PARALLEL(EMP,4) */ EMP
SET SAL = SAL*1.01
WHERE DEPTNO=10;
Then, after a period of months, your company decides to lay off those employees they gave raises to in accounting:
DELETE /*+ PARALLEL(EMP,4) */ FROM EMP
WHERE DEPTNO=10;
Another way to parallelize a DML transaction within your database is to use the DBMS_PARALLEL_EXECUTE PL/SQL package. Although more complex to configure, there are some key advantages of using this package to run your parallelized transactions:
The obvious benefits of using the DBMS_PARALLEL_EXECUTE package are greater control over how large transactions are run, increased functionality, and more efficient use of database resources. The key trade-off with using this package is it is simply more complex to configure, set up, and run—but may be well worth it when processing large volumes of data.
Tip You must execute the ALTER SESSION ENABLE | FORCE PARALLEL DML statement in order for parallel DML to occur for your transaction.
Restrictions on Parallel DML
There are plenty of restrictions in using parallel DML, and you need to understand that even if dealing with a large volume of data, parallel DML may not be possible in certain circumstances:
Degree of Parallelism
Once you submit a parallelized DML operation for execution, Oracle determines, based on a set of precedence rules, what DOP will be used for the entire statement being submitted. It is important to understand these rules so you get the desired DOP you are expecting for your transaction.
For DML transactions, Oracle applies the following base rules of precedence to determine DOP:
After choosing the appropriate DOP for the insert and query portions of the statement, the query is executed. Note that the DOP chosen for each portion of the statement can be different.
Other Considerations
Using parallel DML can be complex because there are many permutations of possibilities of the type of objects involved: whether they are partitioned, the DOP specified on the objects, the hints specified in statements, and the parallel parameter settings, just to name a few.
Here are some other factors that need to be considered when using parallel DML:
15-5. Creating Tables in Parallel
Problem
You need to quickly create a table from an existing large table and want to employ the use of multiple processes to help speed up the creation of the table.
Solution
If you are administering very large databases (VLDBs) or have to rebuild a large table, parallel DDL is fast and has advantages over running parallel DML. Speed is the biggest factor in choosing to use parallel DDL to create a table from an existing large table. Within your specific DDL command, there is a PARALLEL clause that determines whether operations are to be performed in parallel. This is done by using the CREATE TABLE ... AS SELECT operation:
CREATE TABLE EMP_COPY
PARALLEL(DEGREE 4)
AS
SELECT * FROM EMP;
------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | Q1,00 | P->S | QC (RAND) |
| 3 | LOAD AS SELECT | EMP_COPY | Q1,00 | PCWP | |
| 4 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 5 | TABLE ACCESS FULL | EMP | Q1,00 | PCWP | |
------------------------------------------------------------------------
How It Works
The reason parallel DDL is popular is that it is a fast way to perform operations on a large amount of data. The work is divided up in several pieces and done concurrently. Let’s say you just bought a new house and are in the process of moving. If you are loading a large moving truck with boxes, it will simply be faster with four people loading rather than one. Moreover, parallel DDL is an attractive way to perform DML-type operations under the covers of DDL commands.
The most common reasons to use CREATE TABLE ... AS SELECT include the following:
Some of the foregoing operations could also be strictly handled with parallel DML, but using parallel DDL has a distinct advantage over parallel DML. Since DDL operations cannot be rolled back, normal transaction undo is not generated for these operations, and it is simply a more efficient operation.
The DOP for a parallel DDL operation is determined by the object DOP. This also includes the query portion of the statement. If you choose, you can override the DOP of the objects by issuing one of the following commands:
ALTER SESSION FORCE PARALLEL DDL;
ALTER SESSION FORCE PARALLEL DDL PARALLEL 4;
As with parallel DML, it is always best to explicitly set DOP; otherwise, it is unpredictable how much system resources your statement will consume.
If you have a large table from which you need to delete many rows, consider using CREATE TABLE ... AS SELECT rather than using a DML DELETE statement. Deleting rows is an expensive operation. In large data warehouse environments, in scenarios where a large volume of rows needs to be deleted, the cost and time of doing the delete can quickly become unmanageable. Because of the nature of delete, it is very resource-intensive for the database as far as the amount of redo and undo generation it takes to perform the operation. One good rule of thumb to use is that if you are deleting as little as 5–10 percent of the rows of a large table, it can be simply faster to create a new table with all the rows you want to keep.
Here is an example where we are deleting about 20 percent of the rows from our EMP table of 1,234,568 rows:
delete /*+ parallel(emp,4) */ from emp
where empno > 1000000
SQL> /
234568 rows deleted.
Elapsed: 00:00:09.94
--------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | Q1,01 | P->S | QC (RAND) |
| 3 | INDEX MAINTENANCE | EMP | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | Q1,01 | PCWP | |
| 5 | PX SEND RANGE | :TQ10000 | Q1,00 | P->P | RANGE |
| 6 | DELETE | EMP | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 8 | TABLE ACCESS FULL| EMP | Q1,00 | PCWP | |
--------------------------------------------------------------------------
This delete took 9.94 seconds to run. If we now run a CREATE TABLE ... AS SELECT statement to achieve the same result, we can see the difference in performance.
create table emp_ctas_new2
parallel(degree 4)
nologging
as select /*+ parallel(a,4) */ * from emp_ctas
where empno <= 1000000
SQL> /
Elapsed: 00:00:01.70
-----------------------------------------------------------------------------
|Id |Operation |Name | TQ |IN-OUT|PQ Distrib|
-----------------------------------------------------------------------------
| 0 |CREATE TABLE STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) |:TQ10001 |Q1,01| P->S |QC (RAND) |
| 3 | LOAD AS SELECT |EMP_CTAS_NEW|Q1,01| PCWP | |
| 4 | PX RECEIVE | |Q1,01| PCWP | |
| 5 | PX SEND ROUND-ROBIN |:TQ10000 | | S->P |RND-ROBIN |
| 6 | TABLE ACCESS BY INDEX ROWID|EMP_CTAS | | | |
| 7 | INDEX RANGE SCAN |EMP_CTAS_PK | | | |
-----------------------------------------------------------------------------
Creating the table took 1.7 seconds, which is more than five times faster than performing the same operation with a DELETE statement. If you have indexes on the table, however, you need to consider that as a factor before choosing this method because if you re-create a table, you must also re-create the associated indexes for that table. It’s still likely to be faster, however, because you can re-create any indexes in parallel as well.
Keep in mind that, even though the foregoing example uses parallel DDL on these statements, this concept holds true even if you are running in serial mode. When you need to delete a large number of rows from a table, the CREATE TABLE ... AS SELECT can be compared favorably to DELETE with parallel-executed DDL or nonparallel, serial-executed DDL.
One potential drawback of creating tables in parallel is that the space allocations for these operations may leave the table more fragmented than if you created the table serially. This is a trade-off that should be considered when creating tables in parallel. The DOP that is specified in the operation spawns that number of parallel threads, and one extent is allocated for each thread. So, if you have specified a DOP of 4 for your parallel operation, there will be a minimum of four extents allocated for the operation. Depending on the MINIMUM EXTENT size for the tablespace, Oracle does attempt to trim unused space at the end of the operation. You should expect, though, that parallel create table operations are simply less space-efficient than operations run serially.
15-6. Creating Indexes in Parallel
Problem
You need to create indexes for a large table as quickly as possible and want to employ the use of multiple processes to help speed up the index creation.
Solution
Any time you have a large table, it is a good idea to always create any associated index for that table using parallel DDL, even if you want the DOP on the index to be nonparallelized for queries. The major benefit of creating an index in parallel is that it simply takes much less time to create the index. It always makes sense to create an index for a large table in parallel and then optionally choose to reset the DOP used for queries after the create operation is complete. In the following example, we are creating the index with a DOP of 4, which will be used during the process of creating the index:
CREATE INDEX EMP_COPY_I1
ON EMP_COPY (HIREDATE)
PARALLEL(DEGREE 4);
Then, after the index has been created, we can choose to reset the DOP to a different value for use by queries, using either of the following examples:
ALTER INDEX EMP_COPY_I1 NOPARALLEL;
ALTER INDEX EMP_COPY_I1 PARALLEL(DEGREE 1);
How It Works
The primary reason you want to run parallel DDL on an index is to either create or rebuild a large, existing index. Some of the reasons you may have to do this include the following:
As with tables, if you want to bypass the parallelism specified on the index, you can “force” the issue by running one of the following commands:
ALTER SESSION FORCE PARALLEL DDL;
ALTER SESSION FORCE PARALLEL DDL PARALLEL 4;
15-7. Rebuilding Indexes in Parallel
Problem
You have an existing index that needs to be rebuilt quickly, and you want to use multiple processes to speed up the index rebuild process.
Solution
At some point you may need to rebuild an index, for many of the same reasons as when you want to re-create an index. To rebuild an index in parallel, use the ALTER INDEX command:
ALTER INDEX EMP_NAME_IX
REBUILD
PARALLEL(DEGREE 4);
-------------------------------------------------------------------
|Id |Operation |Name | TQ |IN-OUT|PQ Distrib|
-------------------------------------------------------------------
| 0|ALTER INDEX STATEMENT | | | | |
| 1| PX COORDINATOR | | | | |
| 2| PX SEND QC (ORDER) |:TQ10001 |Q1,01| P->S |QC (ORDER)|
| 3| INDEX BUILD NON UNIQUE|EMP_NAME_IX|Q1,01| PCWP | |
| 4| SORT CREATE INDEX | |Q1,01| PCWP | |
| 5| PX RECEIVE | |Q1,01| PCWP | |
| 6| PX SEND RANGE |:TQ10000 |Q1,00| P->P |RANGE |
| 7| PX BLOCK ITERATOR | |Q1,00| PCWC | |
| 8| TABLE ACCESS FULL|EMPLOYEES |Q1,00| PCWP | |
-------------------------------------------------------------------
If you need to rebuild a partition of a large local index, you can also use parallelism to perform this operation. See the following example:
ALTER INDEX emppart_i1
REBUILD PARTITION emppart2001_p
PARALLEL(DEGREE 4);
----------------------------------------------------------------------------
|Id|Operation |Name |Pstart|Pstop |IN-OUT|PQ Distrib|
----------------------------------------------------------------------------
| 0|ALTER INDEX STATEMENT | | | | | |
| 1| PX COORDINATOR | | | | | |
| 2| PX SEND QC (ORDER) |:TQ10001 | | | P->S |QC ORDER) |
| 3| INDEX BUILD NON UNIQUE |EMPPART_I1| | | PCWP | |
| 4| SORT CREATE INDEX | | | | PCWP | |
| 5| PX RECEIVE | | | | PCWP | |
| 6| PX SEND RANGE |:TQ10000 | | | P->P |RANGE |
| 7| PX BLOCK ITERATOR | | 2 | 2| PCWC | |
| 8| INDEX FAST FULL SCAN|EMPPART_I1| 2 | 2| PCWP | |
----------------------------------------------------------------------------
How It Works
Rebuilding an index has a key advantage, as well as a key disadvantage, over re-creating an index from scratch. The advantage of rebuilding an index is that the existing index is in place until the rebuild operation is complete, so it can therefore be used by queries that are run concurrently with the rebuild process. The main disadvantage of the index rebuild process is that you will need space for both indexes, which is required during the rebuild process. Some of the key reasons to rebuild an index include the following:
When indexes are created or re-created in parallel, keep in mind that the degree of parallelism stays on the index after the creation is complete. For instance, in the foregoing example of rebuilding the EMP_NAME_IX index, we can see that when looking at the data dictionary before and after the rebuild, the degree of parallelism stays on the index. The impact of this is when users issue queries that use the index, it can spawn parallel query slaves, which may not be desirable. Usually, you will want to reset the degree of parallelism to its original value on the index after creating it. To demonstrate, see the following example, which does the following:
Here is the example that does all of the foregoing:
select table_name, index_name , degree
from user_indexes
where index_name = 'EMP_NAME_IX';
TABLE_NAME INDEX_NAME DEGREE
---------- ------------ ---------
EMPLOYEES EMP_NAME_IX 1
ALTER INDEX EMP_NAME_IX
REBUILD
PARALLEL(DEGREE 4);
Index altered.
select table_name, index_name , degree
from user_indexes
where index_name = 'EMP_NAME_IX';
TABLE_NAME INDEX_NAME DEGREE
---------- ------------ ---------
EMPLOYEES EMP_NAME_IX 4
ALTER INDEX EMP_NAME_IX PARALLEL 1;
Index altered.
select table_name, index_name , degree
from user_indexes
where index_name = 'EMP_NAME_IX';
TABLE_NAME INDEX_NAME DEGREE
---------- ------------ ---------
EMPLOYEES EMP_NAME_IX 1
15-8. Moving Partitions in Parallel
Problem
You need to move a table partition to a different tablespace and want to employ the use of multiple processes to accomplish this task.
Solution
Let’s say you want to move a table partition to another tablespace. For instance, you’ve created a tablespace on slower, cheaper storage and you want to move older data there in order to reduce the overall cost of storage on your database. To alter a table to rebuild a partition in parallel, you would issue a command such as the one here:
ALTER TABLE EMPPART
MOVE PARTITION SYS_P820
TABLESPACE USERS
PARALLEL(DEGREE 4);
If there are indexes associated with the partition you are moving, you can also add the UPDATE INDEXES clause, in which case all indexes will be rebuilt as well:
ALTER TABLE EMPPART
MOVE PARTITION SYS_P820
TABLESPACE USERS
PARALLEL(DEGREE 4)
UPDATE INDEXES;
How It Works
The ALTER TABLE statement to move a partition is an easy, efficient way to move data around for a partitioned table. As with some of the other parallel DDL operations shown within this chapter, there are several reasons to move a table partition to a different tablespace:
Table partitioning is often done to store historical data. Over time, partition maintenance often needs to occur for partitioned tables. By enabling the use of parallelism when moving partitions for a table within your database, it can simply be done faster. With maintenance windows shrinking and data access needs growing, this helps perform necessary partition movements faster, while reducing downtime for your database tables.
If there are indexes on the table and you omit the UPDATE INDEXES clause, any affected indexes or index partitions will be left in an unusable state. If the underlying indexes are local partitioned indexes, you only have to rebuild the index partitions that correspond to the table partitions that were moved. If the underlying indexes are either globally partitioned or nonpartitioned, you will have to rebuild all indexes on the table, even if the majority of the partitions are unaffected by the partition move. Because of this impact, it is recommended, where possible, to always use local indexes on partitioned tables.
For our foregoing example with omitting the UPDATE INDEXES clause, the EMPPART table had three indexes. We can see from the data dictionary that the index partitions are now unusable:
SELECT index_name, partition_name, status
FROM dba_ind_partitions
WHERE index_owner = 'HR'
AND status = 'UNUSABLE';
INDEX_NAME PARTITION_NAME STATUS
--------------- --------------- --------
EMPPART_I3 SYS_P804 UNUSABLE
EMPPART_I1 SYS_P1100 UNUSABLE
EMPPART_I2 SYS_P1042 UNUSABLE
We then can issue an index rebuild statement to rebuild each index. An example of rebuilding one of the indexes is shown here:
ALTER INDEX EMPPART_I1 REBUILD PARTITION SYS_P1100
PARALLEL(DEGREE 4);
Index altered.
Then, we can validate that the index is again in a usable state by rerunning the foregoing SELECT statement from DBA_IND_PARTITIONS:
INDEX_NAME PARTITION_NAME STATUS
--------------- --------------- --------
EMPPART_I3 SYS_P804 USABLE
EMPPART_I1 SYS_P1100 USABLE
EMPPART_I2 SYS_P1042 USABLE
Keep in mind that table partition names and index partition names may not match and correspond to each other, and you may have to check the data dictionary to obtain the correct index partition names. This is especially true if you are using interval partitioning.
Yet another way to rebuild any locally partitioned indexes if you omitted the UPDATE INDEXES clause is to rebuild all index partitions using one command. See the following example:
ALTER TABLE EMPPART
MODIFY PARTITION SYS_P820
REBUILD UNUSABLE LOCAL INDEXES;
There may be valid reasons to defer rebuilding indexes. You may simply want to split the work into individual pieces to more easily track where you are in completing the entire operation. For very large partitions with many indexes, it may be more prudent to move the data in one operation and then rebuild each index in separate operations. Keep in mind that the UPDATE INDEXES clause rebuilds each index serially, one at a time. There may be a performance boost by simply running separate, concurrent ALTER INDEX REBUILD statements when rebuilding all the indexes. Each environment is different, the volume is different, and the number and characteristics of the indexes are different, so test the alternatives and implement what is best for your environment.
15-9. Splitting Partitions in Parallel
Problem
You have a partition with a large amount of data and want to split that larger partition into two or more smaller partitions.
Solution
As a DBA, at times the need arises to split partitions, and this operation can also be done in parallel. For instance, let’s say you have a partitioned table that has a default high-end partition with a large amount of data and you want to split that data into multiple partitions. In cases such as these, you can split that default partition in parallel to speed up the partition split process. Here is an example of splitting a partition using parallelism:
ALTER TABLE EMP
SPLIT PARTITION PMAX at ('2011-04-01') INTO
(PARTITION P4 TABLESPACE EMP_S,
PARTITION PMAX TABLESPACE EMP_S)
PARALLEL(DEGREE 4);
How It Works
Adding parallelism can speed up the process of splitting a partition with a large amount of data. Here is an example of a partition with more than 16 million rows, and enabling parallelism for the split operation reduced the time of the split operation. First, the split was performed in parallel:
ALTER TABLE EMPPART SPLIT PARTITION emppart2000_p AT ('2000-01-01')
INTO (PARTITION emppart1990_p, PARTITION emppart2000_p)
PARALLEL(DEGREE 4);
Table altered.
Elapsed: 00:00:53.61
The same split was then performed on a similar table to see the performance impact of doing the split serially:
ALTER TABLE EMPPART2 SPLIT PARTITION emppart2000_p AT ('2000-01-01')
INTO (PARTITION emppart1990_p, PARTITION emppart2000_p);
Table altered.
Elapsed: 00:01:05.36
Again, keep in mind that for parallel operations, an extent needs to be allocated for each parallel operation. For the foregoing partition split operation, the table that used parallelism has a significantly higher number of extents allocated:
SELECT segment_name, partition_name, extents
FROM dba_segments
WHERE segment_name LIKE '%EMP%'
AND owner = 'SCOTT'
ORDER BY 2,1;
SEGMENT_NAME PARTITION_NAME EXTENTS
-------------------- -------------------- ----------
EMPPART EMPPART1990_P 335
EMPPART2 EMPPART1990_P 121
EMPPART EMPPART2000_P 338
EMPPART2 EMPPART2000_P 125
Remember that splitting partitions, like moving partitions, will render affected indexes or partitions of indexes on the associated table unusable. See Recipe 15-8 for an example on checking and rebuilding an index after a partition-level operation.
15-10. Enabling Automatic Degree of Parallelism
Problem
You want to allow Oracle to automatically determine whether a SQL statement should execute in parallel and what DOP it should use.
Solution
Set PARALLEL_DEGREE_POLICY to AUTO to allow Oracle to determine whether a statement runs in parallel. You can set this either at the system level or at the session level. To set it for all SQL statements, run the following command:
alter system set parallel_degree_policy=auto scope=both;
To set it for a single SQL statement, you can alter your session to enable automatic DOP:
alter session set parallel_degree_policy=auto;
One prerequisite of using automatic DOP is to run the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure. This procedure needs to be run only once and gathers information on the hardware characteristics of your system. Before running this procedure, ensure that the parameter disk_asynch_io is set to true. You can also run the following query to ensure that asynchronous I/O is enabled appropriately:
SELECT name,asynch_io
FROM v$datafile f,v$iostat_file i
WHERE f.file#=i.file_no
AND (filetype_name='Data File' or filetype_name='Temp File'),
NAME ASYNCH_IO
----------------------------------------------- ---------
+DATA/orcl1/datafile/system.3522.827962061 ASYNC_ON
+DATA/orcl1/datafile/system.3522.827962061 ASYNC_ON
+DATA/orcl1/datafile/sysaux.3521.827962063 ASYNC_ON
+DATA/orcl1/datafile/undotbs1.3520.827962063 ASYNC_ON
+DATA/orcl1/datafile/users.3514.827962063 ASYNC_ON
+DATA/orcl1/datafile/example.3571.827962145 ASYNC_ON
+DATA/orcl1/datafile/undotbs2.3570.827962239 ASYNC_ON
The DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure has two input parameters and three output parameters. The input parameters are as follows:
After verifying the proper values for the input parameters for the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, you can run it, as shown in the following example. Note that the three output parameters can be displayed by creating variables within a PL/SQL block when running the procedure:
1* SET SERVEROUTPUT ON
2 DECLARE
3 latc INTEGER;
4 iops INTEGER;
5 mbps INTEGER;
6 BEGIN
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (64, 10, iops, mbps, latc);
8 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
9 DBMS_OUTPUT.PUT_LINE ('latency = ' || latc);
10 dbms_output.put_line('max_mbps = ' || mbps);
11* END;
23:51:58 SQL> /
max_iops = 62165
latency = 0
max_mbps = 3368
PL/SQL procedure successfully completed.
Elapsed: 00:15:59.32
How It Works
By default, Oracle executes a statement in parallel only when the DOP is set for the table or the parallel hint is used. You can instruct Oracle to automatically consider using parallelism for a statement via the PARALLEL_DEGREE_POLICY initialization parameter. Oracle takes the following steps when a SQL statement is issued when PARALLEL_DEGREE_POLICY is set to AUTO:
The PARALLEL_DEGREE_POLICY parameter can be set to three different values: AUTO, LIMITED, and MANUAL. MANUAL is the default and turns off the automatic degree of parallelism. LIMITED instructs Oracle to use automatic DOP only on those objects with parallelism explicitly set. The AUTO setting gives Oracle full control over setting automatic DOP.
Once all configuration is complete, you can validate whether automatic DOP is being derived by running a query against a table in your database. The following example shows an explain plan for a query where DOP has been automatically derived:
select /*+ parallel */ count(*) from employees_big;
-------------------------------------------------------------------
|Id|Operation |Name | TQ |IN-OUT|PQ Distrib|
-------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | |
| 1| SORT AGGREGATE | | | | |
| 2| PX COORDINATOR | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10000 | Q1,00| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | Q1,00| PCWP | |
| 5| PX BLOCK ITERATOR | | Q1,00| PCWC | |
| 6| TABLE ACCESS FULL|EMPLOYEES_BIG| Q1,00| PCWP | |
-------------------------------------------------------------------
Note
-----
- automatic DOP: Computed Degree of Parallelism is 10
- parallel scans affinitized
If you have omitted running the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure, you will see the following in the note section of the explain plan:
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
With automatic DOP, there is a shift away from downgrading parallel operations based on available parallel slaves to using a new feature in Oracle 11g R2 called statement queuing. With statement queuing, statements will not be downgraded and will always be run with the query’s specified DOP. If there are not enough slaves to meet that DOP, the statement will be queued until that DOP is available. While it may appear that queuing could actually degrade the performance of queries in your database because some statements may have to wait for the specified DOP to be available, it is designed to improve the overall parallelism performance on the database because running fewer statements with the specified DOP will outperform running more statements, some with a downgraded DOP. There are many other parameters that can be set related to parallelism. Table 15-3 lists other parallel parameters you may want to consider for your application.
Table 15-3. Oracle Parallelism-Related Initialization Parameters
Parameter |
Description |
---|---|
parallel_degree_limit |
Automatic DOP is determined either by the number of CPUs on the system, by the I/O requirements of a given query, or by a set integer value. To use the IO value, you must run the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure. |
parallel_degree_policy |
Determines whether automatic DOP, statement queuing, and in-memory query execution are enabled. The MANUAL setting disables automatic DOP. The AUTO setting gives Oracle full control over setting automatic DOP. The LIMITED value exercises automatic DOP only on those objects with parallelism explicitly set. |
parallel_max_servers |
This specifies the maximum number of parallel processes (from 0 to 3600) for a database instance. |
parallel_min_servers |
This specifies the minimum number of parallel processes for a database instance. Setting to a nonzero value keeps that minimum number of parallel processes alive and ready to accept new requests. This saves start-up costs of these processes but costs more in memory utilization. |
parallel_servers_target |
Setting this parameter tells the database how many parallel processes can run at one time before query statements requiring parallel execution begin to be queued for execution. |
Another benefit of setting automatic DOP is something called in-memory parallel execution or in-memory parallel query. If you’ve set PARALLEL_DEGREE_POLICY to AUTO, this is automatically in effect for your database. In the past, parallel processing reads directly from disk and bypasses the buffer cache during processing. With in-memory parallel execution, Oracle may decide to load data blocks into the buffer cache for processing. By doing this, processing a parallel query may simply be much faster. The decision to store blocks in memory is internal to Oracle and is based on such factors as the size of the object, the frequency of change for the object, the frequency of access, and the size of the buffer cache. In-memory parallel execution can occur for blocks of tables, indexes, and partitioned objects.
Also, if you are running on a Real Application Cluster (RAC) environment, the data blocks and the resulting parallel processing load are spread across the instances of your RAC environment. Furthermore, Oracle implements the notion of node affinity when splitting up the blocks between instances, and the blocks are not shared with other processes occurring within your RAC environment. This makes the parallel processing very efficient.
15-11. Examining Parallel Explain Plans
Problem
You want to understand how to read parallel explain plans.
Solution
When reading your explain plan, interpret it from the innermost to outermost levels and from the bottom going up. For instance, here again is our parallel execution plan from using a parallel hint against the EMP table:
select /*+ parallel(emp,4) */ * from emp;
----------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 4 | TABLE ACCESS FULL| EMP | Q1,00 | PCWP | |
----------------------------------------------------------------------
Looking at the foregoing plan starting at the bottom, we are doing a full table scan of the EMP table. The PX BLOCK INTERATOR just above the table scan is responsible for taking that request for a full table scan and breaking it up into chunks based on the DOP specified. The PX SEND processes pass the data to the consuming processes. Finally, the PX COORDINATOR is the process used by the query coordinator to receive the data from a given parallel process and return to the SELECT statement.
If you look at the IN-OUT column of your explain plan, you can see the execution flow of the operation and determine whether there are any bottlenecks or any parts of the plan that are not parallelized, which may cause a decrease in the expected performance. As shown in Table 15-5, the operation that normally shows that there may be a bottleneck is the PARALLEL_FROM_SERIAL operation because it means parallel processes are being spawned from a serial operation, which denotes an inefficiency in the process.
For instance, say you have a series of employee tables by region of the country, and a user is performing a query to get information from several of these tables. However, the makeup of the query is such that a bottleneck occurs.
select /*+ parallel(e,4) */ e.ename, d.dname
from emp e join dept d using (deptno);
----------------------------------------------------------------------
|Id |Operation |Name | TQ |IN-OUT|PQ Distrib |
----------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | |
| 1| PX COORDINATOR | | | | |
| 2| PX SEND QC (RANDOM) |:TQ10002|Q1,02| P->S |QC (RAND) |
| 3| MERGE JOIN | |Q1,02| PCWP | |
| 4| SORT JOIN | |Q1,02| PCWP | |
| 5| BUFFER SORT | |Q1,02| PCWC | |
| 6| PX RECEIVE | |Q1,02| PCWP | |
| 7| PX SEND HYBRID HASH |:TQ10000| | S->P |HYBRID HASH|
| 8| STATISTICS COLLECTOR | | | | |
| 9| TABLE ACCESS BY INDEX|DEPT | | | |
ROWID BATCHED
| 10| INDEX FULL SCAN |PK_DEPT | | | |
| 11| SORT JOIN | |Q1,02| PCWP | |
| 12| PX RECEIVE | |Q1,02| PCWP | |
| 13| PX SEND HYBRID HASH |:TQ10001|Q1,01| P->P |HYBRID HASH|
| 14| PX BLOCK ITERATOR | |Q1,01| PCWC | |
| 15| TABLE ACCESS FULL |EMP |Q1,01| PCWP | |
----------------------------------------------------------------------
You can tell from the foregoing explain plan output that the PX SEND process from the DEPT table is serial and is sending data back to be fed into a parallel process. This represents a bottleneck in this query. If we change all aspects of the query to run in parallel, we see an improvement in the execution plan and that both PX SEND processes are parallelized:
select /*+ parallel(e,4) parallel(d,4) */ e.ename, d.dname
from emp e join dept d using (deptno);
------------------------------------------------------------------
|Id |Operation |Name | TQ |IN-OUT|PQ Distrib |
------------------------------------------------------------------
| 0|SELECT STATEMENT | | | | |
| 1| PX COORDINATOR | | | | |
| 2| PX SEND QC (RANDOM) |:TQ10002|Q1,02| P->S |QC (RAND) |
| 3| HASH JOIN BUFFERED | |Q1,02| PCWP | |
| 4| PX RECEIVE | |Q1,02| PCWP | |
| 5| PX SEND HYBRID HASH |:TQ10000|Q1,00| P->P |HYBRID HASH|
| 6| STATISTICS COLLECTOR| |Q1,00| PCWC | |
| 7| PX BLOCK ITERATOR | |Q1,00| PCWC | |
| 8| TABLE ACCESS FULL |DEPT |Q1,00| PCWP | |
| 9| PX RECEIVE | |Q1,02| PCWP | |
| 10| PX SEND HYBRID HASH |:TQ10001|Q1,01| P->P |HYBRID HASH|
| 11| PX BLOCK ITERATOR | |Q1,01| PCWC | |
| 12| TABLE ACCESS FULL |EMP_BIG |Q1,01| PCWP | |
------------------------------------------------------------------
How It Works
Tables 15-4 and 15-5 delineate the fundamental information that can be used to determine the execution plan for a parallel operation. To understand the basics of interpreting your explain plan output, you should be aware of two aspects:
Table 15-4. Parallel Execution Steps
Operation |
Description |
---|---|
PX BLOCK ITERATOR |
In this step, the work to be done is split into pieces, which in turn will be done by the parallel slaves specified. |
PX COORDINATOR |
Much like a project manager, this process coordinates and schedules the parallel slaves’ work, as well as being responsible for getting data back from the parallel slaves once they complete their tasks. |
PX RECEIVE |
These processes are consumer slaves of the data written via the producers of the PX SEND processes. |
PX SEND |
These processes are the producer slaves of getting a portion of the data and writing to areas to be read by the consumers. |
Table 15-5. Parallel Operations
PLAN_TABLE Operation (Other_Tag Column) |
Explain Plan In/Out Tag |
Description |
---|---|---|
PARALLEL_FROM_SERIAL |
S->P |
This denotes that a serial process with the operation is passing information to a parallel process. This is a sign of a bottleneck and an area of potential improvement. |
PARALLEL_TO_PARALLEL |
P->P |
This means that both the producer and the consumer are parallelized. This is the most desired execution flow. |
PARALLEL_TO_SERIAL |
P->S |
This step, although hinting at a bottleneck, is fairly normal. It is toward the top (that is, the end) of an operation and denotes that results from a parallel process are being fed to the query coordinator at the end of the process. |
PARALLEL_COMBINED_WITH PARENT |
PCWP |
This means a step is being combined with its parent step and run simultaneously (for example, a sort/merge operation). |
PARALLEL_COMBINED_WITH CHILD |
PCWC |
This is the same as PCWP, except it means that a child step/slave process is being run simultaneously with the child process from the execution plan. |
The execution steps are the aspects of a parallelized plan, while the operations that occur within your parallel execution plan can help you determine whether you have an optimized plan or one that needs tuning and improvement.
As with nonparallel operations, the explain plan utility is a useful tool in determining what the optimizer is planning to do to complete the task at hand. When executing operations in parallel, there are specific aspects of the explain plan related to parallelism. These are important to understand so you can determine whether the operation is running as optimized as possible. One of the key aspects of analyzing a parallel explain plan is to determine whether there are any aspects of the plan that are being run serially because this bottleneck can reduce the overall performance of a given operation. That it is why it is critical to understand aspects of the explain plan that relate to parallel operations, with the end goal being that all aspects of the operation are parallelized.
15-12. Monitoring Parallel Operations
Problem
You want to quickly get information regarding the performance of your parallel operations from the database.
Solution
If you look at the V$SYSSTAT view, which gives information on system-level statistics in your database, including parallelism-related statistics, you can see, at a quick glance, if the DOP requested was actually used and if any of those operations were downgraded:
SELECT name , value
FROM v$sysstat
WHERE name LIKE '%Parallel%';
---------------------------------------------------------------- ----------
NAME VALUE
---------------------------------------------------------------- ----------
Parallel operations not downgraded 1069711
Parallel operations downgraded to serial 249
Parallel operations downgraded 75 to 99 pct 0
Parallel operations downgraded 50 to 75 pct 0
Parallel operations downgraded 25 to 50 pct 8
Parallel operations downgraded 1 to 25 pct 0
6 rows selected.
If you look at the V$PQ_SYSSTAT view, you can see parallel slave activity on your database. From looking at these statistics, you can quickly see whether parallelism is properly configured on your database just by looking at the parallel slave activity. For instance, if you see that the Servers Shutdown and Servers Started values are high, it can be an indication that the PARALLEL_MIN_SERVERS parameter is set too low because there is overhead occurring to consistently start and stop parallel processes.
SELECT * FROM v$pq_sysstat
WHERE statistic LIKE 'Server%';
STATISTIC VALUE
------------------------------ ----------
Servers Busy 0
Servers Idle 36
Servers Highwater 218
Server Sessions 4965880
Servers Started 33573
Servers Shutdown 33497
Servers Cleaned Up 72
7 rows selected.
If you are looking for session-level statistics regarding a parallel operation, looking at the V$PQ_TQSTAT view is useful in determining exactly how the work was split up among the parallel slaves, as well as giving you information about the actual DOP used based on the information within V$PQ_TQSTAT. Let’s rerun our parallel query against the EMP table with a hint specifying a DOP of 4.
SELECT /*+ parallel(emp,4) */ * FROM emp;
After completion of the query but also within the same session, we can query the V$PQ_TQSTAT view to get information about the parallel operations used for that query:
SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
FROM v$pq_tqstat
ORDER BY dfo_number DESC, tq_id, server_type DESC , process;
DFO_NUMBER TQ_ID SERVER_TYPE PROCESS NUM_ROWS BYTES
---------- ---------- ------------ -------- ---------- ----------
1 0 Producer P000 309399 23665153
1 0 Producer P001 308763 23616810
1 0 Producer P002 309602 23681189
1 0 Producer P003 309189 23648954
1 0 Consumer QC 1234567 94434916
We can see that between the four producer parallel slaves, the work was divided fairly evenly between them. We can also validate that the actual DOP used for this query was 4, as specified in the query hint.
How It Works
One of the quickest methods to analyze the performance of parallel operations within your database is to analyze the dynamic performance views. These views give you a glimpse of how parallelism is performing overall within your database, which can indicate how well-tuned or badly tuned your database is for parallelism. It can also give you very session-specific details, such as how the work was split up between slaves and information on the actual DOP used for a given operation. Table 15-6 gives you an overview of the parallelism-related dynamic performance views.
Table 15-6. Key Dynamic Performance Views Related to Parallel Operations
View Name |
Description |
---|---|
V$PQ_SESSTAT |
Shows parallelism-related session-level statistics, including number of parallel slaves used |
V$PQ_SYSSTAT |
Shows parallelism-related statistics for the database instance, including number of parallel slaves used |
V$PQ_TQSTAT |
Contains statistics on parallel operations across the database instance, including the DOP used and rows processed for each slave of a given operation |
V$SYSSTAT |
Contains at-a-glance statistics on downgraded parallel-related operations |
V$PX_SESSION |
Contains information about sessions running parallel operations and information about the DOP requested and used |
V$PQ_SLAVE |
Contains information about the current parallel slaves being used by a database instance |
V$PX_PROCESS |
Contains information about parallel processes and status |
15-13. Finding Bottlenecks in Parallel Processes
Problem
You have some parallel processes that are underperforming, and you want to do analysis to find the bottlenecks.
Solution
There are many wait events related to parallelism. Many of these events are considered “idle” wait events—that is, they don’t usually indicate a problem. If you query the V$SYSTEM_EVENT view, you can get an idea of the parallelism-related waits that have occurred in your database instance. The following query results show some of the common wait events that can occur:
SELECT event, wait_class, total_waits
FROM v$session_event
WHERE event LIKE 'PX%';
EVENT WAIT_CLASS TOTAL_WAITS
---------------------------- ------------ -----------
PX Deq Credit: need buffer Idle 6667936
PX Deq Credit: send blkd Other 8161247
PX Deq: Execute Reply Idle 490827
PX Deq: Execution Msg Idle 685175
PX Deq: Join ACK Idle 26312
PX Deq: Msg Fragment Idle 67
PX Deq: Parse Reply Idle 20891
PX Deq: Signal ACK Other 25729
PX Deq: Table Q Get Keys Other 3141
PX Deq: Table Q Normal Idle 25120970
PX Deq: Table Q Sample Idle 11124
PX Deq: Table Q qref Other 1705216
PX Idle Wait Idle 241116
PX qref latch Other 1208472
How It Works
Table 15-7 describes some of the key parallelism-related wait events. If you are having significant performance issues, it may be worthwhile to browse these wait events to see whether you have excessive waits or wait times. If so, it may indicate an issue with the processing occurring with the parallel slaves. Again, events that are “idle” generally do not indicate a problem.
Table 15-7. Key Parallelism Wait Events That Could Signify a Tuning Issue
View Name |
Description |
---|---|
PX Deq: Execute Reply |
Denotes that the query coordinator (QC) is waiting for results from parallel slaves; this can be a sign of badly tuned SQL. If high waits, analyze the execution plan for efficiency. |
PX Deq: Parse Reply |
Denotes that parallel slaves are parsing SQL statements; high wait times may point to library cache contention. |
PX Deq: qref latch |
This wait can indicate that the producer slaves are processing too fast, and the consumer slaves cannot keep up. Consider increasing the parallel_execution_message_size parameter or reducing the degree of parallelism at the database, session, or statement level as appropriate for your system. |
PX Deq: Table Q Normal |
This is usually just an idle wait event, but extremely high values may indicate that some producer slaves are slow and that the consumer slaves are waiting. |
Parallelism-related wait events can be grouped into the following categories:
See My Oracle Support note 1097154.1 for a complete description of all parallelism-related wait events.
15-14. Getting Detailed Information on Parallel Sessions
Problem
You have some underperforming parallel processes and need more detailed information on the sessions.
Solution
By turning on session tracing, you can get detailed trace information on your parallel sessions. This is essentially a four-step process:
For example, you are again executing a parallel query against the EMP table. To gather trace information, you would do the following:
alter session set events '10391 trace name context forever, level 128';
select /*+ parallel(emp,4) */ * from emp;
alter session set events '10391 trace name context off';
In Oracle 12c, there appears to be a trace file generated per parallel slave, along with one general trace file. Some of the contents you can see with the trace files look similar to the following excerpt:
PARSING IN CURSOR #47940292534688 len=40 dep=0 uid=102 oct=3 lid=102
tim=1383019215069949
hv=3986116938 ad='3c8882230' sqlid='6cy2sfzqtfnaa'
select /*+ parallel(emp,4) */ * from emp
END OF STMT
PARSE #47940292534688:c=0,e=178,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2873591275,
tim=1383019215069948
EXEC #47940292534688:c=1000,e=120786,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,plh=2873591275,
tim=1383019215190784
FETCH #47940292534688:c=1000,e=232,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=2873591275,
tim=1383019215191208
FETCH #47940292534688:c=1000,e=1074,p=0,cr=0,cu=0,mis=0,r=14,dep=0,og=1,plh=2873591275,
tim=1383019215192766
STAT #47940292534688 id=1 cnt=15 pid=0 pos=1 obj=0 op='PX COORDINATOR (cr=3 pr=0 pw=0 time=120732 us)'
STAT #47940292534688 id=2 cnt=0 pid=1 pos=1 obj=0 op='PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0
time=0 us cost=2 size=570 card=15)
How It Works
As with other session tracing, the trace file can be found in the destination specified under the diagnostic_dest parameter. The user_dump_dest parameter can still be used, although it has been deprecated. The trace file shows granular information for the parallel processes. If you are experiencing significant performance issues with parallelism and you want to delve further into investigating the results of the trace files generated by this event, it may be beneficial to simply create a service request with Oracle in order to get the most detailed information. Reading and understanding these trace files can be difficult and cumbersome, and it may be more expedient to simply send the files to Oracle Support for analysis. Yet another way to validate the DOP used for a parallel operation is to use the _px_trace facility, which also generates a trace file. The base syntax for using the _px_trace is as follows:
alter session set "_px_trace"=[[Verbosity,]area],[[Verbosity,]area],..,[time];
For Verbosity, the possible values are as follows:
For the Area parameter, the possible values are as follows:
For the Time parameter, the only possible value is time.
The following basic example shows how to use this facility with a query you want to analyze:
alter session set "_px_trace"="compilation","execution","messaging";
select /*+ parallel(emp,4) */ * from emp;
Then, within the trace files, you can evaluate the DOP requested and used:
2013-10-28 22:10:51.901117*:PX_Messaging:kxfp.c@12057:kxfpg1srv():
trying to get server 1.0 for q=0x3d3ee5270
slave is local
found dp=0x3d852ac90 flg=18
local slave 1.0 already started..
Got It. 1 so far.
2013-10-28 22:10:51.901117*:PX_Messaging:kxfp.c@12057:kxfpg1srv():
trying to get server 1.1 for q=0x3d3ee5270
slave is local
found dp=0x3d852ade8 flg=18
local slave 1.1 already started..
Got It. 2 so far.
2013-10-28 22:10:51.901117*:PX_Messaging:kxfp.c@12057:kxfpg1srv():
trying to get server 1.2 for q=0x3d3ee5270
slave is local
found dp=0x3d852af40 flg=18
local slave 1.2 already started..
Got It. 3 so far.
2013-10-28 22:10:51.901117*:PX_Messaging:kxfp.c@12057:kxfpg1srv():
trying to get server 1.3 for q=0x3d3ee5270
slave is local
found dp=0x3d852b098 flg=18
local slave 1.3 already started..
Got It. 4 so far.
Acquired 4 slaves on 1 instances avg height=4 #set=1 qser=14337
P000 inst 1 spid 14290
P001 inst 1 spid 14292
P002 inst 1 spid 14294
P003 inst 1 spid 14296
Again, while you may be able to extract helpful information from the trace files, the best source to analyze these files is Oracle Support. More information regarding this utility also can be found on the My Oracle Support note 444164.1.