Congratulations on buying this book. PL/SQL is a great tool to have in your toolbox, but it is not suitable for every scenario. This chapter will teach you when to code your application in PL/SQL, how to write scalable code, and, more importantly, when not to code programs in PL/SQL. Abuse of some PL/SQL constructs leads to unscalable code. In this chapter, I will review various cases in which the misuse of PL/SQL led to unscalable applications.
PL/SQL and SQL
In a typical row-by-row processing program, code opens a cursor, loops through the rows retrieved from the cursor, and processes those rows. This type of loop-based processing construct is highly discouraged as it leads to unscalable code. Listing 1-1 shows an example of a program using the construct.
Listing 1-1. Row-by-Row Processing
DECLARE
CURSOR c1 IS
SELECT prod_id, cust_id, time_id, amount_sold
FROM sales
WHERE amount_sold > 100;
l_cust_first_name customers.cust_first_name%TYPE;
l_cust_last_name customers.cust_last_name%TYPE;
BEGIN
FOR c1_rec IN c1
LOOP
-- Query customer details
SELECT cust_first_name, cust_last_name
INTO l_cust_first_name, l_cust_last_name
FROM customers
WHERE cust_id=c1_rec.cust_id;
--
-- Insert in to target table
--
INSERT INTO top_sales_customers (
prod_id, cust_id, time_id, cust_first_name, cust_last_name,amount_sold
)
VALUES
(
c1_rec.prod_id,
c1_rec.cust_id,
c1_rec.time_id,
l_cust_first_name,
l_cust_last_name,
c1_rec.amount_sold
);
END LOOP;
COMMIT;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:10.93
In Listing 1-1, the program declares a cursor c1 and opens the cursor implicitly using cursor FOR loop syntax. For each row retrieved from cursor c1, the program queries the customers table to populate the first-name and last-name variables. A row is subsequently inserted into the top_sales_customers table.
There is a problem with the coding practice exemplified in Listing 1-1. Even if the SQL statements called in the loop are highly optimized, program execution can consume a huge amount of time. Imagine that the SQL statement querying the customers table consumes an elapsed time of 0.1 seconds, and that the INSERT statement also consumes 0.1 seconds, giving a total elapsed time of 0.2 seconds per loop iteration. If cursor c1 retrieves 100,000 rows, then the total elapsed time for this program will be 100,000 multiplied by 0.2 seconds: 20,000 seconds, or approximately 5.5 hours. Optimizing this program construct is not easy. Tom Kyte termed this type of processing "slow-by-slow" processing, for obvious reasons.
Note Examples in this chapter use the SH schema, one of the example schemas supplied by Oracle Corporation. To install the example schemas, use the Oracle-provided examples software; for the 11gR2 Solaris platform, you can download it from http://download.oracle.com/otn/solaris/oracle11g/R2/solaris.sparc64_11gR2_examples.zip. Refer to the Readme document in the unzipped software directories for installation instructions. Zip files for other platforms and versions are also available from Oracle's web site.
There is another inherent issue with the code in Listing 1-1: the SQL statements are called from PL/SQL in a loop, so execution switches back and forth between the PL/SQL engine and the SQL engine. Each switch between the two environments is known as a context switch. Context switches increase the elapsed time of your programs and introduce unnecessary CPU overhead. You should reduce the number of context switches by eliminating or reducing the switching between these two environments.
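When the logic genuinely must stay in PL/SQL, you can still cut down context switches with bulk processing. The following is a minimal sketch of Listing 1-1 reworked with BULK COLLECT and FORALL, with the customer lookup folded into the cursor as a join; the LIMIT of 1000 is an arbitrary batch size. Each fetch and each insert now costs one context switch per batch instead of one per row.

```sql
DECLARE
  CURSOR c1 IS
    SELECT s.prod_id, s.cust_id, s.time_id,
           c.cust_first_name, c.cust_last_name, s.amount_sold
      FROM sales s, customers c
     WHERE s.cust_id = c.cust_id
       AND s.amount_sold > 100;
  TYPE c1_tab_type IS TABLE OF c1%ROWTYPE INDEX BY PLS_INTEGER;
  l_rows c1_tab_type;
BEGIN
  OPEN c1;
  LOOP
    -- One context switch fetches up to 1,000 rows
    FETCH c1 BULK COLLECT INTO l_rows LIMIT 1000;
    EXIT WHEN l_rows.COUNT = 0;
    -- One context switch inserts the whole batch; this record-based
    -- insert assumes the cursor's select list matches the column
    -- order of top_sales_customers, as it does in Listing 1-1
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO top_sales_customers VALUES l_rows(i);
  END LOOP;
  CLOSE c1;
  COMMIT;
END;
/
```

Bulk processing does not beat the pure SQL rewrite in Listing 1-2, but it is a reasonable middle ground when per-row procedural logic cannot be expressed in SQL.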
You should generally avoid row-by-row processing. Better coding practice would be to convert the program from Listing 1-1 into a SQL statement. Listing 1-2 rewrites the code, avoiding PL/SQL entirely.
Listing 1-2. Row-by-Row Processing Rewritten
--
-- Insert in to target table
--
INSERT
INTO top_sales_customers
(
prod_id,
cust_id,
time_id,
cust_first_name,
cust_last_name,
amount_sold
)
SELECT s.prod_id,
s.cust_id,
s.time_id,
c.cust_first_name,
c.cust_last_name,
s.amount_sold
FROM sales s,
customers c
WHERE s.cust_id = c.cust_id and
s.amount_sold> 100;
135669 rows created.
Elapsed: 00:00:00.26
The code in Listing 1-2, in addition to resolving the shortcomings of row-by-row processing, has a few more advantages. Parallel execution can be used to tune the rewritten SQL statement; with multiple parallel execution processes, you can decrease the elapsed time sharply. Furthermore, the code becomes concise and readable.
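For instance, assuming the instance is licensed and configured for parallel execution, the rewritten insert could be parallelized as below; the degree of 4 is purely illustrative.

```sql
-- Parallel DML must be enabled at the session level first
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(tsc, 4) */
  INTO top_sales_customers tsc
  (prod_id, cust_id, time_id, cust_first_name, cust_last_name, amount_sold)
SELECT /*+ PARALLEL(s, 4) PARALLEL(c, 4) */
       s.prod_id, s.cust_id, s.time_id,
       c.cust_first_name, c.cust_last_name, s.amount_sold
  FROM sales s, customers c
 WHERE s.cust_id = c.cust_id
   AND s.amount_sold > 100;

COMMIT;
```

Weigh the parallel degree against the resources available on your server; over-parallelizing a small statement can cost more than it saves.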
Note If you rewrite the PL/SQL loop code as a join, you need to consider duplicates. If there are duplicate rows in the customers table for the same cust_id value, then the rewritten SQL statement will retrieve more rows than intended. In this specific example, however, there is a primary key on the cust_id column in the customers table, so there is no danger of duplicates with an equality predicate on the cust_id column.
You can nest cursors in PL/SQL. It is a common coding practice to retrieve values from one cursor, feed those values to another cursor, feed the values from the second-level cursor to a third-level cursor, and so on. But the performance issues of loop-based code are compounded when cursors are deeply nested. The number of SQL executions increases sharply due to the nesting of cursors, leading to a longer program runtime.
In Listing 1-3, cursors c1, c2, and c3 are nested. Cursor c1 is the top-level cursor and retrieves rows from table t1; cursor c2 is opened, passing the values from cursor c1; cursor c3 is opened, passing the values from cursor c2. An UPDATE statement is executed for every row retrieved from cursor c3. Even if the UPDATE statement is optimized to execute in 0.01 seconds, performance of the program suffers due to the deeply nested cursors. Say that cursors c1, c2, and c3 retrieve 20, 50, and 100 rows, respectively. The code then loops through 100,000 rows (20 x 50 x 100), and the total elapsed time of the program exceeds 1,000 seconds. Tuning this type of program usually leads to a complete rewrite.
Listing 1-3. Row-by-Row Processing with Nested Cursors
DECLARE
CURSOR c1 IS
SELECT n1 FROM t1;
CURSOR c2 (p_n1 t1.n1%TYPE) IS
SELECT n1, n2 FROM t2 WHERE n1=p_n1;
CURSOR c3 (p_n1 t2.n1%TYPE, p_n2 t2.n2%TYPE) IS
SELECT text FROM t3 WHERE n1=p_n1 AND n2=p_n2;
BEGIN
FOR c1_rec IN c1
LOOP
FOR c2_rec IN c2 (c1_rec.n1)
LOOP
FOR c3_rec IN c3(c2_rec.n1, c2_rec.n2)
LOOP
-- Execute some SQL here. The UPDATE/INSERT pair needs its
-- own nested block so that the exception handler is legal:
BEGIN
UPDATE … SET … WHERE n1=c3_rec.n1 AND n2=c3_rec.n2;
EXCEPTION
WHEN no_data_found THEN
INSERT INTO …;
END;
END LOOP;
END LOOP;
END LOOP;
COMMIT;
END;
/
Another problem with the code in Listing 1-3 is its update-or-insert logic: an UPDATE statement is executed and, if that UPDATE results in a no_data_found exception, an INSERT statement is executed instead. It is possible to offload this type of processing from PL/SQL to the SQL engine using a MERGE statement.
Conceptually, the three loops in Listing 1-3 represent an equi-join between the tables t1, t2, and t3. In Listing 1-4, the logic is rewritten as a SQL statement, with the join aliased as t. The combination of UPDATE and INSERT logic is replaced by a MERGE statement. MERGE syntax provides the ability to update a row if it exists and insert a row if it does not exist.
Listing 1-4. Row-by-Row Processing Rewritten Using MERGE Statement
MERGE INTO fact1 USING
(SELECT DISTINCT c3.n1,c3.n2
FROM t1, t2, t3
WHERE t1.n1 = t2.n1
AND t2.n1 = t3.n1
AND t2.n2 = t3.n2
) t
ON (fact1.n1=t.n1 AND fact1.n2=t.n2)
WHEN MATCHED THEN
UPDATE SET ..
WHEN NOT MATCHED THEN
INSERT .. ;
COMMIT;
Do not write deeply nested cursors in PL/SQL. Review such code to see if you can write it in SQL instead.
Lookup queries are generally used to populate some variable or to perform data validation. Executing lookup queries in a loop causes performance issues.
In Listing 1-5, the highlighted query retrieves the country_name using a lookup query. For every row from cursor c1, a query to fetch the country_name is executed. As the number of rows retrieved from cursor c1 increases, the number of executions of the lookup query also increases, leading to poorly performing code.
Listing 1-5. Lookup Queries, a Modified Copy of Listing 1-1
DECLARE
CURSOR c1 IS
SELECT prod_id, cust_id, time_id, amount_sold
FROM sales
WHERE amount_sold > 100;
l_cust_first_name customers.cust_first_name%TYPE;
l_cust_last_name customers.cust_last_name%TYPE;
l_Country_id countries.country_id%TYPE;
l_country_name countries.country_name%TYPE;
BEGIN
FOR c1_rec IN c1
LOOP
-- Query customer details
SELECT cust_first_name, cust_last_name, country_id
INTO l_cust_first_name, l_cust_last_name, l_country_id
FROM customers
WHERE cust_id=c1_rec.cust_id;
-- Query to get country_name
SELECT country_name
INTO l_country_name
FROM countries WHERE country_id=l_country_id;
--
-- Insert in to target table
--
INSERT
INTO top_sales_customers
(
prod_id, cust_id, time_id, cust_first_name,
cust_last_name, amount_sold, country_name
)
VALUES
(
c1_rec.prod_id, c1_rec.cust_id, c1_rec.time_id, l_cust_first_name,
l_cust_last_name, c1_rec.amount_sold, l_country_name
);
END LOOP;
COMMIT;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:16.18
The example in Listing 1-5 is simplistic. The lookup query for the country_name can be rewritten as a join in the main cursor c1 itself, and as a first step you should modify the lookup query into a join. In a real-world application, though, this type of rewrite is not always possible.
If you can't rewrite the code to reduce the executions of a lookup query, then you have another option. You can define an associative array to cache the results of the lookup query and reuse the array in later executions, thus effectively reducing the executions of the lookup query.
Listing 1-6 illustrates the array-caching technique. Instead of executing the query to retrieve the country_name for every row from cursor c1, a key-value pair ((country_id, country_name) in this example) is stored in an associative array named l_country_names. An associative array is similar to an index in that any given value can be accessed using a key.
Before executing the lookup query, an existence test is performed for an element matching the country_id key value using the EXISTS method. If the element exists in the array, then the country_name is retrieved from the array without executing the lookup query. If not, then the lookup query is executed and a new element is added to the array.
You should also understand that this technique is suitable only for lookups with few distinct values for the key. In this example, the number of executions of the lookup query will probably be much lower because the number of unique values in the country_id column is low. Using the example schema, the lookup query will execute at most 23 times, as there are only 23 distinct values for the country_id column.
Listing 1-6. Lookup Queries with Associative Arrays
DECLARE
CURSOR c1
IS
SELECT prod_id, cust_id, time_id, amount_sold
FROM sales WHERE amount_sold > 100;
l_country_id countries.country_id%TYPE;
l_country_name countries.country_name%TYPE;
l_cust_first_name customers.cust_first_name%TYPE;
l_cust_last_name customers.cust_last_name%TYPE;
TYPE country_names_type IS
TABLE OF VARCHAR2(40) INDEX BY pls_integer;
l_country_names country_names_type;
BEGIN
FOR c1_rec IN c1 LOOP
-- Query customer details
SELECT cust_first_name, cust_last_name, country_id
INTO l_cust_first_name, l_cust_last_name, l_country_id
FROM customers
WHERE cust_id=c1_rec.cust_id;
-- Check array first before executing a SQL statement
IF ( l_country_names.EXISTS(l_country_id)) THEN
l_country_name := l_country_names(l_country_id);
ELSE
SELECT country_name INTO l_country_name
FROM countries
WHERE country_id = l_country_id;
-- Store in the array for further reuse
l_country_names(l_country_id) := l_country_name;
END IF;
--
-- Insert in to target table
--
INSERT
INTO top_sales_customers
(
prod_id, cust_id, time_id, cust_first_name,
cust_last_name, amount_sold, country_name
)
VALUES
(
c1_rec.prod_id, c1_rec.cust_id, c1_rec.time_id, l_cust_first_name,
l_cust_last_name, c1_rec.amount_sold, l_country_name
);
END LOOP;
COMMIT;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:10.89
Note Associative arrays are allocated in the Program Global Area (PGA) of the dedicated server process in the database server. If there are thousands of connections caching intermediate results in arrays, then there will be a noticeable increase in memory usage. You should measure the memory usage increase per process and size the database server to accommodate it.
Array-based techniques can be used to eliminate unnecessary work in other scenarios, too. For example, executions of costly function calls can be reduced via this technique by storing the function results in an associative array. (The “Excessive Function Calls” section later in this chapter discusses another technique to reduce the number of executions.)
Note Storing the function results in an associative array will work only if the function is a deterministic function, meaning that for a given set of input(s), the function will always return the same output.
It is not uncommon for code to access the DUAL table excessively. You should avoid overusing DUAL table access. Accessing DUAL from PL/SQL causes context switching, which hurts performance. This section reviews some common reasons for accessing DUAL excessively and discusses mitigation plans.
There is no reason to access the DUAL table to perform arithmetic operations or DATE manipulations, as most operations can be performed using PL/SQL language constructs. Even SYSDATE can be accessed directly in PL/SQL without visiting the SQL engine. In Listing 1-7, the highlighted SQL statement calculates the UNIX epoch time (defined as the number of seconds elapsed since midnight on January 1, 1970) using SELECT .. FROM DUAL syntax. While access to the DUAL table is fast, execution of the statement still results in a context switch between the SQL and PL/SQL engines.
Listing 1-7. Excessive Access to DUAL—Arithmetics
DECLARE
l_epoch INTEGER;
BEGIN
SELECT ((SYSDATE-TO_DATE('01-JAN-1970 00:00:00', 'DD-MON-YYYY HH24:MI:SS'))
* 24 *60 *60 )
INTO l_epoch
FROM DUAL;
dbms_output.put_line(l_epoch);
END;
/
You can avoid the unnecessary access to DUAL table by performing the arithmetic calculation using PL/SQL constructs like this:
l_epoch := (SYSDATE- TO_DATE('01-JAN-1970 00:00:00', 'DD-MON-YYYY HH24:MI:SS'))
* 24 *60 *60 ;
There's simply no need to invoke SQL to execute numeric operations. Simply do the work from PL/SQL. Your code will be easier to read, and it will perform better.
Excessive querying of the current date or timestamp is another reason for increased access to the DUAL table. Consider coding a call to SYSDATE in the SQL statement directly instead of selecting SYSDATE into a variable and then passing that value back to the SQL engine. If you need to access a column value after inserting a row, then use the RETURNING clause to fetch it. If you need the current date in PL/SQL itself, assign SYSDATE directly to a variable.
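A short sketch of both points, reusing the log_table and message_id_seq objects created in Listing 1-9 later in this chapter:

```sql
DECLARE
  l_run_start DATE;
  l_new_seq   NUMBER;
BEGIN
  -- Assign SYSDATE directly; no SELECT ... FROM DUAL needed
  l_run_start := SYSDATE;
  -- Reference SYSDATE inside the DML, and fetch the generated
  -- key back with RETURNING instead of a separate query
  INSERT INTO log_table (message_seq, message)
  VALUES (message_id_seq.nextval,
          'started at ' || TO_CHAR(SYSDATE, 'HH24:MI:SS'))
  RETURNING message_seq INTO l_new_seq;
END;
/
```

Neither statement here touches DUAL, so the block costs two context switches fewer than the variable-based equivalent.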
Another common reason for unnecessary access to the DUAL table is to retrieve the next value from a sequence. Listing 1-8 shows a code fragment selecting the next value from cust_hist_id_seq into a variable and then inserting into the customers_hist table using that variable.
Listing 1-8. Excessive Access to DUAL— Sequences
DECLARE
l_cust_id NUMBER;
BEGIN
FOR c1 in (SELECT cust_first_name, cust_last_name FROM customers
WHERE cust_marital_status!='married')
LOOP
SELECT cust_hist_id_seq.nextval INTO l_cust_id FROM dual;
INSERT INTO customers_hist
(cust_hist_id, first_name, last_name )
VALUES
(l_cust_id, c1.cust_first_name, c1.cust_last_name)
;
END LOOP;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.89
A better approach is to avoid retrieving the value into a variable and instead reference the sequence directly in the INSERT statement itself. The following code fragment illustrates an INSERT statement inserting rows into customers using a sequence-generated value. With this coding practice, you avoid accessing the DUAL table, and thus avoid context switches between the engines.
INSERT INTO customers (cust_id, ...)
VALUES (cust_id_seq.nextval, ...);
Better yet, rewrite the PL/SQL block as a SQL statement. For example, the following rewritten statement completes in 0.2 seconds compared to a run time of 1.89 seconds with PL/SQL loop-based processing:
INSERT INTO customers_hist
SELECT
cust_hist_id_seq.nextval, cust_first_name, cust_last_name
FROM customers
WHERE cust_marital_status!='married';
23819 rows created.
Elapsed: 00:00:00.20
Another common reason for excessive access to DUAL table is to insert rows into tables involved in a master-detail relationship. Typically, in this coding practice, the primary key value for the master table is fetched from the sequence into a local variable. Then that local variable is used while inserting into master and detail tables. The reason this approach developed is that the primary key value of the master table is needed while inserting into the detail table(s).
A SQL feature introduced in Oracle Database version 9i provides a better solution by allowing you to return values from an inserted row. You can retrieve the key value from a newly inserted master row by using the DML RETURNING clause, and then use that key value while inserting into the detail table. For example:
INSERT INTO customers (cust_id, ...)
VALUES (cust_id_seq.nextval,...)
RETURNING cust_id into l_cust_id;
...
INSERT INTO customer_transactions (cust_id, ...)
VALUES (l_cust_id,...)
...
It is important to recognize that well-designed applications use functions, procedures, and packages. This section is not a discussion of well-designed programs that use modular coding practices. Rather, it is specifically directed at the practice of calling functions unnecessarily.
Executing a function call usually means that a different part of the instruction stream must be loaded into the CPU. Execution jumps from one section of code to another, which adds to performance issues because it entails flushing and refilling the instruction pipeline. The result is additional CPU usage.
By avoiding unnecessary function execution, you avoid unneeded flushing and refilling of the instruction pipeline, thus minimizing demands upon your CPU. Again, I am not arguing against modular coding practices. I argue only against excessive and unnecessary execution of function calls. I can best explain by example.
In Listing 1-9, log_entry is a debug function and is called for every validation. The function itself checks v_debug, and messages are inserted only if the debug flag is set to TRUE. Imagine a program with hundreds of such complex business validations performed in a loop. Essentially, the log_entry function will be called millions of times unnecessarily, even when the v_debug flag is FALSE.
Listing 1-9. Unnecessary Function Calls
create table log_table ( message_seq number, message varchar2(512));
create sequence message_id_seq;
DECLARE
l_debug BOOLEAN := FALSE;
r1 integer;
FUNCTION log_entry( v_message IN VARCHAR2, v_debug in boolean)
RETURN number
IS
BEGIN
IF(v_debug) THEN
INSERT INTO log_table
(message_seq, MESSAGE)
VALUES
(message_id_seq.nextval, v_message
);
END IF;
return 0;
END;
BEGIN
FOR c1 IN (
SELECT s.prod_id, s.cust_id,s.time_id,
c.cust_first_name, c.cust_last_name,
s.amount_sold
FROM sales s,
customers c
WHERE s.cust_id = c.cust_id and
s.amount_sold> 100)
LOOP
IF c1.cust_first_name IS NOT NULL THEN
r1 := log_entry ('first_name is not null ', l_debug );
END IF;
IF c1.cust_last_name IS NOT NULL THEN
r1 := log_entry ('Last_name is not null ', l_debug);
END IF;
END LOOP;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.54
The code in Listing 1-9 can be rewritten to call the log_entry function only if the l_debug flag is set to TRUE. This rewrite reduces unnecessary executions of the log_entry function; the rewritten program completes in 0.43 seconds. The performance improvement becomes more noticeable with a higher number of executions.
...
IF first_name IS NULL AND l_debug=TRUE THEN
r1 := log_entry('first_name is null ', l_debug);
END IF;
...
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.43
For an even better approach, consider conditional compilation constructs to avoid executing this code fragment completely. In Listing 1-10, the highlighted code uses the $IF-$THEN construct with the conditional compilation flag $$debug_on. If $$debug_on is TRUE, then the code block is compiled and executed. In a production environment, the debug_on flag will be FALSE, eliminating the function execution entirely. Note that the elapsed time of the program drops further, to 0.34 seconds.
Listing 1-10. Avoiding Unnecessary Function Calls with Conditional Compilation
DECLARE
l_debug BOOLEAN := FALSE;
r1 integer;
FUNCTION log_entry( v_message IN VARCHAR2, v_debug in boolean)
RETURN number
IS
BEGIN
IF(v_debug) THEN
INSERT INTO log_table
(message_seq, MESSAGE)
VALUES
(message_id_seq.nextval, v_message
);
END IF;
return 0;
END;
BEGIN
FOR c1 IN (
SELECT s.prod_id, s.cust_id,s.time_id,
c.cust_first_name, c.cust_last_name,
s.amount_sold
FROM sales s,
customers c
WHERE s.cust_id = c.cust_id and
s.amount_sold> 100)
LOOP
$IF $$debug_on $THEN
IF c1.cust_first_name IS NOT NULL THEN
r1 := log_entry ('first_name is not null ', l_debug );
END IF;
IF c1.cust_last_name IS NOT NULL THEN
r1 := log_entry ('Last_name is not null ', l_debug);
END IF;
$END
null;
END LOOP;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.34
The problem of invoking functions unnecessarily tends to occur frequently in programs copied from another template program and then modified. Watch for this problem. If a function doesn't need to be called, avoid calling it.
INTERPRETED VS. NATIVE COMPILATION
If the execution of a function consumes a few seconds of elapsed time, then calling that function in a loop will result in poorly performing code. You should optimize frequently executed functions to run as efficiently as possible.
In Listing 1-11, imagine that the function calculate_epoch is called in a loop millions of times. Even if a single execution of that function consumes just 0.01 seconds, one million executions will result in an elapsed time of roughly 2.7 hours. One option for resolving this performance issue is to optimize the function to execute in a few milliseconds, but that much optimization is not always possible.
Listing 1-11. Costly Function Calls
CREATE OR REPLACE FUNCTION calculate_epoch (d in date)
RETURN NUMBER DETERMINISTIC IS
l_epoch number;
BEGIN
l_epoch := (d - TO_DATE('01-JAN-1970 00:00:00', 'DD-MON-YYYY HH24:MI:SS'))
* 24 *60 *60 ;
RETURN l_epoch;
END calculate_epoch;
/
SELECT /*+ cardinality (10) */ max( calculate_epoch (s.time_id)) epoch
FROM sales s
WHERE s.amount_sold> 100 and
calculate_epoch (s.time_id) between 1000000000 and 1100000000;
EPOCH
----------
1009756800
Elapsed: 00:00:01.39
Another option is to pre-store the results of the function execution, thus avoiding function execution at query time. You can do this by employing a function-based index. In Listing 1-12, a function-based index on the function calculate_epoch is created. Performance of the SQL statement improves from 1.39 seconds to 0.06 seconds.
Listing 1-12. Costly Function Call with Function-Based Index
CREATE INDEX compute_epoch_fbi ON sales
(calculate_epoch(time_id))
Parallel (degree 4);
SELECT /*+ cardinality (10) */ max( calculate_epoch (s.time_id)) epoch
FROM sales s
WHERE s.amount_sold> 100 and
calculate_epoch (s.time_id) between 1000000000 and 1100000000;
EPOCH
----------
1009756800
Elapsed: 00:00:00.06
You should also understand that function-based indexes have a cost. INSERT statements, and UPDATE statements that modify the time_id column, will incur the cost of calling the function and maintaining the index. Carefully weigh the cost of function execution in DML operations against the cost of function execution in SELECT statements to choose the cheaper option.
Note From Oracle Database version 11g onward, you can create a virtual column and then create an index on that virtual column. The effect of an indexed virtual column is the same as that of a function-based index. An advantage of virtual columns over function-based indexes is that you can partition the table using a virtual column, which is not possible with a function-based index alone.
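A hedged sketch of an indexed virtual column for the epoch computation; the column and index names are invented, and the expression is written inline rather than via the PL/SQL function so that the column remains a plain SQL expression:

```sql
-- Virtual column holding the epoch value, computed from time_id
ALTER TABLE sales ADD (
  epoch_time NUMBER
    GENERATED ALWAYS AS ((time_id - DATE '1970-01-01') * 24 * 60 * 60) VIRTUAL
);

-- Indexing the virtual column has the same effect as the
-- function-based index in Listing 1-12
CREATE INDEX sales_epoch_vc ON sales (epoch_time);
```

Queries can then filter on epoch_time directly, and the optimizer can use the index just as it would a function-based index.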
The function result cache, available from Oracle Database version 11g, is another option for tuning the execution of costly PL/SQL functions. Results from function executions are remembered in the result cache allocated in the System Global Area (SGA) of the instance. A repeated execution of the function with the same parameters fetches the result from the cache without re-executing the function. Listing 1-13 shows a function declared with RESULT_CACHE to improve performance: the SQL statement completes in 0.81 seconds.
Listing 1-13. Functions with Result_cache
DROP INDEX compute_epoch_fbi;
CREATE OR REPLACE FUNCTION calculate_epoch (d in date)
RETURN NUMBER DETERMINISTIC RESULT_CACHE IS
l_epoch number;
BEGIN
l_epoch := (d - TO_DATE('01-JAN-1970 00:00:00', 'DD-MON-YYYY HH24:MI:SS'))
* 24 *60 *60 ;
RETURN l_epoch;
END calculate_epoch;
/
SELECT /*+ cardinality (10) */ max( calculate_epoch (s.time_id)) epoch
FROM sales s
WHERE s.amount_sold> 100 and
calculate_epoch (s.time_id) between 1000000000 and 1100000000;
EPOCH
----------
1009756800
Elapsed: 00:00:00.81
In summary, excessive function execution leads to performance issues. If you can't reduce or eliminate the function executions, you may be able to employ function-based indexes or the result cache as a short-term fix to minimize the impact of function invocation.
Excessive database link-based calls can affect application performance. Accessing a remote table or modifying a remote table over a database link within a loop is not a scalable approach. For each access to a remote table, several SQL*Net packets are exchanged between the databases involved in the database link. If the databases are located in geographically separated datacenters or, worse, across the globe, then the waits for SQL*Net traffic will result in program performance issues.
In Listing 1-14, for every row returned from the cursor, the customers table in the remote database is accessed. Let's assume that a round-trip network call takes 100ms; 1 million round-trip calls will then take approximately 27 hours to complete. A response time of 100ms between databases located in different parts of the country is not uncommon.
Listing 1-14. Excessive Database Link Calls
DECLARE
V_customer_name VARCHAR2(32);
BEGIN
...
FOR c1 IN (SELECT …)
LOOP
...
SELECT customer_name
INTO v_customer_name
FROM customers@remotedb
WHERE account_id = c1.account_id;
...
END LOOP;
END;
Judicious use of materialized views can reduce the number of network round trips during program execution. In the case of Listing 1-14, the customers table can be created as a materialized view: refresh the materialized view before program execution, and access the materialized view in the program. Materializing the table locally reduces the number of SQL*Net round-trip calls. Of course, as an application designer, you need to compare the cost of materializing the whole table against the cost of accessing a remote table in a loop, and choose the optimal solution.
Rewriting the program as a SQL statement with a join to a remote table is another option. The query optimizer in Oracle Database can optimize such statements so as to reduce the SQL*Net trip overhead. For this technique to work, you should rewrite the program so that SQL statement is executed once and probably not in a loop.
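A sketch of such a rewrite, joining to the remote table in a single statement; accounts and account_summaries are invented local names, and remotedb is the database link from Listing 1-14:

```sql
-- One statement, one (optimized) set of SQL*Net round trips,
-- instead of one remote query per loop iteration
INSERT INTO account_summaries (account_id, customer_name)
SELECT a.account_id, c.customer_name
  FROM accounts a,
       customers@remotedb c
 WHERE a.account_id = c.account_id;
```

The optimizer can choose to ship the whole join to one site, which is precisely the behavior a row-at-a-time loop prevents.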
Materializing the data locally or rewriting the code as a SQL statement with a remote join are the first steps in tuning the program in Listing 1-14. If you are unable to do either, there is a workaround: as an interim measure, you can convert the program to a multi-process architecture. For example, process #1 handles customers in the range 1 to 100,000, process #2 handles customers in the range 100,001 to 200,000, and so on. Applying this logic to the example program with 10 processes reduces the total run time to approximately 2.7 hours. The DBMS_PARALLEL_EXECUTE package is another option to consider for splitting the work into parallel chunks.
Triggers are usually written in PL/SQL, although you can write trigger code in Java as well. Excessive use of triggers is not ideal for performance reasons. Row changes are performed in the SQL engine while triggers are executed in the PL/SQL engine, so once again you encounter the dreaded context-switch problem.
In some cases, triggers are unavoidable. Complex business validation, for example, sometimes cannot be performed anywhere else, and in those scenarios writing the validation in trigger PL/SQL code is reasonable. You should, however, avoid overusing triggers for simple validation. For example, use a check constraint rather than a trigger to enforce a list of valid values for a column.
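For example, a valid-values check that might otherwise be coded as a trigger can be declared instead; the constraint name and value list here are illustrative, not from the SH schema:

```sql
-- Declarative validation: enforced by the SQL engine,
-- no trigger and no context switch
ALTER TABLE customers ADD CONSTRAINT cust_marital_status_ck
  CHECK (cust_marital_status IN ('single', 'married', 'divorced', 'widowed'));
```

The constraint is enforced for every DML path, including direct SQL that would bypass application-side checks.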
Further, avoid using multiple triggers for the same trigger action. Instead of writing two different triggers for the same action, you should combine them into one so as to minimize the number of context switches.
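To sketch the idea of combining triggers (the trigger name and body are invented; the point is one trigger per timing and action rather than several):

```sql
-- Instead of two separate BEFORE INSERT OR UPDATE row triggers
-- on customers, one trigger performs both actions:
CREATE OR REPLACE TRIGGER customers_biu
BEFORE INSERT OR UPDATE ON customers
FOR EACH ROW
BEGIN
  -- action formerly in trigger #1
  :new.cust_first_name := INITCAP(:new.cust_first_name);
  -- action formerly in trigger #2
  :new.cust_last_name := INITCAP(:new.cust_last_name);
END;
/
```

One trigger firing once per row costs one context switch where two triggers would cost two.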
It is not uncommon to see commits after every row inserted, modified, or deleted in a PL/SQL loop. The practice of committing after every row leads to slower program execution. Frequent commits generate more redo, require the log writer to flush the contents of the log buffer to the log file frequently, can lead to data integrity issues, and consume more resources. The PL/SQL engine is optimized to reduce the effect of frequent commits, but there is no substitute for well-written code when it comes to reducing commits.
You should commit only at the completion of a business transaction; if you commit earlier than the business transaction boundary, you can encounter data integrity issues. If you must commit to improve restartability, consider batch commits. For example, rather than committing after each row, commit every 1,000 or 5,000 rows (the choice of batch size depends upon your application). Fewer commits reduce the elapsed time of the program, and fewer commits from the application also improve the performance of the database.
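A minimal sketch of batch commits applied to the loop from Listing 1-8; the 1,000-row interval is arbitrary, and the sketch assumes the loop can safely restart from where it left off:

```sql
DECLARE
  l_rows_processed PLS_INTEGER := 0;
BEGIN
  FOR c1 IN (SELECT cust_first_name, cust_last_name
               FROM customers
              WHERE cust_marital_status != 'married')
  LOOP
    INSERT INTO customers_hist (cust_hist_id, first_name, last_name)
    VALUES (cust_hist_id_seq.nextval, c1.cust_first_name, c1.cust_last_name);
    l_rows_processed := l_rows_processed + 1;
    -- Commit every 1,000 rows instead of every row
    IF MOD(l_rows_processed, 1000) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;  -- final commit for the remaining rows
END;
/
```

Note that committing inside a loop over an open cursor can expose you to ORA-01555 (snapshot too old) on long runs, which is another reason to keep batches reasonably large and the loop short.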
Don't use dynamic SQL statements in a PL/SQL loop, as doing so induces excessive parsing. Instead, reduce the amount of hard parsing through the use of bind variables.
In Listing 1-15, the customers table is accessed to retrieve customer details, passing cust_id from cursor c1. A SQL statement with literal values is constructed and then executed using the native dynamic SQL EXECUTE IMMEDIATE construct. The problem is that for every row retrieved from cursor c1, a new SQL statement text is constructed and sent to the SQL engine for execution.
Statements that don't exist in the shared pool when you execute them will incur a hard parse. Excessive hard parsing stresses the library cache, thereby reducing the application's scalability and concurrency. As the number of rows returned from cursor c1 increases, the number of hard parses will increase linearly. This program might work in a development database with a small number of rows to process, but the approach could very well become a problem in a production environment.
Listing 1-15. Excessive Parsing
DECLARE
...
BEGIN
FOR c1_rec IN c1
LOOP
-- Query customer details
EXECUTE IMMEDIATE
'SELECT cust_first_name, cust_last_name, country_id
FROM customers
WHERE cust_id= ' || c1_rec.cust_id INTO l_cust_first_name, l_cust_last_name,
l_country_id;
...
END LOOP;
COMMIT;
END;
/
You should reduce hard parses as much as possible. Dynamic SQL statements in a loop amplify the cost of hard parsing, and the effect worsens as the concurrency of execution increases.
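When dynamic SQL is genuinely unavoidable, bind variables with the USING clause keep the statement text constant, so it is hard parsed once and soft parsed on subsequent executions. A sketch of the statement from Listing 1-15 reworked this way (in this particular example, of course, plain static SQL would be better still):

```sql
-- Statement text never changes; only the bind value does
EXECUTE IMMEDIATE
  'SELECT cust_first_name, cust_last_name, country_id
     FROM customers
    WHERE cust_id = :b_cust_id'
  INTO l_cust_first_name, l_cust_last_name, l_country_id
  USING c1_rec.cust_id;
```

Every iteration now reuses the single cached cursor in the shared pool instead of adding a new literal-laden statement to it.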
This chapter reviewed various scenarios in which certain PL/SQL constructs were used inappropriately. Keep in mind that SQL is a set language and PL/SQL is a procedural language, and treat the scenarios in this chapter as design guidelines. By all means use PL/SQL for your data and business processing, and use Java or another language for presentation logic and user validation. You can write highly scalable PL/SQL programs using the techniques outlined in this chapter.