Chapter 9: Database Integration III

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

C H A P T E R 9

Database Integration III

So far, we have mostly been working with the MySQL database. It is now time to introduce Oracle RDBMS and its capabilities. Oracle RDBMS is the most popular database in the market today. It is the database that the users are most likely to have in their server rooms, at least on the high end.

This chapter will introduce Oracle RDBSM and the PHP OC18 interface (connecting and executing SQL and bind variables). It will also cover the array interface, PL/SQL procedures, IN/OUT arguments, and binding cursors. Next we will discuss large objects and working with LOB columns, and we will finish up with a look at connection pooling.

Oracle RDBMS is very feature rich. A full description would require a minor library. I'll begin this chapter by highlighting its most important features, from the perspective of a PHP programmer.

Introduction to Oracle RDBMS

Oracle RDBMS is a full-fledged relational database that conforms to the ACID properties described in Chapter 7. It employs multi-versioning for consistency in such a manner that readers never block writers. That means processes that execute queries on a particular table will neither block nor be blocked by the processes that modify that table. In contrast with many other databases, Oracle RDBMS has a centralized dictionary and doesn't use the term database like other RDBMS systems. An Oracle instance, which is a collection of processes and shared memory, always accesses a single database. Sessions connect to the instance by attaching to one of Oracle's server processes. That attachment can be dedicated, in which case the server process is dedicated to the single client that is connected to it. The attachment can also be shared, allowing multiple connections to share a single server process. From Oracle version 11g and later, the connection can also be pooled, which means that a pool of processes exists, and any of these processes can serve the given connection at any time. Oracle session is an expensive object, the number of which is limited by an initialization parameter and should not be created lightly. In contrast with some other databases, most notably Microsoft SQL Server, having multiple database sessions per one end user is considered an extremely bad practice.

Oracle database is an all-encompassing entity, further subdivided into tablespaces. A tablespace is just a collection of files, a physical place used for storing objects. Every database object, such as a table or index, is owned by a user. In the Oracle world, the term user is synonymous with schema. That means that for every schema, defined in ANSI SQL standard as a logical collection of objects, there is a username. This tends to produce a large number of users, but has no negative impact. Oracle also supports global temporary tables. The data in a global temporary table can persist for a transaction or session. The tables are called global temporary tables because their visibility is global; they exist even after all the sessions that used them disconnect from the instance. Oracle doesn't support local temporary tables, like SQL Server or PostgreSQL, which only exist for the duration of the session that has created them. Oracle supports cursors instead, but cursors are not as versatile as local temporary tables. That can sometimes present porting problems, especially when porting SQL Server applications to Oracle. Many other databases, such as DB2, SQL Server, MySQL, and PostgreSQL, support local temporary tables that only exist for the duration of session or even transaction. Developers using these databases tend to use temporary tables in profusion, which can produce a large number of permanent objects if a literal translation is done to Oracle. The accepted practice is to translate local temporary tables as cursors, whenever possible.

Oracle also supports rather unique objects called synonyms that can point to another schema, even to another database. Oracle is a fully distributed database; it allows querying remote databases and even full-fledged transactions encompassing several databases. This, however, should be used with care, because distributed databases have some strange and unexpected properties that can severely impact the application.

Row-level locking is supported for greater concurrency; it's a default locking granularity. Oracle locks are implemented in rather unique way, without global locking queues and large memory consumption. That makes Oracle locks cheap. As a matter of fact, the cost of locking a single row in a table is usually the same as the cost of locking multiple rows. Oracle doesn't escalate locks. Row locks will never be converted to a table lock. Explicit table locking in Oracle RDBMS is usually counter-productive and can have a severe negative impact on the application performance and concurrency.

Just as many other database systems, Oracle also has its transaction language or procedural extension called PL/SQL. This is a fully defined programming language, based on Ada, that can be used for developing functions, procedures, triggers, and packages. In addition to PL/SQL, one can also write stored procedures in Java. Java virtual machine is a part of the Oracle database kernel. This capability is very significant, because pure SQL is not sufficient for definition of the business rules. Business rules are usually implemented as database triggers, which makes them consistent across the applications that access the underlying data model. There are two approaches to the business rule implementation: one centered on the database and another one on the application. In my opinion, business rules should be implemented in the database, because it would be difficult and risky to maintain the consistent implementation of the business rules through the application layer. There's too much room for error. Slight variations caused by a possible misunderstanding are more than likely during the lifetime of the data model, and the company may end up with a logically inconsistent database.

This section wouldn't be complete without mentioning real application clusters (RAC). Oracle supports shared disk clusters, which are much more complex than the cleverly organized separate databases, usually known as shared-nothing architecture. In the case of Oracle RAC, several Oracle instances can access a single database that resides on the shared storage. This is illustrated in Figure 9-1.

Figure 9-1. With Oracle RAC, several Oracle instances can access a single database that resides on the shared storage.

Instances on the DB servers 1 and 2 simultaneously access the database on the shared storage. This is much more complex than shared-nothing database clusters because locking needs to be done between the nodes; a complex distributed lock manager (DLM) is needed. On the plus side, the loss of a single node doesn't mean the loss of data. In shared-nothing architecture, the loss of a single node usually means that the data managed by that node is inaccessible to the users. RAC is much more complex, but it allows for the load balancing and fail-over, which means that the entire database is accessible, as long as there is at least one surviving node in the cluster.

There are many more options and capabilities of Oracle RDBMS that go beyond the scope of this book to mention, but are worth learning. Oracle is kind enough to make its informative manuals publicly accessible here: www.oracle.com/technetwork/indexes/documentation/index.html

I wholeheartedly recommend the manuals that cover concepts. For those in need of more sophisticated and detailed introduction, I recommend the books by Tom Kyte, in particular Expert Database Architecture (Apress, 2010). Tom Kyte is a vice president at Oracle, an excellent writer, and is an extremely knowledgeable person whose books are a joy to read.

Oracle RDBMS is a very popular relational database, with a multitude of options. It is standards-compliant, but one should not fall into the trap of creating a database-independent application. Databases are very complex pieces of software, with many different implementations of the same features. Writing an application for a specific database enables one to get an optimal performance out of the allocated hardware and software. When writing an application that will use Oracle RDBMS as its data store, one should conform to the standards followed in the world of Oracle RDBMS, not to some abstract database-independent application standards. Database independence usually means that the application will work equally slow against any supported database, which is hardly a satisfactory solution. On the other hand, writing an application without any regard for portability can result in a vendor lock and eventually increase the price of the application system.

Now, let's proceed with the gory details of the OCI8 interface. The next section will assume that the OCI8 module is installed, either by linking it from the source or by PECL.

The Basics: Connecting and Executing SQL

The OCI8 extension has all the calls we saw earlier in this book when working with MySQL and SQLite extensions. In particular, it has calls to connect to an Oracle instance, prepare a SQL statement, execute it, and fetch the results. Unfortunately, OCI8 is procedural in nature, meaning that error checking must be done manually. For automated error checking, one can use the ADOdb wrapper, which has many, but definitely not all, of the options offered by the OCI8 extension itself. As has been our practice so far, an example is worth a thousand words.

As was the case with other databases, two scripts will be shown here: the first one to load a CSV file into the database, and the second one to execute a query. Both scripts are executed from the command line. Between those two scripts, it will be possible to cover all the basic calls and techniques for working with the Oracle RDBMS, the way this was done for MySQL and SQLite. Listing 9-1 shows the first script, which will load a CSV file into the database. The script is general; it takes connection string, table name, and file name as the command line arguments and loads the specified file into the specified table. There is no assumption of the particular schema or table structure.

Listing 9-1. Script to Load a CSV File into the Database

<?php

if ($argc != 4) {

    die("USAGE:script9.1 <connection> <table_name> <file name>
");

}

$conn   = $argv[1];

$tname = $argv[2];

$fname = $argv[3];

$qry = "select * from $tname";

$dsn = array();

$numrows = 0;

if (preg_match('/(.*)/(.*)@(.*)/', $conn, $dsn)) {

    $conn = array_shift($dsn);

} elseif (preg_match('/(.*)/(.*)/', $conn, $dsn)) {

    $conn = array_shift($dsn);

} else die("Connection identifier should be in the u/p@db form.");

if (count($dsn) == 2) {

    $dsn[2] = "";

}

function create_insert_stmt($table, $ncols) {

    $stmt = "insert into $table values(";

    foreach (range(1, $ncols) as $i) {

        $stmt.= ":$i,";

    }

    $stmt = preg_replace("/,$/", ')', $stmt);

    return ($stmt);

}

try {

    $dbh = oci_connect($dsn[0], $dsn[1], $dsn[2]);

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $res = oci_parse($dbh, $qry);

    // Oracle needs to execute statement before having description

    // functions available. However, there is a special cheap

    // execution mode which makes sure that there is no performance penalty.

    if (!oci_execute($res, OCI_DESCRIBE_ONLY)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    $ncols = oci_num_fields($res);

    oci_free_statement($res);

    $ins = create_insert_stmt($tname, $ncols);

    $res = oci_parse($dbh, $ins);

    $fp = new SplFileObject($fname, "r");

    while ($row = $fp->fgetcsv()) {

        if (count($row) < $ncols) continue;

        foreach (range(1, $ncols) as $i) {

            oci_bind_by_name($res, ":$i", $row[$i - 1]);

        }

        if (!oci_execute($res,OCI_NO_AUTO_COMMIT)) {

            $err = oci_error($dbh);

            throw new exception($err['message']);

        }

        $numrows++;

    }

    oci_commit($dbh);

    print "$numrows rows inserted into $tname.
";

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

The execution produces the same result as with the other databases:

 ./script9.1.php scott/tiger imp emp.csv

14 rows inserted into emp.

The CSV file is the same as it was for SQLite in Chapter 7. It is much more cumbersome than the elegant ADOdb version, but using OCI8 can have significant performance advantages, as we will show in the next section. The calls should be fairly recognizable by now: The oci_connect is, of course, used to connect to the database instance. An Oracle connection string usually has a username/password@db form, sometimes without the last part, so it was necessary to parse the connection argument. That is something that preg_match can do in a rather elegant way. We will cover regular expression details later.

The oci_error call is used for detecting errors, oci_parse parses the statement and oci_execute executes the statement. When trapping errors, the oci_error call takes the database handle as the only argument. The last error encountered is actually a connection handle attribute.

The oci_execute call that actually does the insert is called with an additional OCI_NO_AUTO_COMMIT argument. Without that argument, a commit would be issued after each insert. As mentioned in the MySQL section of Chapter 7, the “commit” statement is a very costly one. Not only would we suffer a performance penalty for committing after inserting each row, there is also a possibility of an inconsistent file load. Some rows would be loaded, but some rows would fail, leaving us with the cleanup task, in addition to loading the data. The default is to automatically commit after each insert.

The number of fields is returned by the oci_num_fields call, which takes an executed SQL handle as its argument. This would be impractical with large tables, so there is a special execution mode that doesn't create the result set, so there is no performance penalty. Also, the real parsing of the SQL is usually delayed until the execution phase of the SQL statement, in order to cut down the number of necessary network trips. That means that there is no need to check for errors after the oci_parse call, the place to perform the error checking is after the oci_execute call.

There is, however, a performance penalty for the way that this script is executing. For each row, we are going to the database and checking the results on return. If the database is on a different machine than the one used to execute the PHP script, this includes as many trips over the network as there are rows to insert. Even with a fast network connection, the network overhead can be very significant if there are many rows to insert. Unfortunately, PHP doesn't support direct binds of arrays to SQL placeholders, as is the case with some other languages. Fortunately, there is a trick which utilizes the OCI-Collection class that can help us with that. That trick will be described in the next section.

The basic call that wasn't covered in the script in Listing 9-1 is oci_fetch_row. This will be shown in Listing 9-2, also seen before in the earlier DB integration chapters. The script executes a query, fetches the resulting data, and prints it on the standard output.

Listing 9-2. Script to Execute a Query

<?php

$QRY = "select e.ename,e.job,d.dname,d.loc

        from emp e join dept d on(d.deptno=e.deptno)";

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $sth = oci_parse($dbh, $QRY);

    if (!oci_execute($sth)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    while ($row = oci_fetch_array($sth,OCI_NUM)) {

        foreach ($row as $r) {

            printf("% 12s", $r);

        }

        print "
";

    }

}

catch(exception $e) {

    print "Exception:";

    print $e->getMessage()."
";

    exit(-1);

}

?>

The oci_fetch_array will fetch the next row into an array type of the programmer's choice. We chose an array indexed by numbers, as specified by the OCI_NUM argument. We could have also specified OCI_ASSOC to return an associative array, indexed by the column names, or OCI_BOTH, to return both.

As with the inserts, fetch will also usually fetch row by row. Fortunately, with queries, there is a very simple trick that can help us with that. OCI8 supports the oci_set_prefetch function, which has the following syntax:

bool oci_set_prefetch($stmt,$numrows);

This will create a buffer that can hold $numrows rows and be maintained and used by Oracle. The behavior of the fetch functions will not change, but the speed will change significantly. The pre-fetch buffer is created per statement and cannot be shared or reused.

Listings 9-1 and 9-2 cover all the basics: how to connect to an Oracle instance, execute a SQL statement, and get the result. There are few more calls that belong to the category of basic OCI8 calls. Those are the calls that describe the fields in the result set: oci_field_name, oci_field_type, oci_field_size, oci_field_precision and oci_field_scale. All of those calls take the executed statement and the field number as arguments and return the requested data: name, type, size, precision and scale.

Array Interface

This section will demonstrate how easy it is to insert a large number of rows into an Oracle database in an acceptable amount of time. Large data loads are a fairly frequent occurrence with modern corporate databases. So, let's create the following table and try to load a large data file into it:

SQL> create table test_ins ( 

2  col1 number(10) 

3  ) storage (initial 100M); 



Table created.

The storage clause allocates 100M. That is done to avoid dynamic space allocation, which is probably the worst thing that can happen to a data load. Dynamic space allocation during runtime is slow, can cause concurrency issues, and should be avoided if at all possible. Now, we need a data file to load:

php -r 'for($i=0;$i<10000123;$i++) { print "$i
"; }'>file.dat

For the record, that is 10 million one hundred and twenty three records to load. First, let's see how things work with the methods from the previous section. Listing 9-3 is a very simple script that will read the file and load it into the table that we have just created.

Listing 9-3. A Simple Script to Read the File and Load It into the Table

<?php

if ($argc != 2) {

    die("USAGE:scriptDB.1 <batch size>");

}

$batch = $argv[1];

print "Batch size:$batch
";

$numrows = 0;

$val = 0;

$ins = "insert into test_ins values (:VAL)";

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $res = oci_parse($dbh, $ins);

    oci_bind_by_name($res, ":VAL", &$val, 20, SQLT_CHR);

    $fp = new SplFileObject("file.dat", "r");

    while ($row = $fp->fgets()) {

        $val = trim($row);

        if (!oci_execute($res, OCI_NO_AUTO_COMMIT)) {

            $err = oci_error($dbh);

            throw new exception($err['message']);

        }

        if ((++$numrows) % $batch == 0) {

            oci_commit($dbh);

        }

    }

    oci_commit($dbh);

    print "$numrows rows inserted.
";

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

This is a simple script, but it is still written according to the best rules of programming. Bind is done only once, and commit is invoked in an interval that has to be defined on the command line. The concept of binding a variable to a placeholder was introduced in the previous database chapter So, let's execute the script and see the timing:

time ./script9.3.php 10000

Batch size:10000

10000123 rows inserted  .



real    16m44.110s

user    2m35.295s

sys     1m38.790s

So, for 10 million simple records, we needed 16 minutes on the local machine. That is very, very slow. The main problem lies in the fact that the previous script communicates with the database on a row by row basis, checking the result every time. Doing less frequent commits helps, such as one for every 10,000 rows, but it is not enough. In order to speed things up, we need some more database infrastructure:

SQL> create type numeric_table as table of number(10);

2  /

Type created.

SQL> create or replace procedure do_ins(in_tab numeric_table)

2  as

3  begin

4  forall i in in_tab.first..in_tab.last

5  insert into test_ins values (in_tab(i));

6  end;

7  /

Procedure created.

We created a procedure that takes a PL/SQL table, which is an Oracle collection type that can be thought of as a PHP array, and a type without we couldn't have created the procedure. The procedure takes the PL/SQL table and inserts it into the table TEST_INS, using the Oracle bulk insert mechanism. Now that we have the necessary infrastructure, Listing 9-4 shows a new version of the Listing 9-3.

Listing 9-4. A New Version of Listing 9-3

<?php

if ($argc != 2) {

    die("USAGE:scriptDB.1 <batch size>");

}

$batch = $argv[1];

print "Batch size:$batch
";

$numrows = 0;

$ins = <<<'EOS'

    begin

        do_ins(:VAL);

    end;

EOS;

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $values = oci_new_collection($dbh, 'NUMERIC_TABLE'),

    $res = oci_parse($dbh, $ins);

    oci_bind_by_name($res, ":VAL", $values, -1, SQLT_NTY);

    $fp = new SplFileObject("file.dat", "r");

    while ($row = $fp->fgets()) {

        $values->append(trim($row));

        if ((++$numrows) % $batch == 0) {

            if (!oci_execute($res)) {

                $err = oci_error($dbh);

                throw new exception($err['message']);

            }

            $values->trim($batch);

        }

    }

    if (!oci_execute($res)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    print "$numrows rows inserted.
";

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

Let's see how this fares against Listing 9-3. The script is a bit more complex, because it needed additional database infrastructure, but the effort was definitely worth it:

time ./script9.4.php 10000

Batch size:10000

10000123 rows inserted.



real    0m58.077s

user    0m42.317s

sys     0m0.307s

The load time for 10 million records was reduced to 58 seconds and change, from 16 minutes and 44 seconds. Why did we get this huge improvement? First of all, we have created OCI-Collection object on the PHP side, to hold the collection of rows to insert. Oracle collection objects have all the methods one would expect: append, trim, size, and getElem. The append method will add a variable to the collection, trim will remove the specified number of the elements from collection, the size method will return the number of elements in the collection, and getElem will return the element for the given index.

If the table had more columns, we would need one collection object for every column and a type to support it. The script collects 10,000 rows into the collection object and only then hands it to Oracle, thus the name array interface. Second, the procedure does a bulk insert, which is considerably faster than doing simple inserts in a loop. If the target database was on another machine, even with a fast 1GB Ethernet link, the execution time for the first script would take to 45 minutes. The second script would still execute in less than two minutes, because of the greatly reduced number of network trips. Both scripts are committing at the same rate. In the script in Listing 9-3, oci_execute was called with OCI_NO_AUTO_COMMIT and oci_commit was explicitly called for every 10,000 rows. In the script in Listing 9-4, oci_execute was called without disabling the auto-commit feature, which means that commit was issued after every successful completion.

That script cannot be written using ADOdb or PDO, because they do not support the OCI-Collection type. Writing PHP scripts for large data warehouse loads is best done by using the native OCI8 interface. Are there any problems with the second script? For one, it tends to ignore errors. Errors have to be handled in the DO_INS insert procedure, which is something that we didn't do here, for simplicity reasons. PL/SQL command FORALL has an option called SAVE EXCEPTIONS that can be used to inspect the outcome of every row and throw an exception, if needed. PL/SQL is a very mighty language and has many more uses than the simple one we have shown here. Oracle documentation, which contains an excellent manual on PL/SQL, is available on their documentation web site, mentioned previously in this chapter. The next section also deals with PL/SQL.

PL/SQL Procedures and Cursors

In the previous section, we saw bind variables in cooperation with PL/SQL. Bind variables have to be bound to placeholders in PL/SQL code. See Listing 9-5.

Listing 9-5. Insert listing caption here.

<?php

$proc = <<<'EOP'

declare

  stat number(1,0);

begin

  dbms_output.enable();

  select days_ago(:DAYS) into :LONG_AGO from dual;

  dbms_output.put_line('Once upon a time:'||:LONG_AGO);

  dbms_output.get_line(:LINE,stat);

end;

EOP;

$days=60;

$long_ago="";

$line="";



try {

    $dbh = oci_connect("scott","tiger","local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $res = oci_parse($dbh, $proc);

    oci_bind_by_name($res,":DAYS",&$days,20,SQLT_CHR);

    oci_bind_by_name($res,":LONG_AGO",&$long_ago,128,SQLT_CHR);

    oci_bind_by_name($res,":LINE",&$line,128,SQLT_CHR);

    if (!oci_execute($res)) {

       $err=oci_error($dbh);

       throw new exception($err['message']);

    }

    print "This is the procedure output line:$line
";

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

When executed, this script produces the following output:

./script9.5.php

This is the procedure output line:Once upon a time:2011-01-31 12:10:26

The function days_ago is a rather trivial user defined function that looks like this:

CREATE OR REPLACE

  FUNCTION days_ago(

      days IN NUMBER)

    RETURN VARCHAR2

  AS

  BEGIN

    RETURN(TO_CHAR(sysdate-days,'YYYY-MM-DD HH24:MI:SS'));

  END;

So, in our little script in Listing 9-5, we have a mixture of almost everything: a user created function with an input argument, system package DBMS_OUTPUT and output arguments, all bundled together in an anonymous PL/SQL code. Bind variables need not to be declared, they are declared by the oci_bind_by_name call. There is no need to declare IN parameters and OUT parameters as in some frameworks; oci_bind_by_name will do both. Bind variables can be of different types. Obviously, they can be numbers and strings, and in the section about the array interface earlier in this chapter, we saw that bind variables can be objects of the OCI-Collection class. It is also possible to bind a statement handle. In Oracle terminology, a statement handle is called a cursor. Oracle's PL/SQL can manipulate cursors very well and can hand them to PHP for execution. Listing 9-6 shows an example.

Listing 9-6. Insert listing caption here.

<?php

$proc = <<<'EOP'

declare

type crs_type is ref cursor;

crs crs_type;

begin

    open crs for select ename,job,deptno from emp;

:CSR:=crs;

end;

EOP;

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $csr = oci_new_cursor($dbh);

    $res = oci_parse($dbh, $proc);

    oci_bind_by_name($res, ":CSR", $csr, -1, SQLT_RSET);

    if (!oci_execute($res)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    if (!oci_execute($csr)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    while ($row = oci_fetch_array($csr, OCI_NUM)) {

        foreach ($row as $r) {

            printf("%-12s", $r);

        }

        print "
";

    }

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

In Listing 9-6, we call oci_execute twice. The first time, we execute the little PL/SQL script from the variable $proc. This script opens a cursor, of the PL/SQL type ref cursor, for the SQL query that selects three columns from the EMP table, puts that cursor into the bind variable :CSR and exits. After that, it's all PHP.

When the PL/SQL code is executed, it puts Oracle cursor into the bind variable $csr, which was created by a call to oci_new_cursor. Cursors are, as we have previously said, parsed SQL statements. Now that $csr is populated, it needs to be executed and the data can be retrieved. So, the second oci_execute was used to execute that cursor. After that, the data was retrieved and printed on the standard output. The result looks like the following:

./script9.6.php

SMITH        CLERK           20          

ALLEN        SALESMAN    30          

WARD        SALESMAN    30          

JONES        MANAGER     20          

MARTIN      SALESMAN    30          

BLAKE        MANAGER     30          

CLARK        MANAGER     10          

SCOTT        ANALYST       20          

KING           PRESIDENT   10          

TURNER      SALESMAN    30          

ADAMS       CLERK           20          

JAMES         CLERK           30          

FORD          ANALYST      20          

MILLER       CLERK           10

PL/SQL created a SQL statement, parsed it, and handed it to PHP for execution. PHP executed it and produced the result. It is an extremely powerful combination that can be used to great effect in applications.

If the cursor returned from PL/SQL uses locking, oci_execute needs to be called with OCI_NO_AUTO_COMMIT because the implied commit that follows every successful execution will release locks and cause the following error:

PHP Warning:  oci_fetch_array(): ORA-01002: fetch out of sequence in /home/mgogala/work/book/ChapterDB/scriptDB.6.php on line 29

This error was produced by adding “for update of job” to the query in the PL/SQL code. The query was modified to read select ename,job,deptno from emp for update of job. Queries with the “for update” clause will lock the selected rows; this behavior is mandated by the SQL standard. In relational databases, locks are granted for the duration of a transaction. Once the transaction is terminated—for instance, by a commit statement—the cursor becomes invalid, and data can no longer be retrieved. By default, oci_execute issues a commit, and will break queries with “for update” option. There will be a similar error, as shown in the next section.

Note The oci_execute call will execute a commit after every successful execution, even if the executed SQL is a query. If that behavior is not desired, use OCI_NO_AUTO_COMMIT argument.

Now, we can proceed to another important object type.

Working with LOB types

LOB stands for large object. It can be a textual large object, of the character large object type (CLOB), a binary large object type (BLOB), or a pointer to a file of the Oracle type BFILE. The basic characteristic of the LOB type is its size. In this case, size definitely matters.

When relational databases first appeared, things like large documents, media clips, graphic files, and the like were not held in relational databases. Objects of that nature were held in the file system. A paradigm for a collection of documents was a file cabinet, with drawers and, possibly, letter markings. One was supposed to know exactly what he was looking for, preferably with the document number. File systems were modeled after file cabinets. A file system is just a collection of drawers, called directories, that contain documents. Tasks like “please, get me all contracts from 2008 which refer to office furniture, like chairs, tables, and cabinets” would be impossible to complete in the old organization. With the advent of text indexing, such tasks are now very routine. Also, file systems keep very little externally accessible information about the document. File systems typically keep the file name, owner, size, and date, and that's about it. There are no keywords, no external remarks, no author, or any other useful information that may be needed about the document. Keeping all the necessary information means that the old file cabinet paradigm was no longer sufficient; the documents are now increasingly held in the databases themselves.

Oracle has an option called Oracle*Text that comes with every Oracle database, at no extra cost. This option enables the user to create text indexes on documents, parse MS Word documents, Adobe PDF documents, HTML documents, and many other document types. Oracle can also do text searches, just like Sphinx, and its text indexes are tightly integrated into the database. There are also options that analyze maps, measuring distances between two points, and even analyze X-ray images. All those goodies rely on large objects being stored in the database. Of course, PHP is very frequently used in web applications and has great mechanisms for dealing with uploaded files. That makes dealing with LOB columns especially important for PHP applications. Uploading documents and storing them into the database is something that can be reasonably expected, when working with PHP and the Oracle database.

Our next example will load a content of a text file into the database. The text file is Kurt Vonnegut's excellent story “Harrison Bergeron,” obtained from here:

www.tnellen.com/cybereng/harrison.html

The content of the story was stored to disk as a text file called harrison_bergeron.txt. The story is rather short, around 12K, but still larger than the maximum size of VARCHAR2 columns, which is 4K:

ls -l harrison_bergeron.txt

-rw-r--r-- 1 mgogala users 12678 Apr  2 23:28 harrison_bergeron.txt

The document is precisely 12,678 characters long. That fact will be used to check the outcome of our script. Of course, when inserting documents, we will also need a table to insert into. Here is the table used in the next two examples:

CREATE TABLE TEST2_INS

  (

    FNAME VARCHAR2(128),

    FCONTENT CLOB

  ) LOB(FCONTENT) STORE AS SECUREFILE SF_NOVELS (

    DISABLE STORAGE IN ROW DEDUPLICATE COMPRESS HIGH

 ) ;

When creating tables like this, the natural impulse is to create columns named NAME and CONTENT, but these may be reserved words or may become reserved words in some of the future Oracle versions. That can cause unpredictable problems and avoiding such words as column names is a smart principle to follow.

Note Using names like NAME, CONTENT, SIZE or similar is dangerous because of possible clashes with SQL keywords.

Also, when creating LOB columns there are many options to choose from, depending on the database version. Options given to the create table command can significantly impact the storage required to store the LOB, performance of the text indexes and the performance of the data retrieval process. The database in which this table was created is an Oracle 11.2 database. Not all of these options are available in earlier versions that may still be in use today. The option that has been available since Oracle 9i is DISABLE STORAGE IN ROW. If this option is used, Oracle will store the entire LOB column in a separate storage space, called LOB segment, leaving only the information how to find the LOB, also known as LOB locator, in the table row. LOB locators are typically 23 bytes in size. That will render the non-LOB columns of the table much more densely packed and the read of the non-LOB columns much more efficient. For accessing the LOB data, Oracle will have to issue separate I/O requests, therefore decreasing the efficiency of the table reads.

Without the DISABLE STORAGE IN ROW option, Oracle will store up to 4K of the LOB content within the normal table storage, along with the other, non-LOB columns. That will make the table segment much larger and much more sparse, decreasing the efficiency of the indexes on non-LOB columns. That will also decrease the number of reads necessary to read the LOB data. The rule of thumb is to store the LOB columns along with the rest of the table data if LOB columns are always fetched when the table data is needed. If, on the other hand, there is a significant number of situations which do not require LOB columns to be read along with the rest of the data, LOB columns are best stored separately from the non-LOB data, which means DISABLE STORAGE IN ROW. Oracle, by default, stores everything together, if not specifically requested to do otherwise.

The plan is to insert the file name and content into this table. Listing 9-7 shows the script that does it.

Listing 9-7. Insert listsing caption here.

<?php

$ins = <<<SQL

insert into test2_ins(fname,fcontent) values (:FNAME,empty_clob())

returning fcontent into :CLB

SQL;

$qry = <<<SQL

select fname "File Name",length(fcontent) "File Size"

from test2_ins

SQL;

$fname = "harrison_bergeron.txt";

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $lob = oci_new_descriptor($dbh, OCI_DTYPE_LOB);

    $res = oci_parse($dbh, $ins);

    oci_bind_by_name($res, ":FNAME", $fname, -1, SQLT_CHR);

    oci_bind_by_name($res, ":CLB", $lob, -1, SQLT_CLOB);

    if (!oci_execute($res, OCI_NO_AUTO_COMMIT)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    $lob->import("harrison_bergeron.txt");

    $lob->flush();

    oci_commit($dbh);

    $res = oci_parse($dbh, $qry);

    if (!oci_execute($res, OCI_NO_AUTO_COMMIT)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    $row = oci_fetch_array($res, OCI_ASSOC);

    foreach ($row as $key => $val) {

        printf("%s = %s
", $key, $val);

    }

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

When the script is executed, the result looks like this:

./script9.7.php

File Name = harrison_bergeron.txt

File Size = 12678

So, we have inserted a text file into the database. Listing 9-7 has several important elements. Unlike the OCI-Collection type, OCI-Lob descriptors must be initialized in the database, thus the RETURNING clause in the insert. If we tried to populate the LOB descriptor on the client side and just insert it into the database, without the EMPTY_CLOB() and RETURNING complications, we would have received an error saying that the script is trying to insert an invalid LOB descriptor. The reason for such behavior is that LOB columns are actually files within the database. Storage has to be allocated and the information about the file provided in the descriptor. Descriptor describes an object that can be used to read from the database and write to the database. This is the reason for inserting an empty CLOB and returning it into a PHP descriptor, using the bind call. The method shown previously, with the RETURNING clause, is a general method to use when inserting LOB objects into an Oracle database.

Second, the LOB descriptor is an object that is only valid for the duration of a transaction. Relational databases have transactions and, once in the database, the LOB objects must be afforded the same protection under the ACID rules as any other data in the database. LOB column is, after all, just a column in a database row. Once the transaction completes, there is no guarantee that someone else will not lock the row we have just written and add a little commentary to our text, possibly changing its size or even location. LOB descriptors are, therefore, only valid for the duration of a transaction, which means that OCI_NO_AUTO_COMMIT argument must be used with oci_execute. We can only commit when we're done with modifying the row. Without OCI_NO_AUTO_COMMIT, the following error would have occurred:

./script9.7.php

PHP Warning:  OCI-Lob::import(): ORA-22990: LOB locators cannot span transactions in /home/mgogala/work/book/ChapterDB/scriptDB.7.php on line 18

Of course, an empty LOB would have been inserted, which means that the name of the file would be correct, but the content wouldn't be there. In other words, the database would be logically corrupted. The word corrupt means that the data in the database is inconsistent. Having a file name, without the necessary file is, of course, an inconsistent state of the database. That is very similar to the problem with locking cursors shown in the previous section, but much more dangerous.

The OCI8 interface contains OCI-Lob class. New objects of that class are allocated using the oci_new_descriptor call. That class has more or less the same methods as the DBMS_LOB internal PL/SQL package for dealing with lobs from PL/SQL. Remember, one should think of LOB columns as files being stored in the database. There are many operations one can do with files: read, write, append, get the size, tell the current position, seek, set buffering, reset the position to the beginning (rewind), and flush them to disks. All those operations are also methods of the OCI-Lob class. We used OCI-Lob->import for simplicity, but we could have also used OCI-Lob->write, which is completely analogous to the file system write call. The syntax is the following: int OCI-Lob->write($buffer,$length). The write method returns the number of bytes actually written to the LOB column.

We have used OCI-Lob->flush() method to make sure that all the data transferred from the original file have been actually written to the LOB column at the point of commit. It is a smart strategy that ensures that the data is completely transferred to the server, before the transaction is committed, locks released and the LOB descriptor invalidated. Furthermore, OCI-Lob->import is extremely convenient for small files. With large files, it is entirely possible to encounter various memory problems. PHP scripts usually have memory limits set in the php.ini file and most of the system administrators are not overly generous with allowing PHP scripts to consume large quantities of memory, with typical values being between 32MB and 256MB of memory that a PHP script is allowed to consume. If the web site becomes heavily used, such generosity can bring down the entire machine. Extremely large files, hundreds of MB in size, can only be loaded piecewise, reading reasonably sized chunks into buffers and writing those buffers to the LOB column using OCI-Lob write. The maximum size of a LOB column is 4GB, but one rarely needs to load such large files into the database. The most frequently encountered cases in our careers were loading textual documents into the database, and they are rarely larger than a few megabytes. The OCI-Lob->import() method is normally used for that type of files.

To wrap up the chapter, Listing 9-8 shows a small example script that will read the LOB we just inserted and demonstrate the use of OCI-Lob->read().

Listing 9-8. Script to Demonstrate Using OCI-Lob->read()

<?php

$qry = <<<SQL

DECLARE

fcon CLOB;

BEGIN

SELECT fcontent into fcon

FROM test2_ins

WHERE fname='harrison_bergeron.txt';

:CLB:=fcon;

END;

SQL;

try {

    $dbh = oci_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $lh = oci_new_descriptor($dbh, OCI_DTYPE_LOB);

    $res = oci_parse($dbh, $qry);

    oci_bind_by_name($res, ":CLB", $lh, -1, SQLT_CLOB);

    if (!oci_execute($res, OCI_NO_AUTO_COMMIT)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    $novel = $lh->read(65536);

    printf("Length of the string is %d
", strlen($novel));

}

catch(Exception $e) {

    print "Exception:
";

    die($e->getMessage() . "
");

}

?>

The first question is, why did we wrap our little query into an anonymous PL/SQL block? The answer to that is that binding LOB descriptor to a plain simple placeholder in a SELECT...INTO statement just doesn't work. It produces an invalid LOB handle. Wrapping queries into a simple anonymous PL/SQL handle is not a big deal. The execution part has been repeated over and over again: parse, bind variables, and execute. Reading from a LOB column is as simple as reading from its operating system counterparts, files.

Note LOB columns should be thought of as files stored in the database.

There are many more options, tips and tricks when working with LOB columns. In the latest version of Oracle RDBMS, Oracle 11g, it is possible to compress LOB columns, with the advanced compression option, which is licensed separately. There is a manual, available with all the other Oracle documentation, called the Large Objects Developer's Guide, or, with version 11g, the Securefiles and Large Objects Developer's Guide.

Connecting to DB Revisited: Connection Pooling

This is a “bleeding edge” section. Connection pooling in the database is only available in Oracle 11g, the latest and the greatest release of Oracle RDBMS. Many users have not converted their databases to Oracle 11g yet. Upgrading a production database is a serious project that is not undertaken lightly, but the possibility of connection pooling can be a great argument for upgrading to 11g, if there are many applications that could benefit from the connection pooling. Connection pooling is not available only to PHP users; it is a general mechanism which can be used with other tools as well.

The notion of connection pooling is known to anyone who has ever been working with Java applications and application servers and is intuitive and easy to understand. Basically, the goal is to allocate a certain number of server processes that can be reused by the application. The DBA can allocate a pool of processes and make it available to the applications.

In order to understand the advantages, let's first see what the traditional options are for connecting to an Oracle instance. Before connection pooling, there were only two options, and both needed to be configured by the DBA. The first option was the dedicated server connection. When application requests a dedicated server, an Oracle server process is allocated to service it. It would only service a single application and if the application is idle, the allocated process cannot service any other requests that may be pending. This process exists for the life time of the connection that initiated its creation and exits when it receives a disconnection request. This is the default way of handling connections, usually appropriate for the most applications. Each process has its own work area, in Oracle terms known as the process global area (PGA), which is used for sorting and hashing. When the dedicated process exits, its PGA is de-allocated, as it is non-shared memory, owned by each individual process. Each dedicated server connection incurs the expense of creating a server process. The database has to be configured to allow one process per each connecting user.

The other type of connections to the database, which have existed since Oracle 7, is known as a shared server connection. The database can be configured in such a way that a group of shared server processes exist that will perform SQL statements on behalf of the requesting users. When the process is done with one SQL statement for application A, it is free to start working on another SQL statement for application B. There is no guarantee that the two consecutive SQL statements executed for the same requesting process will be executed by the same shared server. All shared server processes have their work areas in shared memory, which Oracle calls share global area (SGA), which means that a fair amount of configuration effort must be made in order for things to function smoothly. This also requires a large amount of shared memory, which remains permanently allocated and cannot be de-allocated when there is no need for it. Connecting application does not have to create the new process and a small number of processes can handle quite a large number of requesting processes. Configuring and monitoring shared server systems is fairly complex and rarely used.

Connection pooling, available since Oracle 11g and also known as database resident connection pooling (DRCP), provides the best of both worlds. Once a process from the pool is assigned to a session, it stays assigned to the session, for the duration of the session. Furthermore, every process in the pool has its own PGA, so there is no problem with the expensive shared memory configuration.

Connection pooling is mostly configured on the database side, by the DBA, and in the PHP parameters, inside php.ini. Scripts do not have to change their syntax. Existing scripts can use pooling without any modifications. Let's now look at how pooling can be configured.

First, on the Oracle RDBMS side, we have to configure the pool. That's done using the DBMS_CONNECTION_POOL supplied PL/SQL package. The package is described here:

http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/toc.htm

The package allows the administrators to define the maximum number of server processes in the pool, the minimum number of processes, the maximum idle time after which the server process is returned to the pool, maximum session life time, and time to live (TTL). When the session is idle for more than defined by the time to live parameter, it is killed. That helps Oracle to maintain the pool usage. Here is an example of the pool configuration on the DBA side:

begin

dbms_connection_pool.configure_pool(

pool_name => 'SYS_DEFAULT_CONNECTION_POOL',

minsize => 5,

maxsize => 40,

incrsize => 5,

session_cached_cursors => 128,

inactivity_timeout => 300,

max_think_time => 600,

max_use_session => 500000,

max_lifetime_session => 86400);

end;

To do this, the user would have to be connected as SYSDBA. Without going into too much detail, we will use the default pool arguments and only start the pool. Oracle 11.2 supports only a single connection pool, so there is no choice of pools to start:

SQL> connect / as sysdba

Connected.

SQL> exec dbms_connection_pool.start_pool();



PL/SQL procedure successfully completed.

This will start the default pool. Once started, the pool is persistent. Even if the instance is restarted, the pool will be started automatically. Once the pool is started, the parameter oci8.connection_class needs to be set. It's set to a string that identifies your application to Oracle instance. This can later be monitored through Oracle system tables. Here are the settings I use in my php.ini:

oci8.connection_class = TEST

oci8.ping_interval = -1

oci8.events = On

oci8.statement_cache_size = 128

oci8.default_prefetch = 128

Parameter oci8.events enables instance up or down notifications, setting the parameter oci8.ping_interval to -1 disables pinging from the PHP side to see whether the instance is up. That is not needed as the up/down notifications are enabled by setting the events parameter to “on.” The last two parameters are here for performance reasons. OCI8 sessions will cache up to 128 cursors in their user memory and will attempt to bring back rows in batches of 128.

The parameter file is now complete. All that we now need is to connect. For that, we will revisit the script in Listing 9-2 and replace the line that reads

$dbh = oci_connect("scott", "tiger", "local");

with a line that reads

$dbh = oci_pconnect("scott", "tiger", "localhost/oracle.home:POOLED");

That's all there is to it! Nothing else needs to be changed. The script will now execute exactly the same way as it did with the previous connect command. So, what is pconnect? The oci_pconnect creates a persistent connection. When the connection is established, it will not be closed, once the script exits. When connecting, OCI8 will check whether an unused connection with the same credentials already exists and will reuse it if it does. There is also oci_new_connection call that will request a new connection every time. Standard oci_connect, the call that we've been using throughout this chapter, will close the connection when the script exits but will return the existing handle if the connection with the same credentials is requested more than once.

In what situations should pooling be used? One should consider pooling where there is a multitude of processes using the same database credentials to connect to the database and when these processes a repeatedly connecting over a period of time. What are the advantages of using connection pooling? Using connection pooling saves database resources and enables the DBA to better manage the precious database resources. Using connection pooling is a decision to be discussed with the DBA who will have to do the majority of work.

Character Sets in the Database and PHP

When working with databases, there is frequently the issue of character sets to contend with. Oracle stores data in the character set defined by the parameter NLS_CHARACTERSET, which is defined at the creation time and generally cannot be easily changed. Changing of the character sets is supported if and only if the new character set is a superset of the previous character set. Databases can be corrupted when an unsupported character set change is attempted. Most of the time, the only realistic way to change character sets is the export/import, which can take quite a while with terabyte sized databases.

Fortunately for PHP programmers, Oracle also converts the data sent to the client into the client specified character set. There is an environment variable that drives that conversion. Let's create yet another table in the SCOTT schema:

CREATE TABLE TEST3

  (

    TKEY NUMBER(10,0),

    TVAL VARCHAR2(64)

  )

A single row was inserted into that table, containing the following values:

(1,'Überraschung'). The word die Überraschung is German for surprise, and was chosen because of the character at the beginning. This mark above the character U is known as an umlaut. Now, let's create a small PHP script that is a small modification of the script from Listing 9-2 earlier in this chapter (see Listing 9-9).

Listing 9-9. A Small PHP Script

<?php

$QRY = "select * from test3";

try {

    $dbh = oci_new_connect("scott", "tiger", "local");

    if (!$dbh) {

        $err = oci_error();

        throw new exception($err['message']);

    }

    $sth = oci_parse($dbh, $QRY);

    if (!oci_execute($sth)) {

        $err = oci_error($dbh);

        throw new exception($err['message']);

    }

    while ($row = oci_fetch_array($sth, OCI_NUM)) {

        foreach ($row as $r) {

            printf("%-12s", $r);

        }

        print "
";

    }

}

catch(exception $e) {

    print "Exception:";

    print $e->getMessage() . "
";

    exit(-1);

}

?>

This script selects everything from the table TEST3 and displays it on the standard output. There is nothing particularly interesting about this script. It is shown because of the following:

First execution:

unset NLS_LANG

./script9.8.php

1           Uberraschung

Second execution:

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

./scriptDB.8.php

1           Überraschung

The output of the script differs, depending on the environment variable NLS_LANG. The syntax for the NLS_LANG is <Language>_<Territory>.Character set. The exact syntax, with the examples, is also described in the Oracle documentation, which we wholeheartedly recommend. In the first invocation, there was no NLS_LANG variable defined; Oracle used the default character set from the system, which is US7ASCII on the machine used to develop examples for this book. The output of the script doesn't contain any characters that do not conform to US7ASCII standard; the word is written out as Uberraschung, without the umlaut (the little dots above the letter U). The second time, with the NLS_LANG properly defined, the output was correct: it contained the umlaut characters.

If control with the NLS_LANG doesn't appeal to you, or if your scripts have to display the output in various character sets, it can be specified during the connect time. Character set is actually the fourth argument to the oci_connect. Instead of using the NLS_LANG variable, we could have written oci_connect("scott","tiger","local","AL32UTF8") and the output would also contain the umlaut. Oracle names for character sets can be looked in the documentation and the database itself. The valid names are in the table V$NLS_VALID_VALUES. Oracle supports over 200 different character sets. For details about the particular character set, consult the Oracle documentation.

Of course, in order for PHP to be able to display the content correctly, you should also set iconv.output_encoding to the correct character set, so that the output is shown correctly. I usually set the iconv parameters, like this:

iconv.input_encoding = UTF-8

iconv.internal_encoding = UTF-8

iconv.output_encoding = UTF-8

The input_encoding parameter is not used for anything at this point; it is set just for completeness. This way, PHP will be able to use the correct character set for output and my strings will be properly formatted.

Summary

In this chapter, we covered using the OCI8 extension in detail. The most important feature was the array interface. The array interface makes PHP load scripts execute an order of magnitude faster than without it, but it does require some specific features of OCI8 interface, namely the OCI-Collection class. We have also covered working with LOB types, cursors, and bind variables, which can become handy when developing web applications. Features such as character sets and connection pooling have become an integral part of the modern application systems. With the new versions of Oracle RDBMS, new features will likely be added to the OCI8 interface, but for now, this should provide a reasonably complete coverage of the OCI8 features.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 9: Database Integration III

Create new playlist

Sign In

Sign Up

C H A P T E R 9

Database Integration III

Introduction to Oracle RDBMS

The Basics: Connecting and Executing SQL

Array Interface

PL/SQL Procedures and Cursors

Working with LOB types

Connecting to DB Revisited: Connection Pooling

Character Sets in the Database and PHP

Summary

Table of Contents for
Chapter 9: Database Integration III