7.6. Loading Data with SQL*Loader

SQL*Loader is a program that reads data files in many possible formats, parses the data (breaks it into meaningful pieces), and loads the data into database tables. Like Data Pump, it has myriad options, and a hefty book could be devoted to its use. The Oracle Database Utilities manual (part B10825-01) devotes 275 pages of reference material to SQL*Loader alone. This section will not be so comprehensive, nor will it attempt to cram all possible uses of SQL*Loader into a few short pages. Instead, we will cover the basics and teach you what we feel is necessary for the exam.

SQL*Loader uses the following file types:

Log A mandatory file. If you do not specify a log file, SQL*Loader will try to create one in the current directory with the name of your control file and a .log filename extension. If SQL*Loader cannot create the log file, execution is aborted. The log file contains a summary of the SQL*Loader session, including any errors that were generated.

Control A mandatory file. This file tells SQL*Loader where the other files are, how to parse and load the data, and which tables to load the data into. It can also contain the data itself.

Data Optional files that hold the data SQL*Loader reads and loads into the database. Because the data can instead be placed in the control file, separate data files are not always needed.

Bad Hold the "bad" data records—those that did not pass validation by either SQL*Loader or the database. Bad files are created only if one or more records fail validation. Just as with the log file, if you do not specify a bad file, SQL*Loader will create one, using the name of your data file and a .bad filename extension.

Discard Hold data records that did not get loaded because they did not satisfy the record selection criteria (the WHEN clause) in the control file. Discard files are created only if records were actually discarded.

SQL*Loader provides a robust toolkit for building data-loading programs for your Oracle 10g database. It can run either on the database server or on a client machine.

The following section will show you how to employ SQL*Loader to load data into your database tables.

7.6.1. Specifying SQL*Loader Command-Line Parameters

To invoke the SQL*Loader program, use the command sqlldr followed by one or more command-line parameters. These parameters can be identified positionally on the command line or with keyword=value pairs. You can mix positional and keyword notation, provided that all the keyword-notation parameters appear after all the positional parameters.

For example, to invoke SQL*Loader, telling it to use the connect string system/password and use the control file regions.ctl, you can execute any of the following command lines:

sqlldr system/password regions.ctl
sqlldr control=regions.ctl userid=system/password
sqlldr system/password control=regions.ctl

The command-line parameters include those in Table 7.7, which are shown in positional notation order (through bindsize), and can be seen by executing the sqlldr command with no parameters.

Table 7.7. SQL*Loader Command-Line Parameters

Parameter    Description
userid       The database connect string, for example, scott/tiger@prod.
control      The name of the control file.
log          The name of the log file. The default is the control filename with a .log extension.
bad          The name of the bad file. The default is the datafile name, but with a .bad extension.
data         The name of the datafile. The default is the control filename with a .dat extension.
discard      The name of the discard file. The default is the datafile name, but with a .dsc extension.
discardmax   The maximum number of discards to allow before failing. The default is all.
skip         The number of records to skip before starting to load. The default is none.
load         The number of records to load. The default is all.
errors       The number of errors to allow before failing. The default is 50.
rows         The number of rows in a conventional path bind array or between direct path data saves. The default is 64 rows in conventional path mode and all rows in direct path mode.
bindsize     The size of the conventional path bind array in bytes. The default is 256KB.
direct       If TRUE, use direct path loading. The default is FALSE, indicating conventional path.
parfile      The name of a file containing additional command-line parameters. There is no default.

Many of the command-line parameters can also appear in the control file. When they appear as both command-line parameters and in the control file, the command-line options take precedence.
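
For example, several of these parameters can be collected in a parameter file and passed with the parfile option. The file name and parameter values shown here are arbitrary illustrations, not defaults. The parameter file regions.par might contain:

userid=hr/hr
control=regions.ctl
log=regions.log
errors=10
direct=false

The invocation would then be:

sqlldr parfile=regions.par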

7.6.2. Specifying Control File Options

The control file contains commands that tell SQL*Loader where to find the data, how to parse it, how to load it, what to do when errors occur, and what to do with records that fail validation. A control file has two or three main sections. The first contains session-wide information, such as the log filename, the bind size, and whether direct or conventional path loading will be used. The second section contains one or more INTO TABLE blocks; these blocks specify the target tables and columns. The third section, if present, is the actual data. Comments can appear anywhere in the control file (except in the data lines) and should be used liberally; the control file language can be somewhat cryptic, so generous use of comments is encouraged. Comments in a control file start with a double dash and end with a new line. Aside from comments and an optional OPTIONS clause, the control file must begin with the line LOAD DATA or CONTINUE LOAD DATA, and it must also have an INTO TABLE clause, together with directions on how to parse the data and which columns to load it into.

The best way to learn how to construct a control file is to look at examples and then use variations of them to build your control file. This section gives you several examples, but is certainly not a comprehensive sampling. Again, the intent is to present you with enough information to get you going.

NOTE

For a comprehensive reference, see the Oracle manual Oracle Database Utilities 10g.

The first example is rather simple and straightforward. The control file contains both control file commands and the data. The command line is:

sqlldr hr/hr control=regions.ctl

The control file regions.ctl contains the following:

LOAD DATA
-- Control file begins with LOAD DATA
INFILE *
-- The * tells SQL*Loader the data is inline
INTO TABLE regions TRUNCATE
-- truncate the target table before loading
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
-- how to parse the data
 (region_id, region_name)
-- positional mapping of data file fields to table columns
-- lines following BEGINDATA are loaded
-- no comments are allowed after BEGINDATA
BEGINDATA
1,"Europe"
2,"Americas"
3,"Asia"
4,"Middle East and Africa"

The LOAD DATA command tells SQL*Loader that you are beginning a new data load. If you are continuing a data load that was interrupted, specify CONTINUE LOAD DATA instead. The command INFILE * tells SQL*Loader that the data will appear in the control file. The table REGIONS is loaded. The keyword TRUNCATE tells SQL*Loader to truncate the table before loading it. Instead of TRUNCATE, you can specify INSERT (the default), which requires the table to be empty at the start of the load; APPEND, which tells SQL*Loader to add the data to any existing data in the table; or REPLACE, which tells SQL*Loader to issue a DELETE to empty out the table before loading. A DELETE differs from a TRUNCATE in that DELETE DML triggers fire and the DELETE can be rolled back.
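
Only the keyword following the table name changes to select one of these load methods; the following fragments (not complete control files) illustrate the alternatives:

INTO TABLE regions INSERT    -- the default; table must be empty
INTO TABLE regions APPEND    -- add rows to the existing data
INTO TABLE regions REPLACE   -- DELETE existing rows, then load
INTO TABLE regions TRUNCATE  -- TRUNCATE the table, then load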

The lines in the control file that follow BEGINDATA contain the data to parse and load. The parsing specification tells SQL*Loader that the data fields are comma-delimited and that text data can be enclosed in double quotation marks; these quotation marks are not loaded as part of the data. The list of columns enclosed in parentheses identifies the table columns that will be loaded with the data fields, in order.

In the second example, the same data is loaded into the same table, but it is located in a standalone file called regions.dat and is in the following pipe-delimited, fixed format:

1|Europe                 |
2|Americas               |
3|Asia                   |
4|Middle East and Africa |

The command line is:

sqlldr hr/hr control=regions.ctl

The content of the control file is:

OPTIONS (DIRECT=TRUE)
-- use direct path loading; the OPTIONS clause precedes LOAD DATA
LOAD DATA
INFILE  '/apps/seed_data/regions.dat'
BADFILE '/apps/seed_data/regions.bad'
DISCARDFILE '/apps/seed_data/regions.dsc'
-- data file, bad file, and discard file specs
INTO TABLE regions APPEND
-- add this data to the existing target table
(region_id   POSITION(1)    INTEGER EXTERNAL
,region_name POSITION(3:25) NULLIF region_name = BLANKS
) -- how to parse the data

The control file tells SQL*Loader where to find the data file (INFILE) as well as the bad and discard files (BADFILE and DISCARDFILE). The OPTIONS line specifies direct path loading. With fixed-format data, the column specification identifies the starting and ending positions. A numeric datatype can be identified as INTEGER EXTERNAL. The directive NULLIF region_name = BLANKS tells SQL*Loader to set the region_name column to NULL if the data field contains only white space.
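
For comparison, the same pipe-delimited file could also be parsed with a delimiter-based (variable-format) field list instead of fixed positions. The following sketch is not part of the original example; the TRIM expression is an assumption used to strip the trailing spaces present in the sample data:

LOAD DATA
INFILE '/apps/seed_data/regions.dat'
INTO TABLE regions APPEND
FIELDS TERMINATED BY '|'
(region_id
,region_name "TRIM(:region_name)"
)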

You shouldn't have to know the minutiae of how to tell SQL*Loader precisely how to parse data—the options are far too arcane to expect you to recite them off the top of your head for an exam—but knowing that SQL*Loader can read both fixed-format and variable-format data is essential. More important to your job is knowing about direct path loads and unusable indexes, which are discussed in the next section.

7.6.2.1. Using Direct Path Loading

Direct path loading is a SQL*Loader option that allows you, under certain conditions, to use the direct path interface to load data into a table. The direct path interface can be significantly faster than conventional path loading. With conventional loading, SQL*Loader loads data into a bind array and passes it to the database engine to process with an INSERT statement. Full undo and redo mechanisms operate on conventional path loads.
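
For example, the bind array used by a conventional path load can be sized from the command line with the rows and bindsize parameters listed in Table 7.7; the values shown here are arbitrary illustrations:

sqlldr hr/hr control=regions.ctl rows=1000 bindsize=1048576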

With direct path loading, SQL*Loader reads data and passes it to the database via the direct path API. The API formats the data directly into Oracle data blocks in memory and then flushes these blocks, en masse, directly to the datafiles using multiblock I/O, bypassing the buffer cache as well as the redo and undo mechanisms. Direct path loads always write above the table's high-water mark and therefore always increase the number of data blocks that the table is actually using.
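
Direct path can be requested either in the control file's OPTIONS clause, as shown earlier, or on the command line; the rows value here is an arbitrary illustration of the data-save interval:

sqlldr hr/hr control=regions.ctl direct=true rows=10000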

The important thing to remember about direct path load is that it is fast, but has restrictions, including the following:

  • Indexes are rebuilt at the end of a direct path load. If unique constraint violations are found, the unique index is left in an unusable state. To correct the index, you must find and remove the constraint violations and then rebuild the index.

    NOTE

    Unusable indexes are a possible result of direct path loading. Make sure you know what causes an unusable index and how to fix it; a sketch of finding and rebuilding one follows this list.

  • Direct path load cannot occur if there are active transactions against the table being loaded.

  • Triggers do not fire during direct path loads.

  • Direct path loading into clustered tables is not supported.

  • During direct path loads, foreign key constraints are disabled at the beginning of the load and then reenabled after the load.
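
When a direct path load leaves an index unusable, you can confirm its status in the data dictionary and rebuild it once the offending rows have been removed. Here is a minimal sketch, assuming the HR schema and a hypothetical index name of REGIONS_PK:

SELECT index_name, status
  FROM dba_indexes
 WHERE owner = 'HR'
   AND status = 'UNUSABLE';

-- after removing the rows that violated the unique constraint
ALTER INDEX hr.regions_pk REBUILD;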
