Database problems are unavoidable. No matter how well your system may be configured and managed, problems will still occur. As an Oracle DBA, it is your responsibility to resolve database issues in a timely manner to ensure maximum database uptime. To do this, you need to be aware of the diagnostic data that Oracle provides to aid in the troubleshooting process.
The alert log provides a high-level record of the significant events that occur in a database instance. It is not limited strictly to logging errors. The alert log is an excellent source of diagnostic data and is generally considered the first place to check when an error occurs.
When more detailed diagnostic data is required, Oracle's tracing functionality provides an in-depth view into the internal workings of the database. Trace files are automatically produced when background processes encounter problem situations. Trace files can also be produced to analyze SQL execution for performance purposes. The data in these trace files can provide an overwhelming amount of diagnostic data.
In the following sections, you will learn to work with the alert log and with Oracle trace files. You'll learn what types of diagnostic data they provide, how to control their size and location, and how they can be used to troubleshoot database problems.
A primary source of database diagnostic information is the alert log. The alert log is a log file created and maintained by Oracle that logs and timestamps significant events that occur in the database. These significant events include all of the following (and more):
Administrative create, alter, and drop events
Redo log switches
Checkpoints (optional)
Recovery operations
Database startup and shutdown
ALTER SYSTEM operations
Database errors
DBMS_JOB and DBMS_SCHEDULER job errors
Therefore, the alert log represents a valuable record of database events that can be utilized for diagnostic purposes.
In the following sections, you will learn how the alert log is used by Oracle. You will also learn how you can use the alert log for diagnostic purposes. You will see different ways of viewing alert log data, and you will learn how to search for specific entries. Lastly, you'll learn a few alert log maintenance techniques.
The alert log file is automatically created by Oracle. By default, the file is named alert_SID.log, where SID is the Oracle system identifier. The file is created in the path identified by the BACKGROUND_DUMP_EST initialization parameter. For example, assume that the following entries exist in your init.ora file or SPFILE:
DB_NAME=lnx1
BACKGROUND_DUMP_DEST=/data1/admin/lnx1/bdump
From this information, you can determine that the full path to your alert log would be
/data1/admin/lnx1/bdump/alert_lnx1.log
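If you are unsure of the configured location, you can also confirm it from within the database. The following query is a quick sketch using the V$PARAMETER view:

```sql
-- Confirm the alert log directory and instance name from the running instance
SELECT name, value
FROM   v$parameter
WHERE  name IN ('background_dump_dest', 'instance_name');
```

From the two values returned, you can assemble the full alert log path using the alert_SID.log naming convention described above.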
When Oracle needs to write an alert log entry, it will attempt to open this file in APPEND mode. If it isn't found, it will create a new file using the naming method described at the beginning of this section. It will then write the entry and close the file.
The alert log can provide a wealth of information regarding your database instance. Because it stores a timestamped record of all the significant events in the database, it is invaluable as a diagnostic aid.
For example, suppose you arrive at work one morning and are immediately greeted by the boss, wondering why his nightly sales report wasn't in his e-mail inbox this morning. You check the alert log and find the following (the significant entries are discussed following the listing):
Wed Oct 6 01:37:04 2004
Shutting down instance: further logons disabled
Wed Oct 6 01:37:06 2004
Stopping background process QMNC
Wed Oct 6 01:37:06 2004
Stopping background process CJQ0
Wed Oct 6 01:37:08 2004
Stopping background process MMNL
Wed Oct 6 01:37:09 2004
Stopping background process MMON
Wed Oct 6 01:37:10 2004
Shutting down instance (immediate)
License high water mark = 93
Wed Oct 6 01:37:10 2004
Stopping Job queue slave processes
Wed Oct 6 01:37:10 2004
Job queue slave processes stopped
Wed Oct 6 01:42:43 2004
Shutting down instance (abort)
License high water mark = 93
Instance terminated by USER, pid = 6132
Wed Oct 6 06:17:47 2004
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
KCCDEBUG_LEVEL = 0
Using LOG_ARCHIVE_DEST_1 parameter default value as /apps/oracle/oracle10gR1/dbs/arch
Autotune of undo retention is turned on.
Dynamic strands is set to TRUE
Running with 2 shared and 42 private strand(s). Zero-copy redo is FALSE
IMODE=BR ILAT =42
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 10.1.0.2.0.
System parameters with non-default values:
  processes                = 350
  __shared_pool_size       = 150994944
  __large_pool_size        = 4194304
  __java_pool_size         = 8388608
  filesystemio_options     = setall
  dbwr_io_slaves           = 2
  sga_target               = 524288000
  control_files            = /data1/oradata/incdev/control01.ctl, /data1/oradata/incdev/control02.ctl, /apps/oracle/oradata/incdev/control01.ctl, /apps/oracle/oradata/incdev/control02.ctl
  db_block_size            = 8192
  __db_cache_size          = 356515840
  compatible               = 10.1.0.2.0
  log_buffer               = 2097152
  db_file_multiblock_read_count= 16
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                =
  dispatchers              = (PROTOCOL=TCP) (SERVICE=irdbXDB)
  job_queue_processes      = 3
  background_dump_dest     = /data1/admin/incdev/bdump
  user_dump_dest           = /data1/admin/incdev/udump
  max_dump_file_size       = 250M
  core_dump_dest           = /data1/admin/incdev/cdump
  sort_area_size           = 65536
  db_name                  = incdev
  open_cursors             = 1500
  optimizer_mode           = CHOOSE
  parallel_automatic_tuning= TRUE
  pga_aggregate_target     = 262144000
Deprecated system parameters with specified values:
  parallel_automatic_tuning
End of deprecated system parameter listing
PMON started with pid=2, OS id=6136
MMAN started with pid=3, OS id=6138
DBW0 started with pid=4, OS id=6140
LGWR started with pid=5, OS id=6142
CKPT started with pid=6, OS id=6144
SMON started with pid=7, OS id=6146
RECO started with pid=8, OS id=6148
Wed Oct 6 06:17:50 2004
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
CJQ0 started with pid=9, OS id=6150
Wed Oct 6 06:17:51 2004
starting up 1 shared server(s) ...
Wed Oct 6 06:17:51 2004
ALTER DATABASE MOUNT
Wed Oct 6 06:17:51 2004
Controlfile identified with block size 16384
Wed Oct 6 06:17:55 2004
Setting recovery target incarnation to 1
Wed Oct 6 06:17:55 2004
Successful mount of redo thread 1, with mount id 2177287387
Wed Oct 6 06:17:55 2004
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Wed Oct 6 06:17:56 2004
ALTER DATABASE OPEN
Wed Oct 6 06:17:56 2004
Beginning crash recovery of 1 threads
 attempting to start a parallel recovery with 3 processes
 parallel recovery started with 3 processes
Wed Oct 6 06:17:57 2004
Started first pass scan
Wed Oct 6 06:17:57 2004
Completed first pass scan
 4 redo blocks read, 3 data blocks need recovery
Wed Oct 6 06:17:57 2004
Started redo application at
 Thread 1: logseq 88, block 64844, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 1 Seq 88 Reading mem 0
  Mem# 0 errs 0: /data1/oradata/incdev/redo01.log
  Mem# 1 errs 0: /apps/oracle/oradata/incdev/redo01.log
Wed Oct 6 06:17:57 2004
Completed redo application
Wed Oct 6 06:17:58 2004
Completed crash recovery at
 Thread 1: logseq 88, block 64848, scn 0.1205991
 3 data blocks read, 3 data blocks written, 4 redo blocks read
Wed Oct 6 06:17:59 2004
Thread 1 advanced to log sequence 89
Maximum redo generation record size = 132096 bytes
Maximum redo generation change vector size = 116476 bytes
Private_strands 19 at log switch
Thread 1 opened at log sequence 89
  Current log# 2 seq# 89 mem# 0: /data1/oradata/incdev/redo02.log
  Current log# 2 seq# 89 mem# 1: /apps/oracle/oradata/incdev/redo02.log
Successful open of redo thread 1
Wed Oct 6 06:17:59 2004
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Oct 6 06:17:59 2004
SMON: enabling cache recovery
Wed Oct 6 06:18:00 2004
Successfully onlined Undo Tablespace 1.
Wed Oct 6 06:18:00 2004
SMON: enabling tx recovery
Wed Oct 6 06:18:00 2004
Database Characterset is WE8ISO8859P1
Wed Oct 6 06:18:00 2004
Published database character set on system events channel
Wed Oct 6 06:18:00 2004
All processes have switched to database character set
Wed Oct 6 06:18:04 2004
Starting background process QMNC
QMNC started with pid=19, OS id=6170
Wed Oct 6 06:18:05 2004
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Oct 6 06:18:06 2004
Starting background process MMON
Starting background process MMNL
MMON started with pid=20, OS id=6172
MMNL started with pid=21, OS id=6174
Wed Oct 6 06:18:07 2004
Completed: ALTER DATABASE OPEN
From the information in this alert log, you can determine a great deal of detailed information regarding the events of the previous night (each item in the following list corresponds to an entry in the preceding alert log snippet):
At 1:37:04 A.M., the database was issued a shutdown command (most likely using the immediate clause, because it mentions that further logons are disabled).
Six seconds later (at 1:37:10 A.M.), you can confirm that it was indeed a shutdown immediate request. You can also see that the licensing high water mark reached 93 for the duration of the instance.
About five minutes later, at 1:42:43 A.M., you can see that the shutdown immediate request was replaced with a shutdown abort request. Apparently, whoever was shutting down the instance got tired of waiting for things to shut down gracefully.
Just below that, you'll find the entry:
Instance terminated by USER, pid = 6132
While this entry doesn't necessarily identify which user was responsible, it does confirm that it was indeed a user process that issued the shutdown command. If this line had identified a background process name such as PMON or SMON, we could surmise that the background process had an error that forced the instance to be shut down. You could then look to the appropriate process trace file for details.
At 6:17:47 A.M., the database was restarted. You can see the startup process as it occurs. You can also see the initialization parameters used at startup time.
You can see that the instance has a deprecated initialization parameter (parallel_automatic_tuning) defined. This parameter should be removed from your init.ora file.
At 6:17:56 A.M., Oracle started crash recovery on the instance. This is standard whenever an instance is brought down with the abort option.
At 6:17:58 A.M., Oracle completed the crash recovery and continued with the instance startup procedures.
At 6:18:07 A.M., the database was opened successfully.
As you can see, the alert log provides you with a wealth of information about your database, much of which is not available by any other means.
The alert log should be checked on a daily basis for errors or other events that may have occurred without your knowledge. To view the alert log, you can open the file directly through the operating system using any text editor or viewer. To view the alert log through the EM Database Control application, do the following:
Navigate to the Database Home screen.
Under Related Links at the bottom of the screen, select Alert Log Content. The contents of the alert log will be displayed on the screen, as shown in Figure 14.8.
Be aware that when viewing the alert log content through EM Database Control, only 100K of data will be displayed. Therefore, the entire contents of the file may not be viewable.
If you prefer to view alert log data through the database itself, you can take advantage of Oracle's external tables feature. This technique creates a table named ALERT_LOG that will access the alert log file stored externally to the database. Keep in mind that this is just one of the many ways in which this can be accomplished.
To create the table, you must first create a directory object that references the location of the alert log file. Remember that this will be the location defined in the BACKGROUND_DUMP_DEST initialization parameter. Here's an example:
SQL> create directory BDUMP as '/data1/admin/lnx1/bdump';

Directory created.
Next, you can create the table, referencing the directory object that you just created. The table will have a single column named log_line that holds one line of the alert log, as shown here:
SQL> create table alert_log (
       log_line varchar2(80)
     )
     organization external (
       type oracle_loader
       default directory BDUMP
       access parameters (
         records delimited by newline
       )
       location('alert_lnx1.log')
     )
     reject limit 1000;

Table created.
The table can now be queried, as follows:
SQL> select * from alert_log
     where log_line like '%ORA-%';

LOG_LINE
-----------------------------------------------------------
ORA-12012: error on auto execute of job 756
ORA-04068: existing state of packages has been discarded
ORA-04063: package body "JNEXUS.PKG_AC_DIFF" has errors
ORA-06508: PL/SQL: could not find program unit being called
ORA-06512: at line 1

5 rows selected.
In this example, you can see that five lines contained ORA errors.
As new events are recorded in the alert log, they are appended to the end of the existing alert log file. Over time, this file can grow to a significant size, so it requires periodic maintenance to keep its growth in check.
Alert log maintenance is relatively simple, but it requires some knowledge of the underlying mechanism used by Oracle to write to the file. When Oracle logs an event to the alert log, it performs the following operations on the file:
Open the alert log file using APPEND mode. If the file exists, move the file pointer to the end of the file. If the file doesn't exist, create it.
Write the alert data to the file.
Close the file.
Now, compare Oracle's method with the following method:
Mount the database.
Open the alert log file using APPEND mode. If the file exists, move the file pointer to the end of the file. If the file doesn't exist, create it.
Open the database.
Write the alert data to the file as events occur.
Close the file when database shutdown occurs.
In the first method, Oracle has almost no relationship with the alert log file. Oracle doesn't care if it even exists. It will just create a new one. The only relationship to the file is for the few microseconds that the log entry is being written.
In the second method, Oracle has a strong relationship with the file and requires that the file exist for the duration of the instance (much like a database datafile). This method would likely require file maintenance to be performed when the database is down.
Because Oracle uses the first method, maintenance is much simpler. The only real question is how long to keep the alert log data.
A historical alert log file is of dubious value, and how long alert log data should be maintained is more a question of company policy than anything else. However, as the preceding Real World Scenario demonstrated, you never know when having historical alert log data may be useful.
As an example, let's assume that you are required to keep the last 30 days' worth of log information. Let's also assume that you want each file to represent one day of database usage. The following pseudocode could be used to build a rudimentary maintenance script:
Every night at midnight, rename the current alert log file to alert_log.YYYYMMDD (where YYYYMMDD represents the current date).
Check to see if any existing logs are older than 30 days.
If found, delete the old logs.
End.
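The pseudocode above can be sketched as a small shell function. The directory path and SID shown in the comments are assumptions from the earlier examples; adjust them for your own environment before scheduling anything like this from cron.

```shell
# A minimal rotation sketch. The BDUMP path and SID passed in are
# assumptions -- adjust them for your environment.
rotate_alert_log() {
    bdump=$1      # directory from BACKGROUND_DUMP_DEST, e.g. /data1/admin/lnx1/bdump
    sid=$2        # Oracle system identifier, e.g. lnx1
    alert_log="$bdump/alert_${sid}.log"

    # Step 1: rename the current alert log with a date suffix.
    # Oracle will simply create a fresh file on its next write.
    if [ -f "$alert_log" ]; then
        mv "$alert_log" "$alert_log.$(date +%Y%m%d)"
    fi

    # Step 2: delete archived copies older than 30 days.
    find "$bdump" -name "alert_${sid}.log.*" -mtime +30 -exec rm -f {} \;
}
```

A nightly cron entry at midnight could then call this function with the appropriate directory and SID for each instance on the server.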
As you can see, the alert log file can simply be renamed (or deleted) without incident. When Oracle needs to write a new alert log entry, it will create a new one.
If searching historical alert log entries is important, you could also store alert log entries into a database table (or external table, as shown earlier in this chapter). This will allow you to use standard SQL to query the alert log history.
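For instance, using the ALERT_LOG external table created earlier, a nightly job could copy the current entries into a permanent table before the file is rotated. The archive table's name and structure here are illustrative assumptions, not Oracle-supplied objects:

```sql
-- Hypothetical archive table; the name and columns are assumptions
CREATE TABLE alert_log_history (
  archived_date DATE,
  log_line      VARCHAR2(80)
);

-- Copy the current alert log contents before the file is renamed
INSERT INTO alert_log_history
  SELECT SYSDATE, log_line FROM alert_log;
COMMIT;
```

Because the history lives in an ordinary table, it survives alert log rotation and can be searched with standard SQL predicates.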
A server-generated alert is a notification from Oracle that indicates a potential problem in the database. These alerts generally indicate that the threshold level for a monitored metric has been exceeded.
NOTE
Server-generated alerts were covered in detail in Chapter 8, "Understanding Automatic Database Management."
By default, Oracle computes seven distinct metrics based on the contents of the alert log. Where indicated, these metrics have thresholds defined to automatically generate alerts in certain conditions. In short, this means that Oracle automatically monitors the alert log file for you and notifies you if it finds troublesome messages.
These alert log metrics include the following:
Alert log error trace file Oracle detects any references to system-generated trace files. These are generally trace files produced by a background process that encounters an error situation.
Alert log name Oracle checks the name and location of the alert log file.
Archiver hung alert log error Oracle detects any ORA-00257 messages (indicating that an ARCH archiver process is hung) in the alert log. Because a hung archiver process will suspend the database, this metric has a critical threshold set. Any appearance of an ORA-00257 message will generate a critical alert.
Data block corruption alert log error Oracle detects error messages relating to database block corruption in the alert log. This metric has a critical threshold set and will generate a critical alert when any block corruption messages appear.
Generic alert log error Oracle detects the presence of any ORA errors in the alert log. This metric has a warning threshold set and will generate a warning alert if the error is an ORA-00600.
Session-terminated alert log error Oracle detects any ORA-00603 session unexpectedly terminated messages in the alert log. This metric has a warning threshold set and will generate a warning alert if any ORA-00603 messages appear.
You can also add your own threshold settings to these metrics to instruct Oracle to monitor for other types of messages.
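Outside of EM Database Control, one way to review alerts that have fired and not yet cleared is to query the DBA_OUTSTANDING_ALERTS data dictionary view:

```sql
-- List server-generated alerts that have not yet been cleared
SELECT reason, suggested_action
FROM   dba_outstanding_alerts;
```

Alerts that have since cleared move to the DBA_ALERT_HISTORY view, which is useful for reviewing past threshold violations.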
Server-generated alerts can be viewed through EM Database Control by following these steps:
Navigate to the Database Home screen.
In the Alerts section of the screen, you can see the most recent server-generated alert notifications, as shown in Figure 14.9.
If any alert log messages are found in this section, you can click the message to see further details. If not, click the Alert History link at the bottom of the screen to display the Alert History screen.
Under the Metric column, look for an entry labeled Generic Alert Log Error Status. If one is found, click it. The Generic Alert Log Error Status screen is displayed, as shown in Figure 14.10. This screen shows you the appearances of ORA messages in the alert log.
When an Oracle background process encounters an error situation, it will generate a trace file. The trace file contains in-depth details about the error and the state of the process at the time of the error. These trace files are intended to help you and/or Oracle Support to diagnose the cause of the error.
However, tracing is not limited strictly to error diagnosis, as you are probably well aware. Oracle's SQL trace functionality allows you to trace user processes (sessions) as well. SQL trace produces incredibly detailed output of all SQL executed by a session (or for all sessions, if enabled). These trace files can be used to diagnose poorly performing SQL statements. They can also help you locate inefficient code in applications, such as excessive parsing, unnecessary database calls, and so on.
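In Oracle 10g, one way to enable SQL trace for another user's session is the DBMS_MONITOR package. In this sketch, the SID and serial number values are placeholders that you would look up for the actual target session:

```sql
-- Find the target session's SID and serial# (SCOTT is an example user)
SELECT sid, serial#, username
FROM   v$session
WHERE  username = 'SCOTT';

-- Enable SQL trace for that session (142 and 533 are placeholder values)
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 142, serial_num => 533, waits => TRUE, binds => FALSE);

-- Later, turn tracing back off for the same session
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 142, serial_num => 533);
```

The resulting trace file is written to the USER_DUMP_DEST location described in the next section.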
In the following sections, you will learn how to control the location and sizing of trace files. You'll also learn the naming conventions Oracle follows when creating trace files. Next, you'll learn how to add an identifier to user session trace files to make them easier to identify.
NOTE
Please note that the purpose of this section is not to analyze the contents of trace files. That is beyond the scope of this book.
The location and sizing of trace files is controlled by the following initialization parameters:
BACKGROUND_DUMP_DEST The BACKGROUND_DUMP_DEST parameter defines the operating system path where background process trace files will be created. Trace files associated with parallel query slave processes and job queue processes are also created in this location.
This also happens to be the location of the alert log file. This is because Oracle classifies the alert log as a special type of trace file.
USER_DUMP_DEST The USER_DUMP_DEST parameter defines the operating system path where user session trace files will be created. User session trace files are created automatically when the SQL trace feature is enabled for a session (or for the instance).
MAX_DUMP_FILE_SIZE The MAX_DUMP_FILE_SIZE parameter specifies the maximum size for trace files. The default value for this parameter is UNLIMITED, which places no restriction on trace file size.
This parameter applies to all types of trace files.
If a value is specified for this parameter, it must be a numeric value, optionally suffixed with the letter K (specifying kilobytes) or the letter M (specifying megabytes). If no suffix is provided, the value is interpreted as a number of operating system blocks.
When a session trace file reaches the maximum size, tracing will cease. The resultant trace file is truncated at the point when it reached maximum size. This can result in important tracing data being lost.
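As a sketch, the parameter can be changed at either the instance or the session level. Note that SCOPE=BOTH assumes the instance was started with an SPFILE:

```sql
-- Cap trace files at roughly 10 megabytes for the whole instance
ALTER SYSTEM SET MAX_DUMP_FILE_SIZE = '10M' SCOPE=BOTH;

-- Or remove the limit for the current session only
ALTER SESSION SET MAX_DUMP_FILE_SIZE = UNLIMITED;
```

A session-level override like the second statement is useful when you know a particular trace will be large and you don't want it truncated.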
Oracle trace files receive system-generated filenames. These names follow one of two formatting conventions, depending on the type of process being traced. Background processes use this format:
sid_process_name_process_id.trc
User processes use this format:
sid_ora_process_id.trc
Table 14.1 shows some examples.
SID | Session Type | PID | Trace File Name | Trace File Location |
---|---|---|---|---|
lnx1 | User | 12995 | lnx1_ora_12995.trc | USER_DUMP_DEST |
lnx1 | PMON | 1766 | lnx1_pmon_1766.trc | BACKGROUND_DUMP_DEST |
lnx1 | User | 4867 | lnx1_ora_4867.trc | USER_DUMP_DEST |
lnx1 | LGWR | 2247 | lnx1_lgwr_2247.trc | BACKGROUND_DUMP_DEST |
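To locate the trace file for your own session, one common technique is to retrieve the operating system process ID of your server process from V$PROCESS and then assemble the file name by hand using the format shown above:

```sql
-- Find the OS process ID (SPID) of the current session's server process
SELECT p.spid
FROM   v$process p, v$session s
WHERE  p.addr = s.paddr
AND    s.sid  = (SELECT sid FROM v$mystat WHERE rownum = 1);
```

If this query returned 12995 on instance lnx1, the session's trace file would be lnx1_ora_12995.trc in the USER_DUMP_DEST directory.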
Given Oracle's somewhat cryptic file-naming conventions, it can be difficult to locate a specific trace file, unless you happen to know the process ID for the session. To alleviate this problem, the TRACEFILE_IDENTIFIER parameter can be used.
The TRACEFILE_IDENTIFIER parameter allows you to specify a text string that will be appended to the names of user session trace files. This identifier makes it easier to locate specific trace files, especially on busy systems.
For example, suppose you have a script that runs each night at midnight. It usually runs in about five minutes, but lately it has been taking considerably longer to finish. You decide to use SQL tracing to diagnose the problem. At the beginning of your SQL script, add the following lines:
ALTER SESSION SET SQL_TRACE = TRUE;
ALTER SESSION SET TIMED_STATISTICS = TRUE;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TIM';
Assuming that the process ID for your job was 10322, you should find a file named lnx1_ora_10322_TIM.trc in the USER_DUMP_DEST directory.
NOTE
The TRACEFILE_IDENTIFIER parameter applies only to user session trace files. It has no effect on background process trace files.