14.3. Diagnostic Sources

Database problems are unavoidable. No matter how well your system may be configured and managed, problems will still occur. As an Oracle DBA, it is your responsibility to resolve database issues in a timely manner to ensure maximum database uptime. To do this, you need to be aware of the diagnostic data that Oracle provides to aid in the troubleshooting process.

The alert log provides a high-level record of the significant events that occur in a database instance. It is not limited strictly to logging errors. The alert log is an excellent source of diagnostic data and is generally considered the first place to check when an error occurs.

When more detailed diagnostic data is required, Oracle's tracing functionality provides an in-depth view into the internal workings of the database. Trace files are automatically produced when background processes encounter problem situations. Trace files can also be produced to analyze SQL execution for performance purposes. The data in these trace files can provide an overwhelming amount of diagnostic data.

In the following sections, you will learn to work with the alert log and with Oracle trace files. You'll learn what types of diagnostic data they provide, how to control their size and location, and how they can be used to troubleshoot database problems.

14.3.1. Using the Oracle Alert Log

A primary source of database diagnostic information is the alert log. The alert log is a log file created and maintained by Oracle that logs and timestamps significant events that occur in the database. These significant events include all of the following (and more):

  • Administrative create, alter, and drop events

  • Redo log switches

  • Checkpoints (optional)

  • Recovery operations

  • Database startup and shutdown

  • ALTER SYSTEM operations

  • Database errors

  • DBMS_JOB and DBMS_SCHEDULER job errors

Therefore, the alert log represents a valuable record of database events that can be utilized for diagnostic purposes.

In the following sections, you will learn how the alert log is used by Oracle. You will also learn how you can use the alert log for diagnostic purposes. You will see different ways of viewing alert log data, and you will learn how to search for specific entries. Lastly, you'll learn a few alert log maintenance techniques.

14.3.1.1. An Overview of the Alert Log

The alert log file is automatically created by Oracle. By default, the file will be named alert_SID.log, where SID is the Oracle system identifier. The file is created using the path identified in the BACKGROUND_DUMP_DEST initialization parameter. For example, assume that the following entries exist in your init.ora or SPFILE file:

DB_NAME=lnx1
BACKGROUND_DUMP_DEST=/data1/admin/lnx1/bdump

From this information, you can determine that the full path to your alert log would be

/data1/admin/lnx1/bdump/alert_lnx1.log

When Oracle needs to write an alert log entry, it will attempt to open this file in APPEND mode. If it isn't found, it will create a new file using the naming method described at the beginning of this section. It will then write the entry and close the file.
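On a running instance, you can also confirm this location by asking the instance itself from SQL*Plus (the value shown here matches the earlier example):

```sql
SQL> show parameter background_dump_dest

NAME                     TYPE        VALUE
------------------------ ----------- ------------------------------
background_dump_dest     string      /data1/admin/lnx1/bdump
```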

14.3.1.2. Alert Log Usage

The alert log can provide a wealth of information regarding your database instance. Because it stores a timestamped record of all the significant events in the database, it is invaluable as a diagnostic aid.

For example, suppose you arrive at work one morning and are immediately greeted by the boss, wondering why his nightly sales report wasn't in his e-mail inbox this morning. You check the alert log and find the following (the key entries are discussed in the numbered list that follows the listing):

Wed Oct 6 01:37:04 2004
Shutting down instance: further logons disabled
Wed Oct 6 01:37:06 2004
Stopping background process QMNC
Wed Oct 6 01:37:06 2004
Stopping background process CJQ0
Wed Oct 6 01:37:08 2004
Stopping background process MMNL
Wed Oct 6 01:37:09 2004
Stopping background process MMON
Wed Oct 6 01:37:10 2004
Shutting down instance (immediate)
License high water mark = 93
Wed Oct 6 01:37:10 2004
Stopping Job queue slave processes
Wed Oct 6 01:37:10 2004
Job queue slave processes stopped
Wed Oct 6 01:42:43 2004
Shutting down instance (abort)
License high water mark = 93
Instance terminated by USER, pid = 6132
Wed Oct 6 06:17:47 2004
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
KCCDEBUG_LEVEL = 0
Using LOG_ARCHIVE_DEST_1 parameter default value as
  /apps/oracle/oracle10gR1/dbs/arch

Autotune of undo retention is turned on.
Dynamic strands is set to TRUE
Running with 2 shared and 42 private strand(s). Zero-copy
  redo is FALSE
IMODE=BR
ILAT =42
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 10.1.0.2.0.
System parameters with non-default values:
  processes                = 350
  __shared_pool_size       = 150994944
  __large_pool_size        = 4194304
  __java_pool_size         = 8388608
  filesystemio_options     = setall
  dbwr_io_slaves           = 2
  sga_target               = 524288000
  control_files
  /data1/oradata/incdev/control01.ctl,
  /data1/oradata/incdev/control02.ctl,
  /apps/oracle/oradata/incdev/control01.ctl,
  /apps/oracle/oradata/incdev/control02.ctl
  db_block_size            = 8192
  __db_cache_size          = 356515840
  compatible               = 10.1.0.2.0
  log_buffer               = 2097152
  db_file_multiblock_read_count= 16
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                =
  dispatchers              = (PROTOCOL=TCP)
  (SERVICE=irdbXDB)
  job_queue_processes      = 3
  background_dump_dest     = /data1/admin/incdev/bdump
  user_dump_dest           = /data1/admin/incdev/udump
  max_dump_file_size       = 250M
  core_dump_dest           = /data1/admin/incdev/cdump
  sort_area_size           = 65536
  db_name                  = incdev
  open_cursors             = 1500
  optimizer_mode           = CHOOSE
  parallel_automatic_tuning= TRUE
  pga_aggregate_target     = 262144000
Deprecated system parameters with specified values:
  parallel_automatic_tuning
End of deprecated system parameter listing
PMON started with pid=2, OS id=6136
MMAN started with pid=3, OS id=6138
DBW0 started with pid=4, OS id=6140
LGWR started with pid=5, OS id=6142
CKPT started with pid=6, OS id=6144
SMON started with pid=7, OS id=6146
RECO started with pid=8, OS id=6148
Wed Oct 6 06:17:50 2004
starting up 1 dispatcher(s) for network address
  '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
CJQ0 started with pid=9, OS id=6150
Wed Oct 6 06:17:51 2004
starting up 1 shared server(s) ...
Wed Oct 6 06:17:51 2004
ALTER DATABASE  MOUNT
Wed Oct 6 06:17:51 2004
Controlfile identified with block size 16384
Wed Oct 6 06:17:55 2004
Setting recovery target incarnation to 1
Wed Oct 6 06:17:55 2004
Successful mount of redo thread 1, with mount id 2177287387
Wed Oct 6 06:17:55 2004
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE  MOUNT
Wed Oct 6 06:17:56 2004
ALTER DATABASE OPEN
Wed Oct 6 06:17:56 2004
Beginning crash recovery of 1 threads
attempting to start a parallel recovery with 3 processes
  parallel recovery started with 3 processes
Wed Oct 6 06:17:57 2004
Started first pass scan
Wed Oct 6 06:17:57 2004
Completed first pass scan
  4 redo blocks read, 3 data blocks need recovery
Wed Oct 6 06:17:57 2004
Started redo application at
  Thread 1: logseq 88, block 64844, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 1 Seq 88 Reading mem 0
  Mem# 0 errs 0: /data1/oradata/incdev/redo01.log
  Mem# 1 errs 0: /apps/oracle/oradata/incdev/redo01.log
Wed Oct 6 06:17:57 2004
Completed redo application
Wed Oct 6 06:17:58 2004
Completed crash recovery at
  Thread 1: logseq 88, block 64848, scn 0.1205991
  3 data blocks read, 3 data blocks written, 4 redo blocks read
Wed Oct 6 06:17:59 2004
Thread 1 advanced to log sequence 89
Maximum redo generation record size = 132096 bytes
Maximum redo generation change vector size = 116476 bytes
Private_strands 19 at log switch
Thread 1 opened at log sequence 89
  Current log# 2 seq# 89 mem# 0: /data1/oradata/incdev/redo02.log
  Current log# 2 seq# 89 mem# 1: /apps/oracle/oradata/incdev/redo02.log
Successful open of redo thread 1
Wed Oct 6 06:17:59 2004
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Oct 6 06:17:59 2004
SMON: enabling cache recovery
Wed Oct 6 06:18:00 2004
Successfully onlined Undo Tablespace 1.
Wed Oct 6 06:18:00 2004
SMON: enabling tx recovery
Wed Oct 6 06:18:00 2004
Database Characterset is WE8ISO8859P1
Wed Oct 6 06:18:00 2004
Published database character set on system events channel
Wed Oct 6 06:18:00 2004
All processes have switched to database character set
Wed Oct 6 06:18:04 2004
Starting background process QMNC
QMNC started with pid=19, OS id=6170

Wed Oct 6 06:18:05 2004
replication_dependency_tracking turned off (no async
  multimaster replication found)
Wed Oct 6 06:18:06 2004
Starting background process MMON
Starting background process MMNL
MMON started with pid=20, OS id=6172
MMNL started with pid=21, OS id=6174
Wed Oct 6 06:18:07 2004
Completed: ALTER DATABASE OPEN

From the information in this alert log, you can determine a great deal about the events of the previous night (each number in the following list corresponds to a key entry in the preceding alert log snippet):

  1. At 1:37:04 A.M., the database was issued a shutdown command (most likely using the immediate clause, because it mentions that further logons are disabled).

  2. Six seconds later (at 1:37:10 A.M.), you can confirm that it was indeed a shutdown immediate request. You can also see that the licensing high water mark reached 93 for the duration of the instance.

  3. About five minutes later, at 1:42:43 A.M., you can see that the shutdown immediate request was replaced with a shutdown abort request. Apparently, whoever was shutting down the instance got tired of waiting for things to shut down gracefully.

    Just below that, you'll find the entry:

    Instance terminated by USER, pid = 6132

    While this entry doesn't necessarily identify which user was responsible, it does confirm that it was indeed a user process that issued the shutdown command. If this line had identified a background process name such as PMON or SMON, we could surmise that the background process had an error that forced the instance to be shut down. You could then look to the appropriate process trace file for details.

  4. At 6:17:47 A.M., the database was restarted. You can see the startup process as it occurs. You can also see the initialization parameters used at startup time.

  5. You can see that the instance has a deprecated initialization parameter (parallel_automatic_tuning) defined. This parameter should be removed from your init.ora file.

  6. At 6:17:56 A.M., Oracle started crash recovery on the instance. This is standard whenever an instance is brought down with the abort option.

  7. At 6:17:58 A.M., Oracle completed the crash recovery and continued with the instance startup procedures.

  8. At 6:18:07 A.M., the database was opened successfully.

As you can see, the alert log provides you with a wealth of information about your database, much of which is not available by any other means.

14.3.1.3. Checking the Alert Log Using EM Database Control

The alert log should be checked on a daily basis for errors or other events that may have occurred without your knowledge. To view the alert log, you can open the file directly through the operating system using any text editor or viewer. To view the alert log through the EM Database Control application, do the following:

  1. Navigate to the Database Home screen.

  2. Under Related Links at the bottom of the screen, select Alert Log Content. The contents of the alert log will be displayed on the screen, as shown in Figure 14.8.

Be aware that when viewing the alert log content through EM Database Control, only 100K of data will be displayed. Therefore, the entire contents of the file may not be viewable.

14.3.1.4. Viewing Alert Log Entries in the Database

If you prefer to view alert log data through the database itself, you can take advantage of Oracle's external tables feature. This technique creates a table named ALERT_LOG that will access the alert log file stored externally to the database. Keep in mind that this is just one of the many ways in which this can be accomplished.

Figure 14.8. EM Database Control Most Recent Alert Log Entries screen

To create the table, you must first create a directory object that references the location of the alert log file. Remember that this will be the location defined in the BACKGROUND_DUMP_DEST initialization parameter. Here's an example:

SQL> create directory BDUMP as '/data1/admin/lnx1/bdump';

Directory created.

Next, you can create the table, referencing the directory object that you just created. The table will have a single column named log_line that holds one line of the alert log, as shown here:

SQL> create table alert_log ( log_line varchar2(80) )
   organization external (
   type oracle_loader
   default directory BDUMP
   access parameters (records delimited by newline)
   location('alert_lnx1.log')
   )
   reject limit 1000;

Table created.

The table can now be queried, as follows:

SQL> select * from alert_log
     where log_line like '%ORA-%';

LOG_LINE
-----------------------------------------------------------
ORA-12012: error on auto execute of job 756
ORA-04068: existing state of packages has been discarded
ORA-04063: package body "JNEXUS.PKG_AC_DIFF" has errors
ORA-06508: PL/SQL: could not find program unit being called
ORA-06512: at line 1

5 rows selected.

In this example, you can see that five lines contained ORA errors.

Keep in mind that the preceding example does not represent a thorough search of the alert log. The file may contain warnings, actions, or errors that do not contain the string ORA- in the text.
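For example, you could broaden the search with additional predicates against the same external table (the extra patterns here are illustrative, not an exhaustive list of the messages worth watching for):

```sql
SQL> select * from alert_log
  2    where log_line like '%ORA-%'
  3       or log_line like '%Errors in file%'
  4       or log_line like '%corrupt%';
```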


Real World Scenario: Maintaining Historical Alert Log Data

Early in my DBA career, I created a simple web-based tool to allow me to perform certain administrative functions from a browser (this was back before Enterprise Manager was available through a browser). One of the functions I wanted to be able to perform was to browse the alert log and search for ORA errors. To accomplish this, the alert log data was loaded into a database table on a nightly basis.

This functionality worked well, and I was happy with the results. However, I never got around to coding any maintenance routines to purge the old data, so my historical data grew quite large. This wasn't a requirement, just the product of my laziness.

As fate would have it, my boss came to me with a strange request. Apparently, our applications group was having ongoing issues with a third-party application that had been installed about six months earlier. The vendor was trying to convince my boss that the problems were due to a certain Oracle error that could occur when their application was installed. In other words, he was saying that it wasn't their problem; it was our problem.

"If you can track down your old alert logs," he said, "you'll see that I'm right. There will be an ORA error." Our suspicion was that he was trying to buy time, assuming that we'd have to restore backup tapes and pore through files for some time.

I knew that my application had been collecting the alert log data for some time, but I wasn't sure how far back it went. I asked for the date, keyed it into my application, and Shazam! There it was. We searched through the log file and located a CREATE TABLESPACE command which we recognized as part of the vendor's installation procedure. However, there was no sign of the alleged Oracle error to be found. We were vindicated.


14.3.1.5. Alert Log Maintenance

As new events are recorded in the alert log, they are appended to the end of the existing alert log file. Over time, this file can grow quite large, so periodic maintenance is required to keep its size in check.

Alert log maintenance is relatively simple, but it requires some knowledge of the underlying mechanism used by Oracle to write to the file. When Oracle logs an event to the alert log, it performs the following operations on the file:

  • Open the alert log file using APPEND mode. If the file exists, move the file pointer to the end of the file. If the file doesn't exist, create it.

  • Write the alert data to the file.

  • Close the file.

Now, compare Oracle's method with the following method:

  • Mount the database.

  • Open the alert log file using APPEND mode. If the file exists, move the file pointer to the end of the file. If the file doesn't exist, create it.

  • Open the database.

  • Write the alert data to the file as events occur.

  • Close the file when database shutdown occurs.

In the first method, Oracle has almost no relationship with the alert log file. Oracle doesn't care if it even exists. It will just create a new one. The only relationship to the file is for the few microseconds that the log entry is being written.

In the second method, Oracle has a strong relationship with the file and requires that the file exist for the duration of the instance (much like a database datafile). This method would likely require file maintenance to be performed when the database is down.

Because Oracle uses the first method, maintenance is much simpler. The only real question is how long to keep the alert log data.

A historical alert log file is of dubious value, and how long alert log data should be maintained is more a question of company policy than anything else. However, as the preceding Real World Scenario testified, you never know when having historical alert log data may be useful.

As an example, let's assume that you are required to keep the last 30 days' worth of log information. Let's also assume that you want each file to represent one day of database usage. The following pseudocode could be used to build a rudimentary maintenance script:

  1. Every night at midnight, rename the current alert log file to alert_log.YYYYMMDD (where YYYYMMDD represents the current date).

  2. Check to see if any existing logs are older than 30 days.

  3. If found, delete the old logs.

  4. End.

As you can see, the alert log file can simply be renamed (or deleted) without incident. When Oracle needs to write a new alert log entry, it will create a new one.
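The pseudocode steps above might be scripted as follows. This is a minimal POSIX shell sketch intended to run from cron at midnight; the directory layout, the SID, and the 30-day retention period are assumptions to adjust for your site.

```shell
#!/bin/sh
# Rudimentary nightly alert log rotation, per the steps described above.

rotate_alert_log() {
    dir="$1"   # the BACKGROUND_DUMP_DEST directory
    sid="$2"   # the Oracle system identifier
    log="$dir/alert_$sid.log"

    # Step 1: rename the current alert log; Oracle simply creates a
    # fresh file the next time it needs to write an entry.
    [ -f "$log" ] && mv "$log" "$log.$(date +%Y%m%d)"

    # Steps 2-3: delete any archived copies older than 30 days.
    find "$dir" -name "alert_$sid.log.*" -mtime +30 -exec rm -f {} \;
}

# Example: rotate_alert_log /data1/admin/lnx1/bdump lnx1
```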

If searching historical alert log entries is important, you could also store alert log entries into a database table (or external table, as shown earlier in this chapter). This will allow you to use standard SQL to query the alert log history.
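As one sketch of that approach, you could periodically copy rows from the ALERT_LOG external table created earlier into a permanent table (the history table name and load-date column here are illustrative):

```sql
SQL> create table alert_log_history
  2  ( load_date  date,
  3    log_line   varchar2(80) );

Table created.

SQL> insert into alert_log_history
  2    select sysdate, log_line from alert_log;

SQL> commit;
```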

14.3.2. Using Server-Generated Alerts

A server-generated alert is a notification from Oracle that indicates a potential problem in the database. These alerts generally indicate that the threshold level for a monitored metric has been exceeded.

NOTE

Server-generated alerts were covered in detail in Chapter 8, "Understanding Automatic Database Management."

By default, Oracle computes six distinct metrics based on the contents of the alert log. Where indicated, these metrics have thresholds defined to automatically generate alerts in certain conditions. In short, this means that Oracle automatically monitors the alert log file for you and notifies you if it finds troublesome messages.

These alert log metrics include the following:

Alert log error trace file Oracle detects any references to system-generated trace files. These are generally trace files produced by a background process that encounters an error situation.

Alert log name Oracle checks the name and location of the alert log file.

Archiver hung alert log error Oracle detects any ORA-00257 messages (indicating that an ARCH archiver process is hung) in the alert log. Because a hung archiver process will suspend the database, this metric has a critical threshold set. Any appearance of an ORA-00257 message will generate a critical alert.

Data block corruption alert log error Oracle detects error messages relating to database block corruption in the alert log. This metric has a critical threshold set and will generate a critical alert when any block corruption messages appear.

Generic alert log error Oracle detects the presence of any ORA errors in the alert log. This metric has a warning threshold set and will generate a warning alert if the error is an ORA-00600.

Session-terminated alert log error Oracle detects any ORA-00603 session unexpectedly terminated messages in the alert log. This metric has a warning threshold set and will generate a warning alert if any ORA-00603 messages appear.

You can also add your own threshold settings to these metrics to instruct Oracle to monitor for other types of messages.

Server-generated alerts can be viewed through EM Database Control by following these steps:

  1. Navigate to the Database Home screen.

  2. In the Alerts section of the screen, you can see the most recent server-generated alert notifications, as shown in Figure 14.9.

  3. If any alert log messages are found in this section, you can click the message to see further details. If not, click the Alert History link at the bottom of the screen to display the Alert History screen.

  4. Under the Metric column, look for an entry labeled Generic Alert Log Error Status. If one is found, click it. The Generic Alert Log Error Status screen is displayed, as shown in Figure 14.10. This screen shows you the appearances of ORA messages in the alert log.

Figure 14.9. EM Database Control Alerts screen

Figure 14.10. EM Database Control Generic Alert Log Error Status screen

14.3.3. Using Oracle Trace Files

When an Oracle background process encounters an error situation, it will generate a trace file. The trace file contains in-depth details about the error and the state of the process at the time of the error. These trace files are intended to help you and/or Oracle Support to diagnose the cause of the error.

However, tracing is not limited strictly to error diagnosis, as you are probably well aware. Oracle's SQL trace functionality allows you to trace user processes (sessions) as well. SQL trace produces incredibly detailed output of all SQL executed by a session (or by all sessions, if enabled at the instance level). These trace files can be used to diagnose poorly performing SQL statements. They can also help you locate inefficient application code, such as excessive parsing, unnecessary database calls, and so on.

In the following sections, you will learn how to control the location and sizing of trace files. You'll also learn the naming conventions Oracle follows when creating trace files. Next, you'll learn how to add an identifier to user session trace files to make them easier to identify.

NOTE

Please note that the purpose of this section is not to analyze the contents of trace files. That is beyond the scope of this book.

14.3.3.1. Trace File Location and Sizing Parameters

The location and sizing of trace files is controlled by the following initialization parameters:

BACKGROUND_DUMP_DEST The BACKGROUND_DUMP_DEST parameter defines the operating system path where background process trace files will be created. Trace files associated with parallel query slave processes and job queue processes are also created in this location.

This also happens to be the location of the alert log file. This is because Oracle classifies the alert log as a special type of trace file.

USER_DUMP_DEST The USER_DUMP_DEST parameter defines the operating system path where user session trace files will be created. User session trace files are created automatically when the SQL trace feature is enabled for a session (or for the instance).

MAX_DUMP_FILE_SIZE The MAX_DUMP_FILE_SIZE parameter specifies the maximum size for trace files. The default value for this parameter is UNLIMITED, which places no restriction on trace file size.

This parameter applies to all types of trace files.

If a value is specified for this parameter, it must be a numeric value, optionally suffixed with the letter K (specifying kilobytes) or the letter M (specifying megabytes). If no suffix is provided, the value will be interpreted as number of blocks.

When a session trace file reaches the maximum size, tracing will cease. The resultant trace file is truncated at the point when it reached maximum size. This can result in important tracing data being lost.
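For example, assuming you want to cap trace files at 250MB (the same value that appears in the startup parameter listing earlier), you could set the parameter dynamically:

```sql
SQL> ALTER SYSTEM SET MAX_DUMP_FILE_SIZE = '250M';

System altered.
```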

14.3.3.2. Trace File Naming

Oracle trace files receive system-generated filenames. These names follow one of two formatting conventions, depending on the type of process being traced. Background processes use this format:

sid_process_name_process_id.trc

User processes use this format:

sid_ora_process_id.trc

Table 14.1 shows some examples.

Table 14.1. Trace File Naming Examples

SID     Session Type    PID     Trace File Name         Trace File Location
lnx1    User            12995   lnx1_ora_12995.trc      USER_DUMP_DEST
lnx1    PMON            1766    lnx1_pmon_1766.trc      BACKGROUND_DUMP_DEST
lnx1    User            4867    lnx1_ora_4867.trc       USER_DUMP_DEST
lnx1    LGWR            2247    lnx1_lgwr_2247.trc      BACKGROUND_DUMP_DEST

Given Oracle's somewhat cryptic file-naming conventions, it can be difficult to locate a specific trace file, unless you happen to know the process ID for the session. To alleviate this problem, the TRACEFILE_IDENTIFIER parameter can be used.

The TRACEFILE_IDENTIFIER parameter allows you to specify a text string that will be appended to the names of user session trace files. This identifier makes it easier to locate specific trace files, especially on busy systems.

For example, suppose you have a script that runs each night at midnight. It usually runs in about five minutes, but lately it has been taking considerably longer to finish. You decide to use SQL tracing to diagnose the problem. At the beginning of your SQL script, add the following lines:

ALTER SESSION SET SQL_TRACE = TRUE;
ALTER SESSION SET TIMED_STATISTICS = TRUE;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TIM';

Assuming that the process ID for your session was 10322, you should find a file named lnx1_ora_10322_TIM.trc in the USER_DUMP_EST directory after the script runs.

NOTE

The TRACEFILE_IDENTIFIER parameter applies only to user session trace files. It has no effect on background process trace files.
