13 Configuring Database Diagnostics

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

13
Configuring Database Diagnostics

CERTIFICATION OBJECTIVES

13.01 Set up Automatic Diagnostic Repository

When you have to deal with database errors or even a database that is completely down, you don’t want to spend extra time documenting one or more database error conditions for Oracle Support. This chapter first gives you a brief overview of how Oracle reports problems with the database, and how it categorizes errors into problems and incidents.

Next, this chapter will step through a few examples of packaging all the necessary information using Support Workbench via the EM interface and sending it to Oracle Support as a service request. The Service Request can be submitted with a few clicks, but it can be customized before submittal either to remove confidential or proprietary information, or to add extra data or information to help Oracle Support solve the issue.

In addition, the Health Monitor framework provides both proactive and reactive tools to deal with database errors. The DBA can run a proactive health check manually using EM or PL/SQL packages. In contrast, the Health Monitor can run diagnostic checks in response to critical database errors.

Finally, the chapter will switch from reporting, managing, and submitting service requests for database problems and incidents, to recovering individual blocks after you have identified the problem. RMAN supports detection and recovery of individual blocks using the DB_BLOCK_CHECKING initialization parameter and the RMAN RECOVER . . . BLOCK command. RMAN also makes it easy to identify failures and implement repairs using the Data Recovery Advisor.

CERTIFICATION OBJECTIVE 13.01
Set Up Automatic Diagnostic Repository

The key to the Automatic Diagnostic Repository (ADR) is the first word: automatic. ADR is an always-on facility that captures errors in trace and dump files the first and any successive times they occur. Thus the “diagnostic” part of the Oracle feature name. The third part, "repository," is a location on disk that stores the diagnostic information on disk and comes with a tool that makes it easy to query the repository even when the database is unavailable.

The following sections provide more details about the structure of the repository, how to retrieve information from the repository, and how to find the diagnostic information you’re looking for in the repository using initialization parameters and data dictionary views. In addition, you’ll see how to easily and quickly package the diagnostic information from the ADR and send it to Oracle support for problem resolution.

Understanding the ADR

The ADR is a file-based repository of diagnostic and other non-critical information for all products in your environment. Each database instance and Automatic Storage Management (ASM) instance has its own directory structure called an ADR home within a top-level directory known as the ADR base. In a Real Application Cluster (RAC) environment, each instance has its own subdirectory, which not only makes it easy to view diagnostics for an individual instance, but also makes it easy for the diagnostic tools to analyze data across instances for cluster-wide problems.

The ADR base directory is also known as the ADR root directory. The location for the ADR base is set depending on the values of initialization parameters and environment variables. If the initialization parameter DIAGNOSTIC_DEST is set, the ADR base directory is set to this value, and all other file locations are set relative to this location. If DIAGNOSTIC_DEST is not set, then DIAGNOSTIC_DEST is set to the environment variable ORACLE_BASE. If ORACLE_BASE is not set, DIAGNOSTIC_DEST is set to the value $ORACLE_HOME/log. Figure 13-1 shows the ADR directory structure for the DW database.

FIGURE 13-1 ADR directory structure for the DW database

For the database in Figure 13-1, the initialization parameter DIAGNOSTIC_DEST is not set, so Oracle sets DIAGNOSTIC_DEST to the value of the environment variable ORACLE_BASE, which in this case is /u01/app/oracle:

If ORACLE_BASE were not set, the location for the ADR would be /u01/app/oracle/product/11.1.0/db_1/log.

You can retrieve the values for each diagnostic directory using the dynamic performance view V$DIAG_INFO, as in this example:

Note the column INST_ID. In a RAC environment, this value differentiates the value of each directory by node. For example, if DW were the cluster name, and the cluster contained three instances DW1, DW2, and DW3, the value for the second instance’s diagnostic trace directory would be as follows:

Compared to previous releases of Oracle, the diagnostic information is more partitioned. In other words, all non-incident traces are stored in the trace subdirectory, all core dumps are in the cdump directory, and all incident dumps are stored as individual directories within the incident subdirectory. Table 13-1 show the ADR location for each type of diagnostic data.

TABLE 13-1 ADR Diagnostic Information Directory Locations

Diagnostic Data Type	Location Within the ADR
Foreground process trace files	`ADR_HOME/trace`
Background process trace files	`ADR_HOME/trace`
Alert log	`ADR_HOME/alert` (XML format) `ADR_HOME/trace` (plaintext format)
Core dumps	`ADR_HOME/cdump`
Incident dumps	`ADR_HOME/incident/incdir_`n

Note the differentiation between trace and dump files. Trace files contain continuous output to diagnose a problem with a running process. A dump file is a onetime diagnostic output file resulting from an incident. Similarly, a core dump is a onetime platform-specific binary memory dump. Note also that there is no initialization parameter or environment variable named ADR_HOME. You can determine the value of ADR_HOME from the row in V$DIAG_INFO containing the name ADR Home:

Using the ADRCI Tool

The ADR Command Interpreter (ADRCI) tool makes it easy to query the contents of the ADR. You can use the tool in command mode, or create scripts to run in batch mode. You can use ADRCI even when the database is down—remember that the ADR is completely file-system based. In addition to querying the contents of the ADR with ADRCI, you can also package incident and problem information into a compressed ZIP file that you can send to Oracle support.

Note that ADRCI does not require a login or any other authorization. The contents of the ADR are protected only by operating system permissions on the directory containing the ADR file structures. For a default installation of Oracle Database 11g, this means that the ADR has the same permissions as the ORACLE_BASE directory and its subdirectories. You can alter the permissions further if you wish, but you must make sure that the user owning the Oracle processes (usually the oracle user) has full read-write access to the ADR.

When you start ADRCI, you see the current ADR base directory. Type help for a list of commands:

Even when there are no incidents or problems to view, you can perform more mundane tasks such as viewing the alert log from ADRCI:

Notice that the ADRCI tool tracks all ADR home directories within the ADR root directory. Therefore, you must select which database, ASM, or listener directory you want to view with ADRCI.

Of course, you can perform the same task from EM. From the bottom of the EM home page, click the Alert Log Contents link. Then select the number of lines at the end of the alert log that you want to see, and click Go. You’ll see the page shown in Figure 13-2.

FIGURE 13-2 Viewing Alert Log Contents from EM

CERTIFICATION OBJECTIVE 13.02
Using Support Workbench

The Support Workbench is accessible through the EM interface, which walks you through all aspects of problem detection, documentation, and resolution. The tools available through the Support Workbench enable you to view details on problems and incidents, run health checks on your database (either reactively or proactively), generate additional diagnostic data for service requests, and run advisors to help you resolve the problem or incident. Support Workbench also facilitates the packaging of all necessary diagnostic and supporting data for submittal to Oracle support using the MetaLink service. Once the service request has been submitted, you can track its status through the resolution cycle and close it when your problem is resolved.

The following sections first explain the distinction among the various events that occur in your database: (1) alerts, (2) problems, and (3) incidents. Next, an overview of how you submit a service request using the quick packaging method will be given. While you are working with Oracle support on the problem, you can run your own health checks and other advisors to help solve the problem. Finally, some advanced packaging options that permit you to add extra information to your service request, such as sample SQL statements and table data, to help Oracle support solve your problem will be described. Conversely, you can edit the package before you submit it to Oracle support to remove proprietary or confidential information.

Understanding Alerts, Problems, and Incidents

Although the alert log (either text format or XML format) contains all alerts for the instance, you see the alerts at the warning and critical levels on the EM home page. The dreaded ORA-00600 error messages are noted on the EM home page. In the next example, you try using a feature that worked in previous versions of Oracle Database but which causes an ORA-00600 message in Oracle Database 11g Release 1:

The alert log shows the error as well:

In Figure 13-3, you see both of these types of alerts near the bottom of the page, including the ORA-00600 error that you inadvertently triggered!

FIGURE 13-3 Viewing error alerts in EM

A problem, defined by the Support Workbench framework, is a critical error in the database, such as the internal error ORA-00600 or some other serious event such as running out of memory in the shared pool or an operating system exception. An incident is a single occurrence of a problem. Each problem has a problem key, which is a text string that contains the error code and optionally other problem characteristics. A problem may have one or many incidents. Each incident is identified by a numeric incident ID and is stored in its own subdirectory of the ADR (ADR_HOME/incident/incdir_n). When an incident occurs, the database performs the following steps:

1. Adds an entry to the alert log (both text and XML-based).

2. Sends an incident alert to EM.

3. Sends an alert via e-mail to administrator(s) (if configured).

4. Gathers trace files and other incident information.

5. Tags all incident information with the incident ID.

6. Creates an incident directory under ADR_HOME for the incident and stores incident information and trace files in the directory.

Managing Service Requests

After Oracle automatically collects incident information and notifies you, you will use Support Workbench as an end-to-end tool to manage the problem. The typical steps you use in Support Workbench are as follows:

1. View critical errors on the EM home page (or via an e-mail message).

2. View the problem details.

3. Gather additional diagnostics and health checks.

4. Create an Oracle Service Request (SR).

5. Package necessary information and send to Oracle support.

6. Track the SR and implement repairs suggested by Oracle and the advisor framework tools.

7. Close the incident(s).

The following sections give you an overview of creating an SR, implementing repairs, and optionally including additional information in the SR using custom packaging.

Creating and Tracking an SR

When an ORA-00600 or any other, less serious error has occurred, you can package the incident details and send them to Oracle support. On the page shown in Figure 13-3, click the message link of the incident, and then click View Problem Details. You see the page in Figure 13-4.

FIGURE 13-4 Viewing problem details

From this page, you can perform a number of actions. You can create a MetaLink service request, research the MetaLink knowledge base, or run additional checkers and diagnostics. You can quickly create a package of all incident information by clicking the Quick Package link. On the page shown in Figure 13-5, you start the process of creating a new package and submitting an SR right after package creation. You supply your MetaLink credentials and Customer Service Identifier (CSI).

FIGURE 13-5 Quick packaging an incident: step 1

When you click Next, EM creates the package. You see the page in Figure 13-6.

FIGURE 13-6 Quick packaging an incident: step 2

On the next page, shown in Figure 13-7, you can review the manifest that Oracle creates for the incident. It shows the summary of the problem and the location of the file containing the manifest.

FIGURE 13-7 Quick packaging an incident: step 3

The final step of the quick packaging process, shown in Figure 13-8, gives you the option to submit the package immediately or at a later time. Typically, you will submit the service request right away, but you may want to submit it later if you think you can solve the problem yourself before users start complaining!

FIGURE 13-8 Quick packaging an incident: step 4

After the package has been submitted to Oracle support, you can view the status of the incident resolution on the Problem Details page shown in Figure 13-4.

Implementing Repairs

On the Problem Details page, you can also attempt to use an Oracle Advisor to implement repairs. The advisor available on the page shown in Figure 13-4 is the SQL Repair Advisor. Also available in your toolbox is the Data Recovery Advisor. Due to the nature of the problem in Figure 13-4, Support Workbench has determined that the problem is with a SQL statement, not a data corruption problem, and therefore has not recommended the Data Recovery Advisor this time. Figure 13-9 shows the Incident Analysis page where you can submit the SQL Repair Advisor job.

FIGURE 13-9 Using SQL Repair Advisor to analyze a failed SQL statement

After the SQL Repair Advisor runs, you can immediately view the results of the job from Advisor Central or return to the database home page and click the Advisor Central link at the bottom of the page.

Using Custom Packaging

Custom packaging gives you more flexibility and more control over the contents of an incident package. Unlike quick packaging, custom packaging enables you to

Add or remove problems or incidents from the package.

Add, edit, or remove trace files from the package.

Add or remove external files (Word documents, export dumps, and so on).

Add other diagnostic information such as SQL test cases.

Remove sensitive information or other unnecessary information to reduce the size of the package.

To initiate custom packaging, start from the Support Workbench home page shown in Figure 13-10, select the incident(s) you want to package, and then click Package.

FIGURE 13-10 Initiating custom packaging from the Support Workbench

As you can see in Figure 13-11, you can create an entirely new package for this incident or even start with an existing package. In the example in Figure 13-11, you selected the package you created during quick packaging when you realized that you needed to add additional documentation and trace files to the incident package before sending it to Oracle support.

FIGURE 13-11 Using an existing package for custom packaging

On the page shown in Figure 13-12, you can see the types of information you can add or remove from the package, such as additional external files, other problems, or other dumps. Once the package is completed, you can generate the new upload file and submit it to Oracle support.

FIGURE 13-12 Adding or removing information from a custom package

Using the Health Monitor

You can use the Oracle Health Monitor framework to proactively or reactively assess the health of your database. Health Monitor checks the status of various database components, including the following:

Datafiles

Memory

Transaction integrity

Metadata

Process usage

You can run health checks using EM via the Support Workbench, or you can manually run a health check using the DBMS_HM PL/SQL package. Some checks can be run only when the database is open. Other checks are available when the instance is running but the database is in NOMOUNT mode. You can list the checks available and whether they are available in offline or online mode by querying the dynamic performance view V$HM_CHECK:

You can view the results of a health check from EM, the ADRCI tool discussed earlier in this chapter, the DBMS_HM package, or the dynamic performance view V$HM_RUN.

From the EM Support Workbench page, click the Run Checkers link to run additional diagnostics. On the page shown in Figure 13-13, you see the health checks recommended for the selected incident. Even if no specific checkers are recommended, you can still run one of the available checkers to, for instance, check the integrity of the redo log stream and the core dictionary objects.

FIGURE 13-13 Run Checkers recommendation page

You can view the results of health checks from the ADRCI tool with the show hm_run command. In the following examples, you retrieve the results of a health monitor run first in text format, then you generate a health monitor report in XML format:

CERTIFICATION OBJECTIVE 13.03
Perform Block Media Recovery

Many of the errors you will encounter will be related to bad blocks in your database datafiles, either because of media failures, server memory errors, or logical corruption caused by Oracle errors. Once you’ve identified these problems using the diagnostic methods provided earlier in this chapter, you can use the tools in the following sections to fix the problems.

As with nearly every Oracle feature, you can adjust the level of control and monitoring that Oracle performs; data block checking is no exception. Regardless of the settings you’ll learn about shortly, Oracle always performs the following checks on a data block when it is read or written to a datafile:

Block version (matches the version of the database)

Data block address (DBA) in the cache is the same as the DBA value in the block buffer

Block checksum is correct

You can repair a corrupt block by either recovering the block, dropping the object containing the bad block, or both. The following sections tell you more about block corruption, how to control the amount of overhead Oracle will use to ensure the integrity of blocks, and how to fix a corrupted block.

Understanding Block Corruption

When Oracle detects a corrupted block, it registers an ORA-01578 error in the alert log and on the EM home page. Included in the error message are the absolute file number and block number of the bad block. In addition, the session reading or writing the bad block sees the same error message. Here is an example of a block corruption error message:

Most often, corruption is caused by operating system or disk hardware failures such as faulty I/O hardware or firmware, operating system caching problems, memory or paging problems, or errors caused by disk repair utilities.

Using the DB_BLOCK_CHECKING Parameter

The initialization parameter DB_BLOCK_CHECKING controls how thoroughly Oracle checks the integrity of every data block that is read or written. The level of checking you enable depends on the level of failure tolerable in your environment (which is usually very low!) balanced against the overhead required to perform the continuous block checks. The possible values for DB_BLOCK_CHECKING are as follows:

OFF or FALSE No block checking is performed.

LOW Basic block checks are performed after blocks are changed in memory or read from disk, including inter-instance block transfers in RAC environments.

MEDIUM Includes all LOW checks plus block checking for all non-IOT (index-organized table) blocks.

FULL or TRUE Includes all LOW and MEDIUM checks plus checks for index blocks.

If you can tolerate the performance overhead, Oracle recommends using FULL. The default value is OFF, even though FULL block checking for the SYSTEM tablespace is always enabled. The overhead for block checking ranges from 1 percent to 10 percent, but is closer to 10 percent in an OLTP environment.

Using Block Media Recovery

If you discover only a small handful of blocks to recover in a database from the aforementioned health checks or results discovered in the alert log, RMAN can perform block media recovery rather than a full datafile recovery. Block media recovery minimizes redo log application time, and drastically reduces the amount of I/O required to recover only the block or blocks in question. While block media recovery is in progress, the affected datafiles can remain online and available to users.

In addition to the block verification performed by Oracle as defined by the DB_BLOCK_CHECKING initialization parameter, an RMAN BACKUP or BACKUP VALIDATE command can add corrupted blocks to the dynamic performance view V$DATABASE_BLOCK_CORRUPTION. In addition, the SQL commands ANALYZE TABLE and ANALYZE INDEX can uncover corrupted blocks.

Block media recovery is only available from within the RMAN application.

You’ll need to know the advantages and disadvantages of block media recovery; as you might expect, there are many more advantages than disadvantages. In addition to touting the benefits of RMAN block media recovery, the prerequisites for block media recovery as well as provide some real-world examples will be defined.

Advantages of Block Media Recovery

Recovering one or a small number of blocks using RMAN has some obvious and some not-so-obvious advantages. First, recovering one block using a recent backup and archived and online redo log files will almost certainly take much less time than restoring and recovering one or more datafiles. In addition, during block media recovery, the entire datafile remains online and available during the recovery process; only the blocks being recovered are unavailable. Therefore, only one table, index, or other database object remains unavailable during block media recovery.

When you use the RMAN RECOVER . . . BLOCK command, RMAN first searches the flashback logs for a good copy of the corrupted block (if Flashback Database is enabled). Otherwise, RMAN uses the latest level 0 or full backup, restores the bad blocks, and performs media recovery on the bad blocks using the redo stream. Note that RMAN cannot use incremental level 1 backups for block media recovery.

You can use the dynamic performance view V$DATABASE_BLOCK_CORRUPTION to view the bad blocks in the database. This view contains blocks that are both physically and logically corrupted. Here are the tools or commands that can populate this view when they find bad blocks:

RMAN backup commands

ANALYZE

dbv operating system utility

SQL queries that try to access a corrupted block

Prerequisites for Using Block Recovery

Before you can use block media recovery, your database must fulfill a few prerequisites. First, the target database must be in ARCHIVELOG mode. Unless your database is for testing or is a read-only database, your database should be in ARCHIVELOG mode for maximum recoverability anyway!

Second, the backups of datafiles with bad blocks must be full backups or level 0 incremental backups. RMAN cannot use level 1 incremental backups for block recovery. Thus, you must have all archived redo log files since the last full backup or level 0 incremental backup.

Alternatively, you can use flashback logs in the flash recovery area for uncorrupt versions of bad blocks. Therefore, you must have Flashback Database enabled. If an uncorrupt version of a bad block is available in the flash recovery area, RMAN will use that block and perform media recovery on the block using archived and online redo log files. Unless the number of bad blocks is large, recovering a block from the flashback logs will certainly be faster than starting with a level 0 incremental or full backup.

Using the RMAN RECOVER . . . BLOCK Command

You can use the RMAN RECOVER . . . BLOCK command in response to an alert or other notification of a bad block. Typically, block corruption is reported in the following locations:

Output from the RMAN LIST FAILURE, VALIDATE, or BACKUP . . . VALIDATE commands

The V$DATABASE_BLOCK_CORRUPTION dynamic performance view

Error messages during a SQL*Plus or other client session

The alert log or user trace files

Results from the SQL commands ANALYZE TABLE or ANALYZE INDEX

Results from the DBVERIFY command-line utility (dbv)

To recover one or more data blocks, RMAN must know the datafile number and block number within the datafile. As mentioned previously, this information is available in a user trace file, as in the following example:

In addition, the block will appear in the view V$DATABASE_BLOCK_CORRUPTION; the columns FILE# and BLOCK# provide the information needed to execute the RECOVER command. The column CORRUPTION_TYPE identifies the type of corruption in the block, such as FRACTURED, CHECKSUM, or CORRUPT. Fixing the block is easily accomplished in RMAN:

A corrupted block must be restored completely. In other words, all redo operations up to the latest SCN against the data block must be applied before the block can be considered usable again.

If all bad blocks are recorded in V$DATABASE_BLOCK_CORRUPTION, you can easily recover all of them at once. Using the following command, RMAN will recover all physically corrupted blocks in V$DATABASE_BLOCK_CORRUPTION:

After RMAN recovers the blocks, they are removed from V$DATABASE_BLOCK_CORRUPTION.

Using the Data Recovery Advisor

The Data Recovery Advisor is part of the Oracle advisor framework and automatically gathers information about a failure when an error is encountered. If you run the Data Recovery Advisor proactively, you are often able to detect and repair a failure before a user query or backup operation detects it. The Date Recovery Advisor can detect relatively small errors such as corrupted blocks. At the other end of the spectrum, it will detect errors that would otherwise prevent successful startup of the database, such as missing online redo log files. Your database may continue running for a short amount of time without online redo log files, but it will not start the next time you shut down and restart. Data Recovery Advisor will catch this error proactively.

Identifying Failures

As with most advisors and Oracle features, you can use either EM or command-line tools to run the Data Recovery Advisor, show the errors, and repair the failures. From the EM home page, click the Advisor Central link at the bottom of the page. From the Advisor Central page, click the Data Recovery Advisor link. If there are outstanding errors, you will see them on this page. The errors are recorded reactively from error events in the database, or as a result of proactively running a health check. Figure 13-14 shows a current data failure in the HR database.

FIGURE 13-14 Viewing data failures using EM

Once the Data Recovery Advisor has identified a failure, you can review the details of the failure using the EM or RMAN interface. From RMAN, you can use the LIST FAILURE, ADVISE FAILURE, REPAIR FAILURE, and CHANGE FAILURE commands. Table 13-2 summarizes the purpose of these commands.

TABLE 13-2 RMAN Failure Advisory and Repair Commands

RMAN Command	Description
`LIST FAILURE`	List failures recorded by the Data Recovery Advisor
`ADVISE FAILURE`	Show recommended repair option
`REPAIR FAILURE`	Repair and close failure using RMAN’s recommendations
`CHANGE FAILURE`	Change the status or close a failure

The LIST FAILURE command has a number of options, depending on what types of errors you want to see:

Failure# List an individual failure’s details (by failure number).

ALL List all failures.

CRITICAL List failures that make the database unavailable.

HIGH List serious failures that make parts of the database unavailable, such as a missing datafile.

LOW List intermittent or lower-priority failures that can wait until more serious problems are fixed. For example, this can include corrupted blocks in infrequently used tablespaces.

CLOSED List only closed failures.

For example, LIST FAILURE 2097 will list the details for a failure with an identifier of 2097. Also, LIST FAILURE ALL lists all open failures of any priority.

Implementing Repairs

Once the Data Recovery Advisor has identified a failure, you can use the RMAN ADVISE FAILURE command to recommend a repair option for the specified failure. RMAN will suggest a repair and create a script with the recommended repair. If the repair is acceptable (in terms of downtime or other factors), you can then run the REPAIR FAILURE (within the same RMAN session) to perform the recommended action. After the repair action completes successfully, the failure is automatically closed.

You can also change the priority of a failure using CHANGE FAILURE. For example, a corrupted block will be recorded as a HIGH failure, but if it is in an infrequently used

tablespace, then you can change its priority to LOW so that you only see other more serious failures in the LIST FAILURE command. However, you cannot change the priority of a CRITICAL failure. You can only change the priority of a failure from HIGH to LOW, or vice versa. Here is how you can change the priority of failure number 307 from HIGH to LOW:

EXERCISE 13-1
Use Data Recovery Advisor

In this exercise, you’ll simulate a serious data failure, query the failure, and repair the failure using RMAN.

1. Create a tablespace XPORT_DW in your database and create one or more tables within the tablespace.

2. At the operating system level, overwrite or delete the datafile for the XPORT_DW tablespace:

3. Attempt to access tables within the tablespace from SQL*Plus:

4. From RMAN, query all open failures and list the details for the failures found:

You can see similar information in EM. Navigate to the Data Recovery Advisor page shown in Figure 13-14, and expand the Failure description. Figure 13-15 shows the failure details

FIGURE 13-15 View and Manage Failures details

5. Return to RMAN and run the ADVISE FAILURE command:

Clicking the Advise button shown on the page in Figure 13-15 gives you the same advice, as you might expect. Figure 13-16 shows the RMAN script that OEM will execute to recover from the media failure.

FIGURE 13-16 OEM Recovery Advisor recommended actions

6. Run the tablespace recovery in RMAN using the recommendations in /u01/app/oracle/diag/rdbms/hr/hr/hm/reco_2543193427.hm, which not coincidentally are the same recommendations you see in Figure 13-16:

7. Verify that the failure has been completed and closed:

Data Recovery Advisor Views

There are several dynamic performance views you can use to retrieve information about failures detected by the Data Recovery Advisor:

V$IR_FAILURE All failures, including closed failures.

V$IR_MANUAL_CHECKLIST Listing of manual advice.

V$IR_REPAIR Listing of repairs.

V$IR_REPAIR_SET Cross reference of failure and advise identifier numbers.

For example, to retrieve the information for the failure in Exercise 13-1, query V$IR_FAILURE as follows:

CERTIFICATION SUMMARY

This chapter started with an overview of the Automatic Diagnostic Repository (ADR). ADR is an always-on facility that captures errors in trace and dump files the first and any successive times that they occur. It makes your life easier when an error does occur because it happens without DBA intervention. In addition to facilities within EM, you can use the ADRCI command-line tool to query the contents of ADR when either the database or EM is down (or sometimes both!).

Once ADR has identified one or more problems, you can leverage the information in ADR by using the Support Workbench interface in EM. The tools available through the Support Workbench enable you to view details on problems and incidents, run health checks on your database (either reactively or proactively), generate additional diagnostic data for service requests, and run advisors to help you resolve the problem or incident. It also steps you through packaging all relevant information into a service request (SR) for Oracle support when you cannot solve the problem yourself. The Support Workbench makes it easy to analyze a problem and perform further analyses on the problem, potential causes, and solutions. You also learned the differences and similarities between alerts, problems, and incidents.

Next, one of the tools that is both proactive and reactive, which was the Health Monitor was discussed. The Checkers page in EM (available in several EM pages) lets you proactively run 24 different types of health checks, such as the Redo Integrity Check, Transaction Integrity Check, and Data Dictionary Integrity Check.

One of the more common problems you will encounter in your database is block corruption. You can control how thoroughly Oracle checks each data block as it is read and written using the DB_BLOCK_CHECKING initialization parameter. The trade-off is spending more CPU time on the front end to proactively detect problems versus relaxing the block checks and potentially discovering the problem later when you cannot afford the time to fix it.

The chapter closed with one of the RMAN-based repair tools, the Data Recovery Advisor. Using the RMAN command-line or EM interface, you can query all failures (open or closed) with a status of CRITICAL, HIGH, or LOW. Once a failure has been identified, you can use the advice from the Data Recovery Advisor (frequently in the form of an RMAN script) to repair the problem.

TWO-MINUTE DRILL

Set Up Automatic Diagnostic Repository

ADR is an always-on facility that captures errors in trace and dump files the first and any subsequent times they occur.

ADR uses a location on disk to store the diagnostic information and comes with a tool that makes it easy to query the repository even when the database is unavailable.

Each database instance or Automatic Storage Management (ASM) instance has its own directory structure called an ADR home within a top-level directory known as the ADR base.

The ADR base directory is also known as the ADR root directory.

If the initialization parameter DIAGNOSTIC_DEST is set, the ADR base directory is set to this value and all other file locations are set relative to this location.

If DIAGNOSTIC_DEST is not set, then DIAGNOSTIC_DEST is set to the environment variable ORACLE_BASE.

If ORACLE_BASE is not set, DIAGNOSTIC_DEST is set to the value $ORACLE_HOME/log.

The ADR diagnostic information is partitioned. All non-incident traces are stored in the trace subdirectory, all core dumps are stored in the cdump directory, and all incident dumps are stored as individual directories within the incident subdirectory.

The ADR Command Interpreter (ADRCI) tool makes it easy to query the contents of the ADR. You can use ADRCI even when the database is down.

ADRCI does not require a login or any other authorization. The contents of the ADR are protected only by operating system permissions on the directory containing the ADR file structures.

Using Support Workbench

The Support Workbench, accessible through the EM interface, walks you through all aspects of problem detection, documentation, and resolution.

Support Workbench also facilitates the packaging of all necessary diagnostic and supporting data for submittal to Oracle support using the MetaLink service.

A problem, as defined by the Support Workbench framework, is a critical error in the database: for example, the internal error ORA-00600 or some other serious event such as running out of memory in the shared pool, or perhaps an operating system exception.

An incident is a single occurrence of a problem.

Each problem has a problem key, which is a text string that contains the error code and optionally other problem characteristics.

Custom packaging within the Support Workbench framework gives you more flexibility and more control over the contents of an incident package.

Health Monitor checks the status of various database components, including datafiles, memory, transaction integrity, metadata and process usage.

You can run health checks using EM via the Support Workbench, or you can manually run a health check using the DBMS_HM PL/SQL package.

Perform Block Media Recovery

When Oracle detects a corrupted block, it registers an ORA-01578 error in the alert log and on the EM home page.

The initialization parameter DB_BLOCK_CHECKING controls how thoroughly Oracle checks the integrity of every data block that is read or written.

If you discover only a small handful of blocks to recover in a database from the aforementioned health checks or results discovered in the alert log, then RMAN can perform block media recovery rather than a full datafile recovery.

When you use the RMAN RECOVER . . . BLOCK command, RMAN first searches the flashback logs for a good copy of the corrupted block (if Flashback Database is enabled).

You can use the dynamic performance view V$DATABASE_BLOCK_CORRUPTION to view the bad blocks in the database.

The target database must be in ARCHIVELOG mode to use RMAN block recovery.

Alternatively, you can use flashback logs in the flash recovery area for uncorrupt versions of bad blocks.

The Data Recovery Advisor is part of the Oracle advisor framework and automatically gathers information about a failure when an error is encountered.

Once the Data Recovery Advisor has identified a failure, you can review the details of the failure using the EM or RMAN interface.

Once the Data Recovery Advisor has identified a failure, you can use the RMAN ADVISE FAILURE command to recommend a repair option for the specified failure.

SELF TEST

The following questions will help you measure your understanding of the material presented in this chapter. Read all the choices carefully, because there might be more than one correct answer. Choose all correct answers for each question.

Set Up Automatic Diagnostic Repository

1. The value of the initialization parameter DIAGNOSTIC_DEST is NULL, the environment variable ORACLE_HOME is set to /u01/app/oracle/product/11.1.0/db_1, and the value of the environment variable ORACLE_BASE is set to /u01/app/oracle. At startup, what value is assigned by Oracle to DIAGNOSTIC_DEST?

A. /u01/app/oracle/diag

B. /u01/app/oracle/log

C. /u01/app/oracle/product/11.1.0/db_1/log

D. /u01/app/oracle

2. Which of the following tasks can you accomplish using the adrci command-line tool?

A. Package incident information into a ZIP file to send to Oracle support.

B. View diagnostic data within ADR.

C. Perform a health check on the database while it is running.

D. Run recommended fixes from the most recent health check on the database.

3. Which of the following directory locations is not available in V$DIAG_INFO? (Choose the best answer.)

A. Diagnostic trace files

B. Diagnostic incident files

C. Diagnostic problem files

D. Active Problem Count

Using Support Workbench

4. You can use the EM Support Workbench for which of the following tasks? (Choose all correct answers.)

A. Run recommended repairs.

B. Manually run a health check.

C. Close problems and incidents.

D. Generate additional SQL test cases to help Oracle support solve the problem.

E. View problems and incidents.

5. Which of the following tasks can you only perform with the EM Support Workbench custom packaging feature?

A. You can specify an SR associated with the package.

B. You can customize the package name and description.

C. You can specify a date and time to upload the package.

D. You can add or remove external files from a package.

6. Which of the following advisors can you run from the EM Support Workbench Incident Details page? (Choose two answers.)

A. The SQL Repair Advisor

B. The Data Recovery Advisor

C. The SQL Tuning Advisor

D. The Disk Repair Advisor

Perform Block Media Recovery

7. Which of the following basic consistency checks are performed by Oracle when a block is written or read? (Choose all correct answers.)

A. Block checksum

B. Data Block Address in cache matches the address on disk

C. Block version

D. The data block is below the HWM when reading or updating a block

8. What are some of the prerequisites for using block media recovery? (Choose all that apply.)

A. Flashback Database must be enabled.

B. The database must be in ARCHIVELOG mode.

C. The last level 1 backup must be available.

D. DB_BLOCK_CHECKING must be set to LOW, MEDIUM, or FULL.

E. All archived redo logs since the last full backup must be available.

9. You can use the RMAN CHANGE FAILURE command to change the priority of which types of failures? (Choose all that apply.)

A. OPEN

B. HIGH

C. CRITICAL

D. LOW

E. CLOSED

SELF TEST ANSWERS

Set Up Automatic Diagnostic Repository

1. D. The ADR root directory (also known as the ADR base) is set by the parameter DIAGNOSTIC_DEST. If it is not set, Oracle sets DIAGNOSTIC_DEST to the environment variable ORACLE_BASE. If ORACLE_BASE is not set, then the ADR root directory is set to $ORACLE_HOME/log.

A, B, and C are wrong. All three locations are not assigned, given the values of DIAGNOSTIC_DEST, ORACLE_BASE, and ORACLE_HOME.

2. A and B. The adrci command-line tool allows you to view diagnostic information in the ADR root directory in addition to packaging both problem and incident information for Oracle support.

C and D are wrong. The adrci command-line tool cannot initiate health checks nor run fixes recommended by other Oracle diagnostic tools.

3. C. The view V$DIAG_INFO does not specify a directory for problems, only incidents. Each incident is labeled with a text string representing the problem identifier.

A, B, and D are listed in V$DIAG_INFO with the operating system–specific path name.

Using Support Workbench

4. B, C, D, and E. You can use the Support Workbench for viewing problems and incidents, manually running health checks, generating additional dumps and test cases for Oracle support, creating and tracking a service request to Oracle support, collecting all data related to a problem and packaging it, and closing the problem when it has been resolved.

A is wrong. The EM support workbench cannot run recommended repairs. However, you can initiate advisors that recommend repairs.

5. D. You can add or remove external files from a package only with the custom packaging method.

A, B, and C are wrong because these tasks can be accomplished using the quick packaging method.

6. A and B. You can run either the Data Recovery Advisor or the SQL Repair Advisor from the Incident Details page.

C and D are wrong. You cannot initiate the SQL Tuning Advisor from the EM Support Workbench. There is no such advisor as the Disk Repair Advisor.

Perform Block Media Recovery

7. A, B, and C. Oracle performs all of these checks regardless of the setting of DB_BLOCK_CHECKING.

D is wrong because Oracle does not check if a block is below the high water mark (HWM) when updating or reading a block.

8. A, B, and E. To use RMAN’s block recovery feature, the database must be in ARCHIVELOG mode, the backups of data files must be full backups or level 0 backups, and archived log files must be available since the last full or level 0 backup. In addition, if Flashback Database is enabled, then RMAN can look for uncorrupted versions of blocks in the flashback logs.

C is wrong because RMAN cannot use level 1 backups for block recovery. D is wrong because DB_BLOCK_CHECKING does not have to be enabled at all to use block recovery.

9. B and D. You can change the priority of a HIGH failure to LOW, and vice versa.

A is wrong because OPEN is not a failure status. C is wrong because you cannot change the priority of a CRITICAL failure. E is wrong because you cannot change the priority of a CLOSED failure.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 13 Configuring Database Diagnostics

Create new playlist

Sign In

Sign Up

13Configuring Database Diagnostics

CERTIFICATION OBJECTIVE 13.01Set Up Automatic Diagnostic Repository

Understanding the ADR

Using the ADRCI Tool

CERTIFICATION OBJECTIVE 13.02Using Support Workbench

Understanding Alerts, Problems, and Incidents

Managing Service Requests

Creating and Tracking an SR

Implementing Repairs

Using Custom Packaging

Using the Health Monitor

CERTIFICATION OBJECTIVE 13.03Perform Block Media Recovery

Understanding Block Corruption

Using the DB_BLOCK_CHECKING Parameter

Using Block Media Recovery

Advantages of Block Media Recovery

Prerequisites for Using Block Recovery

Using the RMAN RECOVER . . . BLOCK Command

Using the Data Recovery Advisor

Identifying Failures

Implementing Repairs

Data Recovery Advisor Views

CERTIFICATION SUMMARY

TWO-MINUTE DRILL

Set Up Automatic Diagnostic Repository

Using Support Workbench

Perform Block Media Recovery

SELF TEST

Set Up Automatic Diagnostic Repository

Using Support Workbench

Perform Block Media Recovery

SELF TEST ANSWERS

Set Up Automatic Diagnostic Repository

Using Support Workbench

Perform Block Media Recovery

Table of Contents for
13 Configuring Database Diagnostics

13
Configuring Database Diagnostics

CERTIFICATION OBJECTIVE 13.01
Set Up Automatic Diagnostic Repository

CERTIFICATION OBJECTIVE 13.02
Using Support Workbench

CERTIFICATION OBJECTIVE 13.03
Perform Block Media Recovery