13
Configuring Database Diagnostics

CERTIFICATION OBJECTIVES

13.01 Set up Automatic Diagnostic Repository

13.02 Using Support Workbench

13.03 Perform Block Media Recovery

Image Two-Minute Drill

Image Self Test

When you have to deal with database errors or even a database that is completely down, you don’t want to spend extra time documenting one or more database error conditions for Oracle Support. This chapter first gives you a brief overview of how Oracle reports problems with the database, and how it categorizes errors into problems and incidents.

Next, this chapter will step through a few examples of packaging all the necessary information using Support Workbench via the EM interface and sending it to Oracle Support as a service request. The Service Request can be submitted with a few clicks, but it can be customized before submittal either to remove confidential or proprietary information, or to add extra data or information to help Oracle Support solve the issue.

In addition, the Health Monitor framework provides both proactive and reactive tools to deal with database errors. The DBA can run a proactive health check manually using EM or PL/SQL packages. In contrast, the Health Monitor can run diagnostic checks in response to critical database errors.

Finally, the chapter will switch from reporting, managing, and submitting service requests for database problems and incidents, to recovering individual blocks after you have identified the problem. RMAN supports detection and recovery of individual blocks using the DB_BLOCK_CHECKING initialization parameter and the RMAN RECOVER . . . BLOCK command. RMAN also makes it easy to identify failures and implement repairs using the Data Recovery Advisor.

CERTIFICATION OBJECTIVE 13.01
Set Up Automatic Diagnostic Repository

The key to the Automatic Diagnostic Repository (ADR) is the first word: automatic. ADR is an always-on facility that captures errors in trace and dump files the first and any successive times they occur. Thus the “diagnostic” part of the Oracle feature name. The third part, "repository," is a location on disk that stores the diagnostic information on disk and comes with a tool that makes it easy to query the repository even when the database is unavailable.

The following sections provide more details about the structure of the repository, how to retrieve information from the repository, and how to find the diagnostic information you’re looking for in the repository using initialization parameters and data dictionary views. In addition, you’ll see how to easily and quickly package the diagnostic information from the ADR and send it to Oracle support for problem resolution.

Understanding the ADR

The ADR is a file-based repository of diagnostic and other non-critical information for all products in your environment. Each database instance and Automatic Storage Management (ASM) instance has its own directory structure called an ADR home within a top-level directory known as the ADR base. In a Real Application Cluster (RAC) environment, each instance has its own subdirectory, which not only makes it easy to view diagnostics for an individual instance, but also makes it easy for the diagnostic tools to analyze data across instances for cluster-wide problems.

The ADR base directory is also known as the ADR root directory. The location for the ADR base is set depending on the values of initialization parameters and environment variables. If the initialization parameter DIAGNOSTIC_DEST is set, the ADR base directory is set to this value, and all other file locations are set relative to this location. If DIAGNOSTIC_DEST is not set, then DIAGNOSTIC_DEST is set to the environment variable ORACLE_BASE. If ORACLE_BASE is not set, DIAGNOSTIC_DEST is set to the value $ORACLE_HOME/log. Figure 13-1 shows the ADR directory structure for the DW database.

FIGURE 13-1     ADR directory structure for the DW database

Image

For the database in Figure 13-1, the initialization parameter DIAGNOSTIC_DEST is not set, so Oracle sets DIAGNOSTIC_DEST to the value of the environment variable ORACLE_BASE, which in this case is /u01/app/oracle:

Image

If ORACLE_BASE were not set, the location for the ADR would be /u01/app/oracle/product/11.1.0/db_1/log.

You can retrieve the values for each diagnostic directory using the dynamic performance view V$DIAG_INFO, as in this example:

Image

Note the column INST_ID. In a RAC environment, this value differentiates the value of each directory by node. For example, if DW were the cluster name, and the cluster contained three instances DW1, DW2, and DW3, the value for the second instance’s diagnostic trace directory would be as follows:

Image

Compared to previous releases of Oracle, the diagnostic information is more partitioned. In other words, all non-incident traces are stored in the trace subdirectory, all core dumps are in the cdump directory, and all incident dumps are stored as individual directories within the incident subdirectory. Table 13-1 show the ADR location for each type of diagnostic data.

TABLE 13-1 ADR Diagnostic Information Directory Locations

Diagnostic Data Type

Location Within the ADR

Foreground process trace files

ADR_HOME/trace

Background process trace files

ADR_HOME/trace

Alert log

ADR_HOME/alert (XML format)

ADR_HOME/trace (plaintext format)

Core dumps

ADR_HOME/cdump

Incident dumps

ADR_HOME/incident/incdir_n

Note the differentiation between trace and dump files. Trace files contain continuous output to diagnose a problem with a running process. A dump file is a onetime diagnostic output file resulting from an incident. Similarly, a core dump is a onetime platform-specific binary memory dump. Note also that there is no initialization parameter or environment variable named ADR_HOME. You can determine the value of ADR_HOME from the row in V$DIAG_INFO containing the name ADR Home:

Image

Using the ADRCI Tool

The ADR Command Interpreter (ADRCI) tool makes it easy to query the contents of the ADR. You can use the tool in command mode, or create scripts to run in batch mode. You can use ADRCI even when the database is down—remember that the ADR is completely file-system based. In addition to querying the contents of the ADR with ADRCI, you can also package incident and problem information into a compressed ZIP file that you can send to Oracle support.

Note that ADRCI does not require a login or any other authorization. The contents of the ADR are protected only by operating system permissions on the directory containing the ADR file structures. For a default installation of Oracle Database 11g, this means that the ADR has the same permissions as the ORACLE_BASE directory and its subdirectories. You can alter the permissions further if you wish, but you must make sure that the user owning the Oracle processes (usually the oracle user) has full read-write access to the ADR.

When you start ADRCI, you see the current ADR base directory. Type help for a list of commands:

Image

Image

Even when there are no incidents or problems to view, you can perform more mundane tasks such as viewing the alert log from ADRCI:

Image

Image

Notice that the ADRCI tool tracks all ADR home directories within the ADR root directory. Therefore, you must select which database, ASM, or listener directory you want to view with ADRCI.

Of course, you can perform the same task from EM. From the bottom of the EM home page, click the Alert Log Contents link. Then select the number of lines at the end of the alert log that you want to see, and click Go. You’ll see the page shown in Figure 13-2.

FIGURE 13-2     Viewing Alert Log Contents from EM

Image

CERTIFICATION OBJECTIVE 13.02
Using Support Workbench

The Support Workbench is accessible through the EM interface, which walks you through all aspects of problem detection, documentation, and resolution. The tools available through the Support Workbench enable you to view details on problems and incidents, run health checks on your database (either reactively or proactively), generate additional diagnostic data for service requests, and run advisors to help you resolve the problem or incident. Support Workbench also facilitates the packaging of all necessary diagnostic and supporting data for submittal to Oracle support using the MetaLink service. Once the service request has been submitted, you can track its status through the resolution cycle and close it when your problem is resolved.

The following sections first explain the distinction among the various events that occur in your database: (1) alerts, (2) problems, and (3) incidents. Next, an overview of how you submit a service request using the quick packaging method will be given. While you are working with Oracle support on the problem, you can run your own health checks and other advisors to help solve the problem. Finally, some advanced packaging options that permit you to add extra information to your service request, such as sample SQL statements and table data, to help Oracle support solve your problem will be described. Conversely, you can edit the package before you submit it to Oracle support to remove proprietary or confidential information.

Understanding Alerts, Problems, and Incidents

Although the alert log (either text format or XML format) contains all alerts for the instance, you see the alerts at the warning and critical levels on the EM home page. The dreaded ORA-00600 error messages are noted on the EM home page. In the next example, you try using a feature that worked in previous versions of Oracle Database but which causes an ORA-00600 message in Oracle Database 11g Release 1:

Image

The alert log shows the error as well:

Image

In Figure 13-3, you see both of these types of alerts near the bottom of the page, including the ORA-00600 error that you inadvertently triggered!

FIGURE 13-3     Viewing error alerts in EM

Image

A problem, defined by the Support Workbench framework, is a critical error in the database, such as the internal error ORA-00600 or some other serious event such as running out of memory in the shared pool or an operating system exception. An incident is a single occurrence of a problem. Each problem has a problem key, which is a text string that contains the error code and optionally other problem characteristics. A problem may have one or many incidents. Each incident is identified by a numeric incident ID and is stored in its own subdirectory of the ADR (ADR_HOME/incident/incdir_n). When an incident occurs, the database performs the following steps:

1. Adds an entry to the alert log (both text and XML-based).

2. Sends an incident alert to EM.

3. Sends an alert via e-mail to administrator(s) (if configured).

4. Gathers trace files and other incident information.

5. Tags all incident information with the incident ID.

6. Creates an incident directory under ADR_HOME for the incident and stores incident information and trace files in the directory.

Managing Service Requests

After Oracle automatically collects incident information and notifies you, you will use Support Workbench as an end-to-end tool to manage the problem. The typical steps you use in Support Workbench are as follows:

1. View critical errors on the EM home page (or via an e-mail message).

2. View the problem details.

3. Gather additional diagnostics and health checks.

4. Create an Oracle Service Request (SR).

5. Package necessary information and send to Oracle support.

6. Track the SR and implement repairs suggested by Oracle and the advisor framework tools.

7. Close the incident(s).

The following sections give you an overview of creating an SR, implementing repairs, and optionally including additional information in the SR using custom packaging.

Creating and Tracking an SR

When an ORA-00600 or any other, less serious error has occurred, you can package the incident details and send them to Oracle support. On the page shown in Figure 13-3, click the message link of the incident, and then click View Problem Details. You see the page in Figure 13-4.

FIGURE 13-4     Viewing problem details

Image

From this page, you can perform a number of actions. You can create a MetaLink service request, research the MetaLink knowledge base, or run additional checkers and diagnostics. You can quickly create a package of all incident information by clicking the Quick Package link. On the page shown in Figure 13-5, you start the process of creating a new package and submitting an SR right after package creation. You supply your MetaLink credentials and Customer Service Identifier (CSI).

FIGURE 13-5     Quick packaging an incident: step 1

Image

When you click Next, EM creates the package. You see the page in Figure 13-6.

FIGURE 13-6     Quick packaging an incident: step 2

Image

On the next page, shown in Figure 13-7, you can review the manifest that Oracle creates for the incident. It shows the summary of the problem and the location of the file containing the manifest.

FIGURE 13-7     Quick packaging an incident: step 3

Image

The final step of the quick packaging process, shown in Figure 13-8, gives you the option to submit the package immediately or at a later time. Typically, you will submit the service request right away, but you may want to submit it later if you think you can solve the problem yourself before users start complaining!

FIGURE 13-8     Quick packaging an incident: step 4

Image

After the package has been submitted to Oracle support, you can view the status of the incident resolution on the Problem Details page shown in Figure 13-4.

Implementing Repairs

On the Problem Details page, you can also attempt to use an Oracle Advisor to implement repairs. The advisor available on the page shown in Figure 13-4 is the SQL Repair Advisor. Also available in your toolbox is the Data Recovery Advisor. Due to the nature of the problem in Figure 13-4, Support Workbench has determined that the problem is with a SQL statement, not a data corruption problem, and therefore has not recommended the Data Recovery Advisor this time. Figure 13-9 shows the Incident Analysis page where you can submit the SQL Repair Advisor job.

FIGURE 13-9     Using SQL Repair Advisor to analyze a failed SQL statement

Image

After the SQL Repair Advisor runs, you can immediately view the results of the job from Advisor Central or return to the database home page and click the Advisor Central link at the bottom of the page.

Using Custom Packaging

Custom packaging gives you more flexibility and more control over the contents of an incident package. Unlike quick packaging, custom packaging enables you to

Image Add or remove problems or incidents from the package.

Image Add, edit, or remove trace files from the package.

Image Add or remove external files (Word documents, export dumps, and so on).

Image Add other diagnostic information such as SQL test cases.

Image Remove sensitive information or other unnecessary information to reduce the size of the package.

To initiate custom packaging, start from the Support Workbench home page shown in Figure 13-10, select the incident(s) you want to package, and then click Package.

FIGURE 13-10     Initiating custom packaging from the Support Workbench

Image

As you can see in Figure 13-11, you can create an entirely new package for this incident or even start with an existing package. In the example in Figure 13-11, you selected the package you created during quick packaging when you realized that you needed to add additional documentation and trace files to the incident package before sending it to Oracle support.

FIGURE 13-11     Using an existing package for custom packaging

Image

On the page shown in Figure 13-12, you can see the types of information you can add or remove from the package, such as additional external files, other problems, or other dumps. Once the package is completed, you can generate the new upload file and submit it to Oracle support.

FIGURE 13-12     Adding or removing information from a custom package

Image

Using the Health Monitor

You can use the Oracle Health Monitor framework to proactively or reactively assess the health of your database. Health Monitor checks the status of various database components, including the following:

Image Datafiles

Image Memory

Image Transaction integrity

Image Metadata

Image Process usage

You can run health checks using EM via the Support Workbench, or you can manually run a health check using the DBMS_HM PL/SQL package. Some checks can be run only when the database is open. Other checks are available when the instance is running but the database is in NOMOUNT mode. You can list the checks available and whether they are available in offline or online mode by querying the dynamic performance view V$HM_CHECK:

Image

You can view the results of a health check from EM, the ADRCI tool discussed earlier in this chapter, the DBMS_HM package, or the dynamic performance view V$HM_RUN.

From the EM Support Workbench page, click the Run Checkers link to run additional diagnostics. On the page shown in Figure 13-13, you see the health checks recommended for the selected incident. Even if no specific checkers are recommended, you can still run one of the available checkers to, for instance, check the integrity of the redo log stream and the core dictionary objects.

FIGURE 13-13     Run Checkers recommendation page

Image

You can view the results of health checks from the ADRCI tool with the show hm_run command. In the following examples, you retrieve the results of a health monitor run first in text format, then you generate a health monitor report in XML format:

Image

Image

CERTIFICATION OBJECTIVE 13.03
Perform Block Media Recovery

Many of the errors you will encounter will be related to bad blocks in your database datafiles, either because of media failures, server memory errors, or logical corruption caused by Oracle errors. Once you’ve identified these problems using the diagnostic methods provided earlier in this chapter, you can use the tools in the following sections to fix the problems.

As with nearly every Oracle feature, you can adjust the level of control and monitoring that Oracle performs; data block checking is no exception. Regardless of the settings you’ll learn about shortly, Oracle always performs the following checks on a data block when it is read or written to a datafile:

Image Block version (matches the version of the database)

Image Data block address (DBA) in the cache is the same as the DBA value in the block buffer

Image Block checksum is correct

You can repair a corrupt block by either recovering the block, dropping the object containing the bad block, or both. The following sections tell you more about block corruption, how to control the amount of overhead Oracle will use to ensure the integrity of blocks, and how to fix a corrupted block.

Understanding Block Corruption

When Oracle detects a corrupted block, it registers an ORA-01578 error in the alert log and on the EM home page. Included in the error message are the absolute file number and block number of the bad block. In addition, the session reading or writing the bad block sees the same error message. Here is an example of a block corruption error message:

Image

Most often, corruption is caused by operating system or disk hardware failures such as faulty I/O hardware or firmware, operating system caching problems, memory or paging problems, or errors caused by disk repair utilities.

Using the DB_BLOCK_CHECKING Parameter

The initialization parameter DB_BLOCK_CHECKING controls how thoroughly Oracle checks the integrity of every data block that is read or written. The level of checking you enable depends on the level of failure tolerable in your environment (which is usually very low!) balanced against the overhead required to perform the continuous block checks. The possible values for DB_BLOCK_CHECKING are as follows:

Image OFF or FALSE No block checking is performed.

Image LOW Basic block checks are performed after blocks are changed in memory or read from disk, including inter-instance block transfers in RAC environments.

Image MEDIUM Includes all LOW checks plus block checking for all non-IOT (index-organized table) blocks.

Image FULL or TRUE Includes all LOW and MEDIUM checks plus checks for index blocks.

If you can tolerate the performance overhead, Oracle recommends using FULL. The default value is OFF, even though FULL block checking for the SYSTEM tablespace is always enabled. The overhead for block checking ranges from 1 percent to 10 percent, but is closer to 10 percent in an OLTP environment.

Using Block Media Recovery

If you discover only a small handful of blocks to recover in a database from the aforementioned health checks or results discovered in the alert log, RMAN can perform block media recovery rather than a full datafile recovery. Block media recovery minimizes redo log application time, and drastically reduces the amount of I/O required to recover only the block or blocks in question. While block media recovery is in progress, the affected datafiles can remain online and available to users.

In addition to the block verification performed by Oracle as defined by the DB_BLOCK_CHECKING initialization parameter, an RMAN BACKUP or BACKUP VALIDATE command can add corrupted blocks to the dynamic performance view V$DATABASE_BLOCK_CORRUPTION. In addition, the SQL commands ANALYZE TABLE and ANALYZE INDEX can uncover corrupted blocks.

Image Block media recovery is only available from within the RMAN application.

You’ll need to know the advantages and disadvantages of block media recovery; as you might expect, there are many more advantages than disadvantages. In addition to touting the benefits of RMAN block media recovery, the prerequisites for block media recovery as well as provide some real-world examples will be defined.

Advantages of Block Media Recovery

Recovering one or a small number of blocks using RMAN has some obvious and some not-so-obvious advantages. First, recovering one block using a recent backup and archived and online redo log files will almost certainly take much less time than restoring and recovering one or more datafiles. In addition, during block media recovery, the entire datafile remains online and available during the recovery process; only the blocks being recovered are unavailable. Therefore, only one table, index, or other database object remains unavailable during block media recovery.

When you use the RMAN RECOVER . . . BLOCK command, RMAN first searches the flashback logs for a good copy of the corrupted block (if Flashback Database is enabled). Otherwise, RMAN uses the latest level 0 or full backup, restores the bad blocks, and performs media recovery on the bad blocks using the redo stream. Note that RMAN cannot use incremental level 1 backups for block media recovery.

You can use the dynamic performance view V$DATABASE_BLOCK_CORRUPTION to view the bad blocks in the database. This view contains blocks that are both physically and logically corrupted. Here are the tools or commands that can populate this view when they find bad blocks:

Image RMAN backup commands

Image ANALYZE

Image dbv operating system utility

Image SQL queries that try to access a corrupted block

Prerequisites for Using Block Recovery

Before you can use block media recovery, your database must fulfill a few prerequisites. First, the target database must be in ARCHIVELOG mode. Unless your database is for testing or is a read-only database, your database should be in ARCHIVELOG mode for maximum recoverability anyway!

Second, the backups of datafiles with bad blocks must be full backups or level 0 incremental backups. RMAN cannot use level 1 incremental backups for block recovery. Thus, you must have all archived redo log files since the last full backup or level 0 incremental backup.

Alternatively, you can use flashback logs in the flash recovery area for uncorrupt versions of bad blocks. Therefore, you must have Flashback Database enabled. If an uncorrupt version of a bad block is available in the flash recovery area, RMAN will use that block and perform media recovery on the block using archived and online redo log files. Unless the number of bad blocks is large, recovering a block from the flashback logs will certainly be faster than starting with a level 0 incremental or full backup.

Using the RMAN RECOVER . . . BLOCK Command

You can use the RMAN RECOVER . . . BLOCK command in response to an alert or other notification of a bad block. Typically, block corruption is reported in the following locations:

Image Output from the RMAN LIST FAILURE, VALIDATE, or BACKUP . . . VALIDATE commands

Image The V$DATABASE_BLOCK_CORRUPTION dynamic performance view

Image Error messages during a SQL*Plus or other client session

Image The alert log or user trace files

Image Results from the SQL commands ANALYZE TABLE or ANALYZE INDEX

Image Results from the DBVERIFY command-line utility (dbv)

To recover one or more data blocks, RMAN must know the datafile number and block number within the datafile. As mentioned previously, this information is available in a user trace file, as in the following example:

Image

In addition, the block will appear in the view V$DATABASE_BLOCK_CORRUPTION; the columns FILE# and BLOCK# provide the information needed to execute the RECOVER command. The column CORRUPTION_TYPE identifies the type of corruption in the block, such as FRACTURED, CHECKSUM, or CORRUPT. Fixing the block is easily accomplished in RMAN:

Image

A corrupted block must be restored completely. In other words, all redo operations up to the latest SCN against the data block must be applied before the block can be considered usable again.

If all bad blocks are recorded in V$DATABASE_BLOCK_CORRUPTION, you can easily recover all of them at once. Using the following command, RMAN will recover all physically corrupted blocks in V$DATABASE_BLOCK_CORRUPTION:

Image

After RMAN recovers the blocks, they are removed from V$DATABASE_BLOCK_CORRUPTION.

Using the Data Recovery Advisor

The Data Recovery Advisor is part of the Oracle advisor framework and automatically gathers information about a failure when an error is encountered. If you run the Data Recovery Advisor proactively, you are often able to detect and repair a failure before a user query or backup operation detects it. The Date Recovery Advisor can detect relatively small errors such as corrupted blocks. At the other end of the spectrum, it will detect errors that would otherwise prevent successful startup of the database, such as missing online redo log files. Your database may continue running for a short amount of time without online redo log files, but it will not start the next time you shut down and restart. Data Recovery Advisor will catch this error proactively.

Identifying Failures

As with most advisors and Oracle features, you can use either EM or command-line tools to run the Data Recovery Advisor, show the errors, and repair the failures. From the EM home page, click the Advisor Central link at the bottom of the page. From the Advisor Central page, click the Data Recovery Advisor link. If there are outstanding errors, you will see them on this page. The errors are recorded reactively from error events in the database, or as a result of proactively running a health check. Figure 13-14 shows a current data failure in the HR database.

FIGURE 13-14     Viewing data failures using EM

Image

Once the Data Recovery Advisor has identified a failure, you can review the details of the failure using the EM or RMAN interface. From RMAN, you can use the LIST FAILURE, ADVISE FAILURE, REPAIR FAILURE, and CHANGE FAILURE commands. Table 13-2 summarizes the purpose of these commands.

TABLE 13-2 RMAN Failure Advisory and Repair Commands

RMAN Command

Description

LIST FAILURE

List failures recorded by the Data Recovery Advisor

ADVISE FAILURE

Show recommended repair option

REPAIR FAILURE

Repair and close failure using RMAN’s recommendations

CHANGE FAILURE

Change the status or close a failure

The LIST FAILURE command has a number of options, depending on what types of errors you want to see:

Image Failure# List an individual failure’s details (by failure number).

Image ALL List all failures.

Image CRITICAL List failures that make the database unavailable.

Image HIGH List serious failures that make parts of the database unavailable, such as a missing datafile.

Image LOW List intermittent or lower-priority failures that can wait until more serious problems are fixed. For example, this can include corrupted blocks in infrequently used tablespaces.

Image CLOSED List only closed failures.

For example, LIST FAILURE 2097 will list the details for a failure with an identifier of 2097. Also, LIST FAILURE ALL lists all open failures of any priority.

Implementing Repairs

Once the Data Recovery Advisor has identified a failure, you can use the RMAN ADVISE FAILURE command to recommend a repair option for the specified failure. RMAN will suggest a repair and create a script with the recommended repair. If the repair is acceptable (in terms of downtime or other factors), you can then run the REPAIR FAILURE (within the same RMAN session) to perform the recommended action. After the repair action completes successfully, the failure is automatically closed.

You can also change the priority of a failure using CHANGE FAILURE. For example, a corrupted block will be recorded as a HIGH failure, but if it is in an infrequently used

tablespace, then you can change its priority to LOW so that you only see other more serious failures in the LIST FAILURE command. However, you cannot change the priority of a CRITICAL failure. You can only change the priority of a failure from HIGH to LOW, or vice versa. Here is how you can change the priority of failure number 307 from HIGH to LOW:

Image

EXERCISE 13-1
Use Data Recovery Advisor

In this exercise, you’ll simulate a serious data failure, query the failure, and repair the failure using RMAN.

1. Create a tablespace XPORT_DW in your database and create one or more tables within the tablespace.

2. At the operating system level, overwrite or delete the datafile for the XPORT_DW tablespace:

Image

3. Attempt to access tables within the tablespace from SQL*Plus:

Image

4. From RMAN, query all open failures and list the details for the failures found:

Image

You can see similar information in EM. Navigate to the Data Recovery Advisor page shown in Figure 13-14, and expand the Failure description. Figure 13-15 shows the failure details

FIGURE 13-15     View and Manage Failures details

Image

5. Return to RMAN and run the ADVISE FAILURE command:

Image

Image

Clicking the Advise button shown on the page in Figure 13-15 gives you the same advice, as you might expect. Figure 13-16 shows the RMAN script that OEM will execute to recover from the media failure.

FIGURE 13-16     OEM Recovery Advisor recommended actions

Image

6. Run the tablespace recovery in RMAN using the recommendations in /u01/app/oracle/diag/rdbms/hr/hr/hm/reco_2543193427.hm, which not coincidentally are the same recommendations you see in Figure 13-16:

Image

Image

7. Verify that the failure has been completed and closed:

Image

Data Recovery Advisor Views

There are several dynamic performance views you can use to retrieve information about failures detected by the Data Recovery Advisor:

Image V$IR_FAILURE All failures, including closed failures.

Image V$IR_MANUAL_CHECKLIST Listing of manual advice.

Image V$IR_REPAIR Listing of repairs.

Image V$IR_REPAIR_SET Cross reference of failure and advise identifier numbers.

For example, to retrieve the information for the failure in Exercise 13-1, query V$IR_FAILURE as follows:

Image

CERTIFICATION SUMMARY

This chapter started with an overview of the Automatic Diagnostic Repository (ADR). ADR is an always-on facility that captures errors in trace and dump files the first and any successive times that they occur. It makes your life easier when an error does occur because it happens without DBA intervention. In addition to facilities within EM, you can use the ADRCI command-line tool to query the contents of ADR when either the database or EM is down (or sometimes both!).

Once ADR has identified one or more problems, you can leverage the information in ADR by using the Support Workbench interface in EM. The tools available through the Support Workbench enable you to view details on problems and incidents, run health checks on your database (either reactively or proactively), generate additional diagnostic data for service requests, and run advisors to help you resolve the problem or incident. It also steps you through packaging all relevant information into a service request (SR) for Oracle support when you cannot solve the problem yourself. The Support Workbench makes it easy to analyze a problem and perform further analyses on the problem, potential causes, and solutions. You also learned the differences and similarities between alerts, problems, and incidents.

Next, one of the tools that is both proactive and reactive, which was the Health Monitor was discussed. The Checkers page in EM (available in several EM pages) lets you proactively run 24 different types of health checks, such as the Redo Integrity Check, Transaction Integrity Check, and Data Dictionary Integrity Check.

One of the more common problems you will encounter in your database is block corruption. You can control how thoroughly Oracle checks each data block as it is read and written using the DB_BLOCK_CHECKING initialization parameter. The trade-off is spending more CPU time on the front end to proactively detect problems versus relaxing the block checks and potentially discovering the problem later when you cannot afford the time to fix it.

The chapter closed with one of the RMAN-based repair tools, the Data Recovery Advisor. Using the RMAN command-line or EM interface, you can query all failures (open or closed) with a status of CRITICAL, HIGH, or LOW. Once a failure has been identified, you can use the advice from the Data Recovery Advisor (frequently in the form of an RMAN script) to repair the problem.

Image TWO-MINUTE DRILL

Set Up Automatic Diagnostic Repository

Image ADR is an always-on facility that captures errors in trace and dump files the first and any subsequent times they occur.

Image ADR uses a location on disk to store the diagnostic information and comes with a tool that makes it easy to query the repository even when the database is unavailable.

Image Each database instance or Automatic Storage Management (ASM) instance has its own directory structure called an ADR home within a top-level directory known as the ADR base.

Image The ADR base directory is also known as the ADR root directory.

Image If the initialization parameter DIAGNOSTIC_DEST is set, the ADR base directory is set to this value and all other file locations are set relative to this location.

Image If DIAGNOSTIC_DEST is not set, then DIAGNOSTIC_DEST is set to the environment variable ORACLE_BASE.

Image If ORACLE_BASE is not set, DIAGNOSTIC_DEST is set to the value $ORACLE_HOME/log.

Image The ADR diagnostic information is partitioned. All non-incident traces are stored in the trace subdirectory, all core dumps are stored in the cdump directory, and all incident dumps are stored as individual directories within the incident subdirectory.

Image The ADR Command Interpreter (ADRCI) tool makes it easy to query the contents of the ADR. You can use ADRCI even when the database is down.

Image ADRCI does not require a login or any other authorization. The contents of the ADR are protected only by operating system permissions on the directory containing the ADR file structures.

Using Support Workbench

Image The Support Workbench, accessible through the EM interface, walks you through all aspects of problem detection, documentation, and resolution.

Image Support Workbench also facilitates the packaging of all necessary diagnostic and supporting data for submittal to Oracle support using the MetaLink service.

Image A problem, as defined by the Support Workbench framework, is a critical error in the database: for example, the internal error ORA-00600 or some other serious event such as running out of memory in the shared pool, or perhaps an operating system exception.

Image An incident is a single occurrence of a problem.

Image Each problem has a problem key, which is a text string that contains the error code and optionally other problem characteristics.

Image Custom packaging within the Support Workbench framework gives you more flexibility and more control over the contents of an incident package.

Image Health Monitor checks the status of various database components, including datafiles, memory, transaction integrity, metadata and process usage.

Image You can run health checks using EM via the Support Workbench, or you can manually run a health check using the DBMS_HM PL/SQL package.

Perform Block Media Recovery

Image When Oracle detects a corrupted block, it registers an ORA-01578 error in the alert log and on the EM home page.

Image The initialization parameter DB_BLOCK_CHECKING controls how thoroughly Oracle checks the integrity of every data block that is read or written.

Image If you discover only a small handful of blocks to recover in a database from the aforementioned health checks or results discovered in the alert log, then RMAN can perform block media recovery rather than a full datafile recovery.

Image When you use the RMAN RECOVER . . . BLOCK command, RMAN first searches the flashback logs for a good copy of the corrupted block (if Flashback Database is enabled).

Image You can use the dynamic performance view V$DATABASE_BLOCK_CORRUPTION to view the bad blocks in the database.

Image The target database must be in ARCHIVELOG mode to use RMAN block recovery.

Image Alternatively, you can use flashback logs in the flash recovery area for uncorrupt versions of bad blocks.

Image The Data Recovery Advisor is part of the Oracle advisor framework and automatically gathers information about a failure when an error is encountered.

Image Once the Data Recovery Advisor has identified a failure, you can review the details of the failure using the EM or RMAN interface.

Image Once the Data Recovery Advisor has identified a failure, you can use the RMAN ADVISE FAILURE command to recommend a repair option for the specified failure.

SELF TEST

The following questions will help you measure your understanding of the material presented in this chapter. Read all the choices carefully, because there might be more than one correct answer. Choose all correct answers for each question.

Set Up Automatic Diagnostic Repository

1. The value of the initialization parameter DIAGNOSTIC_DEST is NULL, the environment variable ORACLE_HOME is set to /u01/app/oracle/product/11.1.0/db_1, and the value of the environment variable ORACLE_BASE is set to /u01/app/oracle. At startup, what value is assigned by Oracle to DIAGNOSTIC_DEST?

A. /u01/app/oracle/diag

B. /u01/app/oracle/log

C. /u01/app/oracle/product/11.1.0/db_1/log

D. /u01/app/oracle

2. Which of the following tasks can you accomplish using the adrci command-line tool?

A. Package incident information into a ZIP file to send to Oracle support.

B. View diagnostic data within ADR.

C. Perform a health check on the database while it is running.

D. Run recommended fixes from the most recent health check on the database.

3. Which of the following directory locations is not available in V$DIAG_INFO? (Choose the best answer.)

A. Diagnostic trace files

B. Diagnostic incident files

C. Diagnostic problem files

D. Active Problem Count

Using Support Workbench

4. You can use the EM Support Workbench for which of the following tasks? (Choose all correct answers.)

A. Run recommended repairs.

B. Manually run a health check.

C. Close problems and incidents.

D. Generate additional SQL test cases to help Oracle support solve the problem.

E. View problems and incidents.

5. Which of the following tasks can you only perform with the EM Support Workbench custom packaging feature?

A. You can specify an SR associated with the package.

B. You can customize the package name and description.

C. You can specify a date and time to upload the package.

D. You can add or remove external files from a package.

6. Which of the following advisors can you run from the EM Support Workbench Incident Details page? (Choose two answers.)

A. The SQL Repair Advisor

B. The Data Recovery Advisor

C. The SQL Tuning Advisor

D. The Disk Repair Advisor

Perform Block Media Recovery

7. Which of the following basic consistency checks are performed by Oracle when a block is written or read? (Choose all correct answers.)

A. Block checksum

B. Data Block Address in cache matches the address on disk

C. Block version

D. The data block is below the HWM when reading or updating a block

8. What are some of the prerequisites for using block media recovery? (Choose all that apply.)

A. Flashback Database must be enabled.

B. The database must be in ARCHIVELOG mode.

C. The last level 1 backup must be available.

D. DB_BLOCK_CHECKING must be set to LOW, MEDIUM, or FULL.

E. All archived redo logs since the last full backup must be available.

9. You can use the RMAN CHANGE FAILURE command to change the priority of which types of failures? (Choose all that apply.)

A. OPEN

B. HIGH

C. CRITICAL

D. LOW

E. CLOSED

SELF TEST ANSWERS

Set Up Automatic Diagnostic Repository

1. Image D. The ADR root directory (also known as the ADR base) is set by the parameter DIAGNOSTIC_DEST. If it is not set, Oracle sets DIAGNOSTIC_DEST to the environment variable ORACLE_BASE. If ORACLE_BASE is not set, then the ADR root directory is set to $ORACLE_HOME/log.

Image A, B, and C are wrong. All three locations are not assigned, given the values of DIAGNOSTIC_DEST, ORACLE_BASE, and ORACLE_HOME.

2. Image A and B. The adrci command-line tool allows you to view diagnostic information in the ADR root directory in addition to packaging both problem and incident information for Oracle support.

Image C and D are wrong. The adrci command-line tool cannot initiate health checks nor run fixes recommended by other Oracle diagnostic tools.

3. Image C. The view V$DIAG_INFO does not specify a directory for problems, only incidents. Each incident is labeled with a text string representing the problem identifier.

Image A, B, and D are listed in V$DIAG_INFO with the operating system–specific path name.

Using Support Workbench

4. Image B, C, D, and E. You can use the Support Workbench for viewing problems and incidents, manually running health checks, generating additional dumps and test cases for Oracle support, creating and tracking a service request to Oracle support, collecting all data related to a problem and packaging it, and closing the problem when it has been resolved.

Image A is wrong. The EM support workbench cannot run recommended repairs. However, you can initiate advisors that recommend repairs.

5. Image D. You can add or remove external files from a package only with the custom packaging method.

Image A, B, and C are wrong because these tasks can be accomplished using the quick packaging method.

6. Image A and B. You can run either the Data Recovery Advisor or the SQL Repair Advisor from the Incident Details page.

Image C and D are wrong. You cannot initiate the SQL Tuning Advisor from the EM Support Workbench. There is no such advisor as the Disk Repair Advisor.

Perform Block Media Recovery

7. Image A, B, and C. Oracle performs all of these checks regardless of the setting of DB_BLOCK_CHECKING.

Image D is wrong because Oracle does not check if a block is below the high water mark (HWM) when updating or reading a block.

8. Image A, B, and E. To use RMAN’s block recovery feature, the database must be in ARCHIVELOG mode, the backups of data files must be full backups or level 0 backups, and archived log files must be available since the last full or level 0 backup. In addition, if Flashback Database is enabled, then RMAN can look for uncorrupted versions of blocks in the flashback logs.

Image C is wrong because RMAN cannot use level 1 backups for block recovery. D is wrong because DB_BLOCK_CHECKING does not have to be enabled at all to use block recovery.

9. Image B and D. You can change the priority of a HIGH failure to LOW, and vice versa.

Image A is wrong because OPEN is not a failure status. C is wrong because you cannot change the priority of a CRITICAL failure. E is wrong because you cannot change the priority of a CLOSED failure.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset