11.2. Performing Recovery Operations

Once the inevitable database failure occurs, you can perform a relatively quick and painless recovery operation if you have followed the backup guidelines we presented in Chapter 10 and clearly understand the types of failures presented at the beginning of this chapter.

Before we show you how to perform recovery, however, it is important for you to understand how an Oracle instance starts up and what kinds of failures can occur at each startup phase. Understanding the startup phases is important because some types of recovery operations must occur in a particular phase. Once a database is started, the instance will fail under a number of conditions that we will describe in detail.

Next, we will describe how instance recovery works and how to tune instance recovery, and then show you ways to easily recover from several types of user errors. Finally, we will show you how to recover from media failures due to the loss of both critical and non–system-critical datafiles.

11.2.1. Understanding Instance Startup

Starting up a database involves several phases, from being shut down to being open and available to users. If certain prerequisites are not present, database startup halts, and you must take some kind of remedial action to permit startup to proceed. In the following list are the four basic database states along with their prerequisites after you type the STARTUP command at the SQL*Plus prompt.

SHUTDOWN No background processes are active. A STARTUP command is used when the database is in this state; the STARTUP command fails if you are in any other state unless you are using STARTUP FORCE to restart an instance.

NOMOUNT Also known as the STARTED state, the instance must be able to access the initialization parameter file, either as a text-based init.ora file or an SPFILE.

MOUNT In this state, the instance checks that all control files listed in the initialization parameter file are present and identical. If even one of the multiplexed control files is unavailable or corrupted, the instance does not enter the MOUNT state and stays in the NOMOUNT state.

OPEN Most of the time spent in instance startup occurs during this phase. All redo log groups must have at least one member available, and all datafiles that are marked as online must be available.

You are notified in a number of ways that a redo log group member is missing or a datafile is missing. If a datafile is missing or corrupted, you will get a message while you are running the STARTUP command, as in this example:

SQL> startup

ORACLE instance started.

Total System Global Area  197132288 bytes
Fixed Size                   778076 bytes
Variable Size             162537636 bytes
Database Buffers           33554432 bytes
Redo Buffers                 262144 bytes
Database mounted.

ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u05/oradata/ord/users01.dbf'

SQL>

The message in SQL*Plus shows only the first datafile that needs attention. You will have to use the dynamic performance view V$RECOVER_FILE to display a list of all files that need attention. Here is a query against the view V$RECOVER_FILE and a second query joining V$RECOVER_FILE and V$DATAFILE given the previous STARTUP command:

SQL> select file#, error from v$recover_file;

     FILE# ERROR
---------- ---------------------------------------------
          4 FILE NOT FOUND
         11 FILE NOT FOUND

SQL> select file#, name from
  2       v$datafile join v$recover_file using (file#);

     FILE# NAME
---------- ----------------------------------------
         4 /u05/oradata/ord/users01.dbf
        11 /u08/oradata/ord/idx02.dbf
SQL>

If a datafile is already offline or is taken offline, the instance can still start as long as the datafile does not belong to the SYSTEM or UNDO tablespace. Once the instance is started, you can proceed to recover the missing or corrupted datafile and subsequently bring it online. If all datafiles are available but out of sync, automatic instance recovery is performed, as long as the online redo log files can bring all datafiles to the same SCN; otherwise, media recovery using archived redo log files is required.
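As a preview of the media recovery steps covered later in this chapter, the following minimal SQL*Plus sketch uses the datafile name from the previous output; it assumes the database is in ARCHIVELOG mode and that a copy of the lost file has been restored from a backup before the RECOVER step:

SQL> alter database datafile '/u05/oradata/ord/users01.dbf' offline;

SQL> recover datafile '/u05/oradata/ord/users01.dbf';

SQL> alter database datafile '/u05/oradata/ord/users01.dbf' online;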

If a redo log group member is missing, a message is generated in the alert log, but the database will still open.

11.2.2. Keeping an Instance from Failing

Media failures are not always critical; it depends on which type of file is lost. If you lose any of the multiplexed copies of the control file, an entire redo log group, or any datafile from the SYSTEM or UNDO tablespace, the instance will fail.

In some cases, the instance becomes unavailable to users but will not shut down; in this case, you can use SHUTDOWN ABORT to force the instance to shut down without resynchronizing the datafiles with the control file. The next time the instance is started, instance recovery will be performed. If you plan on starting up the instance right after using SHUTDOWN ABORT, you can instead use STARTUP FORCE as shorthand for a SHUTDOWN ABORT and a STARTUP.
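For example, the following single command at the SQL*Plus prompt aborts the instance, if it is running, and immediately restarts it:

SQL> startup force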

NOTE

Later in this chapter, we will show you how to recover from the loss of a control file, a redo log file member, or one or more datafiles.

11.2.3. Recovering from Instance Failure

As we discussed earlier in this chapter in the section "Instance Failures," an instance failure is any kind of failure that prevents the synchronization of the database's datafiles and control file before the instance is shut down.

Oracle automatically recovers from instance failure during instance recovery. Instance recovery is initiated by simply starting up the database with the STARTUP command.

NOTE

Instance recovery is also known as crash recovery.

During a STARTUP operation, Oracle first attempts to read the initialization parameter file, and then it mounts the control files and attempts to open the datafiles referenced in the control files. If the datafiles are not synchronized, instance recovery is initiated.

NOTE

We discussed instance startup phases in the section "Understanding Instance Startup" earlier in this chapter.

Instance recovery occurs in two distinct phases: the first phase uses the online redo log files to restore the datafiles to the state before instance failure in a roll forward operation; after this step is completed, Oracle uses the undo tablespace to roll back any uncommitted transactions. The roll forward operation includes data in the undo tablespace; without a consistent undo tablespace, the roll back operation cannot succeed. Once the roll forward operation completes, the database is open to users while the roll back operation completes. After the roll back phase, the datafiles contain only committed data.

11.2.4. Tuning Instance Recovery

Before a user receives a "Commit complete" message, the new or changed data must first be successfully written to a redo log file. At some point in the future, the same information must be used to update the datafiles; this operation usually lags behind the redo log file write because sequential writes to the redo log file are by nature faster than random writes to one or more datafiles on disk.

As we discussed in Chapter 10, checkpoints keep track of what still needs to be written from the redo log files to the datafiles. Any transactions not yet written to the datafiles are at an SCN after the last checkpoint.

The amount of time required for instance recovery depends on how long it takes to bring the datafiles up-to-date from the last checkpoint position to the latest SCN in the control file. To prevent performance problems, the distance between the checkpoint position and the end of the redo log group cannot be more than 90 percent of the size of the redo log group.

You can tune instance recovery by setting an MTTR target, in seconds, using the initialization parameter FAST_START_MTTR_TARGET. The default value for this parameter is zero; the maximum is 3,600 seconds (1 hour).

With a setting of zero, which disables the target, the likelihood that writes to the redo logs wait for writes to the datafiles is reduced. However, if FAST_START_MTTR_TARGET is set to a low nonzero value, writes to the redo logs most likely wait for writes to the datafiles. Although this reduces the amount of time it takes to recover the instance in the case of an instance failure, it affects performance and response time. Setting this value too high can result in an unacceptable amount of time needed to recover the instance after an instance failure.
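To see how the current setting translates into an actual recovery-time estimate, you can query the dynamic performance view V$INSTANCE_RECOVERY, as in this sketch; TARGET_MTTR shows the effective target and ESTIMATED_MTTR shows Oracle's current estimate of instance recovery time, both in seconds:

SQL> select target_mttr, estimated_mttr
  2     from v$instance_recovery;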

Two other parameters control instance recovery time:

LOG_CHECKPOINT_TIMEOUT This is the maximum number of seconds that any new or modified block in the buffer cache waits until it is written to disk.

FAST_START_IO_TARGET This is similar to FAST_START_MTTR_TARGET, except that the recovery operation is specified as the number of I/Os instead of the number of seconds to finish instance recovery.

Setting either of these parameters overrides FAST_START_MTTR_TARGET. As part of the enhanced manageability features introduced with Oracle9i, setting FAST_START_MTTR_TARGET is the easiest and most straightforward way to define your database's recovery time given the time-based constraints included in most typical service-level agreements (SLAs).

The EM Database Control interface makes it easy to adjust FAST_START_MTTR_TARGET. From the Advisor Central screen, accessible at the bottom of the Database Control home page, click MTTR Advisor. In the example in Figure 11.1, you adjust the desired MTTR value to 60 seconds on the Configure Recovery Settings screen.

When you click the Apply button, the new value for FAST_START_MTTR_TARGET goes into effect immediately and stays in effect when the instance is restarted.

Figure 11.1. Adjusting MTTR for instance recovery

Using the SQL*Plus command line, you can accomplish this task by using the ALTER SYSTEM command, as in this example:

SQL> alter system set fast_start_mttr_target=60 scope=both;
System altered.

Using SCOPE=BOTH, the new value of the parameter takes effect immediately and stays in effect the next time the instance is restarted.

11.2.5. Recovering from User Errors

Earlier in this chapter, in the section "User Error Failures," we presented a number of scenarios in which a user's data was inadvertently changed or deleted or a table was dropped. In the following sections, we'll show you how to use Flashback Query to retrieve selected rows from a previous state of a table, how to recover a table using Flashback Drop and a tablespace's recycle bin, how to bring an entire table and its dependent objects (such as indexes) back to a point in time in the past using Flashback Table, and how to query previous transactions in the online and archived redo logs using the LogMiner utility.

11.2.5.1. Using Flashback Query

One of the features introduced in Oracle9i is called Flashback Query. It allows a user to "go back in time" and view the contents of a table as it existed at some point in the recent past. A Flashback Query looks a lot like a standard SQL SELECT statement, with the addition of the AS OF TIMESTAMP clause.

Before users can take advantage of the Flashback Query feature, you, the DBA, must perform two tasks:

  • Make sure that there is an undo tablespace in the database that is large enough to retain changes made by all users for a specified period of time. This is the same tablespace that is used to support COMMIT and ROLLBACK functionality (discussed in Chapter 8, "Managing Consistency and Concurrency").

  • Specify how long the undo information will be retained for use by flashback queries by using the initialization parameter UNDO_RETENTION. This parameter is specified in seconds; therefore, if you specify UNDO_RETENTION=172800, the undo information for flashback queries can be available for up to two days (see the sketch after this list).
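For example, the following sketch sets a two-day retention period while the instance is running; it assumes the instance uses an SPFILE so that the new value also persists across restarts:

SQL> alter system set undo_retention = 172800 scope=both;

System altered.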

The key to Flashback Query is using the AS OF TIMESTAMP clause in the SELECT statement; you can specify the time stamp as any valid expression that evaluates to a date or time stamp value. In the following example, you want to query the EMPLOYEES table as it existed 15 minutes ago:

SQL> select employee_id, last_name, email
  2     from hr.employees
  3  as of timestamp (systimestamp - interval '15' minute)
  4  where employee_id = 101;

EMPLOYEE_ID LAST_NAME             EMAIL
----------- --------------------- -------------------
        101 Kochhar               NKOCHHAR

1 row selected.

You can just as easily specify an absolute time of day to retrieve the contents of the row at that time, as in this example:

SQL> select employee_id, last_name, email
  2     from hr.employees
  3  as of timestamp
  4     (to_timestamp ('01-Sep-04 16:18:57.845993',
  5                    'DD-Mon-RR HH24:MI:SS.FF'))
  6  where employee_id = 101;

EMPLOYEE_ID LAST_NAME             EMAIL
----------- --------------------- -------------------
        101 Kochhar               NTKOCHHAR

If your Flashback Query requires undo data that is no longer available in the undo tablespace, you will receive an error message:

SQL> select employee_id, last_name, email
  2     from hr.employees
  3  as of timestamp (systimestamp - interval '10' month)
  4  where employee_id = 101;

select employee_id, last_name, email
         *
ERROR at line 1:
ORA-08180: no snapshot found based on specified time

Real World Scenario: Using Flashback Query to Investigate a Customer Complaint

In your custom widget company, an error in the Accounting Department added $2,000 to two orders placed yesterday:

SQL> update orders
  2     set order_total = order_total+2000
  3  where order_id in (2367,2361);

2 rows updated.

SQL> select order_id, customer_id, order_total
  2     from orders where order_id in (2367,2361);

  ORDER_ID CUSTOMER_ID ORDER_TOTAL
---------- ----------- -----------
      2361         108    122131.3
      2367         148    146054.8

2 rows selected.

Today, the customer with ID 108 called to complain that the bill from his last order (order number 2361) is $2,000 more than expected. Sharon, one of the order-entry clerks, retrieves the row from the ORDERS table with the information for order number 2361:

SQL> select order_id, customer_id, order_total
  2     from orders where order_id = 2361;

  ORDER_ID CUSTOMER_ID ORDER_TOTAL
---------- ----------- -----------
      2361         108    122131.3

1 row selected.

Before calling back the customer, Sharon finds out from the Accounting Department that a day ago, two of the orders were incorrectly modified with an additional surcharge. To confirm whether this particular order was affected by the accounting error, she uses a Flashback Query to see if this order had a different order total two days ago:

SQL> select order_id, customer_id, order_total from orders
  2     as of timestamp (sysdate - 2)
  3  where order_id = 2361;

  ORDER_ID CUSTOMER_ID ORDER_TOTAL
---------- ----------- -----------
      2361         108    120131.3

1 row selected.

This Flashback Query confirms that the order total for this order was $2,000 less two days ago. The AS OF TIMESTAMP clause specifies how far back in the past you want to view the contents of this table. In this case, (sysdate - 2) evaluates to today's date minus two days—in other words, two days ago. Sharon concludes that at some point in the past two days, this was one of the orders that was incorrectly modified. To find all the orders that have the incorrect surcharge, she uses another Flashback Query as a nested query to compare the order totals:

SQL> select o.order_id, o.customer_id,
  2    o.order_total "CURR_TOTAL", oo.order_total "ORIG_TOTAL"
  3  from orders o,
  4        (select order_id, order_total from orders
  5         as of timestamp (sysdate - 2)) oo
  6  where o.order_id = oo.order_id and
  7        o.order_total != oo.order_total;

  ORDER_ID CUSTOMER_ID ORDER_TOTAL ORIG_TOTAL
---------- ----------- ----------- ----------
      2361         108    122131.3   120131.3
      2367         148    146054.8   144054.8

2 rows selected.

In this query, Sharon is comparing the entire contents of the current ORDERS table to the entire contents of the ORDERS table as it was two days ago and selecting records in which the order totals don't match. She now knows which records must be updated with the correct order total amount.


11.2.5.2. Using Flashback Drop and the Recycle Bin

Another user recovery flashback feature, Flashback Drop, lets you restore a dropped table without using tablespace point-in-time recovery, as required in previous versions of Oracle. Although tablespace point-in-time recovery could effectively restore a table and its contents to a point in time before it was dropped, it was potentially time-consuming and had the side effect of losing work from other transactions that occurred within the same tablespace after the table was dropped.

In the following sections, we will introduce the recycle bin, a logical structure available in each tablespace, show you how to query it and retrieve dropped objects from it, and describe some minor limitations involved in using it.

11.2.5.2.1. Recycle Bin Concepts

The recycle bin is a logical structure within each tablespace that holds dropped tables and objects related to the tables, such as indexes. The space associated with the dropped table is not immediately available but shows up in the data dictionary view DBA_FREE_SPACE. When space pressure occurs in the tablespace, objects in the recycle bin are deleted in a first-in first-out (FIFO) fashion, maximizing the amount of time that the most recently dropped object remains in the recycle bin.

NOTE

The recycle bin, new to Oracle 10g, is implemented as a data dictionary table.

The dropped object still belongs to the owner and still counts against the quota for the owner in the tablespace; in fact, the table itself is still directly accessible from the recycle bin, as you will see in subsequent examples.

11.2.5.2.2. Retrieving Dropped Tables from the Recycle Bin

You retrieve a dropped table from the recycle bin at the SQL command line by using the FLASHBACK TABLE...TO BEFORE DROP command. In the following example, the user GARY retrieves the table ORDER_ITEMS from the recycle bin after discovering that the table was inadvertently dropped:

SQL> select order_id, line_item_id, product_id
  2  from order_items
  3  where rownum < 5;

from order_items
     *
ERROR at line 2:
ORA-00942: table or view does not exist


SQL> flashback table order_items to before drop;

Flashback complete.

SQL> select order_id, line_item_id, product_id
  2  from order_items
  3  where rownum < 5;

  ORDER_ID LINE_ITEM_ID PRODUCT_ID
---------- ------------ ----------
      2355            1       2289
      2356            1       2264
      2357            1       2211
      2358            1       1781

SQL>

If the table ORDER_ITEMS was re-created after it was dropped, Gary would add the RENAME TO clause in the FLASHBACK TABLE command to give the restored table a new name, as in the following example:

SQL> drop table order_items;

Table dropped.

SQL> flashback table order_items to before drop
  2      rename to order_items_old_version;

Flashback complete.

SQL> select order_id, line_item_id, product_id
  2  from order_items_old_version
  3  where rownum < 5;

  ORDER_ID LINE_ITEM_ID PRODUCT_ID
---------- ------------ ----------
      2355            1       2289
      2356            1       2264
      2357            1       2211
      2358            1       1781

SQL>

If the table to be retrieved from the recycle bin was dropped more than once, and you want to retrieve an incarnation of the table other than the most recent one, you must refer to the table by its system-generated name in the recycle bin; to find that name, query the view RECYCLEBIN or use the SHOW RECYCLEBIN command.
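The following sketch illustrates the idea; the double-quoted recycle bin name is a placeholder (substitute the actual OBJECT_NAME value returned by the first query), and the new table name is hypothetical:

SQL> select object_name, original_name, droptime
  2     from recyclebin
  3     where original_name = 'ORDER_ITEMS';

SQL> flashback table "BIN$...older_version...$0" to before drop
  2      rename to order_items_prior_version;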

Using the EM Database Control, you can retrieve a dropped table from the recycle bin on the Perform Recovery: Dropped Objects Selection screen, as shown in Figure 11.2.

The user GARY has one table, ORDER_ITEMS, in the recycle bin that was dropped on June 21, 2004. Subsequent screens in this wizard let you rename the restored tables if a name conflicts with an existing object.

11.2.5.2.3. Recycle Bin Considerations and Limitations

A few limitations are associated with the recycle bin:

  • Only non-SYSTEM locally managed tablespaces can have a recycle bin. However, dependent objects in a dictionary-managed tablespace are protected if the dropped object is in a locally managed tablespace.

  • A table's dependent objects are saved in the recycle bin when the table is dropped, except for bitmap join indexes, referential integrity constraints (foreign key constraints), and materialized view logs.

  • Indexes are protected only if the table is dropped first; explicitly dropping an index does not place the index into the recycle bin.

11.2.5.3. Using Flashback Table

Flashback Table allows you to recover one or more tables to a specific point in time without having to use more time-consuming recovery operations such as tablespace point-in-time recovery or Flashback Database that can also affect the availability of the rest of the database. Flashback Table works in-place by rolling back only the changes made to the table or tables and their dependent objects, such as indexes. Flashback Table is different from Flashback Drop; Flashback Table undoes recent transactions to an existing table, whereas Flashback Drop recovers a dropped table. Flashback Table uses data in the undo tablespace, whereas Flashback Drop uses the recycle bin.

The FLASHBACK TABLE command brings one or more tables back to a point in time before any number of logical corruptions have occurred on the tables. To be able to flashback a table, you must enable row movement for the table. Because DML operations are used to bring the table back to its former state, the ROWIDs in the table change. As a result, Flashback Table is not a viable option for applications that depend on the table's ROWIDs to remain constant.

In the following example, you find out that someone in the HR department has accidentally deleted all the employees in department 60, the IT department, along with the row for IT in the DEPARTMENTS table. Because this happened less than 15 minutes ago, you are sure that there is enough undo information to support a Flashback Table operation.

Figure 11.2. The Perform Recovery: Dropped Objects Selection screen

Before performing the Flashback Table operation, you first enable row movement in the two affected tables, as in the following example:

SQL> alter table hr.employees enable row movement;
Table altered.

SQL> alter table hr.departments enable row movement;
Table altered.

Before running the FLASHBACK TABLE command, you confirm that the row in DEPARTMENTS for the IT department is still missing using this query:

SQL> select * from hr.departments where
  2     department_name = 'IT';

no rows selected

Next, you flash back the table to 15 minutes ago, specifying both tables in the same command, as follows:

SQL> flashback table hr.employees, hr.departments
  2     to timestamp systimestamp - interval '15' minute;

Flashback complete.

Finally, you check to see if the IT department is truly back in the table:

SQL> select * from hr.departments where
  2     department_name = 'IT';

DEPARTMENT_ID DEPARTMENT_NAME     MANAGER_ID LOCATION_ID
------------- ------------------ ---------- -----------
           60 IT                        103         1400

SQL>

If you either flashback too far or not far enough, you can simply rerun the FLASHBACK TABLE command with a different time stamp or SCN, as long as the undo data is still available.

Although the rest of the database is unaffected by a Flashback Table operation, the FLASHBACK TABLE command acquires exclusive DML locks on the tables involved in the flashback. This is usually not an availability issue, because the users who would normally use the table are waiting for the flashback operation to complete anyway!

Integrity constraints are not violated when one or more tables are flashed back; this is why you typically group tables related by integrity constraints or parent-child relationships in the FLASHBACK TABLE command.

Using EM Database Control, you can flashback a table by selecting the Maintenance tab from the EM Database Control home page, and clicking the Perform Recovery link. On the Perform Recovery: Type screen, shown in Figure 11.3, select Tables in the Object Type drop-down box.

Figure 11.3. Selecting Tables as the object type for recovery

After clicking the Next button, select either Flashback to a Timestamp or Flashback to a Known SCN, and specify the time stamp or SCN as the desired point in time for the recovered table, as you can see in Figure 11.4.

Click Next to skip to step 4 in the recovery dialog, the Perform Recovery: Flashback Tables screen, as shown in Figure 11.5, where you specify the two tables you used in the SQL*Plus command-line example earlier in this section: HR.EMPLOYEES and HR.DEPARTMENTS.

At the end of this sequence of steps, you can view the SQL command that will be executed. In the Perform Recovery: Review screen shown in Figure 11.6, you can see the summary presented before the flashback is initiated, and here is the SQL you see when you click the Show SQL button:

FLASHBACK TABLE HR.EMPLOYEES, HR.JOBS,
    HR.DEPARTMENTS, HR.LOCATIONS TO TIMESTAMP
     TO_TIMESTAMP('2004-09-12 01:15:25 PM',
    'YYYY-MM-DD HH:MI:SS AM')

The EM Database Control version of the command shows two more tables than in your command-line version of this recovery scenario, HR.JOBS and HR.LOCATIONS, because by default the recovery wizard includes all dependent objects.

Figure 11.4. Selecting a time stamp or an SCN for table recovery

Figure 11.5. Selecting tables to flashback

Figure 11.6. The Perform Recovery: Review screen

11.2.5.4. Using LogMiner

Oracle LogMiner is another tool you can use to view past activity in the database. The LogMiner tool can help find changed records in redo log files by using a set of PL/SQL procedures and functions. LogMiner extracts all DDL and DML activity from the redo log files for viewing via the dynamic performance view V$LOGMNR_CONTENTS. In addition to extracting the DDL and DML statements used to change the database, the V$LOGMNR_CONTENTS view also contains the DML statements needed to reverse the change made to the database. This is a good tool for not only pinpointing when changes were made to a table but also for automatically generating the SQL statements needed to reverse those changes.

LogMiner works differently from Oracle's Flashback Query feature. The Flashback Query feature allows a user to see the contents of a table at a specified time in the past; LogMiner can search a time period for all DDL against the table. A Flashback Query uses the undo information stored in the undo tablespace; LogMiner uses redo logs, both online and archived. Both tools can be useful for tracking down how and when changes to database objects took place.

You can configure and use LogMiner either from a SQL command line or via a GUI-based interface within Oracle Enterprise Manager, as shown in Figure 11.7, by choosing Tools > Database Applications > LogMiner Viewer.

This LogMiner session initiated through EM shows a sequence of DML statements executed by user GARY against the ORDER_ITEMS table. The SQL Redo column shows the DML statement used to change the ORDER_ITEMS table, and the SQL Undo column shows how to reverse the change made by the DML statement in the SQL Redo column. Double-clicking a row in the report opens a second window that shows the complete text of both the SQL Undo and SQL Redo columns, as shown in Figure 11.8.

Figure 11.7. Selecting a DML statement using LogMiner

Figure 11.8. Using LogMiner to undo a transaction

NOTE

LogMiner does not actually undo the change; it only provides the statements that you can use to undo the change. You can extract and run any or all DML commands you find in the redo logs, keeping in mind any integrity constraints in place for the tables you are modifying.
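If you prefer the SQL command line to the EM interface, the following sketch shows the general shape of a LogMiner session built with the DBMS_LOGMNR package; the redo log file name is borrowed from the examples later in this chapter, and the data dictionary is read from the online catalog:

SQL> begin
  2    -- register the redo log file to mine
  3    dbms_logmnr.add_logfile(
  4      logfilename => '/u07/oradata/ord/redo01.log',
  5      options     => dbms_logmnr.new);
  6    -- start the LogMiner session using the online data dictionary
  7    dbms_logmnr.start_logmnr(
  8      options => dbms_logmnr.dict_from_online_catalog);
  9  end;
 10  /

SQL> select username, sql_redo, sql_undo
  2     from v$logmnr_contents
  3     where seg_owner = 'GARY' and seg_name = 'ORDER_ITEMS';

SQL> execute dbms_logmnr.end_logmnr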

11.2.6. Recovering from Loss of a Control File

Losing one of the multiplexed control files immediately aborts the instance. Assuming that you have not lost every control file, recovering from this failure is fairly straightforward.

NOTE

Recovering from the loss of all control files is covered in OCP: Oracle 10g Administration II Study Guide (Sybex, 2005).

Here are the steps to recover from the loss of a control file:

  1. If the instance is not shut down, use SHUTDOWN ABORT to force a complete shutdown.

  2. Copy one of the good copies of the control file to the location of the corrupted or missing control file (see the sketch after this list). If the corrupted or missing control file resided on a failed disk, copy it to another suitable location instead, and update the control file reference in the initialization parameter file. Alternatively, you can temporarily remove the reference from the initialization parameter file until you find a suitable location. However, it is highly desirable to maintain at least two, if not more, copies of the control file in case of another media failure.

  3. Start the instance with STARTUP.
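For step 2, the copy itself is an operating system command; on Linux, using control file locations like those in the example that follows, it might look like this (adjust the paths for your environment):

[oracle@oltp ord]$ cp /u02/oradata/ord/control01.ctl /u07/oradata/ord/control03.ctl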

In the following example, you use a server parameter file (SPFILE) for initialization parameters, and you decide to temporarily do without a third multiplexed control file until the disk containing the lost control file is repaired. The initialization parameter CONTROL_FILES will be changed using the ALTER SYSTEM ... SCOPE=SPFILE command when the instance is started in NOMOUNT mode. You cannot start in MOUNT mode because that mode checks for the existence of all copies of the control file, and as far as the SPFILE is concerned, you are still missing a control file.

The first step is to start the database in NOMOUNT mode, as you can see in this example:

SQL> startup nomount

ORACLE instance started.

Total System Global Area  188743680 bytes
Fixed Size                   778036 bytes
Variable Size             162537676 bytes
Database Buffers           25165824 bytes
Redo Buffers                 262144 bytes
SQL>

Looking at the dynamic performance view V$SPPARAMETER, you can see that you still have three copies of the control file referenced, but the disk containing the third copy has failed:

SQL> select name, value from v$spparameter
  2     where name = 'control_files';

NAME            VALUE
--------------- --------------------------------------------------
control_files    /u02/oradata/ord/control01.ctl
control_files    /u06/oradata/ord/control02.ctl
control_files    /u07/oradata/ord/control03.ctl

In the next step, you change the value of CONTROL_FILES in the SPFILE and restart the instance, as you can see here:

SQL> alter system set control_files =
  2    '/u02/oradata/ord/control01.ctl',
  3    '/u06/oradata/ord/control02.ctl'
  4  scope = spfile;

System altered.

SQL> shutdown immediate

ORA-01507: database not mounted

ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area  188743680 bytes
Fixed Size                   778036 bytes
Variable Size             162537676 bytes
Database Buffers           25165824 bytes
Redo Buffers                 262144 bytes
Database mounted.
Database opened.
SQL>

Once the instance is restarted successfully, you confirm that the control file is no longer being referenced, as you can see in this query:

SQL> select name, value from v$spparameter
  2  where name = 'control_files';

NAME            VALUE
--------------- --------------------------------------
control_files    /u02/oradata/ord/control01.ctl
control_files    /u06/oradata/ord/control02.ctl

You still have two multiplexed copies of the control file; therefore, you are covered in case of a media failure of the disk containing one of the remaining control files.

11.2.7. Recovering from Loss of a Redo Log File

A database instance stays up as long as at least one member of a redo log group is available. The alert log records the loss of a redo log group member; as with most database status information, the EM Database Control allows you to easily review the contents of the alert log.

The dynamic performance view V$LOGFILE provides the status of each member of each redo log file member of each redo log group; the STATUS column is defined as follows:

INVALID The file is corrupted or missing.

STALE This redo log file member is new and has never been used.

DELETED The file is no longer being used.

<blank> The redo log file is in use and is not corrupted.

When you are aware of a missing or deleted redo log file group member, follow these three steps to ensure that you maintain a maximum level of redundancy. Losing the remaining member(s) of the redo log group will cause the instance to fail.

  1. Verify which redo log file group member is missing.

  2. Archive the log file group's contents; if you clear this log file group before archiving it, you must back up the full database to ensure maximum recoverability of the database in the case of the loss of a datafile. Use the command ALTER SYSTEM ARCHIVE LOG GROUP groupnum; to force the archive operation. (groupnum refers to the redo log group that you want to archive.)

  3. Clear the log group to re-create the missing redo log file members using the command ALTER DATABASE CLEAR LOGFILE GROUP groupnum;. Alternatively, you can replace the missing member by copying one of the good group members to the location of the missing member; using ALTER DATABASE CLEAR LOGFILE GROUP has the advantage of being platform independent.

In this example, you lose a redo log file group member and check the status of the redo log file groups using V$LOGFILE:

SQL> select * from v$logfile
  2  order by group#;

    GROUP# STATUS  TYPE   MEMBER                       IS_
---------- ------- ------ ---------------------------- ---
         1          ONLINE /u07/oradata/ord/redo01.log  NO
         1          ONLINE /u08/oradata/ord/redo01.log  NO
         2          ONLINE /u07/oradata/ord/redo02.log  NO
         2          ONLINE /u08/oradata/ord/redo02.log  NO
         3          ONLINE /u07/oradata/ord/redo03.log  NO
         3          ONLINE /u08/oradata/ord/redo03.log  NO

6 rows selected.

SQL> ! rm /u08/oradata/ord/redo01.log

SQL> select * from v$logfile order by group#;

    GROUP# STATUS  TYPE   MEMBER                       IS_
---------- ------- ------ ---------------------------- ---
         1          ONLINE /u07/oradata/ord/redo01.log  NO
          1 INVALID ONLINE /u08/oradata/ord/redo01.log  NO
         2          ONLINE /u07/oradata/ord/redo02.log  NO
         2          ONLINE /u08/oradata/ord/redo02.log  NO
         3          ONLINE /u07/oradata/ord/redo03.log  NO
         3          ONLINE /u08/oradata/ord/redo03.log  NO

6 rows selected.

It appears that group number 1 has a missing member, so you want to archive group number 1 using the ALTER SYSTEM ARCHIVE command:

SQL> alter system archive log group 1;

Finally, you can re-create the missing redo log file group member using the ALTER DATABASE command mentioned in step 3:

SQL> alter database clear logfile group 1;

Database altered.

Checking the view V$LOGFILE again, you can see that the redo log group member is no longer invalid:

SQL> select * from v$logfile order by group#;

    GROUP# STATUS  TYPE   MEMBER                       IS_
---------- ------- ------ ---------------------------- ---
         1          ONLINE /u07/oradata/ord/redo01.log  NO
         1          ONLINE /u08/oradata/ord/redo01.log  NO
         2          ONLINE /u07/oradata/ord/redo02.log  NO
         2          ONLINE /u08/oradata/ord/redo02.log  NO
         3          ONLINE /u07/oradata/ord/redo03.log  NO
         3          ONLINE /u08/oradata/ord/redo03.log  NO

6 rows selected.

By reviewing the contents of the alert log, either in the EM Database Control interface (click the Alert Log Content link at the bottom of the Database Control home page) or directly in the file $ORACLE_BASE/admin/ord/bdump/alert_ord.log, you can see the failures associated with the missing redo log group member:

Sun Sep 12 17:31:43 2004
ARC1: Evaluating archive   log 1 thread 1 sequence 2500

Sun Sep 12 17:31:43 2004
Errors in file /u01/app/oracle/admin/ord/bdump/ord_arc1_3717.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u08/oradata/ord/redo01.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3

11.2.8. Recovering from Loss of a System-Critical Datafile

When you lose a system-critical datafile, in other words, a file from the SYSTEM or UNDO tablespace, the kinds of recovery available depend on whether you are operating in ARCHIVELOG mode or NOARCHIVELOG mode. Oracle strongly recommends operating in ARCHIVELOG mode for any production database that is not read-only.

11.2.8.1. Loss of a System-Critical Datafile in NOARCHIVELOG Mode

The loss of a system-critical datafile in NOARCHIVELOG mode requires complete restoration of the database, including the control files and all datafiles, not just the missing datafiles. As a result, you must reenter any changes made to the database since the last backup.

11.2.8.2. Loss of a System-Critical Datafile in ARCHIVELOG Mode

The recovery of a system-critical datafile in ARCHIVELOG mode cannot proceed while the database is open; recovery must be performed while the database is in the MOUNT state. Because the database is operating in ARCHIVELOG mode, you will not have to reenter any committed transactions in the system-critical datafile.

When a system-critical datafile is lost, such as the datafile for the SYSTEM tablespace, the instance will abort; in the rare circumstance that this does not happen, shut down the database and start it in MOUNT mode, as in this example:

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area  197132288 bytes
Fixed Size                   778076 bytes
Variable Size             162537636 bytes
Database Buffers           33554432 bytes
Redo Buffers                 262144 bytes
Database mounted.
SQL>

From the EM Database Control interface, your only options at this point are to start up the database or perform recovery, as you can see in Figure 11.9.

When you click Perform Recovery, you are prompted for both operating system credentials and database credentials in the Perform Recovery: Credentials screen, as shown in Figure 11.10. Enter these into the appropriate text boxes and click Continue.

Figure 11.9. The database status

Figure 11.10. The Perform Recovery: Credentials screen

On the Perform Recovery: Type screen, shown in Figure 11.11, select Datafiles as the object type to recover, and click Next. You are going to restore the SYSTEM tablespace's datafile.

On the Perform Recovery: Datafiles screen, shown in Figure 11.12, specify the datafile or datafiles that constitute the SYSTEM tablespace; in this case, it is /u05/oradata/ord/system01.dbf. Click Next.

Figure 11.11. The Perform Recovery: Type screen

Figure 11.12. The Perform Recovery: Datafiles screen

On the Perform Recovery: Rename screen, shown in Figure 11.13, you can restore the datafile to an alternate location. In this case, you want to restore and recover the datafile to the same location, so click Next.

Figure 11.13. The Perform Recovery: Rename screen

Figure 11.14. The Perform Recovery: Review screen

Figure 11.14 shows the final screen before the recovery begins; it includes the RMAN command that is executed when you click Submit.

Clicking Submit runs the RMAN command and initiates the recovery operation. After the recovery operation completes, the EM Database Control provides the output of the RMAN session that recovered the critical datafile, as you can see in Figure 11.15.

Click OK. You can now start up the database by clicking the Startup button in the EM Database Control interface or by issuing the SQL command ALTER DATABASE OPEN, as in this example:

SQL> alter database open;
Database altered.
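You can perform the same restore and recovery from the RMAN command line instead of the EM wizard; the following minimal sketch uses the SYSTEM datafile name from this example and assumes the database is still in the MOUNT state:

[oracle@oltp ord]$ rman target /

RMAN> restore datafile '/u05/oradata/ord/system01.dbf';

RMAN> recover datafile '/u05/oradata/ord/system01.dbf';

RMAN> sql 'alter database open';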

11.2.9. Recovering from Loss of a Non–System-Critical Datafile

If you lose a non–system-critical datafile, in other words, not the SYSTEM or UNDO tablespace, your options are similar to those when you lose a system-critical datafile, except that most of your recovery effort in ARCHIVELOG mode can occur while the database is open to users, who can use tablespaces other than the one being recovered.

11.2.9.1. Loss of a Non–System-Critical Datafile in NOARCHIVELOG Mode

As with a system-critical datafile, the loss of a non–system-critical datafile in NOARCHIVELOG mode requires complete restoration of the database, including the control files and all datafiles, not just the missing datafiles. As a result, you must reenter any changes made to the database since the last backup.

Figure 11.15. The recovery operation succeeded.

11.2.9.2. Loss of a Non–System-Critical Datafile in ARCHIVELOG Mode

The loss of a non–system-critical datafile in ARCHIVELOG mode affects only objects that are in the missing file, and recovery can proceed while the rest of the database is online. Because you are in ARCHIVELOG mode, no committed transactions in the lost datafile will have to be reentered.

Recovering from the loss of a non–system-critical datafile is not quite as complicated as recovering from the loss of a system-critical datafile, which you saw earlier in the chapter; the database remains continuously available to all users, except for the data in the datafiles being recovered. In the EM Database Control interface, click the Maintenance tab on the EM Database Control home page, and click the Perform Recovery link, as you did in the earlier example. As you can see on the Perform Recovery: Type screen, shown in Figure 11.16, you leave the Object Type as Datafiles and the Operation Type as Recover to Current Time or a Previous Point-in-Time.

Click Next and provide the names of the datafiles to restore; in this case, the USERS tablespace has been corrupted, so you specify the datafiles associated with the USERS tablespace in the Perform Recovery: Datafiles screen, as shown in Figure 11.17. You can use the EM Database Control or the following query to determine the names of the datafiles associated with the corrupted tablespace:

SQL> select t.name, d.name
  2     from v$tablespace t join v$datafile d using (ts#)
  3     where t.name = 'USERS';

NAME                 NAME
-------------------- ----------------------------------
USERS                 /u05/oradata/ord/users01.dbf
USERS                 /u05/oradata/ord/users02.dbf

Figure 11.16. The Perform Recovery: Type screen

When you click Next, you can recover the datafile and store it in an alternate location in the Perform Recovery: Rename screen, as shown in Figure 11.18. In this example, you restore to the original location.

Figure 11.17. The Perform Recovery: Datafiles screen

Figure 11.18. The Perform Recovery: Rename screen

Click Next to display step 3 of the recovery operation, the Perform Recovery: Review screen, as shown in Figure 11.19. Not only do you see the RMAN script that will be submitted, but you can edit the script before it is submitted. In addition, you can confirm the datafiles that will be recovered.

When you click Submit, the RMAN script is executed, and the USERS tablespace is recovered. Because the database is in ARCHIVELOG mode, you will not lose any committed transactions in the USERS tablespace.

Alternatively, you can run the RMAN script at the RMAN command prompt, and you will see output similar to the following:

[oracle@oltp ord]$ rman target /

Recovery Manager: Release 10.1.0.2.0 - Production

Copyright (c) 1995, 2004, Oracle.  All rights reserved.

connected to target database: ORD (DBID=1387044942)

RMAN>    run { sql 'alter database datafile 4 offline';
               sql 'alter database datafile 7 offline';
               restore datafile 4,7;
               recover datafile 4,7;
              sql 'alter database datafile 4 online';
              sql 'alter database datafile 7 online'; }

using target database controlfile instead of recovery catalog

sql statement: alter database datafile 4 offline

sql statement: alter database datafile 7 offline

Starting restore at 12-SEP-04
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=250 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00004 to /u05/oradata/ord/users01.dbf
restoring datafile 00007 to /u05/oradata/ord/users02.dbf
channel ORA_DISK_1: restored backup piece 1
piece handle=/OraFlash/ORD/backupset/
  2004_09_06/o1_mf_nnndf_TAG20040906T233842_0mtgtrod_.bkp
tag=TAG20040906T233842

channel ORA_DISK_1: restore complete
Finished restore at 12-SEP-04

Starting recover at 12-SEP-04
using channel ORA_DISK_1

starting media recovery

archive log thread 1 sequence 2352 is already on disk as file
    /OraFlash/ORD/archivelog/2004_09_07/
         o1_mf_1_2352_0mtj55w8_.arc
. . .
archive log
filename=/OraFlash/ORD/archivelog/2004_09_12/
         o1_mf_1_2512_0nb127m7_.arc thread=1 sequence=2512
media recovery complete
Finished recover at 12-SEP-04

sql statement: alter database datafile 4 online

sql statement: alter database datafile 7 online

RMAN>

Figure 11.19. The Perform Recovery: Review screen
