Tuesday, July 24, 2007

Backup & Recover2

Subject: TECH: Internals of Recovery
Type: REFERENCE
Creation Date: 13-SEP-1996

Oracle7 v7.2 Recovery Outline
Authors: Andrea Borr & Bill Bridge
Version: 1
May 3, 1995

Abstract
This document gives an overview of how database recovery works in Oracle7 version 7.2. It is assumed that the reader is familiar with the Database Administrator's Guide for Oracle7 version 7.2. The intention of this document is to describe the recovery algorithms and data structures, providing more details than the Administrator's Guide.

Table of Contents
1 Introduction
1.1 Instance Recovery and Media Recovery: Common Mechanisms
1.2 Instance Failure and Recovery, Crash Failure and Recovery
1.3 Media Failure and Recovery
2 Fundamental Data Structures
2.1 Controlfile
2.1.1 Database Info Record (Controlfile)
2.1.2 Datafile Record (Controlfile)
2.1.3 Thread Record (Controlfile)
2.1.4 Logfile Record (Controlfile)
2.1.5 Filename Record (Controlfile)
2.1.6 Log-History Record (Controlfile)
2.2 Datafile Header
2.3 Logfile Header
2.4 Change Vector
2.5 Redo Record
2.6 System Change Number (SCN)
2.7 Redo Logs
2.8 Thread of Redo
2.9 Redo Byte Address (RBA)
2.10 Checkpoint Structure
2.11 Log History
2.12 Thread Checkpoint Structure
2.13 Database Checkpoint Structure
2.14 Datafile Checkpoint Structure
2.15 Stop SCN
2.16 Checkpoint Counter
2.17 Tablespace-Clean-Stop SCN
2.18 Datafile Offline Range
3 Redo Generation
3.1 Atomic Changes
3.2 Write-Ahead Log
3.3 Transaction Commit
3.4 Thread Checkpoint
3.5 Online-Fuzzy Bit
3.6 Datafile Checkpoint
3.7 Log Switch
3.8 Archiving Log Switches
3.9 Thread Open
3.10 Thread Close
3.11 Thread Enable
3.12 Thread Disable
4 Hot Backup
4.1 BEGIN BACKUP
4.2 File Copy
4.3 END BACKUP
4.4 "Crashed" Hot Backup
5 Instance Recovery
5.1 Detection of the Need for Instance Recovery
5.2 Thread-at-a-Time Redo Application
5.3 Current Online Datafiles Only
5.4 Checkpoints
5.5 Crash Recovery Completion
6 Media Recovery
6.1 When to Do Media Recovery
6.2 Thread-Merged Redo Application
6.3 Restoring Backups
6.4 Media Recovery Commands
6.4.1 RECOVER DATABASE
6.4.2 RECOVER TABLESPACE
6.4.3 RECOVER DATAFILE
6.5 Starting Media Recovery
6.6 Applying Redo, Media Recovery Checkpoints
6.7 Media Recovery and Fuzzy Bits
6.7.1 Media-Recovery-Fuzzy
6.7.2 Online-Fuzzy
6.7.3 Hotbackup-Fuzzy
6.8 Thread Enables
6.9 Thread Disables
6.10 Ending Media Recovery (Case of Complete Media Recovery)
6.11 Automatic Recovery
6.12 Incomplete Recovery
6.12.1 Incomplete Recovery UNTIL Options
6.12.2 Incomplete Recovery and Consistency
6.12.3 Incomplete Recovery and Datafiles Known to the Controlfile
6.12.4 Resetlogs Open after Incomplete Recovery
6.12.5 Files Offline during Incomplete Recovery
6.13 Backup Controlfile Recovery
6.14 CREATE DATAFILE: Recover a Datafile Without a Backup
6.15 Point-in-Time Recovery Using Export/Import
7 Block Recovery
7.1 Block Recovery Initiation and Operation
7.2 Buffer Header RBA Fields
7.3 PMON vs. Foreground Invocation
8 Resetlogs
8.1 Fuzzy Files
8.2 Resetlogs SCN and Counter
8.3 Effect of Resetlogs on Threads
8.4 Effect of Resetlogs on Redo Logs
8.5 Effect of Resetlogs on Online Datafiles
8.6 Effect of Resetlogs on Offline Datafiles
8.7 Checking Dictionary vs. Controlfile on Resetlogs Open
9 Recovery-Related V$ Fixed-Views
9.1 V$LOG
9.2 V$LOGFILE
9.3 V$LOG_HISTORY
9.4 V$RECOVERY_LOG
9.5 V$RECOVER_FILE
9.6 V$BACKUP
10 Miscellaneous Recovery Features
10.1 Parallel Recovery (v7.1)
10.1.1 Parallel Recovery Architecture
10.1.2 Parallel Recovery System Initialization Parameters
10.1.3 Media Recovery Command Syntax Changes
10.2 Redo Log Checksums (v7.2)
10.3 Clear Logfile (v7.2)

1 Introduction
The Oracle RDBMS provides database recovery facilities capable of preserving database integrity in the face of two major failure modes:
1. Instance failure: loss of the contents of a buffer cache, or data residing in memory.
2. Media failure: loss of database file storage on disk.
Each of these two major failure modes raises its own set of challenges for database integrity. For each, there is a set of requirements that a recovery utility addressing that failure mode must satisfy. Although recovery processing for the two failure modes has much in common, the requirements differ enough to motivate the implementation of two different recovery facilities:
1. Instance recovery: recovers data lost from the buffer cache due to instance failure.
2. Media recovery: recovers data lost from disk storage.

1.1 Instance Recovery and Media Recovery: Common Mechanisms
Both instance recovery and media recovery depend for their operation on the redo log. The redo log is organized into redo threads, referred to hereafter simply as threads. The redo log of a single-instance (non-Parallel Server option) database consists of a single thread. A Parallel Server redo log has a thread per instance.

A redo log thread is a set of operating system files in which an instance records all changes it makes, committed and uncommitted, to memory buffers containing datafile blocks. Since this includes changes made to rollback segment blocks, it follows that rollback data is also (indirectly) recorded in the redo log.

The first phase of both instance and media recovery processing is roll-forward. Roll-forward is the task of the RDBMS recovery layer. During roll-forward, changes recorded in the redo log are re-applied (as needed) to the datafiles. Because changes to rollback segment blocks are recorded in the redo log, roll-forward also regenerates the corresponding rollback data. When the recovery layer finishes its task, all changes recorded in the redo log have been restored by roll-forward. At this point, the datafile blocks contain not only all committed changes, but also any uncommitted changes recorded in the redo log.

The second phase of both instance and media recovery processing is roll-back. Roll-back is the task of the RDBMS transaction layer.
During roll-back, undo information from rollback segments (as well as from save-undo/deferred rollback segments, if appropriate) is used to undo uncommitted changes that were applied during the roll-forward phase.

1.2 Instance Failure and Recovery, Crash Failure and Recovery
Instance failure, a failure resulting in the loss of the instance's buffer cache, occurs when an instance is aborted, either unexpectedly or expectedly. Examples of reasons for unexpected instance aborts are operating system crash, power failure, or background process failure. Examples of reasons for expected instance aborts are use of the commands SHUTDOWN ABORT and STARTUP FORCE.

Crash failure is the failure of all instances accessing a database. In the case of a single-instance (non-Parallel Server option) database, the terms crash failure and instance failure are used interchangeably. Crash recovery (equivalent to instance recovery in this case) is the process of recovering all online datafiles to a consistent state following a crash. This is done automatically in response to the ALTER DATABASE OPEN command.

In the case of the Parallel Server option, the term crash failure is used to refer to the simultaneous failure of all open instances. Parallel Server crash recovery is the process of recovering all online datafiles to a consistent state after all instances accessing the database have failed. This is done automatically in response to the ALTER DATABASE OPEN command. Parallel Server instance failure refers to the failure of an instance while a surviving instance continues in operation. Parallel Server instance recovery is the automatic recovery by a surviving instance of a failed instance.

Instance failure impairs database integrity because it results in loss of the instance's dirty buffer cache. A "dirty" buffer is one whose memory version differs from its disk version.
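To make the two recovery phases of 1.1 concrete, the following Python sketch models a crash in which a committed change never reached disk while an uncommitted change did. It is a toy model under stated assumptions, not Oracle's implementation: the `Redo` and `recover` names are hypothetical, each logged change carries its own before-image (`old`) as a stand-in for the undo that Oracle keeps in rollback segments, a record with no block marks a commit, and no two uncommitted transactions touch the same block.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Redo:
    """One logged change in a toy redo log (hypothetical format)."""
    txn: int
    block: Optional[str] = None  # None marks a commit record
    new: Optional[str] = None    # block value after the change (redo)
    old: Optional[str] = None    # block value before the change (stand-in for undo)


def recover(disk: Dict[str, str], log: List[Redo]) -> Dict[str, str]:
    """Return the recovered block contents; `disk` itself is not modified."""
    blocks = dict(disk)
    committed = {r.txn for r in log if r.block is None}

    # Phase 1, roll-forward: re-apply every logged change, committed or not.
    for r in log:
        if r.block is not None:
            blocks[r.block] = r.new

    # Phase 2, roll-back: undo uncommitted changes, newest first.
    for r in reversed(log):
        if r.block is not None and r.txn not in committed:
            blocks[r.block] = r.old
    return blocks


# State left on disk by the crash: A is missing transaction 1's committed
# change; B holds transaction 2's uncommitted change.
disk = {"A": "a0", "B": "b1"}
log = [
    Redo(1, "A", new="a1", old="a0"),
    Redo(1),                           # commit record for transaction 1
    Redo(2, "B", new="b1", old="b0"),  # transaction 2 never commits
]
print(recover(disk, log))  # {'A': 'a1', 'B': 'b0'}
```

Roll-forward restores block A (the committed change lost with the dirty buffer cache), and roll-back then removes transaction 2's change to B, which had reached disk before the crash. Keeping the two loops separate mirrors the division of labor described in 1.1: the first corresponds to the recovery layer, the second to the transaction layer.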
An instance that aborts has no opportunity to write out "dirty" buffers so as to prevent database integrity breakage on disk following a crash. Loss of the dirty buffer cache is a problem because the cache manager uses algorithms optimized for OLTP performance rather than for crash-tolerance. Examples of performance-optimizing cache management algorithms that make the task of instance recovery more difficult are as follows:
• LRU (least recently used) based buffer replacement
• no-datablock-force-at-commit (see 3.3)

As a consequence of the performance-oriented cache management algorithms, instance failure can cause database integrity breakage as follows:
A. At crash time, the datafiles on disk might contain some but not all of a set of datablock changes that constitute a single atomic change to the database with respect to structural integrity (see 2.5).
B. At crash time, the datafiles on disk might contain some datablocks modified by uncommitted transactions.
C. At crash time, the datafiles on disk might contain some datablocks missing changes from committed transactions.

During instance recovery, the RDBMS recovery layer repairs database integrity breakages A and C. It also enables subsequent repair, by the RDBMS transaction layer, of database integrity breakage B.

In addition to the requirement that it repair any integrity breakages resulting from the crash, instance recovery must meet the following requirements:
1. Instance recovery must accomplish the repair using the current online datafiles (as left on disk after the crash).
2. Instance recovery must use only the online redo logs. It must not require use of the archived logs. Although instance recovery could work successfully from archived logs (except for a database running in NOARCHIVELOG mode), it could not work autonomously (requirement 4) if an operator were required to restore archived logs.
3. The invocation of instance recovery must be automatic, implicit at the next database startup.
4. Detection of the need for repair and the repair itself must proceed autonomously, without operator intervention.
5. The duration of the roll-forward phase of instance recovery is governed by both RDBMS internal mechanisms (checkpoint) and user-configurable parameters (e.g. number and sizes of logfiles, checkpoint-frequency tuning parameters, parallel recovery parameters).

As seen above, Oracle's buffer cache component is optimized for OLTP performance rather than for crash-tolerance. This document describes some of the mechanisms used by the cache and recovery components to solve the problems posed by the use of performance-optimizing cache algorithms such as LRU buffer replacement and no-datablock-force-at-commit. These mechanisms enable instance recovery to meet its requirements while allowing optimal OLTP performance. These mechanisms include:
• Log-Force-at-Commit (see 3.3): Facilitates repair of breakage type C by guaranteeing that, at transaction commit time, all of the transaction's redo records, including its "commit record," are stored on disk in the online redo log.
• Checkpointing (see 3.4, 3.6): Bounds the amount of transaction redo that instance recovery must potentially apply. Works in conjunction with online-log switch management to ensure that instance recovery can be accomplished using only online logs and current online datafiles.
• Online-Log Switch Management (see 3.7): Works in conjunction with checkpointing to ensure that instance recovery can be accomplished using only online logs and current online datafiles. It guarantees that the current checkpoint is beyond an online logfile before that logfile is reused.
• Write-Ahead-Log (see 3.2): Facilitates repair of breakage types A and B by guaranteeing that: (i) at crash time there are no changes in the datafiles that are not in the redo log; (ii) no datablock change was written to disk without first writing to the log sufficient information to enable undo of the change should a crash intervene before commit.
• Atomic Redo Record Generation (see 3.1): Facilitates repair of breakage types A and B.
• Thread-Open Flag (see 5.1): Enables detection at startup time of the need for crash recovery.

1.3 Media Failure and Recovery
Instance failure affects logical database integrity. Because instance failure leaves a recoverable version of the online datafiles on the post-crash disk, instance recovery can use the online datafiles as a starting point. Media failure, on the other hand, affects physical storage media integrity or accessibility. Because the original datafile copies are damaged, media recovery uses restored backup copies of the datafiles as a starting point. Media recovery then uses the redo log to roll forward these files, either to a consistent present state or to a consistent past state. Media recovery is run by issuing one of the following commands: RECOVER DATABASE, RECOVER TABLESPACE, RECOVER DATAFILE.

Depending on the failure scenario, a media failure has the potential for causing database integrity breakages similar to those caused by an instance failure. For example, an integrity breakage of type A, B, or C could result if I/O accessibility to a datablock were lost between the time the block was read into the buffer cache and the time DBWR attempted to write out an updated version of the block. More typical, however, is the case of a media failure that results in the permanent loss of the current version of a datafile, and hence of all updates to that datafile that occurred since the last time the file was backed up. Before media recovery is invoked, backup copies of the damaged datafiles are restored.
Media recovery then applies relevant portions of the redo log to roll forward the datafile backups, making them current. Current implies a pre-failure state consistent with the rest of the database.

Media recovery and instance recovery have in common the requirement to repair database integrity breakages A-C. However, media recovery and instance recovery differ with respect to requirements 1-5. The requirements for media recovery are as follows:
1. Media recovery must accomplish the repair using restored backups of damaged datafiles.
2. Media recovery can use archived logs as well as the online logs.
3. Invocation of media recovery is explicit, by operator command.
4. Detection of media failure (i.e. the need to restore a backup) is not automatic. Once a backup has been restored, however, detection of the need to recover it via media recovery is automatic.
5. The duration of the roll-forward phase of media recovery is governed solely by user policy (e.g. frequency of backups, parallel recovery parameters) rather than by RDBMS internal mechanisms.

2 Fundamental Data Structures

2.1 Controlfile
The controlfile contains records that describe and keep state information about all the other files of the database.
The controlfile contains the following categories of records:
• Database Info Record (1)
• Datafile Records (1 per datafile)
• Thread Records (1 per thread)
• Logfile Records (1 per logfile)
• Filename Records (1 per datafile or logfile group member)
• Log-History Records (1 per completed logfile)

Fields of the controlfile records referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:

2.1.1 Database Info Record (Controlfile)
• resetlogs timestamp: 8.2
• resetlogs SCN: 8.2
• enabled thread bitvec: 8.3
• force archiving SCN: 3.8
• database checkpoint thread (thread record index): 2.13, 3.10

2.1.2 Datafile Record (Controlfile)
• checkpoint SCN: 2.14, 3.4
• checkpoint counter: 2.16, 5.3, 6.2
• stop SCN: 2.15, 6.5, 6.10, 6.13
• offline range (offline-start SCN, offline-end checkpoint): 2.18
• online flag
• read-enabled, write-enabled flags (1-1: read/write, 1-0: read-only)
• filename record index

2.1.3 Thread Record (Controlfile)
• thread checkpoint structure: 2.12, 3.4, 8.3
• thread-open flag: 3.9, 3.11, 8.3
• current log (logfile record index)
• head and tail (logfile record indices) of list of logfiles in thread: 2.8

2.1.4 Logfile Record (Controlfile)
• log sequence number: 2.7
• thread number: 8.4
• next and previous (logfile record indices) of list of logfiles in thread: 2.8
• count of files in group: 2.8
• low SCN: 2.7
• next SCN: 2.7
• head and tail (filename record indices) of list of filenames in group: 2.8
• "being cleared" flag: 10.3
• "archiving not needed" flag: 10.3

2.1.5 Filename Record (Controlfile)
• filename
• filetype
• next and previous (filename record indices) of list of filenames in group: 2.8

2.1.6 Log-History Record (Controlfile)
• thread number: 2.11
• log sequence number: 2.11
• low SCN: 2.11
• low SCN timestamp: 2.11
• next SCN: 2.11

2.2 Datafile Header
Fields of the datafile header referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:
• datafile checkpoint structure: 2.14
• backup checkpoint structure: 4.1
• checkpoint counter: 2.16, 3.4, 5.3, 6.2
• resetlogs timestamp: 8.2
• resetlogs SCN: 8.2
• creation SCN: 8.1
• online-fuzzy bit: 3.5, 6.7.1, 8.1
• hotbackup-fuzzy bit: 4.1, 4.4, 6.7.1, 8.1
• media-recovery-fuzzy bit: 6.7.1, 8.1

2.3 Logfile Header
Fields of the logfile header referenced in the remainder of this document are listed below, together with the number(s) of the section(s) describing their use:
• thread number: 2.7
• sequence number: 2.7
• low SCN: 2.7
• next SCN: 2.7
• end-of-thread flag: 6.10
• resetlogs timestamp: 8.2
• resetlogs SCN: 8.2

2.4 Change Vector
A change vector describes a single change to a single datablock. It has a header that gives the Data Block Address (DBA) of the block, the incarnation number, the sequence number, and the operation. After the header is information that depends on the operation. The incarnation number and sequence number are copied from the block header when the change vector is constructed. When a block is made "new," the incarnation number is set to a value that is greater than its previous incarnation number and the sequence number is set to one. The sequence number on the block is incremented after every change is applied.

2.5 Redo Record
A redo record is a group of change vectors describing a single atomic change to the database. For example, a transaction's first redo record might group a change vector for the transaction table (rollback segment header), a change vector for the undo block (rollback segment), and a change vector for the datablock. A transaction can generate multiple redo records. The grouping of change vectors into a redo record allows multiple database blocks to be changed so that either all changes occur or no changes occur, despite arbitrary intervening failures. This atomicity guarantee is one of the fundamental jobs of the cache layer.
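A minimal Python sketch of the change-vector bookkeeping in 2.4 and the all-or-nothing grouping in 2.5. All names are hypothetical, the operation-dependent part is reduced to a single string, and the new incarnation number is simply the old one plus one. The sketch also assumes, purely for illustration, that a change vector applies only to the exact block version it was built from (same incarnation and sequence number). Atomicity is modeled by staging every vector on a copy and publishing the copy only if all of them apply; this illustrates the guarantee, not how the cache layer actually provides it.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Block:
    incarnation: int
    seq: int
    data: str


@dataclass(frozen=True)
class ChangeVector:
    dba: int          # Data Block Address of the target block
    incarnation: int  # copied from the block header when the vector is built
    seq: int          # copied from the block header when the vector is built
    op: str           # operation code ("set" is the only one in this sketch)
    payload: str      # operation-dependent part


def make_vector(dba: int, block: Block, payload: str) -> ChangeVector:
    return ChangeVector(dba, block.incarnation, block.seq, "set", payload)


def new_block(old: Block) -> Block:
    # Making a block "new": higher incarnation, sequence number reset to one.
    return Block(old.incarnation + 1, 1, "")


def apply_redo_record(cache: Dict[int, Block], record: List[ChangeVector]) -> bool:
    """Apply every vector of one redo record, or none of them."""
    staged = dict(cache)
    for cv in record:
        blk = staged[cv.dba]
        # Assumption of this sketch: a vector applies only to the block
        # version it was built from.
        if (blk.incarnation, blk.seq) != (cv.incarnation, cv.seq):
            return False  # reject the whole record; `cache` is untouched
        # Apply the change and bump the block's sequence number.
        staged[cv.dba] = replace(blk, seq=blk.seq + 1, data=cv.payload)
    cache.update(staged)
    return True


# A transaction's first redo record: transaction table, undo block, data block.
cache = {10: Block(1, 5, "slot free"), 20: Block(1, 2, ""), 30: Block(3, 7, "row v1")}
record = [
    make_vector(10, cache[10], "slot: txn 42"),
    make_vector(20, cache[20], "undo: row v1"),
    make_vector(30, cache[30], "row v2"),
]
print(apply_redo_record(cache, record))  # True
print(cache[30])  # Block(incarnation=3, seq=8, data='row v2')
print(apply_redo_record(cache, record))  # False: sequence numbers have moved on
```

Applying the same record a second time is rejected because the block sequence numbers have already advanced. That hints at how such header fields could let recovery recognize a change that is already present, but that use is an assumption of this sketch rather than something stated above.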
Recovery preserves redo record atomicity across failures.

2.6 System Change Number (SCN)
An SCN defines a committed version of the database. A query reports the contents of the database as it looked at some specific SCN. An SCN is allocated and saved in the header of a redo record that commits a transaction. An SCN may also be saved in a record when it is necessary to mark the redo as being allocated after a specific SCN. SCNs are also allocated and stored in other data structures such as the controlfile or datafile headers.

An SCN is at least 48 bits long. Thus SCNs can be allocated at a rate of 16,384 per second for over 534 years without running out. We will run out of SCNs in June, 2522 AD (we use 31-day months for time stamps).

2.7 Redo Logs
All changes to database blocks are made by constructing a redo record for the change, saving this record in a redo log, then applying the change vectors to the datablocks. Recovery is the process of applying redo to old versions of datablocks to make them current. This is necessary when the current version has been lost.

When a redo log becomes full it is closed and a log switch occurs. Each log is identified by its thread number (see below), sequence number (within thread), and the range of SCNs spanned by its redo records. This information is stored in the thread number, sequence number, low SCN, and next SCN fields of the logfile header.

The redo records in a log are ordered by SCN. Moreover, redo records containing change vectors for a given block occur in increasing SCN order across threads (case of Parallel Server). Only some records have SCNs in their header, but every record is applied after the allocation of the SCN appearing with or before it in the log. The header of the log contains the low SCN and the next SCN. The low SCN is the SCN associated with the first redo record (unless there is an SCN in its header).
The next SCN is the low SCN of the log with the next higher sequence number for the same thread. The current log of an enabled thread has an infinite next SCN, since there is no log with a higher sequence number.

2.8 Thread of Redo
The redo generated by an instance (by each instance, in the Parallel Server case) is called a thread of redo. A thread consists of an online portion and (in ARCHIVELOG mode) an archived portion. The online portion of a thread consists of two or more online logfile groups. Each group consists of one or more replicated members. The set of members in a group is referred to variously as a logfile group, group, redo log, online log, or simply log. A redo log contains only redo generated by one thread. Log sequence numbers are independently allocated for each thread. Each thread switches logs independently.

For each logfile, there is a controlfile record that describes it. The index of a log's controlfile record is referred to as its log number. Note that log numbers are equivalent to log group numbers, and are globally unique (across all threads). The list of a thread's logfile records is anchored in the thread record (i.e. via head and tail logfile record indices), and linked through the logfile records, each of which stores the thread number. The logfile record also has fields identifying the number of group members, as well as the head and tail (i.e. filename record indices) of the list (linked through filename records) of filenames in the group.

2.9 Redo Byte Address (RBA)
An RBA points to a specific location in a particular redo thread. It is ten bytes long and has three components: log sequence number, block number within log, and byte number within block.

2.10 Checkpoint Structure
The checkpoint structure is a data structure that defines a point in all the redo ever generated for a database. Checkpoint structures are stored in datafile headers and in the per-thread records of the controlfile.
They are used by recovery to know where to start reading the log thread(s) for redo application.

The key fields of the checkpoint structure are the checkpoint SCN and the enabled thread bitvec. The checkpoint SCN effectively demarcates a specific location in each enabled thread (for a definition of enabled see 3.11). For each thread, this location is where redo was being generated at some point in time within the resolution of one commit. The redo record headers in the log can be scanned to find the first redo record that was allocated at the checkpoint SCN or higher.

The enabled thread bitvec is a mask defining which threads were enabled at the time the checkpoint SCN was allocated. Note that a bit is set for each thread that was enabled, regardless of whether it was open or closed. Every thread that was enabled has a redo log that contains the checkpoint SCN. A log containing this SCN is guaranteed to exist (either online or archived).

The checkpoint structure also stores the time that the checkpoint SCN was allocated. This timestamp is only used to print a message to aid a person looking for a log. In addition, the checkpoint structure stores the number of the thread that allocated the checkpoint SCN and the current RBA in that thread when the checkpoint SCN was allocated. Having an explicitly stored thread RBA (as opposed to only having the checkpoint SCN as an implicit thread location "pointer") makes the log sequence number (part of the RBA) and archived log name readily available for the single-instance (i.e. single-thread, non-Parallel Server) case.

A checkpoint structure for a port that supports up to 1023 threads of redo is 150 bytes long. A VMS checkpoint is 30 bytes and supports up to 63 threads of redo.

2.11 Log History
The controlfile can be configured (using the MAXLOGHISTORY clause of the CREATE DATABASE or CREATE CONTROLFILE command) to contain a history record for every logfile that is completed. Log history records are small (24 bytes on VMS).
They are overwritten in a circular fashion so that the oldest information is lost. For each logfile, the log-history controlfile record contains the thread number, log sequence number, low SCN, low SCN timestamp, and next SCN (i.e. low SCN of the next log in sequence).

The purpose of the log history is to reconstruct archived logfile names from an SCN and thread number. Since a log sequence number is contained in the checkpoint structure (part of the RBA), single-thread (i.e. non-Parallel Server) databases do not need log history to construct archived log names.

The fields of the log history records are viewable via the V$LOG_HISTORY "fixed-view" (see Section 9 for a description of the recovery-related "fixed-views"). Additionally, V$RECOVERY_LOG, which displays information about archived logs needed to complete media recovery, is derived from information in the log history records. Although log history is not strictly needed for easy administration of single-instance (non-Parallel Server) databases, enabling use of V$LOG_HISTORY and V$RECOVERY_LOG might be a reason to configure it.

2.12 Thread Checkpoint Structure
Each enabled thread's controlfile record contains a checkpoint structure called the thread checkpoint. The SCN field in this structure is known as the thread checkpoint SCN. The thread number and RBA fields in this structure refer to the associated thread.

The thread checkpoint structure is updated each time an instance checkpoints its thread (see 3.4). During such thread checkpoint events, the instance associated with the thread writes to disk in the online datafiles all dirty buffers modified by redo generated before the thread checkpoint SCN. A thread checkpoint event guarantees that all pre-thread-checkpoint-SCN redo generated in that thread for all online datafiles has been written to disk. (Note that if the thread is closed, then there is no redo beyond the thread checkpoint SCN; i.e. the RBA points just past the last redo record in the current log.)

It is the job of instance recovery to ensure that all of the thread's redo for all online datafiles is applied. Because of the guarantee that all of the thread's redo prior to the thread checkpoint SCN has already been applied, instance recovery can make the guarantee that, by starting redo application at the thread checkpoint SCN, and continuing through end-of-thread, all of the thread's redo will have been applied.

2.13 Database Checkpoint Structure
The database checkpoint structure is the thread checkpoint of the thread that has the lowest checkpoint SCN of all the open threads. The number of the database checkpoint thread (the number of the thread whose thread checkpoint is the current database checkpoint) is recorded in the database info record of the controlfile. If there are no open threads, then the database checkpoint is the thread checkpoint that contains the highest checkpoint SCN of all the enabled threads.

Since each instance guarantees that all redo generated before its own thread checkpoint SCN has been written, and since the database checkpoint SCN is the lowest of the thread checkpoint SCNs, it follows that all pre-database-checkpoint-SCN redo in all instances has been written to all online datafiles. Thus, all pre-database-checkpoint-SCN redo generated in all threads for all online datafiles is guaranteed to be in the files on disk already. This is described by saying that the online datafiles are checkpointed at the database checkpoint. This is the rationale for using the database checkpoint to update the online datafile checkpoints (see below) when an instance checkpoints its thread (see 3.4).

2.14 Datafile Checkpoint Structure
The header of each datafile contains a checkpoint structure known as the datafile checkpoint. The SCN field in this structure is known as the datafile checkpoint SCN.
All pre-checkpoint-SCN redo generated in all threads for a given datafile is guaranteed to be in the file on disk already. An online datafile has its checkpoint SCN replicated in its controlfile record.

Note: Oracle's recovery layer code is designed to "tolerate" a discrepancy in checkpoint SCN between the file header and the controlfile record. These values could get out of sync should an instance failure occur between the time the file header was updated and the time the controlfile "transaction" committed. (Note: A controlfile "transaction" is an RDBMS internal mechanism, independent of the Oracle transaction layer, that allows an arbitrarily large update to the controlfile to be "committed" atomically.)

The execution of a datafile checkpoint (see 3.6) for a given datafile updates the checkpoint structure in the file header, and guarantees that all pre-checkpoint-SCN redo generated in all threads for that datafile is on disk already. A thread checkpoint event (see 3.4) guarantees that all pre-database-checkpoint-SCN redo generated in all threads for all online datafiles has been written to disk. The execution of a thread checkpoint may advance the database checkpoint (e.g. in the single-instance case, or if the checkpointing thread had the oldest checkpoint, so that the oldest checkpoint now belongs to another thread). If the database checkpoint does advance, then the new database checkpoint is used to update the datafile checkpoints of all the online datafiles (except those in hot backup: see Section 4).

It is the job of media recovery (see Section 6) to ensure that all redo for a recovery-datafile (i.e. a datafile being media-recovered) generated in any thread through the recovery end-point is applied.
Because of the guarantee that all recovery-datafile redo generated in any enabled thread prior to that datafile's checkpoint SCN has already been applied, media recovery can make the guarantee that, by starting redo application in each enabled thread with the datafile checkpoint SCN and continuing through the recovery end-point (e.g. end-of-thread on all threads in the case of complete media recovery), all redo for the recovery-datafile from all threads will have been applied.

Since the datafile checkpoint is stored in the header of the datafile itself, it is also present in backup copies of the datafile. It is the job of hot backup (see Section 4) to ensure that, despite the occurrence of ongoing updates to the datafile during the backup copy operation, the version of the datafile's checkpoint captured in the backup copy satisfies the checkpoint-SCN guarantee with respect to the versions of the datafile's datablocks captured in the backup copy.

2.15 Stop SCN
Each datafile's controlfile record has a field called the stop SCN. If the file is offline or read-only, the stop SCN is the SCN beyond which no further redo exists for that datafile. If the file is online and any instance has the database open, the stop SCN is set to "infinity." The stop SCN is used during media recovery to determine when redo application for a particular datafile can stop. This ensures that media recovery will terminate when recovering an offline file while the database is open.

The stop SCN is set whenever a datafile is taken offline or set read-only. This is true whether the offline was "immediate" (due to an I/O error, or due to taking the file's tablespace offline "immediate"), "temporary" (due to taking the file's tablespace offline "temporary"), or "normal" (due to taking the file's tablespace offline "normal"). However, in the case of a datafile taken offline "immediate," there is no file checkpoint (see 3.6), and dirty buffers are discarded.
Hence, media recovery may need to apply redo from before the stop SCN in order to bring the datafile online. However, media recovery does not need to look for redo after the stop SCN, since none exists. If the stop SCN is equal to the datafile checkpoint SCN, then the file does not need recovery.

2.16 Checkpoint Counter

A checkpoint counter is kept both in the datafile header and in the datafile's controlfile record. Its purpose is to allow detection of the fact that a datafile or controlfile is a restored backup.

The checkpoint counter is incremented every time checkpoints of online files are being advanced (e.g. by thread checkpoint). Thus the datafile's checkpoint counter is incremented even when the datafile's checkpoint itself is not advanced, whether because the file is in hot backup (see Section 4) or because its checkpoint SCN is already beyond that of the intended checkpoint (e.g. the file is new or has undergone a recent datafile checkpoint).

The old value of the checkpoint counter - matching the checkpoint counter in the datafile's controlfile record - is also remembered in the file header. It is usually one less than the current counter in the header, but may differ from the current counter by more than one if the previous file header update failed after the header was written but before the controlfile "transaction" committed. A mismatch in checkpoint counters between the datafile header and the datafile's controlfile record is used to detect when a backup datafile (or a backup controlfile) has been restored.

2.17 Tablespace-Clean-Stop SCN

TS$, a data dictionary table that describes tablespaces, has a column called the tablespace-clean-stop SCN. It identifies an SCN at which a tablespace was taken offline or set read-only "cleanly": i.e. after checkpointing its datafiles (see 3.6). The SCN at which the datafiles are checkpointed is recorded in TS$ as the tablespace-clean-stop SCN. It allows such a "clean-stopped" tablespace to survive (i.e.
not need to be dropped after) a RESETLOGS open (see 8.6). During media recovery, prior to resetlogs, the "clean-stopped" tablespace would be set offline. After resetlogs, the tablespace - which needs no recovery - is permitted to be brought online and/or set read-write. (An immediate backup of the tablespace is recommended.)

The tablespace-clean-stop SCN is set to zero (after being set momentarily to "infinity" during datafile state transition) when bringing an offline-clean tablespace online, or setting a read-only tablespace read-write. The tablespace-clean-stop SCN is also zeroed when taking a tablespace offline "immediate" or "temporary."

A tablespace that has a non-zero tablespace-clean-stop SCN in TS$ is clean at that SCN: the tablespace currently contains all redo up through that SCN, and no redo for the tablespace beyond that SCN exists. If the tablespace's datafiles are still in the state they had when the tablespace was taken offline "normal" or set read-only - i.e. they are not restored backups, are not fuzzy, and are checkpointed at the clean-stop SCN - then the tablespace can be brought online without recovery.

Note that the semantics of the tablespace-clean-stop SCN differ from those of a constituent datafile's stop SCN in the datafile's controlfile record. The controlfile stop SCN designates an SCN beyond which no redo for the datafile exists. This does not imply that the datafile currently contains all redo up through that SCN.

The tablespace-clean-stop SCN is stored in TS$ rather than in the controlfile so that it is covered by redo and will finish in the correct state - i.e. reflecting the correct online/offline state of the tablespace - following an incomplete recovery (see 6.12). Its value will not be lost if a backup controlfile is restored, or if a new controlfile is created.
Furthermore, the presence of the tablespace-clean-stop SCN in TS$ allows an offline normal (or read-only) tablespace to survive (not need to be dropped after) a RESETLOGS open, since it is known that no redo application is needed to bring it online (see 8.6 for more detail). Thus, for example, an offline normal (or read-only) tablespace that was offline during an incomplete recovery can be brought online (or set read-write) subsequent to a RESETLOGS open. Without the tablespace-clean-stop SCN, there would be no way of knowing that the tablespace does not need recovery using redo that was discarded by the resetlogs. The only alternative would have been to force the tablespace to be dropped.

2.18 Datafile Offline Range

The offline-start SCN and offline-end checkpoint fields of the controlfile datafile record describe the offline range. If valid, they delimit a log range guaranteed not to contain any redo for the datafile. Thus, media recovery can skip this log range when recovering the datafile, obviating the need to access old archived log data (which may be unavailable or unusable due to resetlogs: see Section 7).

This optimization aids in recovering a datafile that is presently online (or read-write), but that was offline-clean (or read-only) for a long time, and whose last backup dates from that time. For example, this would be the case if, after a RESETLOGS open, an offline normal (or read-only) tablespace had been brought online (or set read-write), but not yet backed up.

When a datafile transitions from offline-clean to online (or from read-only to read-write), the offline range is set as follows:
- The offline-start SCN is set from the tablespace-clean-stop SCN saved when setting the file offline (or read-only).
- The offline-end checkpoint is set from the file checkpoint taken when setting the file online (or read-write).

3 Redo Generation

Redo is generated to describe all changes made to database blocks.
This section describes the various operations that occur while the database is open and generating redo.

3.1 Atomic Changes

The most fundamental operation is to atomically change a set of datablocks. A foreground process intending to change one or more datablocks first acquires exclusive access to the cache buffers containing those blocks. It then constructs the change vectors describing the changes. Space is allocated in the redo log buffer to hold the redo record. The redo log buffer - the buffer from which LGWR writes the redo log - is located in the SGA (System Global Area). It may be necessary to ask LGWR to write the buffer to the redo log in order to make space. If the log is full, LGWR may need to do a log switch in order to make the space available. Note that allocating space in the redo buffer also allocates space in the logfile. Thus, even though the redo buffer has been written, it may not be possible to allocate redo log space until a log switch makes room.

After the space is allocated, the foreground process builds the redo record in the redo buffer. Only after the redo record has been built in the redo buffer may the datablock buffers be changed. Writing the redo to disk is the real change to the database. Recovery ensures that all changes that make it into the redo log make it into the datablocks (except in the case of incomplete recovery).

3.2 Write-Ahead Log

Write-ahead log is a cache-enforced protocol governing the order in which dirty datablock buffers are written relative to when the redo log buffer is written. According to write-ahead log protocol, before DBWR can write out a cache buffer containing a modified datablock, LGWR must write out the redo log buffer containing redo records describing changes to that datablock. Note that write-ahead log is independent of log-force-at-commit (see 3.3). Note also that write-ahead log protocol only applies to datafile writes that originate from the buffer cache.
In particular, write-ahead log does not apply to so-called direct path writes (e.g. originating from direct path load, table create via subquery, or index create). Direct path writes (targeted above the segment high-water mark) originate not as writes out of the buffer cache, but as bulk-writes out of the foreground process's data space. Indeed, correct handling of direct path writes by media recovery dictates a write-behind-log protocol. (The basic reason is that, because the bulk-writes do not go through the buffer cache, there is no mechanism to guarantee their completion at checkpoint.)

One guarantee made by write-ahead log protocol is that there are no changes in the datafiles that are not in the redo log, regardless of intervening failure. This is what enables recovery to preserve the guarantee of redo record atomicity despite intervening failure.

Another guarantee made by write-ahead log protocol is that no datablock change can be written to disk without first writing to the redo log sufficient information to enable the change to be undone should the transaction fail to commit. That undo-enabling information is written to the redo log in the form of "redo" for the rollback segment. Write-ahead log protocol plays a key role in enabling the transaction layer to preserve the guarantee of transaction atomicity despite intervening failure.

3.3 Transaction Commit

Transaction commit allocates an SCN and builds a commit redo record containing that SCN. The commit is complete when all of the transaction's redo (including the commit redo record) is on disk in the log. Thus, commit forces the redo log to disk - at least up to and including the transaction's commit record. This is termed log-force-at-commit.

Recovery is designed such that it is sufficient to write only the redo log at commit time - rather than all datablocks changed by the transaction - in order to guarantee transaction durability despite intervening failure. This is termed no-datablock-force-at-commit.
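The ordering rules of 3.1 through 3.3 can be illustrated with a toy model. This is a minimal sketch, not Oracle code: the classes `RedoLog` and `Buffer` and the functions `change_block`, `dbwr_write` and `commit` are hypothetical, and an RBA is simplified to a record's position in the log. It shows the redo record being built before the buffer changes, the write-ahead check DBWR must pass before writing a block, and a commit that forces only the log.

```python
# Hypothetical toy model of redo ordering rules; not Oracle code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedoLog:
    """Toy redo log. An RBA is modeled as a 1-based record position."""
    records: List[str] = field(default_factory=list)
    flushed_rba: int = 0  # highest RBA known to be on disk

    def append(self, record: str) -> int:
        # 3.1: space is allocated and the redo record is built first.
        self.records.append(record)
        return len(self.records)

    def flush(self, up_to_rba: int) -> None:
        # LGWR writes the log buffer at least through up_to_rba.
        self.flushed_rba = max(self.flushed_rba, min(up_to_rba, len(self.records)))


@dataclass
class Buffer:
    block_id: int
    last_change_rba: int = 0  # RBA of the redo for the latest change
    dirty: bool = False


def change_block(log: RedoLog, buf: Buffer, description: str) -> None:
    # The redo record exists before the cache buffer is modified.
    buf.last_change_rba = log.append(description)
    buf.dirty = True


def dbwr_write(log: RedoLog, buf: Buffer) -> None:
    # 3.2 write-ahead log: the block's redo must be on disk before the block.
    if log.flushed_rba < buf.last_change_rba:
        log.flush(buf.last_change_rba)
    buf.dirty = False  # block now written to the datafile


def commit(log: RedoLog, scn: int) -> int:
    # 3.3 log-force-at-commit: force the log through the commit record only;
    # changed datablocks are not written (no-datablock-force-at-commit).
    rba = log.append(f"commit scn={scn}")
    log.flush(rba)
    return rba
```

In this model a committed change can still be sitting in a dirty buffer after `commit` returns; durability rests on the forced commit record, which is the no-datablock-force-at-commit property.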
3.4 Thread Checkpoint

A thread checkpoint event, executed by the instance associated with the redo thread being checkpointed, forces to disk all dirty buffers in that instance that contain changes to any online datafile before a designated SCN - the thread checkpoint SCN. Once all redo in the thread prior to the checkpoint SCN has been written to disk, the thread checkpoint structure in the thread's controlfile record is updated in a controlfile transaction.

When a thread checkpoint begins, an SCN is captured and a checkpoint structure is initialized. Then all the dirty buffers in the instance's cache are marked for checkpointing. DBWR proceeds to write out the marked buffers in a staged manner. Once all the marked buffers have been written, the SCN in the checkpoint structure is set to the captured SCN, and the thread checkpoint structure in the thread's controlfile record is updated in a controlfile transaction.

A thread checkpoint might or might not advance the database checkpoint. If only one thread is open, the new checkpoint is the new database checkpoint. If multiple threads are open, the database checkpoint will advance if the local thread is the current database checkpoint thread (the open thread with the lowest checkpoint SCN). Since the new checkpoint SCN was allocated recently, it is most likely greater than the thread checkpoint SCN in some other open thread. If it advances, the database checkpoint becomes the new lowest-SCN open thread checkpoint. If the old checkpoint SCN for the local thread was higher than the current checkpoint SCN of some other open thread, then the database checkpoint does not change.

If the database checkpoint is advanced, then the checkpoint counter is advanced in every online datafile header. Furthermore, for each online datafile that is not in hot backup (see Section 4), and not already checkpointed at a higher SCN (e.g.
as would be the case for a recently added or recovered file), the datafile header checkpoint is advanced to the new database checkpoint, and the file header is written to disk. Also, the checkpoint SCN in the datafile's controlfile record is advanced to the new database checkpoint SCN.

3.5 Online-Fuzzy Bit

Note that more changes - beyond those already in the marked buffers - may be generated after the start of the checkpoint. Such changes would be generated at SCNs higher than the SCN that will be recorded in the file header. They could either be changes to marked buffers that were added since checkpoint start, or else changes to unmarked buffers. Buffers containing these changes could be written out for a variety of reasons. Thus, the online files are online-fuzzy; that is, they generally contain changes in the future of (i.e. generated at higher SCNs than) their header checkpoint SCN. A datafile is virtually always online-fuzzy while it is online and the database is open.

Online-fuzzy state is indicated by setting the so-called online-fuzzy bit in the datafile header. The online-fuzzy bits of all online datafiles are set at database open time. Also, when a datafile is brought online while the database is open, its online-fuzzy bit is set. The online-fuzzy bits are cleared after the last instance does a shutdown "normal" or "immediate." Other occasions for clearing the online-fuzzy bits are:
(i) the finish of crash recovery;
(ii) when media recovery "checkpoints" (flushes its buffers) after encountering an end-crash-recovery redo record (see 5.5);
(iii) when taking a datafile offline "temporary" or "normal" (i.e. an offline operation that is preceded by a file checkpoint);
(iv) when BEGIN BACKUP is issued (see 4.1).

As will be seen in 8.1, open with resetlogs will fail if any online datafile has the online-fuzzy bit (or any fuzzy bit) set.
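The database-checkpoint bookkeeping described in 3.4 can be sketched as follows, assuming the database checkpoint is simply the lowest checkpoint SCN among open threads. The names `Datafile` and `complete_thread_checkpoint` are hypothetical; the controlfile transaction, the DBWR writes and the controlfile copy of each file's checkpoint SCN are not modeled.

```python
# Hypothetical sketch of 3.4; not Oracle code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Datafile:
    name: str
    checkpoint_scn: int
    checkpoint_counter: int = 0
    in_hot_backup: bool = False
    online: bool = True


def complete_thread_checkpoint(thread_ckpts: Dict[int, int], local_thread: int,
                               new_scn: int, files: List[Datafile]) -> int:
    """Record a finished thread checkpoint and return the database checkpoint SCN.

    thread_ckpts maps each open thread to its thread checkpoint SCN; the
    database checkpoint is assumed to be the lowest of these.
    """
    old_db_ckpt = min(thread_ckpts.values())
    thread_ckpts[local_thread] = new_scn
    new_db_ckpt = min(thread_ckpts.values())
    if new_db_ckpt > old_db_ckpt:
        for f in files:
            if not f.online:
                continue
            f.checkpoint_counter += 1  # advanced in every online file header
            # Skip files in hot backup (frozen) and files already checkpointed higher.
            if not f.in_hot_backup and f.checkpoint_scn < new_db_ckpt:
                f.checkpoint_scn = new_db_ckpt
    return new_db_ckpt
```

This mirrors 2.16: the counter of a file in hot backup still advances even though its header checkpoint stays frozen.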
3.6 Datafile Checkpoint

A datafile checkpoint event, executed by all open instances (for all open threads), forces to disk all dirty buffers in any instance that contain changes to a particular datafile (or set of datafiles) before a designated SCN - the datafile checkpoint SCN. Once all datafile-related redo from all open threads prior to the checkpoint SCN has been written to disk, the datafile checkpoint structure in the file header is updated and written to disk. Datafile checkpoints occur as part of operations such as beginning hot backup (see Section 4) and offlining datafiles as part of taking a tablespace offline normal (see 2.17).

3.7 Log Switch

When an instance needs to generate more redo but cannot allocate enough blocks in the current log, it does a log switch. The first step in a log switch is to find an online log that is a candidate for reuse.

The first requirement for the candidate log is that it must not be active: i.e. it must not be needed for crash/instance recovery. In other words, it must be overwritable without losing redo data needed for instance recovery. The principle enforced is that a logfile cannot be reused until the current thread checkpoint is beyond that logfile. Since instance recovery starts at the current thread checkpoint SCN/RBA (and expects to find that RBA in an online redo log), the ability to do instance recovery using only online logs translates into the requirement that the current thread checkpoint SCN be beyond the highest SCN associated with redo in the candidate log. If this is not the case, then the thread checkpoint currently in progress - e.g. the one started when the candidate log was originally switched into (see below) - is hurried to completion.

The other requirement for the candidate log is that it must not need archiving. Of course, this requirement only applies to a database running in ARCHIVELOG mode. If archiving is required, the archiver is posted.
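The two reuse checks for a candidate log in 3.7 can be expressed as a small decision function. This is a hypothetical sketch: `OnlineLog`, its `high_scn` and `archived` fields, and `next_log_action` are illustrative names, and a real log switch also involves RBAs, log groups and locking that are omitted here.

```python
# Hypothetical sketch of the 3.7 candidate-log checks; not Oracle code.
from dataclasses import dataclass


@dataclass
class OnlineLog:
    group: int
    high_scn: int   # highest SCN associated with redo in this log
    archived: bool


def next_log_action(candidate: OnlineLog, thread_ckpt_scn: int,
                    archivelog_mode: bool) -> str:
    """Decide whether a candidate online log can be reused."""
    # Still active: instance recovery could need it, because the current
    # thread checkpoint is not yet beyond the redo in this log.
    if thread_ckpt_scn <= candidate.high_scn:
        return "hurry thread checkpoint"
    # In ARCHIVELOG mode the log must also be archived before reuse.
    if archivelog_mode and not candidate.archived:
        return "post archiver"
    return "reuse"
```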
As soon as the log switch completes, a new thread checkpoint is started in the new log. Hopefully, the checkpoint will complete before the next log switch is needed.

3.8 Archiving Log Switches

Each thread switches logs independently. Thus, when running Parallel Server, an SCN is almost never at the beginning of a log in all threads. However, it is desirable to have roughly the same range of SCNs in the archived logs of all enabled threads. This ensures that the last log archived in each thread is reasonably current. If an unarchived log for an enabled thread contained a very old SCN (as would occur in the case of a relatively idle instance), it would not be possible to use archived logs from a primary site to do recovery to a higher SCN at a standby site. This would be true even if the log with the low SCN contained no redo.

This problem is solved by forcing log switches in other threads when their current log is significantly behind the log just archived. For the case of an open thread, a lock is used to "kick" the laggard instance into switching logs and archiving when it can. For the case of a closed thread, the archiving process in the active instance does the closed thread's log switch and archiving for it. Note that this can result in a thread that is enabled but never used accumulating many archived logs that contain only a file header.

A force archiving SCN is maintained in the database info controlfile record to implement this feature. The system strives to archive any log that contains that SCN or a lower one. In general, the log with the lowest SCN is archived first.

The command ALTER SYSTEM ARCHIVE LOG CURRENT can be used to manually archive the current logs of all enabled threads. It forces all threads, open and closed, to switch to a new log. It archives what is necessary to ensure all the old logs are archived. It does not return until all redo generated before the command was entered is archived.
This command is useful for ensuring that all redo logs necessary for the recovery of a hot backup are archived. It is also useful for ensuring the potential currency of a standby site in a configuration in which archived logs from a primary site are shipped to a standby site for application by recovery in case of disaster (i.e. "standby database").

3.9 Thread Open

When an instance opens the database, it needs to open a thread for redo generation. The thread is chosen at mount time. A system initialization parameter can be used to specify the thread to mount by number. Otherwise, any available publicly-enabled thread can be chosen by the instance at mount time. A thread-mounted lock is used to prevent two instances from mounting the same thread.

When an instance opens a thread, it sets the thread-open flag in the thread's controlfile record. While the instance is alive, it holds a set of thread-opened locks (one held by each of LGWR, DBWR, LCK0, LCK1, ...). (These are released at instance death, enabling one instance to detect the death of another in the Parallel Server environment: see 5.1.)

Also at thread open time, a new checkpoint is captured and used for the thread checkpoint. If this is the first database open, this becomes the new database checkpoint, ensuring all online files have their header checkpoints advanced at open time. Note that a log switch may be forced at thread open time.

3.10 Thread Close

When an instance closes the database, or when a thread is recovered by instance/crash recovery, the thread is closed. The first step in closing a thread is to ensure that no more redo is generated in it. The next step is to ensure that all changes described by existing redo records are in the online datafiles on disk. In the case of normal database close, this is accomplished by doing a thread checkpoint. The SCN from this final thread checkpoint is said to be the "SCN at which the thread was closed."
Finally, the thread's controlfile record is updated to clear the thread-open flag.

In the case of thread close by instance recovery, the presence in the online datafiles of all changes described by thread redo records is ensured by starting redo application at the most recent thread checkpoint and continuing through end-of-thread. Once all changes described by thread redo records are in the online datafiles, the thread checkpoint is advanced to the end-of-thread. Just as in the case of a normal thread checkpoint, this checkpoint may advance the database checkpoint.

If this is the last thread close, the database checkpoint thread field in the database info controlfile record - which normally points to an open thread - will be left pointing at this thread, even though it is closed.

3.11 Thread Enable

In order for a thread to be opened, it must be enabled. This ensures that its redo will be found during media recovery. A thread may be enabled in either public or private mode. A private thread can only be mounted by an instance that specifies it in the THREAD system initialization parameter. This is analogous to public and private rollback segments.

A thread must have at least two online redo log groups while it is enabled. An enabled thread always has one online log that is its current log. The next SCN of the current log is infinite, so that any new SCN allocated will be within the current log.

A special thread-enable redo record is written in the thread of an instance enabling a new thread (i.e. via ALTER DATABASE ENABLE THREAD). The thread-enable redo record is used by media recovery to start applying redo from the new thread. Note that this means it takes an open thread to enable another thread. This chicken-and-egg problem is resolved by having thread one automatically enabled publicly at database creation. This also means that databases that do not run in Parallel Server mode do not need to enable a thread.
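The "infinite" next SCN of the current log (3.11) means that a lookup of which log covers a given SCN always succeeds for any newly allocated SCN. A minimal sketch, assuming each log covers the half-open range [low SCN, next SCN); `LogRange` and `log_containing` are hypothetical names. Disabling a thread (3.12) would correspond to replacing `math.inf` with the newly allocated SCN.

```python
# Hypothetical sketch of log SCN ranges; not Oracle code.
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogRange:
    sequence: int
    low_scn: int
    next_scn: float  # math.inf for the current log of an enabled thread


def log_containing(logs: List[LogRange], scn: int) -> Optional[int]:
    """Return the sequence number of the log covering scn, assuming [low, next)."""
    for log in logs:
        if log.low_scn <= scn < log.next_scn:
            return log.sequence
    return None
```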
3.12 Thread Disable

If a thread is not going to be used for a long while, it is best to disable it. This means that media recovery will not expect any redo to be found in the thread. Once a thread is disabled, its logs may be dropped. A thread must be closed before it can be disabled. This ensures all its changes have been written to the datafiles.

A new SCN is allocated to save as the next SCN for the current log. The log header is marked with this SCN and with flags indicating that it is the end of a disabled thread. It is important that a new current SCN is allocated. This ensures that the SCN in any checkpoint with this thread enabled will appear in one of the logs from the thread. Note that this means a thread must be open in order to disable another thread. Thus, it is not possible to disable all threads.

4 Hot Backup

A hot backup is a copy of a datafile that is taken while the file is in active use. Datafile writes (by DBWR) go on as usual during the time the backup is being copied. Thus, the backup gets a "fuzzy" copy of the datafile:
- Some blocks may be ahead in time versus other blocks of the copy.
- Some blocks of the copy may be ahead of the checkpoint SCN in the file header of the copy.
- Some blocks may contain updates that constitute breakage of the redo record atomicity guarantee with respect to other blocks in this or other datafiles.
- Some block copies may be "fractured" (due to front and back halves being copied at different times, with an intervening update to the block on disk).

The "hotbackup-fuzzy" copy is unusable without the "focusing" (via the redo log) that occurs when the backup is restored and undergoes media recovery. Media recovery applies redo (from all threads) from the begin-backup checkpoint SCN (see Step 2 in Section 4.1) through the end-point of the recovery operation (either complete or incomplete). The result is a transaction-consistent "focused" version of the datafile.
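Applying redo "from all threads" starting at the begin-backup checkpoint SCN is the thread-merged redo application mentioned in 5.3: each thread's redo is already in SCN order, so the threads can be combined with a k-way merge. A minimal sketch using Python's `heapq.merge`; the `(scn, thread, description)` record shape and the `merged_redo` function are hypothetical, and treating the start SCN as inclusive is an assumption.

```python
# Hypothetical sketch of thread-merged redo application; not Oracle code.
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical redo record shape: (scn, thread, description).
Redo = Tuple[int, int, str]


def merged_redo(threads: List[Iterable[Redo]], start_scn: int,
                end_scn: int) -> Iterator[Redo]:
    """Yield redo from all threads in SCN order, from start_scn through end_scn."""
    # Each thread's redo is already in SCN order, so a k-way merge suffices.
    for rec in heapq.merge(*threads):
        scn = rec[0]
        if scn < start_scn:
            continue  # already reflected in the file (pre-checkpoint redo)
        if scn > end_scn:
            break     # past the recovery end-point
        yield rec
```

A complete recovery would correspond to an `end_scn` at or beyond the end of every thread; an incomplete recovery stops earlier.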
There are three steps to taking a hot backup:
- Execute the ALTER TABLESPACE ... BEGIN BACKUP command.
- Use an operating system copy utility to copy the constituent datafiles of the tablespace(s).
- Execute the ALTER TABLESPACE ... END BACKUP command.

4.1 BEGIN BACKUP

The BEGIN BACKUP command takes the following actions (not necessarily in the listed order) for each datafile of the tablespace:

1. It sets a flag in the datafile header - the hotbackup-fuzzy bit - to indicate that the file is in hot backup. The header with this flag set (copied by the copy utility) enables the copy to be recognized as a hot backup. A further purpose of this flag in the online file header is to cause the checkpoint in the file header to be "frozen" at the begin-backup checkpoint value that will be set in Step 4. This is the value that it must have in the backup copy in order to ensure that, when the backup is recovered, media recovery will start redo application at a sufficiently early checkpoint SCN so as to cover all changes to the file in all threads since the execution of BEGIN BACKUP (see 6.5). Since we cannot guarantee that the file header will be the first block to be written out by the copy utility, it is important that the file header checkpoint structure remain "frozen" until END BACKUP time. This flag keeps the datafile checkpoint structure "frozen" during hot backup, preventing it (and the checkpoint SCN in the datafile's controlfile record) from being updated during thread checkpoint events that advance the database checkpoint. New in v7.2: While the file is in hot backup, a new "backup" checkpoint structure in the datafile header receives the updates that the "frozen" checkpoint would have received.

2. It executes a datafile checkpoint, capturing the resultant "begin-backup" checkpoint information, including the begin-backup checkpoint SCN. When the file is checkpointed, all instances are requested to write out all dirty buffers they have for the file.
If the need for instance recovery is detected at this time, the file checkpoint operation waits until instance recovery is completed before proceeding. Checkpointing the file at begin-backup time ensures that only file blocks changed after begin-backup time might have been written to disk during the course of the file copy. This guarantee is crucial to enabling block before-image logging to cope with the fractured block problem, as described in Step 3.

3. [Platform-dependent option]: It starts block before-image logging for the file. During block before-image logging, all instances log a full block before-image to the redo log prior to the first change to each block of the file (since the backup started, or since the block was read anew into the buffer cache). This is to forestall a recovery problem that would arise if the backup were to contain a fractured block copy (mismatched halves). This could happen if (the database block size is greater than the operating system block size, and) the front and back halves of the block were copied to the backup at different times - with an intervening update to the block on disk. In this eventuality, recovery can reconstruct the block using the logged block before-image.

4. It sets the checkpoint in the file header equal to the begin-backup checkpoint captured in Step 2. This file header checkpoint will be "frozen" until END BACKUP is executed.

5. It clears the file's online-fuzzy bit. The online-fuzzy bit remains clear during the course of the file copy operation, thus ensuring a cleared online-fuzzy bit in the file copy. Note that the online-fuzzy bit is set again by the execution of END BACKUP.

4.2 File Copy

The file copy is done by utilities that are not part of Oracle. The presumption is that the platform vendor will have backup facilities that are superior to any portable facility that we could develop.
It is the responsibility of the administrator to ensure that copies are only taken between the BEGIN BACKUP and END BACKUP commands, or when the file is not in use.

4.3 END BACKUP

The END BACKUP command takes the following actions for each datafile of the tablespace:

1. It restores (i.e. sets) the file's online-fuzzy bit.

2. It creates an end-backup redo record (end-backup "marker") for the datafile. This record, interpreted only by media recovery, contains the begin-backup checkpoint SCN (i.e. the SCN matching that in the "frozen" checkpoint in the backup's header). This record serves to mark the end of the redo generated during the backup. The end-backup "marker" is used by media recovery to determine when all redo generated between BEGIN BACKUP and END BACKUP has been applied to the datafile. Upon encountering the end-backup "marker", media recovery can (at the next media recovery checkpoint: see 6.7.1) clear the hotbackup-fuzzy bit. This matters mainly in preventing an incomplete recovery from erroneously ending before all redo generated between BEGIN BACKUP and END BACKUP has been applied. Ending incomplete recovery at such a point may result in an inconsistent file, since the backup copy may already have contained changes beyond this endpoint. As will be seen in 8.1, open with resetlogs following incomplete media recovery will fail if any online datafile has the hotbackup-fuzzy bit (or any other fuzzy bit) set.

3. It clears the file's hotbackup-fuzzy bit.

4. It stops block before-image logging for the file.

5. It advances the file checkpoint to the current database checkpoint. This compensates for any file header update(s) missed during thread checkpoints that may have advanced the database checkpoint while the file was in hot backup state, with its checkpoint "frozen".
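The header-flag transitions of BEGIN BACKUP and END BACKUP can be traced with a small model. This is a sketch under stated assumptions: `FileHeader`, `begin_backup`, `on_database_checkpoint` and `end_backup` are hypothetical, the datafile checkpoint of Step 2 is reduced to an SCN passed in by the caller, initializing the v7.2 backup checkpoint to the begin-backup SCN is an assumption, and block before-image logging is not modeled.

```python
# Hypothetical model of hot backup header state; not Oracle code.
from dataclasses import dataclass


@dataclass
class FileHeader:
    checkpoint_scn: int
    online_fuzzy: bool = True
    hotbackup_fuzzy: bool = False
    backup_checkpoint_scn: int = 0  # the v7.2 "backup" checkpoint structure


def begin_backup(h: FileHeader, begin_backup_scn: int) -> None:
    # begin_backup_scn stands in for the datafile checkpoint of Step 2.
    h.hotbackup_fuzzy = True                    # Step 1: freezes the header checkpoint
    h.checkpoint_scn = begin_backup_scn         # Step 4
    h.backup_checkpoint_scn = begin_backup_scn  # assumed starting value
    h.online_fuzzy = False                      # Step 5


def on_database_checkpoint(h: FileHeader, db_ckpt_scn: int) -> None:
    # A thread checkpoint advanced the database checkpoint (3.4).
    if h.hotbackup_fuzzy:
        # The frozen checkpoint is untouched; the backup checkpoint gets the update.
        h.backup_checkpoint_scn = max(h.backup_checkpoint_scn, db_ckpt_scn)
    elif h.checkpoint_scn < db_ckpt_scn:
        h.checkpoint_scn = db_ckpt_scn


def end_backup(h: FileHeader, db_ckpt_scn: int) -> int:
    """Apply END BACKUP and return the SCN carried by the end-backup marker."""
    h.online_fuzzy = True                # Step 1
    marker_scn = h.checkpoint_scn        # Step 2: begin-backup checkpoint SCN
    h.hotbackup_fuzzy = False            # Step 3 (Step 4 not modeled)
    h.checkpoint_scn = max(h.checkpoint_scn, db_ckpt_scn)  # Step 5
    return marker_scn
```

A copy taken between the two calls would carry the frozen begin-backup checkpoint with `hotbackup_fuzzy` set, which is how the copy is recognized as a hot backup whose recovery must start at the begin-backup SCN.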



Subject: TECH: Internals of Recovery (Part 2) Type: REFERENCE Creation Date: 13-SEP-1996

4.4 "Crashed" Hot Backup

A normal shutdown of the instance that started a backup, or of the last remaining instance, is not allowed while any files are in hot backup. Nor may a file in backup be taken offline normal or temporary. This is to ensure that an end-backup "marker" is generated whenever possible, and to make administrators aware that they forgot to issue the END BACKUP command, and that the backup copy is unusable.

When an instance failure or shutdown abort leaves a hot backup operation incomplete (i.e. lacking termination via END BACKUP), any file that was in backup before the failure has its hotbackup-fuzzy bit set and its checkpoint "frozen" at the begin-backup checkpoint. Even though the online file's datablocks are actually current to the database checkpoint, the file's header makes it look like a restored backup that needs media recovery and is current only to the begin-backup checkpoint. Crash recovery will fail - claiming media recovery is required - if it encounters an online file in "crashed" hot backup state. The file does not actually need media recovery, however, but only an adjustment to its file header to take it out of "crashed" hot backup state.

Media recovery could be used to recover and allow normal open of a database that has files left in "crashed" hot backup state. For v7.2, however, a preferable option - because it requires no archived logs - is to use the (new in v7.2) command ALTER DATABASE DATAFILE ... END BACKUP on the files left in "crashed" hot backup state (identifiable using the V$BACKUP fixed view: see 9.6). Following execution of this command, crash recovery will suffice to open the database. Note that the ALTER TABLESPACE ... END BACKUP format of the command cannot be used when the database is not open.
This is because the database must be open in order to translate (via the data dictionary) tablespace names into their constituent datafile names.

5 Instance Recovery

Instance recovery is used to recover from both crash failures and Parallel Server instance failures. Instance recovery refers either to crash recovery or to Parallel Server instance recovery (where a surviving instance recovers when one or more other instances fail). The goal of instance recovery is to restore the datablock changes that were in the cache of the dead instance and to close the thread that was left open. Instance recovery uses only online redo logfiles and current online datafiles (not restored backups). It recovers one thread at a time, starting at the most recent thread checkpoint and continuing until end-of-thread.

5.1 Detection of the Need for Instance Recovery

The kernel performs instance recovery automatically upon detecting that an instance died leaving its thread-open flag set in the controlfile. Instance recovery is performed automatically on two occasions:
1. at the first database open after a crash (crash recovery);
2. when some but not all instances of a Parallel Server fail.

In the case of Parallel Server, a surviving instance detects the need to perform instance recovery for one or more failed instances by the following means:
1. A foreground process in a surviving instance detects an "invalid block lock" condition when it attempts to bring a datablock into the buffer cache. This is an indication that another instance died while a block covered by that lock was in a potentially "dirty" state in its buffer cache.
2. The foreground process sends a notification to its instance's SMON process, which begins a search for dead instances.
3. The death of another instance is detected if the current instance is able to acquire that instance's thread-opened locks (see 3.9).
SMON in the surviving instance obtains a stable list of dead instances, together with a list of "invalid" block locks. Note: After instance recovery is complete, locks in this list undergo "lock cleanup" (i.e. their "invalid" condition is cleared, making the underlying blocks accessible again).

5.2 Thread-at-a-Time Redo Application

Instance recovery operates by processing one thread at a time, thereby recovering one instance at a time. It applies all redo (from the thread checkpoint through the end-of-thread) from each thread before starting on the next thread. This algorithm depends on the fact that only one instance at a time can have a given block modified in its cache. Between changes to the block by different instances, the block is written to disk. Thus, a given block (as read from disk during instance recovery) can need redo applied from at most one thread - the thread containing the most recent modification. Instance recovery can always be accomplished using the online redo logs for the thread being recovered.

Crash recovery operates on the thread with the lowest checkpoint SCN first. It proceeds to recover the threads in order of increasing thread checkpoint SCN. This ensures that the database checkpoint is advanced by each thread recovered.

5.3 Current Online Datafiles Only

The checkpoint counters are used to ensure that the datafiles are the current online files rather than restored backups. If a backup copy of a datafile is restored, then media recovery is required, even if recovery could be accomplished using only the online logs. The reason is that crash recovery applies all post-thread-checkpoint redo from each thread before starting on the next thread. Crash recovery can use this thread-at-a-time redo application algorithm because a given datablock can need redo application from at most one thread.
However, starting recovery from a restored backup permits no such assumption about the number of threads that have relevant redo, so the thread-at-a-time algorithm would not work. Recovering a backup requires thread-merged redo application: i.e. application of all post-file-checkpoint redo, simultaneously merging redo from all threads in SCN order. This thread-merged redo application algorithm is the one used by media recovery (see Section 6).

Crash recovery would not suffice - even with thread-merged redo application - to recover a backup datafile, even if it were checkpointed at the current database checkpoint. The reason is that in all but the database checkpoint thread, crash recovery would miss applying redo between the database checkpoint and the (higher) thread checkpoint. By contrast, media recovery would start redo application at the file checkpoint in all threads. Furthermore, crash recovery might fail even if it started redo application at the file checkpoint in all threads, because crash recovery assumes that it will need only online logfiles. All but the database checkpoint thread might have already archived and reused a needed log.

If the STARTUP RECOVER command is used (in place of simple STARTUP), and crash recovery fails due to datafiles needing media recovery (e.g. they are restored backups), then media recovery via RECOVER DATABASE (see 6.4.1) is automatically executed prior to database open.

5.4 Checkpoints

Instance recovery does not attempt to apply redo that is before the checkpoint SCN of a datafile. (The datafile header checkpoint SCNs are not used to decide where to start recovery, however.) The redo from the thread checkpoint through the end-of-thread must be read to find the end-of-thread and the highest SCN allocated by the thread. These are then used to close the thread and advance the thread checkpoint.
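The thread-at-a-time pass of 5.2 and the checkpoint handling of 5.4 can be sketched as below. This is a simplified model, not the kernel's algorithm: redo records, threads and checkpoints are hypothetical in-memory objects with integer SCNs, block contents are just lists of applied changes, and "closing" a thread simply advances its checkpoint to the highest SCN read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RedoRecord:
    scn: int
    file_no: int
    block_no: int
    change: str  # stand-in for a change vector

@dataclass
class RedoThread:
    number: int
    checkpoint_scn: int
    # Records in generation order, through end-of-thread.
    records: List[RedoRecord] = field(default_factory=list)

def crash_recover(threads: List[RedoThread],
                  file_ckpt_scn: Dict[int, int],
                  blocks: Dict[Tuple[int, int], List[str]]) -> List[int]:
    """Thread-at-a-time instance recovery (simplified model).

    file_ckpt_scn: file_no -> datafile header checkpoint SCN (hypothetical input).
    blocks: (file_no, block_no) -> changes applied so far; updated in place.
    Returns the thread numbers in the order they were recovered.
    """
    order = []
    # Lowest thread checkpoint SCN first, so that each recovered thread
    # advances the database checkpoint (5.2).
    for t in sorted(threads, key=lambda th: th.checkpoint_scn):
        high_scn = t.checkpoint_scn
        for r in t.records:
            if r.scn < t.checkpoint_scn:
                continue  # before the thread checkpoint: already on disk
            high_scn = max(high_scn, r.scn)  # read through end-of-thread regardless
            if r.scn < file_ckpt_scn[r.file_no]:
                continue  # redo older than the datafile checkpoint is not applied (5.4)
            blocks.setdefault((r.file_no, r.block_no), []).append(r.change)
        t.checkpoint_scn = high_scn  # close the thread, advance its checkpoint
        order.append(t.number)
    return order

t1 = RedoThread(1, 120, [RedoRecord(125, 1, 7, "t1:a"), RedoRecord(130, 1, 8, "t1:b")])
t2 = RedoThread(2, 100, [RedoRecord(105, 1, 9, "t2:a"), RedoRecord(140, 2, 3, "t2:b")])
blocks = {}
print(crash_recover([t1, t2], {1: 110, 2: 100}, blocks))  # [2, 1]
```

In the usage lines, record 105 in thread 2 is read (it still counts toward the thread's highest SCN) but not applied, because file 1 is checkpointed at SCN 110.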
The end of an instance recovery almost always advances the datafile checkpoints, and always advances the checkpoint counters.

5.5 Crash Recovery Completion

At the termination of crash recovery, the "fuzzy bits" - online-fuzzy, hotbackup-fuzzy, media-recovery-fuzzy - of all online datafiles are cleared. A special redo record, the end-crash-recovery "marker," is generated. Media recovery interprets this record to know when it is permissible to clear the online-fuzzy and hotbackup-fuzzy bits of the datafiles undergoing recovery (see 6.7).

6 Media Recovery

Media recovery is used to recover from a lost or damaged datafile, or from a lost current controlfile. It is used to transform a restored datafile backup into a "current" datafile. It is also used to restore changes that were lost when a datafile went offline without a checkpoint. Media recovery can apply archived logs as well as online logs. Unlike instance or crash recovery, media recovery is invoked only via explicit command.

6.1 When to Do Media Recovery

As was seen in 5.3, a restored datafile backup always needs media recovery, even if its recovery can be accomplished using only online logs. The same is true of a datafile that went offline without a checkpoint. The database cannot be opened if any of the online datafiles needs media recovery. A datafile that needs media recovery cannot be brought online until media recovery has been executed. Unless the database is closed (i.e. not open in any instance), media recovery can only operate on offline files.

Media recovery may be explicitly invoked to recover a database prior to open even when crash recovery would have sufficed. If so, crash recovery - though it may find nothing to do - will still be invoked automatically at database open. Note that media recovery may be run - and, in cases such as restored backups or datafiles that went offline immediate, must be run - even if recovery can be accomplished using only the online logs.
Media recovery may find nothing to do - and signal the "no recovery required" error - if invoked for files that do not need recovery. If the current controlfile is lost and a backup controlfile is restored in its place, media recovery must be done. This is the case even if all of the datafiles are current.

6.2 Thread-Merged Redo Application

Media recovery uses a thread-merged redo application algorithm: i.e. it applies redo from all threads simultaneously, merging redo records in increasing SCN order. The process of media-recovering a backup datafile differs from the process of crash-recovering a current online datafile in the following fundamental way: crash recovery applies redo from one thread at a time because any block of a current online file can need redo from at most one thread (only one instance at a time can dirty a block in its cache). With a restored backup, however, no assumption can be made about the number of threads that have redo relevant to a particular block. In general, recovering a backup requires simultaneous application of redo from all threads, with merging of redo records across threads in SCN order. Note that this algorithm depends on a redo-generation-time guarantee that changes for a given block occur in increasing SCN order across threads (in the Parallel Server case).

6.3 Restoring Backups

The administrator may copy backup versions of datafiles to the current datafile while the database is shut down or the file is offline. There is a strong assumption that backups are never copied to files that are currently accessible. Every file header read verifies that this has not been done by comparing the checkpoint counter in the file header with the checkpoint counter in the datafile's controlfile record.

6.4 Media Recovery Commands

There are three media recovery commands:

- RECOVER DATABASE
- RECOVER TABLESPACE
- RECOVER DATAFILE

The only essential difference among these commands is in how the set of files to recover is determined.
They all use the same criteria for determining whether the files can be recovered. There is a lock per datafile that is held exclusive by a process doing media recovery on the file, and held shared by an instance that has the database open with the file online. Media recovery signals an error if it cannot get the lock for a file it is asked to recover. This prevents two recovery sessions from recovering the same file, and prevents media recovery of a file that is in use.

6.4.1 RECOVER DATABASE

This command does media recovery on all online datafiles that need any redo applied. If all instances were cleanly shut down, and no backups were restored, this command will signal the "no recovery required" error. It will also fail if any instances have the database open, since they will hold the datafile locks.

6.4.2 RECOVER TABLESPACE

This command does media recovery on all datafiles in the tablespaces specified. In order to translate (i.e. via the data dictionary) the tablespace names into datafile names, the database must be open. This means that the tablespaces and their constituent datafiles must be offline in order to do the recovery. An error is signalled if none of the tablespace's constituent files needs recovery.

6.4.3 RECOVER DATAFILE

This command specifies the datafiles to be recovered. The database may be open; or it may be closed, as long as the media recovery locks can be acquired. If the database is open in any instance, then datafile recovery can only recover offline files.

6.5 Starting Media Recovery

Media recovery starts by finding the media-recovery-start SCN: i.e. the lowest SCN of the datafile header checkpoints of the files being recovered. Note: An exception occurs if a file's checkpoint is in its offline range (see 2.18). In that case, the file's offline-end checkpoint is used in place of its datafile header checkpoint in computing the media-recovery-start SCN.
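The media-recovery-start SCN rule of 6.5, including the offline-range exception, can be sketched as a minimum over per-file start points. The FileToRecover record and the representation of the offline range as an inclusive (offline-start SCN, offline-end checkpoint SCN) pair are assumptions made for illustration; the note does not specify the exact range boundaries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FileToRecover:
    file_no: int
    header_ckpt_scn: int
    # Hypothetical offline range (2.18): (offline-start SCN, offline-end
    # checkpoint SCN), treated here as inclusive at both ends; None if none.
    offline_range: Optional[Tuple[int, int]] = None

def effective_start_scn(f: FileToRecover) -> int:
    """SCN from which this file contributes to the media-recovery-start SCN."""
    if f.offline_range is not None:
        offline_start, offline_end = f.offline_range
        # A header checkpoint inside the offline range is replaced by the
        # offline-end checkpoint.
        if offline_start <= f.header_ckpt_scn <= offline_end:
            return offline_end
    return f.header_ckpt_scn

def media_recovery_start_scn(files: List[FileToRecover]) -> int:
    """Lowest effective checkpoint SCN among the files being recovered."""
    return min(effective_start_scn(f) for f in files)

files = [FileToRecover(1, 500),
         FileToRecover(2, 350, offline_range=(300, 600)),
         FileToRecover(3, 420)]
print(media_recovery_start_scn(files))  # 420
```

Without the exception, file 2's header checkpoint (SCN 350) would drag the start point back, even though its offline range means recovery of that file can begin at SCN 600.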
A buffer for reading redo is allocated for each thread in the enabled thread bitvec of the media-recovery-start checkpoint (i.e. the datafile checkpoint with the lowest SCN). The initial file header checkpoint SCN of every file is saved to ensure that no redo from a previous use of the file number is applied, and to avoid needlessly attempting to apply redo to a file from before its checkpoint. The stop SCNs (from the datafiles' controlfile records) are also saved. If finite, the highest stop SCN can be used to allow recovery to terminate without needlessly searching for redo beyond that SCN to apply (see 6.10). At recovery completion, any datafile initially found to have a finite stop SCN will be left checkpointed at that stop SCN (rather than at the recovery end-point). This allows an offline-clean or read-only datafile to be left checkpointed at an SCN that matches the tablespace-clean-stop SCN of its tablespace.

6.6 Applying Redo, Media Recovery Checkpoints

A log is opened for each thread of redo that was enabled at the time the media-recovery-start SCN was allocated (i.e. for each thread in the enabled thread bitvec of the media-recovery-start checkpoint). If the log is online, then it is automatically opened. If the log was archived, then the user is prompted to enter the name of the log (unless automatic recovery is being used). The redo is applied from all the threads in the order it was generated, switching threads as needed. The order of application of redo records without an SCN is not precise, but it is good enough for rollback to make the database consistent.

Except in the case of cancel-based incomplete recovery (see 6.12.1) and backup controlfile recovery (see 6.13), the next online log in sequence is accessed automatically, if it is on disk. If not, the user is prompted for the next log.

At log boundaries, media recovery executes a "checkpoint."
As part of a media recovery checkpoint, the dirty recovery buffers are written to disk and the datafile header checkpoints of the files undergoing recovery are advanced, so that the redo does not need to be reapplied. Another type of media recovery "checkpoint" occurs when a datafile initially found to have a finite stop SCN reaches that stop SCN. At such a stop SCN boundary, all dirty recovery buffers are written to disk, and the datafiles that have been made current have their datafile header checkpoints advanced to their stop SCN values.

6.7 Media Recovery and Fuzzy Bits

6.7.1 Media-Recovery-Fuzzy

The media-recovery-fuzzy bit is a flag in the datafile header that indicates that - due to ongoing redo application by media recovery - the file may contain changes in the future of (at SCNs beyond) the current header checkpoint SCN. The media-recovery-fuzzy bit is set at the start of media recovery for each file undergoing recovery. Generally the media-recovery-fuzzy bits can be cleared when a media recovery checkpoint advances the checkpoints in the datafile headers. They are left clear when a media recovery session completes successfully or is cancelled. As will be seen in 8.1, open with resetlogs following incomplete media recovery will fail if any online datafile has the media-recovery-fuzzy bit (or any fuzzy bit) set.

6.7.2 Online-Fuzzy

Upon encountering an end-crash-recovery "marker" (or a file-specific offline-immediate "marker", generated when a datafile goes offline without a checkpoint), media recovery can (at the next media recovery checkpoint) clear (if set) the online-fuzzy and hotbackup-fuzzy bits in the appropriate datafile header(s).

6.7.3 Hotbackup-Fuzzy

Upon encountering an end-backup "marker" (or an end-crash-recovery "marker"), media recovery can (at the next media recovery checkpoint) clear the hotbackup-fuzzy bit.
Open with resetlogs following incomplete media recovery will fail if any online datafile has the hotbackup-fuzzy bit (or any fuzzy bit) set. This prevents a successful RESETLOGS open following an incomplete recovery that terminated before all redo generated between BEGIN BACKUP and END BACKUP had been applied. Ending incomplete recovery at such a point would generally result in an inconsistent file, since the backup copy may already have contained changes between this endpoint and the END BACKUP.

6.8 Thread Enables

A special thread-enable redo record is written in the thread of an instance enabling a new thread. If media recovery encounters a thread-enable redo record, it allocates a new redo buffer, opens the appropriate log in the new thread, and prepares to start applying redo from the new thread.

6.9 Thread Disables

When a thread is disabled, its current log is marked as the end of a disabled thread. After media recovery finishes applying redo from such a log, it deallocates the thread's redo buffer and stops looking for redo from the thread.

6.10 Ending Media Recovery (Case of Complete Media Recovery)

The current (i.e. last) log in every enabled thread has the end-of-thread flag set in its header. Complete (as opposed to incomplete: see 6.12) media recovery always continues redo application through the end-of-thread in all threads. The end-of-thread log can be identified without having the current controlfile, since the end-of-thread flag is in the log header rather than in the logfile's controlfile record.

Note: Backing up and later restoring copies of current online logs is dangerous, and can lead to misidentification of the true current end-of-thread. This is because the end-of-thread flag in the backup copy will in general be out of date with respect to the current end-of-thread log.
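The thread-merged application described in 6.2 and 6.6, read through the end of every thread, can be sketched with heapq.merge, which combines already-sorted streams into one stream in key order. This is a simplified model under stated assumptions: each redo record is a hypothetical (scn, thread, file_no, change) tuple, every thread is enabled for the whole session (the thread enables and disables of 6.8 and 6.9 are not modeled), and the optional until_scn argument imitates the UNTIL CHANGE rule of 6.12.1 by stopping just before any redo at that SCN or higher.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: (scn, thread, file_no, change).
Record = Tuple[int, int, int, str]

def merged_media_recovery(threads: Dict[int, Iterable[Record]],
                          file_start_scn: Dict[int, int],
                          until_scn: Optional[int] = None) -> List[Record]:
    """Apply redo from all threads at once, merged in increasing SCN order.

    threads: thread number -> records in generation (increasing SCN) order,
        read through end-of-thread.
    file_start_scn: file_no -> checkpoint SCN saved at recovery start (6.5);
        files missing from the dict are not being recovered.
    Returns the records applied, in application order.
    """
    applied = []
    for rec in heapq.merge(*threads.values(), key=lambda r: r[0]):
        scn, _thread, file_no, _change = rec
        if until_scn is not None and scn >= until_scn:
            break  # incomplete recovery ends just before this SCN
        start = file_start_scn.get(file_no)
        if start is None or scn < start:
            continue  # file not being recovered, or redo older than its checkpoint
        applied.append(rec)
    return applied

redo = {
    1: [(101, 1, 5, "a"), (104, 1, 5, "c"), (110, 1, 6, "e")],
    2: [(102, 2, 5, "b"), (107, 2, 6, "d")],
}
print([r[3] for r in merged_media_recovery(redo, {5: 100, 6: 108})])  # ['a', 'b', 'c', 'e']
print([r[3] for r in merged_media_recovery(redo, {5: 100, 6: 108}, until_scn=107)])  # ['a', 'b', 'c']
```

Because the merged stream is in SCN order, the first record at or beyond until_scn marks the point after which nothing more may be applied, which is why the loop can simply stop there.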
If the datafiles being recovered have finite stop SCNs in their controlfile records (assuming a current controlfile), then media recovery can stop prior to the end-of-threads. Redo application for a datafile with a finite stop SCN can terminate at that SCN, since it is guaranteed that no redo for that datafile beyond that SCN was generated. As described in 2.15, the stop SCN is set when a datafile goes offline. Note that without the optimization that allows recovery of a file with a finite stop SCN to terminate at that SCN, it could not be guaranteed that recovery of an offline datafile while the database is open would terminate.

6.11 Automatic Recovery

Automatic recovery is invoked by using the AUTOMATIC option of the media recovery command. It saves the user the trouble of entering the names of archived logfiles, provided they are on disk. If the sequence number of the log can be determined, then a name can be constructed by concatenating the current values of the initialization parameters LOG_ARCHIVE_DEST and LOG_ARCHIVE_FORMAT. The current LOG_ARCHIVE_DEST is assumed, unless the user overrides it by specifying a different archiving destination for the recovery session.

The media-recovery-start checkpoint (see 6.5) contains (in the RBA field) the initial log sequence number for one thread (i.e. the thread that generated the checkpoint). If multiple threads of redo are enabled, the log history section of the controlfile (if configured) can be used to map the media-recovery-start SCN to a log sequence number for each thread. Once the initial recovery log is found for a thread, all subsequent logs needed from the thread follow in order. If it is not possible to determine the initial log sequence number, the user will have to guess and try logs until the right one is accepted. The timestamp from the media-recovery-start checkpoint is reported to aid in this effort.
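The two steps of automatic recovery described above (map the media-recovery-start SCN to a log sequence number per thread using the log history, then build a file name from LOG_ARCHIVE_DEST and LOG_ARCHIVE_FORMAT) can be sketched as follows. The log history rows are hypothetical (thread, sequence, low SCN, next SCN) tuples; the sketch assumes %t and %s as the thread and sequence placeholders in the format and expands only those two; the destination and format values are made up.

```python
import re
from typing import Dict, List, Tuple

def initial_sequences(log_history: List[Tuple[int, int, int, int]],
                      start_scn: int) -> Dict[int, int]:
    """Map each thread to the sequence number of the log containing start_scn.

    log_history rows are hypothetical (thread, sequence, low_scn, next_scn)
    tuples standing in for the controlfile log history (2.11).
    """
    seqs = {}
    for thread, seq, low_scn, next_scn in log_history:
        if low_scn <= start_scn < next_scn:
            seqs[thread] = seq
    return seqs

def archived_log_name(dest: str, fmt: str, thread: int, sequence: int) -> str:
    """Concatenate dest and fmt, expanding only %t (thread) and %s (sequence)."""
    values = {"t": str(thread), "s": str(sequence)}
    return dest + re.sub(r"%([ts])", lambda m: values[m.group(1)], fmt)

history = [(1, 41, 900, 1000), (1, 42, 1000, 1150),
           (2, 17, 950, 1080), (2, 18, 1080, 1300)]
for thread, seq in sorted(initial_sequences(history, start_scn=1050).items()):
    print(archived_log_name("/arch/", "arch_%t_%s.arc", thread, seq))
# /arch/arch_1_42.arc
# /arch/arch_2_17.arc
```

Once the first log of each thread is known, later logs of that thread are simply the following sequence numbers, so only the initial lookup needs the log history.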
6.12 Incomplete Recovery

A RECOVER DATABASE execution can be stopped and the database opened before all the redo has been applied. This type of recovery is termed incomplete recovery. The subsequent database open is termed a RESETLOGS open. Incomplete recovery effectively sets the entire database backwards in time to a transaction-consistent state at or near the recovery end-point. All subsequent updates to the database are lost and must be re-entered.

Use of incomplete recovery is indicated in the following circumstances:

- Media recovery is necessary (e.g. due to datafile damage or loss), but cannot be complete (i.e. not all redo can be applied) because all copies of a needed online or archived redo log were lost.
- All copies of an active (i.e. needed for instance recovery) log were damaged or lost while the database was open. Since crash recovery is precluded, this case reduces to the previous case.
- It is necessary to reverse the effect of an erroneous user action (e.g. table drop or batch run), and it is acceptable to set the entire database - not just the affected schema objects - backwards to a point in time before the error.

6.12.1 Incomplete Recovery UNTIL Options

There are three types of incomplete recovery. They differ in the means used to stop the recovery:

- Cancel-Based (RECOVER DATABASE UNTIL CANCEL)
- Change-Based (RECOVER DATABASE UNTIL CHANGE)
- Time-Based (RECOVER DATABASE UNTIL TIME)

The UNTIL CANCEL option terminates recovery when the user enters "cancel" rather than the name of a log. Online logs are not automatically applied in this mode, in case cancellation at the next log is desired. If multiple threads of redo are being recovered, there may be logs in other threads that are partially applied when the recovery is cancelled.

The UNTIL CHANGE option terminates redo application just before any redo associated with the specified SCN or higher. Thus the transaction that committed at that SCN will be rolled back.
If you want to recover through a transaction that committed at a specific SCN, then add one to the specified SCN.

The UNTIL TIME option works similarly to the UNTIL CHANGE option, except that a time rather than an SCN is specified. Recovery uses the timestamps in the redo block headers to convert the specified time into an SCN, and recovery is then stopped when that SCN is reached.

6.12.2 Incomplete Recovery and Consistency

In order to avoid database corruption when running incomplete recovery, all datafiles must be recovered to exactly the same point. Furthermore, no datafile may have any changes in the future of this point. This requires that incomplete media recovery start from datafiles restored from backups whose copies completed prior to the intended stop time. The system uses file header fuzzy bits (see 8.1) to ensure that the datafiles contain no changes in the future of the stop time.

6.12.3 Incomplete Recovery and Datafiles Known to the Controlfile

If recovering to a time before a datafile was dropped, the dropped file must appear in the controlfile used for recovery; otherwise it would not be recovered. One alternative for achieving this is to recover using a backup controlfile made before the datafile was dropped. Another alternative is to use the CREATE CONTROLFILE command to construct a controlfile that lists the dropped datafile.

Recovering to a time before a file was added is not a problem. The extra datafile will be eliminated from the controlfile after the database is opened. The unwanted file may be taken offline before the recovery to avoid accessing it.

6.12.4 Resetlogs Open after Incomplete Recovery

The next database open after an incomplete recovery must specify the RESETLOGS option. Among other effects (see Section 8), resetlogs throws away the redo that was not applied during the incomplete recovery, and marks the database so that the skipped redo can never be accidentally applied by a subsequent recovery.
If the incomplete recovery was a mistake (e.g. the lost log was found), the next open can specify the NORESETLOGS option. However, for the open with NORESETLOGS to succeed, it must be preceded by a successful execution of complete recovery (i.e. one in which all redo is applied).

6.12.5 Files Offline during Incomplete Recovery

If a file is offline during incomplete recovery, it will not be recovered. This is acceptable if the file is part of a tablespace that was taken offline normal and that is still offline normal at the recovery end-point. Otherwise, if the file is still offline when the resetlogs is done, the tablespace containing the file will have to be dropped, because it will need media recovery with logs from before the resetlogs. In general, V$DATAFILE should be checked to ensure that files are online before running an incomplete recovery. Only files that will be dropped and files that are part of offline normal (or read-only) tablespaces should be offline (Section 8.6).

6.13 Backup Controlfile Recovery

If recovery is done with a controlfile other than the current one, then backup controlfile recovery (RECOVER DATABASE...USING BACKUP CONTROLFILE) must be used. This applies both to the case of a restored controlfile backup, and to the case of a "backup" controlfile created via CREATE CONTROLFILE...RESETLOGS.

Use of CREATE CONTROLFILE...RESETLOGS makes a controlfile that is a "backup." Only a backup controlfile recovery can be run after executing CREATE CONTROLFILE...RESETLOGS, and only a RESETLOGS open can follow it. Use of CREATE CONTROLFILE...RESETLOGS is indicated if (all copies of) an online redo log were lost in addition to (all copies of) the controlfile.

By contrast, CREATE CONTROLFILE...NORESETLOGS makes a controlfile that is "current"; i.e. it has knowledge of the current state of the online logfiles and log sequence numbers.
A backup controlfile recovery is not necessary following CREATE CONTROLFILE...NORESETLOGS. Indeed, no recovery at all is required if there was a clean shutdown and no datafile backups have been restored. A normal or NORESETLOGS open may follow CREATE CONTROLFILE...NORESETLOGS.

A backup controlfile lacks valid information about the current online logs and datafile stop SCNs. Hence, recovery cannot look for online logs to apply automatically. Moreover, recovery must assume infinite stop SCNs. A RESETLOGS open corrects this information. The backup controlfile may have a different set of threads enabled than the original controlfile did. That set will be the effective enabled thread set following the RESETLOGS open.

The BACKUP CONTROLFILE option may be used either alone or in conjunction with an incomplete recovery option. Unless an incomplete recovery option is included, all threads must be applied to the end-of-thread. This is validated at open resetlogs time. It is currently required that a RESETLOGS open follow execution of backup controlfile recovery, even if no incomplete recovery option was used.

The following procedure could be used to avoid a backup controlfile recovery and resetlogs in case the only problem is a lost current controlfile (and a backup controlfile exists):

1. Copy the backup controlfile to the current controlfile and do a STARTUP MOUNT.
2. Issue ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS.
3. Issue the CREATE CONTROLFILE...NORESETLOGS command from the SQL script output by Step 2.

It is important to ensure that the CREATE CONTROLFILE command issued in Step 3 creates a controlfile reflecting a database structure equivalent to that of the lost current controlfile. For example, if a datafile was added since the backup controlfile was saved, then the CREATE CONTROLFILE command should be modified to declare the added datafile.
Failure to specify the BACKUP CONTROLFILE option on the RECOVER DATABASE command when the controlfile is indeed a backup can frequently be detected. One indication of a restored backup controlfile would be a datafile header checkpoint count that is greater than the checkpoint count in the datafile's controlfile record. However, this test may not catch the backup controlfile if the datafiles are also backups. Another test validates the online logfile headers against their corresponding controlfile records, but this too may not always catch an old controlfile.

6.14 CREATE DATAFILE: Recover a Datafile Without a Backup

If a datafile is lost or damaged and no backup of the file is available, it can be recovered using only information in the redo logs and controlfile. The following conditions must be met:

1. All redo logs written since the datafile was originally created must be available.
2. A controlfile in which the datafile is declared (i.e. name and size information) must be available or re-creatable.

The CREATE DATAFILE clause of the ALTER DATABASE command is first used to create a new, empty replacement for the lost datafile. RECOVER DATAFILE is then used to apply all redo generated for the file from the time of its original creation until the time it was lost. After all redo logs written since the datafile was originally created have been applied, the file will have been restored to its state at the time it was lost. This mechanism is useful for recovering a recently created datafile for which no backup has yet been taken. The original datafiles of the SYSTEM tablespace cannot be recovered by this means, however, since relevant redo data is not saved at database creation time.

6.15 Point-in-Time Recovery Using Export/Import

Occasionally, it may become necessary to reverse the effect of an erroneous user action (e.g. table drop or batch run).
One approach would be to perform an incomplete media recovery to a point in time before the corruption, then open the database with the RESETLOGS option. Using this approach, the entire database - not just the affected schema objects - would be set backwards in time. This approach has an undesirable side effect: it discards committed transactions. Any updates that occurred subsequent to the resetlogs SCN are lost and must be re-entered. Resetlogs has another undesirable side effect: it renders all pre-existing backups unusable for future recovery. Setting a mission-critical database globally back in time is often not an acceptable solution.

The following procedure is an alternative whose effect on the mission-critical database is to set just the affected schema objects - termed the recovery-objects - backwards in time. Point-in-time incomplete media recovery is run against a side copy of the production database, called the recovery-database. The initial version of the recovery-database is created using backups of the production database that were taken before the corruption occurred. Non-relevant objects in the recovery-database can be taken offline in order to avoid unnecessarily recovering them. However, the SYSTEM tablespace and all tablespaces containing rollback segments must participate in the media recovery in order to allow a clean open. (Note that this is a good reason to place rollback segments and data segments into separate tablespaces.)

After it has undergone point-in-time incomplete media recovery, the recovery-database is opened with the RESETLOGS option. The recovery-database is now set backwards to a point in time before the recovery-objects were corrupted. This effectively creates pre-corruption versions of the recovery-objects in the recovery-database. These objects can then be exported from the recovery-database and imported back into the production database.
Prior to importing the recovery-objects, the production database is prepared as follows:

- In the case of recovering an erroneously updated schema object, the copy of the object in the production database is prepared by discarding just the data; e.g. the table is truncated.
- In the case of recovering an erroneously dropped schema object, the object is re-created (empty) in the production database.

The import operation is then executed, using the data-only option as appropriate. Since export/import can be a lengthy process, it may be desirable to postpone it until a time when recovery-object unavailability can be tolerated. In the meantime, the recovery-objects can be made available, albeit at degraded performance, via a database link between the production database and the recovery-database.

An undesirable side effect of this approach is that transaction consistency across objects is lost. This side effect can be avoided by widening the recovery-object set to include all objects that must be kept transaction-consistent.

7 Block Recovery

Block recovery is the simplest type of recovery. It is performed automatically by the system during normal operation of the database, and is transparent to the user.

7.1 Block Recovery Initiation and Operation

Block recovery is used to clean up the state of a buffer whose modification by a foreground process (in the middle of invoking a redo application callback to apply a change vector to the buffer) was interrupted by the foreground process dying or signalling an error. Recovery involves (i) reading the block from disk; (ii) using the current thread's online redo logs to reconstruct the buffer to a state consistent with the redo already generated; and (iii) writing the recovered block back to disk. If block recovery fails, then after a second attempt, the block is marked logically corrupt (by setting the block sequence number to zero) and a corrupt block error is signalled.
Block recovery is guaranteed to be possible using only the current thread's online redo logs, since:

1. Block recovery cannot require redo from another thread or from before the last thread checkpoint.
2. Online logs are not reused until the current thread checkpoint is beyond the log.
3. No buffer currently in the cache can need recovery from before the last thread checkpoint.

7.2 Buffer Header RBA Fields

The buffer header (an in-memory data structure) contains the following fields pertaining to block recovery:

Low-RBA and High-RBA: Delineate the range of redo (from the current thread) that needs to be applied to the disk version of the block in order to make it consistent with redo already generated.

Recovery-RBA: A place marker for recording progress in case the invoker of block recovery is PMON and complete recovery in one invocation would take too long (see 7.3).

7.3 PMON vs. Foreground Invocation

If an error is signalled while a foreground process is in a redo application callback, then the process itself executes block recovery. If, on the other hand, foreground process death is detected during a redo application callback, PMON executes block recovery.

Block recovery may require an unbounded amount of time and I/O. However, PMON cannot be allowed to spend an inordinate amount of time working on the recovery of one block while neglecting other necessary time-critical tasks. Therefore, a limit is placed on the amount of redo applied by one PMON call to block recovery. (A port-specific constant specifies the maximum number of redo log blocks applied per invocation.) As PMON applies redo during invocations of block recovery, it updates the recovery-RBA in the buffer header to record its progress. When a PMON call to block recovery causes the recovery-RBA to reach the high-RBA, block recovery for that block is complete.
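The bounded PMON invocation of 7.3 can be sketched as a resumable step function that uses the recovery-RBA as its progress marker. This is a toy model under stated assumptions: RBAs are plain integer indexes into a list of redo blocks, the high-RBA is treated as exclusive (one past the last block to apply), the max_blocks argument stands in for the port-specific limit, and the BufferHeader class and function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BufferHeader:
    low_rba: int    # first redo block to apply
    high_rba: int   # one past the last redo block to apply (assumed exclusive)
    recovery_rba: Optional[int] = None  # PMON's progress marker
    changes: List[str] = field(default_factory=list)  # stand-in for block contents

def pmon_block_recovery_step(buf: BufferHeader, redo: List[str], max_blocks: int) -> bool:
    """One bounded PMON call to block recovery; returns True when the block is done.

    RBAs are modeled as integer indexes into `redo`, and max_blocks stands in
    for the port-specific per-invocation limit (both hypothetical).
    """
    if buf.recovery_rba is None:
        # First call: in the real mechanism the disk version of the block is
        # read here (7.1); this model just starts from an empty change list.
        buf.recovery_rba = buf.low_rba
    end = min(buf.recovery_rba + max_blocks, buf.high_rba)
    for rba in range(buf.recovery_rba, end):
        buf.changes.append(redo[rba])  # apply one redo block
    buf.recovery_rba = end  # record progress for the next call
    return buf.recovery_rba >= buf.high_rba

redo = ["c0", "c1", "c2", "c3", "c4", "c5"]
buf = BufferHeader(low_rba=1, high_rba=6)
calls = 1
while not pmon_block_recovery_step(buf, redo, max_blocks=2):
    calls += 1  # PMON would return to its other duties between calls
print(calls, buf.changes)  # 3 ['c1', 'c2', 'c3', 'c4', 'c5']
```

A foreground process recovering its own buffer would have no such limit; in this model that corresponds to a single call with max_blocks large enough to reach the high-RBA.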
8 Resetlogs
The RESETLOGS option is needed on the first database open following:
• Incomplete recovery
• Backup controlfile recovery
• CREATE CONTROLFILE...RESETLOGS.
The primary function of resetlogs is to discard the redo that was not applied during incomplete recovery, ensuring that the skipped redo can never be accidentally applied by a subsequent recovery. To accomplish this, resetlogs effectively invalidates all existing redo in all online and archived redo logfiles. This has the side effect of making any existing datafile backups unusable for future recovery operations.
Resetlogs also reinitializes the controlfile information about online logs and redo threads, clears the contents of any existing online redo log files, creates the online redo log files if they do not currently exist, and resets the log sequence number in all threads to one.
8.1 Fuzzy Files
The most important requirement when doing a RESETLOGS open is that all datafiles be validated as recovered to the same point-in-time. This is what ensures that all the changes in a single redo record are done atomically. It is also important for other consistency reasons. If all threads of redo have been applied through end-of-thread to all online datafiles, then we can be sure that the database is consistent.
If incomplete recovery was done, there is the possibility that a file was not restored from a sufficiently old backup. In the general case, this is detectable if the file has a different checkpoint than the other files (exceptions: offline or read-only files). The other possibility is that the file is fuzzy, i.e. it may contain changes in the future of its checkpoint.
As seen earlier, the following "fuzzy bits" are maintained in the file header to determine if a file is fuzzy:
• online-fuzzy bit (see 3.5, 6.7.2)
• hotbackup-fuzzy bit (see 4, 6.7.3)
• media-recovery-fuzzy bit (see 6.7.1)
Open with resetlogs following incomplete media recovery will fail if any online datafile has any of the three fuzzy bits set. Redo records are created at the end of a hot backup (the end-backup "marker") and after crash recovery (the end-crash-recovery "marker") to enable media recovery to determine when it can clear the fuzzy bits. Resetlogs signals an error if any of the datafiles has any of the fuzzy bits set.
Except in the following special circumstances, resetlogs signals an error if any of the datafiles is recovered to a checkpoint SCN different from the one at which the other files are checkpointed (i.e. the resetlogs SCN: see 8.2):
1. A file recovered to an SCN earlier than the resetlogs SCN would be tolerated if no redo was generated for the file between its checkpoint SCN and the resetlogs SCN. For example, this would be the case if the file were read-only and its offline range spanned the checkpoint SCN and the resetlogs SCN. In this case, resetlogs would allow the file but set it offline.
2. A file checkpointed at an SCN later than the resetlogs SCN would be tolerated if its creation SCN (allocated at file creation time and stored in the file header) showed it to have been created after the resetlogs SCN. During the data dictionary vs. controlfile check performed by RESETLOGS open (see 8.7), such a file would be found to be missing from the data dictionary but present in the controlfile. As a consequence, it would be eliminated from the controlfile.
8.2 Resetlogs SCN and Counter
A resetlogs SCN and resetlogs timestamp, known together as the resetlogs data, are kept in the database info record of the controlfile. The resetlogs data is intended to uniquely identify each execution of a RESETLOGS open.
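The datafile checks that 8.1 describes for a RESETLOGS open can be expressed as a small decision function over the file header, using the resetlogs SCN defined above. This is a Python sketch under stated assumptions: the header fields are simplified, the offline range is encoded as a hypothetical (start SCN, end SCN) pair, and the "no redo between checkpoint SCN and resetlogs SCN" condition is approximated by the read-only/offline-range example given in exception 1.
```python
from dataclasses import dataclass
from typing import Optional, Tuple


class ResetlogsError(Exception):
    """Hypothetical stand-in for the error signalled by RESETLOGS open."""


@dataclass
class DatafileHeader:
    name: str
    checkpoint_scn: int
    creation_scn: int
    online_fuzzy: bool = False
    hotbackup_fuzzy: bool = False
    media_recovery_fuzzy: bool = False
    read_only: bool = False
    offline_range: Optional[Tuple[int, int]] = None   # hypothetical (start SCN, end SCN)


def check_for_resetlogs(f: DatafileHeader, resetlogs_scn: int) -> str:
    """Return how RESETLOGS open treats the file, or raise ResetlogsError."""
    if f.online_fuzzy or f.hotbackup_fuzzy or f.media_recovery_fuzzy:
        raise ResetlogsError(f"{f.name}: fuzzy bit set")
    if f.checkpoint_scn == resetlogs_scn:
        return "ok"
    if f.checkpoint_scn < resetlogs_scn:
        # Exception 1: no redo for the file between its checkpoint and the resetlogs SCN,
        # modelled as a read-only file whose offline range spans both SCNs.
        r = f.offline_range
        if f.read_only and r is not None and r[0] <= f.checkpoint_scn and r[1] >= resetlogs_scn:
            return "allowed, set offline"
        raise ResetlogsError(f"{f.name}: checkpoint SCN earlier than resetlogs SCN")
    # Exception 2: a file created after the resetlogs SCN; the dictionary check
    # (8.7) later eliminates it from the controlfile.
    if f.creation_scn > resetlogs_scn:
        return "allowed, eliminated by dictionary check"
    raise ResetlogsError(f"{f.name}: checkpoint SCN later than resetlogs SCN")
```
The fuzzy-bit test comes first because it applies regardless of the file's checkpoint SCN.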
The resetlogs data is also stored in each datafile header and in each logfile header. A redo log cannot be applied by recovery if its resetlogs data does not match that in the database info record of the controlfile. Except in some very special circumstances (e.g. offline normal or read-only tablespaces), a datafile cannot be recovered or accessed if its resetlogs data does not match that of the database info record of the controlfile. This ensures that changes discarded by resetlogs do not get back into the database. It also renders previous backups unusable for future recovery operations, making it prudent to take a database backup immediately after a resetlogs.
8.3 Effect of Resetlogs on Threads
Each thread's controlfile record is updated to clear the thread-open flag and to set the thread-checkpoint SCN to the resetlogs SCN. Thus, the thread appears to have been closed at the resetlogs SCN. The set of enabled threads from the enabled thread bitvec of the database info controlfile record is used as is. It does not matter which threads were enabled at the end of recovery, since none of the old redo can ever be applied to the database again. The log sequence numbers in all threads are also reset to one. One of the enabled threads is picked as the database checkpoint.
8.4 Effect of Resetlogs on Redo Logs
The redo is thrown away by zeroing all the online logs. Note that this means that redo in the online logs would be lost forever, with no way to undo the resetlogs in an emergency, if the online logs were not backed up prior to executing resetlogs. Ensuring the ability to undo an erroneous resetlogs is the only valid rationale for making backups of online logs. Undoing an erroneous resetlogs requires re-running the entire recovery operation from the beginning, after restoring backups of all datafiles, controlfile, and online logs.
One log is picked to be the current log for every enabled thread.
That log header is written as log sequence number one. Note that the set of logs and their thread association is picked up from the controlfile (i.e. using the thread number and log list fields of the logfile records). If it is a backup controlfile, this may be different from what was current the last time the database was open.
8.5 Effect of Resetlogs on Online Datafiles
The headers of all the online datafiles are updated to be checkpointed at the new database checkpoint. The new resetlogs data is also written to the header.
8.6 Effect of Resetlogs on Offline Datafiles
The controlfile record for an offline file is set to indicate the file needs media recovery. However, that will not be possible, because it would be necessary to apply redo from logs with the wrong resetlogs data. This means that the tablespace containing the file will have to be dropped.
There is one important exception to this rule. When a tablespace is taken offline normal or set read-only, the checkpoint SCN written to the headers of the tablespace's constituent datafiles is saved in the data dictionary TS$ table as the tablespace-clean-stop SCN (see 2.17). No recovery is ever needed to bring a tablespace and its files online if the files are not fuzzy and are checkpointed at exactly the tablespace-clean-stop SCN. Even the resetlogs data in the offline file header is ignored in this case. Thus a tablespace that is offline normal is unaffected by any resetlogs that leaves the database at a point in time when the tablespace is offline.
8.7 Checking Dictionary vs. Controlfile on Resetlogs Open
After the rollback phase of RESETLOGS open, the datafiles listed in the data dictionary FILE$ table are compared with the datafiles listed in the controlfile. This is also done on the first open after a CREATE CONTROLFILE. There is the possibility that incomplete recovery ended at a time when the files in the database were different from those in the controlfile used for the recovery.
Using a backup controlfile or creating one can have the same problem. Checking the dictionary does no harm, so it could be done on every database open; however, there is no point in wasting the time under normal circumstances.
The entry in FILE$ is compared with the entry in the controlfile for every file number. Since FILE$ reflects the space allocation information in the database, it is correct, and the controlfile might be wrong. If the file does not exist in FILE$ but the controlfile record says the file exists, then the file is simply dropped from the controlfile. If a file exists in FILE$ but not in the controlfile, a placeholder entry is created in the controlfile under the name MISSINGnnnn (where nnnn is the file number in decimal). MISSINGnnnn is flagged in the controlfile as being offline and needing media recovery.
The actual file corresponding to MISSINGnnnn (with respect to the file header contents, as opposed to the file name) can be made accessible by renaming MISSINGnnnn to point to it. In the RESETLOGS open case, however, rename can make the file usable only if the file was read-only or offline normal. If, on the other hand, MISSINGnnnn corresponds to a file that was not read-only or offline normal, then the rename operation cannot be used to make it accessible, since bringing it online would require media recovery with redo from before the resetlogs. In this case, the tablespace containing the datafile must be dropped.
When the dictionary check is due to an open after CREATE CONTROLFILE...NORESETLOGS rather than to a RESETLOGS open, media recovery may be used to make the file current. Another option is to repeat the entire operation that led up to the dictionary check with a controlfile that lists the same datafiles as the data dictionary. For incomplete recovery, this would involve restoring all backups and repeating the recovery.
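The FILE$ vs. controlfile comparison of 8.7, together with the tablespace-clean-stop rule of 8.6 that decides whether a renamed MISSINGnnnn file is usable after a RESETLOGS open, can be sketched in Python. The data structures are hypothetical simplifications, and zero-padding the file number to four digits is an assumption about the MISSINGnnnn format.
```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class ControlfileEntry:
    name: str
    offline: bool = False
    needs_media_recovery: bool = False


def check_dictionary_vs_controlfile(file_dollar: Set[int],
                                    controlfile: Dict[int, ControlfileEntry]
                                    ) -> Dict[int, ControlfileEntry]:
    """Treat FILE$ as correct and return the adjusted controlfile file list."""
    adjusted: Dict[int, ControlfileEntry] = {}
    for fno in sorted(file_dollar | set(controlfile)):
        if fno in file_dollar and fno in controlfile:
            adjusted[fno] = controlfile[fno]              # both agree: keep the entry
        elif fno in file_dollar:
            # In FILE$ only: placeholder, offline and needing media recovery.
            adjusted[fno] = ControlfileEntry(f"MISSING{fno:04d}", offline=True,
                                             needs_media_recovery=True)
        # In the controlfile only: the file is simply dropped (no entry added).
    return adjusted


def usable_after_resetlogs_rename(fuzzy: bool, checkpoint_scn: int,
                                  clean_stop_scn: Optional[int]) -> bool:
    """True if the file needs no recovery: not fuzzy and checkpointed exactly at the
    tablespace-clean-stop SCN (offline normal or read-only tablespace)."""
    return clean_stop_scn is not None and not fuzzy and checkpoint_scn == clean_stop_scn
```
A `clean_stop_scn` of None models a tablespace that was neither offline normal nor read-only; such a file cannot be brought back after a RESETLOGS open, and its tablespace must be dropped.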
9 Recovery-Related V$ Fixed-Views
The V$ fixed-views contain columns that extract information from data structures dynamically maintained in memory by the kernel. These "views" make this information accessible to the DBA under SYS. The following is a summary of recovery-related information that is viewable via V$ views:
9.1 V$LOG
Contains log group information from the controlfile: GROUP#, THREAD#, SEQUENCE#, SIZE_IN_BYTES, MEMBERS_IN_GROUP, ARCHIVED_FLAG, STATUS_OF_GROUP (unused, current, active, inactive), LOW_SCN, LOW_SCN_TIME.
9.2 V$LOGFILE
Contains log file (i.e. group member) information from the controlfile: GROUP#, STATUS_OF_MEMBER (invalid, stale, deleted), NAME_OF_MEMBER.
9.3 V$LOG_HISTORY
Contains log history information from the controlfile: THREAD#, SEQUENCE#, LOW_SCN, LOW_SCN_TIME, NEXT_SCN.
9.4 V$RECOVERY_LOG
Contains information (from the controlfile log history) about archived logs needed to complete media recovery: THREAD#, SEQUENCE#, LOW_SCN_TIME, ARCHIVED_NAME.
9.5 V$RECOVER_FILE
Contains information on the status of files needing media recovery: FILE#, ONLINE_FLAG, REASON_MEDIA_RECOVERY_NEEDED, RECOVERY_START_SCN, RECOVERY_START_SCN_TIME.
9.6 V$BACKUP
Contains status information relative to datafiles in hot backup: FILE#, FILE_STATUS (no-backup-active, backup-active, offline-normal, error), BEGIN_BACKUP_SCN, BEGIN_BACKUP_TIME.
10 Miscellaneous Recovery Features
10.1 Parallel Recovery (v7.1)
The goal of the parallel recovery feature is to use compute and I/O parallelism to reduce the elapsed time required to perform crash recovery, single-instance recovery, or media recovery. Parallel recovery is most effective at reducing recovery time when several datafiles on several disks are being recovered concurrently.
10.1.1 Parallel Recovery Architecture
Parallel recovery partitions recovery processing into two operations:
1. Reading the redo log.
2. Applying the change vectors.
Operation #1 does not easily lend itself to parallelization.
The redo log(s) must be read sequentially, and merged in the case of media recovery. Thus, this task is assigned to one process: the redo-reading-process.
Operation #2, on the other hand, easily lends itself to parallelization. Thus, the task of change vector application is delegated to some number of redo-application-slave-processes. The redo-reading-process sends change vectors to the redo-application-slave-processes using the same IPC (inter-process-communication) mechanism used by parallel query. The change vectors are distributed based on a hash function that takes the block address as argument (i.e. DBA modulo the number of redo-application-slave-processes). Thus, each redo-application-slave-process handles only change vectors for blocks whose DBAs hash to its "bucket" number. The redo-application-slave-processes are responsible for reading the datablocks into cache, checking whether or not the change vectors need to be applied, and applying the change vectors if needed.
This architecture achieves parallelism in log read I/O, datablock read I/O, and change vector processing. It allows overlap of log read I/Os with datablock read I/Os. Moreover, it allows overlap of datablock read I/Os for different hash "buckets." Recovery elapsed time is reduced as long as the benefits of compute and I/O parallelism outweigh the costs of process management and inter-process-communication.
10.1.2 Parallel Recovery System Initialization Parameters
PARALLEL_RECOVERY_MAX_THREADS, PARALLEL_RECOVERY_MIN_THREADS: These initialization parameters control the number of redo-application-slave-processes used during crash recovery or media recovery of all datafiles.
PARALLEL_INSTANCE_RECOVERY_THREADS: This initialization parameter controls the number of redo-application-slave-processes used during instance recovery.
10.1.3 Media Recovery Command Syntax Changes
RECOVER DATABASE has a new optional parameter for specifying the number of redo-application-slave-processes.
If specified, it overrides PARALLEL_RECOVERY_MAX_THREADS.
RECOVER TABLESPACE has a new optional parameter for specifying the number of redo-application-slave-processes. If specified, it overrides PARALLEL_RECOVERY_MIN_THREADS.
RECOVER DATAFILE has a new optional parameter for specifying the number of redo-application-slave-processes. If specified, it overrides PARALLEL_RECOVERY_MIN_THREADS.
10.2 Redo Log Checksums (v7.2)
The log checksum feature allows a potential corruption in an online redo log to be detected when the log is read for archiving. The goal is to prevent the corruption from being propagated, undetected, to the archive log copy. This feature is intended to be used in conjunction with a new command, CLEAR LOGFILE, that allows a corrupted online redo log to be discarded without having to archive it.
A new initialization parameter, LOG_BLOCK_CHECKSUM, controls activation of log checksums. If it is set, a log block checksum is computed and placed in the header of each log block as it is written out of the redo log buffer. If present, checksums are validated whenever log blocks are read for archiving or recovery. If a checksum is detected as invalid, an attempt is made to read another member of the log group (if any). If an irrecoverable checksum error is detected (i.e. the checksum is invalid in all members), then the log read operation fails.
Note that a rudimentary mechanism for detecting log block header corruption was added, along with log group support, in v7.1. The log checksum feature extends corruption detection to the whole block.
If an irrecoverable checksum error prevents a log from being read for archiving, then the log cannot be reused. Eventually log switch, and with it redo generation, will stall. If no action is taken, the database will hang. The CLEAR LOGFILE command provides a way to obviate the requirement that the log be archived before it can be reused.
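The read path described in 10.2 (validate the checksum, fall back to another member of the group, fail only if every member is bad) can be sketched in Python. The checksum algorithm is not specified here, so the sketch substitutes CRC-32 from `zlib` and models the log block header as a 4-byte checksum prefix; both choices are assumptions for illustration.
```python
import zlib
from typing import List, Sequence


class IrrecoverableChecksumError(Exception):
    """Raised when a log block's checksum is invalid in every member of the group."""


def write_log_block(payload: bytes) -> bytes:
    """Compute a checksum and place it in a 4-byte 'header' ahead of the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def checksum_ok(block: bytes) -> bool:
    return zlib.crc32(block[4:]) == int.from_bytes(block[:4], "big")


def read_log_block(members: Sequence[List[bytes]], block_no: int) -> bytes:
    """Read block `block_no`, trying each member of the log group in turn."""
    for member in members:
        block = member[block_no]
        if checksum_ok(block):
            return block[4:]
    raise IrrecoverableChecksumError(
        f"log block {block_no}: checksum invalid in all members")
```
Each member is modelled as a list of raw blocks; a corrupt copy in the first member is tolerated as long as the second member's copy validates.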
10.3 Clear Logfile (v7.2)
If all members of an online redo log group are "lost" or "corrupted" (e.g. due to checksum error or media error), redo generation may proceed normally until it becomes necessary to reuse the logfile. Once the thread checkpoints of all threads are beyond the log, it is a potential candidate for reuse. Possible scenarios preventing reuse are the following:
1. The log cannot be archived due to a checksum error; it cannot be reused because it needs archiving.
2. A log switch attempt fails because the log is inaccessible (e.g. due to a media error). The log may or may not have been archived.
The ALTER DATABASE CLEAR LOGFILE command is provided as an aid to recovering from such scenarios involving an inactive online redo log group (i.e. one that is not needed for crash recovery). CLEAR LOGFILE allows an inactive online logfile to be "cleared", i.e. discarded and reinitialized, in a manner analogous to DROP LOGFILE followed by ADD LOGFILE. In many cases, use of this command obviates the need for database shutdown or resetlogs.
Note: CLEAR LOGFILE cannot be used to clear a log needed for crash recovery (i.e. a "current" or "active" log of an open thread). Instead, if such a log becomes lost or corrupted, shutdown abort followed by incomplete recovery and open resetlogs will be necessary.
Use of the UNARCHIVED option allows the log clear operation to proceed even if the log needs archiving: an operation that would be disallowed by DROP LOGFILE. Furthermore, CLEAR LOGFILE allows the log clear operation to proceed in the following cases:
• There are only two logfile groups in the thread.
• All log group members have been lost through media failure.
• The logfile being cleared is the current log of a closed thread.
All of these operations would be disallowed in the case of DROP LOGFILE.
Clearing an unarchived log makes unusable any existing backup whose recovery would require applying redo from the cleared log.
Therefore, it is recommended that the database be backed up immediately following use of CLEAR LOGFILE with the UNARCHIVED option.
Furthermore, the UNRECOVERABLE DATAFILE option must be used if there is a datafile that is offline and whose recovery prior to onlining requires application of redo from the cleared logfile. Following use of CLEAR LOGFILE with the UNRECOVERABLE DATAFILE option, the offline datafile, together with its entire tablespace, will have to be dropped from the database. This is because the redo necessary to bring it online has been cleared, and there is no other copy of it.
The foreground process executing CLEAR LOGFILE processes the command in several steps:
• It checks that the logfile is not needed for crash recovery and is clearable.
• It sets the "being cleared" and "archiving not needed" flags in the logfile controlfile record. While the "being cleared" flag is set, the logfile is ineligible for reuse by log switch.
• It re-creates the logfile, and performs multiple writes to clear it to zeroes (a lengthy process).
• It resets the "being cleared" flag.
If the foreground process executing CLEAR LOGFILE dies while execution is in progress, the log will not be usable as the current log. Redo generation may stall and the database may hang, much as would happen if log switch had to wait for checkpoint completion or for log archive completion. Should the process executing CLEAR LOGFILE die, the operation should be completed by reissuing the same command. Another option would be to drop the partially-cleared log.
CLEAR LOGFILE could also fail due to an I/O error encountered while writing zeros to a log group member. One option for recovering would be to drop that member and add another to replace it.
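The eligibility rules and the four steps of CLEAR LOGFILE can be sketched as a Python function over a simplified logfile record. The record layout, the parameter names and the `zero_fill` callback are hypothetical; the UNRECOVERABLE DATAFILE check is applied only to unarchived logs, on the reading that the option matters when no other copy of the redo exists.
```python
from dataclasses import dataclass
from typing import Callable


class ClearLogfileError(Exception):
    """Hypothetical stand-in for the error returned when a log cannot be cleared."""


@dataclass
class LogfileRecord:
    group: int
    thread_open: bool              # is the owning thread open?
    status: str                    # "current", "active", or "inactive"
    needs_archiving: bool
    being_cleared: bool = False
    archiving_not_needed: bool = False


def clear_logfile(rec: LogfileRecord, unarchived: bool = False,
                  offline_file_needs_log: bool = False,
                  unrecoverable_datafile: bool = False,
                  zero_fill: Callable[[int], None] = lambda group: None) -> None:
    # Step 1: the log must not be needed for crash recovery and must be clearable.
    if rec.thread_open and rec.status in ("current", "active"):
        raise ClearLogfileError("log is needed for crash recovery")
    if rec.needs_archiving and not unarchived:
        raise ClearLogfileError("log needs archiving: UNARCHIVED option required")
    if rec.needs_archiving and offline_file_needs_log and not unrecoverable_datafile:
        raise ClearLogfileError("offline datafile needs this redo: "
                                "UNRECOVERABLE DATAFILE option required")
    # Step 2: set the flags; the log is now ineligible for reuse by log switch.
    rec.being_cleared = True
    rec.archiving_not_needed = True
    # Step 3: re-create and zero the log. If this raises, being_cleared stays set
    # and the command must be reissued (or the log dropped).
    zero_fill(rec.group)
    # Step 4: reset the flag; the log is usable again.
    rec.being_cleared = False
    rec.needs_archiving = False
```
A failure inside `zero_fill` leaves the record flagged as being cleared, which is why reissuing the same command is the way to complete the operation.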
