Tuesday, July 24, 2007

Backup & Recovery6

Subject: Physical Backup and Recovery: An Insider's Perspective



PHYSICAL BACKUP AND RECOVERY: AN INSIDER'S PERSPECTIVE


INTRODUCTION
============

This article discusses the most common concepts of backup and
recovery of the Oracle database.

This article discusses physical backups and not logical backups.
(An export of the database is an example of a logical backup.)


Why do we need backups?
======================

The most important responsibility of a database administrator is to
prepare for the possibility of media, hardware and software failure.
Should any of these failures occur the major goal is to make the
database available to users within an acceptable time, ensuring
that committed data is undamaged.

This paper discusses only physical backups and not logical backups.
An export of the database is an example of a logical backup.


What are the things that go wrong and eventually lead to recovery?
=================================================================

One or more database files are damaged.
One or more redo log files (including online) are damaged.
One or all control files are damaged.

What is recovery?
================

Restoring the files from backup and rolling forward in time is RECOVERY.

AN INTERNAL VIEW
================

Some admnistrative actions can destroy a database beyond repair. This is
particularly true of backup and recovery commands. To understand the
dangers of certain commands, a database administrator should have a
good understanding of the contents and purpose of the control file,
datafiles and log files. The basics are provided here.


Control file
============

A control file reflects the structure of a database at particular points
in time. It contains the checkpoint information, names of log files
and data files, header information of the files and log sequence number,
which is very important for recovery purposes. The recovery is done
only by applying the log files whose sequence number is greater than
log sequence number in the control file.

Datafile information in control file:
------------------------------------

- Names of datafiles and log files with exact path.
- File size.
- Block size.(Oracle block size)
- Whether the datafile is online or offline.
- Whether the datafile was taken offline automatically or not.
- Whether the datafile belongs to the system tablespace or not.
- Entry for each datafile giving a log sequence number when tablespace
was taken offline.

Log file information in control file:
------------------------------------

- Name with exact path.
- File size.
- Block size. (O/S block size)
- Log sequence#
- Has the file been archived.

Information in the datafile header
==================================

- Log sequence number of next log file that could be applied.
- Whether online backup in progress.

Information in the log file header
==================================

- Log sequence#
- Archival information.

BACKUP
======

Offline Backup
==============

A backup taken when the database is shut down normally is known as
offline or a cold backup. Copying of the datafiles, control file and
online redo log files must be done by using an operating system
copy utility. This is a considered as a complete backup of the
database. Any changes made after this backup will be unrecoverable if
the database is running in NOARCHIVELOG mode. All transactions are
recorded in online redo log files whether archiving or not. When redo
logs are archived (ARCHIVELOG mode), ORACLE allows you to apply these
transactions after restoring files that were damaged (assuming an
active Redo log file was not among the files damaged).

Whenever the schema of the database is changed, such as a new datafile
is added or a file is renamed or a tablespace is created or dropped,
shutdown the database and at least make a copy of the control file and
the newly added datafile. A complete backup of the database is
preferred.

Online Backup
=============

At sites where a database must operate 24-hours per day and when it is
not feasible to take offline backups, then an alternative is provided
by Oracle where physical backups are performed while the database remains
available for both reading and updating. For this kind of backup the
database must be in ARCHIVELOG mode. Only data files and current
control file need to be backed up. Unlike offline backups, the unit of
a online backup is a tablespace, and any or all tablespaces can backed
up whenever needed. Different datafiles can be backed up at different
times.

Procedure
---------

ALTER TABLESPACE ts_name BEGIN BACKUP

Then perform an operating system backup of all datafiles in that
tablespace. Once the backup is completed then it is very important
to issue the command:

ALTER TABLESPACE ts_name END BACKUP

All this must be done while the database is open.

Frequency of online backups
---------------------------

In order to determine how frequently to back up the files of a
database, balance the amount of time available for taking backups
and the time available for recovery after media failure. The time for
recovery depends on how old your most recent copy of the damaged file
is. The older your backup, the more redo log files need to be applied,
and the longer recovery will take.

Backup strategies should be tested before being used to protect a
production database. Ensure that backups of all datafiles and of all
necessary redo logs are kept, and that backups are restored correctly.
(If file compression is used, verify that the file is correct after
decompression.)

What happens between BEGIN BACKUP and END BACKUP?
------------------------------------------------

Once the ALTER TABLESPACE ts_name BEGIN BACKUP is issued the status
in the datafile header is changed to indicate that the datafile is
being backed up. Oracle stops recording the occurrence of checkpoints
in the header of the database files. This means that when a database
file is restored, it will have knowledge of the most recent checkpoint
that occurred BEFORE the backup, not any that occurred during the
backup. This way, the system will ask for the appropriate set of redo
log files to apply should recovery be needed. Since vital information
needed for recovery is recorded in the Redo logs, these REDO LOGS are
considered as part of the backup. Hence, while backing up the database
in this way the database must be in ARCHIVELOG mode. Status in the
datafile header is not reset until END BACKUP is issued.
On END BACKUP, the system again begins noting the occurrence of the
checkpoints in each file of the database. The checkpoint in the
datafile header is changed during the next log switch after END BACKUP
is issued. The above information will allow the tablespace to be
recovered as if the database had been offline when the backup took
place.

Critical Files
==============

All the files belonging to the database are important. Along with
other tablespaces, special care should be taken to ensure that the
SYSTEM tablespace and tablespaces containing rollback segments, are
protected by backups. Also backup the control file and datafile
immediately after adding it to a tablespace or after creating
tablespace if archiving is enabled. If media failure damages a datafile
that has not been backed up, recovering it's tablespace is not
possible. After backing up the newly added datafile, include it in the
regular datafile backup rotation.

RECOVERY
========

Instance Failure
================

Instance failure is a hardware, software, or system failure that
prevents an instance from continuing work. It can be caused by
a CPU failure, an operating system failure, a power outage, failure
of one of the ORACLE background processes or a failure to access a
required database file when the file is not lost or damaged.

Instance Recovery
=================

Instance recovery is automatic. Restarting the database performs the
instance recovery. It involves two steps.

1) Rolling forward.
- data that has not been recorded in the database
- the contents of rollback segments

2) Rolling back transactions that have been explicitly rolled
back or have not been committed.

3) Releasing any resources held by transactions in process at the
time of the failure.

4) Instance recovery is not necessary if the database is shutdown
normally.

Media failure
=============

Media failure is a hardware, software, or system failure that prevents
reading or writing to files that are required to operate the database.
Media failure is a failure caused by the loss of the control file,
database files or redo log file.

Media Recovery
==============

What is needed for media recovery?
---------------------------------

The database must be operating in ARCHIVELOG mode. In addition, you must
have the latest backup of database, all online redo log files, archived logs,
current control file.

Performing Recovery
-------------------

You can use different commands to recover your database. They are:

1) RECOVER DATABASE

This command is only used with a current control file. Database must
be mounted, but not OPEN. The control file is compared with the
datafile header and brings the datafiles up to date by applying
archived redo logs to make the control file entries match the datafile
header. Online redo logs are applied to make the datafiles current.

Once the recovery is complete open the database with:

ALTER DATABASE OPEN

2) RECOVER DATAFILE

This command is used when database is up and cannot be stopped. Can
also be used when the database is in mounted state. The tablespace
which contains these datafiles must be taken offline.

Issue RECOVER DATAFILE , command. You will be
prompted for log files. The changes will be applied only to these
files.

Once the media recovery is complete the tablespace can be brought
online. Allows 'multi-tasking' recovery. Different datafiles can
be recovered parallelly using different sessions or terminals.
Very useful when there are several datafiles to be recovered.

3) RECOVER TABLESPACE

Tablespace must be offline. Database must be in OPEN state. Recovers
a single tablespace to the current state. This command cannot be
used on SYSTEM tablespace or a tablespace which has rollback
segments having a status "in use". If having database OPEN is not an
issue, can recover using standard recovery (RECOVER DATABASE).

4) RECOVER DATABASE MANUAL

Manual recovery after media failure enables you to control how
many redo log files to apply to the database. This can used to undo
an inadvertent change to the database by stopping recovery before the
change took place. MANUAL option needed for recovery with a control
file backup (old copy) and current control file is not available.

Database must be MOUNTed but not OPEN. After MOUNTing the database
connect internal and issue RECOVER DATABASE MANUAL command. Then you
will be prompted beginning with the earliest redo log file recorded in
the header of each database file. The recovery process will continue to
prompt for redo log files until CANCEL is typed when prompted for the
next redo log file. Recovery can be cancelled at any time of any redo
log.

5) RECOVER DATABASE UNTIL