Forum Stats

  • 3,825,018 Users
  • 2,260,455 Discussions
  • 7,896,382 Comments

Discussions

RMAN using archives for recovery when incremental level 1 is available

user8930540
user8930540 Member Posts: 31 Blue Ribbon
edited Mar 12, 2020 7:41AM in Recovery Manager (RMAN)

Dears,

Background:

We are trying to change our backup & restore strategy to use only FORCE LOGGED level 0  & level 1 (purge archive logs at all other times - the window outside of the level 0 & level 1  - which will take all non-logged archives) for DWH system which is hammered with non logged operation at all times. Our operations is aware of this and are OK with data loss (which can be reloaded). Need for such backup is, we want a DB with no non-logged corruption in case of a restore/recovery done.

DB version : 12.1.0.2

PSU level : APR 2017

Platform: Linux x86_64

Issue:

We are trying to certify our procedure to see if oracle is able to use FORCE LOGGED level 0 and level 1 successfully for the restore/recovery operation using a test restore/recovery of the whole database. But we ran into situation where 1 or few datafiles were recovered using archive files (we didnt purge archives yet, since we haven't certified the procedure) compared to incremental level 1.

Below is the command used for recovery (we didnt force 'recover database noredo' to apply incremental level 1 yet).

recover database DELETE ARCHIVELOG MAXSIZE 200G;

Out of 682 dfs, 669 are recovered using incremental level 1 backup. Rest of the 13 had archives to recover them (out of the 13, 6 are new dfs which are created after level 0 backup was kicked off, so we have no concerns for them using archives, we will find a way around for them in our procedure) - but the other 7 had some unique finding, please read the below table - notice the highlighted boxes... (coming from v$backup_datafile) - this is an example data.

Query used: [column heading may be misleading]

select *

from

(

select c.file#

,c.checkpoint_change# current_ckpt#

,c.checkpoint_time current_ckpt_time

,a.ckptscn level_0_ckpt#

,b.incscn level1_ckpt#

from v$datafile_header c

,(select file#,checkpoint_change# ckptscn from v$backup_datafile where incremental_level=0 and completion_time > to_date('27/02/2020 04:05:00')) a

,(select file#,incremental_change# incscn from v$backup_datafile where incremental_level=1 and completion_time > to_date('27/02/2020 04:05:00')) b

where b.file#=a.file#(+)

and c.file#=b.file#

)

where (level_0_ckpt# is NULL OR level_0_ckpt#!=level1_ckpt#)

order by 1;

Example Data (not actual one):

INCREMENTAL LEVEL 0 (only incremental level 0 before level 1)INCREMENTAL LEVEL 1 (only incremental level 1 post level 0)
DF#INCREMENTAL SCNCHECKPOINT CHANGE#INCREMENTAL CHANGE#CHECKPOINT CHANGE#
10101020
20111121
30131523
40101224
50111125

Question : Is it possible to have such a change# jump for a df (same) which stayed across level 0 and level 1?

We observed this jump on all those 7 dfs, the incremental change# of level 1 > checkpoint change# of level 0!!!!

Read: https://docs.oracle.com/database/121/BRADV/rcmcncpt.htm#BRADV89498  ( About the Incremental Backup Algorithm)

Oracle in the doc says the below under Incremental start SCN

The incremental start SCN is most often the checkpoint SCN of the parent of the level 1 backup.

***most often - not 100% of the time!!!!!***

then in the next passage it says

If the backup is cumulative, then the incremental start SCN is the checkpoint SCN of the most recent level 0 backup.

Our case is cumulative for now, since we didnt have any level 1 between our level 0 or level 1. One level 0 and one level 1.

We assume at this stage, oracle is properly picking the archives as its way to move forward since there are chance we may miss few blocks in case it uses the incremental level 1 to recover the db.

So the reason for this discussion thread is

1) Is our understanding valid at this stage on

          a) the way oracle's incremental backup is performed?

          b) why oracle picked the archives over incremental level 1?

2) If our understanding is valid, why oracle had a jump between the lvl0.checkpoint_change# and lvl1.incremental_change#?

We need the above questions clarified to move forward with our desired backup/restore strategy.

Thank you!

Tagged:

Answers

  • EdStevens
    EdStevens Member Posts: 28,778 Gold Crown
    edited Mar 11, 2020 10:34AM

    Well, the simple answer is that, according to your given RECOVER statement, you don't specify a recovery point in time or scn.  So of course the recovery will try to roll forward as far as possible.  That will necessarily include all archivelogs that were created after the last available backup, and even into the current online redo logs.

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Mar 11, 2020 10:48AM

    If you backup an open database, you will always require incremental + archivelogs to perform a consistent recovery. You cannot use incremental level backups only to recover a database to a consistent state unless the backup was performed while the database was shutdown properly. RMAN will choose a recovery strategy depending on what data it has available and appears to be the most efficient for the desired recovery task.

    What are you trying to accomplish, resp. what are you hoping to gain with your strange backup strategy?

  • user8930540
    user8930540 Member Posts: 31 Blue Ribbon
    edited Mar 11, 2020 11:58AM

    Hi Dude,

    The aim of the backup strategy is to offer a non-logged block corruption free db to the end users.

    We aim to use incremental level 0 + incremental level 1 + archive generated during the incremental level 1 alone (we have set until time in our run script and the time is just the end of the incremental level 1).

    The restore or recover command just works fine and it will allow db to be opened in resetlogs fine no issues (but there will non-logged block corruption).

    But the concern is coming from usage of archivelogs when incremental level 1 is available, this means it is recovering the dfs using the archives generated between incremental level 0 and level 1 which we shouldnt, when we have an incremental level 1 available. Hope this clarifies.

    In understanding that behavior we found what we explained in the issue description (relationship between incremental_change# of level 1 backup and checkpoint_change# of level 0 backup).

    Thank you!

  • user8930540
    user8930540 Member Posts: 31 Blue Ribbon
    edited Mar 11, 2020 12:03PM

    Hi Dude,

    Forgot to address the gain you asked for...

    We dont want to waste backup space on archives (assume we will have several TBs of archive per day generated) which will result in corrupted DB.

    Hence we chose FORCE LOGGED level 0 and level 1 sticking to the archives needed only in this window (we have a backup script which handles this properly).

    So you generate a lot of archive with force logging (5X more than no force logging in our case) during the level 0 and level 1 and purge the archives generated in rest all window (this compensates the space we need to store the force logged archives.

    Hope it is clear now.

    Please let us know in case you ever happened to notice the scenario we have explained. To us it looks incremental level 1 isnt backing up all modified blocks, if the situation explained holds true - it is a far more  risky than anything else!!!!!!

    Thanks

  • user8930540
    user8930540 Member Posts: 31 Blue Ribbon
    edited Mar 11, 2020 12:07PM

    Dear EdStevens,

    We didnt post the whole script here for recovery here. But we indeed included the time, you can see below..

    run

    {

    allocate sbt channels ....

    set until time "to_date('02-mar-2020 03:53:00','dd-mon-yyyy hh24:mi:ss')";

    recover database DELETE ARCHIVELOG MAXSIZE 200G;

    release channels...

    }

    Hope it is clear now.

    Please let us know in case you ever happened to notice the scenario we have explained. To us it looks incremental level 1 isnt backing up all modified blocks, if the situation explained holds true - it is a far more  risky than anything else!!!!!!

    Thank you!

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Mar 11, 2020 4:05PM

    Sorry I find you last response confusing:

    We dont want to waste backup space on archives (assume we will have several TBs of archive per day generated) which will result in corrupted DB.Hence we chose FORCE LOGGED level 0 and level 1 sticking to the archives needed only in this window (we have a backup script which handles this properly).

    Why would TB's of archivelogs corrupt the database? Force logging mode is a requirement that belongs to Oracle DataGuard and ensures that a standby database remains consistent with the primary database. What does this have to do with RMAN?

    The aim of the backup strategy is to offer a non-logged block corruption free db to the end users.We aim to use incremental level 0 + incremental level 1 + archive generated during the incremental level 1 alone (we have set until time in our run script and the time is just the end of the incremental level 1).

    Are you backing up and recovering a standby database?

    RMAN has no crystal ball to know what kind of recovery you desire. You can, however, delete all archivelogs prior to performing the backup, in which case you can limit your recovery strategy to the way you desire. RMAN will not backup data that doesn't exist.

    Usually, if you need to save disk space, you would backup the "database plus archivelog delete input", which means archivelogs are automatically purged after backup. You can also only backup achivelogs that have not been backed up.

  • user8930540
    user8930540 Member Posts: 31 Blue Ribbon
    edited Mar 12, 2020 3:13AM

    Hi Dude,

    Thanks for your kind response!

    pastedImage_2.png

    We dont want to keep any archives which doesnt promise us an corruption free database, all these TB worth of archives were generated from non-logging operation (which is outside our backup window). This is the context of the statement.  We secure a L0 and L1 with force logging enabled in the DB and we backup all the archives which were generated in the backup window and this works just fine - no issues. But the issue is the jump in incremental_change# for level 1 compared to checkpoint_change from level 0 - this is what we are trying to understand why and how?

    pastedImage_3.png

    We dont have any standby databases. Anyway we are looking at a corruption free database. We do a lot of sqlldr operation with nologging.

    pastedImage_4.png

    This is clear.

    pastedImage_5.png

    We are trying to save space on recovery area (disk) & backup system (tape) as well. Dont want to keep something which isnt useful.

    Hope this clarifies. Intention of this post is to see why oracle had a jump between the lvl0.checkpoint_change# and lvl1.incremental_change#?

    Thank you!

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Mar 12, 2020 5:08AM

    The following is from Oracle documentation:

    During an incremental backup, RMAN reads the SCN of each data block in the input file and compares it to the checkpoint SCN of the parent incremental backup. If the SCN in the input data block is greater than or equal to the checkpoint SCN of the parent, then RMAN copies the block.

    There are SCN and Checkpoing SCN. The SCN of a data block, from what I understand, is not necessarily in a sequential order and reflects the "current" SCN similar to the current database clock.

    The RMAN recovery process will use incremental level 1 backups and archivelogs necessary to perform media recovery. This process will also rebuild the undo tablespace in order to rollback uncommitted transactions. As far as I'm aware, starting an RMAN backup will automatically archive the current redo log, which automatically performs a log switch and hence a checkpoint, provided you include the archivelogs in the backup.

    If you use SQL Loader or import data without generating logs, then you can not rely on redo to recover your database. What happens if this occurs during a backup? I would have to give this some more thoughts, but right now I think your best option to guarantee a corruption free recovery might be to create the backup while the database is mounted and not open. You can use an RMAN incremental backup strategy, but you have to make sure no "nologging" happens during the backup.

    Have you seen the following:

    Best Practice 6: Plan for NOLOGGING Operations in Your Backup/Recovery Strategy

    https://docs.oracle.com/database/121/VLDBG/GUID-42825ED1-C4C5-449B-870F-D2C8627CBF86.htm#VLDBG1578

  • user8930540
    user8930540 Member Posts: 31 Blue Ribbon
    edited Mar 12, 2020 6:01AM

    Hi Dude,

    pastedImage_0.png

    I agree to the comment, this was one of the documents we referenced before devising this backup/recovery strategy.

    It was difficult for us to find a backup window (outage) of 24 or so with our current backup capability (10g pipe - SBT) - best possible. There are other alternate options to perform backup (like storage level snapshot) to reduce the window of backup - but they are all expensive I believe and arent explored yet for regular backup purpose yet.

    This discussions are going to happen in future in case we dont find the cause of the observation.

    Sticking to the original post question  to see if we can find out the cause:

    Why oracle had a jump between the lvl0.checkpoint_change# and lvl1.incremental_change#?

    I havent scrutinized the recovery this far until this backup strategy was arrived at and it ended up in corrupted datafiles when we opened the DB.

    Thank you!

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Mar 12, 2020 7:41AM

    Are you assuming that the block SCN should be sequential and the gap is the reason for the corruption?

    As I understand SCN is not necessarily sequential - it's an internal time stamp. I also don't see how L0 and L1 should be affected by nologging since it follows compares changes at the block level, unless you use block change tracking.

    You cannot instruct RMAN to use L1 or archivelog for recovery. RAM will determine based on own decision what is available and what is the most economic strategy.