8 Replies Latest reply: Feb 24, 2012 2:14 PM by user11971589 RSS

    Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?

    user11971589
      Hi,

      Oracle EE 11.2.0.2 RHEL 5.7 x86 64-bit RACOne Node database (3 - nodes).

      I was working on a process to refresh out test environment with prod database backup from tape and i have done it many times. Most of our environment is in 10gR2 and have few on 11.2.0.2 RACOne Node databases. The recovery basics in Oracle world is to use archive log sequence + 1 (provided we know before hand the backups metadata) and it works as expected until 10g but with 11.2.0.2 RMAN duplicate, i stumbled upon an issue where RMAN is trying to restore an older controlfile auto backup and then rolling forward from there. Its easy to explain with an example..
      1) I have an archivelog backup, say sequence 20 to 30 on Feb 22nd @ 10pm and a controlfile autobackup at this time c-20120222
      2) I have a FULL backup + archive log until sequence 50 on Feb 23nd@ 6pm and a controlfile autobackup at this time  c-20120223
      3) I am using RMAN duplicate to refresh UAT with Feb 23rd backup and use "set until sequence 51 thread 1"
      I am using following RMAN commands to do the duplicate on UAT server (executed on UAT server)
      rman target sys/***@PROD_DB  auxiliary /
      RMAN> run
      2> {
      3> allocate auxiliary channel c1 device type disk;
      4> allocate auxiliary channel c2 device type disk;
      5> allocate auxiliary channel c3 device type disk;
      6> set until sequence 51 thread 1;
      7> duplicate target database to UAT_DB;
      8> }
      RMAN duplicate automates all that we do manually but the recovery mechanism is same. In the above scenario, it connects to target database, gets the backup metadata and restores controlfile from c-20120223 , restores the datafiles and recovers till sequence 50 . This has been the scenario until 10g in every RMAN duplicate process i have tried and is expected from Oracle.

      Issue:

      In 11gR2 (11.2.0.2), for the same duplicate scenario above, it is restoring the controlfile from autobackup c-20120222 and restoring the datafiles from Feb 23rd backup, recovering till sequence 50 and doing the duplicate process. The thing to note here is that the controlfile autobackup c-20120222 does not have any record of the FULL backup that occured on Feb 23 rd and since RMAN duplicate is having a connection to target database, it knows the backup metadata and hence it is able to complete the recovery.

      For clarification, Thread 1 & 2 of RAC One Node is not an issue here for sure. Did anyone experience this issue? Any insight into this would be greatly appreciated.

      Thanks,
      Shiva
        • 1. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
          mseberg
          Hello;

          I'm thinking No. How do you know thread 1 is the latest?

          I would have recovery run until some specific time. Use SET UNTIL TIME instead.

          Best Regards

          mseberg
          • 2. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
            user11971589
            mseberg,

            As informed earlier, i am using RAC One Node, so at any point in time, only ONE instance will be up and only 1 THREAD will be active (except during ONLINE RELOCATION). In my case, the instance_2 was never active as we never had any failures or ONLINE RELOCATIONS and below is the stats from my dictionary. And we haven't defined what instance should use what THREAD NUMBER in the spfile, but we can know what thread is active (its same as RAC, thought INST_ID is showing as 1, it might be different in RAC One Node).
            SQL> select inst_id, group#, thread#,status from gv$log;
            
               INST_ID     GROUP#    THREAD# STATUS
            ---------- ---------- ---------- ----------------
                     1          1          1 CURRENT
                     1          2          1 ACTIVE
                     1          3          2 INACTIVE
                     1          4          2 INACTIVE
            
            SQL> alter system switch logfile;
            
            System altered.
            
            SQL> select inst_id, group#, thread#,status from gv$log;
            
               INST_ID     GROUP#    THREAD# STATUS
            ---------- ---------- ---------- ----------------
                     1          1          1 ACTIVE
                     1          2          1 CURRENT
                     1          3          2 INACTIVE
                     1          4          2 INACTIVE
            
            SQL> select inst_id, thread#, enabled, status from GV$INSTANCE_LOG_GROUP;
            
               INST_ID    THREAD# ENABLED  STATUS
            ---------- ---------- -------- ------
                     1          1 PUBLIC   OPEN
                     1          2 PUBLIC   CLOSED
            The recovery completed just because we had the backups in place but what if did not had the Feb 22nd backup tapes in place, it just FAILS. The question here is, RMAN Duplicate is doing something which it is not supposed to do. And coming to the UNTIL TIME vs UNTIL SEQUENCE, i can try it but the recovery mechanism should be same, rather than letting Oracle figure out, i am directly giving it at what sequence it should recover to, which is more specific.

            Thanks for pouring in your thoughts.

            -Shiva
            • 3. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
              mseberg
              OK;

              Interesting. I would only expect to use this method if you have a missing or damaged archived redo log file.

              So you must query v$log_history to get your information.

              I figured time-based might be helpful since you can use TO_DATE and eliminate any ambiguity.

              Thanks for sticking to your guns I hope you get a good answer.

              Best Regards

              mseberg
              • 4. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
                AJ
                sounds strange to me. I think RMAN must do this for a good reason..
                Maybe you can test to explicitly set the controlfile backup c-20120222 unavailable which would force RMAN to look for another. If it cannot use the newer controlfile backup, maybe the error message will give you a clue.
                • 5. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
                  tychos
                  Hi Shiva
                  It looks like the duplicate also performs a recover on the controlfile in 11g.
                  Oracle has added new features to the duplicate command which might be the basis for the changed behaviour between 10g and 11g.
                  in 11g you can use duplicate:
                  - in active database duplication
                  - without connected to a target but connected to a catalog.
                  I guess it is not a bug but expected behaviour to make these features available.
                  Regards,
                  Tycho
                  • 6. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
                    user11971589
                    AJ,

                    I have already tried that scenario, i renamed the c-20120222 and kicked off the same duplicate process but it failed and it kept on "fail-over to previous backup" which is expected. The thing is, RMAN look's for the control file autobackups from newest to oldest (unless you gave a point in time) and duplicate command is not doing it.

                    Tycho,

                    I agree there are new features but to support them a change in recovery mechanism is something i would disagree that Oracle would do. Active Duplication is not based on backups, so that is out of question here. For backup-based duplication, we can try with connections to target or targetless, catalog or catalogless or none. I have already given an example with the target (same with target + catalog) and i expect the same behavior with only catalog connection. Now, i have tested the example without connecting to target or catalog and below are my observations...

                    For the same backups i mentioned in the very first email, i got the following info....
                     rman target / 
                     
                    Recovery Manager: Release 11.2.0.2.0 - Production on Fri Feb 24 12:35:54 2012
                     
                    Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
                     
                    connected to target database: RMANTEST (DBID=3789692530)
                     
                    RMAN> list backup of archivelog sequence 95;
                     
                    using target database control file instead of recovery catalog
                     
                    List of Backup Sets
                    ===================
                     
                     
                    BS Key  Size       Device Type Elapsed Time Completion Time
                    ------- ---------- ----------- ------------ ------------------
                    11      2.50K      DISK        00:00:00     23-FEB-12 18:05:40
                            BP Key: 11   Status: AVAILABLE  Compressed: YES  Tag: TAG20120222T094240
                            Piece Name: /u01/app/oracle/exp/rmantest_stl/full_backup/rmantest_stl_0bn3upkg_1_1.arch
                     
                      List of Archived Logs in backup set 11
                      Thrd Seq     Low SCN    Low Time           Next SCN   Next Time
                      ---- ------- ---------- ------------------ ---------- ---------
                      1    95      346280     23-FEB-12 18:05:10 346302     23-FEB-12 18:05:30
                    I got the scn_to_timestamp for the NEXT_SCN of sequence 95, which should correpond tothe "Next Time" that is the timestamp for the Archive log sequence 96.
                    SQL> select scn_to_timestamp(346302) as timestamp from dual;
                     
                    TIMESTAMP
                    ---------------------------------------------------------------------------
                    23-FEB-12 18.05.30.000000000 PM
                    Now, i gave the following for targetless & catalogless duplication...
                    rman auxiliary /
                    RMAN> run
                    2> {
                    3> set until time "to_date('23-FEB-12 18:05:30','DD-MON-YY HH24:MI:SS')";
                    4> duplicate database to rman01s backup location '/u01/app/oracle/exp/rmantest_stl';
                    5> }
                    RESULT: Same ISSUE but this time IT FAILED with the error i expected. It went ahead and restored the c-20120222 autobackup control file and tried to restore the datafiles but, as we
                    know, this control file autobackup does not have record of Feb 23rd + archive log backup, so it FAILS. This is exactly what i told in my first email.

                    Error Log File:
                    contents of Memory Script:
                    {
                       sql clone "alter system set  control_files =
                      ''+RMAN01D_DATA/rman01s_stl1/controlfile/current.270.776081841'', ''+RMAN01D_RECO/rman01s_stl1/controlfile/current.263.776081841'' comment=
                     ''Set by RMAN'' scope=spfile";
                       sql clone "alter system set  db_name =
                     ''RMANTEST'' comment=
                     ''Modified by RMAN duplicate'' scope=spfile";
                       shutdown clone immediate;
                       startup clone force nomount
                       restore clone primary controlfile from  '/u01/app/oracle/exp/rmantest_stl/c-20120222';
                       alter clone database mount;
                    }
                    
                    
                    Errors in memory script
                    RMAN-03015: error occurred in stored script Memory Script
                    RMAN-06136: ORACLE error from auxiliary database: ORA-01507: database not mounted
                    ORA-06512: at "SYS.X$DBMS_RCVMAN", line 13371
                    ORA-06512: at line 1
                    RMAN-03015: error occurred in stored script Memory Script
                    RMAN-06026: some targets not found - aborting restore
                    RMAN-06023: no backup or copy of datafile 5 found to restore
                    RMAN-06023: no backup or copy of datafile 4 found to restore
                    RMAN-06023: no backup or copy of datafile 3 found to restore
                    RMAN-06023: no backup or copy of datafile 2 found to restore
                    RMAN-06023: no backup or copy of datafile 1 found to restore
                    RMAN-00571: ===========================================================
                    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
                    RMAN-00571: ===========================================================
                    RMAN-03002: failure of Duplicate Db command at 02/24/2012 10:18:04
                    RMAN-05501: aborting duplication of target database
                    So, my next test was not to give any Point In Time, that is just issue the duplicate without SET UNIL clause and this time, and this time it behaved, it restored the control file from c-20120223 and completed the duplicate process. So what does this tell us? We cannot do a POINT IN TIME duplicate using RMAN DUPLICATE????
                    rman auxiliary /
                    RMAN> run
                    2> {
                    3> duplicate database to rman01s backup location '/u01/app/oracle/exp/rmantest_stl';
                    4> }
                    
                    
                    contents of Memory Script:
                    {
                       sql clone "alter system set  control_files =
                      ''+RMAN01D_DATA/rman01s_stl1/controlfile/current.270.776089489'', ''+RMAN01D_RECO/rman01s_stl1/controlfile/current.263.776089489'' comment=
                     ''Set by RMAN'' scope=spfile";
                       sql clone "alter system set  db_name =
                     ''RMANTEST'' comment=
                     ''Modified by RMAN duplicate'' scope=spfile";
                       shutdown clone immediate;
                       startup clone force nomount
                       restore clone primary controlfile from  '/u01/app/oracle/exp/rmantest_stl/c-20120223';
                       alter clone database mount;
                    }
                    Just checking if anyone stumbled upon these findings. An Oracle SR is already in place, so just waiting for an update from them.

                    Thanks,
                    Shiva
                    • 7. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
                      tychos
                      Hi Shiva,
                      Another reason could be the flaw in the 10g and older duplicate procedure.
                      When a datafile was added after your last rman database backup the duplicate failed as this datafile could not be restored.
                      Regards,
                      Tycho
                      • 8. Re: Oracle EE 11.2.0.2 RMAN Duplicate interesting issue - is it a bug?
                        user11971589
                        Hmmm.. it's true, i ran into those situations.

                        In yet another test, i saw as what would be the criteria for selecting the control file autobackup restore in DUPLICATE. Here is the sequence of backups i took ( i have "CONTROL FILE AUTOBACKUP ON" always). I did "log switches" so that i have a considerable gap between archive log sequences
                        1) Took an archive log backup (current sequence 100) and after this we got control file auto backup                          c-3789692530-20120224-00
                        2) Took an "backup current control file" and after this we got control file auto backup                                      c-3789692530-20120224-01
                        3) Took a FULL database + archive log backup (current sequence 120) and after this we got control file autobackup            c-3789692530-20120224-02
                        4) Took another archive log backup (current sequence 129) and after this we got control file autobackup                      c-3789692530-20120224-03
                        and here is the RMAN duplicate script i used
                        rman target sys/****@rmantest_stl auxiliary /
                        
                        RMAN> run
                        2> {
                        3> allocate auxiliary channel c1 device type disk;
                        4> allocate auxiliary channel c2 device type disk;
                        5> allocate auxiliary channel c3 device type disk;
                        6> allocate auxiliary channel c4 device type disk;
                        7> set until sequence 130 thread 1;
                        8> duplicate target database to rman01s;
                        9> }
                        For us to be confused, it restored from the correct (expected) control file auto backup c-3789692530-20120224-02 and completed the duplicate successfully.

                        Starting restore at 24-FEB-12
                        
                        channel c1: starting datafile backup set restore
                        channel c1: restoring control file
                        channel c1: reading from backup piece /u01/app/oracle/exp/rmantest_stl/c-3789692530-20120224-02
                        channel c1: piece handle=/u01/app/oracle/exp/rmantest_stl/c-3789692530-20120224-02 tag=TAG20120224T134257
                        channel c1: restored backup piece 1
                        channel c1: restore complete, elapsed time: 00:00:02
                        output file name=+RMAN01D_DATA/rman01s_stl1/controlfile/current.267.776094657
                        output file name=+RMAN01D_RECO/rman01s_stl1/controlfile/current.261.776094657
                        Finished restore at 24-FEB-12
                        I cannot guess as what is the criteria for it selecting the control files for restore. I guess i will just wait for Oracle Support to come back with an answer.

                        -Shiva