6 Replies. Latest reply: Aug 19, 2013 8:45 PM by Hemant K Chitale

    New Non-ASM Standby Trying to use ASM during recovery

    Don Seiler

      Oracle 11.2.0.3 on RH6 x86_64. Cross-posting from oracle-l.

       

      We have a database on ASM. We want to migrate to filesystem storage on the same host (Oracle ZFS). The recommended path from Oracle is to create a standby and then do a failover when ready. Simple enough, you'd think.

       

      The standby has all references to ASM diskgroups removed, and the convert parameters are set appropriately. I took a new backup including archivelogs, plus a backup standby controlfile. The "duplicate target database for standby" performs the restore phase perfectly fine. When the media recovery phase starts, I see it try to mount the diskgroup that the primary uses for ASM. It fails to do so (plenty of errors in the alert log), then recovery fails and the instance is left in mount mode. Subsequent attempts to run "recover database" or even "crosscheck archivelog all" run into the same ASM errors.

       

      The one odd thing I see is the reference to the srvctl resource name for that diskgroup:

       

      Mon Aug 19 10:41:35 2013

       

      ERROR: failed to establish dependency between database prod_zfs and diskgroup resource ora.FRA.dg

       

      However I never registered prod_zfs in srvctl, and it still isn't listed when I run "srvctl config database".

       

      There are no references to ASM paths in the standby v$logfile, v$datafile, v$tempfile or v$archived_log. I created a trace controlfile, and the ASM paths it contains are for logfiles and datafiles that were successfully restored to disk and renamed in the controlfile.
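A check like the one described above can be scripted once the file names from those views have been spooled to text. The following is only a minimal sketch: it assumes ASM names are recognizable by a leading "+" (as in the +F0/... path that appears later in this thread), the function name is hypothetical, and the filesystem path in the sample is invented for illustration.

```python
def find_asm_paths(paths):
    """Return the entries that look like ASM file names (leading '+')."""
    return [p for p in paths if p.strip().startswith("+")]


# Sample input: the first path is hypothetical, the second is the ASM
# name quoted in the trace file later in this thread.
sample = [
    "/mnt/oracle/oradata/prodzfs/system01.dbf",
    "+F0/prod/backupset/2013_08_16/ncnnf0_tag20130816t175103_0.3676.823629065",
]

for path in find_asm_paths(sample):
    print("ASM reference:", path)
```

The same function could be pointed at any other list of names taken from the controlfile, not only the four views listed above.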

       

      Wondering if any of you have seen this.

        • 1. Re: New Non-ASM Standby Trying to use ASM during recovery
          mseberg

          Hello;

           

          Can you post the exact Oracle error?

           

          Can you create a new Pfile from the Standby Spfile and post the pfile?

           

          I'm thinking I would look at this too:

           

          Step By Step Guide On How To Recreate Standby Control File When Datafiles Are On ASM And Using Oracle Managed Files (Doc ID 734862.1)

           

          Best Regards

           

          mseberg

           


          • 2. Re: New Non-ASM Standby Trying to use ASM during recovery
            Don Seiler

            Here's an example of what I can reproduce 100% of the time:

             

            RMAN> backup database plus archivelog;

             

            Starting implicit crosscheck backup at 08/19/2013 16:28:53

            using target database control file instead of recovery catalog

            allocated channel: ORA_DISK_1

            channel ORA_DISK_1: SID=487 device type=DISK

            allocated channel: ORA_DISK_2

            channel ORA_DISK_2: SID=498 device type=DISK

            allocated channel: ORA_DISK_3

            channel ORA_DISK_3: SID=509 device type=DISK

            allocated channel: ORA_DISK_4

            channel ORA_DISK_4: SID=564 device type=DISK

            ORA-03113: end-of-file on communication channel

            ORA-01403: no data found

            ORA-01403: no data found

             

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ... (repeated many times)

            Crosschecked 63 objects

            ORA-03113: end-of-file on communication channel

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ... (repeated many times)

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03113: end-of-file on communication channel

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ORA-03114: not connected to ORACLE

            ... (repeated many times)

            ORA-03114: not connected to ORACLE

            ORA-03113: end-of-file on communication channel

            RMAN-00571: ===========================================================

            RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

            RMAN-00571: ===========================================================

            RMAN-03002: failure of backup plus archivelog command at 08/19/2013 16:28:58

            ORA-03114: not connected to ORACLE
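With output this repetitive, it can help to reduce the log to a count per error code so that the first, distinct errors stand out from the flood of ORA-03114 lines. This is a rough sketch, assuming the RMAN output has been saved to a string; the function name is hypothetical and the sample is a shortened excerpt of the output above.

```python
import re
from collections import Counter

# Matches codes such as ORA-03113 or RMAN-03002.
ERROR_CODE = re.compile(r"\b(ORA|RMAN)-(\d{5})\b")


def count_error_codes(log_text):
    """Count how often each ORA-/RMAN- code appears in the text."""
    return Counter(
        f"{m.group(1)}-{m.group(2)}" for m in ERROR_CODE.finditer(log_text)
    )


# Shortened excerpt of the RMAN output shown above.
sample = """\
ORA-03113: end-of-file on communication channel
ORA-01403: no data found
ORA-03114: not connected to ORACLE
ORA-03114: not connected to ORACLE
RMAN-03002: failure of backup plus archivelog command at 08/19/2013 16:28:58
"""

for code, n in count_error_codes(sample).most_common():
    print(code, n)
```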

            • 3. Re: New Non-ASM Standby Trying to use ASM during recovery
              Don Seiler

              Just FYI that I've opened an SR and will update this post with my findings.

               

              In the alert log I'll see many of these:

               

              Mon Aug 19 16:28:55 2013

              Errors in file /mnt/oracle/app/diag/rdbms/prodzfs/prodzfs/trace/prodzfs_ora_47041.trc  (incident=25053):

              ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

              Incident details in: /mnt/oracle/app/diag/rdbms/prodzfs/prodzfs/incident/incdir_25053/prodzfs_ora_47041_i25053.trc

              Use ADRCI or Support Workbench to package the incident.

              See Note 411.1 at My Oracle Support for error and packaging details.

              Errors in file /mnt/oracle/app/diag/rdbms/prodzfs/prodzfs/trace/prodzfs_ora_47041.trc  (incident=25054):

              ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []

              Incident details in: /mnt/oracle/app/diag/rdbms/prodzfs/prodzfs/incident/incdir_25054/prodzfs_ora_47041_i25054.trc

               

              That last trace file has these important bits:

               

              *** 2013-08-19 16:28:57.367

              *** SESSION ID:(498.3) 2013-08-19 16:28:57.367

              *** CLIENT ID:() 2013-08-19 16:28:57.367

              *** SERVICE NAME:() 2013-08-19 16:28:57.367

              *** MODULE NAME:(rman (TNS V1-V3)) 2013-08-19 16:28:57.367

              *** ACTION NAME:(0000006 STARTED62) 2013-08-19 16:28:57.367

               

               

              NOTE: disk 2 is missing from group 2

              DDE: Problem Key 'ORA 600 [kfioTranslateIO03]' was flood controlled (0x2) (incident: 25061)

              ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

              kfioRqSet=0x7ff799a87fe8 parent=0x7ff799a380c0 gn=(64.0) cnt=0

                size=1032192 vxn=0 byte offset=16384 buf offset=0

                skipped[0]=0 skipped[1]=0 skipped[2]=0 skipped[3]=0 skipped[4]=0 skipped[5]=0

                failed[0]=0 failed[1]=0 failed[2]=0 failed[3]=0 failed[4]=0 failed[5]=0

              parent <kfiorq>:

              =========Start of 'kfiorq = [0x7ff799a380c0]' dumping =========

                      Status            =  UNKWOWN

                      Flags             =  READ

                      Mirror side       = 0

                      Fib               = 0x85e2a4fa0

                      Offset            = 1

                      buffer ptr        = 0x7ff7998e1000

                      Rcount            = 1032192

                      err_kfiorq        = 600

                      Inflight disk IO  = 0

                      Completed disk IO = 0

                      Oracle error      = 0

                      Intended zone     = 0

                ===Dump of all attached kfiodrq's===

              =========End of 'kfiorq = [0x7ff799a380c0]' dumping =========

               

               

              parent <kfiofib>:

              ############# kfiofib = 0x85e2a4fa0 #################

              Diskgroup Name     = F0

              File number        = 3676.823629065

              File type          = 9

              Flags              = 8

              Blksize            = 16384

              File size          = 149 blocks

              Blk one offset     = 1

              Redundancy         = 17

              Physical blocksz   = 512

              Open name          = +F0/prod/backupset/2013_08_16/ncnnf0_tag20130816t175103_0.3676.823629065

              Fully-qualified nm =+F0/prod/backupset/2013_08_16/ncnnf0_tag20130816t175103_0.3676.823629065

              Mapid             = 7

              Slave ID          = -1

              Connection        = 0x(nil)

              ############################################

              DDE: Problem Key 'ORA 600 [17090]' was flood controlled (0x2) (incident: 25062)

              ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []

              Error ORA-600 signaled at ksedsts()+461<-ksf_short_stack()+77<-kge_snap_callstack()+63<-kge_sigtrace_dump()+69<-kgepop()+750<-kgesiv()+136<-kgesic0()+137<-kgersel()+605<-kfioTranslateIO()+3127<-kfioRqSetPrepare()+993<-kfioSubmitIO()+2293<-kfioRequestPriv()+182<-kfioRequest()+706<-ksfd_kfioRequest()+649<-ksfd_osmgo()+256<-ksfdgo()+861<-ksfdaio()+2521<-ksfqfret()+2420<-krbmdbp()+2557<-krbidbp()+1037<-pevm_icd_call_common()+867<-pfrinstr_ICAL()+168<-pfrrun_no_tool()+63<-pfrrun()+627<-plsql_run()+649<-pricar()+1003<-pricbr()+572<-prient2()+1259<-prient()+2268<-kkxrpc()+512<-kporpc()+634<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+253<-_start()+36

              ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 47043

              ----- Abridged Call Stack Trace -----

              ksedsts()+461<-kfioRequest()+2157<-ksfd_kfioRequest()+649<-ksfd_osmgo()+256<-ksfdgo()+861<-ksfdaio()+2521<-ksfqfret()+2420<-krbmdbp()+2557<-krbidbp()+1037<-pevm_icd_call_common()+867<-pfrinstr_ICAL()+168<-pfrrun_no_tool()+63<-pfrrun()+627<-plsql_run()+649<-pricar()+1003

              <-pricbr()+572<-prient2()+1259<-prient()+2268<-kkxrpc()+512<-kporpc()+634<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+253<-_start()+36

               

               

              ----- End of Abridged Call Stack Trace -----
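Since many trace files are involved, a small script can pull out which ASM file each kfiofib dump refers to. The sketch below is an assumption-laden illustration: it relies only on the "Diskgroup Name" and "Open name" line layout visible in the dump above, and the function name is hypothetical.

```python
import re

# Only the two fields of interest; the layout is taken from the dump above.
FIELD = re.compile(
    r"^[ \t]*(Diskgroup Name|Open name)[ \t]*=[ \t]*(\S+)", re.MULTILINE
)


def kfiofib_fields(trace_text):
    """Return the diskgroup and open name found in a kfiofib dump.

    If the text holds several dumps, the last value of each field wins.
    """
    return dict(FIELD.findall(trace_text))


# Lines copied from the trace file above.
sample = """\
Diskgroup Name     = F0
File number        = 3676.823629065
Open name          = +F0/prod/backupset/2013_08_16/ncnnf0_tag20130816t175103_0.3676.823629065
Fully-qualified nm =+F0/prod/backupset/2013_08_16/ncnnf0_tag20130816t175103_0.3676.823629065
"""

fields = kfiofib_fields(sample)
print(fields["Diskgroup Name"], fields["Open name"])
```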

              • 4. Re: New Non-ASM Standby Trying to use ASM during recovery
                Don Seiler

                Current status is this:

                 

                1. I've got the standby fully recovered via Data Guard. Recovery via RMAN would not work due to the ASM conflict detailed above. Standby recovery continues to work fine.
                2. RMAN still has problems. I can reproduce the error 100% of the time with basic commands such as "backup database" and "crosscheck archivelog all".
                • 5. Re: New Non-ASM Standby Trying to use ASM during recovery
                  Don Seiler

                  Should note that the backup file that it mentions in the trace file is from a controlfile backup taken 3 days ago. Many other tracefiles mention this file as well. Considering that the standby was created by restoring a new backup that I took this morning (backup database plus archivelog, followed by backup current controlfile for standby), I cannot see why it would be interested in an old controlfile backup file. Especially when I'm trying to run a new "backup database".

                   

                  Anyway, as I said, an SR is open, I'll let you know what I find.

                  • 6. Re: New Non-ASM Standby Trying to use ASM during recovery
                    Hemant K Chitale

                    "Recovery via RMAN" .... does that mean that you are manually transferring the ArchiveLogs from the Primary site to the Standby ?  ArchiveLogs are on ASM at the Primary but on FileSystem on the Standby ?

                    Did you issue "ALTER DATABASE REGISTER LOGFILE '..path_to_archivelog';" to register every transferred archivelog? Oracle should then look for the archivelogs at the registered location. Use SQLPlus to issue the RECOVER DATABASE command.
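If there are many transferred archivelogs, the REGISTER LOGFILE statements could be generated rather than typed by hand. This is a minimal sketch and not a tested procedure: the function name and the sample paths are hypothetical, and the output is meant to be reviewed before being run in SQLPlus on the standby.

```python
def register_logfile_statements(paths):
    """Build one ALTER DATABASE REGISTER LOGFILE statement per path."""
    statements = []
    for path in paths:
        # Double any single quote so the path is a valid SQL string literal.
        quoted = path.replace("'", "''")
        statements.append(f"ALTER DATABASE REGISTER LOGFILE '{quoted}';")
    return statements


# Hypothetical archivelog locations on the standby filesystem.
sample = [
    "/mnt/oracle/arch/arch_1_1001.arc",
    "/mnt/oracle/arch/arch_1_1002.arc",
]

print("\n".join(register_logfile_statements(sample)))
```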

                     

                    Hemant K Chitale