6 Replies Latest reply: Apr 30, 2012 12:08 PM by user13283765 RSS

    Reinstate physical stansdby fails on first attempt

    jstem1177
      Hello All,

      Running into a weird problem when reinstate a physical standby database. The reinstate goes through the proper cycle and then fails at the end.
      The work around is to stop/start the reinstating physical standby and rerun the reinstate. But I woulsd still like to figure out why there appears to be network issues.

      However I did find the following error in the logs. What I don't undserstand is how come it cannot connect to itself (Error 12154 received logging on to the standby).

      Setup
      ========
      Primary : 3 nodes RAC 11gR2 - hccprdrm
      Reinstate Standby: 3nodes RAC 11gR2 - hccprdrt
      Protection: MaxAvailibity
      Flashback On: YES
      TNS: same on all servers

      Completed: alter database  mount
      Error 12154 received logging on to the standby
      FAL[client, ARC2]: Error 12154 connecting to hccprdrm for fetching gap sequence
      Starting Data Guard Broker (DMON)
      Fri Apr 27 12:47:51 2012
      INSV started with pid=44, OS id=11419
      Here is the output from reinstate cycle. Its seems to fail just before it mounts the new physical standby
      DGMGRL>  reinstate database hccprdrmt
      Reinstating database "hccprdrt", please wait...
      Operation requires shutdown of instance "hccprdrt1" on database "hccprdrt"
      Shutting down instance "hccprdrt1"...
      ORA-01109: database not open
      
      Database dismounted.
      ORACLE instance shut down.
      Operation requires startup of instance "hccprdrt1" on database "hccprdrt"
      Starting instance "hccprdrt1"...
      ORACLE instance started.
      Database mounted.
      Continuing to reinstate database "hccprdrt" ...
      Error: ORA-16653: failed to reinstate database
      
      Failed.
      Reinstatement of database "hccprdrt" failed
      I've check the network and the databases and servers can be ping and tnsping, sqlplus without any trouble. The v$datguard_stat show
      LGWR: Beginning to archive log 16 thread 1 sequence 4
      ARC3: Beginning to archive thread 1 sequence 3 (606960086-606967042)
      
      MESSAGE
      --------------------------------------------------------------------------------
      ARC3: Completed archiving thread 1 sequence 3 (606960086-606967042)
      ARC0: Standby redo logfile selected for thread 1 sequence 3 for destination LOG_
      ARCHIVE_DEST_2
      
      LGWR: Completed archiving log 16 thread 1 sequence 4
      LGWR: Standby redo logfile selected to archive thread 1 sequence 5
      LGWR: Standby redo logfile selected for thread 1 sequence 5 for destination LOG_
      ARCHIVE_DEST_2
      
      LGWR: Beginning to archive log 19 thread 1 sequence 5
      ARC3: Beginning to archive thread 1 sequence 4 (606967042-606967089)
      
      MESSAGE
      --------------------------------------------------------------------------------
      ARC3: Completed archiving thread 1 sequence 4 (606967042-606967089)
      ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
      ARC2: Error 16058 attaching to RFS for reconnect
      PING[ARC2]: Error 3113 when pinging standby hccprdrt.
      LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
      LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
      Error 3113 for archive log file 19 to 'hccprdrt'
      LGWR: Error 1041 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host
      'hccprdrt'
      
      LGWR: Failed to archive log 19 thread 1 sequence 5 (3113)
      Thanks for any assistance.

      Jan S.
        • 1. Re: Reinstate physical stansdby fails on first attempt
          mseberg
          Jan;

          First off, sorry I have no additional comments on your other question "Dataguard Log Transport Problems after Crashing Apply Instance - 11GR2 RAC"



          12154 - indicates that there is a problem with name resolution, could the Standby server TNSnames be incorrect or missing an entry?

          3113 - Would test the network

          Primary site:

          conn sys/<password>@standby as sysdba

          standby site:

          conn sys/<password>@primary as sysdba



          ORA-01109 - during reinstate ( normal - kind of like if you do a shutdown immediate while in MOUNT mode, it will bark but harmless )

          ORA-16653 - during reinstate - Generally I would check and make sure Flashback Database is not disabled in either database.
          Data Guard broker log files should confirm or deny.


          Summary - I believe/agree that you have a network issue, listener or tnsnames on one of your servers.

          Very busy today, not sure I will be back.

          Worth reminding you you can search the DG forum for these errors.

          Worth a look :

          http://www.oracle.com/technetwork/database/features/availability/maa-wp-11gr2-client-failover-173305.pdf

          Best Regards

          mseberg
          • 2. Re: Reinstate physical stansdby fails on first attempt
            jstem1177
            Hello mseberg,

            I have one question. I htink I may have found the problem. The TNS identifer used for the broker should it poitn to the normal service name or to the DGMGRL service name??
            The reason I asked is because I found that weather I do a switchover or failover, everything works only after I bound the new standby?

            HCCPRDRT =
             (DESCRIPTION =
                 (ADDRESS_LIST =
                    (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
                 (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
                    (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
                 )
                (CONNECT_DATA =
                  *(SERVICE_NAME = hccprdrt.my_domain.com)*
                )
              )
            {code}
            
            {code}
            HCCPRDRT =
             (DESCRIPTION =
                 (ADDRESS_LIST =
                    (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
                 (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
                    (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
                 )
                (CONNECT_DATA =
                  *(SERVICE_NAME = hccprdrt_DGMGRL.my_domain.com)*
                )
              )
            {code}
            
            
            
            Thanks
            
            Jan S.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
            • 3. Re: Reinstate physical stansdby fails on first attempt
              mseberg
              Non RAC example from mine


              The _DGMGR has to be there and should show up in lsnrtcl status Note the SID_LIST_LISTENER

              Listener on Primary
               
              # listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/network/admin/listener.ora
              # Generated by Oracle configuration tools.
               
              LISTENER =
                (DESCRIPTION_LIST =
                  (DESCRIPTION =
                    (ADDRESS = (PROTOCOL = TCP)(HOST = yourdomain.com)(PORT = 1521))
                  )
                )
               
              SID_LIST_LISTENER = (SID_LIST=(SID_DESC=(SID_NAME = PRIMARY)
                                      (ORACLE_HOME=/u01/app/oracle/product/11.2.0)
                                         (GLOBAL_DBNAME = PRIMARY_DGMGRL.yourdomain.com)))
               
               
               
              ADR_BASE_LISTENER = /u01/app/oracle
               
               
               
              INBOUND_CONNECT_TIMEOUT_LISTENER=120
               
               
               
              Listener on Standby
               
               
              # listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/network/admin/listener.ora
              # Generated by Oracle configuration tools.
               
              LISTENER =
                (DESCRIPTION_LIST =
                  (DESCRIPTION =
                    (ADDRESS = (PROTOCOL = TCP)(HOST = yourdomain.com)(PORT = 1521))
                  )
                )
               
              SID_LIST_LISTENER = (SID_LIST =(SID_DESC =(SID_NAME = STANDBY)
                                    (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
                                    (SDU=32767)
                                    (GLOBAL_DBNAME = STANDBY_DGMGRL.yourdomain.com)))    
               
               
              ADR_BASE_LISTENER = /u01/app/oracle
               
               
              INBOUND_CONNECT_TIMEOUT_LISTENER=120
               
              If your's is incorrect it would explain a lot. It does look wrong
              LISTENER = (DESCRIPTION = 
                   (ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=host_name)
                   (PORT=port_num))))
              SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=sid_name)
                   (GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
                   (ORACLE_HOME=oracle_home)))
              So yours should be more like this
              LISTENER = (DESCRIPTION = 
                   (ADDRESS_LIST =
                   (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
                   (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
                   (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
              )
              SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=??)
                   (GLOBAL_DBNAME=hccprdrt_DGMGRL.my_domain.com)
                   (ORACLE_HOME=oracle_home)))
              Sorry, my setup is different, would love to hand you the exact thing...closer




              Best Regards

              mseberg

              Edited by: mseberg on Apr 27, 2012 2:16 PM

              Edited by: mseberg on Apr 27, 2012 2:20 PM

              Edited by: mseberg on Apr 27, 2012 2:22 PM


              h1. Metalink docs


              The top one is very good ( has RAC One Node Database example )


              Oracle Data Guard Broker and Static Service Registration [ID 1387859.1]

              Automatic Restart of Databases during Switchover fail with ORA-12514 in DGMGRL [ID 308943.1]

              Edited by: mseberg on Apr 27, 2012 2:26 PM

              Edited by: mseberg on Apr 27, 2012 2:29 PM


              h3. Sorry about all the edits, work is very busy
              • 4. Re: Reinstate physical stansdby fails on first attempt
                jstem1177
                Hello mseberg,

                So I fiddled around witht the dataguard for the larger part of the weekend and its still jamming after a switchover and a failover. I also tried to change the listener as you suggested and this did not help, actually it caused the same errors as seen in Bug:

                Bug 9300432: ORA-16570 AT SWITCHOVER PHYS. STANDBY DB, NOT RESTARTED BY DGMGRL
                Performing switchover NOW, please wait...
                Error: ORA-16552: error occurred while generating directives for client
                Error: ORA-16501: the Data Guard broker operation failed
                I followed the great document 1387859.1 and confirmed that my setup is correct (Administrator managed RAC --> sid_list for listener includes only the sid_name for the specify host)
                - I also veriifed and the lsitener is up and available during the switchover and failover
                - I can ping from both sides and tnsping
                - All communications for the broker work correctly
                - The DG Static Connector registration shows correct for each instance (_DGMGRL) when I do a show instance verbose [instance_name]

                Really at this point I think I may need to open up a SR with Oracle to figure this one out. Its like I said, it goes through the entire cycle of a failover and swithcover, its only after it completes the cycle that I complains the the 2 sites cannot communicate with one another.

                Thanks in advance for any other ideas you might have.

                Jan S.
                • 5. Re: Reinstate physical stansdby fails on first attempt
                  mseberg
                  Jan;

                  Sorry about all the issues. Both ORA-16501 and ORA-16552 are vague at best. I believe you have done a great job of trying to shake out the issue before getting Oracle support involved. This is always a good idea and hopefully you can cut to the chase with them.

                  I found a OTN thread with the same errors, not sure if any information there will help.


                  error in swithover to standby

                  Points to ( Sorry if I have given you any duplicate links )

                  Bug 9300432: ORA-16570 AT SWITCHOVER PHYS. STANDBY DB, NOT RESTARTED BY DGMGRL ( seems to be not fixed with work around only )


                  I did find this if you have not already looked at it :

                  http://neeraj-dba.blogspot.com/2011/11/how-to-drop-data-guard-configuration-in.html


                  Very similar to :

                  How to Safely Remove a Data Guard Broker Configuration [ID 261336.1]



                  If you find a solution please consider posting it here.

                  Best Regards

                  mseberg

                  Edited by: mseberg on Apr 30, 2012 9:30 AM
                  • 6. Re: Reinstate physical stansdby fails on first attempt
                    user13283765
                    Hello mseberg,

                    So I've norrow down the issue. This seems to be related to oracle restart. Here is what I did.

                    I performed a swtichover(applies to failover as well) and here is the output:
                    DGMGRL> show configuration;
                    
                    Configuration - dg_config
                    
                      Protection Mode: MaxAvailability
                      Databases:
                        hccprdrm - Primary database
                        hccprdrt - Physical standby database
                    
                    Fast-Start Failover: DISABLED
                    
                    Configuration Status:
                    SUCCESS
                    
                    DGMGRL> switchover to hccprdrt;
                    Performing switchover NOW, please wait...
                    New primary database "hccprdrt" is opening...
                    Operation requires shutdown of instance "hccprdrm1" on database "hccprdrm"
                    Shutting down instance "hccprdrm1"...
                    ORACLE instance shut down.
                    Operation requires startup of instance "hccprdrm1" on database "hccprdrm"
                    Starting instance "hccprdrm1"...
                    ORACLE instance started.
                    Database mounted.
                    Switchover succeeded, new primary is "hccprdrt"
                    DGMGRL> show configuration;
                    At this point I can see that hccprdrm1 is mounted and I can connect to it locally:
                    [oracle@oraprdrmn1 ~]$ srvctl status database -d hccprdrm -v
                    Instance hccprdrm1 is running on node oraprdrmn1. Instance status: Mounted (Closed).
                    Instance hccprdrm2 is not running on node oraprdrmn2
                    Instance hccprdrm3 is not running on node oraprdrmn3
                    
                    SQL> select open_mode, database_role from v$database;
                    
                    OPEN_MODE            DATABASE_ROLE
                    -------------------- ----------------
                    MOUNTED              PHYSICAL STANDBY
                    On the NEW PRIMARY SITE i check that tnsping works:
                    Used TNSNAMES adapter to resolve the alias
                    Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL=tcp)(HOST=******)(PORT=1521))) (CONNECT_DATA = (SERVICE_NAME = ****)))
                    OK (20 msec)
                    The I tried to connect with sqlplus to the OLD PRIMARY and got the following error:
                    [oracle@oraprdrtn1 admin]$ sqlplus system/*******@hccprdrm
                    
                    SQL*Plus: Release 11.2.0.2.0 Production on Mon Apr 30 12:54:50 2012
                    
                    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
                    
                    ERROR:
                    ORA-01033: ORACLE initialization or shutdown in progress
                    Process ID: 0
                    Session ID: 0 Serial number: 0
                    Ok, so now, I decided to go into the NEW PRIMARY and checked the status of the dest_2 and found that it was consistent with Oracle Not Available:
                    SQL> select DEST_ID,STATUS,ERROR from V$archive_dest where dest_id=2;
                    
                       DEST_ID STATUS
                    ---------- ---------
                    ERROR
                    -----------------------------------------------------------------
                             2 ERROR
                    ORA-01034: ORACLE not available
                    So I'm think problems with the ORACLE_RESTART. Just to make sure I decided to bound the listener on the mounted instance. The I tried another connect from the NEW PRIMARY:
                    [oracle@oraprdrtn1 admin]$ sqlplus system/******@hccprdrm
                    
                    SQL*Plus: Release 11.2.0.2.0 Production on Mon Apr 30 13:03:04 2012
                    
                    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
                    
                    ERROR:
                    ORA-01034: ORACLE not available
                    ORA-27101: shared memory realm does not exist
                    Linux-x86_64 Error: 2: No such file or directory
                    Process ID: 0
                    Session ID: 0 Serial number: 0
                    Now the question is whether the bug is what you mentioned, but I don't think so because the old primary is mount and the role switch happend properly.

                    Only after I restart the OLD PRIMARY does all the log transport resume.

                    So I now know that the LISTENER is properly configured, so is the TNS and the underlying dataguard is correct as will.


                    Do you think at this point I should take it to Oracle Support? Or do you have nay ideas what I could do with regards to oracle_restart?

                    Thanks

                    Jan S.