This discussion is archived
6 Replies Latest reply: Apr 30, 2012 10:08 AM by user13283765 RSS

Reinstate physical stansdby fails on first attempt

jstem1177 Explorer
Currently Being Moderated
Hello All,

Running into a weird problem when reinstate a physical standby database. The reinstate goes through the proper cycle and then fails at the end.
The work around is to stop/start the reinstating physical standby and rerun the reinstate. But I woulsd still like to figure out why there appears to be network issues.

However I did find the following error in the logs. What I don't undserstand is how come it cannot connect to itself (Error 12154 received logging on to the standby).

Setup
========
Primary : 3 nodes RAC 11gR2 - hccprdrm
Reinstate Standby: 3nodes RAC 11gR2 - hccprdrt
Protection: MaxAvailibity
Flashback On: YES
TNS: same on all servers

Completed: alter database  mount
Error 12154 received logging on to the standby
FAL[client, ARC2]: Error 12154 connecting to hccprdrm for fetching gap sequence
Starting Data Guard Broker (DMON)
Fri Apr 27 12:47:51 2012
INSV started with pid=44, OS id=11419
Here is the output from reinstate cycle. Its seems to fail just before it mounts the new physical standby
DGMGRL>  reinstate database hccprdrmt
Reinstating database "hccprdrt", please wait...
Operation requires shutdown of instance "hccprdrt1" on database "hccprdrt"
Shutting down instance "hccprdrt1"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "hccprdrt1" on database "hccprdrt"
Starting instance "hccprdrt1"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "hccprdrt" ...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "hccprdrt" failed
I've check the network and the databases and servers can be ping and tnsping, sqlplus without any trouble. The v$datguard_stat show
LGWR: Beginning to archive log 16 thread 1 sequence 4
ARC3: Beginning to archive thread 1 sequence 3 (606960086-606967042)

MESSAGE
--------------------------------------------------------------------------------
ARC3: Completed archiving thread 1 sequence 3 (606960086-606967042)
ARC0: Standby redo logfile selected for thread 1 sequence 3 for destination LOG_
ARCHIVE_DEST_2

LGWR: Completed archiving log 16 thread 1 sequence 4
LGWR: Standby redo logfile selected to archive thread 1 sequence 5
LGWR: Standby redo logfile selected for thread 1 sequence 5 for destination LOG_
ARCHIVE_DEST_2

LGWR: Beginning to archive log 19 thread 1 sequence 5
ARC3: Beginning to archive thread 1 sequence 4 (606967042-606967089)

MESSAGE
--------------------------------------------------------------------------------
ARC3: Completed archiving thread 1 sequence 4 (606967042-606967089)
ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
ARC2: Error 16058 attaching to RFS for reconnect
PING[ARC2]: Error 3113 when pinging standby hccprdrt.
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 3113 for archive log file 19 to 'hccprdrt'
LGWR: Error 1041 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host
'hccprdrt'

LGWR: Failed to archive log 19 thread 1 sequence 5 (3113)
Thanks for any assistance.

Jan S.
  • 1. Re: Reinstate physical stansdby fails on first attempt
    mseberg Guru
    Currently Being Moderated
    Jan;

    First off, sorry I have no additional comments on your other question "Dataguard Log Transport Problems after Crashing Apply Instance - 11GR2 RAC"



    12154 - indicates that there is a problem with name resolution, could the Standby server TNSnames be incorrect or missing an entry?

    3113 - Would test the network

    Primary site:

    conn sys/<password>@standby as sysdba

    standby site:

    conn sys/<password>@primary as sysdba



    ORA-01109 - during reinstate ( normal - kind of like if you do a shutdown immediate while in MOUNT mode, it will bark but harmless )

    ORA-16653 - during reinstate - Generally I would check and make sure Flashback Database is not disabled in either database.
    Data Guard broker log files should confirm or deny.


    Summary - I believe/agree that you have a network issue, listener or tnsnames on one of your servers.

    Very busy today, not sure I will be back.

    Worth reminding you you can search the DG forum for these errors.

    Worth a look :

    http://www.oracle.com/technetwork/database/features/availability/maa-wp-11gr2-client-failover-173305.pdf

    Best Regards

    mseberg
  • 2. Re: Reinstate physical stansdby fails on first attempt
    jstem1177 Explorer
    Currently Being Moderated
    Hello mseberg,

    I have one question. I htink I may have found the problem. The TNS identifer used for the broker should it poitn to the normal service name or to the DGMGRL service name??
    The reason I asked is because I found that weather I do a switchover or failover, everything works only after I bound the new standby?

    HCCPRDRT =
     (DESCRIPTION =
         (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
         (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
            (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
         )
        (CONNECT_DATA =
          *(SERVICE_NAME = hccprdrt.my_domain.com)*
        )
      )
    {code}
    
    {code}
    HCCPRDRT =
     (DESCRIPTION =
         (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
         (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
            (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
         )
        (CONNECT_DATA =
          *(SERVICE_NAME = hccprdrt_DGMGRL.my_domain.com)*
        )
      )
    {code}
    
    
    
    Thanks
    
    Jan S.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
  • 3. Re: Reinstate physical stansdby fails on first attempt
    mseberg Guru
    Currently Being Moderated
    Non RAC example from mine


    The _DGMGR has to be there and should show up in lsnrtcl status Note the SID_LIST_LISTENER

    Listener on Primary
     
    # listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/network/admin/listener.ora
    # Generated by Oracle configuration tools.
     
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = yourdomain.com)(PORT = 1521))
        )
      )
     
    SID_LIST_LISTENER = (SID_LIST=(SID_DESC=(SID_NAME = PRIMARY)
                            (ORACLE_HOME=/u01/app/oracle/product/11.2.0)
                               (GLOBAL_DBNAME = PRIMARY_DGMGRL.yourdomain.com)))
     
     
     
    ADR_BASE_LISTENER = /u01/app/oracle
     
     
     
    INBOUND_CONNECT_TIMEOUT_LISTENER=120
     
     
     
    Listener on Standby
     
     
    # listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/network/admin/listener.ora
    # Generated by Oracle configuration tools.
     
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = yourdomain.com)(PORT = 1521))
        )
      )
     
    SID_LIST_LISTENER = (SID_LIST =(SID_DESC =(SID_NAME = STANDBY)
                          (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
                          (SDU=32767)
                          (GLOBAL_DBNAME = STANDBY_DGMGRL.yourdomain.com)))    
     
     
    ADR_BASE_LISTENER = /u01/app/oracle
     
     
    INBOUND_CONNECT_TIMEOUT_LISTENER=120
     
    If your's is incorrect it would explain a lot. It does look wrong
    LISTENER = (DESCRIPTION = 
         (ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=host_name)
         (PORT=port_num))))
    SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=sid_name)
         (GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
         (ORACLE_HOME=oracle_home)))
    So yours should be more like this
    LISTENER = (DESCRIPTION = 
         (ADDRESS_LIST =
         (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn1-vip)(PORT=1521))
         (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn2-vip)(PORT=1521))
         (ADDRESS = (PROTOCOL=tcp)(HOST=prdrtn3-vip)(PORT=1521))
    )
    SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=??)
         (GLOBAL_DBNAME=hccprdrt_DGMGRL.my_domain.com)
         (ORACLE_HOME=oracle_home)))
    Sorry, my setup is different, would love to hand you the exact thing...closer




    Best Regards

    mseberg

    Edited by: mseberg on Apr 27, 2012 2:16 PM

    Edited by: mseberg on Apr 27, 2012 2:20 PM

    Edited by: mseberg on Apr 27, 2012 2:22 PM


    h1. Metalink docs


    The top one is very good ( has RAC One Node Database example )


    Oracle Data Guard Broker and Static Service Registration [ID 1387859.1]

    Automatic Restart of Databases during Switchover fail with ORA-12514 in DGMGRL [ID 308943.1]

    Edited by: mseberg on Apr 27, 2012 2:26 PM

    Edited by: mseberg on Apr 27, 2012 2:29 PM


    h3. Sorry about all the edits, work is very busy
  • 4. Re: Reinstate physical stansdby fails on first attempt
    jstem1177 Explorer
    Currently Being Moderated
    Hello mseberg,

    So I fiddled around witht the dataguard for the larger part of the weekend and its still jamming after a switchover and a failover. I also tried to change the listener as you suggested and this did not help, actually it caused the same errors as seen in Bug:

    Bug 9300432: ORA-16570 AT SWITCHOVER PHYS. STANDBY DB, NOT RESTARTED BY DGMGRL
    Performing switchover NOW, please wait...
    Error: ORA-16552: error occurred while generating directives for client
    Error: ORA-16501: the Data Guard broker operation failed
    I followed the great document 1387859.1 and confirmed that my setup is correct (Administrator managed RAC --> sid_list for listener includes only the sid_name for the specify host)
    - I also veriifed and the lsitener is up and available during the switchover and failover
    - I can ping from both sides and tnsping
    - All communications for the broker work correctly
    - The DG Static Connector registration shows correct for each instance (_DGMGRL) when I do a show instance verbose [instance_name]

    Really at this point I think I may need to open up a SR with Oracle to figure this one out. Its like I said, it goes through the entire cycle of a failover and swithcover, its only after it completes the cycle that I complains the the 2 sites cannot communicate with one another.

    Thanks in advance for any other ideas you might have.

    Jan S.
  • 5. Re: Reinstate physical stansdby fails on first attempt
    mseberg Guru
    Currently Being Moderated
    Jan;

    Sorry about all the issues. Both ORA-16501 and ORA-16552 are vague at best. I believe you have done a great job of trying to shake out the issue before getting Oracle support involved. This is always a good idea and hopefully you can cut to the chase with them.

    I found a OTN thread with the same errors, not sure if any information there will help.


    error in swithover to standby

    Points to ( Sorry if I have given you any duplicate links )

    Bug 9300432: ORA-16570 AT SWITCHOVER PHYS. STANDBY DB, NOT RESTARTED BY DGMGRL ( seems to be not fixed with work around only )


    I did find this if you have not already looked at it :

    http://neeraj-dba.blogspot.com/2011/11/how-to-drop-data-guard-configuration-in.html


    Very similar to :

    How to Safely Remove a Data Guard Broker Configuration [ID 261336.1]



    If you find a solution please consider posting it here.

    Best Regards

    mseberg

    Edited by: mseberg on Apr 30, 2012 9:30 AM
  • 6. Re: Reinstate physical stansdby fails on first attempt
    user13283765 Newbie
    Currently Being Moderated
    Hello mseberg,

    So I've norrow down the issue. This seems to be related to oracle restart. Here is what I did.

    I performed a swtichover(applies to failover as well) and here is the output:
    DGMGRL> show configuration;
    
    Configuration - dg_config
    
      Protection Mode: MaxAvailability
      Databases:
        hccprdrm - Primary database
        hccprdrt - Physical standby database
    
    Fast-Start Failover: DISABLED
    
    Configuration Status:
    SUCCESS
    
    DGMGRL> switchover to hccprdrt;
    Performing switchover NOW, please wait...
    New primary database "hccprdrt" is opening...
    Operation requires shutdown of instance "hccprdrm1" on database "hccprdrm"
    Shutting down instance "hccprdrm1"...
    ORACLE instance shut down.
    Operation requires startup of instance "hccprdrm1" on database "hccprdrm"
    Starting instance "hccprdrm1"...
    ORACLE instance started.
    Database mounted.
    Switchover succeeded, new primary is "hccprdrt"
    DGMGRL> show configuration;
    At this point I can see that hccprdrm1 is mounted and I can connect to it locally:
    [oracle@oraprdrmn1 ~]$ srvctl status database -d hccprdrm -v
    Instance hccprdrm1 is running on node oraprdrmn1. Instance status: Mounted (Closed).
    Instance hccprdrm2 is not running on node oraprdrmn2
    Instance hccprdrm3 is not running on node oraprdrmn3
    
    SQL> select open_mode, database_role from v$database;
    
    OPEN_MODE            DATABASE_ROLE
    -------------------- ----------------
    MOUNTED              PHYSICAL STANDBY
    On the NEW PRIMARY SITE i check that tnsping works:
    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL=tcp)(HOST=******)(PORT=1521))) (CONNECT_DATA = (SERVICE_NAME = ****)))
    OK (20 msec)
    The I tried to connect with sqlplus to the OLD PRIMARY and got the following error:
    [oracle@oraprdrtn1 admin]$ sqlplus system/*******@hccprdrm
    
    SQL*Plus: Release 11.2.0.2.0 Production on Mon Apr 30 12:54:50 2012
    
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    
    ERROR:
    ORA-01033: ORACLE initialization or shutdown in progress
    Process ID: 0
    Session ID: 0 Serial number: 0
    Ok, so now, I decided to go into the NEW PRIMARY and checked the status of the dest_2 and found that it was consistent with Oracle Not Available:
    SQL> select DEST_ID,STATUS,ERROR from V$archive_dest where dest_id=2;
    
       DEST_ID STATUS
    ---------- ---------
    ERROR
    -----------------------------------------------------------------
             2 ERROR
    ORA-01034: ORACLE not available
    So I'm think problems with the ORACLE_RESTART. Just to make sure I decided to bound the listener on the mounted instance. The I tried another connect from the NEW PRIMARY:
    [oracle@oraprdrtn1 admin]$ sqlplus system/******@hccprdrm
    
    SQL*Plus: Release 11.2.0.2.0 Production on Mon Apr 30 13:03:04 2012
    
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    
    ERROR:
    ORA-01034: ORACLE not available
    ORA-27101: shared memory realm does not exist
    Linux-x86_64 Error: 2: No such file or directory
    Process ID: 0
    Session ID: 0 Serial number: 0
    Now the question is whether the bug is what you mentioned, but I don't think so because the old primary is mount and the role switch happend properly.

    Only after I restart the OLD PRIMARY does all the log transport resume.

    So I now know that the LISTENER is properly configured, so is the TNS and the underlying dataguard is correct as will.


    Do you think at this point I should take it to Oracle Support? Or do you have nay ideas what I could do with regards to oracle_restart?

    Thanks

    Jan S.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points