6 Replies Latest reply: Jun 19, 2013 7:19 AM by 944515

One Node down in RAC

user545194 Newbie

Hi,

 

We are running a two-node Oracle RAC (11.2.0.3) on Red Hat Enterprise Linux 5, 64-bit. Due to a core switch outage, node 2 lost its connection and went down. The database services were automatically moved to node 1. Now we need to bring node 2 back into the cluster.

 

What would be the proper order of commands to achieve this?

 

Thanks for your ideas!

  • 1. Re: One Node down in RAC
    944515 Newbie

    Hi,

     

    If the server just crashed, you shouldn't have to do anything other than start the server again and make sure that Grid Infrastructure (GI) starts up. Unless you've turned off autostart, it should start as part of the boot process.

    If autostart is turned off, then to start it do this (as root):

    $ORA_GI_HOME/bin/crsctl start crs

     

    The node is still part of the cluster and it will recover itself and join the cluster again.

     

    regards

    /M
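
The advice in reply 1 could be wrapped in a small check-then-start helper. This is only a sketch, to be run as root. It assumes that `crsctl check crs` prints a line containing "Cluster Ready Services is online" when CRS is up; verify that wording on your own system. The function name and the `CRSCTL` variable are made up for this example; `$ORA_GI_HOME` is the Grid Infrastructure home as used in the thread.

```shell
#!/bin/sh
# Sketch only. CRSCTL defaults to the $ORA_GI_HOME path used in this thread.
CRSCTL="${CRSCTL:-$ORA_GI_HOME/bin/crsctl}"

# Start the clusterware stack only if CRS is not reported as online.
# Assumption: "crsctl check crs" prints a line containing
# "Cluster Ready Services is online" when CRS is up.
start_gi_if_down() {
    if "$CRSCTL" check crs 2>&1 | grep -q "Cluster Ready Services is online"; then
        echo "CRS already online, nothing to do"
    else
        echo "CRS not online, starting stack"
        "$CRSCTL" start crs
    fi
}
```

Calling `start_gi_if_down` on node 2 leaves a running stack alone and issues `crsctl start crs` otherwise.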

  • 2. Re: One Node down in RAC
    944515 Newbie

    Hi again,

     

    Forgot to mention that to check if autostart is enabled/disabled, do this:

     

    cat /etc/oracle/scls_scr/<hostname>/root/ohasdstr

     

    It should say either enable or disable.

     

    /M
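
The file check from reply 2 can be scripted roughly as follows. This is a sketch: the function name is hypothetical, and it assumes the `<hostname>` directory matches the short host name of the node, which may not hold on every installation. Pass the file path explicitly if it differs.

```shell
#!/bin/sh
# Report whether OHASD autostart is enabled, based on the ohasdstr file.
# $1 (optional): path to the ohasdstr file.
# Assumption: the <hostname> directory is the short host name of this node.
ohasd_autostart_state() {
    file="${1:-/etc/oracle/scls_scr/$(uname -n | cut -d. -f1)/root/ohasdstr}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # The file is expected to hold a single word: enable or disable.
    state=$(tr -d '[:space:]' < "$file")
    case "$state" in
        enable)  echo "enabled" ;;
        disable) echo "disabled" ;;
        *)       echo "unknown"; return 1 ;;
    esac
}
```

To my knowledge `crsctl config crs` reports the same setting and `crsctl enable crs` (as root) turns autostart back on, but check both against the documentation for your release.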

  • 3. Re: One Node down in RAC
    user545194 Newbie

    Hi,

     

    Thanks for your feedback! I will check that. Just a note: the second server did not have to be rebooted; it only lost the interconnect. I was wondering if the above-mentioned issue can be solved by using

     

    crsctl add resource ...

     

    to bring that node back into the cluster?

     

     

  • 4. Re: One Node down in RAC
    944515 Newbie

    Hi,

     

    What do you mean by 'lost the interconnection'? Did node2 lose the connection to node1 when the switch died and then reboot itself? That is to be expected, but when the server starts up again it should simply rejoin the cluster. If it hasn't, then something is fishy.

     

    You don't have to add any resources back to the Cluster Registry after a node reboot.

    Could you run a 'crsctl stat res -t -init' & a 'crsctl stat res -t' from node2?

     

    /M
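
The two status commands requested in reply 4 can be captured in one go, which makes the output easier to paste into a reply. This is a minimal sketch: the function name and the `CRSCTL` variable are invented for this example, and `$ORA_GI_HOME` is the Grid Infrastructure home as used earlier in the thread.

```shell
#!/bin/sh
# Sketch only. CRSCTL defaults to the $ORA_GI_HOME path used in this thread.
CRSCTL="${CRSCTL:-$ORA_GI_HOME/bin/crsctl}"

# Print both resource listings with a header naming the node they came from.
collect_crs_status() {
    node=$(uname -n)
    echo "== $node: crsctl stat res -t -init =="
    "$CRSCTL" stat res -t -init 2>&1
    echo "== $node: crsctl stat res -t =="
    "$CRSCTL" stat res -t 2>&1
}
```

For example, `collect_crs_status > /tmp/crs_status.txt` on each node gives one file per node to compare.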

  • 5. Re: One Node down in RAC
    user545194 Newbie

    Hi,

     

    I'll post an excerpt from the grid alert log:

     

    [code]

    ...

    [cssd(20848)]CRS-1612:Network communication with node rac1srv0081 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.410 seconds

    2013-06-17 02:38:13.378

    [cssd(20848)]CRS-1611:Network communication with node rac1srv0081 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.390 seconds

    2013-06-17 02:38:18.388

    [cssd(20848)]CRS-1610:Network communication with node rac1srv0081 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.380 seconds

    2013-06-17 02:38:42.819

    [cssd(20848)]CRS-1632:Node rac1srv0081 is being removed from the cluster in cluster incarnation 217939718

    2013-06-17 02:38:45.483

    [cssd(20848)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1srv0080 .

    2013-06-17 02:38:45.525

    [ctssd(21102)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1srv0080.

    2013-06-17 02:38:48.149

    [crsd(22323)]CRS-5504:Node down event reported for node 'rac1srv0081'.

    ....

    [/code]

    Maybe this can shed some light on the subject.
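
To pull only the eviction-related CSS messages out of a long alert log, a filter along these lines may help. It is a sketch that assumes the layout seen in the excerpt above: a timestamp on its own line, followed by the message. The function name is hypothetical, and the log path is left to the caller because it depends on the installation.

```shell
#!/bin/sh
# Print CSS eviction-related messages from a Grid Infrastructure alert log,
# each prefixed with the timestamp line that preceded it.
# $1: path to the alert log (supplied by the caller; no default is assumed).
eviction_events() {
    awk '
        # Skip blank lines so they do not separate a timestamp from its message.
        /^[ \t]*$/ { next }
        # A timestamp line: remember it for the message that follows.
        /^[ \t]*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $1 " " $2; next }
        # The message codes that appear in the excerpt around the eviction.
        /CRS-(1601|1610|1611|1612|1632):/ {
            sub(/^[ \t]+/, "")
            label = (ts == "" ? "(no timestamp)" : ts)
            print label " " $0
        }
        # Any other line consumes the remembered timestamp.
        { ts = "" }
    ' "$1"
}
```

Run against the full log, this shows the missed-heartbeat countdown, the node removal, and the reconfiguration in sequence.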

  • 6. Re: One Node down in RAC
    944515 Newbie

    Hi,

     

    Yes, node2 got evicted because the network was down ("Node rac1srv0081 is being removed from the cluster in cluster incarnation 217939718"). This is expected behavior: the eviction prevents split-brain.

    Is the network back up now? Is node2 (rac1srv0081) up again?

     

    Can you run a '$ORA_GI_HOME/bin/crsctl stat res -t -init' & a '$ORA_GI_HOME/bin/crsctl stat res -t' from node2 & from node1?

     

    /M
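
The question "Is node2 up again?" can also be checked from node1 with `olsnodes`. The sketch below assumes that `olsnodes -s` prints one node name and its status (for example Active or Inactive) per line; confirm the format on your release. The function name and the `OLSNODES` variable are invented for this example.

```shell
#!/bin/sh
# Sketch only. OLSNODES defaults to the $ORA_GI_HOME path used in this thread.
OLSNODES="${OLSNODES:-$ORA_GI_HOME/bin/olsnodes}"

# Print the status olsnodes reports for one node, or "not listed".
# $1: node name, e.g. rac1srv0081
# Assumption: "olsnodes -s" prints "<node> <status>" pairs, one per line.
node_status() {
    "$OLSNODES" -s | awk -v n="$1" '
        $1 == n { print $2; found = 1 }
        END     { if (!found) print "not listed" }
    '
}
```

For example, `node_status rac1srv0081` on node1 would be expected to print Active once node2 has rejoined the cluster.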
