8 Replies. Latest reply: Feb 12, 2013 1:53 AM by 988246

Oracle RAC Private Connection fail. what is the preferable node?

988246 Newbie
Hello everyone! I need your help with Oracle RAC.

I would like to know: when the private network in an Oracle RAC cluster goes down, which node in a two-node cluster is the preferred one to take over the cluster?

Can I view or set the failover policy so that it points to a particular node in the cluster?

Thank you in advance.
  • 1. Re: Oracle RAC Private Connection fail. what is the preferable node?
    brunors Explorer
    Hi,

    See the following:

    Introduction - http://docs.oracle.com/cd/B28359_01/rac.111/b28254/admcon.htm

    Implementation - http://docs.oracle.com/cd/B19306_01/rac.102/b28759/install.htm
    - http://infrastructure.celeritas.com/Oracle-High-Availability-RAC.html
    - http://www.puschitz.com/InstallingOracle10gRAC.shtml

    Installing 2 nodes - http://muneer2908.wordpress.com/2011/01/21/creating-a-2-node-oracle-10g-rac-database/


    Kind regards,
    Bruno Reis.
    www.brunors.com
  • 2. Re: Oracle RAC Private Connection fail. what is the preferable node?
    Sebastian Solbach (DBA Community) Guru
    Hi,

    The node that survives is normally the one that was started first (or that has the lowest node number).
    The first node in the cluster gets the role of writing to the OCR; all other nodes only read the OCR.
    This role can fail over to other nodes, but it is easier to keep the node that holds the role up and running.
    Hence the node holding this role will survive.

    If you shut down the first node, or the first node fails, the role fails over to the second node (since it is the only one left in the cluster).
    So if the first node comes back online (it will then get a higher node number), the second node still holds that role. In this case the second node will survive.

    Regards
    Sebastian
  • 3. Re: Oracle RAC Private Connection fail. what is the preferable node?
    988246 Newbie
    Thank you for the links.
  • 4. Re: Oracle RAC Private Connection fail. what is the preferable node?
    988246 Newbie
    Hi,

    Thank you for your helpful answer.

    I wanted to test which node is preferred in a two-node cluster, so I started CRS on Node2 first to give the role to Node2.
    All CRS resources on Node2 started completely, which should mean that Node2 has the role of writing to the OCR.

    After that, I started CRS on Node1 to join the cluster. CRS on Node1 started completely.

    I then brought down the private connection with the ifdown command on Linux to break the private network connection.

    The node that survived was Node1, NOT Node2. According to your message, Node2 should have survived, but Node2 is dead.

    I am sorry, I am not sure whether I understood your message correctly.

    Please help me and give me more guidance on this.

    Thanks
  • 5. Re: Oracle RAC Private Connection fail. what is the preferable node?
    GauravAhuja Newbie
    Hi,


    Please use the olsnodes command to find the node numbers.

    The node with the lowest number in the output will survive.

    What the OS hosts are called (node1, node2) is not the point here. The olsnodes command (with -n) shows the node numbers, and the node that starts first gets the lowest number.


    Regards
    Gaurav
  • 6. Re: Oracle RAC Private Connection fail. what is the preferable node?
    Sebastian Solbach (DBA Community) Guru
    Hi,

    ifdown is not the right test case for split brain / private network failure.
    Clusterware reacts differently when it detects that the network card is down than when the link to the switch is down.

    So please repeat the test by pulling the LINK instead of bringing the interface down.
    You will then see that node 2 survives (make sure you check beforehand with Gaurav's tip about olsnodes).

    Regards
    Sebastian
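
To tell the two test cases apart before drawing conclusions, one option is to look at both the interface flags and the link state. The following is only a minimal sketch, assuming the old-style `ifconfig` flag line and the `ethtool` "Link detected" line in the format shown elsewhere in this thread; the function name `classify_private_link` is hypothetical, and the assumption that `ifdown` clears the UP flag should be checked on your own system.

```python
import re


def classify_private_link(ethtool_output: str, ifconfig_output: str) -> str:
    """Classify an interface from captured ethtool/ifconfig output.

    Returns one of:
      "interface down" - the UP flag is absent (assumed result of ifdown)
      "link down"      - UP is set but no link is detected (cable pulled)
      "link up"        - UP is set and a link is detected
    """
    # Old-style ifconfig prints the flags as bare words on the MTU line,
    # e.g. "UP BROADCAST MULTICAST MTU:1500 Metric:1".
    flags_line = next(
        (line for line in ifconfig_output.splitlines() if "MTU" in line), ""
    )
    admin_up = "UP" in flags_line.split()

    # ethtool reports the physical link as "Link detected: yes" or "no".
    match = re.search(r"Link detected:\s*(yes|no)", ethtool_output)
    link_detected = match is not None and match.group(1) == "yes"

    if not admin_up:
        return "interface down"
    return "link up" if link_detected else "link down"
```

Only the "link down" result corresponds to the pulled-link test described above; "interface down" corresponds to the ifdown test.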
  • 7. Re: Oracle RAC Private Connection fail. what is the preferable node?
    988246 Newbie
    Hi,

    I ran olsnodes -n and got the output below:

    [root@rac1 ~]# olsnodes -n
    rac1 1
    rac2 2

    So it seems that the preferred node when the private network is down is always rac1.

    Please correct me if I am wrong.

    Thanks,
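
The check above can also be scripted. This is a minimal sketch, assuming the two-column `olsnodes -n` output format shown in this post and applying the "lowest node number survives" rule as stated in this thread (not as a guarantee of Clusterware behaviour). The helper names `parse_olsnodes` and `lowest_numbered_node` are hypothetical.

```python
from typing import Dict


def parse_olsnodes(output: str) -> Dict[str, int]:
    """Parse 'olsnodes -n' output into a {node_name: node_number} mapping."""
    nodes: Dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split()
        # Expect "<name> <number>"; skip prompts, blanks, and anything else.
        if len(parts) >= 2 and parts[1].isdigit():
            nodes[parts[0]] = int(parts[1])
    return nodes


def lowest_numbered_node(output: str) -> str:
    """Return the node name with the lowest node number."""
    nodes = parse_olsnodes(output)
    if not nodes:
        raise ValueError("no node entries found in olsnodes output")
    return min(nodes, key=nodes.get)


# Output copied from this post.
sample = "rac1 1\nrac2 2\n"
print(lowest_numbered_node(sample))  # prints "rac1"
```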
  • 8. Re: Oracle RAC Private Connection fail. what is the preferable node?
    988246 Newbie
    Hi,

    My test environment is an Oracle RAC cluster built on VMware.
    Because Clusterware reacts differently to the ifdown command, I now use the VMware "disconnect" feature on the network interface to simulate pulling the link out of the network interface.

    Please have a look at the steps I took, and correct me if I am wrong.

    I have two nodes: node1 "rac1" and node2 "rac2".

    1. I started CRS on node2 "rac2" so that node2 has the role of writing to the OCR.

    2. After node2 had started completely, I started CRS on node1 "rac1" to join the cluster.

    3. I disconnected the private network link on node1, and I confirmed with ethtool that no link was detected:

    [root@rac1 ~]# ethtool eth1 | grep Link
    Link detected: no


    [root@rac1 ~]# ifconfig eth1
    eth1 Link encap:Ethernet HWaddr 00:0C:29:6A:73:20
    inet addr:192.168.2.231 Bcast:192.168.2.255 Mask:255.255.255.0
    UP BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:2658671 errors:0 dropped:0 overruns:0 frame:0
    TX packets:2069398 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:1615768514 (1.5 GiB) TX bytes:985464556 (939.8 MiB)

    4. After a few seconds, I checked the log file:

    [cssd(9517)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.620 seconds
    2013-02-12 04:12:30.419
    [cssd(9517)]CRS-1611:Network communication with node rac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.610 seconds
    2013-02-12 04:12:34.436
    [cssd(9517)]CRS-1610:Network communication with node rac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.590 seconds
    2013-02-12 04:12:37.036
    [cssd(9517)]CRS-1607:Node rac2 is being evicted in cluster incarnation 251972986; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log.
    2013-02-12 04:12:39.136
    [cssd(9517)]CRS-1625:Node rac2, number 2, was manually shut down
    2013-02-12 04:12:39.140
    [cssd(9517)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
    2013-02-12 04:12:39.157
    [crsd(9957)]CRS-5504:Node down event reported for node 'rac2'.
    2013-02-12 04:12:45.519
    [crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
    2013-02-12 04:12:45.519
    [crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'ora.oradb'

    5. I checked the resource status of the cluster and saw that node1 "rac1" is the surviving node.

    Please help me analyze this.

    Thanks,

    Edited by: 985243 on Feb 12, 2013 1:53 AM
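
For repeated tests, the eviction outcome can be pulled out of the log excerpt in step 4 automatically. This is a minimal sketch, assuming the CRS-1607 and CRS-1601 message wording exactly as it appears in the excerpt above; the function name `summarize_eviction` is hypothetical, and other Clusterware versions may word these messages differently.

```python
import re
from typing import Dict, List

# Patterns based only on the message text quoted in this post.
EVICTED = re.compile(r"CRS-1607:Node (\S+) is being evicted")
ACTIVE = re.compile(
    r"CRS-1601:CSSD Reconfiguration complete\. Active nodes are (.*?)\s*\.\s*$"
)


def summarize_eviction(log_text: str) -> Dict[str, List[str]]:
    """Return the evicted nodes and the last reported list of active nodes."""
    evicted: List[str] = []
    active: List[str] = []
    for line in log_text.splitlines():
        match = EVICTED.search(line)
        if match:
            evicted.append(match.group(1))
            continue
        match = ACTIVE.search(line)
        if match:
            # Node names are space separated; keep the most recent list.
            active = match.group(1).split()
    return {"evicted": evicted, "active": active}
```

Note that the log path in the excerpt points to rac1's own log directory, so the result reflects rac1's view of the cluster; rac2's log would need to be checked separately.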
