Hello everyone! I need your help with Oracle RAC.
I would like to know: when the private network in Oracle RAC goes down, which node in the cluster (a cluster of 2 nodes) is preferred to survive and take over the cluster?
Can I set or query a failover policy so that it points to one specific node in the cluster?
Thank you in advance.
The node that survives is normally the node that was started first (i.e., the one with the lowest node number).
The first node in the cluster gets the role of writing the OCR; all other nodes only read the OCR.
This role can fail over to other nodes, but it is cheaper to keep the node that already holds it up and running.
Hence the node holding this role will survive.
If you shut down the first node, or the first node fails, the role fails over to the second node (since it is the only node left in the cluster).
So if the first node later comes back online (it will then get a higher node number), it is still the second node that holds the role, and in that case the second node will survive.
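If you want to verify both pieces, a rough check looks like this (the crsd log path follows the standard 11.2 Grid home layout used later in this thread, and the exact "OCR MASTER" message text varies between versions, so treat the grep pattern as an assumption):

[root@rac1 ~]# olsnodes -n                  # lists each node with its node number; lowest joined first
[root@rac1 ~]# grep -i "OCR MASTER" /u01/app/11.2.0/grid/log/rac1/crsd/crsd.log   # which node last took the OCR writer role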
Thanks for your helpful answer.
Anyway, I wanted to test which node is preferred in a two-node cluster, so I started CRS on Node2 first to move the role to Node2.
All CRS resources on Node2 started completely, so Node2 should now hold the role of writing to the OCR.
After that I started CRS on Node1 so that it joined the cluster; CRS on Node1 started completely.
Then I broke the private network connection with the ifdown command on Linux, as sketched below.
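Roughly, the sequence was as follows (crsctl run as root from the Grid home bin directory; eth1 as the private NIC and rac1 as the node where I ran ifdown are shown here for illustration):

[root@rac2 ~]# crsctl start crs    # start Clusterware on node2 first
[root@rac1 ~]# crsctl start crs    # then start node1 so it joins the running cluster
[root@rac1 ~]# ifdown eth1         # take the private interconnect NIC down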
I saw that the surviving node was "Node1", NOT "Node2". According to your explanation, "Node2" should have survived, but "Node2" died.
I am sorry, I am not sure whether I understood your message correctly.
Please help me and guide me further on this.
Kindly use the olsnodes command to find out the node numbers.
The node with the lowest number in the output will survive.
Naming the OS hosts node1 and node2 is not the point here: the olsnodes output shows the actual node number, and the node whose clusterware joins the cluster first gets the lowest number. See the example below.
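For example (illustrative output only, not from your cluster; -n adds the node number column and -s the node status):

[root@rac1 ~]# olsnodes -n -s
rac1    1    Active
rac2    2    Active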
ifdown is not the correct use case for testing split brain / private network failure.
Clusterware reacts differently when it identifies that the network interface itself is down than when only the link to the switch is lost.
So please repeat the test by pulling the LINK, not by downing the interface.
Then you will see that node 2 survives. (Make sure you check the node numbers beforehand with Guarav's tip about olsnodes.)
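To double-check which failure you actually simulated, compare the administrative state with the carrier state on the private NIC (eth1 here, matching this thread):

[root@rac1 ~]# ip link show eth1                     # NO-CARRIER together with the UP flag means only the link is gone
[root@rac1 ~]# ethtool eth1 | grep "Link detected"   # "no" confirms the physical link is down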
I used olsnodes -n and got the output below:
[root@rac1 ~]# olsnodes -n
rac1    1
rac2    2
So it seems that the node preferred to survive when the private network is down is always rac1.
Please correct me if I am wrong.
For my test environment, I use VMware to build the Oracle RAC.
Because the ifdown command is treated differently, I now use the VMware "disconnect" feature on the network interface to simulate pulling the link out of the NIC.
Please kindly have a look at the steps I performed, and correct me if I am wrong.
I have two nodes: node1 ("rac1") and node2 ("rac2").
1. I started CRS on node2 ("rac2") so that node2 takes the role of writing to the OCR.
2. After node2 started completely, I started CRS on node1 ("rac1") to join the cluster.
3. I disconnected the private network link on node1. ethtool confirms the link is gone, while ifconfig still shows the interface administratively UP:
[root@rac1 ~]# ethtool eth1 | grep Link
Link detected: no
[root@rac1 ~]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:0C:29:6A:73:20
inet addr:192.168.2.231 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:2658671 errors:0 dropped:0 overruns:0 frame:0
TX packets:2069398 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:1615768514 (1.5 GiB) TX bytes:985464556 (939.8 MiB)
4. After some seconds, I checked the Clusterware alert log:
[cssd(9517)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.620 seconds
[cssd(9517)]CRS-1611:Network communication with node rac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.610 seconds
[cssd(9517)]CRS-1610:Network communication with node rac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.590 seconds
[cssd(9517)]CRS-1607:Node rac2 is being evicted in cluster incarnation 251972986; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log.
[cssd(9517)]CRS-1625:Node rac2, number 2, was manually shut down
[cssd(9517)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
[crsd(9957)]CRS-5504:Node down event reported for node 'rac2'.
[crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
[crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'ora.oradb'
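(Side note: the countdown in these CRS-161x messages is driven by the CSS misscount, which you can read as shown below; the "50% of timeout" message with ~15 seconds remaining matches the usual 30-second default on Linux.)

[root@rac1 ~]# crsctl get css misscount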
5. I checked the cluster resource status and saw that node1 ("rac1") is the surviving node (see the check below).
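The status check was along these lines (the exact command I used may differ; this is the standard 11.2 way to list resources per node):

[root@rac1 ~]# crsctl stat res -t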
Please help me analyze this.