The node that survives is normally the node that was started first (or has the lowest node number).
The first node in the cluster gets the role of writing the OCR; all other nodes only read the OCR.
This role does fail over to other nodes, but keeping the node that holds it up and running is easier.
Hence the node holding this role will survive.
If you shut down the first node, or the first node fails, the role fails over to the second node (since it is the only one left in the cluster).
So if the first node comes back online (it will get a higher node number then), it is still the second node that holds the role. In that case the second node will survive.
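The survival rule sketched above can be simulated. This is a hypothetical illustration only, not Oracle code: it assumes the commonly described 11.2 CSS behavior that, after an interconnect split, the larger sub-cluster survives, and on a tie (such as a two-node split) the sub-cluster containing the lowest node number wins.

```python
# Hypothetical sketch of split-brain resolution in a small cluster.
# Assumption (not from this thread's logs): larger sub-cluster wins;
# on a tie, the sub-cluster holding the lowest node number survives.

def surviving_subcluster(subclusters):
    """Pick the surviving sub-cluster after an interconnect split.

    subclusters: list of sets of node numbers that can still see
    each other over the private interconnect.
    """
    # Rank by (size, -lowest node number): bigger is better, and for
    # equal sizes the set with the smaller minimum node number wins.
    return max(subclusters, key=lambda sc: (len(sc), -min(sc)))

# Two-node split, node 1 vs node 2: node 1's side survives.
print(surviving_subcluster([{1}, {2}]))        # -> {1}
# Uneven split: the larger sub-cluster survives.
print(surviving_subcluster([{2}, {1, 3}]))     # -> {1, 3}
```

Note that node numbers are assigned when a node joins, which is why a node that rejoins the cluster later can end up with a higher number than a node started after it originally.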
Anyway, I want to test which node is the preferred survivor in a two-node cluster, by starting CRS on Node2 first so that the role goes to Node2.
All CRS resources on Node2 started completely, so Node2 has the role to write to the OCR.
After that I start CRS on Node1 to join the cluster. CRS on Node1 started completely.
I brought the private connection down using the ifdown command on Linux to break the private network connection.
I see that the surviving node is Node1, NOT Node2. According to your messages, Node2 should survive, but Node2 is dead.
I am sorry, I am not sure whether I understand your message correctly or not.
ifdown is not the correct use case for testing failover with split brain / private network down.
Clusterware reacts differently when it identifies that the network card is down than when the link to the switch is down.
So please try the same test by pulling the LINK, not by downing the interface.
Then you will see that node 2 survives. (Make sure you check beforehand, per Guarav's tip, with olsnodes.)
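To follow the olsnodes tip above: `olsnodes -n` prints each cluster node's name and node number, one per line. A minimal sketch of reading that output to find the lowest-numbered node (the sample text below is hypothetical, not captured from a real cluster):

```python
# Parse `olsnodes -n` style output ("name number" per line) and
# return the node with the lowest node number.

def lowest_numbered_node(olsnodes_output):
    """Return (name, number) of the lowest-numbered cluster node."""
    nodes = []
    for line in olsnodes_output.strip().splitlines():
        name, number = line.split()
        nodes.append((name, int(number)))
    return min(nodes, key=lambda node: node[1])

# Hypothetical sample output of `olsnodes -n`:
sample = """rac1 1
rac2 2"""
print(lowest_numbered_node(sample))  # -> ('rac1', 1)
```

Checking this before pulling the link tells you which node number each host currently holds, since the numbers can change after restarts.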
Based on my testing environment: I use VMware to build the Oracle RAC.
Because the ifdown command reacts differently, I now use the VMware "disconnect" feature on the network interface, which is equivalent to pulling the link out of the network interface.
Please kindly have a look at the actions I have done, and correct me if I am wrong.
I have two nodes: node1 "rac1" and node2 "rac2".
1. I start CRS on node2 "rac2" so that node2 has the role to write to the OCR.
2. After node2 starts completely, I start CRS on node1 "rac1" to join the cluster.
3. I disconnect the network link of the private network on node1, and I check with ethtool that the link is no longer detected:
[root@rac1 ~]# ethtool eth1 | grep Link
Link detected: no
4. The following messages appear in the clusterware alert log on node1:
[cssd(9517)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.620 seconds
[cssd(9517)]CRS-1611:Network communication with node rac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 6.610 seconds
[cssd(9517)]CRS-1610:Network communication with node rac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.590 seconds
[cssd(9517)]CRS-1607:Node rac2 is being evicted in cluster incarnation 251972986; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log.
[cssd(9517)]CRS-1625:Node rac2, number 2, was manually shut down
[cssd(9517)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
[crsd(9957)]CRS-5504:Node down event reported for node 'rac2'.
[crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
[crsd(9957)]CRS-2773:Server 'rac2' has been removed from pool 'ora.oradb'
5. I check the status of the cluster resources and see that node1 "rac1" is the surviving node.
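The countdown in the CRS-1612/1611/1610 messages above follows fixed fractions of the CSS misscount interval. A minimal sketch of the remaining time at each warning threshold, assuming the 11.2 default misscount of 30 seconds (the logged values, 14.620/6.610/2.590, are slightly lower because some time has already elapsed when each message is written):

```python
# Approximate seconds remaining before node removal at each ocssd
# warning threshold: remaining = misscount * (1 - pct/100).
# Assumes the default CSS misscount of 30 seconds.

def eviction_countdown(misscount=30):
    """Map warning percentage -> approx. seconds until removal."""
    return {pct: round(misscount * (1 - pct / 100), 1)
            for pct in (50, 75, 90)}

print(eviction_countdown())  # -> {50: 15.0, 75: 7.5, 90: 3.0}
```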