I have an issue on a stretched production cluster running a two-node Oracle RAC 11gR2 across two different sites.
The problem we are facing is that node 1, at site 1, has lost external connectivity on both the private and public interconnects.
In this case, node 1 and node 2 both failed the network heartbeat check on the private interconnect, but node 2 still has its public connection, which many applications are using.
According to the Oracle eviction policy, at this stage the master node (possibly the node with the lowest node number) sends a "kill signal" to the other node to prevent a potential "split brain" situation.
I was wondering, is there any policy I can set up to define a priority among the nodes? I would like to keep alive the node that the applications can reach (node 2, with only the public connection up) and let the other node (node 1, with both connections down) be evicted.
I would really appreciate any answers.
There is no way to prioritize a node so that it is always the master node. A node becomes the master node in one of the following ways:
- It is the node that started first in the cluster
- It is the node that remained up during a maintenance activity in which the other node was brought down
You can identify the master node in either of these ways:
- Look at the crsd log on each node and check which node's log has the latest "I AM THE NEW OCR MASTER" or "NEW OCR MASTER IS" message
- Run ocrconfig -showbackup (the master is the node that owns and hosts the latest OCR backup)
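If you want to script the log check, here is a minimal Python sketch. Only the two message strings come from the description above; the function names are hypothetical, and the path to the crsd log is left as a parameter because it depends on your installation. It returns the last matching line and leaves the timestamp comparison between nodes to you.

```python
import re

# The two marker strings are the ones quoted above; everything else in this
# sketch (names, structure) is illustrative and not part of any Oracle tool.
MASTER_PATTERN = re.compile(r"I AM THE NEW OCR MASTER|NEW OCR MASTER IS")


def last_master_message(log_lines):
    """Return the last line that mentions an OCR master change, or None."""
    last = None
    for line in log_lines:
        if MASTER_PATTERN.search(line):
            last = line.rstrip("\n")
    return last


def last_master_message_in_file(path):
    """Scan one crsd log file; 'path' must be supplied by the caller."""
    with open(path, errors="replace") as handle:
        return last_master_message(handle)
```

Run it against the crsd log of each node and compare the timestamps of the returned lines; the node with the most recent message should be the current master.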
Vandana - Oracle
Thank you for your detailed answer.
This is exactly what I suspected, so there is actually no way to set an eviction priority among the nodes.
The master node will survive the eviction in any case, and it may not be the node that is reachable by the applications.
What do you think about the following "workaround"?
Add an ad hoc service to Oracle Clusterware that constantly checks the network connection to the applications and shuts the node down if both the private interconnect and the application connection fail?
This workaround should prevent the eviction of the node that the applications can still reach when the interconnect fails.
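To make the idea concrete, here is a rough Python sketch of the decision logic I have in mind. All names, hosts and ports are hypothetical, and the TCP probe is only one assumed way of testing reachability. The actual stop action and the registration of the check with Clusterware are deliberately left out.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_stop_node(private_ok, application_ok):
    """Stop this node only when BOTH networks are unreachable.

    A node that can still serve the applications stays up, even if the
    private interconnect is down.
    """
    return not private_ok and not application_ok


def check_once(private_peer, application_peer):
    """One check cycle; each peer is a (host, port) tuple chosen by the admin."""
    private_ok = can_connect(*private_peer)
    application_ok = can_connect(*application_peer)
    return should_stop_node(private_ok, application_ok)
```

With this logic, node 1 in my scenario (both connections down) would stop itself, while node 2 (public connection still up) would keep running.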
Thanks in advance,