I have two Sun Fire V490 servers running Oracle RAC, and they hit a split-brain problem. The database instance on one of the nodes went down. The DBA claims it is due to a network problem, but the networks appear to be OK. We use the onboard CE1 interface for the cluster interconnect and CE0 as the public interface.
Has anybody faced this kind of problem? Could this be a hardware or OS patch problem?
I kept a continuous ping running for 24 hours after this last happened, and the output shows no packet loss.
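For anyone repeating the ping test, here is a minimal sketch of how a saved ping log could be checked for skipped sequence numbers instead of reading it by eye. It assumes each reply line contains an `icmp_seq=<number>` field; the sample line in the comment and the function name `find_sequence_gaps` are illustrative, not taken from the actual systems.

```python
import re

# Assumed reply-line format (hypothetical sample, not from the real servers):
#   64 bytes from node2-priv (192.168.1.2): icmp_seq=17. time=0.412 ms
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_sequence_gaps(lines):
    """Return (last_seen, next_seen) pairs where sequence numbers were skipped."""
    gaps = []
    previous = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # ignore headers, summary lines, and blank lines
        current = int(match.group(1))
        if previous is not None and current > previous + 1:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Passing the lines of a saved log (for example `open("ping.log")`, where the file name is again an assumption) returns an empty list when no reply was skipped. This only shows whether echo replies went missing on that one interface; it does not replace the cluster and Oracle logs.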
Diagnosing this properly would require more detail and far more log files than a generic discussion forum can handle.
Use your service contract and open a support case.
Because a cluster environment is involved, you'll likely end up talking to the cluster support staff.
They can analyze hardware and software errors as well as review whether you configured the systems in a supportable fashion.
Be prepared to make a direct connection to each system and to gather data, for example with the Explorer tool. The technical support staff will tell you what they actually need.