5 Replies. Latest reply on Jan 27, 2013 11:53 AM by Sebastian Solbach -Database Community-Oracle

    Split Brain Scenario


      Split brain could happen when the private interconnect fails and the instances start working independently, potentially updating the same block that the other instance has updated, which leads to data corruption. Right? Is there any other scenario where a split brain could occur?
        • 1. Re: Split Brain Scenario
          Levi Pereira

          - Network failure or latency between nodes. It would take 30 consecutive missed check-ins (by default, as determined by the CSS misscount) to cause a node eviction.
          - Problems writing to or reading from the CSS voting disk. If the node cannot perform a disk heartbeat to the majority of its voting files, the node will be evicted.
          - A member kill escalation. For example, the database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it could escalate to a node kill.
          - An unexpected failure or hang of the OCSSD process; this can be caused by any of the above issues or by something else.
          - An Oracle bug.
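          The check-in counting from the first item can be sketched as follows. This is a minimal illustrative model, not Oracle's CSS code: the function `should_evict` and the representation of check-ins as a list of booleans are hypothetical; only the default of 30 consecutive misses (the CSS misscount) comes from the list above.

```python
# Hypothetical model of the network-heartbeat rule described above:
# a node is evicted once it misses `misscount` consecutive check-ins.
# This is NOT Oracle's implementation; all names are illustrative.

DEFAULT_MISSCOUNT = 30  # default number of consecutive misses (CSS misscount)


def should_evict(checkins: list[bool], misscount: int = DEFAULT_MISSCOUNT) -> bool:
    """Return True if `checkins` contains `misscount` consecutive misses.

    Each element is True if the peer's check-in arrived in that interval,
    False if it was missed.
    """
    consecutive_misses = 0
    for arrived in checkins:
        if arrived:
            consecutive_misses = 0  # any successful check-in resets the counter
        else:
            consecutive_misses += 1
            if consecutive_misses >= misscount:
                return True
    return False


# 29 misses, one check-in, then 29 more misses: never 30 in a row.
print(should_evict([False] * 29 + [True] + [False] * 29))  # False
# 30 consecutive misses trigger an eviction.
print(should_evict([True] * 5 + [False] * 30))  # True
```

          The point of counting *consecutive* misses is that a brief hiccup on the interconnect resets as soon as one check-in gets through, so only a sustained outage leads to an eviction.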

          *Top 5 Issues That Cause Node Reboots or Evictions or Unexpected Recycle of CRS [ID 1367153.1]*

          *Troubleshooting 11.2 Clusterware Node Evictions (Reboots) [ID 1050693.1]*

          Read the docs about reboot-less node fencing:

          Edited by: Levi Pereira on Jan 24, 2013 5:02 PM
          • 2. Re: Split Brain Scenario
            If the network between two machines in a cluster is disturbed, the cluster is said to have a 'split brain'.

            Thanks to the voting disk (or disks), the split brain can be resolved: the master node terminates the other node or nodes.
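            How the surviving side is chosen can be sketched as below. The rule used here is an assumption, not something stated in this thread: it is a commonly described Clusterware behavior in which the sub-cluster with the most nodes survives and, on a tie, the sub-cluster containing the lowest node number survives. The function names are hypothetical.

```python
# Hypothetical sketch of split-brain resolution between sub-clusters.
# ASSUMED rule (not from this thread): the largest sub-cluster survives;
# on a tie, the sub-cluster containing the lowest node number survives.
# Nodes in every other sub-cluster would be evicted.


def surviving_subcluster(subclusters: list[set[int]]) -> set[int]:
    """Pick the surviving sub-cluster; nodes are identified by number."""
    if not subclusters or any(not sc for sc in subclusters):
        raise ValueError("need at least one non-empty sub-cluster")
    # Sort key: larger size first (negated length), then lower minimum node number.
    return min(subclusters, key=lambda sc: (-len(sc), min(sc)))


def evicted_nodes(subclusters: list[set[int]]) -> set[int]:
    """All nodes that are not in the surviving sub-cluster."""
    survivor = surviving_subcluster(subclusters)
    return set().union(*subclusters) - survivor


print(sorted(surviving_subcluster([{1, 2}, {3, 4, 5}])))  # [3, 4, 5]
print(sorted(surviving_subcluster([{2}, {1}])))  # [1]
print(sorted(evicted_nodes([{1, 2}, {3, 4}])))  # [3, 4]
```

            Whatever the exact tie-breaker, the important property is that it is deterministic, so every node reaches the same decision about which side keeps running.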
            • 3. Re: Split Brain Scenario
              Sebastian Solbach -Database Community-Oracle

              Just to make it clear: in any case, Oracle's mechanisms do prevent block corruption. So while a split brain can happen (and often results in node reboots), it will never affect the ACID properties of the database.

              • 4. Re: Split Brain Scenario
                Actually, to have a true "split brain" with the current versions of RAC/Clusterware, you would need a major network disconnect AND each node would have to be able to access the OCR/voting devices independently of the other node. I could foresee this only with a WAN cluster (node1 in Seattle and node2 in Dallas, with ASM failure groups for each site).

                Any other "disturbance" would cause a node eviction, which is a completely different problem from "split brain".
                [Wrote this a while ago and forgot to submit it.]
                • 5. Re: Split Brain Scenario
                  Sebastian Solbach -Database Community-Oracle

                  This is not a true split brain, since if the nodes lose access to the majority of the voting disks, they will reboot as well.
                  This will prevent any kind of block corruption as well.
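                  The majority rule can be sketched for the Seattle/Dallas example above. This is a minimal model under stated assumptions: three voting files (the names `vf1` to `vf3` are hypothetical), and after the split each voting file is reachable from at most one site. Under that assumption, at most one site can hold a strict majority, so at most one side keeps running.

```python
# Hypothetical model of the voting-disk majority rule discussed above.
# A site stays up only if it can access a strict majority of the
# configured voting files; otherwise its nodes reboot.
# All names (files, sites, functions) are illustrative.


def has_majority(accessible: set[str], all_voting_files: set[str]) -> bool:
    """True if `accessible` covers more than half of the voting files."""
    return len(accessible & all_voting_files) > len(all_voting_files) // 2


def surviving_sites(view: dict[str, set[str]], all_voting_files: set[str]) -> list[str]:
    """Return the sites that keep running after a network split."""
    return sorted(
        site for site, seen in view.items() if has_majority(seen, all_voting_files)
    )


voting_files = {"vf1", "vf2", "vf3"}
# WAN split: Seattle can still reach two voting files, Dallas only one.
after_split = {"seattle": {"vf1", "vf2"}, "dallas": {"vf3"}}
print(surviving_sites(after_split, voting_files))  # ['seattle']
```

                  Because two disjoint subsets of the voting files cannot both contain more than half of them, the side that loses the majority reboots instead of continuing to write, which is why this case does not lead to block corruption.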