2 Replies Latest reply: May 25, 2012 5:59 AM by user13371969 RSS

    Node can not join cluster after RAC HA Testing

    mdecker
      Dear forum,

      We are performing RAC failover tests according to document "RAC System Test Plan Outline 11gR2, Version 2.0". In testcase #14 - Interconnect network failure (11.2.0.2 an higher), we have disabled private interconnect network of node node1 (OCR Master).

      Then - as expected - node node2 was evicted. Now, after enabling private interconnect network on node node1, i want to start CRS again on node2. However, node does not join cluster with messages:

      2012-03-15 14:12:35.138: [ CSSD][1113114944]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
      2012-03-15 14:12:35.371: [ CSSD][1109961024]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301201, LATS 5535614, lastSeqNo 2301198, uniqueness 1331804892, timestamp 1331817153/13040714
      2012-03-15 14:12:35.479: [ CSSD][1100884288]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301202, LATS 5535724, lastSeqNo 2301199, uniqueness 1331804892, timestamp 1331817154/13041024
      2012-03-15 14:12:35.675: [ CSSD][1080801600]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 226493542, wrtcnt, 2301203, LATS 5535924, lastSeqNo 2301200, uniqueness 1331804892, timestamp 1331817154/13041364

      Rebooting node2 did not help. Node1 which was online all the time (although private interconnect interface was unplugged for a few minutes and then plugged back in). I suppose that if we reboot node2, the problem will disappear. But there should be solution, which keeps availability requirements.

      Setup:
      2 Nodes (OEL5U7, UEK)
      2 Storages
      Network bonding via Linux bonding
      GI 11.2.0.3.1
      RDBMS 11.1.0.7.10

      Any ideas?

      Regards,
      Martin