5 Replies Latest reply: Oct 25, 2012 10:01 AM by Jim Russell

    OVM servers lost clustering

    945503
      I have two OVM servers in an OVM pool and, for no reason I can identify, clustering has failed.
      I notice a "Cluster Discovery Warning" in the Event logs, and both servers are showing exclamation marks on their icons.

      How am I supposed to restore clustering when both servers are in this state? I don't see any network problems.

      I am on 3.1.1, by the way.
      The Manager is on 3.1.1 build 399.

      Edited by: 942500 on Oct 24, 2012 3:50 PM
        • 1. Re: OVM servers lost clustering
          Jim Russell
          I also have issues with 3 of the VM servers in my 6-node cluster. I did a yum update, and after rebooting, the updated nodes won't join the cluster. It's not a big issue for me right now, as I have far more capacity than I need. I'm hoping that upgrading to 3.2.1, once it is officially released, will fix the issue.

          o2hb: Heartbeat started on region 0004FB0000050000BD41301BD1BB6781 (dm-1)
          OCFS2 1.8.0
          o2hb: Region 0004FB0000050000BD41301BD1BB6781 (dm-1) is now a quorum device
          (mount.ocfs2,5188,20):dlm_join_domain:1935 Timed out joining dlm domain 0004FB0000050000BD41301BD1BB6781 after 90400 msecs
          ocfs2: Unmounting device (252,1) on (node 0)
          (mount.ocfs2,5203,3):dlm_join_domain:1935 Timed out joining dlm domain 0004FB0000050000BD41301BD1BB6781 after 90600 msecs
          ocfs2: Unmounting device (252,1) on (node 0)
          (mount.ocfs2,5217,0):dlm_join_domain:1935 Timed out joining dlm domain 0004FB0000050000BD41301BD1BB6781 after 90600 msecs
          ocfs2: Unmounting device (252,1) on (node 0)
          o2net: No longer connected to node atxb617.xxx.xxx (num 1) at 10.x.x.x:7777
          o2net: No longer connected to node atxb422.xxx.xxx (num 2) at 10.x.x.x:7777
          o2net: No longer connected to node atxc121.xxx.xxx (num 6) at 10.x.x.x:7777
          o2hb: Heartbeat stopped on region 0004FB0000050000BD41301BD1BB6781 (dm-1)
          device-mapper: nfs: released file /nfsmnt/d619a32c-6b10-4298-976b-a857b6d7f44e/ovspoolfs.img
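
For anyone comparing several nodes, a log excerpt like the one above can be summarized with a short script. This is only a sketch: the two regular expressions are modeled on the messages quoted in this post, and the function name `summarize` is made up for illustration. Other OCFS2 versions may word these messages differently.

```python
import re

# Patterns modeled on the kernel messages quoted in this thread.
DLM_TIMEOUT = re.compile(
    r"dlm_join_domain:\d+ Timed out joining dlm domain (\S+) after (\d+) msecs"
)
O2NET_DROP = re.compile(
    r"o2net: No longer connected to node (\S+) \(num (\d+)\) at (\S+):(\d+)"
)


def summarize(log_text):
    """Collect DLM join timeouts and dropped o2net peers from log text.

    Returns a dict with:
      dlm_timeouts  -> list of (domain, milliseconds)
      dropped_peers -> list of (node_name, node_number, tcp_port)
    """
    timeouts = [
        (m.group(1), int(m.group(2))) for m in DLM_TIMEOUT.finditer(log_text)
    ]
    drops = [
        (m.group(1), int(m.group(2)), int(m.group(4)))
        for m in O2NET_DROP.finditer(log_text)
    ]
    return {"dlm_timeouts": timeouts, "dropped_peers": drops}
```

Feeding it the saved kernel log from each node makes it easier to see which peers every failing node lost and whether they all time out on the same domain.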
          • 2. Re: OVM servers lost clustering
            Dave Smulsky
            Jim,

            Have you verified that networking is still functioning on the 3 servers you updated?
            • 3. Re: OVM servers lost clustering
              945503
              More information on my situation.

              Trying to restart o2cb and I get this:

              # service o2cb start
              Setting cluster stack "o2cb": OK
              Registering O2CB cluster "26851b3782936906": OK
              Setting O2CB cluster timeouts : OK
              Starting global heartbeat for cluster "26851b3782936906": Failed
              o2cb: Heartbeat region could not be found 0004FB00000500004C9825CA1314ED36
              Stopping global heartbeat on cluster "26851b3782936906": OK

              Why would heartbeat fail? How do I fix this?
              • 4. Re: OVM servers lost clustering
                945503
                Looks like my SAN went and died.
                That would prevent the heartbeat from working.

                Never mind
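
The disk heartbeat needs every node to be able to read and write the heartbeat region on shared storage, so a quick read test against the backing device can rule the storage in or out before digging into o2cb. Below is a minimal sketch; the helper name `can_read_device` is hypothetical and the device path is whatever backs the pool file system on your servers (not shown in this thread).

```python
import os


def can_read_device(path, nbytes=4096):
    """Try to read the first nbytes of path.

    Returns (True, "") on success, otherwise (False, reason).
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return False, "open failed: %s" % exc.strerror
    try:
        data = os.read(fd, nbytes)
    except OSError as exc:
        return False, "read failed: %s" % exc.strerror
    finally:
        os.close(fd)
    if not data:
        return False, "device returned no data"
    return True, ""
```

A successful read is not conclusive, since the kernel may serve it from cache, but an open or read error is a strong hint that the storage is unreachable.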
                • 5. Re: OVM servers lost clustering
                  Jim Russell
                  Yes, I have two networks, one for cluster heartbeat and server management, the other for VMs. Both networks ping fine from each other and from another box on a different switch.

                  I moved one of the failing servers to Unassigned Servers and tried to move it back. It failed with an error indicating that it was unable to bring up the heartbeat.
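
Ping only proves ICMP reachability. The o2net lines in the log earlier in this thread refer to TCP port 7777, so it may be worth checking that port between nodes as well. A rough sketch follows; the function name `tcp_reachable` is made up, and the default port is taken from those log lines rather than from any configuration shown here.

```python
import socket


def tcp_reachable(host, port=7777, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Running this from a failing node against each healthy node's cluster address (and the other way round) would show whether a firewall change from the update is blocking the interconnect even though ping works.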