
    ocfs2 1.8 on OL6.4 Mount Issue



      Hi All,

       

      I have built three servers on Oracle Linux 6.5, kernel 2.6.39-400.211.2.el6uek.x86_64. I will be using OCFS2 for the cluster registry in my RAC environment.
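      The filesystem is mounted at /oracrsfiles; the intended /etc/fstab entry for it would look roughly like this (a sketch based on the mount command further down; _netdev defers the mount until networking and the o2cb stack are up at boot):

      /dev/emcpowere1  /oracrsfiles  ocfs2  _netdev,nointr,datavolume  0 0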

       

      I installed ocfs2-tools-1.8.0-11.el6.x86_64 and configured o2cb on all three nodes.
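      For reference, o2cb was set up the same way on each node, roughly as follows (a sketch of the standard ocfs2-tools steps; the exact prompts of "service o2cb configure" may differ slightly):

      # service o2cb configure     (load on boot = y, cluster stack = o2cb, cluster to start = ocfs2)
      # chkconfig o2cb on
      # chkconfig ocfs2 on
      # service o2cb online ocfs2

      The first node I mount the filesystem on works fine, but mounting it on either of the other two nodes fails -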

       

      #mount -t ocfs2 -o _netdev,nointr,datavolume /dev/emcpowere1 /oracrsfiles

      mount.ocfs2: Host is down while mounting /dev/emcpowere1 on /oracrsfiles. Check 'dmesg' for more information on this error.

       

      [root@ucslnxrac02 ocfs2]# dmesg

      o2net: Connected to node ucslnxrac01.ad.swfwmd.net (num 0) at 172.16.17.2:7777

      o2net: Connection to node ucslnxrac01.ad.swfwmd.net (num 0) at 172.16.17.2:7777 shutdown, state 8

      o2net: No longer connected to node ucslnxrac01.ad.swfwmd.net (num 0) at 172.16.17.2:7777

      (mount.ocfs2,5851,6):dlm_send_nodeinfo:1301 ERROR: node mismatch -112, node 0

      (mount.ocfs2,5851,6):dlm_try_to_join_domain:1685 ERROR: status = -112

      (mount.ocfs2,5851,6):dlm_send_one_join_cancel:1406 ERROR: Error -107 when sending message 512 (key 0x666c6172) to node 0

      (mount.ocfs2,5851,6):dlm_send_join_cancels:1440 ERROR: Error return -107 cancelling join on node 0

      (mount.ocfs2,5851,6):dlm_send_join_cancels:1447 ERROR: status = -107

      (mount.ocfs2,5851,6):dlm_try_to_join_domain:1722 ERROR: status = -107

      (mount.ocfs2,5851,6):dlm_join_domain:1955 ERROR: status = -112

      (mount.ocfs2,5851,6):dlm_register_domain:2214 ERROR: status = -112

      (mount.ocfs2,5851,6):o2cb_cluster_connect:358 ERROR: status = -112

      (mount.ocfs2,5851,6):ocfs2_dlm_init:3004 ERROR: status = -112

      (mount.ocfs2,5851,6):ocfs2_mount_volume:1883 ERROR: status = -112

      ocfs2: Unmounting device (120,65) on (node 0)

      (mount.ocfs2,5851,6):ocfs2_fill_super:1238 ERROR: status = -112


       

      The first node reports -

       

      Dec 13 15:12:18 ucslnxrac01 kernel: o2net: Accepted connection from node ucslnxrac02.ad.swfwmd.net (num 1) at 172.16.17.3:7777

      Dec 13 15:12:53 ucslnxrac01 kernel: o2net: Connection to node ucslnxrac02.ad.swfwmd.net (num 1) at 172.16.17.3:7777 has been idle for 30.77 secs, shutting it down.

      Dec 13 15:12:53 ucslnxrac01 kernel: o2net: No longer connected to node ucslnxrac02.ad.swfwmd.net (num 1) at 172.16.17.3:7777
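      The 30-second idle shutdown on the first node lines up with the 30000 ms network idle timeout shown in the o2cb status output further down. For reference, those timeouts are the ones kept in /etc/sysconfig/o2cb (a sketch; the variable names are the standard o2cb ones and the values match the status output below):

      # grep -E "THRESHOLD|TIMEOUT|DELAY" /etc/sysconfig/o2cb
      O2CB_HEARTBEAT_THRESHOLD=31
      O2CB_IDLE_TIMEOUT_MS=30000
      O2CB_KEEPALIVE_DELAY_MS=2000
      O2CB_RECONNECT_DELAY_MS=2000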

       

      If I unmount the filesystem from the first node, I can then mount it on the second. Trying to then mount it on the first gives the same errors, so it seems that additional nodes are not being allowed to join the cluster formed by whichever node mounts first.
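      In case it is relevant, this is how the in-kernel view of the cluster can be compared against cluster.conf on each node (a sketch; the paths are the standard o2cb configfs layout):

      # ls /sys/kernel/config/cluster/ocfs2/node/
      # for n in /sys/kernel/config/cluster/ocfs2/node/*; do echo "$n  num=$(cat $n/num)  ip=$(cat $n/ipv4_address)"; done

      The "node mismatch" from dlm_send_nodeinfo made me wonder whether the nodes somehow disagree about the membership registered there.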

       

      /etc/ocfs2/cluster.conf is the same on each node -

       

      # cat cluster.conf
      node:
              name = ucslnxrac01.ad.swfwmd.net
              cluster = ocfs2
              number = 0
              ip_address = 172.16.17.2
              ip_port = 7777

      node:
              name = ucslnxrac02.ad.swfwmd.net
              cluster = ocfs2
              number = 1
              ip_address = 172.16.17.3
              ip_port = 7777

      node:
              name = ucslnxrac03.ad.swfwmd.net
              cluster = ocfs2
              number = 2
              ip_address = 172.16.17.4
              ip_port = 7777

      cluster:
              name = ocfs2
              heartbeat_mode = local
              node_count = 3
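      In case it matters, the volume should have enough node slots for all three nodes; the slot count can be checked with something like this (a sketch using debugfs.ocfs2 against the same device):

      # debugfs.ocfs2 -R "stats" /dev/emcpowere1 | grep -i slots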

       

      o2cb status on the node with the filesystem mounted -

      #  service o2cb status

      Driver for "configfs": Loaded

      Filesystem "configfs": Mounted

      Stack glue driver: Loaded

      Stack plugin "o2cb": Loaded

      Driver for "ocfs2_dlmfs": Loaded

      Filesystem "ocfs2_dlmfs": Mounted

      Checking O2CB cluster "ocfs2": Online

        Heartbeat dead threshold: 31

        Network idle timeout: 30000

        Network keepalive delay: 2000

        Network reconnect delay: 2000

        Heartbeat mode: Local

      Checking O2CB heartbeat: Active

       

      o2cb status on the node on which I am trying to mount the filesystem -

      # service o2cb status

      Driver for "configfs": Loaded

      Filesystem "configfs": Mounted

      Stack glue driver: Loaded

      Stack plugin "o2cb": Loaded

      Driver for "ocfs2_dlmfs": Loaded

      Filesystem "ocfs2_dlmfs": Mounted

      Checking O2CB cluster "ocfs2": Online

        Heartbeat dead threshold: 31

        Network idle timeout: 30000

        Network keepalive delay: 2000

        Network reconnect delay: 2000

        Heartbeat mode: Local

      Checking O2CB heartbeat: Not active
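      My understanding is that with local heartbeat the O2CB heartbeat only becomes active once a volume is actually mounted on that node, so "Not active" here may simply reflect that nothing is mounted yet. Which nodes the cluster believes have the volume mounted can be listed with something like this (a sketch):

      # mounted.ocfs2 -f /dev/emcpowere1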

       


      The firewall and SELinux are disabled on all nodes.
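      For what it is worth, that was checked roughly like this on each node (standard OL6 commands):

      # service iptables status
      # service ip6tables status
      # getenforce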

       

      Any ideas?

       

      Thanks,

      Michele