4 Replies Latest reply on Oct 17, 2013 11:21 AM by AjithPathiyil

    node 2 is rebooted when ocfs is mounted from it

    muneer.ngha.ksa

      Dear Friends,

       

      I have installed VMware Workstation and I am trying to create a 2-node Oracle 10g RAC system.

       

      I have completed most of the installation prerequisites. However, I am stuck at the OCFS2 file system creation.

       

      I have a drive, /dev/sdb, dedicated to the OCFS2 file system, which I am trying to mount on the directory "/ocfs".

       

      The mount command is as follows:

       

      mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs

       

      This command runs successfully on both nodes. However, when I run it on the second node, that node reboots some time after the command completes.

       

      If I don't mount the file system on "/ocfs" on the second node, nothing happens. But as soon as I run the above command on the second node, it gets rebooted.

       

      I found the following messages in the output of the "dmesg" command:

       

      mptscsi: ioc0: task abort: SUCCESS (sc=38153380)

      o2net: connected to node rac1 (num 0) at 192.168.2.131:7777

      o2net: connection to node rac1 (num 0) at 192.168.2.131:7777 has been idle for 30.0 seconds, shutting it down.

      (0,0):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1381609139.201720 now 1381609439.670136 dr 1381609139.201617 adv 1381609139.201727:1381609139.201727 func (00000000:0) 0.0:0.0)

      o2net: no longer connected to node rac1 (num 0) at 192.168.2.131:7777

      OCFS2 1.2.7 Mon Nov 19 23:07:51 EST 2007 (build d443ce77532cea8d1e167ab2de51b8c8)

      SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts

      (25333,0):dlm_request_join:901 ERROR: status = -107

      (25333,0):dlm_try_to_join_domain:1049 ERROR: status = -107

      (25333,0):dlm_join_domain:1321 ERROR: status = -107

      (25333,0):dlm_register_domain:1514 ERROR: status = -107

      (25333,0):ocfs2_dlm_init:2024 ERROR: status = -107

      (25333,0):ocfs2_mount_volume:1133 ERROR: status = -107

      ocfs2: Unmounting device (8,17) on (node 1)

      o2net: connected to node rac1 (num 0) at 192.168.2.131:7777

      o2net: connection to node rac1 (num 0) at 192.168.2.131:7777 has been idle for 30.0 seconds, shutting it down.

      (0,0):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1381610045.317343 now 1381610345.246950 dr 1381610045.317258 adv 1381610045.317350:1381610045.317351 func (00000000:0) 0.0:0.0)

      o2net: no longer connected to node rac1 (num 0) at 192.168.2.131:7777

      (25358,0):dlm_request_join:901 ERROR: status = -107

       

       

      I am not sure what's going wrong here. Kindly help me.

       

      Thanks.

        • 1. Re: node 2 is rebooted when ocfs is mounted from it
          Levi Pereira

          Hi,

           

          Mount the volume through /etc/fstab with the following entry:

          /dev/sdb1 /ocfs ocfs2 _netdev,datavolume,nointr 0 0

          _netdev: Ensures that the OCFS2 volume is not mounted before the networking stack is up, and that it is unmounted before the network is shut down.

          datavolume: Applies only to data volumes, and to every type of file usage except shared binaries. On a clustered database such as Oracle Real Application Clusters (RAC), the data volumes include the Cluster Registry and Voting Disks. The datavolume option allows direct I/O access to the files.

          nointr: Prohibits interrupts, and is applied to the same type of data files as the datavolume option.

           

          When the Oracle RAC Voting Disks and OCR (Oracle Cluster Registry) are placed on OCFS2, those disks require the same mount options as the data volumes:

          _netdev,datavolume,nointr
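          As a rough sketch of how this could be applied and checked on each node (the device /dev/sdb1 and the /ocfs mount point are taken from your post; everything else is just the usual OCFS2 setup, so adapt as needed):

          # append the entry to /etc/fstab on both nodes
          echo "/dev/sdb1 /ocfs ocfs2 _netdev,datavolume,nointr 0 0" >> /etc/fstab

          # make sure the cluster stack and ocfs2 services start at boot (required for _netdev mounts)
          chkconfig o2cb on
          chkconfig ocfs2 on

          # remount through fstab and confirm the options are active
          umount /ocfs
          mount /ocfs
          mount | grep ocfs2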

          • 2. Re: node 2 is rebooted when ocfs is mounted from it
            muneer.ngha.ksa

            Dear Levi

             

            Thanks for the explanation.

             

            Can you let me know why the node is getting rebooted, if these are the options to be used with OCFS2?

             

            Thanks.

            • 3. Re: node 2 is rebooted when ocfs is mounted from it
              Levi Pereira

              Hi,

              Try this:

              Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot (Doc ID 395878.1)
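              For reference, below is a minimal sketch of the kind of O2CB timeout change that note covers; the parameter names are the usual ones in /etc/sysconfig/o2cb for recent ocfs2-tools, but the example values are only assumptions for a slow VMware environment, not values taken from the note:

              # stop the cluster stack on the node before changing timeouts
              service o2cb stop

              # either answer the interactive prompts or edit /etc/sysconfig/o2cb directly, e.g.:
              #   O2CB_HEARTBEAT_THRESHOLD=61   # disk heartbeat; timeout is roughly (value-1)*2 seconds
              #   O2CB_IDLE_TIMEOUT_MS=60000    # network idle timeout (the 30.0 seconds in your dmesg is the 30000 default)
              #   O2CB_KEEPALIVE_DELAY_MS=2000
              #   O2CB_RECONNECT_DELAY_MS=2000
              service o2cb configure

              # start the stack again and remount the volume
              service o2cb start
              mount /ocfs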

              • 4. Re: node 2 is rebooted when ocfs is mounted from it
                AjithPathiyil

                Hi Muneer,

                 

                This is what is otherwise called "Shoot The Other Node In The Head" (STONITH) fencing. Since the RAC database is created in VMware, resources are limited, so the fencing timeout parameters should be set according to the resource limitations of the VMware virtual environment.

                 

                So, the best fencing timeout settings are given below, so that the timeout parameters wait longer for the heartbeat and still believe the node(s) are part of the cluster; if the timeout is smaller, the node gets shot (fenced). These fencing timeout settings work very well for me. I hope this helps you too; if yes, please mark the thread as answered.

                 

                Increase CRS Fencing Timeout (Shared Filesystem)

                ===================================================

                 

                 

                These steps are not necessary for a test or production environment. However, they might make your VMware test cluster just a little more stable, and they will provide a good learning opportunity about Grid Infrastructure.

                 
                1. Grid Infrastructure must be running on only one node to change these settings. Shut down the clusterware on ajithpathiyil2 as the root user.


                 

                 

                [oracle@ajithpathiyil1 ~]$ ssh ajithpathiyil2

                Last login: Wed Mar 30 14:50:49 2011

                Set environment by typing 'oenv' - default is instance RAC1.

                ajithpathiyil2:/home/oracle[RAC1]$ su -

                Password:

                [root@ajithpathiyil2 bin]# crsctl stop crs

                CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ajithpathiyil2'

                CRS-2673: Attempting to stop 'ora.crsd' on 'ajithpathiyil2'

                CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'ajithpathiyil2'

                ...

                ...

                ...

                CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ajithpathiyil2' has completed

                CRS-4133: Oracle High Availability Services has been stopped.

                 
                2. Return to node ajithpathiyil1. As the root user, increase the misscount so that CRS waits 1.5 minutes before it reboots. (VMware can drag a little on some laptops!)

                 
                [root@ajithpathiyil1 ~]# crsctl get css misscount

                30

                [root@ajithpathiyil1 ~]# crsctl set css misscount 90

                Configuration parameter misscount is now set to 90.

                 
                3. Increase the disktimeout so that CRS waits 10 minutes for I/O to complete before rebooting.

                 
                [root@ajithpathiyil1 ~]# crsctl get css disktimeout

                200

                [root@ajithpathiyil1 ~]# crsctl set css disktimeout 600

                Configuration parameter disktimeout is now set to 600.

                 
                4. Restart CRS on the other node.

                 

                 

                [root@ajithpathiyil1 bin]# ssh ajithpathiyil2

                [root@ajithpathiyil2 bin]# crsctl start crs
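                Once CRS is back up, a quick sanity check might look like the following; this is only a sketch using the same Grid Infrastructure command-line tools as in the steps above, and the exact output will depend on your installation:

                # verify the stack came back up on ajithpathiyil2
                [root@ajithpathiyil2 bin]# crsctl check crs

                # confirm both nodes are active members of the cluster
                [root@ajithpathiyil2 bin]# olsnodes -s

                # confirm the new CSS settings are in effect (expect 90 and 600)
                [root@ajithpathiyil2 bin]# crsctl get css misscount
                [root@ajithpathiyil2 bin]# crsctl get css disktimeout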