This discussion is archived
4 Replies. Latest reply: Oct 17, 2013 4:21 AM by AjithPathiyil

node 2 is rebooted when ocfs is mounted from it

muneer.ngha.ksa Newbie

Dear Friends,

I have installed VMware Workstation and I am trying to create a 2-node Oracle 10g RAC system.

I have completed most of the installation prerequisites. However, I am stuck at creating the OCFS2 file system.

I have a drive, /dev/sdb, dedicated to holding the OCFS2 file system, which I am trying to mount on the directory "/ocfs".

The mount command is as follows:

mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs

This command runs successfully on both nodes. However, when I run it on the second node, that node reboots some time after the command completes.

If I don't mount the file system on "/ocfs" on the 2nd node, nothing happens. But as soon as I run the above command on the 2nd node, it reboots.

I found the following messages in the output of the "dmesg" command:

mptscsi: ioc0: task abort: SUCCESS (sc=38153380)
o2net: connected to node rac1 (num 0) at 192.168.2.131:7777
o2net: connection to node rac1 (num 0) at 192.168.2.131:7777 has been idle for 30.0 seconds, shutting it down.
(0,0):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1381609139.201720 now 1381609439.670136 dr 1381609139.201617 adv 1381609139.201727:1381609139.201727 func (00000000:0) 0.0:0.0)
o2net: no longer connected to node rac1 (num 0) at 192.168.2.131:7777
OCFS2 1.2.7 Mon Nov 19 23:07:51 EST 2007 (build d443ce77532cea8d1e167ab2de51b8c8)
SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
(25333,0):dlm_request_join:901 ERROR: status = -107
(25333,0):dlm_try_to_join_domain:1049 ERROR: status = -107
(25333,0):dlm_join_domain:1321 ERROR: status = -107
(25333,0):dlm_register_domain:1514 ERROR: status = -107
(25333,0):ocfs2_dlm_init:2024 ERROR: status = -107
(25333,0):ocfs2_mount_volume:1133 ERROR: status = -107
ocfs2: Unmounting device (8,17) on (node 1)
o2net: connected to node rac1 (num 0) at 192.168.2.131:7777
o2net: connection to node rac1 (num 0) at 192.168.2.131:7777 has been idle for 30.0 seconds, shutting it down.
(0,0):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1381610045.317343 now 1381610345.246950 dr 1381610045.317258 adv 1381610045.317350:1381610045.317351 func (00000000:0) 0.0:0.0)
o2net: no longer connected to node rac1 (num 0) at 192.168.2.131:7777
(25358,0):dlm_request_join:901 ERROR: status = -107

I am not sure what's going wrong here. Kindly help me.

Thanks.
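
The `status = -107` codes in this log are negated Linux errno values: 107 is ENOTCONN ("Transport endpoint is not connected"). That is consistent with the o2net lines just before them, where node 2's connection to rac1 goes idle and is shut down, which suggests the DLM join failed because the interconnect to rac1 was down. The shell sketch below tallies the error codes in a dmesg excerpt. The function name `summarize_ocfs2_log` is hypothetical, and only a few common codes are mapped to names.

```shell
# Hypothetical helper: count the "status = -N" error codes in an OCFS2
# dmesg excerpt read from stdin and name a few common Linux errno values.
summarize_ocfs2_log() {
    # Extract each code, strip the prefix, then count occurrences per code.
    grep -o 'status = -[0-9]*' | sed 's/status = -//' | sort -n | uniq -c |
    while read -r count code; do
        case $code in
            5)   name=EIO ;;        # I/O error
            107) name=ENOTCONN ;;   # transport endpoint is not connected
            110) name=ETIMEDOUT ;;  # connection timed out
            *)   name=unknown ;;
        esac
        echo "errno $code ($name): $count"
    done
}
```

Usage: `dmesg | summarize_ocfs2_log`. On the excerpt above it should report errno 107 (ENOTCONN) seven times.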

  • 1. Re: node 2 is rebooted when ocfs is mounted from it
    Levi-Pereira Guru

    Hi,

    Use an /etc/fstab entry such as:

    /dev/sdb1 /ocfs ocfs2 _netdev,datavolume,nointr 0 0

    The options mean:

    _netdev: Ensures that the OCFS2 volume is not mounted before the networking structure is up, and that it is unmounted before the network is shut down.

    datavolume: Applies only to data volumes, and to every type of file usage except shared binaries. On a clustered database such as Oracle Real Application Clusters (RAC), the data volumes include the Cluster Registry and Voting Disks. The datavolume option allows direct I/O access to the files.

    nointr: Prohibits interrupts, and is applied to the same type of data files as the datavolume option.

    When Oracle RAC Voting Disks and OCR (Oracle Cluster Registry) disks are installed on OCFS2, the disks require the same mount options as data volumes:

    _netdev,datavolume,nointr
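
A quick way to confirm that an fstab entry carries all three options listed above is sketched below. It is a hypothetical helper (`check_ocfs2_fstab` is not part of OCFS2): it reads /etc/fstab by default, or another file passed as the second argument, and reports which of `_netdev`, `datavolume` and `nointr` are missing for the given mount point.

```shell
# Hypothetical helper: check the ocfs2 fstab entry for a mount point.
# Usage: check_ocfs2_fstab /ocfs [fstab-file]   (defaults to /etc/fstab)
check_ocfs2_fstab() {
    mnt=$1
    fstab=${2:-/etc/fstab}
    # Field 4 (options) of the non-comment ocfs2 line whose field 2 matches.
    opts=$(awk -v m="$mnt" '$1 !~ /^#/ && $2 == m && $3 == "ocfs2" { print $4 }' "$fstab")
    if [ -z "$opts" ]; then
        echo "no ocfs2 entry for $mnt in $fstab"
        return 1
    fi
    missing=""
    for want in _netdev datavolume nointr; do
        # Surround the list with commas so each option matches as a whole word.
        case ",$opts," in
            *",$want,"*) ;;
            *) missing="$missing $want" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing options:$missing"
        return 1
    fi
    echo "ok: $opts"
}
```

For the entry above, `check_ocfs2_fstab /ocfs` should print `ok: _netdev,datavolume,nointr`; for the options in the original mount command it would report `_netdev` as missing.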

  • 2. Re: node 2 is rebooted when ocfs is mounted from it
    muneer.ngha.ksa Newbie

    Dear Levi,

    Thanks for the explanation.

    Can you tell me why the node reboots, if these are the options to be used with OCFS2?

    Thanks.

  • 3. Re: node 2 is rebooted when ocfs is mounted from it
    Levi-Pereira Guru

    Hi,

    Try the following My Oracle Support note:

    Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot (Doc ID 395878.1)
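
Judging by its title, the note covers the OCFS2 (O2CB) heartbeat and network timeouts as well as the CRS ones. The "idle for 30.0 seconds" lines in the dmesg output are consistent with the O2CB default network idle timeout of 30000 ms. The sketch below prints the timeouts a node is configured with. It assumes the /etc/sysconfig/o2cb variable names used by OCFS2 1.2.5 and later (`O2CB_HEARTBEAT_THRESHOLD`, `O2CB_IDLE_TIMEOUT_MS`), the defaults noted in the comments, and the OCFS2 FAQ formula that the disk heartbeat timeout is (threshold - 1) * 2 seconds. The function name is hypothetical; the values themselves are normally changed with `/etc/init.d/o2cb configure`.

```shell
# Hypothetical helper: report O2CB timeouts from a KEY=VALUE config file.
# Assumes OCFS2 1.2.5+ variable names; path defaults to /etc/sysconfig/o2cb.
o2cb_timeouts() {
    conf=${1:-/etc/sysconfig/o2cb}
    threshold=31    # assumed default heartbeat threshold (60 s)
    idle_ms=30000   # assumed default network idle timeout (30 s)
    # Pick out the two keys of interest; comments and other keys are ignored.
    while IFS='=' read -r key value; do
        value=${value%\"}   # drop optional surrounding double quotes
        value=${value#\"}
        case $key in
            O2CB_HEARTBEAT_THRESHOLD) threshold=$value ;;
            O2CB_IDLE_TIMEOUT_MS) idle_ms=$value ;;
        esac
    done < "$conf"
    # Disk heartbeat: a node is considered dead after (threshold - 1) * 2 s.
    echo "disk_heartbeat_timeout_s=$(( (threshold - 1) * 2 ))"
    # Network idle timeout, converted from milliseconds to seconds.
    echo "network_idle_timeout_s=$(( idle_ms / 1000 ))"
}
```

For example, a file containing `O2CB_HEARTBEAT_THRESHOLD=61` and `O2CB_IDLE_TIMEOUT_MS=60000` gives a 120-second disk heartbeat timeout and a 60-second network idle timeout.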

  • 4. Re: node 2 is rebooted when ocfs is mounted from it
    AjithPathiyil Newbie

    Hi Muneer,

    What you are seeing is node fencing, also known as STONITH ("Shoot The Other Node In The Head"). Because the RAC database is running in VMware, resources are limited, so the fencing timeout parameters should be set to suit the resource limits of the VMware virtual environment.

    The fencing timeout settings below work very well for me. With longer timeouts, the cluster waits longer for a delayed heartbeat and still considers the node(s) part of the cluster; if the timeout is too short, the cluster fences (reboots) the node. I hope this helps you too. If it does, please mark it as answered.

    Increase CRS Fencing Timeout (Shared Filesystem)
    ===================================================

    These steps are not necessary for a test or production environment. However, they might make your VMware test cluster a little more stable, and they provide a good learning opportunity about Grid Infrastructure.

    1. Grid Infrastructure must be running on only one node to change these settings. Shut down the clusterware on ajithpathiyil2 as the root user.

    [oracle@ajithpathiyil1 ~]$ ssh ajithpathiyil2
    Last login: Wed Mar 30 14:50:49 2011
    Set environment by typing 'oenv' - default is instance RAC1.
    ajithpathiyil2:/home/oracle[RAC1]$ su -
    Password:
    [root@ajithpathiyil2 bin]# crsctl stop crs
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ajithpathiyil2'
    CRS-2673: Attempting to stop 'ora.crsd' on 'ajithpathiyil2'
    CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'ajithpathiyil2'
    ...
    ...
    ...
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ajithpathiyil2' has completed
    CRS-4133: Oracle High Availability Services has been stopped.

    2. Return to node ajithpathiyil1. As the root user, increase misscount so that CSS waits 1.5 minutes (90 seconds) for missed network heartbeats before it reboots a node. (VMware can drag a little on some laptops!)

    [root@ajithpathiyil1 ~]# crsctl get css misscount
    30
    [root@ajithpathiyil1 ~]# crsctl set css misscount 90
    Configuration parameter misscount is now set to 90.

    3. Increase disktimeout so that CSS waits 10 minutes (600 seconds) for voting disk I/O to complete before rebooting.

    [root@ajithpathiyil1 ~]# crsctl get css disktimeout
    200
    [root@ajithpathiyil1 ~]# crsctl set css disktimeout 600
    Configuration parameter disktimeout is now set to 600.

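Steps 2 and 3 follow the same read-then-set pattern, which can be wrapped in a small shell function. This is a sketch, not part of Oracle's tooling: `raise_css_timeout` and the `CRSCTL` variable are hypothetical, the function only calls the `crsctl get css` and `crsctl set css` commands shown above, and it assumes `crsctl get css` prints the bare number as in this transcript (some versions wrap the value in a CRS-nnnn message, in which case the parsing needs adjusting). It only ever raises a value, never lowers it.

```shell
# Hypothetical helper: raise a CSS timeout only if it is below the target.
# Run as root on the one node where the clusterware is still up.
# CRSCTL may hold a full path to crsctl (or a stub for testing).
raise_css_timeout() {
    name=$1
    target=$2
    crsctl_cmd=${CRSCTL:-crsctl}
    # Take the last all-digits line of "crsctl get css <name>" as the value.
    current=$($crsctl_cmd get css "$name" | grep -E '^[0-9]+$' | tail -n 1)
    if [ -z "$current" ]; then
        echo "could not read css $name" >&2
        return 1
    fi
    if [ "$current" -ge "$target" ]; then
        echo "css $name already $current (target $target), unchanged"
        return 0
    fi
    # Same command as in the transcript, e.g. "crsctl set css misscount 90".
    $crsctl_cmd set css "$name" "$target"
}

# Steps 2 and 3 from the post:
#   raise_css_timeout misscount 90
#   raise_css_timeout disktimeout 600
```

Once both values are set, continue with step 4 to bring the other node back.
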
    4. Restart CRS on the other node.

    [root@ajithpathiyil1 bin]# ssh ajithpathiyil2
    [root@ajithpathiyil2 bin]# crsctl start crs