This discussion is archived
6 Replies Latest reply: Feb 5, 2013 6:03 AM by Oiac

Shutdown inactive node cause reboot active node

Oiac Newbie
Hi,
when I shut down the inactive node of a two-node Oracle Clusterware cluster, the active node reboots (OS: Oracle Linux 5.6 with OCFS2).
First it acquires the VIP of the second node, then it reboots, and afterwards it works correctly.

Can anyone help me?

Have a nice day
  • 1. Re: Shutdown inactive node cause reboot active node
    Oiac Newbie
    Currently Being Moderated
    Here is the log of the active node:


    Jan 31 15:35:23 esse3-db1 avahi-daemon[7041]: Registering new address record for 192.168.101.222 on eth0.
    Jan 31 15:56:30 esse3-db1 kernel: bnx2: eth2 NIC Copper Link is Down
    Jan 31 15:56:32 esse3-db1 kernel: bnx2: eth2 NIC Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
    Jan 31 15:56:59 esse3-db1 kernel: o2net: connection to node esse3-db2.unisalento.it (num 0) at 192.168.101.202:7777 has been idle for 30.0 seconds, shutting it down.
    Jan 31 15:56:59 esse3-db1 kernel: (0,17):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1359644189.432221 now 1359644219.431047 dr 1359644189.432200 adv 1359644189.432221:1359644189.432221 func (1f70fe7a:504) 1359644047.329097:1359644047.329102)
    Jan 31 15:56:59 esse3-db1 kernel: o2net: no longer connected to node esse3-db2.unisalento.it (num 0) at 192.168.101.202:7777
    Jan 31 15:57:29 esse3-db1 kernel: (6377,17):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
    Jan 31 16:01:40 esse3-db1 syslogd 1.4.1: restart.

    Edited by: user2907588 on 1-feb-2013 7.39
  • 2. Re: Shutdown inactive node cause reboot active node
    652072 Newbie
    user2907588 wrote (highlighted lines from the log above):
    > Jan 31 15:56:30 esse3-db1 kernel: bnx2: eth2 NIC Copper Link is Down
    > Jan 31 15:57:29 esse3-db1 kernel: (6377,17):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
    > Jan 31 16:01:40 esse3-db1 syslogd 1.4.1: restart.
    Hello,

    One of the nodes in the cluster appears to have been evicted earlier because of an eth2 NIC outage between the nodes, which removed the failed node (possibly what you are referring to as "inactive").
    Please check the lines I have highlighted in the log. Verify that you can ping the specified IP and that password-less SSH to the other node (101) works, and ask your system/network administrator to look into it.

    Regards,
    Naga
  • 3. Re: Shutdown inactive node cause reboot active node
    Sebastian Solbach (DBA Community) Guru
    Hi,

    if this is 10.2 or 11.1, the cause could be the shutdown scripts under the rcX.d directories.
    They were created as K96XXX, so Clusterware is stopped after the network (which is normally stopped at K15 or similar).
    The resulting problem is that the Clusterware detects a "split brain", even though you only shut down one node.
    So simply rename the K96 script to a number before the network script, and you should be fine.

    10gR2/11gR1: Instances Abort With ORA-29702 When The Server is rebooted or shut down (Doc ID 752399.1)
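    The ordering issue can be illustrated with a small sketch. The filenames here are illustrative (K96init.crs is the typical 10gR2/11gR1 kill script; check the actual names and K-numbers in your own rcX.d directories before renaming anything):

    ```shell
    # Kill scripts in /etc/rcX.d run in lexicographic order, so K96init.crs
    # runs AFTER K15network -- i.e. Clusterware is stopped only after the
    # network is already down, which CRS interprets as a split-brain.
    tmp=$(mktemp -d)
    touch "$tmp/K15network" "$tmp/K96init.crs"
    echo "before: $(ls "$tmp" | tr '\n' ' ')"   # network stops first

    # Renaming to a lower K-number makes Clusterware stop first.
    mv "$tmp/K96init.crs" "$tmp/K01init.crs"
    echo "after:  $(ls "$tmp" | tr '\n' ' ')"   # Clusterware stops first
    rm -rf "$tmp"
    ```

    On a real system you would apply the rename in each relevant /etc/rcX.d directory, as described in the note above.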

    Another possible problem involves OCFS2. Since you have two independent cluster stacks (OCFS2 and Oracle Clusterware) running on the node, a
    misconfiguration (the start order of OCFS2 and Oracle Clusterware) can result in the phenomenon you see.

    Using OCFS2 with Oracle Clusterware (11gR2, 11gR1, 10gR2 and 10gR1) (Doc ID 974928.1)

    Regards
    Sebastian
  • 4. Re: Shutdown inactive node cause reboot active node
    Marko Sutic Newbie
    Hello,

    what do you mean when you say that this is the "inactive" node?

    - Are CRS and its related resources running on the specified node?
    - Do you have the OCFS2 devices mounted?
    - Is the o2cb cluster service running?
    - What command do you issue when you want to shut down the inactive node?
    - Which version of Oracle Clusterware are you using?

    The problem happens because the shutdown sequence is not performed in the correct order, as Naga and Sebastian already said.
    You could stop CRS manually on the inactive node, along with the OCFS2 resources, and then issue the shutdown command.
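    A hedged sketch of that manual order (the CRS home path and the OCFS2 mountpoint are assumptions; substitute the values from your own installation):

    ```shell
    # Run on the node you want to shut down. $ORA_CRS_HOME and /u02
    # are illustrative and must match your installation.
    $ORA_CRS_HOME/bin/crsctl stop crs   # 1. stop Oracle Clusterware first
    umount /u02                         # 2. unmount the OCFS2 volume(s)
    /etc/init.d/ocfs2 stop              # 3. stop the OCFS2 service
    /etc/init.d/o2cb stop               # 4. stop the o2cb cluster stack
    shutdown -h now                     # 5. only now shut the OS down
    ```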

    Regards,
    Marko
  • 5. Re: Shutdown inactive node cause reboot active node
    Oiac Newbie
    Thanks Sebastian,
    I'm going to test your solution.

    I'll give you feedback.

    Have a nice day,
    Mariano
  • 6. Re: Shutdown inactive node cause reboot active node
    Oiac Newbie
    Hi,
    I applied the solution (Oracle MetaLink Doc ID 752399.1) and it worked fine!


    Thank you,
    Mariano
