3 Replies · Latest reply: Mar 8, 2012 6:01 AM by 807928

Cluster issue

veijar Newbie
Hi

One of the cluster nodes has entered maintenance mode. I checked the logs and found the messages below.

Can anyone tell me why the node dropped into single-user mode automatically?

Please help.

Mar 8 15:44:42 dracsapp in.mpathd[265]: [ID 168056 daemon.error] All Interfaces in group IPMP1 have failed
Mar 8 15:44:42 dracsapp Cluster.PNM: [ID 890413 daemon.notice] IPMP1: state transition from OK to DOWN.
Mar 8 15:44:42 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb1 - dracsdb:igb1 being drained
Mar 8 15:44:44 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb1 - dracsftp:igb1 being drained
Mar 8 15:44:44 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb1 - drstarrep:igb1 being drained
Mar 8 15:44:44 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb1 - drstarqa:igb1 being drained
Mar 8 15:44:44 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb1 - dracsfo:igb1 being drained
Mar 8 15:44:45 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb3 - drstarrep:igb3 being drained
Mar 8 15:44:45 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb3 - dracsdb:igb3 being drained
Mar 8 15:44:45 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb3 - dracsftp:igb3 being drained
Mar 8 15:44:45 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb3 - dracsfo:igb3 being drained
Mar 8 15:44:45 dracsapp genunix: [ID 489438 kern.notice] NOTICE: clcomm: Path dracsapp:igb3 - drstarqa:igb3 being drained
Mar 8 15:44:57 dracsapp genunix: [ID 810937 kern.notice] NOTICE: CMM: Reconfiguration delaying for 5 milliseconds to allow larger partitions to win race for quorum devices.
Mar 8 15:44:57 dracsapp scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x0
Mar 8 15:47:31 dracsapp genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_144489-17 64-bit
Mar 8 15:47:31 dracsapp genunix: [ID 218167 kern.notice] Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Mar 8 15:47:31 dracsapp unix: [ID 126719 kern.info] features: 2f7fffdf<sse4_2,sse4_1,ssse3,cpuid,mwait,cmp,cx16,sse3,nx,asysc,htt,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,pge,mtrr,msr,tsc,lgpg>
Mar 8 15:47:31 dracsapp unix: [ID 168242 kern.info] mem = 16767836K (0x3ff6d7000)
Mar 8 15:47:31 dracsapp rootnex: [ID 466748 kern.info] root nexus = i86pc
Mar 8 15:47:31 dracsapp iommulib: [ID 321598 kern.info] NOTICE: iommulib_nexus_register: rootnex-1: Succesfully registered NEXUS i86pc nexops=fffffffffbcf6140
Mar 8 15:47:31 dracsapp rootnex: [ID 349649 kern.info] pseudo0 at root
Mar 8 15:47:31 dracsapp genunix: [ID 936769 kern.info] pseudo0 is /pseudo
Mar 8 15:47:31 dracsapp scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Mar 8 15:47:31 dracsapp Auto-failback capability disabled through scsi_vhci.conf file.
Mar 8 15:47:31 dracsapp rootnex: [ID 349649 kern.info] scsi_vhci0 at root
  • 1. Re: Cluster issue
    807928 Journeyer
    Looks like you have networking issues; I know that's rather stating the obvious. If the cluster nodes cannot communicate, the partitions race for the quorum device (or quorum server). This node lost the race. When the node reboots, if it still cannot communicate with the other partition, it will be prevented from joining the cluster.
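    To make the race concrete, here is a hedged sketch of the vote arithmetic, using hypothetical numbers. It assumes Sun Cluster's usual rule that a quorum device carries one vote fewer than the node votes connected to it, and takes 6 nodes from the poster's setup (stated later in the thread); check your own vote counts with the cluster quorum status tools.

    ```shell
    # Illustrative quorum vote math (assumed Sun Cluster rules, not taken from this cluster)
    NODE_VOTES=6                       # one vote per cluster node
    QD_VOTES=$((NODE_VOTES - 1))       # quorum device: N - 1 votes (assumption)
    TOTAL=$((NODE_VOTES + QD_VOTES))   # total configured votes
    NEEDED=$((TOTAL / 2 + 1))          # majority a partition must hold to survive
    echo "configured votes: $TOTAL, needed for quorum: $NEEDED"
    ```

    A partition that cannot reach a majority of the configured votes panics or halts, which matches the "race for quorum devices" message in the log above.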

    Tim
    ---
  • 2. Re: Cluster issue
    veijar Newbie
    We have a 5+1 node cluster. Out of the 6 nodes, sometimes one node reboots, and sometimes 5 nodes reboot simultaneously.

    If it's a networking issue, what are the things I have to check?
  • 3. Re: Cluster issue
    807928 Journeyer
    I'm not sure quite where to start. First, it would be useful to know what hardware you are running on. If it isn't a supported configuration, you might have run into something hardware specific. If it is a supported configuration, I would check that the switches are properly configured. In particular: are the nodes connected to two separate switches, and do they use isolated (private) VLANs for the private networks? If some transient event is occurring on a switch, like a broadcast storm that takes more than 10 seconds to recover from, the cluster nodes will be affected. Likewise, if the switches use spanning tree and are trying to reconverge after a topology change, then unless that happens in under 10 seconds, the cluster nodes could be affected.

    Sorry, these are rather vague suggestions, but it's difficult to guess what might be the root cause.
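    The first checks on a node could be sketched roughly as below. This is only a hedged outline: command names assume Solaris 10 with Sun Cluster 3.2 and may differ on your release, so each call is guarded and skipped if the tool is not present.

    ```shell
    #!/bin/sh
    # Sketch: first-pass diagnostics after an IPMP group / interconnect failure.
    # Commands assume Solaris 10 + Sun Cluster 3.2 (an assumption, verify locally).

    run() {
      # Run a diagnostic command only if it exists on this host.
      if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
      else
        echo "skipping: $1 not available on this host"
      fi
    }

    run scstat -W      # cluster transport (private interconnect) path status
    run ifconfig -a    # IPMP members: look for the FAILED flag on igb1/igb3

    # Count recent in.mpathd complaints, if the syslog file is readable
    if [ -r /var/adm/messages ]; then
      grep -c 'in.mpathd' /var/adm/messages
    fi

    DIAG_DONE=1
    ```

    If the transport paths and IPMP members look healthy from the node, that points the investigation toward the switches (VLAN isolation, spanning tree) rather than the hosts.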

    Tim
    ---
