1 Reply. Latest reply: Jul 5, 2012 3:49 AM by T.F.

    node1 in the cluster did not join the cluster after we did a memory upgrade

    946639
      Hi,

      We have a two-node Sun Cluster (version 3.2) running on T2000 hardware. When we brought down one node for a memory upgrade, that server never joined the cluster when it was powered back on. It gave the errors below and kept waiting for several minutes.

      on Node1

      NOTICE: clcomm: Path hopdb1:nxge1 - hopdb2:nxge1 errors during initiation
      NOTICE: clcomm: Path hopdb1:nxge0 - hopdb2:nxge0 errors during initiation
      WARNING: Path hopdb1:nxge1 - hopdb2:nxge1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
      WARNING: Path hopdb1:nxge0 - hopdb2:nxge0 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.

      At the same time, the second node, which is running fine, was logging this error.

      on Node2

      Jun 16 16:49:59 hopdb2 cl_runtime: [ID 182413 kern.warning] WARNING: clcomm: Rejecting communication attempt from a stale incarnation of node hopdb1; reported boot time Thu Apr 26 17:27:33 GMT 2012, expected boot time Thu Apr 26 21:19:45 GMT 2012 or later.

      Finally we backed out the memory upgrade and booted again, but node1 was still in the state described above.

      We did this on 16 June. Since the warning above refers to a boot time difference, we brought node1 (the problem node) up in non-cluster mode, waited approximately 4 hours, and then rebooted it. node1 came up cleanly and joined the cluster.

      What we are unable to understand is this warning:

      WARNING: clcomm: Rejecting communication attempt from a stale incarnation of node hopdb1; reported boot time Thu Apr 26 17:27:33 GMT 2012, expected boot time Thu Apr 26 21:19:45 GMT 2012 or later
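
The gap between the two boot times in this warning can be worked out directly from the message text. The following is a minimal Python sketch, not part of any cluster tooling: the function name `boot_time_gap` and the regular expression are illustrative assumptions, and the timestamp layout is taken from the warning as quoted here.

```python
import re
from datetime import datetime, timedelta

# Timestamp layout as it appears in the warning, e.g. "Thu Apr 26 17:27:33 GMT 2012".
STAMP_FORMAT = "%a %b %d %H:%M:%S GMT %Y"

# Illustrative pattern (an assumption): captures the two boot times in the message.
STAMP_PATTERN = re.compile(
    r"reported boot time (?P<reported>.+? GMT \d{4}), "
    r"expected boot time (?P<expected>.+? GMT \d{4})"
)


def boot_time_gap(message: str) -> timedelta:
    """Return how far the reported boot time lies before the expected one."""
    match = STAMP_PATTERN.search(message)
    if match is None:
        raise ValueError("no boot times found in message")
    reported = datetime.strptime(match.group("reported"), STAMP_FORMAT)
    expected = datetime.strptime(match.group("expected"), STAMP_FORMAT)
    return expected - reported


WARNING = (
    "WARNING: clcomm: Rejecting communication attempt from a stale incarnation "
    "of node hopdb1; reported boot time Thu Apr 26 17:27:33 GMT 2012, "
    "expected boot time Thu Apr 26 21:19:45 GMT 2012 or later."
)

gap = boot_time_gap(WARNING)
print(gap)  # 3:52:12
```

The result, 3 hours 52 minutes 12 seconds, is close to the roughly four-hour wait described above.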

      We would like to know whether this will happen again the next time we do a memory upgrade, and whether there is any way to fix the problem.


      Thanks in advance.

      Regards,
      Satish
        • 1. Re: node1 in the cluster did not join the cluster after we did a memory upgrade
          T.F.
          It is pretty safe to say that the memory upgrade itself is not related to the issue you see.

          The syslog message 182413 is explained in the Solaris Cluster Error Messages Guide. From reading the description and solution, I would guess that the system time on your cluster nodes is not in sync.
          You need to check the current system time on each node and adjust it if required. I would recommend setting up NTP to sync time against an external NTP server.
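
One way to compare the clocks is to collect a UTC timestamp from each node at about the same moment and compute the difference. The Python sketch below assumes the readings are in the same layout as the timestamps in the warning (for example, what `date -u` prints on a system that labels UTC as GMT). The function name `clock_skews`, the node names, and the sample readings are all hypothetical and invented for illustration.

```python
from datetime import datetime
from itertools import combinations

# Assumed layout of each reading, e.g. "Sat Jun 16 16:49:59 GMT 2012".
DATE_FORMAT = "%a %b %d %H:%M:%S GMT %Y"


def clock_skews(readings):
    """Return {(node_a, node_b): skew_in_seconds} for every pair of nodes."""
    parsed = {
        node: datetime.strptime(text, DATE_FORMAT)
        for node, text in readings.items()
    }
    return {
        (a, b): abs((parsed[a] - parsed[b]).total_seconds())
        for a, b in combinations(sorted(parsed), 2)
    }


# Hypothetical readings, not taken from the systems in this thread.
sample = {
    "nodeA": "Sat Jun 16 16:45:00 GMT 2012",
    "nodeB": "Sat Jun 16 16:49:59 GMT 2012",
}

for pair, skew in clock_skews(sample).items():
    print(pair, skew)  # ('nodeA', 'nodeB') 299.0
```

Any delay between taking the two readings is included in the computed skew, so this is only a rough check. A running NTP setup remains the actual fix.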

          Regards
          Thorsten