6 Replies Latest reply: Jun 9, 2011 7:17 AM by rukbat

    IPMP Standby Interface Fails On Boot

      Hi. I have some odd IPMP behaviour I can't explain. Has anyone run into similar issues, or can anyone otherwise shed light?

      I have a shiny new Netra x4250 running Solaris 10 10/09. What I'm trying to configure is as follows:
      e1000g0 as a "management" interface
      e1000g1 and e1000g2 as an active/standby failover IPMP group with probe-based failure detection.

      The server doesn't have a default router, but it does have five probe targets on the same LAN and in the same subnet, configured as static routes via a startup script.

      The problem is that when the system boots, the in.mpathd daemon marks the standby interface as FAILED with the message:
      Sep 9 17:48:45 server1 in.mpathd[165]: NIC failure detected on e1000g2 of group frontend
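
To see how often in.mpathd has flagged an interface, a minimal shell sketch along these lines can pull the NIC-failure reports out of syslog-style text. The match pattern is taken from the message quoted above; the function name `mpathd_failures` is made up for illustration, and reading /var/adm/messages is an assumption about where syslog writes on the system in question.

```shell
#!/bin/sh
# List interfaces that in.mpathd has reported as failed.
# Reads syslog-style lines on stdin, prints "<interface> <group>" per match.
# Only the "NIC failure detected" wording from this thread is matched;
# any other in.mpathd messages are ignored.
mpathd_failures() {
    awk '
        /in\.mpathd\[[0-9]+\]: NIC failure detected on / {
            # Expected tail of the line: on <interface> of group <group>
            for (i = 1; i <= NF; i++) {
                if ($i == "on" && $(i + 2) == "of" && $(i + 3) == "group") {
                    print $(i + 1), $(i + 4)
                    break
                }
            }
        }
    '
}

# Demo using the log line from this thread.
sample='Sep 9 17:48:45 server1 in.mpathd[165]: NIC failure detected on e1000g2 of group frontend'
printf '%s\n' "$sample" | mpathd_failures
```

On a live box the same function could be fed the log file directly, e.g. `mpathd_failures < /var/adm/messages`.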

      An ifconfig -a at this point looks like:

      [root@server1]/root #ifconfig -a
      lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
      inet netmask ff000000
      e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
      inet netmask ffffff00 broadcast
      ether 60:eb:69:7:fa:68
      e1000g1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
      inet netmask ffffff00 broadcast
      groupname frontend
      ether 60:eb:69:7:fa:69
      e1000g1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
      inet netmask ffffff00 broadcast
      inet netmask ffffff00 broadcast
      groupname frontend
      ether 60:eb:69:7:fa:6a

      ...and a netstat -rn shows:

      [root@server1]/root #netstat -rn
      Routing Table: IPv4
      Destination          Gateway              Flags   Ref        Use Interface
      -------------------- -------------------- ----- ----- ---------- ---------
                                                U         1          2 e1000g0
                                                U         1         10 e1000g1
                                                U         1          0 e1000g1:1
                                                U         1         15 e1000g2
                                                UGH       1          0
                                                UGH       1          0
                                                UGH       1          0
                                                UGH       1          0
                                                UGH       1          0
                                                UH        1          0 lo0

      If I try detaching the standby interface and re-attaching it using if_mpadm -d and if_mpadm -r, in.mpathd reports the same NIC failure immediately.

      If I unplumb and reconfigure the standby interface (e1000g2) then pkill -HUP mpath, the ifconfig output for the standby interface becomes:

      inet netmask ffffff00 broadcast
      ether 60:eb:69:7:fa:6a

      ...which looks a bit healthier, but a snoop shows zero traffic on that interface, and if I pull the active patch lead it doesn't fail over.

      As a sanity check, if I unplumb both interfaces and configure e1000g2 (i.e. the one that is failing) as a normal, non-IPMP interface, using the test IP, I can ping all five probe target IPs fine.

      To complete the picture, here are the other relevant bits of config:

      #pragma ident "@(#)mpathd.dfl 1.2 00/07/17 SMI"
      # Time taken by mpathd to detect a NIC failure in ms. The minimum time
      # that can be specified is 100 ms.
      # Failback is enabled by default. To disable failback turn off this option
      # By default only interfaces configured as part of multipathing groups
      # are tracked. Turn off this option to track all network interfaces
      # on the system

      /etc/hostname.e1000g1: server1 \
      netmask + broadcast + group frontend up \
      addif netmask + broadcast + deprecated -failover up

      /etc/hostname.e1000g2: netmask + broadcast + deprecated group frontend -failover standby up

      From /etc/hosts: server1

      Any help or thoughts much appreciated.
        • 1. Re: IPMP Standby Interface Fails On Boot
          After some more digging it gets weirder...

          If I snoop on the active interface (e1000g1), I can see pings going out to the ipmp probe targets as you would expect.

          I also see pings going to (and coming back from) the probe targets from the test IP of the failed interface! Furthermore (and possibly even weirder), those pings have the source MAC address of the failed interface too.

          Is it possible that snooping on one interface (i.e. snoop -d e1000g1) could pick up Ethernet frames from a different interface (i.e. e1000g2)? It seems pretty unlikely to me - kind of defeats the object of specifying an interface to snoop on. If I snoop -d e1000g2 (the failed interface) I get nothing.

          So if those pings relating to the standby test address are going out onto the LAN from the active i/f with a source MAC of the standby i/f, the LAN switch is going to learn that that MAC belongs to the active i/f. Therefore the ping replies will be received on the active i/f (which I see in the snoop). So how on earth is in.mpathd going to know that the standby NIC is working and can be brought back into service?
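
The switch-learning argument above can be sketched as a toy model in shell. This is only an illustration of generic MAC learning, not of any particular switch: the port names p1 (cable to e1000g1), p2 (e1000g2) and p3 (a probe target) are hypothetical, the target MAC is a made-up documentation-range address, and `learning_switch` is an invented name. The standby MAC is the one from the ifconfig output in this thread.

```shell
#!/bin/sh
# Toy learning switch: reads "ingress_port src_mac dst_mac" per line,
# learns src_mac -> ingress_port, and prints where each frame is sent.
learning_switch() {
    awk '
        {
            port = $1; src = $2; dst = $3
            table[src] = port          # learn (or re-learn) the source MAC
            if (dst in table) {
                out = table[dst]       # known destination: one port only
            } else {
                out = "flood"          # unknown destination: all other ports
            }
            print src " -> " dst ": in " port ", out " out
        }
    '
}

STANDBY_MAC=60:eb:69:7:fa:6a    # e1000g2, from the ifconfig output in this thread
TARGET_MAC=00:00:5e:00:53:01    # hypothetical probe-target MAC

# Frame 1: probe carrying the standby MAC but leaving via the active port p1.
# Frame 2: the reply from the probe target on p3.
printf '%s\n' \
    "p1 $STANDBY_MAC $TARGET_MAC" \
    "p3 $TARGET_MAC $STANDBY_MAC" | learning_switch
```

In this model the reply is sent out of p1, never p2, which is consistent with the snoop on e1000g2 showing nothing.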
          • 2. Re: IPMP Standby Interface Fails On Boot
            Hey Hdavie,

            We are having the same issue here. We have two interfaces, e1000g1 and e1000g2, in the same IPMP group with two virtual interfaces running on the active interface. On one of our x4250 systems this problem is now not reproducible. On the other, the e1000g2 interface is marked FAILED almost immediately after being added to the group.

            If you or anyone else has information on this issue please post it here, this is causing us grief.
            • 3. Re: IPMP Standby Interface Fails On Boot
              Hi John,

              If you can always reproduce the problem on one and never on the other, the obvious question is what is different between your two servers?

              I'm making some headway, having had the below pointed out to me by Oracle support:

              ...which outlines a known IPMP bug that my symptoms fit and that is fixed in a kernel patch.

              I'm in the process of applying patches, will post an update when done.
              • 4. Re: IPMP Standby Interface Fails On Boot
                On the system that works I've got 141445-09, which your reference states may cause this issue. I've also already got 142901-10, which the same article states should fix it.

                On the system that doesn't work I only have 141445-09. I'll be adding this fix on the affected system today.
                • 5. Re: IPMP Standby Interface Fails On Boot
                  We were facing this problem and MetaLink note 1021262.1 helped. Our x86 system running Solaris 10 u8 did have the 141445-09 patch but did NOT have the 142901 patch. After applying patch 142901-02, the problem was resolved.
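
For comparing systems as described in the posts above, a rough shell sketch like the following reports whether 141445 is installed without 142901. It assumes the patch listing comes from `showrev -p`, with each entry starting `Patch: <id>-<rev>`; the function names are invented, the sample lines are trimmed to the fields the script reads, and treating any installed 142901 revision as carrying the fix is an assumption (the thread only confirms -02 and -10).

```shell
#!/bin/sh
# Print the highest installed revision of patch ID $1, given
# "showrev -p" style lines on stdin. Prints nothing if not listed.
patch_rev() {
    awk -v id="$1" '
        $1 == "Patch:" {
            split($2, p, "-")          # "141445-09" -> p[1]=141445, p[2]=09
            if (p[1] == id && (!found || p[2] + 0 > max)) {
                max = p[2] + 0
                found = 1
            }
        }
        END { if (found) printf "%02d\n", max }
    '
}

# Summarise the two patches discussed in this thread.
ipmp_patch_status() {
    listing=$(cat)
    bad=$(printf '%s\n' "$listing" | patch_rev 141445)
    fix=$(printf '%s\n' "$listing" | patch_rev 142901)
    if [ -n "$fix" ]; then
        echo "142901-$fix installed"
    elif [ -n "$bad" ]; then
        echo "141445-$bad installed without 142901"
    else
        echo "neither patch listed"
    fi
}

# Demo with a trimmed, hypothetical listing.
printf '%s\n' 'Patch: 141445-09 Obsoletes: Requires: Incompatibles: Packages:' |
    ipmp_patch_status
```

On a real system the input would be piped in, e.g. `showrev -p | ipmp_patch_status`.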
                  • 6. Re: IPMP Standby Interface Fails On Boot
                    ... and this resurrected ancient post is now locked.
                    It was originally a discussion on the old Sun forum web site.
                    The 2010 posting dates represent when it was migrated to the Oracle forum site.
                    None of the original posters ever registered to the OTN forums, thus there are no individual poster usernames. None of them will know the latest post exists.