8 Replies Latest reply: Aug 20, 2013 2:18 AM by WadhahDaouehi

    802.3ad Bonding issues with OEL 6.1

    msimm29

      We are trying to get 802.3ad (Mode 4) bonding functional on our Oracle database servers running OEL 6.1.

       

      Everything seems to be OK, but we cannot ping in or out when we set the mode to 4.  If we change the mode to 0 and reboot, everything works fine.

       

      The following is our config.

       

      /etc/modprobe.d/bond.conf

      alias bond0 bonding

       

      /etc/sysconfig/network-scripts/ifcfg-bond0

      DEVICE=bond0

      ONBOOT=yes

      BROADCAST=10.41.5.255

      IPADDR=10.41.5.88

      NETMASK=255.255.255.0

      GATEWAY=10.41.5.3

      DNS1=172.28.210.50

      DNS2=172.28.179.50

      USERCTL=no

      BONDING_OPTS="mode=4 miimon=100 updelay=40000"

      ifcfg-em1

      DEVICE=em1

      ONBOOT=yes

      BOOTPROTO=none

      USERCTL=no

      MASTER=bond0

      SLAVE=yes

      ifcfg-p3p1

      DEVICE=p3p1

      ONBOOT=yes

      BOOTPROTO=none

      USERCTL=no

      MASTER=bond0

      SLAVE=yes

       

      Here is the bonding section of /var/log/messages

      Aug  8 10:23:17 lxsmgcm15003c kernel: Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.

      Aug  8 10:23:17 lxsmgcm15003c kernel: Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-bond0 instead

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: setting mode to 802.3ad (4).

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: Setting MII monitoring interval to 100.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: Setting up delay to 40000.

      Aug  8 10:23:17 lxsmgcm15003c kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: Adding slave em1.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:01:00.0: em1: using MSIX

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: enslaving em1 as a backup interface with a down link.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: Adding slave p3p1.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:07:00.0: p3p1: using MSIX

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: enslaving p3p1 as a backup interface with a down link.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:01:00.1: em2: using MSIX

      Aug  8 10:23:17 lxsmgcm15003c kernel: ADDRCONF(NETDEV_UP): em2: link is not ready

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:01:00.0: em1: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: link status up for interface em1, enabling it in 0 ms.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: link status definitely up for interface em1.

      Aug  8 10:23:17 lxsmgcm15003c kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:07:00.0: p3p1: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON

      Aug  8 10:23:17 lxsmgcm15003c kernel: bonding: bond0: link status up for interface p3p1, enabling it in 40000 ms.

      Aug  8 10:23:17 lxsmgcm15003c kernel: bnx2 0000:01:00.1: em2: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON

      Aug  8 10:23:17 lxsmgcm15003c kernel: ADDRCONF(NETDEV_CHANGE): em2: link becomes ready

      Aug  8 10:23:17 lxsmgcm15003c avahi-daemon[3074]: Registering new address record for fe80::862b:2bff:fe5a:6e49 on bond0.*.

      Aug  8 10:23:17 lxsmgcm15003c avahi-daemon[3074]: Registering new address record for 10.41.5.88 on bond0.IPv4.

       

      Aug  8 10:23:53 lxsmgcm15003c kernel: bonding: bond0: link status definitely up for interface p3p1.

       

       

      /proc/net/bonding/bond0

      Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

       

      Bonding Mode: IEEE 802.3ad Dynamic link aggregation

      Transmit Hash Policy: layer2 (0)

      MII Status: up

      MII Polling Interval (ms): 100

      Up Delay (ms): 40000

      Down Delay (ms): 0

       

      802.3ad info

      LACP rate: slow

      Aggregator selection policy (ad_select): stable

      Active Aggregator Info:

                  Aggregator ID: 1

                  Number of ports: 1

                  Actor Key: 17

                  Partner Key: 1

                  Partner Mac Address: 00:00:00:00:00:00

       

      Slave Interface: em1

      MII Status: up

      Link Failure Count: 0

      Permanent HW addr: 84:2b:2b:xx:xx:xx

      Aggregator ID: 1

      Slave queue ID: 0

       

      Slave Interface: p3p1

      MII Status: up

      Link Failure Count: 0

      Permanent HW addr: 00:10:18:xx:xx:xx

      Aggregator ID: 2

      Slave queue ID: 0

       

       

      I asked the networking guy to verify that these ports are indeed set for etherchanneling, and he provided this in response:

       

      c#sho etherchannel summary

      Flags:  D - down        P - bundled in port-channel

      I - stand-alone s - suspended

      H - Hot-standby (LACP only)

      R - Layer3      S - Layer2

      U - in use      f - failed to allocate aggregator

       

      M - not in use, minimum links not met

      u - unsuitable for bundling

      w - waiting to be aggregated

      d - default port

       

       

      Number of channel-groups in use: 9

      Number of aggregators:           9

       

      Group  Port-channel Protocol    Ports

      ------+-------------+-----------+-----------------------------------------------

      2      Po2(SU)      -        Te3/3(P)    Te3/4(P)

      11     Po11(SU)     -        Gi1/1(P)    Gi2/1(P)

      12     Po12(SU)     -        Gi1/2(P)    Gi2/2(P)

      13     Po13(SU)     -        Gi5/1(P)    Gi6/1(P)

      14     Po14(SU)     -        Gi5/2(P)    Gi6/2(P)

      15     Po15(SU)     -        Gi1/3(P)    Gi2/3(P)

      16     Po16(SU)     -        Gi1/4(P)    Gi2/4(P)

      17     Po17(SU)     -        Gi5/3(P)    Gi6/3(P)

      18     Po18(SD)     -        Gi5/4(D)    Gi6/4(D)

       

      em1 and p3p1 go to card 1 and card 2 port 4 so they are etherchanneled.
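      For what it's worth, here is the quick check I have been running after each change to see what the bonding driver thinks of the LACP negotiation (just a grep over the /proc output shown above); the all-zero Partner Mac Address and the two different Aggregator IDs are what have me puzzled:

       

      egrep 'Aggregator ID|Partner Mac|Number of ports' /proc/net/bonding/bond0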

       

      If someone has any ideas, we are grasping at straws here.

       

      Thanks,

      Matt

        • 1. Re: 802.3ad Bonding issues with OEL 6.1
          Dude!

          What is updelay=40000 supposed to accomplish? Do you have LACP enabled on your switch?
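      For comparison, on a Cisco switch LACP usually means the member ports are configured with channel-group mode active (or passive) rather than mode on; roughly like this (only a sketch, the interface and group numbers are made up and would have to match your environment):

       

      interface GigabitEthernet1/4
       channel-group 16 mode active

       

      With a static etherchannel (mode on) the switch never sends LACPDUs, so the 802.3ad negotiation on the Linux side has nothing to talk to, which would fit the all-zero Partner Mac Address in your /proc output.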

          • 2. Re: 802.3ad Bonding issues with OEL 6.1
            msimm29

            We did some testing back when we were running mode 0.  When I pulled one network cable, waited a bit, and plugged it back in, we would get a bunch of ping timeouts for about 30 seconds.  It was determined that the bond was putting the network card back into service before the card had finished initializing.  Adding this updelay fixed it and made plugging the cable back in seamless.

             

            I sent an email off to network support asking whether the port group is in LACP or Static Persistence.  I have a feeling it's not in LACP, because I don't see it listed under Protocol in his etherchannel summary.
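            In the meantime, one thing I can check from the Linux side is whether the switch is sending any LACPDUs at all. LACP uses the Slow Protocols ethertype 0x8809, so something like this (just a sketch) should show whether the switch is replying or whether we only see our own frames going out:

             

            tcpdump -e -nn -i em1 ether proto 0x8809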

            • 3. Re: 802.3ad Bonding issues with OEL 6.1
              Dude!

              It was determined that the bond was putting the network card back into service before the card had finished initializing.

              How was this determined? It sounds very strange. How is it possible that a NIC is UP before it finished initializing and negotiating the physical layer?

               

              Perhaps you need to disable spanning tree or enable PortFast or the equivalent. PortFast minimizes the time it takes for the server or workstation to come online.
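              On Cisco gear that is typically just one line on the server-facing interface (or on the corresponding port-channel), for example (a sketch, the interface name is illustrative):

               

              interface GigabitEthernet1/4
               spanning-tree portfast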

              • 4. Re: 802.3ad Bonding issues with OEL 6.1
                WadhahDaouehi

                Hi,

                Mode 4:

                To use 802.3ad (mode 4) bonding under Linux, you must use a switch that supports 802.3ad. If you do not have such a switch, you can use mode 6 under Linux, which offers much the same functionality as mode 4.
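                Switching the bond over would only need a change to BONDING_OPTS in ifcfg-bond0 and a restart of the network service, for example (a sketch only, keeping the rest of the file from the original post unchanged):

                 

                BONDING_OPTS="mode=6 miimon=100 updelay=40000"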


                I hope this can help you

                Best Regards



                • 5. Re: 802.3ad Bonding issues with OEL 6.1
                  Dude!

                  Many people use mode 6 (ALB) because it is an easy and compatible method for fault tolerance and load balancing. However, load balancing under mode 6 is achieved through ARP negotiation, which is certainly not the same as mode 4 and does not provide the same performance. A single connection under mode 6 only uses one NIC, whereas under mode 4 multiple devices share one physical address. Mode 4 requires a managed switch with LACP (802.3ad) support and that all NICs are plugged into the same switch.
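                  Also note that even with working LACP, a single TCP connection still rides on one slave; the transmit hash policy only decides how different flows are spread across them. The default layer2 policy (visible in the /proc output above) hashes on MAC addresses, so traffic that all goes through one gateway lands on one NIC. A common alternative, shown only as a sketch and not as a recommendation for this particular setup:

                   

                  BONDING_OPTS="mode=4 miimon=100 updelay=40000 lacp_rate=slow xmit_hash_policy=layer3+4"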

                  • 6. Re: 802.3ad Bonding issues with OEL 6.1
                    WadhahDaouehi

                    Hi,

                    Thank you for your clarification, but as far as I know, mode 6 requires that the slave NIC driver support changing the MAC address while the interface is up, so the slave NIC driver should be one of these drivers (e100, e1000, tg3, bnx2, b44, forcedeth).
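                    You can check which driver each slave uses with ethtool; the bnx2 shown in the log in the original post is on that list. For example (assuming ethtool is installed):

                     

                    ethtool -i em1 | grep '^driver'
                    ethtool -i p3p1 | grep '^driver'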

                     

                    Best Regards.

                    • 7. Re: 802.3ad Bonding issues with OEL 6.1
                      Dude!

                      Are you sure? The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves. (http://wiki.centos.org/TipsAndTricks/BondingInterfaces)

                      • 8. Re: 802.3ad Bonding issues with OEL 6.1
                        WadhahDaouehi

                        Hi,

                        Regarding the requirement that the slave NIC driver support changing its MAC address while the interface is up, I found this in the book "Linux Solutions de Haute Disponibilité" by Sébastien ROHAUT. The book is in French; translated, the relevant passage reads:

                        Mode 6, or balance-alb (Adaptive Load Balancing): this mode builds on mode 5 (balance-tlb) and adds RLB (Receive Load Balancing) for IPv4 traffic. The ARP tables are modified so that ARP sees all of the network interfaces as a single one, for incoming as well as outgoing traffic; the address is rewritten on the fly on each slave interface by the bonding driver. This mode requires no particular configuration on the switch. It is the only one in this case that offers increased bandwidth, load balancing and fault tolerance all at once. However, the drivers of the slave cards must support changing their MAC address once the port is up, which is not always the case.

                         

                        To sum up, the active-backup, balance-tlb and balance-alb modes do not require any particular switch. The last mode is the most interesting, but it requires drivers that support it (for example the e100, e1000, tg3, bnx2, b44 and forcedeth drivers).

                         

                        I have translated the passage into English; if you read French, the original wording in the book may be clearer.

                         

                        I hope this can help you

                        Best Regards