8 Replies Latest reply: Feb 26, 2013 7:32 AM by user12273962

    Network performance after upgrade to 3.1.1

    fraze2001
      Recently upgraded two VM Servers in a pool from 3.0.3 to 3.1.1, and now the network performance of all the guests has bombed. All scp/sftp transfers to the guests stall.

      The interfaces on the VM servers all look good, so I'm not sure where to start troubleshooting:
      [root@dcporav05 ~]# ethtool eth1
      Settings for eth1:
              Supported ports: [ FIBRE ]
              Supported link modes:   1000baseT/Full
                                      2500baseX/Full
                                      10000baseT/Full
              Supports auto-negotiation: Yes
              Advertised link modes:  1000baseT/Full
                                      2500baseX/Full
                                      10000baseT/Full
              Advertised pause frame use: Symmetric Receive-only
              Advertised auto-negotiation: Yes
              Speed: 1000Mb/s
              Duplex: Full
              Port: FIBRE
              PHYAD: 17
              Transceiver: internal
              Auto-negotiation: on
              Supports Wake-on: g
              Wake-on: g
              Current message level: 0x00000000 (0)
              Link detected: yes
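
      Beyond the link settings, the error/drop counters and the current offload settings can also be pulled without changing anything (a rough sketch only; eth1 just matches the interface above):
      # Per-interface error and drop counters
      ip -s link show eth1
      ethtool -S eth1 | egrep -i 'err|drop|discard'
      # Current offload settings, kept for comparison later
      ethtool -k eth1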
      Any help appreciated.

      Thanks,
      Fraze
        • 1. Re: Network performance after upgrade to 3.1.1
          Rob
          Is non-encrypted traffic also slow?
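
          A quick way to compare is a raw transfer with nc (a rough sketch; it assumes nc is installed on both ends and an arbitrary port such as 5001 is reachable, and some netcat variants want -l -p 5001 rather than -l 5001):
          # On the guest (receiver), discard whatever arrives on port 5001
          nc -l 5001 > /dev/null
          # On the sending host, push ~100 MB of zeros through it; guest-vm is a placeholder.
          # dd prints the throughput when it finishes (Ctrl-C the nc side if it lingers)
          dd if=/dev/zero bs=1M count=100 | nc guest-vm 5001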

          I ran into a problem on OEL6 guests in which encrypted traffic was horribly slow (timing out or maxing out at 1 Kbps). Support couldn't come up with a root cause, so I rebooted the guests and the problems went away.

          Rob
          • 2. Re: Network performance after upgrade to 3.1.1
            fraze2001
            Thanks Rob,

             Looks like unencrypted traffic is suffering the same fate. I've rebooted the VMs with no luck, and new VMs created from templates show the same behaviour.

             I'm getting the same issues scp'ing to/from other physical Linux machines on the network, but weirdly, sftp to the affected guests from my PC works fine. So in order to copy a big file from a machine to one of the guests I can:

            1. sftp the file to my local PC from the physical machine
             2. sftp the file from my PC to the guest

            I can also ssh/sftp to the VM servers themselves with no issue.

            I've got an SR open, but haven't heard anything yet.

            Thanks,
            Fraze
            • 3. Re: Network performance after upgrade to 3.1.1
              Rob
              Are you getting packet loss between guests and the outside world?

              What does your network configuration look like? Are you using Adaptive Load Balancing, by any chance?
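
              It might also be worth checking whether the stall depends on transfer size (a rough sketch; the file paths and the guest-vm address are just placeholders):
              # Create a small and a large test file, then try copying both to an affected guest
              dd if=/dev/zero of=/tmp/small.bin bs=1K count=64
              dd if=/dev/zero of=/tmp/big.bin bs=1M count=64
              scp /tmp/small.bin /tmp/big.bin root@guest-vm:/tmp/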
              • 4. Re: Network performance after upgrade to 3.1.1
                fraze2001
                 There's nothing special about the network config: just one NIC, no load balancing, etc. I can happily ping the outside world from the guests with no packet loss:
                [root@dcporavd01 ~]# ping www.google.com
                PING www.google.com (74.125.237.20) 56(84) bytes of data.
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=1 ttl=54 time=3.25 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=2 ttl=54 time=3.13 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=3 ttl=54 time=3.13 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=4 ttl=53 time=2.83 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=5 ttl=53 time=2.84 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=6 ttl=54 time=3.13 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=7 ttl=53 time=2.97 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=8 ttl=54 time=3.14 ms
                64 bytes from syd01s04-in-f20.1e100.net (74.125.237.20): icmp_seq=9 ttl=53 time=2.93 ms
                
                --- www.google.com ping statistics ---
                9 packets transmitted, 9 received, 0% packet loss, time 8000ms
                rtt min/avg/max/mdev = 2.832/3.042/3.252/0.142 ms
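
                 Default pings only carry a tiny payload, though; a full-size, don't-fragment ping is closer to what the failing transfers send and would expose an MTU mismatch (a sketch; 1472 assumes a 1500-byte MTU, and guest-vm is a placeholder for an affected guest or its scp peer):
                 # 1472 bytes of ICMP payload + 28 bytes of headers = one full 1500-byte packet
                 ping -c 5 -s 1472 -M do guest-vm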
                • 5. Re: Network performance after upgrade to 3.1.1
                  Rob
                  I would be interested to hear what support finally says. Good luck...
                  • 6. Re: Network performance after upgrade to 3.1.1
                    fraze2001
                    Cheers, think I'll need it :) - I'll update the thread with the findings
                    • 7. Re: Network performance after upgrade to 3.1.1
                      fraze2001
                       The issue was resolved by turning off large-receive-offload (lro) on the VM servers, on the interfaces that the bridge is built on. For reference, the offload settings on those interfaces before the change:
                       # ethtool -k eth2
                       Offload parameters for eth2:
                       rx-checksumming: on
                       tx-checksumming: on
                       scatter-gather: on
                       tcp-segmentation-offload: on
                       udp-fragmentation-offload: off
                       generic-segmentation-offload: on
                       generic-receive-offload: on
                       large-receive-offload: on
                      
                       # ethtool -k eth3
                       Offload parameters for eth3:
                       rx-checksumming: on
                       tx-checksumming: on
                       scatter-gather: on
                       tcp-segmentation-offload: on
                       udp-fragmentation-offload: off
                       generic-segmentation-offload: on
                       generic-receive-offload: on
                       large-receive-offload: on
                      
                       The advice that came back on the SR: large receive offload (lro) has been known to be problematic for Oracle VM Server bridged 10GbE bonds/interfaces.
                       In this case, bridge 0004fb001018a8e exists atop bond1, which utilises interfaces eth[2,3].
                       Given that, the initial recommendation was to disable lro for at least eth[2,3], then re-test network performance.
                       To disable lro on eth[2,3], add the setting to the relevant interface configuration files on the Oracle VM Servers:
                       # for i in 2 3; do echo 'ETHTOOL_OFFLOAD_OPTS="lro off"' >> /etc/sysconfig/network-scripts/ifcfg-eth$i; done
                       Once that is in place, shut down (or migrate off) all guest VMs, restart the Oracle VM Server, then start up (or migrate back) the guest VMs.
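
                       For a quick check before committing to the config-file change and reboot, lro can usually also be toggled at runtime (a sketch; the change does not persist across reboots, and whether the driver accepts it varies):
                       # Turn lro off immediately on both bond slaves, then confirm the new state
                       for i in 2 3; do ethtool -K eth$i lro off; done
                       ethtool -k eth2 | grep large-receive-offload
                       ethtool -k eth3 | grep large-receive-offload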
                       I was also pointed to a network performance tuning document for OVM 3:

                      http://www.oracle.com/technetwork/server-storage/vm/ovm3-10gbe-perf-1900032.pdf
                      • 8. Re: Network performance after upgrade to 3.1.1
                        user12273962
                        Thanks for sharing.