26 Replies Latest reply: Oct 25, 2012 9:22 AM by user386688
      • 15. Re: OMV 3.1.1 10GB nic issues
        user386688
        /etc/sysctl.conf:
        ### IPV4 specific settings

        net.ipv4.tcp_timestamps = 0
        net.ipv4.tcp_sack = 0
        net.ipv4.tcp_rmem = 10000000 10000000 10000000
        net.ipv4.tcp_wmem = 10000000 10000000 10000000
        net.ipv4.tcp_mem = 10000000 10000000 10000000
        net.core.rmem_max = 524287
        net.core.wmem_max = 524287
        net.core.rmem_default = 524287
        net.core.wmem_default = 524287
        net.core.optmem_max = 524287
        net.core.netdev_max_backlog = 300000
        net.bridge.bridge-nf-call-ip6tables = 0
        net.bridge.bridge-nf-call-iptables = 0
        net.bridge.bridge-nf-call-arptables = 0
        net.ipv4.tcp_window_scaling = 1

        grub.conf (We've allocated extra RAM for Dom0):
        title Oracle VM Server (2.6.39-200.29.2.el5uek)
        root (hd0,0)
        kernel /xen.gz dom0_mem=3118M
        module /vmlinuz-2.6.39-200.29.2.el5uek ro root=UUID=b29312f6-9437-4185-be53-75eb576b9bd5
        module /initrd-2.6.39-200.29.2.el5uek.img
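One detail that may be worth checking against the sysctl.conf above: net.ipv4.tcp_mem is counted in memory pages, not bytes, unlike tcp_rmem and tcp_wmem. A minimal sketch of the conversion, assuming the usual 4 KiB page size on x86_64 (the helper name tcp_mem_bytes is made up for illustration):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def tcp_mem_bytes(setting, page_size=PAGE_SIZE):
    """Convert a 'min pressure max' tcp_mem string from pages to bytes."""
    return [int(pages) * page_size for pages in setting.split()]


# Dom0 value from the sysctl.conf above
dom0 = tcp_mem_bytes("10000000 10000000 10000000")
print([round(b / 2**30, 1) for b in dom0])  # each threshold in GiB
```

If that reading is right, each threshold works out to roughly 38 GiB, far above the dom0_mem=3118M allocation, so the TCP memory limit would effectively never be reached in Dom0.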
        • 16. Re: OMV 3.1.1 10GB nic issues
          user386688
          The machines are Dell R710s with dual Xeon X5550s and 192GB RAM.


          [root@pprodovmsvr01 ~]# modinfo ixgbe
          filename: /lib/modules/2.6.39-200.29.2.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
          version: 3.6.7-k
          license: GPL
          description: Intel(R) 10 Gigabit PCI Express Network Driver
          author: Intel Corporation, <linux.nics@intel.com>
          srcversion: 780FD4AF454545ABE4FCE06

          With OVM 2.2.1 and an updated kernel/ixgbe driver we were seeing on the order of 8-9Gb/sec throughput. With ESX we are seeing 9-9.5Gb/sec throughput, all on the same hardware. We were expecting the same sort of performance with a later driver/kernel.

          Edited by: Julian Mccoy-Daly on Oct 24, 2012 3:33 PM
          • 17. Re: OMV 3.1.1 10GB nic issues
            Dave Smulsky
            Thanks for sharing all of that. We have a similar Nexus setup, but we have nowhere near the amount of tweaking you have (GSO disabled, sysctl changes). Were those something you put in yourself, or were they recommended by Oracle?
            • 18. Re: OMV 3.1.1 10GB nic issues
              user386688
              No problem.

              It's a combination of stuff learnt in the past and suggested by Oracle in SRs...

              Incidentally, what are you using to do your network performance testing? Have you tweaked the IP stack in the DomUs?

              Cheers

              Julian
              • 19. Re: OMV 3.1.1 10GB nic issues
                Dave Smulsky
                We haven't really done any optimization on either side (Dom0 or DomU). We don't really seem to be having a throughput issue; rather, we are experiencing very random (yet catastrophic) server hangs/reboots that we thought were somehow network related. Our DomUs are 80% Windows and 20% OEL.
                • 20. Re: OMV 3.1.1 10GB nic issues
                  user386688
                  Is this a recent move to 10GbE? Was everything stable before? Are your server issues at the Dom0 level or DomU?

                  You have PV'd your Linux boxen and added the PV drivers for Windows? Without those, you will only get 100Mb/sec network adapters. Or at least that was the case in OVM 2 a couple of years ago.

                  We hardly have any Windows boxes on OVM at all, but there is a utility to tune the IP stack out there somewhere. It is easy to starve 10GbE of IO if memory allocation isn't sufficient. Run netstat -s -t and look for dropped/lost/collapsed packets.
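A rough sketch of that check, assuming Python 3.7+ and net-tools netstat on the box. The keyword list and the function names are illustrative choices, not anything Oracle supplies:

```python
import subprocess

# Substrings matching the "dropped/lost/collapsed" counters mentioned above
KEYWORDS = ("drop", "lost", "collaps")


def suspicious_lines(netstat_output, keywords=KEYWORDS):
    """Return the counter lines that mention any keyword (case-insensitive)."""
    hits = []
    for line in netstat_output.splitlines():
        text = line.strip()
        if text and any(k in text.lower() for k in keywords):
            hits.append(text)
    return hits


def run_netstat():
    """Run 'netstat -s -t' and return its output as text."""
    result = subprocess.run(
        ["netstat", "-s", "-t"], capture_output=True, text=True, check=True
    )
    return result.stdout

# Usage on a live host:
#   for line in suspicious_lines(run_netstat()):
#       print(line)
```

Running it twice a few minutes apart and comparing the counters shows whether the numbers are still climbing under load.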

                  For the Linux guests, we have the following in sysctl.conf:

                  net.ipv4.tcp_rmem=4096 524288 16777216
                  net.ipv4.tcp_wmem=4096 524288 16777216
                  net.ipv4.tcp_mem=16384 16384 16384
                  net.ipv4.ipfrag_high_thresh=524288
                  net.ipv4.ipfrag_low_thresh=393216
                  net.ipv4.tcp_timestamps=0
                  net.ipv4.tcp_sack=0
                  net.ipv4.tcp_window_scaling=1
                  net.core.optmem_max=524287
                  net.core.netdev_max_backlog=2500
                  sunrpc.tcp_slot_table_entries=128
                  sunrpc.udp_slot_table_entries=128
                  net.ipv4.tcp_keepalive_time = 300
                  net.ipv4.tcp_keepalive_probes = 3
                  net.ipv4.tcp_keepalive_intvl = 20
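As a sanity check on the tcp_rmem/tcp_wmem values above, the bandwidth-delay product gives a rough idea of how much round-trip latency a given buffer can cover at 10Gb/sec. This is only back-of-the-envelope arithmetic; it ignores per-socket overhead, so treat the results as upper bounds. The function names are made up for illustration:

```python
def bdp_bytes(link_bits_per_sec, rtt_seconds):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return link_bits_per_sec * rtt_seconds / 8


def max_rtt_ms(buffer_bytes, link_bits_per_sec):
    """Longest round-trip time (ms) a buffer of this size can keep the link full."""
    return buffer_bytes * 8 / link_bits_per_sec * 1000


LINK = 10e9  # 10GbE, in bits per second

print(round(max_rtt_ms(524288, LINK), 2))    # default buffer from tcp_rmem
print(round(max_rtt_ms(16777216, LINK), 2))  # maximum buffer from tcp_rmem
```

By this estimate the 524288-byte default covers about 0.4 ms of round trip and the 16777216-byte maximum about 13 ms, which is comfortable for a LAN.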

                  Is everything connected via the Nexus? Are you using jumbo frames (these always seem to have caused us issues in the past)? I use tools like netperf, iperf and NetApp's sio to check on performance, both at Dom0 and DomU level.

                  Edited by: Julian Mccoy-Daly on Oct 25, 2012 7:27 AM
                  • 21. Re: OMV 3.1.1 10GB nic issues
                    Dave Smulsky
                    We moved to 10Gb at the same time we moved from OVM 2.2 to 3.0 (now 3.1).

                    We've never had stellar stability with OVM in general; we've always experienced some form of random lockup/crash of a Dom0. It seems to be most frequent on higher-loaded Dom0s (not overloaded).

                    Of course we use non-certified hardware (HP DL585, quad Opteron 8439 SEs, 192GB RAM, Intel 2-port 10Gb NIC, dual-port Brocade FC HBA (EMC SAN)). While we haven't been denied support, over the last 2 years Oracle has always pointed that out, and never got to the bottom of the crashes.
                    • 22. Re: OMV 3.1.1 10GB nic issues
                      user386688
                      Without knowing your setup in more detail it's difficult to know where to suggest you start looking. Were the crashes actual crashes or heartbeat ring fencing? In OVM 2, if your heartbeat file was on an NFS store, it was fairly easy to trigger a machine ring fence where the console message (ocfs is sorry to be ring fencing your machine) never appeared in the netconsole output. That assumes your o2cb timeout was at the default of 31 or even 61. Moving to 121 solved that particular issue, but we also performed a lot of remediation to reduce IO on the filers at the same time.
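For reference, here is a small sketch of what those figures mean in wall-clock terms. It assumes they are O2CB_HEARTBEAT_THRESHOLD values and that the usual OCFS2 relationship applies, where the threshold counts two-second heartbeat iterations and the timeout is (threshold - 1) * 2 seconds. The function name is made up for illustration:

```python
def o2cb_timeout_seconds(threshold):
    """Seconds without a heartbeat before a node fences: (threshold - 1) * 2."""
    return (threshold - 1) * 2


for threshold in (31, 61, 121):
    print(threshold, o2cb_timeout_seconds(threshold))
# prints:
# 31 60
# 61 120
# 121 240
```

Under that assumption, moving from 31 to 121 stretches the window from one minute to four, which gives a busy NFS filer much more time to answer before a node fences itself.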

                      We've not used OVM 3.anything in production yet. Our move to 10GbE was earlier this year with new hardware, which has only been partially implemented while we have been testing and ironing out the, er, wrinkles :)
                      • 23. Re: OMV 3.1.1 10GB nic issues
                        Dave Smulsky
                        Oracle has suggested we increase our timeouts to 120 as well (however, that's a very large outage we can't take easily, being a 24/7 shop). The problem is that most of the time we have inadequate measures in place to capture why the machine rebooted (OVM 3.1 doesn't support kdump!) and we don't have an easy way to grab serial console logs. In most cases, when a server crashes/freezes, our Nexus ports start displaying a high rate of pause frames (and eventually err-disable themselves).

                        We use OVM 3.1 for all of our production VM stuff (right now 81 DomUs on 4 Dom0s). We still have 2.2 lingering at other locations, but it is slated for an upgrade (maybe when 3.2 comes to a head).
                        • 24. Re: OMV 3.1.1 10GB nic issues
                          Dave Smulsky
                          What rev of 3.1 are you testing? We have a difference in ixgbe versions:

                          [root@amralbvh12 ~]# modinfo ixgbe
                          filename: /lib/modules/2.6.39-200.1.4.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
                          version: 3.4.8-k
                          license: GPL
                          description: Intel(R) 10 Gigabit PCI Express Network Driver
                          author: Intel Corporation, <linux.nics@intel.com>
                          srcversion: 0297D812698076408BAE04A
                          alias: pci:v00008086d0000154Fsv*sd*bc*sc*i*
                          alias: pci:v00008086d0000154Dsv*sd*bc*sc*i*
                          alias: pci:v00008086d00001528sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010F8sv*sd*bc*sc*i*
                          alias: pci:v00008086d0000151Csv*sd*bc*sc*i*
                          alias: pci:v00008086d00001529sv*sd*bc*sc*i*
                          alias: pci:v00008086d0000152Asv*sd*bc*sc*i*
                          alias: pci:v00008086d000010F9sv*sd*bc*sc*i*
                          alias: pci:v00008086d00001514sv*sd*bc*sc*i*
                          alias: pci:v00008086d00001507sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010FBsv*sd*bc*sc*i*
                          alias: pci:v00008086d00001517sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010FCsv*sd*bc*sc*i*
                          alias: pci:v00008086d000010F7sv*sd*bc*sc*i*
                          alias: pci:v00008086d00001508sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010DBsv*sd*bc*sc*i*
                          alias: pci:v00008086d000010F4sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010E1sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010F1sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010ECsv*sd*bc*sc*i*
                          alias: pci:v00008086d000010DDsv*sd*bc*sc*i*
                          alias: pci:v00008086d0000150Bsv*sd*bc*sc*i*
                          alias: pci:v00008086d000010C8sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010C7sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010C6sv*sd*bc*sc*i*
                          alias: pci:v00008086d000010B6sv*sd*bc*sc*i*
                          depends: mdio,dca
                          vermagic: 2.6.39-200.1.4.el5uek SMP mod_unload modversions
                          parm: max_vfs:Maximum number of virtual functions to allocate per physical function (uint)
                          [root@amralbvh12 ~]#
                          • 25. Re: OMV 3.1.1 10GB nic issues
                            user386688
                            Dave wrote:
                            Oracle has suggested we increase our timeouts to 120 as well (however, that's a very large outage we can't take easily, being a 24/7 shop). The problem is that most of the time we have inadequate measures in place to capture why the machine rebooted (OVM 3.1 doesn't support kdump!) and we don't have an easy way to grab serial console logs. In most cases, when a server crashes/freezes, our Nexus ports start displaying a high rate of pause frames (and eventually err-disable themselves).
                            I'm pretty sure we managed to change the timeout without taking everything down; at least we did on 2.2.
                            • 26. Re: OMV 3.1.1 10GB nic issues
                              user386688
                              Dave wrote:
                              What rev of 3.1 are you testing? We have a difference in ixgbe versions:

                              [root@amralbvh12 ~]# modinfo ixgbe
                              filename: /lib/modules/2.6.39-200.1.4.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
                              version: 3.4.8-k
                              license: GPL
                              description: Intel(R) 10 Gigabit PCI Express Network Driver
                              author: Intel Corporation, <linux.nics@intel.com>
                              srcversion: 0297D812698076408BAE04A
                              Hmmm, ours is later than yours, I think :)

                              We were provided an updated kernel version in response to a (still open) SR created when we first tried 3.1.1

                              [root@ssctestovmsvr04 ~]# cat /etc/ovs-release
                              Oracle VM server release 3.1.1
                              [root@ssctestovmsvr04 ~]# uname -a
                              Linux ssctestovmsvr04 2.6.39-200.29.2.el5uek #1 SMP Sat Jul 14 10:42:52 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

                              [root@ssctestovmsvr04 ~]# modinfo ixgbe
                              filename: /lib/modules/2.6.39-200.29.2.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
                              version: 3.6.7-k
                              license: GPL
                              description: Intel(R) 10 Gigabit PCI Express Network Driver
                              author: Intel Corporation, <linux.nics@intel.com>
                              srcversion: 780FD4AF454545ABE4FCE06
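When comparing driver levels across several Dom0s, something like the following sketch can pull the version field out of saved modinfo output and compare it numerically. It assumes the version looks like the 3.6.7-k strings shown in this thread; the function name is made up for illustration:

```python
import re


def modinfo_version(modinfo_output):
    """Return the 'version:' field of modinfo output as a tuple of ints."""
    for line in modinfo_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "version":  # exact match, so srcversion/vermagic are skipped
            match = re.match(r"\s*(\d+(?:\.\d+)*)", value)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    return None


ours = modinfo_version("version: 3.6.7-k\nsrcversion: 780FD4AF454545ABE4FCE06")
theirs = modinfo_version("version: 3.4.8-k\nsrcversion: 0297D812698076408BAE04A")
print(ours > theirs)  # tuples compare field by field
```

Comparing tuples rather than strings avoids ordering mistakes such as "3.10" sorting before "3.6".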