This discussion is archived
26 Replies. Latest reply: Oct 25, 2012 7:22 AM by user386688
  • 15. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    /etc/sysctl.conf:
    ### IPV4 specific settings

    net.ipv4.tcp_timestamps = 0
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_rmem = 10000000 10000000 10000000
    net.ipv4.tcp_wmem = 10000000 10000000 10000000
    net.ipv4.tcp_mem = 10000000 10000000 10000000
    net.core.rmem_max = 524287
    net.core.wmem_max = 524287
    net.core.rmem_default = 524287
    net.core.wmem_default = 524287
    net.core.optmem_max = 524287
    net.core.netdev_max_backlog = 300000
    net.bridge.bridge-nf-call-ip6tables = 0
    net.bridge.bridge-nf-call-iptables = 0
    net.bridge.bridge-nf-call-arptables = 0
    net.ipv4.tcp_window_scaling = 1
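
One thing worth checking in a listing like the one above is units. The kernel's ip-sysctl documentation gives net.ipv4.tcp_rmem and net.ipv4.tcp_wmem in bytes, but net.ipv4.tcp_mem in memory pages, so the same number means very different things on those lines. A minimal sketch of the conversion, assuming a 4096-byte page (the usual x86_64 value; `getconf PAGESIZE` reports the real one) and a helper name, tcp_mem_pages_to_mib, that is made up for illustration:

```shell
#!/bin/sh
# Convert a net.ipv4.tcp_mem value (counted in memory pages) to MiB.
# $1 = number of pages
# $2 = page size in bytes (optional; 4096 is an assumed default)
tcp_mem_pages_to_mib() {
    pages=$1
    page_size=${2:-4096}
    # Integer arithmetic: the result is rounded down to whole MiB.
    echo $(( pages * page_size / 1024 / 1024 ))
}

# Example call with the tcp_mem value from the listing above:
#   tcp_mem_pages_to_mib 10000000
```

With the 10000000 from the listing this prints 39062, i.e. roughly 38 GiB, which is far more than the 3118M given to Dom0 in the grub.conf below.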

    grub.conf (We've allocated extra RAM for Dom0):
    title Oracle VM Server (2.6.39-200.29.2.el5uek)
    root (hd0,0)
    kernel /xen.gz dom0_mem=3118M
    module /vmlinuz-2.6.39-200.29.2.el5uek ro root=UUID=b29312f6-9437-4185-be53-75eb576b9bd5
    module /initrd-2.6.39-200.29.2.el5uek.img
  • 16. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    The machines are Dell R710s with dual Xeon X5550 CPUs and 192GB RAM.


    [root@pprodovmsvr01 ~]# modinfo ixgbe
    filename: /lib/modules/2.6.39-200.29.2.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
    version: 3.6.7-k
    license: GPL
    description: Intel(R) 10 Gigabit PCI Express Network Driver
    author: Intel Corporation, <linux.nics@intel.com>
    srcversion: 780FD4AF454545ABE4FCE06

    With OVM 2.2.1 and an updated kernel/ixgbe driver we were seeing on the order of 8-9Gb/sec throughput. With ESX we are seeing 9-9.5Gb/sec throughput. All on the same hardware. We expected the same sort of performance with a later driver/kernel.

    Edited by: Julian Mccoy-Daly on Oct 24, 2012 3:33 PM
  • 17. Re: OMV 3.1.1 10GB nic issues
    user157995 Explorer
    Thanks for sharing all of that. We have a similar Nexus setup, but nowhere near the amount of tweaking you have (GSO disabled, sysctl changes). Were those changes something you put in yourself, or were they recommended by Oracle?
  • 18. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    No problem.

    It's a combination of stuff learnt in the past and suggested by Oracle in SRs...

    Incidentally, what are you using to do your network performance testing? Have you tweaked the IP stack in the DomUs?

    Cheers

    Julian
  • 19. Re: OMV 3.1.1 10GB nic issues
    user157995 Explorer
    We haven't really done any optimization on either side (Dom0 or DomU). We don't really seem to be having a throughput issue; rather, we are experiencing very random (yet catastrophic) server hangs/reboots that we thought were somehow network related. Our DomUs are 80% Windows and 20% OEL.
  • 20. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    Is this a recent move to 10GbE? Was everything stable before? Are your server issues at the Dom0 level or DomU?

    You have PV'd your Linux boxen and added the PV drivers for Windows? Without those, you will only get 100Mb/sec network adapters. Or at least that was the case in OVM 2 a couple of years ago.

    We hardly have any Windows boxes on OVM at all, but there is a utility to tune the IP stack out there somewhere. It is easy to starve 10GbE of IO if memory allocation isn't sufficient. Run netstat -s -t and look for dropped/lost/collapsed packets.
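
A small sketch of that netstat check, for anyone who wants to script it. The function name tcp_trouble_counters is made up, and the pattern list is an assumption: it matches "drop", "lost" and "collaps" as mentioned above, plus "prun" because the Linux TCP statistics also report packets pruned from the receive queue.

```shell
#!/bin/sh
# Filter TCP statistics read on stdin down to the counters that usually
# point at buffer or memory pressure. The match is case-insensitive.
tcp_trouble_counters() {
    grep -Ei 'drop|lost|collaps|prun'
}

# Typical use on Dom0 or in a DomU (not executed here):
#   netstat -s -t | tcp_trouble_counters
```

Running it before and after a netperf or iperf run and comparing the numbers shows whether a counter is still climbing or is just left over from an earlier event.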

    For the Linux guests, we have the following in sysctl.conf:

    net.ipv4.tcp_rmem=4096 524288 16777216
    net.ipv4.tcp_wmem=4096 524288 16777216
    net.ipv4.tcp_mem=16384 16384 16384
    net.ipv4.ipfrag_high_thresh=524288
    net.ipv4.ipfrag_low_thresh=393216
    net.ipv4.tcp_timestamps=0
    net.ipv4.tcp_sack=0
    net.ipv4.tcp_window_scaling=1
    net.core.optmem_max=524287
    net.core.netdev_max_backlog=2500
    sunrpc.tcp_slot_table_entries=128
    sunrpc.udp_slot_table_entries=128
    net.ipv4.tcp_keepalive_time = 300
    net.ipv4.tcp_keepalive_probes = 3
    net.ipv4.tcp_keepalive_intvl = 20
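
As a rough sanity check on buffer sizes such as the tcp_rmem/tcp_wmem maximum of 16777216 bytes above, the bandwidth-delay product gives the amount of data that has to be in flight to keep a link full. This is a sketch only: the function name bdp_bytes is hypothetical, and the 1 ms round-trip time in the example call is an assumed figure, not a measurement from this thread.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# $1 = link speed in Gbit/s (decimal, 1 Gbit = 1000000000 bits)
# $2 = round-trip time in microseconds
bdp_bytes() {
    gbps=$1
    rtt_us=$2
    # bits per second -> bytes per second, then scale by the RTT in seconds
    echo $(( gbps * 1000000000 / 8 * rtt_us / 1000000 ))
}

# Example call for 10GbE with an assumed 1 ms (1000 us) round trip:
#   bdp_bytes 10 1000
```

For 10GbE at 1 ms this comes to 1250000 bytes, so a 16 MB maximum leaves plenty of headroom on a LAN; the figure only becomes tight at much longer round-trip times.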

    Is everything connected via the Nexus? Are you using jumbo frames (these always seem to have caused us issues in the past)? I use tools like netperf, iperf and NetApp's sio to check on performance, both at Dom0 and DomU level.

    Edited by: Julian Mccoy-Daly on Oct 25, 2012 7:27 AM
  • 21. Re: OMV 3.1.1 10GB nic issues
    user157995 Explorer
    We moved to 10Gb at the same time we moved from OVM 2.2 to 3.0 (now 3.1).

    We've never had stellar stability with OVM in general; we've always experienced some form of random lockup/crash of a Dom0. It seems to be observed most often on higher-loaded (not overloaded) Dom0s.

    Of course we use non-certified hardware (HP DL585, quad Opteron 8439 SEs, 192GB RAM, Intel 2-port 10Gb NIC, dual-port Brocade FC HBA (EMC SAN)). While we haven't been denied support, over the last 2 years Oracle has always pointed that out, and has never got to the bottom of the crashes.
  • 22. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    Without knowing your setup in more detail it's difficult to know where to suggest you start looking. Were the crashes genuine crashes or heartbeat ring fencing? In OVM2, if your heartbeat file was on an NFS store, it was fairly easy to trigger a ring fence in which the console message (ocfs is sorry to be ring fencing your machine) never appeared in the netconsole output. That assumes your o2cb timeout was at the default of 31, or even 61. Moving to 121 solved that particular issue, but we also performed a lot of remediation to reduce IO on the filers at the same time.
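
For reference, and assuming the 31/61/121 figures are O2CB_HEARTBEAT_THRESHOLD values, the OCFS2 documentation relates the threshold to the fencing timeout as (threshold - 1) * 2 seconds, because the disk heartbeat runs every 2 seconds. A small sketch of that arithmetic (the function name is made up here):

```shell
#!/bin/sh
# Seconds a node may miss the disk heartbeat before it fences itself,
# for a given O2CB_HEARTBEAT_THRESHOLD value.
# $1 = threshold, counted in 2-second heartbeat iterations
o2cb_threshold_to_seconds() {
    threshold=$1
    echo $(( (threshold - 1) * 2 ))
}

# Example calls for the values mentioned in this thread:
#   o2cb_threshold_to_seconds 31
#   o2cb_threshold_to_seconds 61
#   o2cb_threshold_to_seconds 121
```

Under that assumption the three settings work out to 60, 120 and 240 seconds respectively.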

    We've not used OVM 3.anything in production yet. Our move to 10GbE was earlier this year, with new hardware which has only been partially implemented while we have been testing and ironing out the, er, wrinkles :)
  • 23. Re: OMV 3.1.1 10GB nic issues
    user157995 Explorer
    Oracle has suggested we increase our timeouts to 120 as well (however, that's a very large outage that we can't easily take, being a 24/7 shop). The problem is that most of the time we have inadequate measures in place to capture why the machine rebooted (OVM 3.1 doesn't support kdump!) and we don't have an easy way to grab serial console logs. In most cases, when a server crashes/freezes, our Nexus ports start displaying a high rate of pause frames (and eventually err-disable themselves).

    We use OVM 3.1 for all of our production VM stuff (right now 81 DomUs on 4 Dom0s). We still have 2.2 lingering at other locations, but it is slated for an upgrade (maybe when 3.2 comes to a head).
  • 24. Re: OMV 3.1.1 10GB nic issues
    user157995 Explorer
    What rev of 3.1 are you testing? We have a different ixgbe version:

    [root@amralbvh12 ~]# modinfo ixgbe
    filename: /lib/modules/2.6.39-200.1.4.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
    version: 3.4.8-k
    license: GPL
    description: Intel(R) 10 Gigabit PCI Express Network Driver
    author: Intel Corporation, <linux.nics@intel.com>
    srcversion: 0297D812698076408BAE04A
    alias: pci:v00008086d0000154Fsv*sd*bc*sc*i*
    alias: pci:v00008086d0000154Dsv*sd*bc*sc*i*
    alias: pci:v00008086d00001528sv*sd*bc*sc*i*
    alias: pci:v00008086d000010F8sv*sd*bc*sc*i*
    alias: pci:v00008086d0000151Csv*sd*bc*sc*i*
    alias: pci:v00008086d00001529sv*sd*bc*sc*i*
    alias: pci:v00008086d0000152Asv*sd*bc*sc*i*
    alias: pci:v00008086d000010F9sv*sd*bc*sc*i*
    alias: pci:v00008086d00001514sv*sd*bc*sc*i*
    alias: pci:v00008086d00001507sv*sd*bc*sc*i*
    alias: pci:v00008086d000010FBsv*sd*bc*sc*i*
    alias: pci:v00008086d00001517sv*sd*bc*sc*i*
    alias: pci:v00008086d000010FCsv*sd*bc*sc*i*
    alias: pci:v00008086d000010F7sv*sd*bc*sc*i*
    alias: pci:v00008086d00001508sv*sd*bc*sc*i*
    alias: pci:v00008086d000010DBsv*sd*bc*sc*i*
    alias: pci:v00008086d000010F4sv*sd*bc*sc*i*
    alias: pci:v00008086d000010E1sv*sd*bc*sc*i*
    alias: pci:v00008086d000010F1sv*sd*bc*sc*i*
    alias: pci:v00008086d000010ECsv*sd*bc*sc*i*
    alias: pci:v00008086d000010DDsv*sd*bc*sc*i*
    alias: pci:v00008086d0000150Bsv*sd*bc*sc*i*
    alias: pci:v00008086d000010C8sv*sd*bc*sc*i*
    alias: pci:v00008086d000010C7sv*sd*bc*sc*i*
    alias: pci:v00008086d000010C6sv*sd*bc*sc*i*
    alias: pci:v00008086d000010B6sv*sd*bc*sc*i*
    depends: mdio,dca
    vermagic: 2.6.39-200.1.4.el5uek SMP mod_unload modversions
    parm: max_vfs:Maximum number of virtual functions to allocate per physical function (uint)
    [root@amralbvh12 ~]#
  • 25. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    Dave wrote:
    Oracle has suggested we increase our timeouts to 120 as well (however, that's a very large outage that we can't easily take, being a 24/7 shop). The problem is that most of the time we have inadequate measures in place to capture why the machine rebooted (OVM 3.1 doesn't support kdump!) and we don't have an easy way to grab serial console logs. In most cases, when a server crashes/freezes, our Nexus ports start displaying a high rate of pause frames (and eventually err-disable themselves).
    I'm pretty sure we managed to change the timeout without taking everything down, at least we did on 2.2..
  • 26. Re: OMV 3.1.1 10GB nic issues
    user386688 Newbie
    Dave wrote:
    What rev of 3.1 are you testing? We have a different ixgbe version:

    [root@amralbvh12 ~]# modinfo ixgbe
    filename: /lib/modules/2.6.39-200.1.4.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
    version: 3.4.8-k
    license: GPL
    description: Intel(R) 10 Gigabit PCI Express Network Driver
    author: Intel Corporation, <linux.nics@intel.com>
    srcversion: 0297D812698076408BAE04A

    Hmmm, ours is later than yours, I think :)

    We were provided an updated kernel version in response to a (still open) SR created when we first tried 3.1.1

    [root@ssctestovmsvr04 ~]# cat /etc/ovs-release
    Oracle VM server release 3.1.1
    [root@ssctestovmsvr04 ~]# uname -a
    Linux ssctestovmsvr04 2.6.39-200.29.2.el5uek #1 SMP Sat Jul 14 10:42:52 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

    [root@ssctestovmsvr04 ~]# modinfo ixgbe
    filename: /lib/modules/2.6.39-200.29.2.el5uek/kernel/drivers/net/ixgbe/ixgbe.ko
    version: 3.6.7-k
    license: GPL
    description: Intel(R) 10 Gigabit PCI Express Network Driver
    author: Intel Corporation, <linux.nics@intel.com>
    srcversion: 780FD4AF454545ABE4FCE06
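
For anyone comparing hosts the same way, here is a sketch that pulls the version field out of modinfo output and compares two driver versions numerically. Both function names are invented for this example, and dropping the "-k" style suffix before comparing is an assumption made to keep the comparison simple.

```shell
#!/bin/sh
# Print the value of the "version:" line from modinfo output on stdin.
# An exact match on the first field skips "srcversion:" and "vermagic:".
modinfo_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Succeed (exit status 0) if dotted version $1 is strictly newer than $2.
# Anything from the first "-" onwards (e.g. "-k") is ignored.
version_gt() {
    a=${1%%-*}
    b=${2%%-*}
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}
        x=${x:-0}; y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the component just compared from each version string.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1
}

# Typical use (not executed here):
#   mine=$(modinfo ixgbe | modinfo_version)
#   version_gt "$mine" 3.4.8-k && echo "local ixgbe is newer"
```

With the two versions posted in this thread, version_gt 3.6.7-k 3.4.8-k succeeds and the reverse comparison fails.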