
Poor NFS disk performance

Roddy Rodstein Explorer
Greetings,

We are testing OVM 3.1.1 (patched) with NFS repositories (the pool filesystem, the VM files, and dNFS database files) and PVM OL5U8/UEK domUs, and we are seeing slow read and write speeds.

Could you please share your NFS read and write numbers and any NFS performance tips & tricks?

ENV:
Intel 2-socket/32-core x86 servers with 4 x 10G NICs
EMC VNX array with a large number of SSD drives; the storage is connected at 10Gb/s on the same switch as the OVS hosts.
Cisco 10G infrastructure.

The numbers:
1- File transfer using lftp is poor, e.g. 4.66 MB/s. The lftp transfer should be bound by the slowest link, which in this case is 15 MB/s, yet we only get about 1/3 of that.
2- The network speed between two hosts is almost 3 Gbps; the network speed between two guests is almost 2 Gbps.
3- The direct dd read speed from the host on the NFS share is 57 MB/s and the direct dd write speed is 18 MB/s. From a guest, the direct dd read speed on the NFS share is 29 MB/s and the direct dd write speed is 16 MB/s.
4- When we mount an NFS folder on the OVS host and run dd if=./test.zero of=/dev/null bs=16k, we get 130 MB/s. When we run the same command on a domU using the same NFS mount with default options, we only get 9 MB/s (a sketch of an explicit, non-default mount follows this list).
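
For context, a domU remount of the share with explicit options instead of the defaults would look something like this; the export path, mount point, and option values are placeholders, not settings we have validated:

# Hypothetical domU remount with explicit NFS options instead of the defaults
mount -t nfs -o rw,hard,intr,tcp,nfsvers=3,rsize=32768,wsize=32768,noatime filer:/export/repo /mnt/nfstest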

We have been tuning the TCP parameter settings; tuning helps, but the numbers are still poor.
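
For reference, the kind of /etc/sysctl.conf TCP entries we have been experimenting with look like the following; the values shown are illustrative for 10G links, not a validated recommendation:

# Illustrative TCP buffer tuning for 10G links; values are examples, not recommendations
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 30000
# reload with: sysctl -p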

Other NFS posts:
NFS disk performance after upgrade to 3.1.1
Slow disk writes VM 3.1

Thank you for your support!
Roddy
  • 1. Re: Poor NFS disk performance
    jurajl Newbie
    Hi Roddy;
    As far as I know, the common NFS problems were patched in OVM 3.1.1 build 379,
    so the latest OVM 3.1.1 (build 524) should not have problems with NFS.

    However - this is run directly within the NFS repository on dom0 (running the latest OVM 3.1.1):
    dd if=/dev/zero of=testfile bs=1M count=1000 oflag=dsync
    1000+0 records in
    1000+0 records out
    1048576000 bytes (1.0 GB) copied, 15.7741 seconds, 66.5 MB/s

    And here is the domU NFS performance (the domU runs within the dom0 mentioned above; same NFS as above, i.e. actually the NFS repository share mounted into the domU):
    dd if=/dev/zero of=testfile bs=1M count=1000 oflag=dsync
    1000+0 records in
    1000+0 records out
    1048576000 bytes (1.0 GB) copied, 18.4832 seconds, 56.7 MB/s

    However - the same domU's NFS performance against a different NFS share (exported from the same NFS server as above):
    dd if=/dev/zero of=testfile bs=1M count=1000 oflag=dsync
    1000+0 records in
    1000+0 records out
    1048576000 bytes (1.0 GB) copied, 68.9641 seconds, 15.2 MB/s

    So I suspect OVM is not the troublemaker here.
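
    If it helps anyone comparing a fast and a slow share, the effective client-side mount options can be checked on both dom0 and the domU with standard tools, e.g.:

    # Show how each NFS share is actually mounted (options negotiated by the client)
    grep nfs /proc/mounts
    nfsstat -m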
  • 2. Re: Poor NFS disk performance
    Roddy Rodstein Explorer
    jurajl,

    Thank you for sharing your dd results and for your assistance!
  • 3. Re: Poor NFS disk performance
    Roddy Rodstein Explorer
    After one week of tuning, we now have acceptable NFS performance.

    Out of the box, 3.1.1 with 10G cards and NFS and/or iSCSI "must" be tuned to get acceptable read/write performance.

    I will be publishing a new NFS/iSCSI tuning chapter in the Oracle Cloud Cookbook.

    Stay tuned!
  • 4. Re: Poor NFS disk performance
    jurajl Newbie
    Hi Roddy,
    Glad to hear you got this fixed, and of course I'm looking forward to the new chapter.
    Just being curious - does your tuning tweak only the domU NFS mount options, or the TCP stack as well?

    Juraj
  • 5. Re: Poor NFS disk performance
    Roddy Rodstein Explorer
    Both dom0 and domU /etc/sysctl.conf tuning and dom0 NUMA CPU tuning.
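
    For anyone wondering what dom0 NUMA/CPU tuning can look like on a Xen-based host, a minimal sketch follows; the parameter values and the guest name are illustrative assumptions, not the exact settings we used:

    # Xen boot line in grub.conf: cap and pin dom0 vCPUs (values are examples)
    kernel /xen.gz dom0_max_vcpus=4 dom0_vcpus_pin
    # Pin a running guest's vCPU 0 to physical CPUs 0-3 (guest name is a placeholder)
    xm vcpu-pin MyGuest 0 0-3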

    We got the numbers looking really good!

    The bottom line is that we all need to benchmark to know whether we are in the ballpark :-)

    Regards,
    Roddy
  • 6. Re: Poor NFS disk performance
    ACF888 Newbie
    Hi Roddy,

    I thought I was hitting the same bug, so I upgraded to OVM 3.1.1 build 544 and also upgraded the OVS agent to 3.1.1-524, but I'm still seeing the slow I/O issue on the NFS mounts in my VM guests. You mentioned there was tuning you performed. Did you get a chance to document it in your cookbook yet? If not, can you share some insights on what was tuned?

    Thanks,
    Ann
  • 7. Re: Poor NFS disk performance
    user12273962 Pro
    What performance stats are you getting?
  • 8. Re: Poor NFS disk performance
    ACF888 Newbie
    This is on a VM guest running 2.6.39-200.32.1.el6uek.x86_64 on a VM Server running 2.6.39-200.24.1.el6uek.x86_64, which has the "Bug 14092678 : ORACLE VM 3 - NFS REPOSITORY EXTREMELY SLOW" fix.

    [root ~]# dd if=/dev/xvda2 of=/dev/null bs=1024k count=3000
    3000+0 records in
    3000+0 records out
    3145728000 bytes (3.1 GB) copied, 54.0921 s, 58.2 MB/s

    I'm not sure if there's a caching effect, because this speed can sometimes be in the GB/s range.
    [root ~]# dd if=/dev/xvda2 of=/dev/null bs=1024k count=3000
    3000+0 records in
    3000+0 records out
    3145728000 bytes (3.1 GB) copied, 0.573802 s, 5.5 GB/s

    [root ~]# dd if=/dev/xvda of=/dev/null bs=1024k count=3000
    3000+0 records in
    3000+0 records out
    3145728000 bytes (3.1 GB) copied, 221.919 s, 14.2 MB/s

    Here's one of the nfs-mounts:
    [root ~]# dd if=/oradata/V15571-01_2of3.zip of=/tmp/V15571-01_2of3.zip bs=1024k count=3000
    1740+1 records in
    1740+1 records out
    1825525792 bytes (1.8 GB) copied, 268.431 s, 6.8 MB/s
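
    To take the page cache out of the picture when repeating the read tests, one approach (standard OL5/OL6 commands; note that dom0 and storage-array caches can still come into play) is:

    # Flush dirty pages and drop the guest's page cache, then read with direct I/O
    sync
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/xvda of=/dev/null bs=1024k count=3000 iflag=direct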
  • 9. Re: Poor NFS disk performance
    user12273962 Pro
    Are your xvda disks on NFS repos? One thing you have to watch is your VM guests paging over NFS. Are your page files on virtual disks in NFS repos?
  • 10. Re: Poor NFS disk performance
    ACF888 Newbie
    Yes, xvda is on an NFS repo. The VM guests have their page file on xvda since they are downloaded Oracle templates. Swap was low, so I added a virtual disk and extended the swap space. Currently I'm testing the I/O speed on a VM that is running nothing except the OS; it is not doing any paging that I can see.
  • 11. Re: Poor NFS disk performance
    user12273962 Pro
    But does the VM guest that's not paging share the same NFS server? I suspect you're seeing a lot of total IOPS on the NFS server from time to time, and that is why you see fluctuations in your stats.

    In my opinion, you should never have a VM guest paging on virtual disks over NFS. You'll generate a lot of IOPS against your NFS server even though you might not see huge bandwidth utilization. I do have a few guests that page from time to time, and I see a large number of IOPS. I try to prevent that as much as possible.
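
    A quick way to confirm whether a guest is actually paging is to watch the si/so columns in vmstat; anything consistently non-zero means swap traffic:

    # si = memory swapped in from disk per interval, so = memory swapped out
    vmstat 5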
  • 12. Re: Poor NFS disk performance
    ACF888 Newbie
    The I/O speed fluctuations I'm seeing are due to caching on our NFS storage side.
    I spoke with an Oracle consultant; he said to use:

    dd if=/dev/zero of=test.img bs=1M count=1000 oflag=direct

    With oflag=direct you're using direct I/O and bypassing the buffer cache. I tested both with and without oflag=direct and am getting around 10-13 MB/s.

    I'm also using the Oracle Linux 6.3 templates downloaded from edelivery and noticed that when I start the VM there are no [aio] kernel processes running. When I run an Oracle DB on it, it is very slow. I've experimented with different templates and noticed that Oracle Linux 6.2 and Oracle Linux 5.7 VMs (built from edelivery templates) do not behave the same way; for example, the [aio] processes are started for OL6.2. I'm in the process of building a DB on the OL6.2 VM guest to see if performance is better. I'm hopeful.
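
    For what it's worth, a simple way to check whether async I/O is actually in use inside a guest is:

    # Currently allocated aio contexts vs. the system limit
    cat /proc/sys/fs/aio-nr /proc/sys/fs/aio-max-nr
    # Kernel aio worker threads, if any
    ps ax | grep '\[aio' | grep -v grep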
  • 13. Re: Poor NFS disk performance
    user11391721 Newbie
    Any NFS tuning recommendations for the domUs, dom0s, or the NFS host would be appreciated.

    I'm concerned that the domUs see much better performance (>1.5 Gbit/s) at the shell level, while commands issued from OVM to the OVS hosts result in such abysmal performance (~7 Mbit/s).

    Setup: I have a dedicated storage switch with no other traffic, our 3 OVS 3.2.1 domUs each have a separate dedicated 1Gb NIC for storage, and our NFS server has 2 bonded 1Gb NICs on the storage network.

    The good news is that I am seeing >1.5 Gbit/s when all 3 domUs run dd commands at the same time against the NFS host. Essentially it looks like we are saturating the network and getting decent performance via the dd commands we have tried.

    The bad news is that when we use OVM to execute move commands, or try to get the OVS hosts to create VM disk files, we only get ~7 Mbit/s.

    The following are some of the DD commands we have tried with their results:

    [root@ovs034 034]# dd if=/dev/zero of=stib1mc100000g bs=1M count=100000 conv=fdatasync
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 1344.93 seconds, 78.0 MB/s

    [root@ovs034 034]# dd if=/dev/zero of=stib1mc10000g bs=1M count=10000 conv=fdatasync
    10000+0 records in
    10000+0 records out
    10485760000 bytes (10 GB) copied, 140.316 seconds, 74.7 MB/s
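
    To rule out the network path itself, a quick iperf check between an OVS host (or domU) and the NFS server can help; the hostname below is a placeholder and iperf has to be installed on both ends:

    # On the NFS server:
    iperf -s
    # On each OVS host / domU (30-second test, 2 parallel streams):
    iperf -c nfs-server -t 30 -P 2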
  • 14. Re: Poor NFS disk performance
    user11391721 Newbie
    ACF888 wrote:
    The I/O speed fluctuations I'm seeing are due to caching on our NFS storage side.
    I spoke with an Oracle consultant; he said to use:

    dd if=/dev/zero of=test.img bs=1M count=1000 oflag=direct

    With oflag=direct you're using direct I/O and bypassing the buffer cache. I tested both with and without oflag=direct and am getting around 10-13 MB/s.
    ...
    I think you may be on to something with regard to the oflag=direct setting.

    Maybe the domUs in 3.2.1 are now pre-tuned to use this direct-I/O behaviour (perhaps due to the performance issues in early 3.x releases). Maybe the tuning that is helping others is killing me, as things are abysmal with it.

    Is there a way to turn 'oflag=direct'-style direct I/O off for normal OVM/OVS commands?

    My understanding was that the conv=fdatasync flag tells dd to sync the write to disk before it exits. Without this flag, dd performs the write but some of it remains in memory/cache, which does not give an accurate picture of the true write performance of the disk. So yes, I am using the cache, but conv=fdatasync won't report back until the data actually reaches persistent disk, so I am seeing the true end-to-end speed.

    I tried oflag=direct with abysmal results:

    [root@ovs034 034]# dd if=/dev/zero of=test.img bs=1M count=10 oflag=direct
    10+0 records in
    10+0 records out
    10485760 bytes (10 MB) copied, 23.1025 seconds, 454 kB/s
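
    For anyone comparing numbers in this thread, the three dd write modes being discussed are, side by side:

    # Buffered write: the page cache absorbs the data, so the result overstates disk speed
    dd if=/dev/zero of=test.img bs=1M count=1000
    # Buffered write flushed before dd exits: closer to real throughput
    dd if=/dev/zero of=test.img bs=1M count=1000 conv=fdatasync
    # Direct I/O, bypassing the page cache entirely: shows worst-case per-request behaviour
    dd if=/dev/zero of=test.img bs=1M count=1000 oflag=direct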
