After upgrading OVS from 3.0.3 to 3.1.1, I noticed performance problems on virtual disks placed on an NFS repository. Before the upgrade, on 3.0.3, I could read from the xvda disk at around 60 MB/s; after the upgrade to 3.1.1, it falls to around 1.5 MB/s with a 1 MB block size:
dd if=/dev/xvda of=/dev/null bs=1024k count=1000
^C106+0 records in
105+0 records out
110100480 bytes (110 MB) copied, 79.0509 seconds, 1.4 MB/s
The repository is on an NFS share attached through a dedicated 1 Gbit/s Ethernet network with MTU=8900. The same configuration was used before the upgrade; the only change was the upgrade from 3.0.3 to 3.1.1.
The test machines are OEL5 with the latest UEK kernels, running in PVM mode.
The repository on 3.1.1 is mounted without additional NFS options such as rsize, wsize, tcp, or vers=3:
192.168.100.10:/mnt/nfs_storage_pool on /OVS/Repositories/0004fb0000030000a75ccd9ef5a238c3 type nfs (rw,addr=192.168.100.10)
I haven't found a way to change this, and I don't know whether it could be causing the performance issues.
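One place to look is what the client actually negotiated: even when no options are passed explicitly, the kernel records the effective rsize/wsize/version in /proc/mounts. A small sketch of reading them out (the sample line below is hypothetical, including its option values; on a live dom0 you would grep /proc/mounts for the repository path instead):

```shell
# Hypothetical /proc/mounts line for the repository; on a real server use:
#   grep /OVS/Repositories /proc/mounts
line='192.168.100.10:/mnt/nfs_storage_pool /OVS/Repositories/0004fb0000030000a75ccd9ef5a238c3 nfs rw,vers=3,rsize=32768,wsize=32768,proto=tcp,addr=192.168.100.10 0 0'

# Field 4 holds the comma-separated option list; print one option per line
echo "$line" | awk '{print $4}' | tr ',' '\n'
```

If the negotiated rsize/wsize turn out to be very small on 3.1.1, that would be one plausible explanation for the throughput drop, and worth mentioning in an SR.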
Any idea why there is such a large performance drop with 3.1.1, from 60 to 1.5 MB/s?
Well, my tests also show a huge slowdown in disk performance when using NFS repositories.
I tried hdparm -tT /dev/xvda from within the virtual machines (their repositories are on the same NAS; however, separate NFS shares are used as repositories for 3.1.1 and 3.0.3).
This is what I ended up with:
OVM3.1.1 : Timing buffered disk reads: 6 MB in 4.14 seconds = 1.45 MB/sec
OVM3.0.3 : Timing buffered disk reads: 140 MB in 3.02 seconds = 46.32 MB/sec
Could this be caused by the NFS late-locking mechanism (introduced in 3.1)?
Any known way to fix this?
I too am witnessing this HUGE slowdown... we have a pretty loaded Oracle 10g server that resides on an NFS repo, and it's CRAWLING. I am opening an SR1 with Oracle now because that's a huge issue. In our case, we also see the dom0's load averages running very high because of this, along with my NFS guest running at 60-90% iowait.
No dice yet; Oracle has not responded. Has anyone else had any luck figuring anything out?
My NFS mounts are identical between 3.0.3 and 3.1.1, so I don't think that matters:
3.0.3: 10.10.1.4:/coraid/nfs01 on /OVS/Repositories/0004fb0000030000113d5ca8bb4766e7 type nfs (rw,addr=10.10.1.4)
3.1.1: 10.10.1.4:/coraid/nfs01 on /OVS/Repositories/0004fb0000030000113d5ca8bb4766e7 type nfs (rw,addr=10.10.1.4)
The speed of an NFS share mounted inside the VM is not a problem; I can get around 80 MB/s when reading from an NFS share mounted inside the virtual machine.
The problem appears only when the virtual disk /dev/xvda is on an NFS-based storage repository.
Reading the disk image directly from the OVS server is also fast:
[root@acs-ovm3 ~]# dd if=/OVS/Repositories/0004fb0000030000a75ccd9ef5a238c3/VirtualDisks/0004fb00001200009da8bd2fd1dcef22.img of=/dev/null bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 13.4906 seconds, 77.7 MB/s
But when this disk image is attached to a virtual machine as virtual disk /dev/xvd....., reading becomes slow, about 1.5 MB/s, and only on OVS 3.1.1.
Gave in; it's now an SR1 and was promptly picked up. However, it seems this might be an unknown issue, because it is being kicked over to development. I will post any relevant info once (if) Oracle resolves it for me.
Can you give me the bug#, so I can also check the progress?
I did more tests. When I boot the Red Hat-compatible 2.6.18 kernel on my OEL5 guest in HVM mode, the same virtual disk appears under two devices: hda and xvda. Reading xvda is still slow, but reading hda is fast, over 60 MB/s, as it was before the upgrade. Checked with hdparm -tT /dev/.... I don't know why.
Dave, I see that you still have no bug logged for that problem.
I may have found a workaround. Only disks attached through the PV drivers have the problem; emulated sdxx disks perform fine. The question was how to enable only emulated disks and not PV disks, because even in HVM mode the PV drivers are used, not the emulated devices.
But after setting HVM mode for the virtual machine and adding xen_platform_pci=0 to the vm.cfg config file, I now see only emulated sdxx disks with the UEK kernel (and of course an emulated NIC as well). Reading sda is fast in my case.
I don't know whether this is supported; only Oracle support can tell, but you may try it on a test system.
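For reference, the relevant part of a vm.cfg for this workaround might look like the fragment below. Only the HVM builder and the xen_platform_pci=0 line come from the workaround described above; the disk path placeholders and device name are illustrative, not taken from anyone's actual configuration:

```
# vm.cfg fragment (illustrative) - HVM guest with the PV platform device
# disabled, so the guest sees only emulated disks and an emulated NIC
builder = 'hvm'
xen_platform_pci = 0
disk = ['file:/OVS/Repositories/<repo-uuid>/VirtualDisks/<disk-uuid>.img,hda,w']
```

Note that with the platform PCI device disabled, the guest falls back entirely to emulated hardware, so network throughput will also go through the emulated NIC.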
Yes, same outcome here. Oracle is aware of the issue and stated that they reproduced it internally, but it is not resolved yet.
I imagine this is going to become a much larger problem as more of the 3.0 customers start upgrading... I really hope they find a solution, because even on our rather small NFS SR deployment, it's rendering Oracle DB servers useless.