I am building a small test environment for Oracle VM with the following configuration:
4 boxes, each with 1 x AMD 6-core CPU + 32GB RAM + 2 x 1Gb Ethernet (10.30.32.n and 192.168.20.n networks - I only have one 16-port gigabit switch, so all cabling goes through the one switch, which may be part of the problem)
3 boxes are diskless, running Oracle VM Server from 8GB USB flash drive
1 Box is EL6.3 and set up with:
1x750GB with (LVM) partitions for /, /u01 (Oracle 11gR2SE + OVM Manager), swap, and a partition for the OVS pool
4x2TB drives, set up with LVM striping, and 4 LVs - one of 750GB for VirtualBox and VMware VMs, and 3 others for OVS storage repositories (each a bit over 2TB) - all ext4.
The OVS Pool and OVS Storage file systems are exported via NFS
VM server and VM manager installs all went smoothly. Servers and storage are discovered OK in the manager.
I created 2 networks - one on the 10.30.32 subnet for Server Management, Cluster Heartbeat and Live Migrate (also the public network for accessing the machines). The other on the 192.168.20 subnet for Storage and Virtual Machine.
Created storage pools, and presented to all the servers.
When creating the Server Pool, I just took the default of 120 for the cluster timeout.
Now when I try to import an assembly or a VM template, it gets so far, then the server doing the actual job reboots.
My initial thought is that, for whatever reason, the NFS traffic involved in importing and writing - either the network, or I/O on the NFS server - is swamping the cluster heartbeat traffic, and the node just fences itself.
Just wondering if anyone has any ideas or suggestions, or can advise what further diagnostics or troubleshooting I can do.
I suspect it has more to do with your storage than anything else. Run your import again and sample the I/O performance on your NFS server during the import. Also, if you can, create an LACP bond on your NFS server. I don't know what kind of throughput you expect, but remember that a single 1Gb NIC can only do a maximum of about 125 MB/s. That sounds rather good, but taking into consideration everything you're doing across your environment, it's not a lot. I see well over 400 MB/s of network throughput at times on my storage; I run bonded 10Gb NICs.
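To put numbers on that, here is a rough Python sketch of the arithmetic and of one way to sample NIC throughput on the NFS server while an import runs. It assumes a Linux host exposing /proc/net/dev; the function names and the interface name eth1 are hypothetical and only for illustration, and it measures network traffic only, not disk I/O.

```python
import time


def link_ceiling_mb_per_s(gigabits_per_s):
    """Theoretical ceiling of a link in MB/s (8 bits per byte, no protocol overhead)."""
    return gigabits_per_s * 1000 / 8


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        values = fields.split()
        # Column 0 is bytes received, column 8 is bytes transmitted.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def throughput_mb_per_s(before, after, seconds, iface):
    """Return (rx, tx) in MB/s for one interface between two counter samples."""
    rx = (after[iface][0] - before[iface][0]) / seconds / 1e6
    tx = (after[iface][1] - before[iface][1]) / seconds / 1e6
    return rx, tx


def sample_interface(iface, seconds):
    """Read /proc/net/dev twice, `seconds` apart, and return (rx, tx) in MB/s."""
    with open("/proc/net/dev") as f:
        before = parse_net_dev(f.read())
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = parse_net_dev(f.read())
    return throughput_mb_per_s(before, after, seconds, iface)
```

For example, calling sample_interface("eth1", 5) on the NFS server during an import and comparing the result with link_ceiling_mb_per_s(1) gives a quick idea of how close the storage NIC is to saturation. If either direction sits near the ceiling for long stretches, the link is a likely bottleneck.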
Once you get your templates and such built, you shouldn't experience that much NFS I/O. You should allocate enough memory to your guests to avoid paging to virtual disks on your NFS storage.