We just patched our OEL systems in our Oracle VM environment. With the new patch, we have two kernels:
One system is booting the UEK kernel and the other system is booting the XEN kernel.
[root@zinc ~]# uname -a
Linux zinc.cabq.gov 2.6.32-100.35.1.el5uek #1 SMP Wed Jun 1 21:44:30 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@phoenix boot]# uname -a
Linux phoenix.cabq.gov 2.6.18-18.104.22.168.1.el5xen #1 SMP Tue May 31 15:05:36 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Which kernel "should" I use? I'm running Oracle 11.2 and 11.1 databases on the virtual hosts.
UEK is the Oracle Unbreakable Enterprise Kernel; the other is the 100% Red Hat-compatible kernel.
Both kernels are fine for running databases, though the UEK kernel is said to give better performance. (I have not tested this myself, and I don't know how the difference plays out in VM environments.) Either way, both are supported.
There is just one small thing to know: at the moment ACFS won't work with the UEK kernel. But that is the only exception.
If you're just running databases, with or without ASM, I would choose the UEK kernel.
BTW, there is no XEN-type kernel anymore, since the UEK kernel decides at boot time whether it runs paravirtualized or not (like the RHEL 6 kernel).
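If you want to verify which mode a guest actually ended up in, one quick way (a sketch, assuming sysfs is laid out as on a typical EL5/EL6 guest) is to look for the Xen hypervisor interface:

```shell
# Print the hypervisor type if the guest exposes one; otherwise report that
# no hypervisor interface is visible. On a Xen paravirtualized guest,
# /sys/hypervisor/type typically reads "xen".
if [ -r /sys/hypervisor/type ]; then
    echo "hypervisor: $(cat /sys/hypervisor/type)"
else
    echo "no hypervisor interface visible (bare metal, or HVM without PV drivers)"
fi
```

Either branch prints one line, so it is safe to run on any box.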
Yes and no... UEK does support HugePages, but unfortunately OVM does not yet fully support HugePages in paravirtualized environments:
Setup HugePages in an Guest Does Not Work with Oracle VM 2.1 or 2.1.1 (Doc ID 728063.1)
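For reference, HugePages inside a guest are reserved in the usual way; a minimal sketch (the page count is an arbitrary example), which per the note above will not take effect in a paravirtualized OVM 2.1/2.1.1 guest:

```
# /etc/sysctl.conf fragment (example value: 512 x 2MB pages = 1GB)
vm.nr_hugepages = 512
```

After applying it with sysctl -p, HugePages_Total in /proc/meminfo should reflect the reservation on a supported platform.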
We've been trying to track down that bug, but haven't had much luck so far. Mainly because we can't reproduce it ourselves. Do you have a testcase that reliably reproduces this panic? If so, please share it with us, so we can get this fixed. Best to use your Oracle Support channel for that.
For me, I can always reproduce this kernel panic on all of the latest Oracle VM 2.2 servers when booting a paravirtualized guest with more than one VCPU on the UEK kernel. Yesterday I finally found a testcase that reproduces it at will:
For a paravirtualized guest (OEL 5 update 7), edit vm.cfg to have vcpus=2 (or any number above 1). Booting the VM with the UEK kernel this way always produces that kernel panic here. As soon as I change vcpus back to 1 in the vm.cfg, the virtualized guest boots fine with the UEK kernel. I get the exact same result on different types of (IBM) hardware:
* IBM System x3550 type 7978KPG with one 2.33GHz Xeon Quad core (E5410) and 18GB of memory
* IBM System x3630 M3 type 737764G with two 2.67GHz Xeon Hexa core (X5650) and 96GB of memory
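The triggering change described above is a one-line vm.cfg edit (a sketch; 2 is just an example of any value above 1):

```
vcpus = 2    # any value > 1 panics the UEK kernel here; vcpus = 1 boots fine
```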
I'd be glad to share this with OSS, however AFAIK we have no basic/premier support for OEL - only network/update support. I'd be happy to work with OSS and tackle this one though. I'll check with the masters here to see whether or not we have OEL basic/premier support in some way.
The same panic shows up in a 2009 thread on the xen-users list, but that thread doesn't give any pointers whatsoever (http://lists.xensource.com/archives/html/xen-users/2009-12/msg00186.html). It also involved completely different hardware, for that matter.
First of all thanks for your efforts. Below an example of a vm.cfg with the problem exposed:
name = 'basisvm'
bootloader = '/usr/bin/pygrub'
disk = [
disk_other_config = [['xvda1', 'ionice', 'sched=best-effort,prio=5'], ['xvda2', 'ionice', 'sched=best-effort,prio=5'], ['xvda3', 'ionice', 'sched=best-effort,prio=7']]
vif = [
vfb = ['type=vnc,vncunused=1,vnclisten=0.0.0.0,vncpasswd=XXXXXXXXXXXXXXXXX']
memory = 512
keymap = 'en-us'
on_crash = 'restart'
on_reboot = 'restart'
As you can see, it is using LVM-backed VBDs at the moment; I can try file-backed VBDs if you wish. Below is the output of the grep command from the VM after booting with one VCPU:
[root@basisvm ~]# cd /sys/devices/system/cpu/cpu0/cache/
[root@basisvm cache]# grep -r . .
I have also tried booting the VM with only one VBD (in single-user mode, because it lacks a /var partition) and also with only one network device, but this makes no difference: the kernel panics with the same call trace. I've also checked whether we have basic/premier support for OEL ourselves, or via one of our customers, but unfortunately the answer is no at this time.
Reading my post again and comparing it to the output of the grep command on the Oracle VM server itself, the shared_cpu_map looks off or wrong. This is what it looks like on dom0:
I think the mask looks different because UEK prints it differently from the OVM server kernel. This doesn't really tell me what is happening; the output doesn't look all that unusual to me. What happens if you remove the vm.cfg option cpus="18-23" or vcpu_avail?
Of course, there's about 4 years' difference between those kernels, so no wonder it looks different. Removing the cpus="18-23" option did not change the behaviour, but the good news is that after removing the "vcpu_avail" option, the UEK kernel now boots correctly with more than one VCPU. :-) So somehow the culprit is the vcpu_avail option, even though it was set to the same value as the vcpus option.
The reason this parameter was there is that in the past we used it on VMs to be able to dynamically scale up the number of VCPUs. We have since moved away from that, but the parameter was still there. Funny thing is, this morning I had already found this out while running the YUM updates for the new environment, where I had already disabled this parameter (vcpu_avail). I accidentally booted the UEK kernel (I forgot to change grub.conf) and was surprised to see the UEK kernel running with 4 VCPUs.
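In other words, the workaround boils down to this vm.cfg change (a sketch; 4 is just the VCPU count from my setup):

```
vcpus = 4
# vcpu_avail = 4    # removing this line lets the UEK kernel boot with multiple VCPUs
```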
This is good news; still, IMHO the kernel should be able to handle this setting, shouldn't it? FWIW, I'd be glad to help investigate further. Thanks for the effort!