This discussion is archived
6 Replies. Latest reply: Sep 29, 2012 3:35 AM by joseph619

ovm virtual machine domain crashed

joseph619 Newbie
Hello,

We are currently facing a problem. We installed OVM 2.2.1 and three Windows virtual machines (A15, A16, A17) on an HP ProLiant DL380 G7 server.

The A17 machine restarts unexpectedly when network load is high.

Here are the relevant parts of the logs:

/var/log/messages:
Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Interface tap23.0.IPv6 no longer relevant for mDNS.
Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Leaving mDNS multicast group on interface tap23.0.IPv6 with address fe80::c40f:d1ff:fe14:ce79.
Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Withdrawing address record for fe80::c40f:d1ff:fe14:ce79 on tap23.0.
Sep 25 15:22:18 xxxxx kernel: xenbr1: port 2(tap23.0) entering disabled state
Sep 25 15:22:18 xxxxx kernel: device tap23.0 left promiscuous mode
Sep 25 15:22:18 xxxxx kernel: type=1700 audit(1348586538.915:90): dev=tap23.0 prom=0 old_prom=256 auid=4294967295 ses=4294967295
Sep 25 15:22:18 xxxxx kernel: xenbr1: port 2(tap23.0) entering disabled state
Sep 25 15:22:20 xxxxx kernel: xenbr1: port 3(vif23.0) entering disabled state

/var/log/xen/xend.log:
[2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: device model failure: pid 11149: died due to signal 11; see /var/log/xen/qemu-dm-10_xxxxx.log
[2012-09-25 15:22:19 8235] WARNING (XendDomainInfo:1907) Domain has crashed: name=10_xxxxx id=23.
[2012-09-25 15:22:19 8235] DEBUG (XendDomainInfo:2757) XendDomainInfo.destroy: domid=23
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2230) Destroying device model
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2237) Releasing devices
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vif/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/768
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/832
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/832
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/5696
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/5696
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/2048
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/5632
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/5632
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vfb/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing console/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:117) XendDomainInfo.create_from_dict({'vcpus_params': {'cap': 0, 'weight': 256}, 'PV_args': '', 'features': '', 'cpus': [[], [], [], [], [], [], [], [], [], [], [], []], 'paused': 0, 'domid': 23, 'shutdown': 0, 'VCPUs_live': 12, 'PV_bootloader': '/usr/bin/pygrub', 'actions_after_crash': 'restart', 'vbd_refs': ['0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3', 'cd42c91a-1f15-053e-78c4-d2ea9876e6d9', '16e4af8e-048d-7be9-5fc1-e14070ff3551', '4b836ae6-2352-73f0-94e8-b570764113b4', '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06'], 'PV_ramdisk': '', 'memory_dynamic_min': 8589934592L, 'name_label': '10_xxxxx', 'VCPUs_at_startup': 1, 'HVM_boot_params': {'order': 'c'}, 'platform': {'videoram': '4', 'hpet': '0', 'stdvga': '0', 'vnclisten': '0.0.0.0', 'loader': '/usr/lib/xen/boot/hvmloader', 'vncconsole': '1', 'serial': 'pty', 'vncunused': '1', 'xen_platform_pci': '1', 'monitor': '0', 'boot': 'c', 'rtc_timeoffset': -967, 'vncpasswd': 'XXXXXXXX', 'pci': [], 'pae': '1', 'vpt_align': '1', 'hap': '1', 'viridian': '0', 'acpi': '1', 'localtime': '0', 'timer_mode': '0', 'vnc': '1', 'nographic': '0', 'pci_msitranslate': '1', 'apic': '1', 'usb': '0', 'guest_os_type': 'default', 'device_model': '/usr/lib/xen/bin/qemu-dm', 'keymap': 'fr', 'pci_power_mgmt': '0', 'xauthority': '//.Xauthority', 'isa': '0'}, 'PV_kernel': '', 'console_refs': ['db501a0b-36a5-a080-dab2-d42bace0b21b', 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1'], 'online_vcpus': 12, 'blocked': 0, 'on_xend_stop': 'ignore', 'memory_static_min': 0, 'HVM_boot_policy': 'BIOS order', 'shutdown_reason': 3, 'VCPUs_max': 12, 'start_time': 1348534061.3324931, 'memory_static_max': 8589934592L, 'actions_after_shutdown': 'destroy', 'on_xend_start': 'ignore', 'crashed': 0, 'memory_dynamic_max': 8589934592L, 'actions_after_suspend': '', 'is_a_template': False, 'PV_bootloader_args': '-q', 'is_control_domain': False, 'uuid': 'c33d113b-46b3-91d3-0984-8c8b192e23c3', 'cpu_time': 22146.567213627, 'shadow_memory': 76, 'dying': 0, 'vcpu_avail': 4095, 'notes': {'SUSPEND_CANCEL': 1}, 'other_config': {}, 'auto_power_on': False, 'running': 0, 'actions_after_reboot': 'restart', 'vif_refs': ['ee5890ab-345b-479b-1504-f3a9405d4dbf'], 'target': 0, 'vtpm_refs': [], 's3_integrity': 1, 'devices': {'cd42c91a-1f15-053e-78c4-d2ea9876e6d9': ('vbd', {'uuid': 'cd42c91a-1f15-053e-78c4-d2ea9876e6d9', 'bootable': 0, 'devid': 832, 'driver': 'paravirtualised', 'dev': 'hdb', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/JDEdwards.img', 'mode': 'w'}), 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1': ('console', {'location': '14', 'devid': 0, 'protocol': 'vt100', 'uuid': 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1', 'other_config': {}}), 'ee5890ab-345b-479b-1504-f3a9405d4dbf': ('vif', {'bridge': 'xenbr1', 'mac': '00:16:3E:03:AF:B3', 'devid': 0, 'type': 'ioemu', 'uuid': 'ee5890ab-345b-479b-1504-f3a9405d4dbf'}), '4b836ae6-2352-73f0-94e8-b570764113b4': ('vbd', {'uuid': '4b836ae6-2352-73f0-94e8-b570764113b4', 'bootable': 0, 'devid': 2048, 'driver': 'paravirtualised', 'dev': 'sda', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/Backup.img', 'mode': 'w'}), 'db501a0b-36a5-a080-dab2-d42bace0b21b': ('vfb', {'vncunused': 1, 'other_config': {'vncunused': 1, 'vncpasswd': 'XXXXXXXX', 'vnclisten': '0.0.0.0', 'vnc': '1'}, 'vnc': '1', 'uuid': 'db501a0b-36a5-a080-dab2-d42bace0b21b', 'vnclisten': '0.0.0.0', 'vncpasswd': 'XXXXXXXX', 'location': '0.0.0.0:5901', 'devid': 0}), '0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3': ('vbd', {'uuid': '0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3', 'bootable': 1, 'devid': 768, 'driver': 'paravirtualised', 'dev': 'hda', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/System.img', 'mode': 'w'}), '16e4af8e-048d-7be9-5fc1-e14070ff3551': ('vbd', {'uuid': '16e4af8e-048d-7be9-5fc1-e14070ff3551', 'bootable': 0, 'devid': 5696, 'driver': 'paravirtualised', 'dev': 'hdd', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/MSSQL.img', 'mode': 'w'}), '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06': ('vbd', {'uuid': '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06', 'bootable': 0, 'devid': 5632, 'driver': 'paravirtualised', 'dev': 'hdc:cdrom', 'uname': '', 'mode': 'r'})}, 'PV_superpages': 0})
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2327) XendDomainInfo.constructDomain
[2012-09-25 15:22:20 8235] DEBUG (balloon:181) Balloon: 3208900 KiB free; need 4096; done.
[2012-09-25 15:22:20 8235] DEBUG (XendDomain:452) Adding Domain: 24
[2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2528) XendDomainInfo.initDomain: 24 256

Any idea ???
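The dict that xend logs just before rebuilding the guest explains why A17 comes back by itself: `'actions_after_crash': 'restart'` tells xend to recreate a crashed domain, and the log shows domid 23 being destroyed and domain 24 being added about a second later. The entry is a Python 2 repr, so reading it with Python 3 needs one tweak. Below is a minimal sketch, assuming you paste in the logged dict text; `parse_xend_dict` is a hypothetical helper, not part of Xen, and the excerpt keeps only a few keys from the entry above.

```python
import ast
import re

# A few keys copied from the XendDomainInfo.create_from_dict(...) entry above;
# the real entry is much longer and can be pasted in whole.
EXCERPT = (
    "{'actions_after_crash': 'restart', 'domid': 23, 'VCPUs_max': 12, "
    "'memory_static_max': 8589934592L, 'name_label': '10_xxxxx'}"
)


def parse_xend_dict(text):
    """Parse a dict repr written by xend (Python 2) using Python 3.

    Python 2 long literals such as 8589934592L are not valid Python 3
    syntax, so the trailing L is dropped before ast.literal_eval. This
    simple regex would also alter a quoted string like '12L', which is
    acceptable for a quick look at a log entry.
    """
    cleaned = re.sub(r"\b(\d+)L\b", r"\1", text)
    return ast.literal_eval(cleaned)


cfg = parse_xend_dict(EXCERPT)
print(cfg["name_label"], cfg["actions_after_crash"],
      cfg["memory_static_max"] // 2**30, "GiB")
# prints: 10_xxxxx restart 8 GiB
```

`ast.literal_eval` only accepts literals (strings, numbers, tuples, lists, dicts, booleans), so it is a safe way to read a logged repr without executing anything.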
  • 1. Re: ovm virtual machine domain crashed
    Dude! Guru
    I think only the following line is relevant:
    [2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: device model failure: pid 11149: died due to signal 11; see /var/log/xen/qemu-dm-10_xxxxx.log
    SIGSEGV (signal 11) is the signal sent to a process when it makes an invalid memory reference, i.e. a segmentation fault. You may have a hardware issue or a driver problem. I suggest contacting Oracle Support or posting this in the Oracle VM forum.
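The warning above names the process that died (qemu-dm, the device model, per `'device_model': '/usr/lib/xen/bin/qemu-dm'` in the logged config), the signal, and the per-domain log to read next. Below is a minimal Python sketch, assuming the exact xend wording quoted above, that pulls those fields out and translates the signal number with the standard `signal` module; `parse_device_model_failure` is a hypothetical helper.

```python
import re
import signal

# Matches the xend warning quoted above; other xend versions may word it differently.
FAILURE_RE = re.compile(
    r"domain (?P<domain>\S+): device model failure: "
    r"pid (?P<pid>\d+): died due to signal (?P<sig>\d+); "
    r"see (?P<log>\S+)"
)


def parse_device_model_failure(line):
    """Return (domain, pid, signal name, qemu-dm log path), or None if no match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    signum = int(m.group("sig"))
    try:
        name = signal.Signals(signum).name  # 11 -> "SIGSEGV" on Linux
    except ValueError:
        name = "signal %d" % signum  # number not known on this platform
    return m.group("domain"), int(m.group("pid")), name, m.group("log")


line = ("[2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: "
        "device model failure: pid 11149: died due to signal 11; "
        "see /var/log/xen/qemu-dm-10_xxxxx.log")
print(parse_device_model_failure(line))
# ('10_xxxxx', 11149, 'SIGSEGV', '/var/log/xen/qemu-dm-10_xxxxx.log')
```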
  • 2. Re: ovm virtual machine domain crashed
    joseph619 Newbie
    OK, Dude,

    I will post it in the Oracle VM forum.

    Support already gave us a workaround, but it does not fix the problem.
    I am looking for a Broadcom driver newer than 2.0.8, as stated in Doc ID 1312709.1, because my driver version is 2.0.8-rh.

    Do you know where I can find this driver?

    Thanks
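The version question in this reply can be checked mechanically. Below is a minimal Python sketch, assuming the `ethtool -i eth1` output joseph619 posts later in the thread and the 2.0.8 threshold he quotes from Doc ID 1312709.1; the helper names are hypothetical. It compares only the numeric part, so a vendor build such as `2.0.8-rh` counts as 2.0.8, even though such a build may carry backported fixes.

```python
import re

# `ethtool -i eth1` output as posted in this thread.
ETHTOOL_OUTPUT = """\
driver: bnx2
version: 2.0.8-rh
firmware-version: bc 5.2.3 NCSI 2.0.6
bus-info: 0000:33:00.1
"""


def parse_ethtool_info(text):
    """Turn the 'key: value' lines of `ethtool -i` into a dict."""
    info = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def numeric_version(version):
    """'2.0.8-rh' -> (2, 0, 8): keep the leading dotted numbers, drop any suffix."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    if m is None:
        raise ValueError("no numeric version in %r" % version)
    return tuple(int(part) for part in m.group(0).split("."))


def is_newer_than(version, baseline):
    """Compare versions component by component, e.g. (2, 0, 21) > (2, 0, 8)."""
    return numeric_version(version) > numeric_version(baseline)


info = parse_ethtool_info(ETHTOOL_OUTPUT)
print(info["driver"], info["version"], is_newer_than(info["version"], "2.0.8"))
# bnx2 2.0.8-rh False
```

Comparing tuples of integers avoids the plain string-comparison trap, where "2.0.21" would sort before "2.0.8".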
  • 3. Re: ovm virtual machine domain crashed
    Dude! Guru
    Sorry, no idea. Actually, I don't see any information that points to the network driver. You wrote that your system crashes under high network load. What about system load? Is this a proven fact that can be reproduced, or just an observation? Did this work before? Did you change or add any hardware? What is the history of the problem? Does the problem seem to happen periodically, like once a week?
  • 4. Re: ovm virtual machine domain crashed
    joseph619 Newbie
    Regarding the load, it is just an observation.

    Another observation: the virtual machine that reboots has its RAM full.

    It did not work before; it has been like this since we installed the virtual machine.

    We did not change any hardware.

    The history of the problem:

    1st day

    We have Oracle VM Server 2.2.1 running on an HP ProLiant DL380 G7.

    We have a virtual machine running Windows Server 2008 R2 (A17). It was the last machine we installed.

    2nd day

    We noticed that we could not reach the different machines over the network.
    We checked the logs and found the error "NETDEV WATCHDOG: eth0: transmit timed" and something else related to the bnx2 driver.

    After some research with Oracle Support, we found note [ID 1312709.1], according to which the problem was supposed to be solved in Oracle VM Server 2.2.2, or by using another kernel in which the NIC drivers were updated, together with disabling the C3 and C6 C-states in the BIOS.
    We installed this new kernel, but now the problem affects the Windows Server 2008 R2 machine.
    This machine reboots, and the NIC it uses cannot be contacted after the reboot.
    That is, when A17 on xenbr1 (eth1) reboots, eth1 no longer answers ping, while A15 and A16 on xenbr0 (eth0) can still be pinged.
    The NIC is:

    lspci | grep -i ethernet
    Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

    ethtool -i eth1
    driver: bnx2
    version: 2.0.8-rh
    firmware-version: bc 5.2.3 NCSI 2.0.6
    bus-info: 0000:33:00.1

    End of the story.

    The problem seems to happen when users launch a batch application that executes batch SQL statements (at midday or 5 pm), once or twice a day.

    Any ideas?

    Thanks
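To turn the "midday or 5 pm" impression into data (Dude asked whether the crashes are periodic), the xend "Domain has crashed" entries can be bucketed by hour of day. Below is a minimal Python sketch, assuming the log format quoted in the first post; `crash_times` and `crashes_per_hour` are hypothetical helpers. The result can then be compared with the times the batch jobs start.

```python
import re
from collections import Counter
from datetime import datetime

# Matches xend's "Domain has crashed" warning as it appears in this thread.
CRASH_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\] WARNING "
    r"\(XendDomainInfo:\d+\) Domain has crashed: "
    r"name=(?P<name>\S+) id=(?P<domid>\d+)\."
)


def crash_times(lines, name):
    """Yield a datetime for every logged crash of the domain called `name`."""
    for line in lines:
        m = CRASH_RE.match(line)
        if m and m.group("name") == name:
            yield datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")


def crashes_per_hour(lines, name):
    """Count crashes by hour of day, to see whether they cluster at certain times."""
    return Counter(ts.hour for ts in crash_times(lines, name))


# Two lines from the xend.log excerpt in the first post; in practice, pass the
# lines of /var/log/xen/xend.log to cover every crash it recorded.
sample = [
    "[2012-09-25 15:22:19 8235] WARNING (XendDomainInfo:1907) Domain has crashed: name=10_xxxxx id=23.",
    "[2012-09-25 15:22:20 8235] DEBUG (XendDomain:452) Adding Domain: 24",
]
print(crashes_per_hour(sample, "10_xxxxx"))
# Counter({15: 1})
```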
  • 5. Re: ovm virtual machine domain crashed
    Dude! Guru
    I'm afraid there is not enough information to pinpoint the problem. The issue could be software or firmware related, in particular since the problem started when you installed virtualization, but it may also be more complex or even a combination of problems. In my experience, if a problem is very uncommon, or a known fix does not work, check the hardware. Computers are fragile and complex machines. In the past I was often shocked to see how improperly some PC technicians carried out hardware maintenance.

    Your problem could be related to workload, perhaps through temperature or the amount of memory used. What about your RAM? Is it third-party? Perhaps you have some sort of hardware issue that manifests depending on temperature, e.g. a tiny crack in a PCB or contact, or a bad electronic component. Most software and drivers cannot detect or work around bad hardware, such as a bad cable or a bad adapter card.
  • 6. Re: ovm virtual machine domain crashed
    joseph619 Newbie
    OK, I will check all possibilities (including replacing hardware) and let you know.

    Thanks for your help!
