6 Replies Latest reply: Sep 29, 2012 5:35 AM by 858318

    ovm virtual machine domain crashed

    858318
      Hello

      We are currently facing a problem. We have installed OVM 2.2.1 and three Windows virtual machines (A15, A16, A17) on an HP ProLiant DL380 G7 server.

      The A17 machine restarts unexpectedly when the network load is high.

      Parts of the logs:

      /var/log/messages:
      Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Interface tap23.0.IPv6 no longer relevant for mDNS.
      Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Leaving mDNS multicast group on interface tap23.0.IPv6 with address fe80::c40f:d1ff:fe14:ce79.
      Sep 25 15:22:18 xxxxx avahi-daemon[7250]: Withdrawing address record for fe80::c40f:d1ff:fe14:ce79 on tap23.0.
      Sep 25 15:22:18 xxxxx kernel: xenbr1: port 2(tap23.0) entering disabled state
      Sep 25 15:22:18 xxxxx kernel: device tap23.0 left promiscuous mode
      Sep 25 15:22:18 xxxxx kernel: type=1700 audit(1348586538.915:90): dev=tap23.0 prom=0 old_prom=256 auid=4294967295 ses=4294967295
      Sep 25 15:22:18 xxxxx kernel: xenbr1: port 2(tap23.0) entering disabled state
      Sep 25 15:22:20 xxxxx kernel: xenbr1: port 3(vif23.0) entering disabled state

      /var/log/xen/xend.log:
      [2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: device model failure: pid 11149: died due to signal 11; see /var/log/xen/qemu-dm-10_xxxxx.log
      [2012-09-25 15:22:19 8235] WARNING (XendDomainInfo:1907) Domain has crashed: name=10_xxxxx id=23.
      [2012-09-25 15:22:19 8235] DEBUG (XendDomainInfo:2757) XendDomainInfo.destroy: domid=23
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2230) Destroying device model
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2237) Releasing devices
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vif/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/768
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/832
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/832
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/5696
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/5696
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/2048
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vbd/5632
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/5632
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing vfb/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2250) Removing console/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:1144) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:117) XendDomainInfo.create_from_dict({'vcpus_params': {'cap': 0, 'weight': 256}, 'PV_args': '', 'features': '', 'cpus': [[], [], [], [], [], [], [], [], [], [], [], []], 'paused': 0, 'domid': 23, 'shutdown': 0, 'VCPUs_live': 12, 'PV_bootloader': '/usr/bin/pygrub', 'actions_after_crash': 'restart', 'vbd_refs': ['0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3', 'cd42c91a-1f15-053e-78c4-d2ea9876e6d9', '16e4af8e-048d-7be9-5fc1-e14070ff3551', '4b836ae6-2352-73f0-94e8-b570764113b4', '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06'], 'PV_ramdisk': '', 'memory_dynamic_min': 8589934592L, 'name_label': '10_xxxxx', 'VCPUs_at_startup': 1, 'HVM_boot_params': {'order': 'c'}, 'platform': {'videoram': '4', 'hpet': '0', 'stdvga': '0', 'vnclisten': '0.0.0.0', 'loader': '/usr/lib/xen/boot/hvmloader', 'vncconsole': '1', 'serial': 'pty', 'vncunused': '1', 'xen_platform_pci': '1', 'monitor': '0', 'boot': 'c', 'rtc_timeoffset': -967, 'vncpasswd': 'XXXXXXXX', 'pci': [], 'pae': '1', 'vpt_align': '1', 'hap': '1', 'viridian': '0', 'acpi': '1', 'localtime': '0', 'timer_mode': '0', 'vnc': '1', 'nographic': '0', 'pci_msitranslate': '1', 'apic': '1', 'usb': '0', 'guest_os_type': 'default', 'device_model': '/usr/lib/xen/bin/qemu-dm', 'keymap': 'fr', 'pci_power_mgmt': '0', 'xauthority': '//.Xauthority', 'isa': '0'}, 'PV_kernel': '', 'console_refs': ['db501a0b-36a5-a080-dab2-d42bace0b21b', 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1'], 'online_vcpus': 12, 'blocked': 0, 'on_xend_stop': 'ignore', 'memory_static_min': 0, 'HVM_boot_policy': 'BIOS order', 'shutdown_reason': 3, 'VCPUs_max': 12, 'start_time': 1348534061.3324931, 'memory_static_max': 8589934592L, 'actions_after_shutdown': 'destroy', 'on_xend_start': 'ignore', 'crashed': 0, 'memory_dynamic_max': 8589934592L, 'actions_after_suspend': '', 'is_a_template': False, 'PV_bootloader_args': '-q', 'is_control_domain': False, 'uuid': 'c33d113b-46b3-91d3-0984-8c8b192e23c3', 'cpu_time': 22146.567213627, 'shadow_memory': 76, 
'dying': 0, 'vcpu_avail': 4095, 'notes': {'SUSPEND_CANCEL': 1}, 'other_config': {}, 'auto_power_on': False, 'running': 0, 'actions_after_reboot': 'restart', 'vif_refs': ['ee5890ab-345b-479b-1504-f3a9405d4dbf'], 'target': 0, 'vtpm_refs': [], 's3_integrity': 1, 'devices': {'cd42c91a-1f15-053e-78c4-d2ea9876e6d9': ('vbd', {'uuid': 'cd42c91a-1f15-053e-78c4-d2ea9876e6d9', 'bootable': 0, 'devid': 832, 'driver': 'paravirtualised', 'dev': 'hdb', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/JDEdwards.img', 'mode': 'w'}), 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1': ('console', {'location': '14', 'devid': 0, 'protocol': 'vt100', 'uuid': 'eba3a0c2-a3ff-00cf-4e40-7c6b433801a1', 'other_config': {}}), 'ee5890ab-345b-479b-1504-f3a9405d4dbf': ('vif', {'bridge': 'xenbr1', 'mac': '00:16:3E:03:AF:B3', 'devid': 0, 'type': 'ioemu', 'uuid': 'ee5890ab-345b-479b-1504-f3a9405d4dbf'}), '4b836ae6-2352-73f0-94e8-b570764113b4': ('vbd', {'uuid': '4b836ae6-2352-73f0-94e8-b570764113b4', 'bootable': 0, 'devid': 2048, 'driver': 'paravirtualised', 'dev': 'sda', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/Backup.img', 'mode': 'w'}), 'db501a0b-36a5-a080-dab2-d42bace0b21b': ('vfb', {'vncunused': 1, 'other_config': {'vncunused': 1, 'vncpasswd': 'XXXXXXXX', 'vnclisten': '0.0.0.0', 'vnc': '1'}, 'vnc': '1', 'uuid': 'db501a0b-36a5-a080-dab2-d42bace0b21b', 'vnclisten': '0.0.0.0', 'vncpasswd': 'XXXXXXXX', 'location': '0.0.0.0:5901', 'devid': 0}), '0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3': ('vbd', {'uuid': '0b8eb7e0-91cb-1dd8-93ba-92b1a3ff26b3', 'bootable': 1, 'devid': 768, 'driver': 'paravirtualised', 'dev': 'hda', 'uname': 'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/System.img', 'mode': 'w'}), '16e4af8e-048d-7be9-5fc1-e14070ff3551': ('vbd', {'uuid': '16e4af8e-048d-7be9-5fc1-e14070ff3551', 'bootable': 0, 'devid': 5696, 'driver': 'paravirtualised', 'dev': 'hdd', 'uname': 
'file:/var/ovs/mount/58F85FD535AC460495D6CD8D56EF0E94/running_pool/10_xxxxx/MSSQL.img', 'mode': 'w'}), '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06': ('vbd', {'uuid': '9a5ce3cb-8a63-60ed-c8b1-ef00d666fe06', 'bootable': 0, 'devid': 5632, 'driver': 'paravirtualised', 'dev': 'hdc:cdrom', 'uname': '', 'mode': 'r'})}, 'PV_superpages': 0})
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2327) XendDomainInfo.constructDomain
      [2012-09-25 15:22:20 8235] DEBUG (balloon:181) Balloon: 3208900 KiB free; need 4096; done.
      [2012-09-25 15:22:20 8235] DEBUG (XendDomain:452) Adding Domain: 24
      [2012-09-25 15:22:20 8235] DEBUG (XendDomainInfo:2528) XendDomainInfo.initDomain: 24 256

      Any ideas?
        • 1. Re: ovm virtual machine domain crashed
          Dude!
          I think only the following line is relevant:
          [2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: device model failure: pid 11149: died due to signal 11; see /var/log/xen/qemu-dm-10_xxxxx.log
          SIGSEGV (11) is the signal sent to a process when it makes an invalid memory reference, i.e. a segmentation fault. Here the process that died is the device model (qemu-dm) for the domain. You may have a hardware issue or a driver problem. I suggest contacting Oracle Support or posting it in the Oracle VM forum.
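To make the signal number in such a warning easier to read, here is a minimal sketch that pulls the domain name, PID and signal out of the line quoted above and maps the number to its symbolic name. The regular expression is inferred from that single log line, not from xend documentation, and the function name is hypothetical. It assumes Python 3, for example on a workstation where a copy of the log is available.

```python
import re
import signal

# Pattern inferred from the single xend.log warning quoted in the thread;
# this is an assumption about the format, not a documented xend layout.
FAILURE_RE = re.compile(
    r"domain (?P<domain>\S+): device model failure: "
    r"pid (?P<pid>\d+): died due to signal (?P<signo>\d+)"
)


def parse_device_model_failure(line):
    """Return (domain, pid, signal name), or None if the line does not match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    signo = int(match.group("signo"))
    try:
        name = signal.Signals(signo).name
    except ValueError:
        # Signal number unknown on the platform running this script.
        name = "signal %d" % signo
    return match.group("domain"), int(match.group("pid")), name


line = (
    "[2012-09-25 15:22:18 8235] WARNING (image:490) domain 10_xxxxx: "
    "device model failure: pid 11149: died due to signal 11; "
    "see /var/log/xen/qemu-dm-10_xxxxx.log"
)
print(parse_device_model_failure(line))
```

The qemu-dm log file named at the end of the warning is the next place to look, since the crash happened in that process.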
          • 2. Re: ovm virtual machine domain crashed
            858318
            OK, Dude.

            I will post it on the Oracle VM forum.

            Support already gave us a workaround, but it does not correct the problem.
            I wonder where I can find a Broadcom driver newer than 2.0.8, as stated in Doc ID 1312709.1,
            because my driver version is 2.0.8-rh.

            Do you know where I can find this driver?

            Thanks
            • 3. Re: ovm virtual machine domain crashed
              Dude!
              Sorry, no idea. Actually, I don't see any information that points to the network driver. You wrote that your system crashes under high network load. How about system load? Is this a proven fact that can be reproduced, or just an observation? Did this work before? Did you change or add any hardware? What is the history of the problem? Does the problem seem to happen periodically, like once a week?
              • 4. Re: ovm virtual machine domain crashed
                858318
                About the load: it is just an observation.

                Another observation: the RAM of the virtual machine that reboots is full.

                It never worked before; it has been like this since we installed the virtual machine.

                We did not change any hardware.

                The history of the problem:

                1st day

                We have Oracle VM Server 2.2.1 running on an HP ProLiant DL380 G7.

                We have a virtual machine running Windows Server 2008 R2 (A17). This was the last machine we installed.

                2nd day

                We noticed that we could not reach the different machines over the network.
                We checked the log and noticed an error, NETDEV WATCHDOG: eth0: transmit timed, and something else related to the bnx2 driver.

                After some research with Oracle Support we found note [ID 1312709.1], according to which the problem is supposed to be solved in Oracle VM Server 2.2.2, or by using another kernel in which the NIC drivers were updated and disabling the C-state options (C6 and C3) in the BIOS.
                We installed this new kernel, but the problem now affects the Windows Server 2008 R2 machine.
                Currently this machine reboots, and the NIC it is attached to cannot be reached after the reboot.
                I mean that if A17 on xenbr1 (eth1) reboots, pings to eth1 get no response, but we can still ping A15 and A16 on xenbr0 (eth0).
                The NIC is:

                lspci | grep -i ethernet
                Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

                ethtool -i eth1
                driver: bnx2
                version: 2.0.8-rh
                firmware-version: bc 5.2.3 NCSI 2.0.6
                bus-info: 0000:33:00.1
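To check whether an installed driver is newer than the 2.0.8 mentioned for Doc ID 1312709.1, the `ethtool -i` output above can be parsed and compared numerically. This is only a sketch: the helper names are hypothetical, the sample text is the output pasted above, and treating the suffix `-rh` as irrelevant for the comparison is an assumption.

```python
import re


def parse_ethtool_info(text):
    """Turn `ethtool -i` style 'key: value' lines into a dict."""
    info = {}
    for raw in text.splitlines():
        # Split on the first colon only, so values such as the PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = raw.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


# Output copied from the post above.
sample = """\
driver: bnx2
version: 2.0.8-rh
firmware-version: bc 5.2.3 NCSI 2.0.6
bus-info: 0000:33:00.1
"""

info = parse_ethtool_info(sample)
is_newer = version_tuple(info["version"]) > version_tuple("2.0.8")
print(info["driver"], info["version"], "newer than 2.0.8:", is_newer)
```

Comparing tuples of integers avoids the pitfall of comparing version strings character by character.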

                end of the story

                The problem seems to happen when users launch batch applications that execute batch SQL statements (at midday or 5 pm), once or twice a day.
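One way to turn this timing observation into something checkable is to count the "Domain has crashed" warnings in xend.log per domain and hour of day. The sketch below assumes the line format seen in the xend.log excerpt earlier in the thread; the function name is hypothetical, and the sample lines are copied from that excerpt. It assumes Python 3.

```python
import re
from collections import Counter

# Format inferred from the xend.log excerpt in the first post (an assumption).
CRASH_RE = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} \d+\] "
    r"WARNING .*Domain has crashed: name=(?P<name>\S+) id=\d+"
)


def crashes_by_hour(lines):
    """Count crash warnings keyed by (domain name, hour of day)."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            counts[(match.group("name"), int(match.group("hour")))] += 1
    return counts


# Two lines copied from the excerpt; only the first is a crash warning.
sample = [
    "[2012-09-25 15:22:19 8235] WARNING (XendDomainInfo:1907) "
    "Domain has crashed: name=10_xxxxx id=23.",
    "[2012-09-25 15:22:19 8235] DEBUG (XendDomainInfo:2757) "
    "XendDomainInfo.destroy: domid=23",
]

for (name, hour), count in sorted(crashes_by_hour(sample).items()):
    print("%s crashed %d time(s) in hour %02d" % (name, count, hour))
```

Passing an open file handle for a copy of /var/log/xen/xend.log instead of `sample` would give the distribution over all recorded crashes, which can then be compared with the batch schedule.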

                Any ideas?

                thanks
                • 5. Re: ovm virtual machine domain crashed
                  Dude!
                  I'm afraid there is not enough information to pinpoint the problem. The issue could be software or firmware related, in particular since the problem started with installing virtualization, but it may also be more complex, or even a combination of problems. In my experience, if a problem is very uncommon, or a known fix does not work, check the hardware. Computers are fragile and complex machines. I have often been shocked to see how carelessly some PC technicians carry out hardware maintenance.

                  Your problem could be related to workload, perhaps due to temperature or the amount of memory used. How about your RAM? Is it third-party? Perhaps you have some sort of hardware issue that manifests depending on temperature, e.g. a tiny crack in a PCB or contact, or a bad electronic component. Most software and drivers cannot compensate for bad hardware, such as a bad cable or a bad adapter card.
                  • 6. Re: ovm virtual machine domain crashed
                    858318
                    OK, I will check all possibilities (change hardware) and let you know.

                    Thanks for your help!