OLVM: remove snapshot broken

ChristianRe

Hi all,


I upgraded my OLVM lab this morning and got a bad surprise: live snapshot removal (deleting a snapshot while the VM is running) appears to be broken.

Steps to reproduce:

  1. start a VM
  2. create a snapshot
  3. delete the snapshot => error in the engine, and the disk snapshot is flagged as "Illegal"
  4. vdsm.log shows: libvirtError: internal error .. doesn't match expected (see the log excerpt below)

To recover you have to:

a. clean up the engine database with the "unlock_entity.sh" script (this removes the Illegal flag)

b. stop the VM

c. remove the snapshot

d. restart the VM

(The snapshot can also be removed after the VM has been restarted!)


Does anybody else have the same issue?

It was working before the patch!


Rgds

/Christian


log:

021-03-05 11:35:04,328+0100 ERROR (jsonrpc/7) [virt.vm] (vmId='f9b81502-dc90-46bf-9ff5-92c5c05f67b8') Live merge failed (job: 66306095-955a-4508-8ad4-e7ca8b921213) (vm:5957)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5955, in merge
    bandwidth, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 719, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: internal error: qemu block name 'json:{"backing": {"driver": "raw", "file": {"driver": "host_device", "filename": "/rhev/data-center/mnt/blockSD/6f1fa90f-27a6-4907-9e01-431c865aa203/images/d88edd28-b7dd-4e7d-8710-220ec8ee51a5/b2baa19f-7c3b-4c40-8223-b01cbc0d6929"}}, "driver": "qcow2", "file": {"driver": "host_device", "filename": "/rhev/data-center/mnt/blockSD/6f1fa90f-27a6-4907-9e01-431c865aa203/images/d88edd28-b7dd-4e7d-8710-220ec8ee51a5/a22f5909-4540-4e09-a3cb-f81e46a6d3b3"}}' doesn't match expected '/rhev/data-center/mnt/blockSD/6f1fa90f-27a6-4907-9e01-431c865aa203/images/d88edd28-b7dd-4e7d-8710-220ec8ee51a5/a22f5909-4540-4e09-a3cb-f81e46a6d3b3'
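
For anyone who wants to check whether their own vdsm.log shows the same symptom, here is a minimal Python sketch. It is not part of vdsm or libvirt: the function name and the shortened sample paths are made up for illustration, and the regular expression is only modelled on the single error line quoted above. It pulls the qemu block name and the expected path out of the message and, when the block name is a "json:" pseudo-filename, reads the top layer's device path from it.

```python
import json
import re

# Modelled on the one error line quoted above; other libvirt versions may word it differently.
MISMATCH_RE = re.compile(
    r"qemu block name '(?P<actual>.*)' doesn't match expected '(?P<expected>[^']*)'"
)


def parse_block_name_mismatch(message):
    """Return (top_layer_path, expected_path), or None if the message does not match."""
    match = MISMATCH_RE.search(message)
    if match is None:
        return None
    actual = match.group("actual")
    expected = match.group("expected")
    if actual.startswith("json:"):
        # qemu reported a JSON pseudo-filename; the top layer's device path
        # is stored under "file" -> "filename".
        spec = json.loads(actual[len("json:"):])
        actual = spec.get("file", {}).get("filename", actual)
    return actual, expected


# Hypothetical, shortened message with the same shape as the logged one.
sample = (
    "internal error: qemu block name 'json:{\"backing\": {\"driver\": \"raw\", "
    "\"file\": {\"driver\": \"host_device\", \"filename\": \"/dev/example/base\"}}, "
    "\"driver\": \"qcow2\", "
    "\"file\": {\"driver\": \"host_device\", \"filename\": \"/dev/example/top\"}}' "
    "doesn't match expected '/dev/example/top'"
)

actual, expected = parse_block_name_mismatch(sample)
print(actual == expected)
```

In the error logged above, the device path inside the JSON name is the same as the expected path, so the two sides seem to disagree only on how the name is written, not on which volume is meant.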

Comments

  • fcardax

    Hello Christian.

    I have the same problem. I opened a case yesterday and am still trying to figure out a solution.

    All the snapshots are marked illegal and cannot be removed.

  • ChristianRe

    Sounds very similar to

    https://bugzilla.redhat.com/show_bug.cgi?id=1785939

  • ChristianRe

    Doc ID 2757406.1

    "The live merge bugs which will be fixed in 4.4.x from upstream."

    This means we have to wait for OLVM 4.4 (i.e. oVirt 4.4) to have this problem fixed, and no date has been announced for OLVM 4.4.

    It also means that we currently cannot remove any snapshot while the VM is running!

  • fcardax

    I read the whole thread on Bugzilla and yes, it is our problem. They refer to a bug in qemu which, as you said, seems to be resolved in 4.4.

    Anyway, there is currently no 4.4 release from Oracle. I opened a case a few days ago and I'm waiting for a response.

    Without snapshots I can't make any backups, and without backups I don't sleep very well.

  • ChristianRe

    Exactly, "vprotect" does not like this issue at all.

    If Oracle Support provides a solution for this case, I would appreciate it if you shared it here. Thanks.

  • fcardax

    @ChristianRe Oracle is still figuring out the problem. Anyway, I set up a test environment today and found a workaround.

    Downgrade all the qemu packages:

    yum downgrade qemu-kvm-3.1.0-7.el7.x86_64 qemu-system-x86-3.1.0-7.el7.x86_64 qemu-system-x86-core-3.1.0-7.el7.x86_64 qemu-common-3.1.0-7.el7.x86_64

    After that you are able to manage snapshots as before. Anyway, I guess the qemu people will release a patch.

    Now I will stress-test my environment, and eventually I'll downgrade only a few production clusters, just to be sure I don't run into any other issue.
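
    If the downgrade has to be repeated on several hosts, the command can be generated instead of retyped. The following is only a sketch: the package names and the 3.1.0-7.el7.x86_64 version are copied from the command in this comment, while the helper name is hypothetical. It only builds and prints the command; it does not run yum.

```python
# Package names and target version copied from the downgrade command in this thread.
QEMU_PACKAGES = ["qemu-kvm", "qemu-system-x86", "qemu-system-x86-core", "qemu-common"]
TARGET_VERSION = "3.1.0-7.el7.x86_64"


def downgrade_command(packages=QEMU_PACKAGES, target=TARGET_VERSION):
    """Build the yum argument list that downgrades every package to the target version."""
    return ["yum", "downgrade"] + ["{}-{}".format(name, target) for name in packages]


# Print the command for review before running it by hand on a host.
print(" ".join(downgrade_command()))
```

    Printing the command rather than executing it keeps the decision to downgrade a given host a manual one.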

  • ChristianRe

    Thanks a lot for this feedback.

    Your workaround works perfectly and my lab runs correctly again.

    I hope Oracle will push to get a fix soon...