This discussion is archived
13 Replies Latest reply: Feb 4, 2013 12:06 AM by budachst RSS

power outage - lost my local storage and ocfs2 mount errors

user5373812 Newbie
Currently Being Moderated
I recently installed OVM and had everything working nicely for about 2 days. My OVM Server shut down unexpectedly and I seemed to have lost access to my repository on the OVM.
I am using local storage on the OVM Server. When I go to /OVS/Repositories and ls, there is nothing there.

On the OVM server, when I boot, I get a bunch of errors about ofcf2 which I now understand is the oracle file system that was probably installed on the local storage when I originally set everything up.

Any help would be GREATLY appreciated!!!!

When I reboot the OVM Server, I get the following errors:
mount.ocfs2,4344,0):ocfs2_check_volume:2468 File system was not unmounted cleanly, recovering volume.
Info fld=0x240b3
end_request: critical target error, dev sdd, sector 148632
end_request: critical target error, dev dm-2, sector 147632
Info fld=0x2a399
end_request: critical target error, dev sdd, sector 175768
end_request: critical target error, dev dm-2, sec 175768
JBD: Failed to read block at offset 20435
JBD: recovery failed
(mount.ocfs2,4515,2):ocfs2_journal_load:1004 ERROR: Failed to load journal!
(mount.ocfs2,4515,2):ocfs2_check_volume:2476 ERROR: ocfs2 journal load failed! -5
(mount.ocfs2,4515,2):ocfs2_check_volume:2525 ERROR: status = -5
(mount.ocfs2,4515,2):ocfs2_check_volume:1910 ERROR: status = -5
(mount.ocfs2,4515,2):ocfs2_journal_shutdown:942 ERROR: status = -5
(mount.ocfs2,4515,2):ocfs2_fill_super:1240 ERROR: status: -5

OVM Server running on PowerEdge 2900

When I try to start a VM, I get the following error:

Job Construction Phase
----------------------
begin()
Appended operation 'Virtual Machine Start' to object '0004fb0000060000205a8c08f511c265 (DB)'.
Appended operation 'Bridge Configure Operation' to object '0004fb00002000001a58d1d8e5d47fa2 (network.BondPort (1) in ovs-36-01-3f)'.
commit()
Completed Step: COMMIT

Objects and Operations
----------------------
Object (IN_USE): [VirtualDisk] 0004fb00001200002f0d60a7afa9a9c4.img (System.img (12))
Object (IN_USE): [VmDiskMapping] 0004fb0000130000547e921f59135cba
Object (IN_USE): [VmDiskMapping] 0004fb00001300001e5ab7465015e717
Object (IN_USE): [VirtualMachine] 0004fb0000060000205a8c08f511c265 (DB)
Operation: Virtual Machine Start
Object (IN_USE): [BondPort] 0004fb00002000001a58d1d8e5d47fa2 (network.BondPort (1) in ovs-36-01-3f)
Operation: Bridge Configure Operation
Object (IN_USE): [Vnic] 0004fb0000070000711bbf916a1066d8 (00:21:f6:00:00:00)
Object (IN_USE): [VirtualDisk] 0004fb00001200002cad6110e182ef76.img (FSCMDB.img (4))
Object (IN_USE): [VmDiskMapping] 0004fb0000130000607a9d0741a8f768
Object (IN_USE): [VirtualDisk] 0004fb000012000005bee2b246851db1.img (oracle11g_x86_64_asm.img (4))

Job Running Phase at 21:11 on Thu, Dec 6, 2012
----------------------------------------------
Job Participants: [44:45:4c:4c:43:00:10:38:80:44:c7:c0:4f:59:42:31 (ovs-36-01-3f)]


Actioner
--------
Starting operation 'Virtual Machine Start' on object '0004fb0000060000205a8c08f511c265 (DB)'
Completed operation 'Virtual Machine Start' completed with direction ==> LATER
Starting operation 'Bridge Configure Operation' on object '0004fb00002000001a58d1d8e5d47fa2 (network.BondPort (1) in ovs-36-01-3f)'
Bridge [10.0.1.0] already exists (and should exist) on interface [bond0] on server [ovs-36-01-3f]; skipping bridge creation
Completed operation 'Bridge Configure Operation' completed with direction ==> DONE
Starting operation 'Virtual Machine Start' on object '0004fb0000060000205a8c08f511c265 (DB)'
Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: start_vm to server: ovs-36-01-3f failed. OVMAPI_4004E Server Failed Command: start_vm 0004fb00000300007e64717b049deaef 0004fb0000060000205a8c08f511c265, Status: org.apache.xmlrpc.XmlRpcException: exceptions.RuntimeError:Command: ['xm', 'create', '/OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg'] failed (1): stderr: Error: Unable to open config file: /OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg
stdout: Usage: xm create [options] [vars]

Create a domain based on .
...

Create a domain.

Domain creation parameters can be set by command-line switches, from
a python configuration script or an SXP config file. See documentation
for defconfig, config. Configuration variables can be set using
VAR=VAL on the command line. For example vmid=3 sets vmid to 3.


...
Thu Dec 06 21:11:03 EST 2012...
Thu Dec 06 21:11:03 EST 2012
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:507)
at com.oracle.ovm.mgr.action.ActionEngine.sendUndispatchedServerCommand(ActionEngine.java:459)
at com.oracle.ovm.mgr.action.ActionEngine.sendServerCommand(ActionEngine.java:385)
at com.oracle.ovm.mgr.action.VirtualMachineAction.sendStartVmCommand(VirtualMachineAction.java:121)
at com.oracle.ovm.mgr.op.virtual.VirtualMachineStart.asyncAction(VirtualMachineStart.java:65)
at com.oracle.ovm.mgr.api.collectable.ManagedObjectDbImpl.executeCurrentJobOperationAsyncAction(ManagedObjectDbImpl.java:999)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:330)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:290)
at com.oracle.odof.core.storage.Transaction.invokeMethod(Transaction.java:822)
at com.oracle.odof.core.Exchange.invokeMethod(Exchange.java:245)
at com.oracle.ovm.mgr.api.virtual.VirtualMachineProxy.executeCurrentJobOperationAsyncAction(Unknown Source)
at com.oracle.ovm.mgr.api.job.JobEngine.objectAsyncStart(JobEngine.java:596)
at com.oracle.ovm.mgr.op.virtual.VirtualMachineStart.action(VirtualMachineStart.java:52)
at com.oracle.ovm.mgr.api.collectable.ManagedObjectDbImpl.executeCurrentJobOperationAction(ManagedObjectDbImpl.java:1009)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:330)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:290)
at com.oracle.odof.core.storage.Transaction.invokeMethod(Transaction.java:822)
at com.oracle.odof.core.Exchange.invokeMethod(Exchange.java:245)
at com.oracle.ovm.mgr.api.virtual.VirtualMachineProxy.executeCurrentJobOperationAction(Unknown Source)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:218)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:309)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1140)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:330)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:290)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:136)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:773)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:401)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:459)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: start_vm 0004fb00000300007e64717b049deaef 0004fb0000060000205a8c08f511c265, Status: org.apache.xmlrpc.XmlRpcException: exceptions.RuntimeError:Command: ['xm', 'create', '/OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg'] failed (1): stderr: Error: Unable to open config file: /OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg
stdout: Usage: xm create [options] [vars]

Create a domain based on .

Options:

...

...
Thu Dec 06 21:11:03 EST 2012
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:798)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:503)
... 40 more


FailedOperationCleanup
----------
Starting failed operation 'Virtual Machine Start' cleanup on object 'DB'
Complete rollback operation 'Virtual Machine Start' completed with direction=DB

Rollbacker
----------
Executing rollback operation 'Virtual Machine Start' on object '0004fb0000060000205a8c08f511c265 (DB)'
Complete rollback operation 'Virtual Machine Start' completed with direction=DONE
Executing rollback operation 'Bridge Configure Operation' on object '0004fb00002000001a58d1d8e5d47fa2 (network.BondPort (1) in ovs-36-01-3f)'
Complete rollback operation 'Bridge Configure Operation' completed with direction=DONE

Objects To Be Rolled Back
-------------------------
Object (IN_USE): [VirtualDisk] 0004fb00001200002f0d60a7afa9a9c4.img (System.img (12))
Object (IN_USE): [VmDiskMapping] 0004fb0000130000547e921f59135cba
Object (IN_USE): [VmDiskMapping] 0004fb00001300001e5ab7465015e717
Object (IN_USE): [VirtualMachine] 0004fb0000060000205a8c08f511c265 (DB)
Object (IN_USE): [BondPort] 0004fb00002000001a58d1d8e5d47fa2 (network.BondPort (1) in ovs-36-01-3f)
Object (IN_USE): [Vnic] 0004fb0000070000711bbf916a1066d8 (00:21:f6:00:00:00)
Object (IN_USE): [VirtualDisk] 0004fb00001200002cad6110e182ef76.img (FSCMDB.img (4))
Object (IN_USE): [VmDiskMapping] 0004fb0000130000607a9d0741a8f768
Object (IN_USE): [VirtualDisk] 0004fb000012000005bee2b246851db1.img (oracle11g_x86_64_asm.img (4))


Write Methods Invoked
-------------------
Class=InternalJobDbImpl vessel_id=9797 method=addTransactionIdentifier accessLevel=6
Class=VirtualMachineDbImpl vessel_id=5862 method=start accessLevel=6
Class=VnicDbImpl vessel_id=1644 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5874 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5884 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5896 method=lock accessLevel=6
Class=VnicDbImpl vessel_id=1644 method=lock accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5879 method=lock accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5891 method=lock accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5901 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5874 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5884 method=lock accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5896 method=lock accessLevel=6
Class=VirtualMachineStartingEventDbImpl vessel_id=9806 method=notify accessLevel=6
Class=VirtualMachineStartingEventDbImpl vessel_id=9806 method=setDescription accessLevel=6
Class=BondPortDbImpl vessel_id=802 method=addJobOperation accessLevel=6
Class=InternalJobDbImpl vessel_id=9797 method=setCompletedStep accessLevel=6
Class=InternalJobDbImpl vessel_id=9797 method=setAssociatedHandles accessLevel=6
Class=InternalJobDbImpl vessel_id=9797 method=setTuringMachineFlag accessLevel=6
Class=VirtualMachineDbImpl vessel_id=5862 method=setCurrentOperationToLater accessLevel=6
Class=InternalJobDbImpl vessel_id=9797 method=setTuringMachineFlag accessLevel=6
Class=BondPortDbImpl vessel_id=802 method=setCurrentJobOperationComplete accessLevel=6
Class=BondPortDbImpl vessel_id=802 method=nextJobOperation accessLevel=6
Class=VirtualMachineDbImpl vessel_id=5862 method=nextJobOperation accessLevel=6
Class=InternalJobDbImpl vessel_id=9797 method=setFailedOperation accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5874 method=nextJobOperation accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5901 method=nextJobOperation accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5879 method=nextJobOperation accessLevel=6
Class=VirtualMachineDbImpl vessel_id=5862 method=nextJobOperation accessLevel=6
Class=BondPortDbImpl vessel_id=802 method=nextJobOperation accessLevel=6
Class=VnicDbImpl vessel_id=1644 method=nextJobOperation accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5896 method=nextJobOperation accessLevel=6
Class=VmDiskMappingDbImpl vessel_id=5891 method=nextJobOperation accessLevel=6
Class=VirtualDiskDbImpl vessel_id=5884 method=nextJobOperation accessLevel=6
Class=VirtualMachineDbImpl vessel_id=5862 method=nextJobOperation accessLevel=6
Class=BondPortDbImpl vessel_id=802 method=nextJobOperation accessLevel=6
Completed Step: ROLLBACK
Job failed commit (internal) due to OVMAPI_4010E Attempt to send command: start_vm to server: ovs-36-01-3f failed. OVMAPI_4004E Server Failed Command: start_vm 0004fb00000300007e64717b049deaef 0004fb0000060000205a8c08f511c265, Status: org.apache.xmlrpc.XmlRpcException: exceptions.RuntimeError:Command: ['xm', 'create', '/OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg'] failed (1): stderr: Error: Unable to open config file: /OVS/Repositories/0004fb00000300007e64717b049deaef/VirtualMachines/0004fb0000060000205a8c08f511c265/vm.cfg
stdout: Usage: xm create [options] [vars]

Create a domain based on .

Options:

Edited by: user5373812 on Dec 7, 2012 9:26 AM
  • 1. Re: power outage - lost my local storage and ocfs2 mount errors
    user12273962 Pro
    Currently Being Moderated
    fsck.ocfs the mount point.
  • 2. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    fsck.ocfs didn't work, but I tried fsck.ocfs2 on /dev/sda3 and I get bad magic number in superblock while opening /dev/sda3

    Edited by: user5373812 on Dec 7, 2012 2:22 PM
  • 3. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    Ok...I was wrong on /dev/sd3
    Looks like ocfs was on /dev/sdd

    I ran fsck.ocfs2 on /dev/sdd and /dev/dm-2and did something, but I'm still having trouble - still giving me errors
    JBD:Ignoring recovery information on journal
    end_request: critical target error, dev sdd, sector 147362
    end_request: critical target error, dev dm-2, sector 147362
    JBD: Failed to read block at offset 20345

    then some ocfs errors:
    mount.ocfs2,4596,3 -
    ocfs2_journal_wipe:1045
    ocfs2_check_volume:2463
    ocfs2_mount_volume:1910
    ocfs2_journal_shutdown:942
    ocfs2_fill_super:1240

    If I run blkid, I get the following:
    /dev/sda1
    /dev/sda2
    /dev/sda3
    /dev/sr0
    /dev/mapper/SATA_ST31000340AS_9QJ1MSOP
    /dev/sdd

    sda1 and sda2 are ext3
    sdd and the dev mapper are ocfs2
    sr0 is an iso

    if I do df, I only see sda2, sda1 an then tempfs and none
  • 4. Re: power outage - lost my local storage and ocfs2 mount errors
    budachst Pro
    Currently Being Moderated
    df will only report on mounted volumes and since your ocfs2 volume isn't mounted, it won't be reported by df.
    Since blkid doesn't report a valid OCFS2 fs on /dev/sdd, there's actually no chance that, that fsck.ocfs could work anyway.
    If I run blkid on my VM server it spits out all existing file systems - can you provide the exact output of blkid?
  • 5. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    Thanks for the response, but when the oracle forums went down, I let the frustration get the best of me and reinstalled everything from scratch. Was able to get back to where I was previously pretty quickly.
  • 6. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    Ok - another extended power outage and same issue. This time sI have some work into the various environments, so its not as easy for me to just start over.
    Here's what I have figured out. The device is /dev/sdd. This is a separate 1TB drive on the physical server. When the power outage occurs, the drive seems to get unresponsive when it tries to remount. So, successive reboots, the device is not recognized after reboots.
    I seem to need to do a complete power down (unplug server) and then the NEXT reboot, I can see the drive.

    Problem is, its not mounting and I don't understand where in OCFS2 I need to configure it.
    Here's my configuration:
    3 disks presented as repositories. 2 are working fine, the 3rd is not.
    If I do df, I see the two that work correctly as (full names shortened for clarity):
    /dev/mapper/SATA_8162
    /dev/mapper/SATA_8186

    If I go to /dev/mapper and do an ls, I see all 3 devices:
    SATA_8186, SATA_8162 and SATA_ST31 (SATA_ST31 is the one that isn't working correctly/remounting)


    If I go to /OVS/Repositories, I see the 3 mount points, but only two of them have the OVM directories (the two that work), the one thats not working is not present.

    Leads me to believe that its simply not mounting, but I'm not sure where to look to see the errors or where to look to configure it to automatically mount.

    Any ideas?
  • 7. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    Ok - got a little farther - actually found the disk in ovm manager on the 'storage' tab and then was able to associate it with the server.
    At that point, it showed up under repository tab but was unpresented to the server - but at this point, I could click the present button and give it a go.

    Doing the present I assume is where it mounts and here I got a whole bunch of errors

    starting with slot 0 in use
    JDB: Ignoring recovery information on journal
    then end-request critical target error dev dm-2 (and various sectors)
    the ocfs2_write_block: 94 ERROR
    ocfs2_update_disk_slot:217 ERROR
    ocfs2_put_slot:531 ERROR

    Then a big trace for the mount
    followed by a series of end_request: I/0 error, dev sdd, sector 0
  • 8. Re: power outage - lost my local storage and ocfs2 mount errors
    budachst Pro
    Currently Being Moderated
    Besides you seem to have some flaky hardware and no backup of your OVS storage repos, what everybody should have, just in case, a desaster like this happens…

    what dies blkid return?
  • 9. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    This is basically a POC environment that I was using to learn about the system - I didn't think I needed to build in redundancy at this level, but after this one, I agree - I definitely need it. I have a new storage server on the way, but I'd like to recover this disk if possible.

    It shows up in blkid, its specific entry is
    /dev/mapper/SATA_ST31... : LABEL="OVS4b.." UUID="0004f..." TYPE="ocfs2"
  • 10. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    I just noticed that /dev/sdd is not showing up in blkid

    /dev/sdb, /dev/sdc both show up with details that correspond to the OVS dev/mapper entries.
  • 11. Re: power outage - lost my local storage and ocfs2 mount errors
    budachst Pro
    Currently Being Moderated
    Have you checked the cabling of sdd? Maybe you could put this disk into another box where you can install OVM onto and see, if blkid comes up with something useful.
  • 12. Re: power outage - lost my local storage and ocfs2 mount errors
    user5373812 Newbie
    Currently Being Moderated
    What specifically would I check for? The drive appears to work until a power outage, and then will not remount and gives the errors I listed above (critical sectors, etc).
    My guess is that I could completely re-initialize the drive and present it to the server and it would work fine - but obviously that wouldn't get me anywhere.

    Is there something I can do to fix the bad sectors? Also, are there any logs I can look at to figure out whether there are any other errors I'm missing?
  • 13. Re: power outage - lost my local storage and ocfs2 mount errors
    budachst Pro
    Currently Being Moderated
    If the drive only shows up after a reboot, than there's probably something wrong with your system. That's why I suggested trying the drive on another host.
    Obviously, you're having issues getting the drive to work reliably in your current system, which is not a very good starting point for a desaster recovery.

    There is also a really good - if not the best - hard drive maintenace and recovery utility called SpinRite. This will take care of physical issues with your drive sectors, should they need repair - provided that any repair is possible. After that, I'd mount the drive on another system and have OCFS repair it, if possible.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points