This discussion is archived
18 Replies; latest reply: Dec 18, 2013 7:42 AM by cindys

How to repair a corrupted zfs filesystems?

lancetan Newbie

I am running Solaris 11 x86. I had a zpool running on 2 mirrored 3TB disks. The short story is: some of the zfs file systems on this pool have permanent errors, including the root zfs filesystem, which means I've lost the whole pool, about 2TB of data.

 

The following is the status of my pool:

root@solaris:~# zpool status -v dps

  pool: dps

state: DEGRADED

status: One or more devices has experienced an error resulting in data

        corruption.  Applications may be affected.

action: Restore the file in question if possible. Otherwise restore the

        entire pool from backup.

   see: http://support.oracle.com/msg/ZFS-8000-8A

  scan: resilvered 442K in 0h0m with 551 errors on Thu Sep 12 00:11:57 2013

config:

 

        NAME      STATE     READ WRITE CKSUM

        dps       DEGRADED     0     0    12

          c4t1d0  DEGRADED     0     0    24

 

device details:

 

        c4t1d0  DEGRADED          too many errors

        status: FMA has degraded this device.

        action: Run 'fmadm faulty' for more information. Clear the errors

                using 'fmadm repaired'.

           see: http://support.oracle.com/msg/ZFS-8000-GH for recovery

 

 

errors: Permanent errors have been detected in the following files:

 

        dps/Sharepoint/VirtualDisks:<0x0>

        dps:<0x0>

        dps:/VirtualBoxDisks/xppro64.vdi

        dps/Media:<0x0>

        dps/Sharepoint:<0x0>

 

And here is the long story: one day my server hung and I had to force it off, and when it started up again, I saw an error saying one of the files on that zfs filesystem had a permanent error on one of the disks. Since I had mirrored disks, I tried to fix it by resilvering, and the pool was still in service during the resilvering. After a few hours, I found that the resilvering seemed to be stuck and more errors appeared. I then detached one disk from the mirror in order to preserve a copy from further damage. But later, the detached disk became completely useless; "zpool import" couldn't find any pool info on that disk. I tried my best to recover by exporting and importing the pool, rebooting the system, etc. But unfortunately things got worse and worse, and I ended up losing the whole pool.

 

Does anyone here know if this pool is still recoverable, fully or partially? Any suggestion on what I should do next? Any input would be greatly appreciated.

  • 1. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi--

     

    Sounds like this pool had 2 DEGRADED disks, which caused the data corruption. It's unusual that both disks started failing at the same time. Unfortunately, detaching a disk from a pool wipes the pool info off that disk. Resilvering onto DEGRADED disks won't resolve the existing corrupted data. Data doesn't go bad on its own; disks go bad, which causes bad data, so you have to resolve the disk problems before you can resolve the data problems.

    You can use these commands to determine when the device problems started:

     

    # fmadm faulty

    # iostat -En

    # fmdump

     

    Then, use this command to review the fmdump reports:

     

    # fmdump -v -u <EVENT-ID>

     

    I would also rule out a larger problem like a bad cable or a controller problem.
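As a sketch of how the last step fits together: each fault listed by `fmadm faulty` carries a UUID-style EVENT-ID, and those IDs are what `fmdump -v -u` takes. The sample output below is hypothetical, saved to a file so the extraction can be shown:

```shell
# Sketch: pull EVENT-ID values out of saved 'fmadm faulty' output.
# On the live system you would first run:  fmadm faulty > /tmp/faulty.out
# The sample below is hypothetical output, used here only for illustration.
cat > /tmp/faulty.out <<'EOF'
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Sep 12 00:11:57 5f83e1a0-13c4-4bd2-9a6e-0d2c7b7f91aa  ZFS-8000-GH    Major
EOF

# Event IDs are UUIDs; grep them out, then review each one with
# 'fmdump -v -u <EVENT-ID>'.
grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' /tmp/faulty.out
```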

     

    If you had REPLACED one of the DEGRADED disks instead of detaching it, then you might have recovered this pool. You would most likely still have some corrupted data, and I can't tell how severe it is from the paths above. Is this system running on VirtualBox, or just hosting VMs with VirtualBox?

     

    Thanks, Cindy

  • 2. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    Hi Cindy,

     

    Thanks for your reply.

     

    I assumed the problem was with the controller, not the disks, because I thought both disks failing at the same time would be a really rare case. I had sent the MB to the manufacturer to have it repaired. BTW, I'm using a GigaByte GA-Z77M-D3H-MVP. GigaByte says the MB is repaired, but didn't say what problem they found. I'm not quite sure, but I hope the hardware problem is eliminated. Of course I'll monitor it closely.

     

    What an idiot I was!! I should have realized that detaching a disk would not preserve the data, for basic safety's sake. The current status is kind of a disaster to me. Almost all of my digital life was in that pool. The disaster happened 3 months ago; after several attempts at repairing it, things got worse and worse, and I was scared of touching it any more. I kept the server turned off for 3 months, until recently I thought somebody online might be able to help, so I started seeking help here.

     

    Cindy, do you think this pool is repairable, by any magic tool or manually with amazing expertise? And I just had another thought: detaching wipes out the pool info, but does it wipe out all the zfs filesystem info as well? My best guess is it doesn't. If that's the case, can I try to rebuild the pool info without wiping the zfs filesystem info?

     

    Thanks,

    Lance

  • 3. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi Lance,

     

    We won't know the state of this pool until you can get the system back together. I doubt that this will completely solve the data corruption. Do you have snapshots of these file systems? Snapshots don't always help, because they originate from the same location as the file systems and share the same blocks.

     

    Let us know when the system is back up and running.

     

    Trying to recover data when the pool info is lost is not impossible, but very difficult. It will be easier to try to recover from the remaining disk, I think.

     

    Thanks, Cindy

  • 4. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    Hi Cindy,

     

    The system is back now; unfortunately I don't have snapshots for it. Please let me know what I should do next.

     

    Thanks,

    Lance

  • 5. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi Lance,

     

    Can you attempt to import this pool, if it's not available:

     

    # zpool import dps

     

    Let us know the result.

     

    I'm catching a plane shortly so will not be available again until this evening.

     

    Thanks, Cindy

  • 6. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    The pool is already imported; the status info I posted is its current status.

  • 7. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi Lance,

     

    I need to review some things first and I'll get back to you. Don't give up just yet. Thanks, Cindy

  • 8. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Lance,

     

    If the higher-level system problems are resolved, then I would try the next steps to see if the device issues are resolved:

     

    # zpool online dps c4t1d0

    # zpool clear dps

     

    Let me know the results. Thanks, Cindy

  • 9. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    Hi Cindy,

     

    Here are the results,

    root@solaris:~# zpool online dps c4t1d0

    warning: device 'c4t1d0' onlined, but remains in degraded state

    root@solaris:~# zpool clear dps

    root@solaris:~# echo $?

    0

     

    Seems nothing happened.

  • 10. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Lance,

     

    I take it that the root pool is functioning on this system and only dps is a problem.

    It looks like the disk is failing, so let's see if FMA confirms this problem.

     

    # fmdump -eV > /tmp/fmdump.out

    # grep c4t1d0 /tmp/fmdump.out

     

    If c4t1d0 is listed in this file, then open the file in vi to find the dates of the problems.

    Maybe this is a separate problem from the motherboard problem, but it is hard to say.

    If the root pool is fine, then maybe this is a separate disk failure.
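Rather than paging through the whole dump by hand, the error dates for the device can be summarized with a short pipeline. The excerpt below is a trimmed, hypothetical stand-in for real `fmdump -eV` output:

```shell
# Sketch: count ZFS ereport entries per calendar date in a saved dump.
# On the live system:  fmdump -eV > /tmp/fmdump.out
# The sample below is a trimmed, hypothetical excerpt for illustration.
cat > /tmp/fmdump.out <<'EOF'
Jul 21 2013 23:27:52.414828706 ereport.fs.zfs.probe_failure
        vdev_path = /dev/dsk/c4t1d0s0
Jul 21 2013 23:27:52.414875304 ereport.fs.zfs.io
        vdev_path = /dev/dsk/c4t1d0s0
Aug 30 2013 10:02:11.123456789 ereport.fs.zfs.io
        vdev_path = /dev/dsk/c4t1d0s0
EOF

# Each ereport header line starts with the date (e.g. "Jul 21 2013 ...");
# print just the date fields from those lines and count entries per date.
awk '/^[A-Z][a-z][a-z] [0-9]+ [0-9]+ /{print $1, $2, $3}' /tmp/fmdump.out \
  | sort | uniq -c
```

This groups the errors by day, which makes it easy to line the dates up against events like the Aug 30-31 outage or the post-repair reboot.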

     

    If this disk needs to be replaced based on the FMA data, then I want to check with some experts to see if we should try to recover the data before the disk is replaced.

     

    Thanks, Cindy

  • 11. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    Hi Cindy,

     

    There are 110 occurrences of c4t1d0 in the fmdump output in total, and all of them denote the same device: vdev_path = /dev/dsk/c4t1d0s0.

     

    The first date it appeared was July 21, 2013; there are three entries at almost the same time. Then there are a bunch of entries on Aug 30 and 31, which was when I had problems accessing this pool. And then Nov 30, which was the day I restarted my system after the MB was repaired.

     

    The following is the first batch in fmdump output,

     

       1373 Jul 21 2013 23:27:52.414828706 ereport.fs.zfs.probe_failure

       1374 nvlist version: 0

       1375         class = ereport.fs.zfs.probe_failure

       1376         ena = 0x3dcdf54e21e00401

       1377         detector = (embedded nvlist)

       1378         nvlist version: 0

       1379                 version = 0x0

       1380                 scheme = zfs

       1381                 pool = 0xae7e2d470cb4144c

       1382                 vdev = 0xb410bba0db8b87fd

       1383         (end detector)

       1384

       1385         pool = dps

       1386         pool_guid = 0xae7e2d470cb4144c

       1387         pool_context = 0

       1388         pool_failmode = wait

       1389         vdev_guid = 0xb410bba0db8b87fd

       1390         vdev_type = disk

       1391         vdev_path = /dev/dsk/c4t1d0s0

       1392         vdev_devid = id1,sd@SATA_____ST3000DM001-9YN1____________W1F0K4F3/a

       1393         parent_guid = 0xee23f913afcb43d2

       1394         parent_type = mirror

       1395         prev_state = 0x0

       1396         __ttl = 0x1

       1397         __tod = 0x51eca6b8 0x18b9c8a2

       1398

       1399 Jul 21 2013 23:27:52.414875304 ereport.fs.zfs.io

       1400 nvlist version: 0

       1401         class = ereport.fs.zfs.io

       1402         ena = 0x3dcdf5595c600c01

       1403         detector = (embedded nvlist)

       1404         nvlist version: 0

       1405                 version = 0x0

       1406                 scheme = zfs

       1407                 pool = 0xae7e2d470cb4144c

       1408                 vdev = 0xb410bba0db8b87fd

       1409         (end detector)

       1410

       1411         pool = dps

       1412         pool_guid = 0xae7e2d470cb4144c

       1413         pool_context = 0

       1414         pool_failmode = wait

       1415         vdev_guid = 0xb410bba0db8b87fd

       1416         vdev_type = disk

       1417         vdev_path = /dev/dsk/c4t1d0s0

       1418         vdev_devid = id1,sd@SATA_____ST3000DM001-9YN1____________W1F0K4F3/a

       1419         parent_guid = 0xee23f913afcb43d2

       1420         parent_type = mirror

       1421         zio_err = 6

       1422         zio_txg = 0x27ab27a

       1423         zio_offset = 0xb219648000

       1424         zio_size = 0x20000

       1425         zio_objset = 0x10

       1426         zio_object = 0x19

       1427         zio_level = 0

       1428         zio_blkid = 0x4de6

       1429         __ttl = 0x1

       1430         __tod = 0x51eca6b8 0x18ba7ea8

       1431

       1432 Jul 21 2013 23:27:52.414875452 ereport.fs.zfs.io

       1433 nvlist version: 0

       1434         class = ereport.fs.zfs.io

       1435         ena = 0x3dcdf5595c600c01

       1436         detector = (embedded nvlist)

       1437         nvlist version: 0

       1438                 version = 0x0

       1439                 scheme = zfs

       1440                 pool = 0xae7e2d470cb4144c

       1441                 vdev = 0xb410bba0db8b87fd

       1442         (end detector)

       1443

       1444         pool = dps

       1445         pool_guid = 0xae7e2d470cb4144c

       1446         pool_context = 0

       1447         pool_failmode = wait

       1448         vdev_guid = 0xb410bba0db8b87fd

       1449         vdev_type = disk

       1450         vdev_path = /dev/dsk/c4t1d0s0

       1451         vdev_devid = id1,sd@SATA_____ST3000DM001-9YN1____________W1F0K4F3/a

       1452         parent_guid = 0xee23f913afcb43d2

       1453         parent_type = mirror

       1454         zio_err = 6

       1455         zio_txg = 0x27ab27a

       1456         zio_offset = 0xb219648000

       1457         zio_size = 0x20000

       1458         zio_objset = 0x10

       1459         zio_object = 0x19

       1460         zio_level = 0

       1461         zio_blkid = 0x4de6

       1462         __ttl = 0x1

       1463         __tod = 0x51eca6b8 0x18ba7f3c

     

    Cindy, please let me know if you need the whole fmdump.out.

     

    Thanks,

    Lance

  • 12. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi Lance,

     

    No, I don't need the entire fmdump.out. This is enough to see that this disk has had problems and continues to have problems. I think it needs to be replaced, but I want to see if we can try to rescue the data before the disk replacement. I'll get back to you tomorrow.

     

    Thanks, Cindy

  • 13. Re: How to repair a corrupted zfs filesystems?
    lancetan Newbie

    Cindy, I see your point. I hope you can find a way to rescue the data. Again, since the pool had mirrored disks, is it possible to replicate the pool info from c4t1d0 to the detached disk? I hope that disk is good and the zfs info on it is not corrupted.

     

    Thanks,

    Lance

  • 14. Re: How to repair a corrupted zfs filesystems?
    cindys Pro

    Hi Lance,

     

    Corrupted pool recovery is not my specialty, but I discussed your issues with someone who is a recovery expert and we have a few ideas:

     

    1. Keep the detached disk available; don't overwrite it or do anything with it just yet.
    2. Can you create a new ZFS pool on an extra spare disk?
    3. If you can do #2 above, then please create a new pool and copy all of your existing data from the existing dps pool.
    4. Let me know if you can do 2-3 and whether the data is in reasonably good shape.
    5. If you can't do 2-3, then we need to go to more involved steps.
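As a rough sketch, steps 2-3 could look like the following. The pool name `rescue`, the spare device `c5t0d0`, and the mount paths are placeholders, not taken from this system; treat this as an illustration of the plan, not commands to run verbatim:

```shell
# Sketch of steps 2-3 above; device, pool, and path names are placeholders.

# Step 2: build a new single-disk pool on the spare (no redundancy,
# but it only needs to hold a rescue copy of the data):
zpool create rescue c5t0d0

# Step 3: copy whatever is still readable out of dps. A plain recursive
# copy keeps going past individual unreadable files, whereas 'zfs send'
# from a pool with permanent errors may abort partway:
cp -rp /dps/Sharepoint /rescue/ 2>/tmp/rescue-errors.log
cp -rp /dps/Media      /rescue/ 2>>/tmp/rescue-errors.log

# Anything that failed to copy is recorded here for step 4's assessment:
cat /tmp/rescue-errors.log
```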

     

    Thanks, Cindy
