Google for uses of zdb. You might find something useful.
Remember - "the ZFS file system is always consistent on disk and is self-repairing".
Description:
The zdb command is used by support engineers to diagnose failures and gather statistics. Since the ZFS file system is always consistent on disk and is self-repairing, zdb should only be run under the direction of a support engineer.
If no arguments are specified, zdb performs basic consistency checks on the pool and associated datasets, and reports any problems detected.
Any options supported by this command are internal to Sun and subject to change at any time.
Thanks for the tip. I am running zdb now against the pool. I don't know if it normally takes 20 minutes or 20 months to complete. Hopefully it will find something useful.
I think my pool is pretty much borked because zdb tells me:
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <0, 0, 0, 65> -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 3249, 1, 0> -- skipping
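If zdb prints more than a handful of these, a small script can tally them. The Python sketch below is a hypothetical helper, not part of zdb. It assumes the four numbers in angle brackets are the block bookmark (objset, object, level, block id) and that, on Solaris, error 50 is EBADE, which ZFS reuses as its checksum-error code ECKSUM. Both are assumptions worth checking against your release.

```python
import re
from collections import Counter

# Assumption: on Solaris, errno 50 is EBADE, which ZFS reuses as ECKSUM.
# Errno numbers differ between platforms (EBADE is 52 on Linux), so this
# table is hard-coded instead of being taken from Python's errno module.
SOLARIS_ERRNO_HINTS = {5: "EIO", 50: "ECKSUM (EBADE)"}

# Matches lines such as:
#   zdb_blkptr_cb: Got error 50 reading <0, 3249, 1, 0> -- skipping
LINE_RE = re.compile(
    r"zdb_blkptr_cb: Got error (\d+) reading "
    r"<(\d+), (\d+), (\d+), (\d+)>"
)


def parse_zdb_errors(text):
    """Return a list of (errno, (objset, object, level, blkid)) tuples."""
    results = []
    for match in LINE_RE.finditer(text):
        err = int(match.group(1))
        bookmark = tuple(int(g) for g in match.groups()[1:])
        results.append((err, bookmark))
    return results


def summarize(errors):
    """Count errors per errno value and per objset id."""
    by_errno = Counter(err for err, _ in errors)
    by_objset = Counter(bookmark[0] for _, bookmark in errors)
    return by_errno, by_objset


if __name__ == "__main__":
    sample = """\
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <0, 0, 0, 65> -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 3249, 1, 0> -- skipping
"""
    by_errno, by_objset = summarize(parse_zdb_errors(sample))
    for err, count in by_errno.items():
        print(err, SOLARIS_ERRNO_HINTS.get(err, "unknown"), count)
```

Grouping by objset is a quick way to see whether the damage is confined to one dataset or spread across the pool.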
I am trying to get a clean backup of the remaining filesystems now. This pool only has a few TB on it, and it has not crashed yet today (fingers crossed). Sadly, the recovery is hampered by a strange set of timeouts on the fibre disks every few minutes, similar to the following:
Jan 16 10:49:57 dl585 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jan 16 10:49:57 dl585 /scsi_vhci/disk@g20000014c350f580 (sd40): Command Timeout on path fp7/disk@w21000014c350f580,0
Those only happen during replication, but they effectively freeze pool activity for a minute or so every few minutes.
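To see whether the timeouts cluster on one disk or path, you can count the warnings per sd instance. This is a minimal Python sketch. It assumes every timeout produces a line in the same form as the second warning line above. `count_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... /scsi_vhci/disk@g20000014c350f580 (sd40): Command Timeout on path fp7/disk@w21000014c350f580,0
# Assumption: every timeout is logged in this one-line form.
TIMEOUT_RE = re.compile(r"\((sd\d+)\): Command Timeout on path (\S+)")


def count_timeouts(lines):
    """Count 'Command Timeout' warnings per (sd instance, path) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 16 10:49:57 dl585 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):",
        "Jan 16 10:49:57 dl585 /scsi_vhci/disk@g20000014c350f580 (sd40): "
        "Command Timeout on path fp7/disk@w21000014c350f580,0",
    ]
    for (dev, path), n in count_timeouts(sample).most_common(10):
        print(f"{dev:8} {n:5}  {path}")
```

On Solaris the kernel warnings normally land in /var/adm/messages, so in practice you would pass an open file handle for that log instead of the sample list. A device that dominates the count is a good first suspect.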
Worse, there is some strange quirk with the (Sun-branded) nVidia 3D card when the machine reboots. It throws an error about invalid settings on the card, which leaves the console and the Sun Rays unusable. The only way to get a clean boot is to power the machine down and back up, so after a crash the server restarts into an unusable state. This would be more palatable if each boot didn't take 8 minutes.
This is going to be a long day...
My bad, it seems I do have some shaky drives. Using 'iostat -E' and smartctl, I found the main offenders. I am running a background scan on the drives and the timeouts have calmed down.
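For anyone hunting the same kind of shaky drive, here is a minimal Python sketch that flags devices with nonzero hard or transport errors. It assumes the Solaris `iostat -E` layout, where each device block starts with a header line such as `sd40 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0`. The function name, the threshold parameter, and the error counts in the sample are hypothetical.

```python
import re

# Assumption: each device block in Solaris `iostat -E` output starts with a
# header line of the form
#   sd40      Soft Errors: 0 Hard Errors: 3 Transport Errors: 12
HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)


def suspect_drives(iostat_text, threshold=1):
    """Return {device: (soft, hard, transport)} for devices whose hard or
    transport error count reaches `threshold`, worst first."""
    suspects = {}
    for line in iostat_text.splitlines():
        match = HEADER_RE.match(line)
        if not match:
            continue  # Vendor/Size/Media Error lines etc.
        dev = match.group(1)
        soft, hard, transport = (int(g) for g in match.groups()[1:])
        if hard >= threshold or transport >= threshold:
            suspects[dev] = (soft, hard, transport)
    # Sort by hard errors, then transport errors, both descending.
    ranked = sorted(suspects.items(), key=lambda kv: (kv[1][1], kv[1][2]), reverse=True)
    return dict(ranked)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sample = """\
sd39      Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
sd40      Soft Errors: 0 Hard Errors: 3 Transport Errors: 12
sd41      Soft Errors: 0 Hard Errors: 0 Transport Errors: 2
"""
    for dev, counts in suspect_drives(sample).items():
        print(dev, counts)
```

On a live system you could feed it `subprocess.run(["iostat", "-E"], capture_output=True, text=True).stdout`. The counters only tell you where to look; smartctl or a surface scan is still needed to confirm that a drive is failing.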