18 Replies. Latest reply: Mar 31, 2009 6:46 PM by 506787

    Corrupt Root Logical Volume

    678641
      Hey Folks,

      I'm having a bit of a problem with the Root Logical Volume / Volume Group on one of our servers (OEL 4 - Release 7 running on Oracle VM 2.1.2). One of our SAN administrators pulled the disk from the system by accident and has since re-presented it to the server, but now when it boots I'm getting errors on the Root Logical Volume.

      It boots into maintenance mode. I've tried fsck'ing the root file system, but it gives errors about the first superblock...

      "Attempt to read block from filesystem resulted in short read while trying to open /dev/VolGroup00/LogVol00
      Could this be a zero-length partition? "

      I've tried specifying the next superblock, and e2fsck runs through repairing corrupt inodes, but it doesn't make a difference: I still get errors on bootup. Any other LVM commands give an error saying 'File Descriptor XX not closed'.

      I've booted from an OS DVD into Rescue Mode. It mounts the Volume Groups and I can still see that the files are present.

      Does anyone know of anything I can do? Are there commands I can run in Rescue Mode to fix corrupted file systems? I'm off to do some googling on the subject, but if anyone could offer advice it would be much appreciated.

      Thanks
      James
        • 1. Re: Corrupt Root Logical Volume
          506787
          There are two types of errors here:
          - filesystem errors
          - LVM errors

          First you need to repair LVM, because LVM gives access to the logical volume in which the filesystem is present.

          - Boot from a live CD, such as the rescue mode on the OS CD, or Knoppix.
          Is LVM giving errors when doing a pvscan?

          - After LVM is fixed, fsck the filesystem in the logical volume.
          Is the filesystem still corrupt?

          Once both are checked and resolved, boot the system from its own disks again.

          If the errors persist after fixing everything from a boot CD, please post the full error messages.
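
The order described in this reply (check LVM first, then the filesystem) can be sketched as a small shell function. This is only an illustration: the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here, and by default the function only prints the commands instead of running them. `e2fsck -n` opens the filesystem read-only and answers "no" to every question, so it reports problems without changing anything.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Check LVM first, then the filesystem inside the logical volume.
# The chain stops at the first step that fails.
check_lvm_then_fs() {
    lv="$1"
    run pvscan &&          # are the physical volumes visible?
    run vgscan &&          # are the volume groups found?
    run vgchange -ay &&    # activate the logical volumes
    run lvscan &&          # list logical volumes and their state
    run e2fsck -n "$lv"    # read-only check, makes no changes
}

# Dry run: prints the five commands in order.
check_lvm_then_fs /dev/VolGroup00/LogVol00
```

Setting `DRY_RUN=0` from a rescue shell would run the commands for real; because the steps are chained with `&&`, a failure in the LVM part is seen before any filesystem check is attempted.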
          • 2. Re: Corrupt Root Logical Volume
            678641
            Bit of an update...

            In Rescue Mode (I've skipped mounting my disks) - I've run a vgscan and a vgchange -ay. All Logical Volumes were found.

            When I try to do an e2fsck of /dev/VolGroup00/LogVol00 I get the following...

            # e2fsck /dev/VolGroup00/LogVol00
            ..
            ext3 recovery flag clear, but journal has data
            Recovery flag not set in backup superblock, so running journal anyway.
            /dev/VolGroup00/LogVol00: recovering journal
            ext3 recovery flag clear, but journal has data
            Recovery flag not set in backup superblock, so running journal anyway.
            e2fsck: unable to set superblock flags on /dev/VolGroup00/LogVol00

            Any thoughts?
            Thanks
            James
            • 3. Re: Corrupt Root Logical Volume
              506787
              Okay, my assumption now is the LVM errors are resolved. (correct?)

              Now let's try to resolve the filesystem errors:

              - Boot without using the on-disk filesystems (boot CD or something else).
              - Run:
               fsck -t ext2 -fC -y /dev/VolGroup00/LogVol00
              You may need to run it several times.
              Note that this checks the filesystem as ext2 (non-journalling) instead of ext3.

              - If it is free of errors, recreate the journal with:
               tune2fs -j /dev/VolGroup00/LogVol00
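
Running the check "several times" can be sketched as a loop that stops once fsck reports a clean filesystem. This is a sketch under assumptions, not a tested recovery procedure: the function name `fsck_until_clean` and the `FSCK` override variable are made up for this example, and the limit of five passes is arbitrary. It relies on the documented fsck exit status, where 0 means no errors, 1 and 2 mean errors were corrected, 4 means errors were left uncorrected, and 8 or higher means an operational or usage problem.

```shell
#!/bin/sh
# Re-run the ext2 check until it reports a clean filesystem, up to max passes.
# FSCK is a hypothetical override so the loop can be tried with a stand-in command.
fsck_until_clean() {
    lv="$1"
    max="${2:-5}"
    pass=1
    while [ "$pass" -le "$max" ]; do
        rc=0
        ${FSCK:-fsck} -t ext2 -fC -y "$lv" || rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "pass $pass: clean"
            return 0
        elif [ "$rc" -ge 8 ]; then
            # Operational error, usage error or cancellation: retrying will not help.
            echo "pass $pass: operational error (exit status $rc), stopping"
            return "$rc"
        fi
        echo "pass $pass: exit status $rc, checking again"
        pass=$((pass + 1))
    done
    echo "still not clean after $max passes"
    return 1
}
```

From the rescue shell this would be called as `fsck_until_clean /dev/VolGroup00/LogVol00`. Only when it returns 0 does it make sense to recreate the journal with the tune2fs command above.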
              • 4. Re: Corrupt Root Logical Volume
                506787
                Bear in mind that corruption may also have occurred inside the filesystem. You should check for that in two stages:

                -Non-oracle
                All operating system tools use normal, buffered IO, as far as I am aware (please correct me if you have seen something different).

                This means that file modifications, whether user-imposed (such as edited configuration files) or system-imposed (such as installed binaries), are made in memory and only later flushed to disk (by a kernel thread called 'pdflush'). When a disk is pulled, there is a chance that not all modifications had been flushed to disk yet, in which case the affected files will be corrupt.

                -Oracle
                The same applies to the Oracle database files, except for databases using direct IO (DIO). Direct IO is enabled if the database parameter 'filesystemio_options' is set to 'directio' or 'setall'. Direct IO bypasses the operating system buffer cache and reads and writes directly to the block device. In normal mode the database asks the operating system for the block, and the requested block(s) are cached in the Linux buffer cache.

                That is why normal (cached) IO is also said to do 'double buffering': two caches ('buffers') are involved, the database's own buffer cache and the operating system's.

                -Okay, but it's journalled, isn't it?
                Eh, yes.
                But it depends on the type of journalling. Journalling might not be what you expect it to be.

                Most filesystems, including NTFS and default ext3, only journal metadata transactions. This means that changes to the structure of the filesystem are journalled, but changes to file contents are not.

                Note that ext3 can journal all modifications, not only metadata, with the 'data=journal' mount option.
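
To see which journalling mode an ext3 filesystem is mounted with, the mount options can be inspected. A minimal sketch, assuming the options string is taken from the fourth field of /proc/mounts; the function name `ext3_data_mode` is made up for this example. If no `data=` option is present the sketch reports `ordered`, which is the ext3 default and journals metadata only.

```shell
#!/bin/sh
# Print the ext3 journalling mode found in a comma-separated mount options
# string. Falls back to "ordered", the ext3 default, when no data= option is set.
ext3_data_mode() {
    mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^data=//p' | head -n 1)
    echo "${mode:-ordered}"
}

ext3_data_mode "rw,data=journal"   # full data journalling
ext3_data_mode "rw,noatime"        # no data= option given
```

On a running system the options could be listed with, for example, `awk '$3 == "ext3" { print $2, $4 }' /proc/mounts` and passed to the function one filesystem at a time.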
                • 5. Re: Corrupt Root Logical Volume
                  678641
                  Yes, LVM seems to be fine...

                  When I try to issue the 'fsck -t ext2 -fC -y /dev/VolGroup00/LogVol00' command I get the following...

                  WARNING: couldn't open /etc/fstab: No such file or directory
                  ..
                  /dev/VolGroup00/LogVol00: recovering journal
                  fsck.ext2: unable to set superblock flags on /dev/VolGroup00/LogVol00

                  So it appears to still be treating it as an ext3 filesystem?

                  Thanks for your help so far!
                  James
                  • 6. Re: Corrupt Root Logical Volume
                    506787
                    Can you try specifying another superblock?
                    • 7. Re: Corrupt Root Logical Volume
                      506787
                      Also, are you sure you haven't mounted it?
                      • 8. Re: Corrupt Root Logical Volume
                        678641
                        If I try running mke2fs -n /dev/VolGroup00/LogVol00 I get the following Superblock backups listed...

                        32768, 98304, 163840, 229376

                        I've run e2fsck -b for each, but I'm still getting errors (unable to set superblock flags on /dev/...).
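
The four locations listed by `mke2fs -n` are what one would expect for a filesystem with 32768 blocks per group (the usual value for 4 KiB blocks) and the sparse_super feature, where backups are kept in group 1 and in groups whose number is a power of 3, 5 or 7. The sketch below reproduces them as a cross-check. The group count of 8 is an assumption chosen to match the list, not a value taken from the thread, and the arithmetic assumes the first data block is 0, as it is with 4 KiB blocks.

```shell
#!/bin/sh
# Succeed if n equals base^k for some k >= 0.
is_power_of() {
    n="$1"
    base="$2"
    while [ "$n" -gt 1 ]; do
        [ $((n % base)) -eq 0 ] || return 1
        n=$((n / base))
    done
    return 0
}

# Print the block numbers of the backup superblocks for a sparse_super
# filesystem with the given blocks-per-group and number of block groups.
backup_superblocks() {
    blocks_per_group="$1"
    groups="$2"
    g=1
    while [ "$g" -lt "$groups" ]; do
        if [ "$g" -eq 1 ] || is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            echo $((g * blocks_per_group))
        fi
        g=$((g + 1))
    done
}

# Assumed geometry: 32768 blocks per group, 8 block groups.
backup_superblocks 32768 8
```

When one of these values is passed to `e2fsck -b`, it may also help to state the block size explicitly with `-B 4096`; that block size is likewise an assumption here and should be confirmed for the actual filesystem.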
                        • 9. Re: Corrupt Root Logical Volume
                          678641
                          In response to your mounted question - I skipped mounting them when entering Rescue Mode, and when I do a df -k I only get the following...

                          FileSystem... Mounted on
                          rootfs /
                          /dev/root.old /
                          /tmp/loop0 /
                          • 10. Re: Corrupt Root Logical Volume
                            506787
                            But do you start up from an alternate operating system disk, or is booting done from the internal, local disks?
                            • 11. Re: Corrupt Root Logical Volume
                              678641
                              I started up using the Oracle DVD
                              • 12. Re: Corrupt Root Logical Volume
                                506787
                                Okay.

                                Just to be certain: are you root?
                                What permissions are set on the logical volume? (If I recall correctly, /dev/volumegroup/logicalvolume is a link; we need the permissions on the device itself.)

                                This message could indicate that the slice is presented read-only to the machine.

                                A quick test should be able to prove this.

                                PLEASE MIND THIS IS DANGEROUS!

                                This copies the first kilobyte of the logical volume to a file called 'tt' in /tmp:
                                dd if=/dev/VolGroup00/LogVol00 of=/tmp/tt bs=1k count=1
                                And copy it back:
                                dd if=/tmp/tt of=/dev/VolGroup00/LogVol00 bs=1k count=1
                                And see if this yields any errors.
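
The same three questions (are you root, what are the permissions on the real device node, is the device read-only) can also be checked without writing anything to the volume. A sketch, assuming `blockdev` from util-linux is available in the rescue environment; `check_device_access` is a hypothetical name used only here. `blockdev --getro` prints 1 if the device is flagged read-only and 0 otherwise.

```shell
#!/bin/sh
# Report who we are, the permissions on the device the path resolves to,
# and the kernel's read-only flag. Nothing is written to the device.
check_device_access() {
    dev="$1"
    echo "uid: $(id -u)"      # 0 means root
    ls -lL "$dev"             # -L follows the symlink to the device node itself
    if command -v blockdev >/dev/null 2>&1 && [ -b "$dev" ]; then
        echo "read-only flag: $(blockdev --getro "$dev")"
    else
        echo "read-only flag: not checked ($dev is not a block device here, or blockdev is missing)"
    fi
}
```

From the rescue shell this would be called as `check_device_access /dev/VolGroup00/LogVol00`. A read-only flag of 0 does not rule out a LUN that the SAN presents read-only, so the dd test above can still be informative.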
                                • 13. Re: Corrupt Root Logical Volume
                                  678641
                                  Tried that but no joy, I'm still getting the superblock errors. The annoying thing is that if I restart in rescue mode and ask it to search for and mount the filesystems, it does so without error and I can see and browse the files.

                                  Is it possible to 'rebuild' a filesystem or to change parameters to force it to mount?

                                  Thanks
                                  James
                                  • 14. Re: Corrupt Root Logical Volume
                                    506787
                                    Can you give a lot more detail?

                                    What exactly do you do when all goes well, and what exactly do you do when it goes wrong?

                                    Keep in mind that we cannot look into your system or see which messages appear. All of this information is needed in order to help you.

                                    Yes, it is possible to change things inside the filesystem, but you need to complete the picture first.

                                    Provide all the information about when you are able to mount the filesystem, and the same information about when you get errors.
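
One way to gather the requested information is to capture the output of a few read-only commands in a single file, once in the situation where mounting works and once where the errors appear. A sketch only: the function name `collect_diagnostics` and the choice of commands are suggestions made here, not something requested in the thread. Commands that are missing or fail are recorded with their exit status instead of stopping the collection.

```shell
#!/bin/sh
# Run a fixed list of read-only commands and append each command's output
# (or its failure status) to the report file given as the first argument.
collect_diagnostics() {
    report="$1"
    : > "$report"
    for cmd in "uname -a" "cat /proc/mounts" "pvdisplay" "vgdisplay" "lvdisplay" "dmesg"; do
        echo "### $cmd" >> "$report"
        # $cmd is deliberately unquoted so it splits into command and arguments.
        $cmd >> "$report" 2>&1 || echo "(exit status $?)" >> "$report"
    done
}
```

Calling `collect_diagnostics /tmp/report-ok.txt` in the working case and `collect_diagnostics /tmp/report-bad.txt` in the failing case gives two files that can be compared and posted; the file names are placeholders.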