9 Replies. Latest reply: Oct 7, 2011 3:30 AM by 892521

    /lib/svc/method/fs-usr failing at boot.

    892521
      Hi all,

      I'm having a weird problem: when I reboot my Solaris 10 server, it fails during boot at the "usr:default" service, which mounts the disks. When I enter maintenance mode and clear the service twice, the server continues booting and everything works as it should.

      I have already run fsck on all the disks that are supposedly failing and resynced the meta disks, but the error keeps appearing. Here's a copy of the /etc/svc/volatile/system-filesystem-usr:default.log file:

      [ start + 1.34s Enabled. ]
      [ okt  4 15:26:18 Executing start method ("/lib/svc/method/fs-usr") ]

      WARNING: Automatic update of the boot archive failed.
      Update the archives using 'bootadm update-archive'
      command and then reboot the system from the same device that
      was previously booted.

      [ okt  4 15:26:19 Method "start" exited with status 95 ]
      [ okt  4 15:27:04 Leaving maintenance because clear requested. ]
      [ okt  4 15:27:04 Enabled. ]
      [ okt  4 15:27:04 Executing start method ("/lib/svc/method/fs-usr") ]
      /dev/dsk/c0t0d0s1: Overlapping swap files are not allowed
      The / file system (/dev/md/rdsk/d0) is being checked.
      /dev/md/dsk/d0: /dev/md/dsk/d0 IS CURRENTLY MOUNTED READ/WRITE.
      /dev/md/dsk/d0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
      The /usr file system (/dev/md/rdsk/d2) is being checked.
      /dev/md/dsk/d2: /dev/md/dsk/d2 IS CURRENTLY MOUNTED READ/WRITE.
      /dev/md/dsk/d2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

      WARNING: Reboot required.
      The system has updated the cache of files (boot archive) that is used
      during the early boot sequence. To avoid booting and running the system
      with the previously out-of-sync version of these files, reboot the
      system from the same device that was previously booted.

      [ okt  4 15:27:05 Method "start" exited with status 95 ]
      [ okt  4 15:27:08 Leaving maintenance because clear requested. ]
      [ okt  4 15:27:08 Enabled. ]
      [ okt  4 15:27:08 Executing start method ("/lib/svc/method/fs-usr") ]
      /dev/dsk/c0t0d0s1: Overlapping swap files are not allowed
      The / file system (/dev/md/rdsk/d0) is being checked.
      /dev/md/dsk/d0: /dev/md/dsk/d0 IS CURRENTLY MOUNTED READ/WRITE.
      /dev/md/dsk/d0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
      The /usr file system (/dev/md/rdsk/d2) is being checked.
      /dev/md/dsk/d2: /dev/md/dsk/d2 IS CURRENTLY MOUNTED READ/WRITE.
      /dev/md/dsk/d2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
      [ okt  4 15:27:08 Method "start" exited with status 0 ]

      Anyone have any idea how to fix this?

      Thanks, wybren.
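The log above interleaves several start attempts, so it can help to pull out only the method exit statuses. Status 95 is SMF_EXIT_ERR_FATAL as defined in /lib/svc/share/smf_include.sh, which is why the service goes straight to maintenance instead of being retried. A minimal sketch, assuming the log line layout shown above (time stamp in the fourth field); the function name smf_exit_summary is hypothetical, and only codes 0, 95 and 96 are translated to names.

```shell
#!/bin/sh
# smf_exit_summary (hypothetical helper): print one line per
# 'Method "..." exited with status N' record in an SMF service log,
# formatted as "time method status (name)".
smf_exit_summary() {
    awk '
        / exited with status / {
            method = ""; status = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "Method") method = $(i + 1)
                if ($i == "status") status = $(i + 1) + 0   # force numeric
            }
            gsub(/"/, "", method)                          # strip the quotes
            if (status == 0)       name = "SMF_EXIT_OK"
            else if (status == 95) name = "SMF_EXIT_ERR_FATAL"
            else if (status == 96) name = "SMF_EXIT_ERR_CONFIG"
            else                   name = "other"
            # Field 4 is the time stamp, e.g. [ okt  4 15:26:19 Method ...
            printf "%s %s %s (%s)\n", $4, method, status, name
        }' "$1"
}
```

Usage: `smf_exit_summary /etc/svc/volatile/system-filesystem-usr:default.log`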
        • 1. Re: /lib/svc/method/fs-usr failing at boot.
          abrante
          It complains about overlapping swap devices. What's the output of:

          prtvtoc /dev/rdsk/c0t0d0s1

          ?

          .7/M.
          • 2. Re: /lib/svc/method/fs-usr failing at boot.
            abrante
            And also the prtvtoc output of the other disk used in the metadevice configuration.

            .7/M.
            • 3. Re: /lib/svc/method/fs-usr failing at boot.
              892521
              * /dev/rdsk/c0t0d0s1 partition map
              *
              * Dimensions:
              *     512 bytes/sector
              *      63 sectors/track
              *     255 tracks/cylinder
              *   16065 sectors/cylinder
              *   17847 cylinders
              *   17845 accessible cylinders
              *
              * Flags:
              *   1: unmountable
              *  10: read-only
              *
              * Unallocated space:
              *       First    Sector      Last
              *      Sector     Count    Sector
              *           0  69593580  69593579
              *
              *                            First      Sector       Last
              * Partition  Tag  Flags     Sector       Count     Sector  Mount Directory
                        0    2     00   69593580   217086345  286679924
                        1    3     01      16065     2104515    2120579
                        2    5     00          0   286679925  286679924
                        3    4     00    8144955    20482875   28627829
                        4    4     00   28627830    40965750   69593579
                        6    0     00    2120580       64260    2184839
                        7    8     00    4048380     4096575    8144954
                        8    1     01          0       16065      16064


              and

              * /dev/rdsk/c0t1d0s1 partition map
              *
              * Dimensions:
              *     512 bytes/sector
              *      63 sectors/track
              *     255 tracks/cylinder
              *   16065 sectors/cylinder
              *   17847 cylinders
              *   17845 accessible cylinders
              *
              * Flags:
              *   1: unmountable
              *  10: read-only
              *
              * Unallocated space:
              *       First    Sector      Last
              *      Sector     Count    Sector
              *           0  69593580  69593579
              *
              *                            First      Sector       Last
              * Partition  Tag  Flags     Sector       Count     Sector  Mount Directory
                        0    2     00   69593580   217086345  286679924
                        1    3     01      16065     2104515    2120579
                        2    5     00          0   286679925  286679924
                        3    4     00    8144955    20482875   28627829
                        4    4     00   28627830    40965750   69593579
                        6    0     01    2120580       64260    2184839
                        7    8     00    4048380     4096575    8144954
                        8    1     01          0       16065      16064


              But this started overnight. I had rebooted the server many times before, and no errors appeared.

              Thanks again.
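Since the swap message complains about overlap, the slice tables can be checked mechanically. The sketch below reads prtvtoc output on stdin and reports every pair of slices whose sector ranges intersect. It skips comment lines and the tag 5 (backup) slice, which by convention spans the whole disk. find_overlaps is a hypothetical name; on Solaris 10, nawk may be a safer choice than the legacy /usr/bin/awk. For both tables above it prints nothing, so the slices themselves do not overlap.

```shell
#!/bin/sh
# find_overlaps (hypothetical helper): read prtvtoc output on stdin and
# print every pair of slices whose sector ranges overlap.
# Data rows are: slice tag flags first-sector sector-count last-sector [mount]
find_overlaps() {
    awk '
        /^[ \t]*\*/ { next }          # prtvtoc comment and header lines
        NF >= 6 && $2 != 5 {           # skip tag 5 (backup, whole disk)
            n++
            s[n] = $1; lo[n] = $4 + 0; hi[n] = $6 + 0
        }
        END {
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++)
                    # closed ranges [lo, hi] intersect iff each one
                    # starts before the other one ends
                    if (lo[i] <= hi[j] && lo[j] <= hi[i])
                        printf "slice %s overlaps slice %s\n", s[i], s[j]
        }'
}
```

Usage: `prtvtoc /dev/rdsk/c0t0d0s1 | find_overlaps`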
              • 4. Re: /lib/svc/method/fs-usr failing at boot.
                892521
                I think the swap complaint can be ignored here:

                swap 8,8G 904K 8,8G 1% /etc/svc/volatile

                Perhaps this is mounted by the "maintenance mode" scripts, which would explain the warning. I think we should concentrate on the original error:

                [ okt  4 15:26:18 Executing start method ("/lib/svc/method/fs-usr") ]

                WARNING: Automatic update of the boot archive failed.
                Update the archives using 'bootadm update-archive'
                command and then reboot the system from the same device that
                was previously booted.

                [ okt  4 15:26:19 Method "start" exited with status 95 ]

                This is where it fails and enters maintenance mode.

                Although I have run the suggested 'bootadm update-archive' several times, the error keeps appearing.
                • 5. Re: /lib/svc/method/fs-usr failing at boot.
                  892521
                  Well, I've really tried everything:

                  I deleted the boot_archive file, rebooted the server in failsafe mode, mounted one disk of the mirrored meta disks and recreated the boot archive (bootadm update-archive -f -R /a). After the reboot the system resynced the disks with no errors, ran an automatic file system check and rebooted again on its own. The error was NOT gone.

                  I've run fsck on every file system I could think of (md and non-md), with no errors.

                  I checked all the scripts used by the usr:default service and everything even remotely connected to the boot archive. I'm sure there is an error somewhere, since some exit code is non-zero, but I'm also sure bootadm is not the one generating it, since nothing is wrong there.

                  I was no fan of Solaris before, and I'm no fan now. Why would you stop a booting system for an error that is neither documented nor foreseen in any of the start scripts? An error which is not logged anywhere and cannot be found. This system is crap.

                  My solaris server is running smoothly after booting like this:

                  entering safemode:

                  # mountall
                  all is mounted
                  # svcadm clear usr:default
                  some random errors about a swap file
                  # svcadm clear usr:default
                  all is started as it should.
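The manual workaround above can be scripted while the root cause is investigated. This is a sketch only, not a fix: it clears the service and checks its state with `svcs -H -o state` (state column only, no header) until the service reports online or a retry limit is reached. The function name, the default of 3 attempts and the 5-second wait are assumptions; SVCS and SVCADM can be overridden, for example for testing.

```shell
#!/bin/sh
# clear_until_online (hypothetical helper): repeat "svcadm clear" on a
# service until svcs reports it online, or give up after N attempts.
SVCS=${SVCS:-/usr/bin/svcs}
SVCADM=${SVCADM:-/usr/sbin/svcadm}
CLEAR_WAIT=${CLEAR_WAIT:-5}   # seconds to let the start method run (assumed)

clear_until_online() {
    fmri=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        [ "$("$SVCS" -H -o state "$fmri")" = online ] && return 0
        "$SVCADM" clear "$fmri"
        sleep "$CLEAR_WAIT"
        i=$((i + 1))
    done
    # Final check after the last clear; non-zero status means still not online.
    [ "$("$SVCS" -H -o state "$fmri")" = online ]
}
```

Usage: `clear_until_online usr:default 3`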

                  If anyone has any idea, even a small one, please let me know, because this is not a good way to start a system.
                  • 6. Re: /lib/svc/method/fs-usr failing at boot.
                    abrante
                    A long shot, but what does the vfstab look like?

                    .7/M.
                    • 7. Re: /lib/svc/method/fs-usr failing at boot.
                      892521
                      #device device mount FS fsck mount mount
                      #to mount to fsck point type pass at boot options
                      #
                      fd - /dev/fd fd - no -
                      /proc - /proc proc - no -
                      #/dev/dsk/c0t0d0s1 - - swap - no -
                      /dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -
                      #/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /usr ufs 1 no -
                      /dev/md/dsk/d2 /dev/md/rdsk/d2 /usr ufs 1 no -
                      #/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /export/home ufs 2 yes -
                      /dev/md/dsk/d4 /dev/md/rdsk/d4 /export/home ufs 2 yes -
                      #/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /usr/local ufs 2 yes -
                      /dev/md/dsk/d3 /dev/md/rdsk/d3 /usr/local ufs 2 yes -
                      #/dev/dsk/c0t2d0s0 /dev/rdsk/c0t2d0s0 /zones ufs 2 yes -
                      #/dev/md/dsk/d5 /dev/md/rdsk/d5 /zones ufs 2 yes -
                      /devices - /devices devfs - no -
                      sharefs - /etc/dfs/sharetab sharefs - no -
                      ctfs - /system/contract ctfs - no -
                      objfs - /system/object objfs - no -
                      swap - /tmp tmpfs - yes -
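When reading a vfstab like this one it helps to filter out the commented entries. The sketch below prints each active entry as mount point, file system type, fsck pass, mount-at-boot flag and device. In the file above, the line for the swap slice /dev/dsk/c0t0d0s1 is commented out, so it does not appear in the output, and / and /usr are the only entries with fsck pass 1 and mount at boot set to no. vfstab_active is a hypothetical helper name.

```shell
#!/bin/sh
# vfstab_active (hypothetical helper): print the uncommented vfstab entries
# as "mount-point fstype fsck-pass mount-at-boot device".
vfstab_active() {
    awk '
        /^[ \t]*#/ { next }     # commented-out entries and header
        NF < 7     { next }     # blank or malformed lines
        { print $3, $4, $5, $6, $1 }
    ' "${1:-/etc/vfstab}"
}
```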


                      I have been busy altering the scripts, or rather, getting their output written to log files.

                      The fs-usr script gives no errors, but it handles the file "boot-archive-needs-update", which is created by the "boot_archive" script (service). In that script, the command below returns exit code 1 with no further information: there is no output to stdout or stderr, and the man page only states "The program exited with some error".

                      The command responsible for this is:

                      /sbin/bootadm update-archive -vnC

                      After this, $? = 1, so the update file is created.
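The logic described here (run the check, and create the flag file if it fails) can be sketched as below, with stdout, stderr and the exit status captured in a log so that a bare exit code 1 at least leaves a trace. This is a reconstruction from the description in this post, not the actual Solaris method script. The flag-file and log paths are passed as arguments because the real paths are not given here, BOOTADM can be overridden for testing, and the -vnC flags are copied unchanged from the command above.

```shell
#!/bin/sh
# check_boot_archive (hypothetical helper): run the bootadm dry-run check,
# keep its output and exit status in a log, and create a flag file on failure.
BOOTADM=${BOOTADM:-/sbin/bootadm}

check_boot_archive() {
    flag_file=$1    # placeholder for the "boot-archive-needs-update" file
    log_file=$2     # where bootadm's stdout/stderr end up
    status=0
    "$BOOTADM" update-archive -vnC >"$log_file" 2>&1 || status=$?
    echo "bootadm exit status: $status" >>"$log_file"
    if [ "$status" -ne 0 ]; then
        touch "$flag_file"
    fi
    return "$status"
}
```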

                      However, I cannot reproduce this once the server is fully booted, nor in maintenance mode. That makes me think this service starts too early, so I made it sleep for 10 seconds, but that did not help.

                      This in turn makes me think there is a service starting after boot-archive that should be started before it. However, I have not found it so far.

                      I still find it strange: when I altered the boot-archive script so that it does not create the boot-archive-needs-update file, the server booted fine, and the only problem was 2 services not starting: eeprom and ppp-cahce-update.

                      However, this is still not the way to start a server..
                      • 8. Re: /lib/svc/method/fs-usr failing at boot.
                        892521
                        I've altered the boot-archive script so that it mounts the / partition first, then does the update and unmounts it.

                        I'm guessing usr:default goes into the "online" state before it has finished mounting all file systems, as I now get errors for eeprom: /sbin/eeprom not found.

                        So I made the boot-archive script sleep 20 seconds before exiting successfully, and did the same in the fs-usr script.

                        I also changed the dependency of the eeprom service; it now waits until svc:/system/filesystem/local:default is online.
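The post does not say how the dependency was changed. One way to add such a dependency with svccfg is sketched below: it creates a dependency property group requiring svc:/system/filesystem/local:default and then refreshes the service. The property group name fs-local-dep is hypothetical, and the eeprom service FMRI has to be passed in (check it with `svcs -a | grep eeprom`). SVCCFG and SVCADM are overridable for testing.

```shell
#!/bin/sh
# add_fs_local_dependency (hypothetical helper): make a service require
# svc:/system/filesystem/local:default before it starts.
SVCCFG=${SVCCFG:-/usr/sbin/svccfg}
SVCADM=${SVCADM:-/usr/sbin/svcadm}

add_fs_local_dependency() {
    fmri=$1
    pg=fs-local-dep   # hypothetical property group name
    "$SVCCFG" -s "$fmri" addpg "$pg" dependency || return 1
    "$SVCCFG" -s "$fmri" setprop "$pg/grouping" = astring: require_all || return 1
    "$SVCCFG" -s "$fmri" setprop "$pg/restart_on" = astring: none || return 1
    "$SVCCFG" -s "$fmri" setprop "$pg/type" = astring: service || return 1
    "$SVCCFG" -s "$fmri" setprop "$pg/entities" = \
        fmri: svc:/system/filesystem/local:default || return 1
    # Refresh so the running snapshot picks up the new dependency.
    "$SVCADM" refresh "$fmri"
}
```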

                        The server now boots without any errors.
                        • 9. Re: /lib/svc/method/fs-usr failing at boot.
                          892521
                          Although the initial problem has not been identified and the question really remains open, the problem is fixed as far as I'm concerned. All checks implemented by the Solaris creators are run and pass. I still can't figure out why this happened.