This discussion is archived
9 Replies Latest reply: Jun 5, 2013 3:07 AM by JulianG

T2000 won't boot after patching

JulianG Newbie
Hi - I've just O/S patched one of our Solaris 10 servers and it hasn't come back after the reboot. It's remote and I can't go onsite until later today, but I do have details of what happened just after I issued the reboot command and am wondering if anyone knows what this means. It's a SunFire T2000 running several non-global zones. The previous patch cluster applied was from 14 Feb 2013 and the kernel running was:

SunOS xxxxxxxxxxxx 5.10 Generic_147147-26 sun4v sparc SUNW,SPARC-Enterprise-T2000

I patched it with Oracle's patch cluster from 2013.04.23.

Upon reboot I got this on my ssh session:

updating /platform/sun4v/boot_archive
mount: /dev/lofi/1 is not this fstype
umount: warning: //boot/create_ramdisk.14706.tmp/rd.mount.32 not in mnttab
umount: //boot/create_ramdisk.14706.tmp/rd.mount.32 not mounted
rmdir: directory "//boot/create_ramdisk.14706.tmp/rd.mount.32": Directory not empty
15+0 records in
15+0 records out

It's the lines in BOLD that I've never seen before. The patching went fine and only about 4 patches were applied. The server's O/S is mirrored via SVM and I'd split the disk mirrors before patching, so worst case I'll boot from the unpatched half of the mirror ... hopefully.

Has anyone seen this before? Any help or advice would be appreciated.

Thanks - Julian.
  • 1. Re: T2000 won't boot after patching
    bigdelboy Pro
    There are people who know these things far better than I do, and I will happily defer to them, but make sure you have reviewed the boot archive troubleshooting advice at http://docs.oracle.com/cd/E19082-01/819-2379/gglbi/index.html
  • 2. Re: T2000 won't boot after patching
    JulianG Newbie
    As a test I patched another of our T2000s in exactly the same way and got exactly the same result, but this time I was logged on to the console. Below is what I got:

    root@xxxxxxxxxxx:~ 293$ reboot
    May 9 14:20:34 xxxxxxxxxxx reboot: rebooted by xxxxxxx
    updating /platform/sun4v/boot_archive
    May 9 14:21:09 xxxxxx unix: /kernel/sys/sparcv9/kaio: undefined symbol
    May 9 14:21:09 xxxxxx unix: 'aio_copyout_result_common'
    May 9 14:21:09 xxxxxx unix: WARNING: mod_load: cannot load module 'kaio'
    May 9 14:21:09 xxxxxx unix: /kernel/sys/sparcv9/kaio: undefined symbol
    May 9 14:21:09 xxxxxx unix: 'aio_copyout_result_common'
    May 9 14:21:09 xxxxxx unix: WARNING: mod_load: cannot load module 'kaio'
    May 9 14:21:10 xxxxxx ufs: NOTICE: mount: not a UFS magic number (0xffffffff)
    mount: /dev/lofi/1 is not this fstype
    umount: warning: //boot/create_ramdisk.7775.tmp/rd.mount.32 not in mnttab
    umount: //boot/create_ramdisk.7775.tmp/rd.mount.32 not mounted
    rmdir: directory "//boot/create_ramdisk.7775.tmp/rd.mount.32": Directory not empty
    15+0 records in
    15+0 records out
    May 9 14:21:40 xxxxxxx syslogd: going down on signal 15
    syncing file systems... done
    rebooting...

    SC Alert: Host System has Reset
    /

    SPARC Enterprise T2000, No Keyboard
    Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
    OpenBoot 4.30.4.d, 32640 MB memory available, Serial #81921274.
    Ethernet address 0:14:4f:e2:4:fa, Host ID: 84e204fa.



    Boot device: /pci@780/pci@0/pci@9/scsi@0/disk@0,0:a File and args:
    -
    ERROR: /packages/ufs-file-system: Last Trap: Division by Zero

    {0} ok


    Again, the server was at the same kernel revision as my previous server and was working perfectly fine beforehand. The patch procedure was to split the SVM disk mirrors, stop the non-global zones, stop all running apps and then patch. I've never had a problem with this in the past.

    Thanks - Julian.
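For anyone comparing their own console capture against the one above, here is a rough sketch of a check for the two failure signatures in that log. The function name is made up for illustration, and it assumes bash or ksh and that the shutdown output has been saved somewhere and is fed in on stdin. It only looks for the exact message text quoted in this thread, nothing more.

```shell
# Hypothetical helper: scan saved console/syslog text (stdin) for the two
# failure messages quoted in this thread. Prints one line per problem found
# and returns non-zero if any were seen.
check_boot_archive_log() {
    log=$(cat)
    rc=0
    # kaio could not be loaded while the boot archive was being rebuilt
    if printf '%s\n' "$log" | grep "cannot load module 'kaio'" >/dev/null; then
        echo "kaio module failed to load"
        rc=1
    fi
    # the lofi-backed ramdisk for the new archive could not be mounted
    if printf '%s\n' "$log" | grep "is not this fstype" >/dev/null; then
        echo "boot archive ramdisk could not be mounted"
        rc=1
    fi
    return $rc
}
```

Usage would be something like `check_boot_archive_log < console.txt`, where console.txt is whatever file you saved the output to.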
  • 3. Re: T2000 won't boot after patching
    cindys Pro
    Hi,

    Did this patch include a README with specific handling instructions?

    I'm somewhat rusty with UFS and have never used SVM, but you will see the "Division by Zero" error when you accidentally apply UFS boot blocks to a ZFS root file system. That isn't what happened here, but it must be related to some kind of mismatch or inconsistency between the boot device and the boot info. See below. The undefined symbol errors look like the system doesn't understand some modules.

    I would reapply the boot blocks and update the boot archive:

    1. Boot from the network or from media.

    2. Apply the UFS boot blocks.

    # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

    3. Update the boot archive.

    # mount -F ufs /dev/dsk/c1t0d0s0 /a
    # bootadm update-archive -R /a
    # umount /a

    4. Reboot.
  • 4. Re: T2000 won't boot after patching
    1008635 Newbie
    We've had similar problems with T2000s, V210s, and V440s at our site. Our analysis looks something like this:

    - The patch replaces the kaio kernel module on disk.
    - When it comes time to write the boot archive out at shutdown, the system tries to load the kaio module, which fails because of the mismatched kernel versions.
    - Because it can't load kaio, it can't finish creating the loopback file for the new boot archive and blows up.

    Our solution: run "/usr/sbin/modload /kernel/sys/sparcv9/kaio" before applying the patch. It's only been tested on a V210, but we have no reason yet to suspect it won't work on the other hardware.
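A minimal sketch of how that pre-patch step could be wrapped so it is safe to run more than once. The helper names and the MODINFO/MODLOAD override variables are my own invention for illustration; the only command taken from the post is the modload line itself. It assumes bash or ksh, and that a loaded kaio shows up in modinfo output as a line containing " kaio ", which is worth checking on your own box first.

```shell
# Hypothetical wrapper around the modload workaround described above.
# MODINFO and MODLOAD can be overridden, e.g. for a dry run.
MODINFO=${MODINFO:-/usr/sbin/modinfo}
MODLOAD=${MODLOAD:-/usr/sbin/modload}

kaio_loaded() {
    # Assumption: modinfo prints one loaded module per line and the kaio
    # entry contains " kaio " (module name followed by its description).
    "$MODINFO" | grep ' kaio ' >/dev/null
}

preload_kaio() {
    if kaio_loaded; then
        echo "kaio already loaded"
    else
        # Force-load the running kernel's kaio before the patch replaces
        # the copy on disk.
        "$MODLOAD" /kernel/sys/sparcv9/kaio && echo "kaio loaded"
    fi
}
```

Run `preload_kaio` as root immediately before patching; if it prints neither message, the modload itself failed and it is probably not safe to carry on.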
  • 5. Re: T2000 won't boot after patching
    JulianG Newbie
    Ah, that's great - well not great but thanks!! As soon as I can get one of these servers back to how it was I'll try that.

    Julian.
  • 6. Re: T2000 won't boot after patching
    JulianG Newbie
    Right don't bother looking at this - Steps to repair SVM on rootdisk

    My next problem was that my server is now running off one disk - the non-patched half of the mirror. So how do I fix this before attempting to patch the server again, or get it back into a production-like state with mirrored disks? It was easier than I thought.

    df -h showed this:

    /dev/dsk/c0t1d0s0 12G 8.8G 3.0G 75% /
    /dev/dsk/c0t1d0s3 3.9G 3.0G 898M 78% /var
    /dev/md/dsk/d7 33G 17G 15G 53% /logs
    /dev/dsk/c0t1d0s5 17G 272M 16G 2% /export
    /dev/dsk/c0t3d0s0 135G 2.9G 130G 3% /export/zones

    At this stage all I was bothered about was getting "/" under SVM control again and seeing if it booted, so I just did the following. Bear in mind that SVM had previously been used, so all the metadevices still existed, and as I found out I had to do a metaclear on anything that referred to c0t0d0. Oddly, the size of c0t0d0s0 differed slightly from that of c0t1d0s0. Anyway.

    cp /etc/vfstab /etc/vfstab.beforesvm
    cp /etc/system /etc/system.beforesvm

    metaclear d0

    d0 was the root mirror and was made up of d14, which was c0t0d0s0.

    d15 was still intact and was c0t1d0s0, as can be seen in the df output above.


    metainit d0 -m d15

    This creates a new d0 mirror, well, a half mirror at this stage.

    metaroot d0

    This updates /etc/vfstab and /etc/system so that "/" is mounted from d0.


    And finally make sure your server knows to boot from disk1 and not disk.

    eeprom boot-device="disk1 disk2 net"



    And then the moment of truth.

    shutdown -y -g 0 -i 6


    To my amazement the server rebooted without issue. Now it was just a case of getting my head around the mess that was left behind. Basically destroy anything SVM related that referred to c0t0d0 and start again. So that's what I'm on with now and it appears to work just fine.

    Thanks.
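The steps in the post above, pulled together as a rough dry-run sketch. The metadevice names (d0, d15) and the boot-device value are the ones from this particular server and will differ elsewhere; the `run` wrapper, the DRY_RUN variable and the function name are hypothetical additions for illustration. It assumes bash or ksh. By default it only prints what it would do, and the final reboot is deliberately left out.

```shell
# Hypothetical dry-run wrapper: DRY_RUN=1 (the default) only prints commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Put "/" back under SVM as a one-way mirror on the surviving submirror.
# d0/d15 and the boot-device list are specific to the server in this thread.
rebuild_root_submirror() {
    run cp /etc/vfstab /etc/vfstab.beforesvm &&   # keep copies to fall back on
    run cp /etc/system /etc/system.beforesvm &&
    run metaclear d0 &&                           # drop the stale root mirror
    run metainit d0 -m d15 &&                     # recreate it with one half
    run metaroot d0 &&                            # point vfstab/system at d0
    run eeprom 'boot-device=disk1 disk2 net'      # boot from the good disk
}
```

Check the printed commands against your own metastat output before setting DRY_RUN=0, then reboot by hand once you are happy.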
  • 7. Re: T2000 won't boot after patching
    JulianG Newbie
    Right, so we're back in business. The server has mirrored disks again - it's a manual process and a little confusing at times given the number of slices, names etc., but it's worth it. When time allows I'm going to try the suggestions made by Nicholas Haggin and see what happens. I've also been asked if I did the patching in single user mode - I did NOT, but I did stop all apps, programs and non-global zones from running, just like I always do. So I will probably try in single user mode as well.

    df -h |grep dsk
    /dev/md/dsk/d0 12G 8.8G 3.0G 75% /
    /dev/md/dsk/d3 3.9G 3.0G 900M 78% /var
    /dev/md/dsk/d7 33G 17G 15G 53% /logs
    /dev/md/dsk/d5 17G 272M 16G 2% /export
    /dev/md/dsk/d9 135G 3.0G 130G 3% /export/zones

    metastat |grep "Resync in"
    Resync in progress: 2 % done
    Resync in progress: 0 % done
    Resync in progress: 0 % done
    Resync in progress: 6 % done
    Resync in progress: 0 % done

    swap -l
    swapfile dev swaplo blocks free
    /dev/md/dsk/d1 85,1 16 4202672 4202672

    I've since tried the very same patchset on a Sun V125 and a V240 and it installed fine. I tried it on two T2000s and it failed - all servers were patched in the same way.

    Thanks - Julian.
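If you want to hold off on the next patch attempt until the mirrors have finished syncing, something along these lines might do. It is only a sketch: the function names and the METASTAT/SLEEP_SECS variables are hypothetical, it assumes bash or ksh, and it relies on nothing more than the "Resync in progress" lines shown in the metastat output above.

```shell
# Hypothetical helpers; METASTAT can be overridden for testing.
METASTAT=${METASTAT:-/usr/sbin/metastat}
SLEEP_SECS=${SLEEP_SECS:-60}

# Count "Resync in progress" lines in metastat output read from stdin.
resyncs_in_progress() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    grep -c 'Resync in progress' || true
}

# Poll metastat until it no longer reports any resync.
wait_for_resync() {
    while [ "$("$METASTAT" | resyncs_in_progress)" -gt 0 ]; do
        sleep "$SLEEP_SECS"
    done
    echo "no resync in progress"
}
```

Note this only tells you that no resync is being reported; it does not check that every submirror is actually in the Okay state.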
  • 8. Re: T2000 won't boot after patching
    1008635 Newbie
    Julian,

    Apologies for not keeping track of the thread; I hadn't set up my notifications correctly to be emailed when you replied. I see you've been enjoying the awesomeness that is manual recovery from SVM glitches....

    I also apologize for not linking this earlier, as we found it useful in recovering wedged systems:

    http://docs.oracle.com/cd/E19253-01/817-1985/mirror-1/index.html

    Rebuilding the boot archive using the failsafe kernel fixed every unbootable system; force-loading the kaio module prevented the rest from becoming unbootable.
  • 9. Re: T2000 won't boot after patching
    JulianG Newbie
    Hi Nicholas - apologies for the delay, and thanks for your help on this one. Your fix of running "/usr/sbin/modload /kernel/sys/sparcv9/kaio" worked a treat. I'm also now pretty handy at recovering from boot failures and tinkering with SVM ;-)

    I've tried Oracle's patchset from 30-5-2013 and had exactly the same issues. Each time, removing the kernel patch 148888-03 enables the server to boot again, and then installing it after running the modload command above fixes the problem. All this is being done in single user mode.

    Again this has been on T2000s, V210s and V125s.

    Thanks again - Julian.
