7 Replies. Latest reply: May 31, 2012 8:25 PM by 935345

Making a Hung Solaris Node Panic

935345 Newbie
I have a requirement to make a hung system panic. One possible approach I have come across, which should help in some cases, is the deadman timer. I am trying to test it and write test cases before it can be used in production. I have been trying to make a Solaris system hang, but as I am new to this, I have not succeeded so far.

Before I go ahead and share the details (OS, OS patch level, server model, etc.), I would like to confirm that I am on the right track. Will enabling the deadman timer on Solaris (preferably Solaris 10) ensure that a hung system (hung mainly because of dodgy storage) panics? If so, is there also a better way of doing this?
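For reference, a minimal sketch of enabling the deadman timer through /etc/system, assuming a Solaris 10 system where the `snooping` and `snoop_interval` tunables behave as described in the Solaris Tunable Parameters Reference Manual (the microsecond units and the 50-second default are from memory, so please double-check them there). One caveat that matters for the storage case: the deadman panics the system only when the clock interrupt itself stops advancing. A system that is merely stuck waiting on I/O from a failing disk, with the clock still ticking, may never trip it. After the reboot, `echo "snooping/D" | mdb -k` should show whether the setting took effect.

```
* /etc/system fragment (sketch; try it on a test system first).
* Enable the deadman timer. /etc/system is read at boot, so reboot afterwards.
set snooping=1
* Optional: how long, in microseconds, the clock may stall before the
* deadman panics the system. 90000000 = 90 seconds (example value only).
set snoop_interval=90000000
```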

Please let me know if you need any details from my end before answering the questions above.

Looking forward to your reply.
  • 1. Re: Making a Hung Solaris Node Panic
    935345 Newbie
    It would be great if someone could share some relevant information (links, etc.) on this.

    Thanks!!

    Edited by: 932342 on May 27, 2012 11:19 AM
  • 2. Re: Making a Hung Solaris Node Panic
    cindys Pro
    If you create a non-redundant ZFS storage pool, set the failmode pool property to panic, and start pulling
    devices out of the pool, the system should hang and then panic. This helpful editor keeps splitting failmode
    into two words, but it's actually one, like this:

    # zpool set failmode=panic pool-name

    I'm not sure what you are trying to accomplish, but this method is worth a try.
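A lab-only sketch of the suggestion above, assuming a spare test machine and a scratch disk whose contents you can afford to lose. The function name, pool name and disk name are hypothetical. Setting `DRY_RUN=1` only prints the commands, which is handy for reviewing them before running anything as root.

```shell
#!/bin/sh
# Hypothetical helper: build a single-disk (non-redundant) ZFS pool whose
# failmode is "panic", so that losing the disk should panic the host.
# LAB USE ONLY: zpool create destroys whatever is on the disk.

run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_panic_test_pool() {
    pool=$1   # e.g. testpool (hypothetical name)
    disk=$2   # e.g. c1t1d0, a scratch disk (hypothetical name)
    run zpool create "$pool" "$disk" || return 1
    run zpool set failmode=panic "$pool" || return 1
    # Confirm the property; the VALUE column should read "panic".
    run zpool get failmode "$pool"
}
```

Usage: run `DRY_RUN=1 make_panic_test_pool testpool c1t1d0` to review the commands, then run it for real as root, pull the scratch disk and watch the console.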

    Thanks,

    Cindy
  • 3. Re: Making a Hung Solaris Node Panic
    935345 Newbie
    Thanks Cindy.

    Just to elaborate: in the last 4 months we have had a couple of instances where our Solaris 10 UltraSPARC (UFS) systems hung. Even forcing a sync from the OBP did not work, and as a last resort we had to hard-reset the systems to bring them back online. Because the systems were hard-reset, we do not have any crash dumps to work with to find the exact cause.

    From dmesg, it seems this happened because of a dodgy hard disk. The setup is RAID 0+1 on a JBOD.

    While searching for information on system hangs, I came across the deadman timer, which looks promising. But before I implement it on our UFS production systems, I have to test it to confirm that it is useful in our scenario.

    If this works, it will reduce downtime and also give us a crash dump :)
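Whatever ends up triggering the panic, the dump is only useful if a dump device is configured and savecore runs at boot. A small sketch, assuming the usual `dumpadm` output format with a `Savecore enabled:` line; the function name is hypothetical. If a dedicated (non-swap) dump device is configured, `savecore -L` can also take a live dump to test the setup without panicking the system.

```shell
#!/bin/sh
# Hypothetical check: read `dumpadm` output on stdin and succeed only if
# savecore is enabled, i.e. a crash dump will be saved on the next boot.
savecore_enabled() {
    grep -q 'Savecore enabled: *yes'
}

# Usage on Solaris 10 (as root):
#   dumpadm | savecore_enabled || dumpadm -y   # -y: run savecore on reboot
```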

    As your procedure revolves around ZFS, it might not be that useful for us (all our systems use UFS), but thanks again for your reply. If the same idea (pulling a disk) is applied to UFS, it might cause the system to hang. So I will search around for that and, at the same time, check whether it is feasible.

    THANKS!!

    Edited by: 932342 on May 27, 2012 11:14 AM
  • 4. Re: Making a Hung Solaris Node Panic
    rukbat Guru Moderator
    Moderator Comment:
    932342 wrote:
    if someone does has a copy of SUN Document ID->13258 ...
    That statement suggests you don't have service contract privileges to MOS.
    If that is correct, then anyone who provides the information to you will be violating the terms of their service agreement.

    Giving something like that to anyone who does not have proper access would expose them to having all their privileges revoked by Oracle. That is a high price to pay, and it is doubtful anyone would take that chance.
    (.... and pasting a quote of the document into a forum post is the same sort of reportable violation as handing it to you.)

    ----
    Now, having said all that, it doesn't take much creativity to use an Internet search site to look for Sun Infodoc 13258 and get usable results.
  • 5. Re: Making a Hung Solaris Node Panic
    935345 Newbie
    Thanks for sharing that bit of information. I should have checked that before...

    The posts have been edited.

    Edited by: 932342 on May 27, 2012 11:31 AM
  • 6. Re: Making a Hung Solaris Node Panic
    cindys Pro
    I assume that you have also tried the usual diagnostics, such as booting with kadb (kmdb on Solaris 10)
    so that you can attempt to get a crash dump if the system hangs.
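On Solaris 10 the kernel debugger is kmdb, which replaced kadb. A sketch of the usual sequence, assuming SPARC console access (prompts abbreviated, annotations in parentheses). With kmdb loaded, a break (Stop-A) on a hung system drops into kmdb instead of the OBP, and `$<systemdump` forces a panic and a crash dump. On a running system, `mdb -K` can load kmdb without a reboot.

```
ok boot -k               (boot with kmdb loaded)
...                      (system hangs; send Stop-A / break on the console)
[0]> $<systemdump        (force a panic and write a crash dump)
```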

    Also, I should have mentioned that with dodgy hardware, creating non-redundant ZFS pools and then pulling disks is not a good idea if you want your data to stay consistent. I don't recommend this under normal conditions.

    To track down bad devices that are hanging the system, you can also review the output of iostat -En, fmadm faulty, and fmdump -eV.
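Reading `iostat -En` by eye across many disks is tedious. A sketch that prints only the devices with non-zero Soft, Hard or Transport error counters, assuming the usual per-device line of the form `c0t0d0 Soft Errors: N Hard Errors: N Transport Errors: N`; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `iostat -En` output on stdin and print one line
# per device whose Soft, Hard or Transport error count is non-zero.
flag_disk_errors() {
    awk '/Soft Errors:/ {
        soft = 0; hard = 0; trans = 0
        # Walk "<Label> Errors: <count>" triples anywhere on the line.
        for (i = 2; i <= NF - 2; i++) {
            if ($(i + 1) != "Errors:") continue
            if ($i == "Soft") soft = $(i + 2)
            else if ($i == "Hard") hard = $(i + 2)
            else if ($i == "Transport") trans = $(i + 2)
        }
        if (soft + hard + trans > 0)
            printf "%s soft=%d hard=%d transport=%d\n", $1, soft, hard, trans
    }'
}

# Usage on Solaris 10:
#   iostat -En | flag_disk_errors
```

Only the counter lines are examined; the Vendor/Product/Serial lines that `iostat -En` prints for each device are ignored.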

    Thanks,

    Cindy
  • 7. Re: Making a Hung Solaris Node Panic
    935345 Newbie
    Thanks Cindy.

    Scripts that gather system statistics are already in place. The script containing the commands above is scheduled to run 12 times a day from cron, but in the previous instances we did not see any issue in the logs it captured.
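For comparison, "12 times a day" works out to one run every 2 hours. A crontab sketch with a hypothetical script path; the hours are listed explicitly because older Solaris cron may not accept the `*/2` step syntax. With a 2-hour gap between samples, a hang can start and freeze the box between two runs, which may be one reason the logs showed nothing.

```
# min  hour                               dom mon dow  command
0      0,2,4,6,8,10,12,14,16,18,20,22     *   *   *    /usr/local/bin/collect_stats.sh
```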

    Thanks,
    Saurabh
