3 Replies Latest reply on May 23, 2013 5:59 PM by 1008852

    ps hangs, can't kill

      The 'ps' command is hanging on our Solaris 11 test environment. It does not respond to kill or even kill -9. This happens even when ps is called in its plain form with no arguments, and I tried it as two different users and as root, and it hung in all cases. At least, 'ptree' is still functioning. This happens only in the global zone.

      I can think of two possible culprits. The first is that we blindly removed a zfs filesystem that had been delegated to a non-global zone. I shut that zone down and it is now only in the 'installed' state, but that didn't make a difference. Truss actually shows 'ps' hanging shortly after this stat call to a different zone:

      769: stat("/dev/zcons/aodtest4/masterconsole", 0xFFFFFFFF7FFFE9B0) = 0
      769: stat("/dev/zcons/aodtest4/zoneconsole", 0xFFFFFFFF7FFFE9B0) = 0
      769: getdents(5, 0xFFFFFFFF7EB02000, 8192) = 0
      769: close(5) = 0
      769: stat("/dev/zcons/builder", 0xFFFFFFFF7FFFEBB0) = 0
      <hangs here>

      I shut down this second zone ('builder') to no effect.

      I had also accessed an NFS automount in the days before noticing this problem, although 'mount | grep ^/net' is currently showing no output.

      Any suggestions? I could reboot but I'd hate to encounter this problem in production with several zones running.

        • 1. Re: ps hangs, can't kill
          This is a feature introduced into Solaris with Solaris 10. I think it was ported from Linux.

          Processes get stuck in a way that you can't "kill -9" them.

          Live with it or reboot.
          • 2. Re: ps hangs, can't kill
            Is it possible that the disk is mounted? Can you run df and mount?

            The theory that this is something ported from Linux sounds like crap. There are, in my experience, three reasons why ps might hang:

            1: a mounted filesystem (e.g. /mnt) where the disk is gone

            2: a hanging naming service, so ps can't look up usernames

            3: /proc missing or not being accessible for some reason

            You can try prstat or /usr/ucb/ps to see if those work ;)

            • 3. Re: ps hangs, can't kill
              Both prstat and ptree are OK, but there is no /usr/ucb/ps (this is Solaris 11).

              1: I had one zone root filesystem that was erroring on 'df' but that cleared after zfs umount

              2: we are not using any naming services except for DNS

              3: I am not having any trouble reading /proc in general.

              I think it's getting stuck around the zcons driver. I have another truss showing ps getting stuck shortly after reading /etc/ttysrch, having just traversed /dev/term. /dev/zcons is the next entry in /etc/ttysrch, and 'ls /dev/zcons' hangs.
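
              To check the other directories without tying up more shells on hung commands, something like the following sketch could help. It lists each directory named in /etc/ttysrch from a child process and gives up after a timeout. The function names and the 5-second timeout are hypothetical, and the parser assumes each directory entry starts with "/" and may be followed by flags. The child is deliberately not waited on after the kill, since a process blocked in the kernel may never exit.

```python
import subprocess
import sys
import time


def parse_ttysrch(text):
    """Return the directory names found in ttysrch-style text."""
    dirs = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: directory entries start with "/" and may be followed
        # by whitespace-separated flags; comments and blanks are skipped.
        if line.startswith("/"):
            dirs.append(line.split()[0])
    return dirs


def probe_command(argv, timeout):
    """Run argv in a child process and report 'ok', 'error', or 'hung'."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = proc.poll()
        if rc is not None:
            return "ok" if rc == 0 else "error"
        time.sleep(0.05)
    # Send SIGKILL but do not wait(): an unkillable child would block us too.
    proc.kill()
    return "hung"


def probe_ttysrch(path="/etc/ttysrch", timeout=5.0):
    """Probe every directory listed in the given ttysrch file with 'ls'."""
    with open(path) as f:
        dirs = parse_ttysrch(f.read())
    return {d: probe_command(["ls", d], timeout) for d in dirs}
```

              Calling probe_ttysrch() and printing the result gives one status per directory. If the diagnosis above is right, I would expect /dev/zcons to be the entry reported as 'hung', with the child left behind as another stuck process.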