2 Replies Latest reply: Mar 25, 2014 5:50 AM by Tommyreynolds-Oracle

    Random server lockups

    7f821735-5942-49b9-85fd-a0e30d7da831

      Here's the environment:

       

      OEL 5.5 (unpatched)

      Oracle DB 11.2.0.2

      VMWare virtual instance

       

      Client is experiencing random lockups.  They are not very frequent: roughly once every two to three months.  SAR data doesn't show any anomalies (or so I'm told).  During a lockup the VMware console is blank; the server answers pings, but there is no login response over the network and no response from the DB.

       

      I'm going to suggest they update to 5.10 w/UEK.  In the meantime, are there any known issues with 5.5 that would cause this behavior?  Any issues with OEL 5.5 and RDBMS 11.2.0.2?

       

      Thanks

        • 1. Re: Random server lockups
          Dude!

          Nothing like this has been mentioned in the forum before. Any clues in the messages log (/var/log/messages)? Updating is probably a good idea, in particular since the UEK is a tickless kernel. I would also suggest checking the virtual environment and the VM software, and considering that the client's computer could be the source of the problem too.

          • 2. Re: Random server lockups
            Tommyreynolds-Oracle

            Random lock-ups: oh, joy.


            A server typically becomes unresponsive when it experiences memory fragmentation, memory starvation, or a storage I/O failure that congests the system.
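As a rough illustration of the fragmentation case, the kernel's free-block counts in /proc/buddyinfo can be summarized to see how much memory is still available in larger contiguous blocks. This is only a minimal sketch: the helper names and the `min_order=4` threshold are assumptions made for the example, and it assumes a Python 3 interpreter, which is not the default on OEL 5.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block counts for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node 0, zone Normal 12 7 3 ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(n) for n in fields[4:]]
    return zones


def high_order_free_pages(counts, min_order=4):
    """Pages held in free blocks of at least 2**min_order contiguous pages."""
    return sum(n * (2 ** order)
               for order, n in enumerate(counts)
               if order >= min_order)
```

On the server, something like `parse_buddyinfo(open("/proc/buddyinfo").read())` would feed it live data. A zone with plenty of order-0 pages but almost nothing at higher orders is a hint of fragmentation, not proof of it.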


            Oracle has an "OSWatcher" tool, which is really just a set of shell scripts that run top(1), vmstat(8), free(1) and the like at regular intervals.  Every server should always be running OSWatcher; it's cheap insurance.
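To make the idea concrete, here is a minimal sketch of that kind of periodic sampler. It is not OSWatcher and does not reproduce its output format; the command list, function names, and 60-second interval are assumptions for illustration, and it assumes Python 3 is installed.

```python
import os
import subprocess
import time

# Assumed command set; adjust to whatever is installed on the server.
COMMANDS = [
    ["vmstat", "1", "2"],
    ["free", "-m"],
    ["top", "-b", "-n", "1"],
]


def snapshot(commands=COMMANDS):
    """Run each command once and return one timestamped text record."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    parts = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True, timeout=30)
            body = result.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary or a command that hung: record it and move on.
            body = "could not run: %s\n" % exc
        parts.append("=== %s  %s ===\n%s" % (stamp, " ".join(cmd), body))
    return "".join(parts)


def run_forever(log_path, interval_seconds=60):
    """Append a snapshot to log_path every interval_seconds."""
    while True:
        with open(log_path, "a") as log:
            log.write(snapshot())
            log.flush()
            os.fsync(log.fileno())  # push the record to disk before sleeping
        time.sleep(interval_seconds)
```

The `fsync` call is there because the last record written before a lockup is the one most likely to matter, and it is also the one most likely to be lost on a hard reset.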


            You should also set up the kdump facility to dump core to a LOCAL DISK in the event of a kernel panic.


            The system console is an invaluable source of diagnostic information; always keep the moral equivalent of a terminal attached to it.


            1. When the lockup occurs, can you SSH in to the server?
            2. During the lockup, can you "root" login at the console?
            3. How do you recover from the lock-up?  Command "# reboot"?  System reset?  Power cycle?  The answer to this question helps determine just how locked-up the server is.
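For the first question, a probe run from another machine can record when SSH stops responding. A host that still answers pings, and may even complete a TCP handshake, can have a dead userland, so the sketch below waits for the identification line that an SSH daemon normally sends on connect. The function name, port, and timeout are assumptions for illustration, and it assumes the daemon sends its `SSH-` line first, as OpenSSH does.

```python
import socket


def ssh_banner(host, port=22, timeout=5.0):
    """Return the SSH identification line, or None if none arrives in time."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None  # refused, unreachable, or connect timed out
    try:
        data = sock.recv(256)
    except OSError:
        return None  # connected, but the daemon never spoke
    finally:
        sock.close()
    line = data.decode("ascii", "replace").splitlines()[0] if data else ""
    return line if line.startswith("SSH-") else None
```

Calling this once a minute from a monitoring host and logging the timestamp whenever it returns `None` gives an approximate start time for the lockup, which can then be lined up against SAR data and the hypervisor's logs.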