Hi. My server runs Solaris 10 and hosts two non-global zones in which the database runs.
The database team complained that they could not log in to the server. When I checked, the status of the local zones looked fine, but when I tried "# zlogin" to them, it hung. So I tried "# zlogin -S <zone_name>" and was able to log in in failsafe mode, but could not execute any command there. Any command, from "uptime" to "zfs list", hangs, and I had to forcefully log out.
So I tried to halt the non-global zones and then boot them again, but they got stuck in the "shutting_down" state.
When I tried to kill the zones' processes using "kill -9", the processes would not die.
So I rebooted the global zone, which fixed the issue. But 10 days later the same issue came up.
I followed the same steps to fix it, but I'm afraid it might come up again, since I think rebooting the global zone is only a temporary fix.
I logged a call with Oracle Support for this, but the server looks fine from the explorer output that was provided.
Has anyone faced this same problem? What can I do to fix this issue permanently?
Two pieces of information are needed:
1. What does the zone's /var/adm/messages file tell you?
2. How is the zone set up? Where and how is the root filesystem set up, and which filesystems are loopback (lofs) mounted from the global zone?
This information should give us a clue.
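In case it helps, here is a rough sketch of how the answers to the two questions above could be collected from the global zone, which is useful when zlogin into the zone hangs. It assumes ksh or bash and root access; ZONE and the helper functions zonepath_of and zone_messages_path are names made up for this example. Be aware that reading the zone's files may hang as well if the underlying storage is the cause of the problem.

```shell
# Sketch only: run as root in the global zone under ksh or bash.
# ZONE is a placeholder; set it to the real zone name.
ZONE="${ZONE:-myzone}"

# Extract the path from "zonecfg ... info zonepath" output,
# which looks like "zonepath: /zones/myzone".
zonepath_of() {
    sed -n 's/^zonepath: *//p'
}

# Path of the zone's messages file as seen from the global zone.
zone_messages_path() {
    printf '%s/root/var/adm/messages\n' "$1"
}

if command -v zonecfg >/dev/null 2>&1; then
    zoneadm list -cv                          # state of every configured zone
    zonecfg -z "$ZONE" info                   # full config, including fs resources
    zp=$(zonecfg -z "$ZONE" info zonepath | zonepath_of)
    tail -100 "$(zone_messages_path "$zp")"   # the zone's messages file
    grep "$zp" /etc/mnttab                    # mounts under the zonepath (lofs, zfs, ...)
else
    echo "zonecfg not found: not a Solaris global zone" >&2
fi
```

Posting the zonecfg output and the last lines of the messages file would cover both points.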
If you encounter the issue again in future, please get a system crash dump by panicking the global zone. This will allow us (support) to review the crash dump and understand why the zone failed to shut down. It will have been waiting on a resource, and without the dump there's simply no way to know what or why.
IIRC we recently (within the past month) did a putback of a fix for a bug (which I can't find the ID of right now) whereby if a zone hangs on the way down we'll fork a new instance of the zone and leave the old refs in their hung state. So it's worth ensuring that you're running the latest patchset.
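As a rough sketch of preparing for the crash dump mentioned above: the dump configuration can be checked ahead of time with dumpadm, and "reboot -d" forces a crash dump before rebooting. This assumes ksh or bash and root access in the global zone; the helper savecore_dir is a name invented for this example, and the script only prints the reboot command rather than running it.

```shell
# Sketch only: run as root in the global zone under ksh or bash.

# Pull the savecore directory out of dumpadm's report, which contains a
# line such as "Savecore directory: /var/crash/myriad".
savecore_dir() {
    sed -n 's/^ *Savecore directory: *//p'
}

if command -v dumpadm >/dev/null 2>&1; then
    dumpadm                      # show dump device and savecore settings
    dir=$(dumpadm | savecore_dir)
    df -k "$dir"                 # confirm there is room for the dump
    # Not executed here on purpose: this forces a crash dump and reboots.
    echo "When the zone is hung, run: reboot -d"
else
    echo "dumpadm not found: not a Solaris system" >&2
fi
```

Checking this before the next hang avoids finding out during the outage that the dump device or savecore directory is too small.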
The server is running:
# uname -a
SunOS myriad 5.10 Generic_141444-09 sun4u sparc SUNW,Sun-Fire-V440
I'll ensure that if the same issue comes up, there is a dump taken when we reboot it.
Thanks for the inputs.