We have an issue with Solaris 11.0 (SRU10.5). It is installed on a Supermicro system with a DDR write cache, an SSD read cache and SAS disks. Every now and then these boxes end up in a "ZFS pool suspended" state while the OS still responds, and as a consequence we need to reboot the box. We have not yet been able to capture a core dump or to reproduce the conditions that lead to the error. Upgrading to 11.1 is not possible because of some other bugs. A call has been logged with Oracle support.
Does anyone recognise this type of problem, or know of any workarounds?
Edited by: 970534 on Nov 9, 2012 8:25 AM
This sounds like a very serious problem.
Is it the root pool that suspends across all of the systems?
I would check all the usual diagnostic data as described below.
Let me know if any of this reveals a common thread.
1. Review general system log messages:
# more /var/adm/messages
2. Review FMA fault logs:
# fmadm faulty
3. Review accumulating FMA fault info:
# fmdump
4. Review accumulating FMA error log:
# fmdump -eV | more
5. Check disk errors:
# iostat -en
# iostat -En
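Since the suspension is intermittent, it may help to capture all of the above in one pass right after a pool suspends, so that the output from different incidents can be compared. The following is only a rough sketch: the `collect` helper and the `OUTDIR` location are made up for illustration, and the plain `fmdump` call for step 3 is an assumption.

```shell
#!/bin/sh
# Hypothetical helper script: gather the diagnostics listed above into one
# directory. OUTDIR is an assumed location, not part of the original advice.
OUTDIR="${OUTDIR:-/var/tmp/zfs-diag-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUTDIR" || exit 1

# Run a command if it is installed and save stdout and stderr to a file
# named after the first argument; otherwise record that it was missing.
collect() {
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUTDIR/$name.txt" 2>&1
    else
        echo "$1: command not found" >"$OUTDIR/$name.txt"
    fi
}

# 1. General system log messages
if [ -r /var/adm/messages ]; then
    cp /var/adm/messages "$OUTDIR/messages.txt"
fi

# 2. FMA fault logs
collect fmadm-faulty fmadm faulty

# 3. Accumulated FMA fault info (plain fmdump is assumed here)
collect fmdump fmdump

# 4. Accumulated FMA error log
collect fmdump-eV fmdump -eV

# 5. Disk errors
collect iostat-en iostat -en
collect iostat-En iostat -En

echo "Diagnostics saved in $OUTDIR"
```

If the suspended pool is the one that holds `OUTDIR`, the writes will probably block, so in that case point `OUTDIR` at a filesystem outside the affected pool before running it, for example `OUTDIR=/tmp/zfs-diag sh ./collect-diag.sh` (the script name is also just an example).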
Thanks for your reply. We went through all of your suggestions, but we are still stuck. We are trying to find a way to reproduce the problem so that we can capture a crash dump. So far no luck. Suggestions are still welcome.