12 Replies. Latest reply: Feb 18, 2013 10:49 AM by Dave Miner-Oracle

    Instability and Poor Performance with Solaris 11 11/11 and 11.1

    988819
      I've upgraded an OpenSolaris install to Solaris 11.1 in stages, and ever since I moved to Solaris 11 11/11 and then Solaris 11.1 my system has been unstable and slow, especially ZFS and GDM. (I had to disable GDM under 11/11 and 11.1 because it was using too much CPU.) Every shutdown under Solaris 11 11/11 or 11.1 ends in a kernel panic. I use this command to shut down:
      -----
      /usr/sbin/shutdown -y -g 60 -i 5
      -----
      This is what it produces; the system then restarts automatically and never completes the shutdown:
      -----
      TIME UUID SUNW-MSG-ID
      Jan 28 2013 23:19:14.682124000 54fbe302-2309-6f14-8d7f-c81e9c3369b7 SUNOS-8000-KL

      TIME CLASS ENA
      Jan 28 23:18:29.9322 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

      nvlist version: 0
      version = 0x0
      class = list.suspect
      uuid = 54fbe302-2309-6f14-8d7f-c81e9c3369b7
      code = SUNOS-8000-KL
      diag-time = 1359433153 925385
      de = fmd:///module/software-diagnosis
      fault-list-sz = 0x1
      __case_state = 0x1
      topo-uuid = 78f32799-20fb-446f-b758-f24f4197b812
      fault-list = (array of embedded nvlists)
      (start fault-list[0])
      nvlist version: 0
      version = 0x0
      class = defect.sunos.kernel.panic
      certainty = 0x64
      asru = sw:///:path=/var/crash/opensolaris/.54fbe302-2309-6f14-8d7f-c81e9c3369b7
      resource = sw:///:path=/var/crash/opensolaris/.54fbe302-2309-6f14-8d7f-c81e9c3369b7
      savecore-succcess = 0
      os-instance-uuid = 54fbe302-2309-6f14-8d7f-c81e9c3369b7
      panicstr = deadman: timed out after 120 seconds of clock inactivity
      panicstack = fffffffffb9fcc56 () | genunix:cyclic_expire+ac () | genunix:cyclic_fire+76 () | unix:cbe_fire+65 () | unix:av_dispatch_autovect+74 () | unix:dispatch_hilevel+1f () | unix:switch_sp_and_call+13 () | unix:do_interrupt+f2 () | unix:cmnint+ba () | unix:mach_cpu_pause+21 () | unix:cpu_pause+7f () | unix:thread_start+8 () |
      crashtime = 1359432916
      panic-time = January 28, 2013 11:15:16 PM EST EST
      (end fault-list[0])

      fault-status = 0x1
      severity = Major
      __ttl = 0x1
      __tod = 0x51074dc2 0x28a862e0
      -----
      I've also seen a large drop in ZFS throughput. I kept the boot environments for the earlier releases, so after upgrading to Solaris 11.1 I booted back into each one and measured these numbers with dd:
      -----
      Release               WRITE (MB/s)   READ (MB/s)
      OpenSolaris SNV134    211            470
      Solaris 11 Express    194            499
      OpenIndiana 151a7     215            417
      Solaris 11 11/11      182            177
      Solaris 11.1          150            276
      -----
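      A minimal sketch of a dd-based sequential write/read test of the kind described above; it is not the exact set of commands behind these figures. The function names `mb_per_s`, `pct_change`, and `dd_bench` and any test-file path are hypothetical. It assumes bash (for `$SECONDS`), ZFS compression turned off on the target dataset (otherwise data from /dev/zero compresses away and inflates the write figure), and a test file larger than RAM so the read pass is not served from the ARC.

```bash
#!/bin/bash
# Hypothetical helpers for a dd-based sequential throughput test.

# MB/s from megabytes moved ($1) and elapsed whole seconds ($2), rounded.
mb_per_s() {
    # Treat a 0-second run as 1 second to avoid dividing by zero.
    awk -v mb="$1" -v s="$2" 'BEGIN { if (s <= 0) s = 1; printf "%.0f\n", mb / s }'
}

# Signed percentage change from an old figure ($1) to a new one ($2).
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.0f%%\n", (new - old) * 100 / old }'
}

# Write then read $2 MB through file $1 using 1 MB blocks.
# Timing uses bash's $SECONDS, so only whole seconds are resolved.
dd_bench() {
    local file=$1 size_mb=$2 start
    start=$SECONDS
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null
    # Ask the OS to flush dirty data before stopping the clock
    # (on some systems sync may return before the flush completes).
    sync
    echo "write $(mb_per_s "$size_mb" $((SECONDS - start))) MB/s"

    start=$SECONDS
    dd if="$file" of=/dev/null bs=1048576 2>/dev/null
    echo "read $(mb_per_s "$size_mb" $((SECONDS - start))) MB/s"
}
```

      For example, `dd_bench /tank/ddtest.bin 16384` (hypothetical path, 16 GB) runs one write/read pass, and `pct_change 470 276` prints `-41%`, the change in read throughput from SNV134 to Solaris 11.1 in the table above.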
      Lastly, on a couple of occasions simply running benchmarks such as dd or bonnie++ against my ZFS pool caused a kernel panic:
      -----
      TIME UUID SUNW-MSG-ID
      Jan 26 2013 18:40:21.947381000 5a9c2174-51bd-6af5-cda3-ceb12d0591bb SUNOS-8000-KL

      TIME CLASS ENA
      Jan 26 18:39:33.6420 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

      nvlist version: 0
      version = 0x0
      class = list.suspect
      uuid = 5a9c2174-51bd-6af5-cda3-ceb12d0591bb
      code = SUNOS-8000-KL
      diag-time = 1359243621 817586
      de = fmd:///module/software-diagnosis
      fault-list-sz = 0x1
      __case_state = 0x1
      topo-uuid = 08cec1f5-1959-c812-85e3-fa1bb969b7a3
      fault-list = (array of embedded nvlists)
      (start fault-list[0])
      nvlist version: 0
      version = 0x0
      class = defect.sunos.kernel.panic
      certainty = 0x64
      asru = sw:///:path=/var/crash/opensolaris/.5a9c2174-51bd-6af5-cda3-ceb12d0591bb
      resource = sw:///:path=/var/crash/opensolaris/.5a9c2174-51bd-6af5-cda3-ceb12d0591bb
      savecore-succcess = 0
      os-instance-uuid = 5a9c2174-51bd-6af5-cda3-ceb12d0591bb
      panicstr = BAD TRAP: type=e (#pf Page fault) rp=fffffffc801bba00 addr=28 occurred in module "zfs" due to a NULL pointer dereference
      panicstack = unix:die+105 () | unix:trap+153e () | unix:cmntrap+e6 () | zfs:arc_hash_remove+28 () | zfs:arc_evict_from_ghost+c0 () | zfs:arc_adjust_ghost+4e () | zfs:arc_adjust+51 () | zfs:arc_reclaim_thread+1aa () | unix:thread_start+8 () |
      crashtime = 1359232124
      panic-time = January 26, 2013 03:28:44 PM EST EST
      (end fault-list[0])

      fault-status = 0x1
      severity = Major
      __ttl = 0x1
      __tod = 0x51046965 0x3877e308
      -----
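      Both panic reports pack the whole call chain into a single `panicstack` line. A hypothetical helper like the one below (plain sed/tr, not a Solaris tool) splits that field into one frame per line, in the order the report lists them (innermost first), which makes the two stacks easier to read side by side: the first runs from `cpu_pause` into the cyclic subsystem, the second from `arc_reclaim_thread` down to `zfs:arc_hash_remove`.

```bash
#!/bin/bash
# Hypothetical helper: print each frame of an FMA "panicstack = ..." field
# on its own line, keeping the report's order (innermost frame first).
split_panicstack() {
    # 1. keep only the panicstack field, minus its "panicstack = " label
    # 2. frames are separated by "|": put each on its own line
    # 3. trim surrounding blanks and drop the empty piece after the final "|"
    sed -n 's/^[[:space:]]*panicstack = //p' |
        tr '|' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

      To use it, save one of the reports above to a file (the name `panic-report.txt` is hypothetical) and run `split_panicstack < panic-report.txt`.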
      What could be causing these issues, and why are the OpenSolaris and Solaris 11 Express installs faster and more stable? Could it be a hardware incompatibility? How can I determine the root cause and fix it?

      Thanks.

      Edited by: RavenShadow on Feb 10, 2013 10:17 AM