9 Replies Latest reply on Jul 2, 2013 11:35 AM by 986141

    shutdown normal|immediate slow on new install of 11R2.0.3.{2|3|4} on Solaris


      Just installed a new pair of SPARC T4-4 (each w/2x CPUs, 128GB mem) with Solaris 11.1 sru 3.4.1 to run a Data Guard solution for 3x instances using 3x diff. Oracle 11R2 versions (

      But all three instances on both pro & sec server randomly indicates Time Drift warnings in their alert logs and vktm traces, but both solaris boxes are running ntp sync fine.
      Also various operations like instance startup/shutdown takes a long time, as well as a simple launch of sqlplus can take minutes. Also spawning a simple CLI like 'ps -ef' can sometimes take up to 20-30 sec. Neither swap nor cpu is loaded.

      Could it be a known bug seen before or a faulty init ora parameter and/or combination?

      Also posted this under Solaris 11 Forum:
      simple processes are very slow at intervals

      TIA for any hints!

      snip of a VKTM trace:

      Trace file /global/oraadmin/diag/rdbms/<redacted>/trace/<redacted>_vktm_27415.trc Oracle Database 11g Enterprise Edition Release - 64bit Product ion ORACLE_HOME = /global/orabase/product/ System name:    SunOS Node name:      <redacted> Release:        5.11 Version:        11.1 Machine:        sun4v Instance name: <redacted> Redo thread mounted by this instance: 0 <none> Oracle process number: 3 Unix process pid: 27415, image: oracle@<redacted> (VKTM) *** 2013-06-03 12:51:02.036 *** SESSION ID:(1609.1) 2013-06-03 12:51:02.036 *** CLIENT ID:() 2013-06-03 12:51:02.036 *** SERVICE NAME:() 2013-06-03 12:51:02.036 *** MODULE NAME:() 2013-06-03 12:51:02.036 *** ACTION NAME:() 2013-06-03 12:51:02.036 kstmmainvktm: succeeded in setting elevated priority highres_enabled *** 2013-06-03 12:51:02.036 VKTM running at (1)millisec precision with DBRM quantum (100)ms [Start] HighResTick = 4585953873256 kstmrmtickcnt = 0 : ksudbrmseccnt[0] = 1370256662 *** 2013-06-03 12:55:54.137 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (2984405)usec at (4586245994477) whereas (1000000) is allowed *** 2013-06-03 12:56:13.269 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (19133925)usec at (4586265128403) whereas (1000000) is allowed kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time stalled at 1370256973 kstmchkdrift (kstmhighrestimecntkeeper:lowres): Stall, backward drift ended at 1370256974 drift: 1 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (7023737)usec at (4586963505968) whereas (1000000) is allowed

      Mod. action: please, to keep tab formatting on forum side, use the tag

       before and after your code output, thanks. lease, Edited by: N Gasparotto on Jun 3, 2013 4:16 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
        • 1. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
          Not directly linked to your problem, but...
          Just installed a new pair of SPARC T4-4 (each w/2x CPUs, 128GB mem) with Solaris 11.1 sru 3.4.1 to run a Data Guard solution for 3x instances using 3x diff. Oracle 11R2 versions (
          I wonder where did you find the, it does not exists yet on any plateform.
          Also posted this under Solaris 11 Forum:
          simple processes are very slow at intervals
          Are you disatisfied with the reponses out there ? Don't want to continue out there ? Do you think that will help for clarity to duplicate thread over the forums ?

          That said, you may or may not have hit a known bug, at least output is very similar:
          *VKTM Trace Files Generated With KSTMCHKDRIFT "BACKWARD DRIFT ENDED AT" [ID 1185093.1]*+

          • 2. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
            Sorry typo, I meant{2-4}.

            No I'm not disatisfied with answers under Solaris 11 forum, just expecting maybe to reach wider audience with different experience.
            I assume it's 11R2 with is somehow over-utilizing the HW as without instances running there no issues seen.

            prstat seems to indicate oracle process having high %LCK state (70-100%) assuming they sometimes locks kernel structures so even simple other process are slowed down. Maybe Oracle thinks is can use to much threading. Our DBAs forced data guard not to launch too many BG processes as Oracle counts 128 CPUs, which match our 2 sockets x 8 cores x 8 threads/core.

            All instances are enableNUMA_support = TRUE but even with out we still see the same pattern.
            • 3. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
              Thanks for your suggestion but as I have read the doc then the patch for Bug 9843304 is included in the patchset and then also for PSUs*

              I think that the reason we see VKTM trace/drift is not high load as this is not the case, but more lock contention emulating high load. Question is just what lock and why and how to work around it...

              Hoped that other SPARC T4 + 11R2 users might have seen similar issue and know what to tweak of either init ora/solaris setting or if bug is known.

              Just to show a snap in time. miles varies off course and issue (as mentioned in Solaris 11 forum) is not seen constantly and not during below sampling...
              #root:~# top -n 2
              last pid:  6594;  load avg:  8.07,  6.17,  7.32;  up 6+21:28:53        18:04:14
              249 processes: 239 sleeping, 5 zombie, 5 on cpu
              CPU states: 95.5% idle,  1.0% user,  3.5% kernel,  0.0% iowait,  0.0% swap
              Kernel: 67325 ctxsw, 624 trap, 53970 intr, 55554 syscall, 15 flt
              Memory: 128G phys mem, 14G free mem, 64G total swap, 64G free swap
               20916 oracle     18  20    0   50G   50G cpu/86 146:50  0.77% oracle
                6202 oracle      7  51    0   25G   25G cpu/90   0:25  0.18% oracle
              prstat -m -U oracle -n 10
               21426 oracle    28 0.0 0.0 0.0 0.0 0.0  72 0.0   2  11  39   0 oracle/1
               20916 oracle   5.5 0.0 0.0 0.0 0.0  89 5.6 0.0   3  24  8K   0 oracle/18
                6180 oracle   2.2 2.3 0.0 0.0 0.0  62  33 0.2  8K 287 20K   0 oracle/7
                6212 oracle   1.9 1.9 0.0 0.0 0.0  67  29 0.2  8K 269 19K   0 oracle/8
                6196 oracle   1.9 1.8 0.0 0.0 0.0  67  29 0.2  8K 279 20K   0 oracle/8
                6209 oracle   1.8 1.8 0.0 0.0 0.0  67  30 0.2  8K 311 19K   0 oracle/8
                6178 oracle   1.5 1.0 0.0 0.0 0.0  65  32 0.2  1K  18  1K   0 oracle/7
                6202 oracle   1.4 1.0 0.0 0.0 0.0  67  30 0.1 974  18  1K   0 oracle/8
                6156 oracle   1.9 0.4 0.0 0.0 0.0  87  11 0.1  7K 193 16K   0 oracle/17
                6158 oracle   1.9 0.4 0.0 0.0 0.0  83  15 0.1  5K 217 11K   0 oracle/13
              Total: 171 processes, 3808 lwps, load averages: 9.36, 7.14, 7.48
              #root:~# plockstat -CH -n 10 -e 5 -p 20916
              Mutex hold
              Count     nsec Lock                         Caller
                  2    22935 libc.so.1`_time_lock         oracle`slgtd+0x70
                  1    27510 libc.so.1`_time_lock         oracle`slgtd+0x70
                  2    13120 libc.so.1`libc_malloc_lock   oracle`lwemfaa+0x60
                  2     8720 libc.so.1`libc_malloc_lock   oracle`sltstidinit+0x4
                  2     7975 libc.so.1`__aio_mutex        libc.so.1`_aio_close+0x44
                  2     5150 libc.so.1`libc_malloc_lock   oracle`sltsmxd+0x30
                  2     4935 libc.so.1`libc_malloc_lock   oracle`sltsmxi+0x4
                  2     3990 libc.so.1`libc_malloc_lock   oracle`sltstiddestroy+0x4
                  2     3945 libc.so.1`libc_malloc_lock   oracle`slwmmputmem+0x10
                  1     5910 libc.so.1`_time_lock         oracle`sLdiGetDate+0x6c
              • 4. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
                Hoped that other SPARC T4 + 11R2 users might have seen similar issue and know what to tweak of either init ora/solaris setting or if bug is known.
                As Oracle uses it's own VOS (Virtual Operating System) it is very unlikely this is a Slowaris specific bug.
                It is also very unlikely, anything can be 'tweaked', and by 'tweaking' or changing undocumented parameters you may well invalidate your support license.

                Please note this problem is not specific to the Slowaris platform.

                Please also note that Oracle is heavily instrumented and even on an instance without any real world users, there are still many processes that need to be shut down.

                One common trick has been for years
                alter system set job_queue_processes = 0 scope=memory

                Doing so, the instrumentation would 'die' automatically.

                Sybrand Bakker
                Senior Oracle DBA
                • 5. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
                  Sybrand, thank you for your reply!

                  By tweaking I didn't hint necessarily to undocumented setting(s), I'll gladly adjust doc. settings as well :)

                  Assume you don't run Slowaris, but this could be an issue on this SMP+NUMA platform with this combo of SW, one never knows. Anyway you'd not seen similar issues with a 11R2 I can assume.

                  Know Oracle is heavily instrumented, we run multiple instances in our shop, both on Linux and Slowaris, but never seen so long responses for these OPs on other installations before, we can be talking +40 min for a shutdown without real users connected.

                  Edited by: 983138 on Jun 4, 2013 1:49 PM
                  • 6. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
                    So I've found that we might have cpu scheduling issues due to high condition var access in unix`_resume_from_idle slowing things down and then I'm wondering about the VMTK processes running with (1)millisec precision with DBRM quantum (100)ms.

                    Do VKTM needs to run at 1ms precision or could I do with say 10ms or is this due to vktm saying that highres is enabled when it warns of Time Drift and can this be altered for an oracle instance at all or is this determined by the HW setup/resources?
                    #root:~# ps -fuoracle| grep ora_vktm
                      oracle 26330     1   0   Jun 03 ?          12:47 ora_vktm_SID1
                      oracle   540     1   0   May 30 ?          60:22 ora_vktm_SID2
                      oracle 29771     1   0   Jun 03 ?          15:05 ora_vktm_SID3
                    #root:~# prstat -p 26330,540,29771 -vc 1
                    Please wait...
                       540 oracle   0.3 0.5 0.0 0.0 1.1 0.0  98 0.4 969   0 969   0 oracle/1
                     29771 oracle   0.2 0.5 0.0 0.0 0.0 0.0  99 0.3 985   0 985   0 oracle/1
                     26330 oracle   0.2 0.4 0.0 0.0 0.0 0.0  99 0.3 986   0 985   0 oracle/1
                    Total: 3 processes, 3 lwps, load averages: 1.42, 1.55, 1.59
                    profiling the proc through D it shows most time spend like many others are condition variable unix`_resume_from_idle+0x200

                    #root:~# more /tmp/fihvktm_proc_profile.d
                    A agg:..........................
                      preempted                                                         0
                      running                                                   123245060
                      condition variable                                       3879920970
                    S agg:..........................
                      condition variable                                
                    • 7. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
                      Apparently not alone wondering about VKTM resource consumption

                      VKTM IN ORACLE 11G
                      • 8. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris

                        Seems we solved the issue by patching SOlaris from SRU 3.4.1 to latest SRU 7.5.0, somewhere inbetween maybe a VM patch helped (Support thought so) Anyway have seen the issue since patching to 7.5.0.

                        • 9. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris

                          Turned out that it most properly wasn't the sru7.5.0 that fixed the issue.


                          Still experienced issues, specially when memory utilization got above 90%.


                          Currently we assume this to have been due to a new feature with 2GB large memory pages, which has been turned off by:


                          disable 2G pages. Add the following line to /etc/system and reboot:


                          set disable_ism_large_pages=0x74


                          Disabling 2G pages dramatically reduced our memory utilization (30-40%) at the same time and doesn't seem to have affect perf.

                          The 2G pages is a recent addition, and in some cases it may cause performance issues, when it really should have improved performance a bit according to kernel perf. team.



                          To check if you use 2G pages, do a pmap -xs f.ex. on ora_vktm_SID process(es):


                          ps -fuoracle|awk '/ora_vktm_/{print $2}'| while read pid; do pmap -xs $pid | grep 2G; done


                          as described here: "Oracle Database Startup And SGA Value Change Takes A Very Long Time To Complete", [Doc ID 1475621.1]