This discussion is archived
9 Replies Latest reply: Jul 2, 2013 4:35 AM by 986141 RSS

shutdown normal|immediate slow on new install of 11R2.0.3.{2|3|4} on Solaris

986141 Newbie
Currently Being Moderated

Just installed a new pair of SPARC T4-4 (each w/2x CPUs, 128GB mem) with Solaris 11.1 sru 3.4.1 to run a Data Guard solution for 3x instances using 3x diff. Oracle 11R2 versions (11.2.0.3.2-4)

But all three instances on both pro & sec server randomly indicates Time Drift warnings in their alert logs and vktm traces, but both solaris boxes are running ntp sync fine.
Also various operations like instance startup/shutdown takes a long time, as well as a simple launch of sqlplus can take minutes. Also spawning a simple CLI like 'ps -ef' can sometimes take up to 20-30 sec. Neither swap nor cpu is loaded.

Could it be a known bug seen before or a faulty init ora parameter and/or combination?

Also posted this under Solaris 11 Forum:
simple processes are very slow at intervals

TIA for any hints!

snip of a VKTM trace:
--------------------------

Trace file /global/oraadmin/diag/rdbms/<redacted>/trace/<redacted>_vktm_27415.trc Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Product ion ORACLE_HOME = /global/orabase/product/11.2.0.3.4/db_1 System name:    SunOS Node name:      <redacted> Release:        5.11 Version:        11.1 Machine:        sun4v Instance name: <redacted> Redo thread mounted by this instance: 0 <none> Oracle process number: 3 Unix process pid: 27415, image: oracle@<redacted> (VKTM) *** 2013-06-03 12:51:02.036 *** SESSION ID:(1609.1) 2013-06-03 12:51:02.036 *** CLIENT ID:() 2013-06-03 12:51:02.036 *** SERVICE NAME:() 2013-06-03 12:51:02.036 *** MODULE NAME:() 2013-06-03 12:51:02.036 *** ACTION NAME:() 2013-06-03 12:51:02.036 kstmmainvktm: succeeded in setting elevated priority highres_enabled *** 2013-06-03 12:51:02.036 VKTM running at (1)millisec precision with DBRM quantum (100)ms [Start] HighResTick = 4585953873256 kstmrmtickcnt = 0 : ksudbrmseccnt[0] = 1370256662 *** 2013-06-03 12:55:54.137 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (2984405)usec at (4586245994477) whereas (1000000) is allowed *** 2013-06-03 12:56:13.269 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (19133925)usec at (4586265128403) whereas (1000000) is allowed kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time stalled at 1370256973 kstmchkdrift (kstmhighrestimecntkeeper:lowres): Stall, backward drift ended at 1370256974 drift: 1 kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (7023737)usec at (4586963505968) whereas (1000000) is allowed

Mod. action: please, to keep tab formatting on forum side, use the tag

 before and after your code output, thanks. lease, Edited by: N Gasparotto on Jun 3, 2013 4:16 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
  • 1. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    Nicolas.Gasparotto Oracle ACE
    Currently Being Moderated
    Not directly linked to your problem, but...
    Just installed a new pair of SPARC T4-4 (each w/2x CPUs, 128GB mem) with Solaris 11.1 sru 3.4.1 to run a Data Guard solution for 3x instances using 3x diff. Oracle 11R2 versions (11.2.0.2-4)
    I wonder where did you find the 11.2.0.4, it does not exists yet on any plateform.
    Also posted this under Solaris 11 Forum:
    simple processes are very slow at intervals
    Are you disatisfied with the reponses out there ? Don't want to continue out there ? Do you think that will help for clarity to duplicate thread over the forums ?

    That said, you may or may not have hit a known bug, at least output is very similar:
    *VKTM Trace Files Generated With KSTMCHKDRIFT "BACKWARD DRIFT ENDED AT" [ID 1185093.1]*+

    Nicolas.
  • 2. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated
    Sorry typo, I meant 11.2.0.3.{2-4}.

    No I'm not disatisfied with answers under Solaris 11 forum, just expecting maybe to reach wider audience with different experience.
    I assume it's 11R2 with is somehow over-utilizing the HW as without instances running there no issues seen.

    prstat seems to indicate oracle process having high %LCK state (70-100%) assuming they sometimes locks kernel structures so even simple other process are slowed down. Maybe Oracle thinks is can use to much threading. Our DBAs forced data guard not to launch too many BG processes as Oracle counts 128 CPUs, which match our 2 sockets x 8 cores x 8 threads/core.

    All instances are enableNUMA_support = TRUE but even with out we still see the same pattern.
  • 3. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated
    Thanks for your suggestion but as I have read the doc then the patch for Bug 9843304 is included in the 11.2.0.2 patchset and then also for PSUs 11.2.0.3.*

    I think that the reason we see VKTM trace/drift is not high load as this is not the case, but more lock contention emulating high load. Question is just what lock and why and how to work around it...

    Hoped that other SPARC T4 + 11R2 users might have seen similar issue and know what to tweak of either init ora/solaris setting or if bug is known.


    Just to show a snap in time. miles varies off course and issue (as mentioned in Solaris 11 forum) is not seen constantly and not during below sampling...
    #root:~# top -n 2
    last pid:  6594;  load avg:  8.07,  6.17,  7.32;  up 6+21:28:53        18:04:14
    249 processes: 239 sleeping, 5 zombie, 5 on cpu
    CPU states: 95.5% idle,  1.0% user,  3.5% kernel,  0.0% iowait,  0.0% swap
    Kernel: 67325 ctxsw, 624 trap, 53970 intr, 55554 syscall, 15 flt
    Memory: 128G phys mem, 14G free mem, 64G total swap, 64G free swap
    
       PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
     20916 oracle     18  20    0   50G   50G cpu/86 146:50  0.77% oracle
      6202 oracle      7  51    0   25G   25G cpu/90   0:25  0.18% oracle
    
    prstat -m -U oracle -n 10
       PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
     21426 oracle    28 0.0 0.0 0.0 0.0 0.0  72 0.0   2  11  39   0 oracle/1
     20916 oracle   5.5 0.0 0.0 0.0 0.0  89 5.6 0.0   3  24  8K   0 oracle/18
      6180 oracle   2.2 2.3 0.0 0.0 0.0  62  33 0.2  8K 287 20K   0 oracle/7
      6212 oracle   1.9 1.9 0.0 0.0 0.0  67  29 0.2  8K 269 19K   0 oracle/8
      6196 oracle   1.9 1.8 0.0 0.0 0.0  67  29 0.2  8K 279 20K   0 oracle/8
      6209 oracle   1.8 1.8 0.0 0.0 0.0  67  30 0.2  8K 311 19K   0 oracle/8
      6178 oracle   1.5 1.0 0.0 0.0 0.0  65  32 0.2  1K  18  1K   0 oracle/7
      6202 oracle   1.4 1.0 0.0 0.0 0.0  67  30 0.1 974  18  1K   0 oracle/8
      6156 oracle   1.9 0.4 0.0 0.0 0.0  87  11 0.1  7K 193 16K   0 oracle/17
      6158 oracle   1.9 0.4 0.0 0.0 0.0  83  15 0.1  5K 217 11K   0 oracle/13
    Total: 171 processes, 3808 lwps, load averages: 9.36, 7.14, 7.48
    
    
    #root:~# plockstat -CH -n 10 -e 5 -p 20916
            0
    Mutex hold
    
    Count     nsec Lock                         Caller
    -------------------------------------------------------------------------------
        2    22935 libc.so.1`_time_lock         oracle`slgtd+0x70
        1    27510 libc.so.1`_time_lock         oracle`slgtd+0x70
        2    13120 libc.so.1`libc_malloc_lock   oracle`lwemfaa+0x60
        2     8720 libc.so.1`libc_malloc_lock   oracle`sltstidinit+0x4
        2     7975 libc.so.1`__aio_mutex        libc.so.1`_aio_close+0x44
        2     5150 libc.so.1`libc_malloc_lock   oracle`sltsmxd+0x30
        2     4935 libc.so.1`libc_malloc_lock   oracle`sltsmxi+0x4
        2     3990 libc.so.1`libc_malloc_lock   oracle`sltstiddestroy+0x4
        2     3945 libc.so.1`libc_malloc_lock   oracle`slwmmputmem+0x10
        1     5910 libc.so.1`_time_lock         oracle`sLdiGetDate+0x6c
  • 4. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    sybrand_b Guru
    Currently Being Moderated
    Hoped that other SPARC T4 + 11R2 users might have seen similar issue and know what to tweak of either init ora/solaris setting or if bug is known.
    As Oracle uses it's own VOS (Virtual Operating System) it is very unlikely this is a Slowaris specific bug.
    It is also very unlikely, anything can be 'tweaked', and by 'tweaking' or changing undocumented parameters you may well invalidate your support license.

    Please note this problem is not specific to the Slowaris platform.

    Please also note that Oracle is heavily instrumented and even on an instance without any real world users, there are still many processes that need to be shut down.

    One common trick has been for years
    alter system set job_queue_processes = 0 scope=memory

    Doing so, the instrumentation would 'die' automatically.

    ------------
    Sybrand Bakker
    Senior Oracle DBA
  • 5. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated
    Sybrand, thank you for your reply!

    By tweaking I didn't hint necessarily to undocumented setting(s), I'll gladly adjust doc. settings as well :)

    Assume you don't run Slowaris, but this could be an issue on this SMP+NUMA platform with this combo of SW, one never knows. Anyway you'd not seen similar issues with a 11R2 I can assume.

    Know Oracle is heavily instrumented, we run multiple instances in our shop, both on Linux and Slowaris, but never seen so long responses for these OPs on other installations before, we can be talking +40 min for a shutdown without real users connected.

    Edited by: 983138 on Jun 4, 2013 1:49 PM
  • 6. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated
    So I've found that we might have cpu scheduling issues due to high condition var access in unix`_resume_from_idle slowing things down and then I'm wondering about the VMTK processes running with (1)millisec precision with DBRM quantum (100)ms.

    Do VKTM needs to run at 1ms precision or could I do with say 10ms or is this due to vktm saying that highres is enabled when it warns of Time Drift and can this be altered for an oracle instance at all or is this determined by the HW setup/resources?
    #root:~# ps -fuoracle| grep ora_vktm
      oracle 26330     1   0   Jun 03 ?          12:47 ora_vktm_SID1
      oracle   540     1   0   May 30 ?          60:22 ora_vktm_SID2
      oracle 29771     1   0   Jun 03 ?          15:05 ora_vktm_SID3
    
    #root:~# prstat -p 26330,540,29771 -vc 1
    Please wait...
       PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
       540 oracle   0.3 0.5 0.0 0.0 1.1 0.0  98 0.4 969   0 969   0 oracle/1
     29771 oracle   0.2 0.5 0.0 0.0 0.0 0.0  99 0.3 985   0 985   0 oracle/1
     26330 oracle   0.2 0.4 0.0 0.0 0.0 0.0  99 0.3 986   0 985   0 oracle/1
    Total: 3 processes, 3 lwps, load averages: 1.42, 1.55, 1.59
    profiling the proc through D it shows most time spend like many others are condition variable unix`_resume_from_idle+0x200

    #root:~# more /tmp/fihvktm_proc_profile.d
    
    A agg:..........................
    
      preempted                                                         0
      running                                                   123245060
      condition variable                                       3879920970
    S agg:..........................
    
      preempted                                         
                  unix`_resume_from_idle+0x200
                  genunix`stop+0x89c
                  genunix`issig_forreal+0x204
                  genunix`cv_timedwait_sig_hires+0x218
                  genunix`cv_waituntil_sig+0x90
                  genunix`nanosleep+0xd4
                  unix`syscall_trap+0xc8
                    0
      running                                           
                  unix`resume+0x4
                  genunix`cv_timedwait_sig_hires+0x180
                  genunix`cv_waituntil_sig+0x90
                  genunix`nanosleep+0xd4
                  unix`syscall_trap+0xc8
            123245060
      condition variable                                
                  unix`_resume_from_idle+0x200
                  genunix`cv_timedwait_sig_hires+0x180
                  genunix`cv_waituntil_sig+0x90
                  genunix`nanosleep+0xd4
                  unix`syscall_trap+0xc8
           3879920970
  • 7. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated
    Apparently not alone wondering about VKTM resource consumption

    VKTM IN ORACLE 11G
  • 8. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated

    Seems we solved the issue by patching SOlaris from SRU 3.4.1 to latest SRU 7.5.0, somewhere inbetween maybe a VM patch helped (Support thought so) Anyway have seen the issue since patching to 7.5.0.

  • 9. Re: shutdown normal|immediate slow on new install of 11R2.0.{2|3|4} on Solaris
    986141 Newbie
    Currently Being Moderated

    Turned out that it most properly wasn't the sru7.5.0 that fixed the issue.

     

    Still experienced issues, specially when memory utilization got above 90%.

     

    Currently we assume this to have been due to a new feature with 2GB large memory pages, which has been turned off by:

     

    disable 2G pages. Add the following line to /etc/system and reboot:

    ---

    set disable_ism_large_pages=0x74

     

    Disabling 2G pages dramatically reduced our memory utilization (30-40%) at the same time and doesn't seem to have affect perf.

    The 2G pages is a recent addition, and in some cases it may cause performance issues, when it really should have improved performance a bit according to kernel perf. team.

     

     

    To check if you use 2G pages, do a pmap -xs f.ex. on ora_vktm_SID process(es):

     

    ps -fuoracle|awk '/ora_vktm_/{print $2}'| while read pid; do pmap -xs $pid | grep 2G; done

     

    as described here: "Oracle Database Startup And SGA Value Change Takes A Very Long Time To Complete", [Doc ID 1475621.1]

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points