This discussion is archived
1 2 Previous Next 28 Replies Latest reply: Jul 22, 2013 8:51 PM by user12024958 RSS

simple processes are very slow at intervals

986141 Newbie
Currently Being Moderated
Just installed a new T4-4 (with 2x CPUs, 128GB mem), installed oracle 11R2.0.{2|3|4}, running 3x oracle instances with each version, using approx 80GB for SGAs. Using Solaris 11.1 SRU 3.4.1.

At various interval running a simple CLI take 20-30sec, like 'ps -ef' both as root and oracle luser. CPU load is very low (runqsz<10), not sure what to look for.

sar data sampled every 300sec doesn't show much.

Oracle DBAs also complain that things like DB startup/shutdown take long time (minutes), just connecting with sqlplus can take minutes as well.

Every Instances' FS are ZFS created in three zpools/instance with settings as BCP recommends for running Oracle ontop of ZFS (record size, latency/throughput etc.).

Just wandering why the f... spawning a simple CLI like 'ps -ef' might stall for 30sec now and then, any hints are very appreciated?

Trolling the fora here, the net and Google doesn't reveal much until now...
  • 1. Re: simple processes are very slow at intervals
    Nik Expert
    Currently Being Moderated
    Hi.
    It's look like active paging/swaping process.

    You configure Oracle for: 80G * 3 = 240 GB Memory.
    Plus some memory require for kernel and zfs.
    Check and set limit for ZFS_arc_cache.


    Please show result for
    vmstat 1 5
    swap -s


    Regards.
  • 2. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    Just to show a bit of the time variation in simple rapid process spawning from CLI:
    #root:~# time ps -ef > /dev/null
    
    real    0m0.054s
    user    0m0.012s
    sys     0m0.030s
    #root:~# time ps -ef > /dev/null
    
    real    0m0.048s
    user    0m0.011s
    sys     0m0.028s
    #root:~# time ps -ef > /dev/null
    
    real    0m0.041s
    user    0m0.009s
    sys     0m0.023s
    #root:~# time ps -ef > /dev/null
    
    real    0m7.621s
    user    0m0.010s
    sys     0m0.025s
    root:~# time ps -ef > /dev/null
    
    real    0m0.048s
    user    0m0.010s
    sys     0m0.028s
    #root:~# time ps -ef > /dev/null
    
    real    0m0.044s
    user    0m0.010s
    sys     0m0.024s
    
    #root:~# /usr/platform/sun4v/sbin/prtdiag 
    System Configuration:  Oracle Corporation  sun4v SPARC T4-4
    Memory size: 130560 Megabytes
    
    ================================ Virtual CPUs ================================
    
    
    CPU ID Frequency Implementation         Status
    ------ --------- ---------------------- -------
    0      2998 MHz  SPARC-T4               on-line  
    1      2998 MHz  SPARC-T4               on-line  
    2      2998 MHz  SPARC-T4               on-line  
    3      2998 MHz  SPARC-T4               on-line  
    ...
    
    125    2998 MHz  SPARC-T4               on-line  
    126    2998 MHz  SPARC-T4               on-line  
    127    2998 MHz  SPARC-T4               on-line  
    
    ======================= Physical Memory Configuration ========================
    Segment Table:
    --------------------------------------------------------------
    Base           Segment  Interleave   Bank     Contains
    Address        Size     Factor       Size     Modules
    --------------------------------------------------------------
    0x0            128 GB   4            32 GB    /SYS/PM0/CMP0/BOB0/CH0/D0
                                                  /SYS/PM0/CMP0/BOB0/CH1/D0
                                                  /SYS/PM0/CMP0/BOB1/CH0/D0
                                                  /SYS/PM0/CMP0/BOB1/CH1/D0
                                         32 GB    /SYS/PM0/CMP0/BOB2/CH0/D0
                                                  /SYS/PM0/CMP0/BOB2/CH1/D0
                                                  /SYS/PM0/CMP0/BOB3/CH0/D0
                                                  /SYS/PM0/CMP0/BOB3/CH1/D0
                                         32 GB    /SYS/PM0/CMP1/BOB0/CH0/D0
                                                  /SYS/PM0/CMP1/BOB0/CH1/D0
                                                  /SYS/PM0/CMP1/BOB1/CH0/D0
                                                  /SYS/PM0/CMP1/BOB1/CH1/D0
                                         32 GB    /SYS/PM0/CMP1/BOB2/CH0/D0
                                                  /SYS/PM0/CMP1/BOB2/CH1/D0
                                                  /SYS/PM0/CMP1/BOB3/CH0/D0
                                                  /SYS/PM0/CMP1/BOB3/CH1/D0
    
    
    #root:~# pkg info entire
              Name: entire
           Summary: entire incorporation including Support Repository Update (Oracle Solaris 11.1 SRU 3.4.1).
       Description: This package constrains system package versions to the same
                    build.  WARNING: Proper system update and correct package
                    selection depend on the presence of this incorporation.
                    Removing this package will result in an unsupported system.  For
                    more information see https://support.oracle.com/CSP/main/article
                    ?cmd=show&type=NOT&doctype=REFERENCE&id=1501435.1.
          Category: Meta Packages/Incorporations
             State: Installed
         Publisher: solaris
           Version: 0.5.11 (Oracle Solaris 11.1 SRU 3.4.1)
     Build Release: 5.11
            Branch: 0.175.1.3.0.4.1
    Packaging Date: January 15, 2013 09:52:10 PM 
              Size: 5.46 kB
              FMRI: pkg://solaris/entire@0.5.11,5.11-0.175.1.3.0.4.1:20130115T215210Z
    Mod. action: please, to keep tab formatting on forum side, use the tag
     before and after your code output, thanks.
    Edited by: N Gasparotto on Jun 3, 2013 4:17 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
  • 3. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    No it's 80GB in total for all 3x instances (50G, 25G, 5G), no swapping is going on IFAIK
    #root:~# top -n 1
    last pid: 16973;  load avg:  7.41,  9.76,  8.56;  up 6+15:39:24        12:14:45
    243 processes: 236 sleeping, 4 zombie, 3 on cpu
    CPU states: 99.0% idle,  0.8% user,  0.2% kernel,  0.0% iowait,  0.0% swap
    Kernel: 7724 ctxsw, 2175 trap, 7963 intr, 12631 syscall, 3 fork, 244 flt
    Memory: 128G phys mem, 40G free mem, 64G total swap, 64G free swap
    
       PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
      4234 oracle      1   0    0   50G   50G cpu/4   71:15  0.77% oracle
    Mod. action: please, to keep tab formatting on forum side, use the tag
     before and after your code output, thanks.
    Edited by: N Gasparotto on Jun 3, 2013 4:18 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
  • 4. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    Sorry You asked for below and zfs arc is caped at max 10G in /etc/system:

    set zfs:zfs_arc_max = 10737418240
    #root:~# vmstat 1 5
     kthr      memory            page            disk          faults      cpu
     r b w   swap  free  re  mf pi po fr de sr s0 s4 s5 s6   in   sy   cs us sy id
     0 0 0 82406280 32054104 159 204 0 0 0 0 0  0  8  0  0 26248 18691 29070 2 2 96
     2 0 0 69594672 21296544 294 366 0 0 0 0 0  0  4  0  0 8361 4811 7287  0  2 98
     1 0 0 69594672 21297000 0 40 0 0  0  0  0  0  0  0  0 8016 3484 7223  0  2 98
     1 0 0 69594672 21297240 243 321 0 0 0 0 0  0  0  0  0 8003 3804 7428  0  4 96
     3 0 0 69066152 20803576 260 752 0 0 0 0 0  0  4  0  0 8687 6122 8282  0 20 80
    #root:~# swap -s
    total: 74656984k bytes allocated + 6719056k reserved = 81376040k used, 69066152k available
    Mod. action: please, to keep tab formatting on forum side, use the tag
     before and after your code output, thanks.
    Edited by: N Gasparotto on Jun 3, 2013 4:19 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
  • 5. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    find randomly cpus with mpstat show many xcals, dunno if this worrying:
     CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
      55    0   0 2066622  2    0      0     0      0      0      0      0     0   100   0   0
    as 'normal' baseline xcals are aggregated < 20k like these:
    #root:~#  mpstat -A soc 2 100
     SOC minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
       0   96   0 15794 12101 4440 13255   29 1549 7506   40  8735  125 116   0 6160  64
       1  109   0 21277 14117 5363 15785   31 1735 7387   18  9969  124 122   0 6154  64
     SOC minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
       0  223   0 4401  8357 1897 10623    9  784  995    0  2226  107  35   0 6258  64
       1  148   0 14525 14069 2918 21526   56 1829 1465    0 21101  107  46   0 6247  64
     SOC minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
       0  839   0 6570  9673 1964 12265   11 1227 1272    0  9353   55  51   0 6294  64
       1  636   0 18113 12787 2915 19683   77 1668 1472    0 25251  221  44   0 6135  64
     SOC minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
       0   28   0 4830  9415 1939 12122    9 1051  919    0  1813    8  53   0 6339  64
       1   78   0 12903 13023 2949 19837   75 1563 1168    0 27039  158  45   0 6197  64
  • 6. Re: simple processes are very slow at intervals
    Nik Expert
    Currently Being Moderated
    Hi.

    Doc 1008930.1 on support.oracle.com may be usefull for resolve this issue.

    Regards
  • 7. Re: simple processes are very slow at intervals
    800381 Explorer
    Currently Being Moderated
    What are your parallel jobs settings for your Oracle instances? Do the slowdowns correspond with a large number (hundreds or even thousands) of parallel job processes timing out and exiting at the same time? If so, this can cause your problem and IIRC you can configure those jobs so they never time out.

    Also, what's the output from
    echo ::memstat | mdb -k
    Run that as root.
  • 8. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    Thanks for your reply!
    parallel_adaptive_multi_user          boolean      TRUE
    parallel_automatic_tuning          boolean      FALSE
    parallel_degree_limit               string      CPU
    parallel_degree_policy               string      MANUAL
    parallel_execution_message_size      integer      16384
    parallel_force_local               boolean      FALSE
    parallel_instance_group           string
    parallel_io_cap_enabled           boolean      FALSE
    parallel_max_servers               integer      285
    parallel_min_percent               integer      0
    parallel_min_servers               integer      0
    parallel_min_time_threshold          string      AUTO
    parallel_server                boolean      FALSE
    parallel_server_instances          integer      1
    parallel_servers_target           integer      2048
    This is wrong ImHO as we got 128 threads, and with 3 running instances each thinking there's 256 threads then they each sets automatic/default DOP to 256, but will this matter when using default parallel degree policy = manual (as before 11.2.0.2) or would this depend on possible hints by queries/lusers?
    parallel_threads_per_cpu          integer      2
    cpu_count                    integer      128
    Also got multiple defuncted oracle processes, could oracle spawn too frequently new processes to do small things, ie. breaking jobs into over-expensive parallel executions... then again seems BG processes are the top abusers of defuncts:

    #root:~# ps -fuoracle | awk '$8 ~ /<defunct>/{print $3}' - | sort -n | uniq -c | sort -n
    1 27776
    5 2055
    8 2116
    10 27514
    11 2312
    15 2306
    18 1992
    25 2174
    31 27715
    35 28784
    39 27795
    43 2020
    69 2068
    69 2220
    #root:~# ps -fuoracle | grep ora_cj
    oracle 27795 1 0 13:27:18 ? 0:18 ora_cjq0_SID1
    oracle 2068 1 0 May 28 ? 8:11 ora_cjq0_SID2
    oracle 2220 1 0 May 28 ? 4:48 ora_cjq0_SID3
    #root:~# ps -fuoracle | grep ora_smco
    oracle 2312 1 0 May 28 ? 1:22 ora_smco_SID3
    oracle 2306 1 0 May 28 ? 2:18 ora_smco_SID2
    oracle 28784 1 0 13:33:35 ? 0:08 ora_smco_SID1
    #root:~# ps -fuoracle | grep ora_mmon
    oracle 2020 1 0 May 28 ? 10:57 ora_mmon_SID2
    oracle 2174 1 0 May 28 ? 9:34 ora_mmon_SID3
    oracle 27715 1 0 13:27:02 ? 0:46 ora_mmon_SID1



    You asked for memstat:
    #root:~# echo ::memstat | mdb -k
    Page Summary                Pages                MB  %Tot
    ------------     ----------------  ----------------  ----
    Kernel                    2113647             16512   13%
    ZFS File Data             2129920             16640   13%
    Anon                     10602230             82829   63%
    Exec and libs               77295               603    0%
    Page cache                  19773               154    0%
    Free (cachelist)            13943               108    0%
    Free (freelist)           1754872             13709   10%
    Total                    16711680            130560
  • 9. Re: simple processes are very slow at intervals
    1012398 Newbie
    Currently Being Moderated
    Hi, If you connected thru Serial Terminal then, baud rate might be the reason Or anything in CRON

    Edited by: 1009395 on Jun 4, 2013 3:05 AM
  • 10. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    Thanks, but I'm not on the console but through ssh on aggregated 2x 10Gb/s :)
  • 11. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    trying to figure out what's stalling simple spawning of processes I tried to capture a simple ps -ef with dtrace script proc_profile.d (from this article: How To Investigate Why Commands are Slow using Truss and DTrace [ID 1416078.1]) and find this:
    #root:~# scripts/proc_profile.d -o /tmp/psef_proc_profile -c 'time ps -ef' > /dev/null
    
    real        2.6
    user        0.0
    sys         0.0
    
    #root:~# cat /tmp/psef_proc_profile
    A agg:..........................
    
      preempted                                                         0
      running                                                      462720
      condition variable                                          9506010
    S agg:..........................
    
      preempted                                         
                  unix`_resume_from_idle+0x200
                  genunix`stop+0x89c
    
    <snip>
    
    A agg:..........................
    
      preempted                                                         0
      running                                                      652050
      condition variable                                       4112848970
    S agg:..........................
    
    <snip>
    
      condition variable                                
                  unix`_resume_from_idle+0x200
                  genunix`cv_wait+0x3c
                  genunix`fed_default_sleep+0x8
                  genunix`fed_breadline+0x474
                  genunix`fed_fill_mnode+0x1c4
                  genunix`fed_fill+0x2d4
                  genunix`wallet_fill_common+0x190
                  genunix`wallet_fill+0x18
                  unix`page_create_va_common+0x134
                  unix`page_create_va+0x50
                  genunix`swap_getapage+0x19c
                  genunix`swap_getpage+0x4c
                  genunix`fop_getpage+0xdc
                  genunix`anon_zero+0x4c
                  genunix`segvn_faultpage+0x1bc
                  genunix`segvn_fault+0xb24
                  genunix`as_fault+0x3f0
                  unix`pagefault+0x8c
                  unix`trap+0xdcc
                  unix`utl0+0x4c
           1508872920
      condition variable                                
                  unix`_resume_from_idle+0x200
                  genunix`cv_wait_sig_swap_core+0x120
                  genunix`waitid+0x4b0
                  genunix`waitsys32+0x14
                  unix`syscall_trap32+0xe8
           2603976050
    it looks like slow processes are waiting a lot on 'resume from idle' condition var/mutex, this seems generally to be the case when profiling slow CLI processes.

    Assume that our DB instances are causing this properly due to mis-config and/or bug (either zfs, slowaris or rdbms), without DB running it works like a charm.

    Any clues any one?
  • 12. Re: simple processes are very slow at intervals
    800381 Explorer
    Currently Being Moderated
    I'm thinking you might be running into one of two things:

    1. Numerous Oracle parallel jobs ending all at the same time. What do your "sar -u" and "sar -v" show when you have your slow periods? If you're seeing large system CPU utilization while the number of processes drops a lot, this could be your problem. I'm not an Oracle DBA, so I don't know how to read those parallel job settings.

    2. Getting hung coalescing huge pages for Oracle. The ZFS ARC uses normal memory pages, while Oracle DB processes routinely request large memory pages. The ARC will use memory, fragmenting the pages. Then an Oracle process will request a chunk of memory consisting of large pages - so when the server is short free RAM the large pages need to be coalesced from smaller ones - and that can be SLOW. You have 10% of your RAM on the freelist, though, so I don't think that is likely to be happening. If this is happening, "sar -u" will show spikes in system CPU utilization, but without "sar -v" showing much change in the number of running processes.

    You'd get better data if you up your sar sample frequence to something like once every 10 or 15 seconds. Just make sure you don't fill up your disks...

    If you run this dtrace script while the slowdowns are happening, you'll at least know where the kernel is spending all its time:
    #!/usr/sbin/dtrace -s
    
    #pragma D option quiet
    
    profile:::profile-1001hz
    / arg0 /
    {
        @hot[ arg0 ] = count();
    }
    
    dtrace:::END
    {
        printa( "%@u %a\n", @hot );
    }
    Just get it started, let it run for a good long while, then CTRL-C to break and get the results. You're really only interested in the ones that show up at the end. If you don't get any output from it, you didn't let it run long enough. Add
    dtrace:::BEGIN
    {
         printf( "Started...\n" );
    }
    to the script so you can tell when it's actually running.
  • 13. Re: simple processes are very slow at intervals
    800381 Explorer
    Currently Being Moderated
    Oh, and the reason why either situation can cause delays in starting processes is because the virtual memory management is overwhelmed. Your process start has to contend for VM resources with either the cleanup of potentially thousands of defunct address spaces or the coalescing of potentially multiple GB of large pages from small pages.

    And that will show up as system CPU time in the sar data.

    I'm also assuming that your root cause is one of the two that I described. There's a definite chance it's something else - which is why that dtrace script is pretty important - it will show where the kernel is actually spending its time. No assuming needed. The problem is running it in situations where nothing runs properly.
  • 14. Re: simple processes are very slow at intervals
    986141 Newbie
    Currently Being Moderated
    Thanks for your reply!

    Thou I agree with your viewpoints on VM pressure, I don't think this is the case here as ARC is limited to max 10G, Oracle is running with locked ISM shm of 80G (sum of all 3x instances) thus I would think that neither do Oracle request dynamically more large pages nor do zfs ARC every grow to use rest of available memory. So would think VM shouldn't be under pressure, also no paging activity is seen. But I might be proven wrong.

    ARC stat do talk of many hash elements thou :)
    #root:~# echo ::memstat | mdb -k
    Page Summary                Pages                MB  %Tot
    ------------     ----------------  ----------------  ----
    Kernel                    2200856             17194   13%
    ZFS File Data             2129920             16640   13%
    Anon                     10486497             81925   63%
    Exec and libs               76055               594    0%
    Page cache                  18798               146    0%
    Free (cachelist)            12840               100    0%
    Free (freelist)           1786714             13958   11%
    Total                    16711680            130560
    
    
    #root:~# echo ::arc | mdb -k
    hits                      =  13456752063
    misses                    =   6791383077
    demand_data_hits          =   5172732536
    demand_data_misses        =   1416274981
    demand_metadata_hits      =   6366079007
    demand_metadata_misses    =     69373661
    prefetch_data_hits        =   1560380318
    prefetch_data_misses      =   5285379525
    prefetch_metadata_hits    =    357560202
    prefetch_metadata_misses  =     20354910
    mru_hits                  =   2662512179
    mru_ghost_hits            =            0
    mfu_hits                  =   1975721810
    mfu_ghost_hits            =            0
    deleted                   =   6967543017
    mutex_miss                =     49216220
    hash_elements             = 18446744067378501496
    hash_elements_max         = 18446744073709551615
    hash_collisions           =    180625394
    hash_chains               = 18446744071767158391
    hash_chain_max            =            9
    p                         =           59 MB
    c                         =           64 MB
    c_min                     =           64 MB
    c_max                     =        10240 MB
    size                      =           80 MB
    buf_size                  =           14 MB
    data_size                 =           51 MB
    other_size                =           14 MB
    evict_mfu                 =      5848975 MB
    evict_mru                 =     56372464 MB
    evict_l2_cached           =            0 MB
    evict_l2_eligible         =     56215884 MB
    evict_l2_ineligible       =      6005555 MB
    l2_hits                   =            0
    l2_misses                 =   1485648642
    l2_feeds                  =            0
    l2_rw_clash               =            0
    l2_read_bytes             =            0 MB
    l2_write_bytes            =            0 MB
    l2_writes_sent            =            0
    l2_writes_done            =            0
    l2_writes_error           =            0
    l2_writes_hdr_miss        =            0
    l2_evict_lock_retry       =            0
    l2_evict_reading          =            0
    l2_abort_lowmem           =            0
    l2_cksum_bad              =            0
    l2_io_error               =            0
    l2_hdr_size               =            0 MB
    memory_throttle_count     =       603207
    meta_used                 =           29 MB
    meta_max                  =          830 MB
    meta_limit                =            0 MB
    arc_no_grow               =            1
    arc_tempreserve           =            0 MB
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points