1 2 Previous Next 18 Replies Latest reply: Oct 28, 2013 4:22 AM by user12054962 RSS

    High Memory , Swap usage - severe Oracle Performance issue

    user12054962

      Hi ,

       

      Can anyone help in finding and fixing the root cause for below issue ?

       

      Issue:

      ======

       

      We have this issue in one of our customer's environment - where number of processes shoots up within few minutes : - For eg :

      -- You can notice that the number of processes increased from 789 to 1147 in 10 minutes of time .

       

      12:20:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

      02:30:01 PM         0       530      0.29      0.21      0.13

      02:40:01 PM         1       563      0.09      0.10      0.09

      02:50:01 PM         0       553      0.16      0.08      0.08

      03:00:01 PM         5       789      1.08      0.59      0.27

      03:10:01 PM         8      1147      4.68      3.63      1.64 ----> number of process increased

      03:20:01 PM         1      1107      5.20      7.64      5.21

      03:30:01 PM         2      1116      4.13      5.67      5.39

      03:40:01 PM         1      1115      1.64      2.45      3.93

      03:50:01 PM         0      1094      1.71      1.95      3.02

      04:00:01 PM         4      1104      0.99      1.42      2.32

      04:10:01 PM         1      1112      2.47      1.77      2.12

      04:20:01 PM         1      1112     16.27     11.78      6.40

       

      This caused high memory utilization in the system - but as you can see the load avg is very less - So the problem is basically when the number of processes increased - the free memory in the system

      started reducing - and once the free memory got exhausted - swap started shooting up.At one point the DB server got completely hung because of 100%swap utilization - and very low memory.

       

      12:20:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
      02:30:01 PM   8345528   8431688     50.26     48964   5864200   8388600         0      0.00         0
      02:40:01 PM   8090768   8686448     51.78     51112   5883768   8388600         0      0.00         0
      02:50:01 PM   8013112   8764104     52.24     52568   5899568   8388600         0      0.00         0
      03:00:01 PM   6244916  10532300     62.78     54588   5947988   8388600         0      0.00         0
      03:10:01 PM   1901984  14875232     88.66     56912   6113344   8388600         0      0.00         0
      03:20:01 PM   1331168  15446048     92.07     58956   6198384   8388600         0      0.00         0
      03:30:01 PM    988036  15789180     94.11     61260   6223280   8388600         0      0.00         0 ---> You can see sudden
      03:40:01 PM   1045776  15731440     93.77     62488   6230012   8388600         0      0.00         0 drop in kbmemfree
      03:50:01 PM   1559860  15217356     90.70     63768   6236324   8388600         0      0.00         0
      04:00:01 PM   1510516  15266700     91.00     65056   6237992   8388600         0      0.00         0
      04:10:01 PM   1351428  15425788     91.94     66748   6314744   8388600         0      0.00         0
      04:20:01 PM     16624  16760592     99.90      2300   1551920    611792   7776808     92.71   1429484 ---> 0 free mem , so swap
      Average:      7465852   9311364     55.50     45122   4924517   8064566    324034      3.86     59562 shoots up


      PLEASE NOTE : This issue was not analyzed online - when it was happening - and i was involved in determining the root cause for this incident.

      OS watcher data helped me to understand there are lot of shadow processes on the server LOCAL=NO sessions which started in the smaller duration of

      time. This DB is part of - Oracle Ebusiness suite environment - and we are guessing that it should be some concurrent program which is triggering these

      many connections in the environment.


      My questions:

      ============

       

      (1) Which tool we can use for analyzing the root cause - *After* the incident is over - Does the
      "SQL ordered by Sharable Memory" in AWR report helps us to identify the root cause for high memory consumption in the DB server ?

       

      (ii) There are sql queries to determine the high memory consuming sqls - but in this case server got hung - so we had to reboot the DB server.

      Running sql queries post DB restart may not really help - because all the performance stats in dynamics views will be lost. So is there any way

      to run the query to find high memory consumption using snapshots details in awr report (any history tables have this data) ?


      Thanks.


      (ii) is there any other way to find out which program is triggering these many processes in the system?



        • 1. Re: High Memory , Swap usage - severe Oracle Performance issue
          sb92075

          I would AUDIT CREATE SESSION

          and/or write LOGON TRIGGER to log every new session that starts.

          From which system originates the majority of the rash of new sessions?

          • 2. Re: High Memory , Swap usage - severe Oracle Performance issue
            user12054962

            Hi

             

            • 3. Re: High Memory , Swap usage - severe Oracle Performance issue
              sb92075

              992444 wrote:

               

              Hi

               

              Oracle DB is just responding to client requests.

              If it hurts when you poke yourself in the eye, then consider to stop poking yourself in the eye.

              Oracle DB is the victim; not the culprit.

              • 4. Re: High Memory , Swap usage - severe Oracle Performance issue
                user12054962

                I Agree to your point - but again we just have an idea it must be coming from concurrent program. But questions are :

                 

                (1) how to find out which request is triggering these number of processes (its a concurrent program - but there are different concurrent programs) ?

                is there any way to find out from DB side ?

                 

                (2) what is the way to find high memory consuming sqls - post incident - does section in "SQL ordered by Sharable Memory" AWR report help to find out the memory intensive programs ?


                Blaming MT doesn't really help - i need some help in identifying the root cause.


                Thanks,

                Senthil.


                • 5. Re: High Memory , Swap usage - severe Oracle Performance issue
                  user12054962

                  Ok Let Me simplify my questions :

                   

                  (1) how we can identify the high memory consuming sql's - POST INCIDENT - using any tool ? Does any section in awr report gives this detail ?

                   

                  (ii) How to identify the root cause for sudden spike in number of processes on DB Server ? any systematic method ?

                   

                  Thanks,

                  Senthil

                  • 6. Re: High Memory , Swap usage - severe Oracle Performance issue
                    sb92075

                    >there are lot of shadow processes on the server LOCAL=NO sessions

                    Root cause is NOT on the DB Server!

                    • 7. Re: High Memory , Swap usage - severe Oracle Performance issue
                      Nikolay Savvinov

                      Hi,

                       

                      if you have the Diagnostic Pack License, then use ASH to find SQL that was consuming most PGA (see ASH basics | Oracle Diagnostician, query 12).

                       

                      Best regards,

                      Nikolay

                      • 8. Re: High Memory , Swap usage - severe Oracle Performance issue
                        user12054962

                        yes root cause is obviously not db server - my question is just is there any tool / ways on db - using which we can identify the culprit? like awr / ash / addm etc ..

                        • 9. Re: High Memory , Swap usage - severe Oracle Performance issue
                          sb92075

                          992444 wrote:

                           

                          yes root cause is obviously not db server - my question is just is there any tool / ways on db - using which we can identify the culprit? like awr / ash / addm etc ..

                          culprit originates off the DB Server; therefore the DB server is not a part of the solution.

                          • 10. Re: High Memory , Swap usage - severe Oracle Performance issue
                            user12054962

                            Hi Noklay,

                             

                             

                            This is really useful solution you have give - But is there any way to use the query again two snap ids ? This incident happened on 23rd Oct between 3PM to 4PM

                            and i have the AWR snap ids with me. Do you have query which we can use snap id information :

                             

                            select ash.sql_id,

                             

                                   replace(nvl(txt.sql_text_short, 'N/A'),

                             

                                           chr(10), '\n'

                             

                                           ) sql_text,

                             

                                   round(ASH.PGA_ALLOCATED/1024/1024) pga_mb

                             

                            from dba_hist_active_sess_history ash,

                             

                                 (

                             

                                    select sql_id, dbms_lob.substr(sql_text, :len) sql_text_short

                             

                                    from v$sql

                             

                                    union all

                             

                                    select sql_id, dbms_lob.substr(sql_text, :len) sql_text_short

                             

                                    from dba_hist_sqltext

                             

                                 ) txt

                             

                            where ash.sql_id = txt.sql_id

                             

                            and ash.sql_id is not null

                             

                            and ash.sample_time = (select max(sample_time)

                             

                                                 from dba_hist_active_sess_history

                             

                                                 where sample_time < to_date(:time, 'dd-mon-yyyy hh24:mi:ss'))

                             

                            order by ash.pga_allocated desc;

                             

                            Thanks,

                             

                            Senthil

                            • 11. Re: High Memory , Swap usage - severe Oracle Performance issue
                              user12054962

                              Hi Nikolay,

                               

                              Below query helped to get the job done :

                               

                              select sql_id,

                                    starting_time,

                                    end_time,

                              (EXTRACT(HOUR FROM run_time) * 3600

                                                  + EXTRACT(MINUTE FROM run_time) * 60

                                                  + EXTRACT(SECOND FROM run_time)) run_time_sec,

                                    READ_IO_BYTES,

                                    PGA_ALLOCATED PGA_ALLOCATED_BYTES,

                                    TEMP_ALLOCATED TEMP_ALLOCATED_BYTES

                              from  (

                              select

                                     sql_id,

                                     max(sample_time - sql_exec_start) run_time,

                                     max(sample_time) end_time,

                                     sql_exec_start starting_time,

                                     sum(DELTA_READ_IO_BYTES) READ_IO_BYTES,

                                     sum(DELTA_PGA) PGA_ALLOCATED,

                                     sum(DELTA_TEMP) TEMP_ALLOCATED

                                     from

                                     (

                                     select sql_id,

                                     sample_time,

                                     sql_exec_start,

                                     DELTA_READ_IO_BYTES,

                                     sql_exec_id,

                                     greatest(PGA_ALLOCATED - first_value(PGA_ALLOCATED) over (partition by sql_id,sql_exec_id order by sample_time rows 1 preceding),0) DELTA_PGA,

                                     greatest(TEMP_SPACE_ALLOCATED - first_value(TEMP_SPACE_ALLOCATED) over (partition by sql_id,sql_exec_id order by sample_time rows 1 preceding),0) DELTA_TEMP

                                     from

                                     dba_hist_active_sess_history

                                     where

                                     sample_time >= to_date ('2013/10/15 10:00:00','YYYY/MM/DD HH24:MI:SS')

                                     and sample_time < to_date ('2013/10/15 11:00:00','YYYY/MM/DD HH24:MI:SS')

                                     and sql_exec_start is not null

                                     and IS_SQLID_CURRENT='Y'

                                     )

                              group by sql_id,SQL_EXEC_ID,sql_exec_start

                              order by sum(DELTA_PGA) desc

                              );

                               

                              Question :

                              =========

                               

                              How this is related to high memory/swap issue on the DB node. Does PGA leaks cause this kind of issue ?

                               

                              Thanks,

                              Senthil

                              • 12. Re: High Memory , Swap usage - severe Oracle Performance issue
                                JohnWatson

                                992444 wrote:

                                 

                                Hi

                                 

                                As the spike in activity is caused by launching concurrent programs, the answer is obvious:adjust the operation of the concurrent managers!

                                You could reduce the number of processes each concurrent manager can launch. Or change the workshifts so they don't overlap. Or assign them to a lower priority consumer group.

                                Fixing this from within EBS will be much simpler than trying to diagnose the problem from outside.

                                • 13. Re: High Memory , Swap usage - severe Oracle Performance issue
                                  Nikolay Savvinov

                                  Hi,

                                   

                                  1) with ASH, you don't need any snap_id's. You just specify the time window you're interested in (i.e. sample_time > ... or sample_time between ... and ...). That's one of advantages of ASH over AWR

                                  2) after so much time elapsed since the incident, the data you need may have been purged from ASH. That's one of disadvantages of ASH compared to AWR

                                   

                                  Check how much ASH history you have:

                                   

                                  select min(sample_time) from dba_hist_active_sess_history;

                                   

                                  Best regards,

                                    Nikolay

                                  • 14. Re: High Memory , Swap usage - severe Oracle Performance issue
                                    EBSDBA

                                    992444 wrote:

                                     

                                    Hi ,

                                     

                                    Can anyone help in finding and fixing the root cause for below issue ?

                                     

                                    Issue:

                                    ======

                                     

                                    We have this issue in one of our customer's environment - where number of processes shoots up within few minutes : - For eg :

                                    -- You can notice that the number of processes increased from 789 to 1147 in 10 minutes of time .

                                     

                                    12:20:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

                                    02:30:01 PM         0       530      0.29      0.21      0.13

                                    02:40:01 PM         1       563      0.09      0.10      0.09

                                    02:50:01 PM         0       553      0.16      0.08      0.08

                                    03:00:01 PM         5       789      1.08      0.59      0.27

                                    03:10:01 PM         8      1147      4.68      3.63      1.64 ----> number of process increased

                                    03:20:01 PM         1      1107      5.20      7.64      5.21

                                    03:30:01 PM         2      1116      4.13      5.67      5.39

                                    03:40:01 PM         1      1115      1.64      2.45      3.93

                                    03:50:01 PM         0      1094      1.71      1.95      3.02

                                    04:00:01 PM         4      1104      0.99      1.42      2.32

                                    04:10:01 PM         1      1112      2.47      1.77      2.12

                                    04:20:01 PM         1      1112     16.27     11.78      6.40

                                     

                                    This caused high memory utilization in the system - but as you can see the load avg is very less - So the problem is basically when the number of processes increased - the free memory in the system

                                    started reducing - and once the free memory got exhausted - swap started shooting up.At one point the DB server got completely hung because of 100%swap utilization - and very low memory.

                                     

                                    12:20:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
                                    02:30:01 PM   8345528   8431688     50.26     48964   5864200   8388600         0      0.00         0
                                    02:40:01 PM   8090768   8686448     51.78     51112   5883768   8388600         0      0.00         0
                                    02:50:01 PM   8013112   8764104     52.24     52568   5899568   8388600         0      0.00         0
                                    03:00:01 PM   6244916  10532300     62.78     54588   5947988   8388600         0      0.00         0
                                    03:10:01 PM   1901984  14875232     88.66     56912   6113344   8388600         0      0.00         0
                                    03:20:01 PM   1331168  15446048     92.07     58956   6198384   8388600         0      0.00         0
                                    03:30:01 PM    988036  15789180     94.11     61260   6223280   8388600         0      0.00         0 ---> You can see sudden
                                    03:40:01 PM   1045776  15731440     93.77     62488   6230012   8388600         0      0.00         0 drop in kbmemfree
                                    03:50:01 PM   1559860  15217356     90.70     63768   6236324   8388600         0      0.00         0
                                    04:00:01 PM   1510516  15266700     91.00     65056   6237992   8388600         0      0.00         0
                                    04:10:01 PM   1351428  15425788     91.94     66748   6314744   8388600         0      0.00         0
                                    04:20:01 PM     16624  16760592     99.90      2300   1551920    611792   7776808     92.71   1429484 ---> 0 free mem , so swap
                                    Average:      7465852   9311364     55.50     45122   4924517   8064566    324034      3.86     59562 shoots up


                                    PLEASE NOTE : This issue was not analyzed online - when it was happening - and i was involved in determining the root cause for this incident.

                                    OS watcher data helped me to understand there are lot of shadow processes on the server LOCAL=NO sessions which started in the smaller duration of

                                    time. This DB is part of - Oracle Ebusiness suite environment - and we are guessing that it should be some concurrent program which is triggering these

                                    many connections in the environment.


                                    My questions:

                                    ============

                                     

                                    (1) Which tool we can use for analyzing the root cause - *After* the incident is over - Does the
                                    "SQL ordered by Sharable Memory" in AWR report helps us to identify the root cause for high memory consumption in the DB server ?

                                     

                                    (ii) There are sql queries to determine the high memory consuming sqls - but in this case server got hung - so we had to reboot the DB server.

                                    Running sql queries post DB restart may not really help - because all the performance stats in dynamics views will be lost. So is there any way

                                    to run the query to find high memory consumption using snapshots details in awr report (any history tables have this data) ?


                                    Thanks.


                                    (ii) is there any other way to find out which program is triggering these many processes in the system?



                                     

                                    Hi,

                                    Check the dump directories(bdump, udump) of the database to see if any large trace files are generated, What is the your db version?

                                     

                                    Thanks

                                    1 2 Previous Next