13 Replies Latest reply: Sep 8, 2013 8:45 PM by Hemant K Chitale RSS

    tkprof: high elapsed time but no waits

    John Stegeman

      Hello everyone,

       

      I've been doing some investigation of a performance issue on one of our customer's databases, and have run into an interesting issue that I cannot explain.

       

      The background:

       

      • Oracle 11.2.0.3 running on MS Windows Server.
      • The application is built using Oracle Forms running client/server.
      • The Forms performance diagnostics point to lots of time being spent running queries, which matches our observation that the DB is the culprit for the problem described below.
      • I know it's a bad idea, but there are multiple instances running on the same server. We have done this at other clients for this particular application, and have not experienced issues. The other instances on this server are running fine and do not experience the problem described below.

       

      There is one instance that works fine from a performance perspective when the database is cold started but the performance deteriorates over time (after a few days, it becomes so slow that the users complain and consider the application nearly unusable). Shutting down and restarting the instance restores the performance to a good level, and the cycle repeats. Statspack doesn't show anything of interest in the waits or SGA/PGA recommendations.

       

      What we did was to enable tracing (dbms_monitor.session_trace_enable(NULL, NULL, TRUE, TRUE) in a logon trigger for a specific user) and to have a look at the trace/tkprof files to compare the "bad" trace with the "good" trace. This is where I'm getting perplexed. I've picked out a specific SQL to compare, and this is what I observe between the "bad" and "good" tkprof (tkprof snippets are later on in this post):

       

      1. The execution plans are the same.
      2. The number of reads from the buffer cache is the same. There are no physical reads in either trace file for this particular SQL.
      3. Neither the "good" nor the "bad" tkprof shows any waits
      4. The "bad" tkprof shows 1 miss in the library cache during parse (that's OK - it got hard parsed once and I understand that) and 1 miss during execute (this is more perplexing). I'm not worried about the first hard parse, as elapsed time during parse is only .02 sec. Based on the other observations in this list, I *think* I'm worried about the miss during execute.
      5. The CPU and elapsed time during fetch are the same for both.
      6. The CPU and elapsed time during execute is "bad" in the "bad" tkprof (.5 sec elapsed vs 0 sec elapsed in the good file).

       

      I would have expected that there would be some waits shown in the tkprof output to show me why it's taking so long in the "bad" case, but it is showing nothing at all. Even if it's a contention with the other instances on the server, I would have expected to see waits on CPU or something like that.

       

      Here's a piece from the "good" tkprof:

       

      call     count       cpu    elapsed       disk      query    current        rows

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      Parse        1      0.00       0.00          0          0          0           0

      Execute      2      0.00       0.00          0          0          0           0

      Fetch        2      0.00       0.00          0         36          0           2

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      total        5      0.00       0.00          0         36          0           2

       

       

      Misses in library cache during parse: 0

      Optimizer mode: ALL_ROWS

      Parsing user id: 52 

      Number of plan statistics captured: 1


      ...snip...

       

      Elapsed times include waiting on following events:

        Event waited on                             Times   Max. Wait  Total Waited

        ----------------------------------------   Waited  ----------  ------------

        SQL*Net message to client                       7        0.00          0.00

        SQL*Net message from client                     7        0.00          0.00

      ********************************************************************************

       

      and here's the same part from the "bad" tkprof (same SQL statement, bind variables, etc):

       

      call     count       cpu    elapsed       disk      query    current        rows

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      Parse        1      0.00       0.02          0          0          0           0

      Execute      2      0.07       0.53          0          0          0           0

      Fetch        2      0.00       0.00          0         36          0           2

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      total        5      0.07       0.56          0         36          0           2

       

       

      Misses in library cache during parse: 1

      Misses in library cache during execute: 1

      Optimizer mode: ALL_ROWS

      Parsing user id: 52 

      Number of plan statistics captured: 1

       

      ...snip...

       

      Elapsed times include waiting on following events:

        Event waited on                             Times   Max. Wait  Total Waited

        ----------------------------------------   Waited  ----------  ------------

        SQL*Net message to client                       7        0.00          0.00

        SQL*Net message from client                     7        0.00          0.00

      ********************************************************************************

       

      Can anyone shed any light on what might be going on or point me in a direction as far as avenues to investigate or additional tracing/etc that might give more information?

       

      As of right now, we're just restarting the instance when performance goes south, but that's just a band-aid and not a real fix of the problem. As I mentioned, Statspack doesn't seem to give anything additional (no waits, SGA/PGA recommendations are to leave it at the current size, etc).

       

      Thanks,

       

      John

        • 1. Re: tkprof: high elapsed time but no waits
          Martin Preiss

          John,

           

          I think Jonathan Lewis mentioned similar phenomena more than once - here is a link that mentions some possible explanations: http://jonathanlewis.wordpress.com/2007/07/19/library-cache-miss/. An important point is that hard parses (i.e. plan optimizations) can happen during execute.


          Regards


          Martin

          • 2. Re: tkprof: high elapsed time but no waits
            sb92075

            when the slowdown is occurring what do you see at the OS level?

            which system resource (CPU, DISK, RAM, Network) is being saturated?

             

            Needless to say, you need to determine where time is actually being spent before you have any chance at making a change which corrects this situation.

            • 3. Re: tkprof: high elapsed time but no waits
              John Stegeman

              sb,

               

              I agree - we actually don't see any saturation at the OS level, which is a bit perplexing.

               

              John

              • 4. Re: tkprof: high elapsed time but no waits
                John Stegeman

                Thanks, Martin - I had seen that before posting here. I do understand that parses can happen during execute, but the questions I would have regarding this are:

                 

                • Why does the execute phase hard parse seemingly take so long compared to the parse phase one?
                • As this is a trace of a single Form execution, what would cause the plan to disappear from the SGA so quickly?
                • Why doesn't the trace file give me an indication as to what is being waited on?

                 

                John

                • 5. Re: tkprof: high elapsed time but no waits
                  sb92075

                  JohnStegeman wrote:

                   

                  sb,

                   

                  I agree - we actually don't see any saturation at the OS level, which is a bit perplexing.

                   

                  John

                   

                  during the slowdown, what results when you run adhoc SQL queries external to the application?

                  What  I am trying to determine is the Oracle DB as a whole having a problem or is the application code just getting in the way of itself?

                  Can you manually query what would be normally active tables & get reasonable response during period when application users are complaining?

                  • 6. Re: tkprof: high elapsed time but no waits
                    Martin Preiss

                    good questions: I agree that the optimization may explain the cpu but not the missing waits. Perhaps the wait interface (v$sesstat) could add some information - but I guess the problem will be to narrow down the range of necessary snapshots.

                    • 7. Re: tkprof: high elapsed time but no waits
                      Hemant K Chitale

                      Two FETCHes but Seven SQL*Net message waits ?

                       

                      No time in the FETCH but time in the EXECUTE ?    What type of SQL statement is it ?

                       

                       

                      Hemant K Chitale

                       


                      • 8. Re: tkprof: high elapsed time but no waits
                        John Stegeman

                        Hemant,

                         

                        It's a select statement.

                         

                        John

                        • 9. Re: tkprof: high elapsed time but no waits
                          sb92075

                          and here's the same part from the "bad" tkprof (same SQL statement, bind variables, etc):

                           

                          call     count       cpu    elapsed       disk      query    current        rows

                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------

                          Parse        1      0.00       0.02          0          0          0           0

                          Execute      2      0.07       0.53          0          0          0           0

                          Fetch        2      0.00       0.00          0         36          0           2

                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------

                          total        5      0.07       0.56          0         36          0           2

                           

                           

                          WHY are you obsessing over SELECT that takes under 0.3 seconds to execute?

                          • 10. Re: tkprof: high elapsed time but no waits
                            John Stegeman

                            I'm not necessarily obsessing over this particular statement - I've picked out this one as an example of what's going on. The net sum of the performance degradation is on the order of 5-10 seconds, and since it occurs on a screen that normally takes 2-4 seconds to display, it's a definite performance issue for the users.

                            • 11. Re: tkprof: high elapsed time but no waits
                              jgarry

                              Does it stay in this instance when you shut everything down and start them up in a different order?

                               

                              I'm thinking the perplexing lack of saturation at the OS level may simply be a limitation of whatever tools the OS uses, not showing you that something is getting swapped out or whatever.

                               

                              Conversely, if the instance continues to be the slow one after such a shuffle, the tools may not be showing you that there is really a pga problem overusing memory (I'd say "or a shared pool problem," but Oracle instrumentation would pick up that kind of slow, I think - but hey, it's windows).

                              • 12. Re: tkprof: high elapsed time but no waits
                                sb92075

                                JohnStegeman wrote:

                                 

                                I'm not necessarily obsessing over this particular statement - I've picked out this one as an example of what's going on. The net sum of the performance degradation is on the order of 5-10 seconds, and since it occurs on a screen that normally takes 2-4 seconds to display, it's a definite performance issue for the users.

                                 

                                then we needs to see the SQL that accounts for a majority of that duration.

                                • 13. Re: tkprof: high elapsed time but no waits
                                  Hemant K Chitale

                                  >The net sum of the performance degradation is on the order of 5-10 seconds

                                  Meaning that the same SQL is executed *multiple* times, not just the two times the tkprof output shows ?

                                   

                                  Hemant K Chitale