This discussion is archived
13 Replies Latest reply: Sep 8, 2013 6:45 PM by Hemant K Chitale RSS

tkprof: high elapsed time but no waits

John Stegeman Oracle ACE
Currently Being Moderated

Hello everyone,

 

I've been doing some investigation of a performance issue on one of our customer's databases, and have run into an interesting issue that I cannot explain.

 

The background:

 

  • Oracle 11.2.0.3 running on MS Windows Server.
  • The application is built using Oracle Forms running client/server.
  • The Forms performance diagnostics point to lots of time being spent running queries, which matches our observation that the DB is the culprit for the problem described below.
  • I know it's a bad idea, but there are multiple instances running on the same server. We have done this at other clients for this particular application, and have not experienced issues. The other instances on this server are running fine and do not experience the problem described below.

 

There is one instance that works fine from a performance perspective when the database is cold started but the performance deteriorates over time (after a few days, it becomes so slow that the users complain and consider the application nearly unusable). Shutting down and restarting the instance restores the performance to a good level, and the cycle repeats. Statspack doesn't show anything of interest in the waits or SGA/PGA recommendations.

 

What we did was to enable tracing (dbms_monitor.session_trace_enable(NULL, NULL, TRUE, TRUE) in a logon trigger for a specific user) and to have a look at the trace/tkprof files to compare the "bad" trace with the "good" trace. This is where I'm getting perplexed. I've picked out a specific SQL to compare, and this is what I observe between the "bad" and "good" tkprof (tkprof snippets are later on in this post):

 

  1. The execution plans are the same.
  2. The number of reads from the buffer cache is the same. There are no physical reads in either trace file for this particular SQL.
  3. Neither the "good" nor the "bad" tkprof shows any waits
  4. The "bad" tkprof shows 1 miss in the library cache during parse (that's OK - it got hard parsed once and I understand that) and 1 miss during execute (this is more perplexing). I'm not worried about the first hard parse, as elapsed time during parse is only .02 sec. Based on the other observations in this list, I *think* I'm worried about the miss during execute.
  5. The CPU and elapsed time during fetch are the same for both.
  6. The CPU and elapsed time during execute is "bad" in the "bad" tkprof (.5 sec elapsed vs 0 sec elapsed in the good file).

 

I would have expected that there would be some waits shown in the tkprof output to show me why it's taking so long in the "bad" case, but it is showing nothing at all. Even if it's a contention with the other instances on the server, I would have expected to see waits on CPU or something like that.

 

Here's a piece from the "good" tkprof:

 

call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------

Parse        1      0.00       0.00          0          0          0           0

Execute      2      0.00       0.00          0          0          0           0

Fetch        2      0.00       0.00          0         36          0           2

------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        5      0.00       0.00          0         36          0           2

 

 

Misses in library cache during parse: 0

Optimizer mode: ALL_ROWS

Parsing user id: 52 

Number of plan statistics captured: 1


...snip...

 

Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited

  ----------------------------------------   Waited  ----------  ------------

  SQL*Net message to client                       7        0.00          0.00

  SQL*Net message from client                     7        0.00          0.00

********************************************************************************

 

and here's the same part from the "bad" tkprof (same SQL statement, bind variables, etc):

 

call     count       cpu    elapsed       disk      query    current        rows

------- ------  -------- ---------- ---------- ---------- ----------  ----------

Parse        1      0.00       0.02          0          0          0           0

Execute      2      0.07       0.53          0          0          0           0

Fetch        2      0.00       0.00          0         36          0           2

------- ------  -------- ---------- ---------- ---------- ----------  ----------

total        5      0.07       0.56          0         36          0           2

 

 

Misses in library cache during parse: 1

Misses in library cache during execute: 1

Optimizer mode: ALL_ROWS

Parsing user id: 52 

Number of plan statistics captured: 1

 

...snip...

 

Elapsed times include waiting on following events:

  Event waited on                             Times   Max. Wait  Total Waited

  ----------------------------------------   Waited  ----------  ------------

  SQL*Net message to client                       7        0.00          0.00

  SQL*Net message from client                     7        0.00          0.00

********************************************************************************

 

Can anyone shed any light on what might be going on or point me in a direction as far as avenues to investigate or additional tracing/etc that might give more information?

 

As of right now, we're just restarting the instance when performance goes south, but that's just a band-aid and not a real fix of the problem. As I mentioned, Statspack doesn't seem to give anything additional (no waits, SGA/PGA recommendations are to leave it at the current size, etc).

 

Thanks,

 

John

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points