Forum Stats

  • 3,733,981 Users
  • 2,246,853 Discussions
  • 7,856,964 Comments

Discussions

AWR SQL Report: cannot tell where most of the wait time is going to

bderous
bderous Member Posts: 29 Bronze Badge

Hi,

I have a very slow query on my 12c database (EE 12.1.0.2). It queries a view based on many tables with many outer joins, these tables are all partitioned and have millions of rows.

I'm not asking for advice on the query plan but I would appreciate some guidance on the AWR SQL Report. Consider this:

Stat NameStatement TotalPer Execution% Snap Total
Elapsed Time (ms)61,780,59961,780,599.3743.45
CPU Time (ms)28,567,14628,567,146.0856.87
Executions1
Buffer Gets776,323776,323.000.16
Disk Reads720,828720,828.000.42
Parse Calls11.000.00
Rows00.00
User I/O Wait Time (ms)88,533
Cluster Wait Time (ms)0
Application Wait Time (ms)1,154
Concurrency Wait Time (ms)33
Invalidations0
Version Count18
Sharable Mem(KB)6,547

My question here is: where is most of elapse time going to? A big part is going to CPU. I would have expected that IO wait time would take up the rest of the wait time. Because during query runtime, I see a lot of waits on "direct path read temp" and "direct path read". Is there something missing? Is perhaps the network taking a lot of time because I know that this query is supposed to output 4M rows (the report shows 0; have no idea why).

On what part do I need to focus if I want this query to run faster? on cpu? on IO? on network throughput?

Relevant reports are attached.

Kind regards,

Benny Derous.

Tagged:
bderousJonathan LewisFranck PachotGbenga Ajakaye

Best Answer

  • Franck Pachot
    Franck Pachot Member Posts: 912 Bronze Trophy
    edited February 2017 Accepted Answer

    Depends what you call 'run'. A core can run only one thread cpu instruction at a time. But when one waits for memory access, the core can run few instructions from the other thread.

    I forgot to take the picture (or rather I didn't dare) but the best illustration I've seen was at a supermarket where the cashier had a moving separation on the conveyor belt that allows the cashier to start taking another customer while the previous one is packing his stuff. Two customers served at the same time, but only one cashier working... that's hyper-threading. it has better throughput than one tread but will never compete with two cores.

    bderousbderous

Answers

  • SUPRIYO DEY
    SUPRIYO DEY Member Posts: 2,127 Silver Trophy
    edited February 2017

    do you have you statistics up-to date on the tables?

    Rows 0 0 suggest that the optimizer is no getting the proper execution plan.

  • bderous
    bderous Member Posts: 29 Bronze Badge
    edited February 2017

    Yes, I have done a lot of 'experiments' with statistics. That was the first thing I (had) checked. I cannot explain why it talks about 0 rows. Perhaps the execution of this query was not finished in this snapshot range - hope that makes sense ? Regardless of this, can you tell from these reports where I need to focus on?  cpu ? IO ?  I would of guessed IO but why is the IO wait time so low?

    Benny

  • JohnWatson2
    JohnWatson2 Member Posts: 4,238 Bronze Crown
    edited February 2017

    The problem is that DB Time has three elements, not two: CPU time, non-idle waiting time, and IO time. That last element (which many resources, including Oracle Uni courses, ignore) may not be reported by AWR reports. You could be seeing this effect, I see it often on AIX systems. This is a good reference,

    http://www.oracle.com/technetwork/oem/db-mgmt/s317294-db-perf-tuning-with-db-time-181631.pdf

  • Rob the Real Relic
    Rob the Real Relic Member Posts: 335
    edited February 2017

    The query having not completed in the snapshot period was the first reason I thought of for the row count being zero.

    My suggestion for this would be to use extended data trace on the session executing the query.

    Note that it may produce a very large trace file.

    Then use tkprof to analyse the trace file, which should help highlight whether wait time of any form is an issue at all.

  • SUPRIYO DEY
    SUPRIYO DEY Member Posts: 2,127 Silver Trophy
    edited February 2017

    Your SGA size is on the lower side. That may cause direct path temp.

    bderous
  • JohnWatson2
    JohnWatson2 Member Posts: 4,238 Bronze Crown
    edited February 2017

    This version is better, slides 5, 9, 10, and 32 particularly show that you have to add IO time to your CPU time and your wait time if you want the figures to balance:

    Average Active Sessions RMOUG2007

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited February 2017

    It looks like you produced the sql awr report using the sql_id for

    Which explains why:

    Rows00.00

    This won't be much help, you'd be better off looking at the sql directly:

    0cst4mzgxfuc6SELECT * FROM BE_BRS_RT6_PLAUDIT_V POS WHERE LOT_TYPE_FK = :B2 AND VALID_ON = :B1

    Looks to be the most relevent

    bderousbderous
  • bderous
    bderous Member Posts: 29 Bronze Badge
    edited February 2017

    You're right, I uploaded the wrong file. However, the sql report for SQL ID "0cst4mzgxfuc6" is exactly the same.

    This is because it's a procedure which calls this sql. I'll see if I can change the uploaded files.

  • JohnWatson2
    JohnWatson2 Member Posts: 4,238 Bronze Crown
    edited February 2017

    If you want to improve performance of the query, you need to talk to your sys admins. The disc read times on your data0001 datafiles are awful, I have never seen read times of over 60ms before. 1ms would be more like it. That will be where your DB time is going.

    Have your sys admins enabled asynchronous, direct, and concurrent IO? I notice that you have not set any filesystemio_options.

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited February 2017

    AWR report is wrong tool to learn where time is being spent in detail by SQL statements

  • Franck Pachot
    Franck Pachot Member Posts: 912 Bronze Trophy
    edited February 2017

    Hi Benny,

    From the AWR report, you are on an IBM POWER 4-way SMT. Your LPAR has 2 VCPUs with 8 threads. The host is 33.5% busy on average which means 33.5% of 8 threads, which means that each VCPUs is at 130% thank to hyper-threading. And we don't even know if those 2 VPUs are entitled, so maybe you have less physical processor resources.

    IMO most of the elapsed time is spent in runqueue because the LPAR is in CPU starvation. This instance is responsible only for 9.2% of the host CPU, so the cause may be in another instance.

    I've an old post about that: http://blog.dbi-services.com/sockets-cores-virtual-cpu-logical-cpu-hyper-threading-what-is-a-cpu-nowadays-1/

    Regards,

    Franck.

    bderousJonathan Lewis
  • bderous
    bderous Member Posts: 29 Bronze Badge
    edited February 2017

    Thank you Franck,

    I must say I was not expecting this answer because my customer is telling me that his database is on a dedicated server, it is not virtualized, he says. Could it be that this customer is not aware of his architecture?

    From what on the AWR report can one tell if the server is really virtualized or not?

    Perhaps customer has hard partitioning on his machine and just tells me that way it is not virtualized. Or maybe they are using Dedicated LPAR (see below).

    Your old post was very interesting. Worth bookmarking for sure.

    Kind regards,

    Benny.

    Look at the IBM Manual for Dedicated LPAR: This technology offers the ability to make a server run as though it were two or more independent servers. When a physical system is logically partitioned, the resources on the server are divided into subsets called logical partitions (LPARs). Processors, memory, and I/O devices can be individually assigned to logical partitions. The LPARs hold these resources for exclusive use. You can separately install and operate each dedicated LPAR because LPARs run as independent logical servers with the resources allocated to them. Since the resources are dedicated to use by the partition, it is called Dedicated LPAR.

  • Franck Pachot
    Franck Pachot Member Posts: 912 Bronze Trophy
    edited February 2017

    Hi,

    From OS statistics NUM_VCPUS=2 you know that you are virtualized. It is possible that the two VCPUs are entitled to your LPAR, but that's still only 2 cores. Then the problem is not virtualization but you are still running more processes than the number of cores.There is 4-way multithreading (this comes from NUM_LCPUS=8) which allows 8 processes to run from the OS point of view, but they can't run all at the same time.

    Regards,

    Franck.

    Gbenga Ajakayebderous
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2017

    Very nice.

    Although I'm always a little uncertain about how trustworthy the OS_CPU_WAIT_TIME is from the OSSTATS I can't help noticing that in this case that the it's about 30% larger than the BUSY_TIME, which is roughly in line with the AWR SQL report where the "lost" time is roughly 30% larger than the recorded CPU time.

    Regards

    Jonathan Lewis

    Franck Pachotbderous
  • bderous
    bderous Member Posts: 29 Bronze Badge
    edited February 2017

    Forgive my ignorance Franck but I was under the impression that 8 processes could sun simultaneous on a 4 SMT thread 2 core system.

    Links:

    1. https://www.ibm.com/support/knowledgecenter/linuxonibm/performance/tuneforsybase/smtsettings.htm

    2. IBM Knowledge Center

  • Franck Pachot
    Franck Pachot Member Posts: 912 Bronze Trophy
    edited February 2017 Accepted Answer

    Depends what you call 'run'. A core can run only one thread cpu instruction at a time. But when one waits for memory access, the core can run few instructions from the other thread.

    I forgot to take the picture (or rather I didn't dare) but the best illustration I've seen was at a supermarket where the cashier had a moving separation on the conveyor belt that allows the cashier to start taking another customer while the previous one is packing his stuff. Two customers served at the same time, but only one cashier working... that's hyper-threading. it has better throughput than one tread but will never compete with two cores.

    bderousbderous
This discussion has been closed.