This discussion is archived
1 2 Previous Next 15 Replies Latest reply: Mar 28, 2013 6:37 AM by subhavsa RSS

AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin

subhavsa Newbie
Currently Being Moderated
Hi,

We are reviewing an AWR report for couple of days. And the below events are consistently reported as Top timed events.

DB CPU time.
enq: TM – contention
Streams AQ: qmn coordinator waiting for slave to start
enq: TX - row lock contention     
PX Nsq: PQ load info query


The environment is 2 node RAC on 11.2.0.3.

- For "DB CPU time", "enq: TX - row lock contention" and "enq: TM – contention", we have found couple of recommendations and approach for review.

I have got a question of the other 2 events.

1. "Streams AQ: qmn coordinator waiting for slave to start"
====================================================
Based on metalink and couple of forums, this seems to be linked to aq_tm_processes.

The documents suggests to unset this parameter for 10g onwards. And for 11g, the default value (when it is unset) shows the value of 1 in v$parameter. And that is what is set for our environment and that would mean oracle is managing it automatically. But we still see this event all the time.

Can the aq_tm_processes parameter value be set for 11gR2? If yes, what would be the appropriate value?
( Note: Some documents (Note:746313.1) suggests, in 9.2 and 10g+ databases, AQ_TM_PROCESSES must not be set to 10, as setting it to 10 disables some other features of the Queue Monitoring )


2. PX Nsq: PQ load info query
==========================
Is there anyway to find the root cause of the above wait event? Cannot find much information about the above event.


Would appreciate any recommendation to debug this event


Thanks.
Sunil

Edited by: subhavsa on Mar 19, 2013 11:52 AM
  • 1. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    SalmanQureshi Expert
    Currently Being Moderated
    Hi,
    Please enclose your output/code in
     tag.
    Can you also tell the % mentioned in AWR with these top 5 wait events.
    
    Salman                                                                                                                                                                                                                                                                            
  • 2. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Nikolay Savvinov Guru
    Currently Being Moderated
    Hi,

    a few recommendations:

    1) as already mentioned by Salman, you should pay attention not just to what to what events made it to the top-5 list, but to % of DB time
    2) you can use ASH (active session history) to find which sessions/statements are most affected by these wait events, perhaps this can give you some clues
    3) whenever seeing some unfamiliar exotic wait events, keep in mind that they could be a side effect of something else going on in the system. The most typical scenario
    for a wait event to occur is when someone is holding some resource needed by someone else, and while the resource held can be rare and unusual (and thus the corresponding
    wait event would be rare and unusual, too), the reason while it's being held longer than usual could be quite simple (e.g. a plan change causing a SQL statement work slower
    than it used to).

    So take a look at top SQL section of the AWR report to see if there are any resource hogs in there, while continuing to search for info related to these exotic wait events you're seeing.

    Best regards,
    Nikolay
  • 3. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    subhavsa wrote:

    We are reviewing an AWR report for couple of days. And the below events are consistently reported as Top timed events.

    DB CPU time.
    enq: TM – contention
    Streams AQ: qmn coordinator waiting for slave to start
    enq: TX - row lock contention     
    PX Nsq: PQ load info query

    The environment is 2 node RAC on 11.2.0.3.

    1. "Streams AQ: qmn coordinator waiting for slave to start"
    ====================================================
    Based on metalink and couple of forums, this seems to be linked to aq_tm_processes.

    The documents suggests to unset this parameter for 10g onwards. And for 11g, the default value (when it is unset) shows the value of 1 in v$parameter. And that is what is set for our environment and that would mean oracle is managing it automatically. But we still see this event all the time.
    The best strategy at this point is to follow the manuals. Bear in mind that because the dynamic view happens to show the value you had set doesn't mean that the behaviour is the same - there may, for example, be moments when the value changes but you've never queried v$parameter at the right moment. However if your CPU load is high, or the number of active processes is high, then the appearance of this wait may simply be a side effect of the CPU load.
    >
    2. PX Nsq: PQ load info query
    ==========================
    Is there anyway to find the root cause of the above wait event? Cannot find much information about the above event.
    Since this is a RAC system and Oracle tries to determine dynamically which node(s) to run the parallel execution slaves on, I'd take a guess that this is the result of sending a request to the other node to send load information. If your CPU usage is high at either node then this could result in that request for information being delayed.

    You haven't given us any clue about the actual CPU count and usage, but it's possible that (missing foreign key indexes, aside) you're simply doing too many parallel queries, which burn up CPU and abuse your RAC interconnect.

    Regards
    Jonathan Lewis
  • 4. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Hi Everyone,

    Thank you very much for your valuable pointers. I am sorry, I do not have idea how to close the output/code in
     tag. But here is a quick summary.
    
    - As I mentioned, for "DB CPU time" wait time, we have reviewed ADDM and ASH reports. And both are pointing to some of the top SQL codes that consistently appear for SQL Tuning advisors. We plan to review those codes to begin with. 
    
    - The ASH details for the above top SQL codes shows "CPU + Wait for CPU" as the top Event. It shows the status "** Row Source Not Available **" for some of SQLs. Based on your responses, this could be in turn causing the other wait events. But do correct me if I am getting it wrong. 
    
    - The AWR %DB Time, on an avg, shows the following  statistics.
    DB CPU %DB time = 88.24
    SQL Exec Ela %DB time = 91.16
    
    Is there any other area to check the CPU load from any of these reports?
    
    Thanks again for your help.
    
    Regards,
    Sunil                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
  • 5. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Hi Jonathan,

    You stated
    "You haven't given us any clue about the actual CPU count and usage, but it's possible that (missing foreign key indexes, aside) you're simply doing too many parallel queries, which burn up CPU and abuse your RAC interconnect."

    Are there any specific statistics in AWR that could provide more information about this parallel queries and its connection to interconnect? We checked the Interconnect Statistics but do not find anything specific to this.

    As I mentioned in my last update, the AWR %DB Time, on an avg, shows the following statistics.
    DB CPU %DB time = 88.24
    SQL Exec Ela %DB time = 91.16

    Thanks.
    Sunil
  • 6. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    subhavsa wrote:

    DB CPU %DB time = 88.24
    And what percentage of the total system CPU does that represent ?
    (And you still haven't mentioned the CPU count)
    Look for the section on OS Statistics.

    Regards
    Jonathan Lewis
  • 7. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Hi Jonathan,

    The CPU_Count is 24.

    The O/S stats and %DB time is as below (Sorry but I am unable to attach the html in a readable with code tag. I will try to run the report in txt mode if the below does not help)

    OS Statistics By Instance
    •     Listed in order of instance number, I# 
    •     End values are diplayed only if different from begin values 
    I#     Num CPUs     CPU Cores     CPU Sckts     Load Begin     Load End     % Busy     % Usr     % Sys     % WIO     % Idl     Busy Time (s)     Idle Time (s)     Total Time (s)     Memory (M)     End CPUs     End Cores     End Sckts     End Memory (M)
    1     24     12     2     3.03     3.40     9.86     9.39     0.29     0.75     90.14     1,473,581.42     13,472,140.16     14,945,721.58     32,161.28                    
    2     24     12     2     1.58     1.73     1.81     1.45     0.18     1.01     98.19     271,103.59     14,674,625.76     14,945,729.35     32,161.28                    
    Sum                                                       1,744,685.01     28,146,765.92     29,891,450.93                         
    ________________________________________
    
    
    Time Model - % of DB time
    •     % Total [DB time/bg time] - instance [DB time/bg time] as a percentage of the cluster-wide total [DB time/bg time] 
    I#     % Total DB time     DB CPU %DB time     SQL Exec Ela %DB time     Parse Ela %DB time     Hard Parse %DB time     PL/SQL Ela %DB time     Java Ela %DB time     % Total bg time     bg CPU %bg time
    1     90.22     88.33     91.98     2.15     1.40     0.00     0.34     60.06     65.54
    2     9.78     70.46     83.36     1.89     1.11     0.00     1.05     39.94     57.41
    Avg     50.00     79.40     87.67     2.02     1.26     0.00     0.69     50.00     61.48
  • 8. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    subhavsa wrote:

    The CPU_Count is 24.

    OS Statistics By Instance
    •     Listed in order of instance number, I# 
    •     End values are diplayed only if different from begin values 
    I#     Num CPUs     CPU Cores     CPU Sckts     Load Begin     Load End     % Busy     % Usr     % Sys     % WIO     % Idl     Busy Time (s)     Idle Time (s)     Total Time (s)     Memory (M)     End CPUs     End Cores     End Sckts     End Memory (M)
    1     24     12     2     3.03     3.40     9.86     9.39     0.29     0.75     90.14     1,473,581.42     13,472,140.16     14,945,721.58     32,161.28                    
    2     24     12     2     1.58     1.73     1.81     1.45     0.18     1.01     98.19     271,103.59     14,674,625.76     14,945,729.35     32,161.28                    
    Sum                                                       1,744,685.01     28,146,765.92     29,891,450.93          > 
    Taking the total time and dividing by the number of CPUs, then by 3,600 we see that you've run a report across roughly 172 hours (7 days).
    There's no point in running a report like that because any interesting problems will probably disappear in the averaging.

    If you think you have a problem, and you can't just follow critical tasks in detail then you need to examine AWR reports across short intervals (such as one hour) during critical processing times.

    Regards
    Jonathan Lewis
  • 9. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Thanks Jonathan. Currently our performance issue is intermittent so unable to run it for a specific interval but I have the report for each day as well. Let me review it as you suggested. Will update this thread with my findings and question soon.

    Thanks again.
  • 10. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Hi

    Here are the O/S stat for shorter duration.
    OS Statistics By Instance
    •     Listed in order of instance number, I# 
    •     End values are diplayed only if different from begin values 
    I#     Num CPUs     CPU Cores     CPU Sckts     Load Begin     Load End     % Busy     % Usr     % Sys     % WIO     % Idl     Busy Time (s)     Idle Time (s)     Total Time (s)     Memory (M)     End CPUs     End Cores     End Sckts     End Memory (M)
    1     24     12     2     3.19     2.88     9.41     8.91     0.31     0.74     90.59     89,358.28     860,168.00     949,526.28     32,161.28                    
    2     24     12     2     1.23     1.53     1.84     1.49     0.18     0.88     98.16     17,463.19     932,052.93     949,516.12     32,161.28                    
    Sum                                                       106,821.47     1,792,220.93     1,899,042.40                         
    
    Time Model - % of DB time
    •     % Total [DB time/bg time] - instance [DB time/bg time] as a percentage of the cluster-wide total [DB time/bg time] 
    I#     % Total DB time     DB CPU %DB time     SQL Exec Ela %DB time     Parse Ela %DB time     Hard Parse %DB time     PL/SQL Ela %DB time     Java Ela %DB time     % Total bg time     bg CPU %bg time
    1     90.79     96.70     95.30     2.84     1.95     0.00     0.19     51.22     55.08
    2     9.21     94.16     93.04     2.98     2.12     0.00     0.58     48.78     59.49
    Avg     50.00     95.43     94.17     2.91     2.04     0.00     0.39     50.00     57.29
    The Top Timed Events for this session are
    Class                                      Event                                                                        Waits     %Timeouts     Total(s)     Avg(ms)     %DB time
                                          DB CPU                                                                                 87,354.01              96.47
    Concurrency                     library cache pin                                                       6,997,645           1.02     1,223.81     0.17         1.35
    Other                                     Streams AQ: qmn coordinator waiting for slave to start     291     63.23     1,030.98     3542.90         1.14
    Other                                     PX Nsq: PQ load info query                                      2,143     94.12     417.49     194.82         0.46
    System I/O                                     log file parallel write                                                       556,152     0.00     363.50     0.65         0.40
    Application                                     enq: TX - row lock contention                                       392     0.00     185.97     474.41         0.21
    Have few questions with reference to this :

    - In the ASH report for this duration, we see lots of "PX Nsq: PQ load info query" related wait messages. Could this be linked to comment in previous post that parallel query could be hogging the interconnect?

    But at the same time, the parallel settings in the env are default values.
    i.e.
    parallel_degree_policy = MANUAL
    parallel_min_time_threshold = AUTO
    parallel_threads_per_cpu = 2

    v$pq_sesstat does not show any query being Parallelized
    select * from v$pq_sesstat;
    
    STATISTIC                      LAST_QUERY SESSION_TOTAL
    ------------------------------ ---------- -------------
    Queries Parallelized                    0             0
    DML Parallelized                        0             0
    DDL Parallelized                        0             0
    DFO Trees                               0             0
    Server Threads                          0             0
    Allocation Height                       0             0
    Allocation Width                        0             0
    Local Msgs Sent                         0             0
    Distr Msgs Sent                         0             0
    Local Msgs Recv'd                       0             0
    Distr Msgs Recv'd                       0             0
    
    11 rows selected.
    The value of parallel_force_local = false.

    If none of the query are parallelized and if we still see "PX Nsq: PQ load info query" event, is there anyway to avoid this by any setting? Would that help reducing DB CPU Time? As I mentioned, most of the ASH reports show the above event following DB CPU + CPU Wait event.

    Not sure if this is the right interpretation. But need some second opinion as there is no information "PX Nsq: PQ load info query" ...

    Edited by: subhavsa on Mar 21, 2013 6:58 AM
  • 11. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    61841 Newbie
    Currently Being Moderated
    Nikolay
    a few recommendations:
    
    1) as already mentioned by Salman, you should pay attention not just to what to what events made it to the top-5 list, but to % of DB time
    2) you can use ASH (active session history) to find which sessions/statements are most affected by these wait events, perhaps this can give you some clues
    how to find which sessions/statements are most affected by these wait events?
    All I know is I can run ASH for SID, sql statement etc.

    Thanks,
  • 12. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Nikolay Savvinov Guru
    Currently Being Moderated
    Hi,

    ASH is a view (V$ACTIVE_SESSION_HISTORY) that contains 1s snapshots for all active sessions in the system, so you just query it, and based on the number of samples you can calculate how much time a certain session or a statement spent waiting for a certain event.

    Note that V$ACTIVE_SESSION_HISTORY only contains data for recent samples (normally within a few last hours), but after that you can query DBA_HIST_ACTIVE_SESS_HISTORY view. The main difference between V$ and DBA_HIST versions is that the latter contains only 1 sample per 10 seconds.

    Sample query:
    select event, count(*) 
    from v$active_session_history 
    where sql_id = :sql_id
    group by event
    order by 2 desc;
    it will show what a specific query was waiting on (you can divide individual counts by their sum to get percentages).

    For more information on the subject I recommend Graham Wood's presentations on the subject (findable on google).

    Best regards,
    Nikolay
  • 13. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    subhavsa wrote:
    select * from v$pq_sesstat;
    
    STATISTIC                      LAST_QUERY SESSION_TOTAL
    ------------------------------ ---------- -------------
    Queries Parallelized                    0             0
    DML Parallelized                        0             0
    DDL Parallelized                        0             0
    DFO Trees                               0             0
    Server Threads                          0             0
    Allocation Height                       0             0
    Allocation Width                        0             0
    Local Msgs Sent                         0             0
    Distr Msgs Sent                         0             0
    Local Msgs Recv'd                       0             0
    Distr Msgs Recv'd                       0             0
    
    11 rows selected.
    The view v$pq_SESstat reports the effects of parallelism in your current session. You need to check v$pq_SYSstat.
    Apart from that, you have AWR reports for the interval, so look at the Instance Activity for any statistics with PX, PQ, or parallel in their name. The only way to be fairly confident that you can't get parallel execution occuring is to set parallel_max_servers to zero - and you haven't said anything about that parameter.

    For the last, shorter, period, you might want to look at the "SQL ordered by CPU" - compare the totals reported with the CPU time recorded in the top 5, and also look at the Latch activity for any latch with a large number of misses and spin_gets.

    If you want to post more results from an html formatted AWR, I suggest you cut and paste into a text document first, then use a text editor (like notepad or vi) to align columns properly before posting.

    Regards
    Jonathan Lewis
  • 14. Re: AWR Top timed events : PX Nsq: PQ load info query & Streams AQ: qmn coordin
    subhavsa Newbie
    Currently Being Moderated
    Thanks for the valuable pointers Nikolay & Jonathan.

    The parallel_max_servers is set to 30 at the moment. That was another question I had but forgot to put in my last update.


    I will check v$pq_SYSstat and DBA_HIST_ACTIVE_SESS_HISTORY as suggested by you.

    By the way, the ASH / ADDM & AWR do report some of thd sql_ids that consistently appear as expensive one and "SQL ordered by CPU" section confirms the same. But I will run ash report for the specific sql_ids to find with sql reports more PX events or any other specific wait class.

    Will update the post with more findings or questions.

    Thanks again to all of you.

    Sunil
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points