This discussion is archived
11 Replies Latest reply: Jan 17, 2013 12:47 AM by Bawer RSS

Very long running query

stee1rat Newbie
Currently Being Moderated
Can someone help me to understand what is going on here?

One of the users called and said that session 8248 is already working for almost 5 days. And it's not normal.

First of all, in v$session i can see that session 8248 is repeatedly executing the same sql_id:
SQL_ID        PREV_SQL_ID   SQL_EXEC_START
------------- ------------- ---------------------
ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:57:55
ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:57:58
ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:00
ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:03
7pg0nhttfb6dd 45srvxj2ur5yk 16.01.2013 15:58:06
ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:07
45srvxj2ur5yk bhspd5gvthxcg 16.01.2013 15:58:09
So it looks like this session mainly executes query with sql_id = ftr4s16std8zv.

And this query works fast:
select sql_id, executions, round(elapsed_time/executions/1e6, 2) from v$sql where sql_id = 'ftr4s16std8zv';

SQL_ID        EXECUTIONS ROUND(ELAPSED_TIME/EXECUTIONS/1E6,2)
------------- ---------- ------------------------------------
ftr4s16std8zv     448022                                 0.82
What bothering me is the content in v$session_longops when i'm querying this sid:
select sql_id, opname, message, elapsed_seconds/60/60, time_remaining/60/60  from v$session_longops where sofar <> totalwork and sid = 8248;

SQL_ID        OPNAME          MESSAGE                                                                       ELAPSED_SECONDS/60/60 TIME_REMAINING/60/60
------------- --------------- ----------------------------------------------------------------------------- --------------------- --------------------
b655yk3umpqaf Table Scan      Table Scan:   SCHEMANAME.TABLE1: 43058 out of 44871 Blocks done                   123.3508333          5.193888889 
Now let's take a look in v$open_cursor:
select sql_id, sql_text, last_sql_active_time, sql_exec_id, cursor_type from v$open_cursor where sid = 8248 and sql_exec_id is not null;

SQL_ID        SQL_TEXT                                                     LAST_SQL_ACTIVE_TIME  SQL_EXEC_ID CURSOR_TYPE
------------- ------------------------------------------------------------ --------------------- ----------- ---------------------
ftr4s16std8zv SELECT * FROM SCHEMANAME2.TABLE1CARD WHERE NOT (PERSON_MINOR 17.09.1974 21:01:54      17227295 PL/SQL CURSOR CACHED
4kx07abxnjkby BEGIN package1.procedure1 (683203,0); END;                   null                     16777216 OPEN
b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID=:B1                 null                     16777216 OPEN-PL/SQL
50j9u08b5z5ns SELECT * FROM SCHEMANAME.TABLE1CARD WHERE ET=:B2             17.09.1974 21:11:15      16969198 PL/SQL CURSOR CACHED
I did a little depersonization here, but table names still correlates with ones in another views.

For me it looks like this session has executed the procedure package1.procedure1 (4kx07abxnjkby) and this procedure has executed the long running query b655yk3umpqaf hence they're have same SQL_EXEC_ID in v$open_cursor.
But i can't understand the reason for this query to running that long, 'cause SCHEMANAME.TABLE1 is not that big:
select count(*) from SCHEMANAME.TABLE1;

  COUNT(*)
----------
     49250 
And the last thing is v$sql_monitor:
select status, sql_id, sql_text, sql_exec_start, elapsed_time/1e6 "seconds" from v$sql_monitor where sid = 8248;

STATUS              SQL_ID        SQL_TEXT                                                     SQL_EXEC_START           seconds
------------------- ------------- ------------------------------------------------------------ --------------------- ----------
EXECUTING           b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID =:B1                11.01.2013 13:26:48    36.203471 
EXECUTING           4kx07abxnjkby BEGIN package1.procedure1 (683203,0); END;                   11.01.2013 13:26:47   445411.5887 
It says that elapsed time for b655yk3umpqaf was 36 seconds, but at the same time it appears still running in v$session_longops.
Could someone help me to put this together?

While i was writing all this i've got an idea, that maybe this pl/sql procedure just queried needed lines from SCHEMANAME.TABLE1 with b655yk3umpqaf wich took it 36 seconds to complete and now it's just fetching this rows and doing something about them? How can i see it?
  • 1. Re: Very long running query
    Bawer Journeyer
    Currently Being Moderated
    is this single long running job?

    it would be better if you can provide the code of package.
    It seems there is an endless-loop.
  • 2. Re: Very long running query
    Rob_J Journeyer
    Currently Being Moderated
    Look at v$session_event and v$session_wait for that session and see what it's doing.
  • 3. Re: Very long running query
    stee1rat Newbie
    Currently Being Moderated
    Unfortunately, I can't post the package body here, and this package is very long anyway.

    I'm sure that there's no endless-loop, because sofar blocks in v$session_longops constantly increasing and getting closer to totalwork blocks, but it does it very slow.

    It's moslty 'db file sequential read' in v$session_event and v$session_wait. Nothing bad, i guess.
    select * from v$session_event where sid = 8248 order by time_waited desc;
    
           SID EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT TIME_WAITED_MICRO   EVENT_ID WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS                                                     
    ---------- ---------------------------------------------------------------- ----------- -------------- ----------- ------------ ---------- ----------------- ---------- ------------- ----------- ----------------------------------------------------------------
          8248 db file sequential read                                             10604964              0     8142415         0.77        214           8.1E+10 2652584166    1740759767           8 User I/O                                                         
          8248 db file parallel read                                                  10179              0       10794         1.06         34         107944388  834992820    1740759767           8 User I/O                                                         
          8248 enq: TM - contention                                                    1359              0       10636         7.83       1152         106358449  668627480    4217450380           1 Application                                                      
          8248 db file scattered read                                                  5470              0        5993          1.1          9          59930128  506183215    1740759767           8 User I/O                                                         
          8248 utl_file I/O                                                          519612              0        4764         0.01        123          47639673 2804400934    1740759767           8 User I/O                                                         
          8248 latch: shared pool                                                      9481              0        2537         0.27          4          25368843 2696347763    3875070507           4 Concurrency                                                      
          8248 latch: row cache objects                                                3696              0         295         0.08          8           2946948 1117386924    3875070507           4 Concurrency                                                      
          8248 log file switch (private strand flush incomplete)                         63              0         186         2.96         22           1861721  114164561    3290255840           2 Configuration                                                    
          8248 events in waitclass Other                                               2283              0         147         0.06         15           1467913 1736664284    1893977003           0 Other                                                            
          8248 log file sync                                                            759              0         143         0.19          4           1429803 1328744198    3386400367           5 Commit                                                           
          8248 resmgr:internal state change                                              24             11         110         4.58         10           1100285 4043670897    3875070507           4 Concurrency                                                      
          8248 enq: TX - row lock contention                                              1              0          57        57.06         57            570577  310662678    4217450380           1 Application                                                      
          8248 library cache: mutex X                                                   996              0          30         0.03          3            302708 1646780882    3875070507           4 Concurrency                                                      
          8248 undo segment extension                                                    20             16          17         0.85          4            170824 1781586680    3290255840           2 Configuration                                                    
          8248 read by other session                                                      8              0           9         1.08          2             86326 3056446529    1740759767           8 User I/O                                                         
          8248 log file switch completion                                                 1              0           1         1.47          1             14721 3834950329    3290255840           2 Configuration                                                    
          8248 latch: cache buffers chains                                              155              0           1         0.01          0              7911 2779959231    3875070507           4 Concurrency                                                      
          8248 SQL*Net message from client                                               15              0           1         0.04          0              5498 1421975091    2723168908           6 Idle                                                             
          8248 resmgr:cpu quantum                                                       256              0           1            0          0             11014 1452455426    2396326234          10 Scheduler                                                        
          8248 Disk file operations I/O                                                  78              0           0            0          0               751  166678035    1740759767           8 User I/O                                                         
          8248 buffer busy waits                                                         47              0           0         0.01          0              2844 2161531084    3875070507           4 Concurrency                                                      
          8248 SQL*Net message to client                                                 15              0           0            0          0                23 2067390145    2000153315           7 Network                                                          
          8248 latch: In memory undo latch                                               17              0           0         0.01          0              1738 1394127552    3875070507           4 Concurrency                                                      
    
     23 rows selected 
  • 4. Re: Very long running query
    Rob_J Journeyer
    Currently Being Moderated
    db file sequential read may or may not be bad. What if a full scan is quicker than (a presumably) index lookup? Maybe the db file sequential read is coming from UNDO? Can you see what blocks, tablespace and segment the session is trying to read?

    How is the response time on the disk? Average wait looks low so I doubt that's a problem but worth checking.

    You say it's not normal, so what is normal? Has something changed in the meantime which could account for the difference? Perhaps a stats gather to change the plan? Perhaps data has been loaded? Anything that could have affected it?

    Just trying to give some suggestions as to what could be the cause here, but maybe you have considered all of these options already.

    Rob
  • 5. Re: Very long running query
    Dom Brooks Guru
    Currently Being Moderated
    The answer surely lies with looking at the code.
    But without visibility of this, it's hard to comment.
    If you can't post the code - perfectly understandable - then unfortunately you're largely on your own.

    What version?
    Are you licensed for ASH? I presume you are seeing as you've been looking at V$SQL_MONITOR

    If so, I'd look at V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY to try to get the relationship of the SQL statements involved (although surely looking at the code is a more obvious way to do it).
    See what you relationships you can establish between SQL_ID, TOP_LEVEL_SQL_ID, PLSQL(_ENTRY)_OBJECT_ID/SUBPROGRAM_ID.

    And, of course, EVENT - what have you been waiting on (or at least partly waiting on, because ASH is sampled).

    Also look at historic execution metrics of the sql statements involved using the raw AWR data in DBA_HIST_SQLSTAT.
    Has something changed? Plan, volume of data, etc, etc.
  • 6. Re: Very long running query
    JohnWatson Guru
    Currently Being Moderated
    You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.
  • 7. Re: Very long running query
    Mark Malakanov (user11181920) Expert
    Currently Being Moderated
    >
    ftr4s16std8zv SELECT * FROM SCHEMANAME2.TABLE1CARD WHERE NOT (PERSON_MINOR 17.09.1974 21:01:54 17227295 PL/SQL CURSOR CACHED
    b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID=:B1 null 16777216 OPEN-PL/SQL
    >

    Did you look into the code?
    May be it is not infinite loop, but ftr4s16std8zv is executed thousands times for one b655yk3umpqaf fetch?
  • 8. Re: Very long running query
    stee1rat Newbie
    Currently Being Moderated
    >
    You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.>
    The question is how to find it? There are deletes in the code (and the code is a little bit complex and big and it will take a lot of time to read and understand it), that i assume should be executing in the loop, but i can't find it in views (v$open_cursor, v$session, v$sql_monitor). Shouldn't it be there?
  • 9. Re: Very long running query
    JohnWatson Guru
    Currently Being Moderated
    stee1rat wrote:

    The question is how to find it? There are deletes in the code (and the code is a little bit complex and big and it will take a lot of time to read and understand it), that i assume should be executing in the loop, but i can't find it in views (v$open_cursor, v$session, v$sql_monitor). Shouldn't it be there?
    Use statspack or (if licensed) AWR reports. It will be obvious.
  • 10. Re: Very long running query
    marksmithusa Journeyer
    Currently Being Moderated
    I would guess that you're executing the same SQL many times as part of a cursor. Are you able to take an AWR snapshot and see how many times in a given period that same SQL executed? Have a look towards the SQL Statistics section of the report, especially towards the 'SQL Statements by Execution' (from what I've read on this thread).

    Somewhat related, on one of the databases I support, I have one SQL statement which can run 75m/hour. Not only that, but it hits a VPD table, so the VPD package gets called 75m times just for that particular statement. Maybe something like that's happening - VPD, auditing, triggers, etc
  • 11. Re: Very long running query
    Bawer Journeyer
    Currently Being Moderated
    stee1rat wrote:
    >
    You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.>
    The question is how to find it?
    select ss.schemaname,ss.machine,ss.program,ss.logon_time,ss.sql_exec_start,ss.wait_time,
    sa.first_load_time,sa.application_wait_time,sa.plsql_exec_time,sa.cpu_time,sa.elapsed_time,sa.sql_fulltext
    from v$session ss, v$sqlarea sa
    where sa.hash_value=ss.sql_hash_value
    sa.sql_fulltext returns the query, which is actually run. This can help if the loop is executed through cursor. (If not, you should try to check loops)

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points