11 Replies Latest reply: Jan 17, 2013 2:47 AM by Bawer RSS

    Very long running query

    stee1rat
      Can someone help me to understand what is going on here?

      One of the users called and said that session 8248 is already working for almost 5 days. And it's not normal.

      First of all, in v$session i can see that session 8248 is repeatedly executing the same sql_id:
      SQL_ID        PREV_SQL_ID   SQL_EXEC_START
      ------------- ------------- ---------------------
      ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:57:55
      ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:57:58
      ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:00
      ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:03
      7pg0nhttfb6dd 45srvxj2ur5yk 16.01.2013 15:58:06
      ftr4s16std8zv cmzykvtmuufas 16.01.2013 15:58:07
      45srvxj2ur5yk bhspd5gvthxcg 16.01.2013 15:58:09
      So it looks like this session mainly executes query with sql_id = ftr4s16std8zv.

      And this query works fast:
      select sql_id, executions, round(elapsed_time/executions/1e6, 2) from v$sql where sql_id = 'ftr4s16std8zv';
      
      SQL_ID        EXECUTIONS ROUND(ELAPSED_TIME/EXECUTIONS/1E6,2)
      ------------- ---------- ------------------------------------
      ftr4s16std8zv     448022                                 0.82
      What bothering me is the content in v$session_longops when i'm querying this sid:
      select sql_id, opname, message, elapsed_seconds/60/60, time_remaining/60/60  from v$session_longops where sofar <> totalwork and sid = 8248;
      
      SQL_ID        OPNAME          MESSAGE                                                                       ELAPSED_SECONDS/60/60 TIME_REMAINING/60/60
      ------------- --------------- ----------------------------------------------------------------------------- --------------------- --------------------
      b655yk3umpqaf Table Scan      Table Scan:   SCHEMANAME.TABLE1: 43058 out of 44871 Blocks done                   123.3508333          5.193888889 
      Now let's take a look in v$open_cursor:
      select sql_id, sql_text, last_sql_active_time, sql_exec_id, cursor_type from v$open_cursor where sid = 8248 and sql_exec_id is not null;
      
      SQL_ID        SQL_TEXT                                                     LAST_SQL_ACTIVE_TIME  SQL_EXEC_ID CURSOR_TYPE
      ------------- ------------------------------------------------------------ --------------------- ----------- ---------------------
      ftr4s16std8zv SELECT * FROM SCHEMANAME2.TABLE1CARD WHERE NOT (PERSON_MINOR 17.09.1974 21:01:54      17227295 PL/SQL CURSOR CACHED
      4kx07abxnjkby BEGIN package1.procedure1 (683203,0); END;                   null                     16777216 OPEN
      b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID=:B1                 null                     16777216 OPEN-PL/SQL
      50j9u08b5z5ns SELECT * FROM SCHEMANAME.TABLE1CARD WHERE ET=:B2             17.09.1974 21:11:15      16969198 PL/SQL CURSOR CACHED
      I did a little depersonization here, but table names still correlates with ones in another views.

      For me it looks like this session has executed the procedure package1.procedure1 (4kx07abxnjkby) and this procedure has executed the long running query b655yk3umpqaf hence they're have same SQL_EXEC_ID in v$open_cursor.
      But i can't understand the reason for this query to running that long, 'cause SCHEMANAME.TABLE1 is not that big:
      select count(*) from SCHEMANAME.TABLE1;
      
        COUNT(*)
      ----------
           49250 
      And the last thing is v$sql_monitor:
      select status, sql_id, sql_text, sql_exec_start, elapsed_time/1e6 "seconds" from v$sql_monitor where sid = 8248;
      
      STATUS              SQL_ID        SQL_TEXT                                                     SQL_EXEC_START           seconds
      ------------------- ------------- ------------------------------------------------------------ --------------------- ----------
      EXECUTING           b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID =:B1                11.01.2013 13:26:48    36.203471 
      EXECUTING           4kx07abxnjkby BEGIN package1.procedure1 (683203,0); END;                   11.01.2013 13:26:47   445411.5887 
      It says that elapsed time for b655yk3umpqaf was 36 seconds, but at the same time it appears still running in v$session_longops.
      Could someone help me to put this together?

      While i was writing all this i've got an idea, that maybe this pl/sql procedure just queried needed lines from SCHEMANAME.TABLE1 with b655yk3umpqaf wich took it 36 seconds to complete and now it's just fetching this rows and doing something about them? How can i see it?
        • 1. Re: Very long running query
          Bawer
          is this single long running job?

          it would be better if you can provide the code of package.
          It seems there is an endless-loop.
          • 2. Re: Very long running query
            Rob_J
            Look at v$session_event and v$session_wait for that session and see what it's doing.
            • 3. Re: Very long running query
              stee1rat
              Unfortunately, I can't post the package body here, and this package is very long anyway.

              I'm sure that there's no endless-loop, because sofar blocks in v$session_longops constantly increasing and getting closer to totalwork blocks, but it does it very slow.

              It's moslty 'db file sequential read' in v$session_event and v$session_wait. Nothing bad, i guess.
              select * from v$session_event where sid = 8248 order by time_waited desc;
              
                     SID EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT TIME_WAITED_MICRO   EVENT_ID WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS                                                     
              ---------- ---------------------------------------------------------------- ----------- -------------- ----------- ------------ ---------- ----------------- ---------- ------------- ----------- ----------------------------------------------------------------
                    8248 db file sequential read                                             10604964              0     8142415         0.77        214           8.1E+10 2652584166    1740759767           8 User I/O                                                         
                    8248 db file parallel read                                                  10179              0       10794         1.06         34         107944388  834992820    1740759767           8 User I/O                                                         
                    8248 enq: TM - contention                                                    1359              0       10636         7.83       1152         106358449  668627480    4217450380           1 Application                                                      
                    8248 db file scattered read                                                  5470              0        5993          1.1          9          59930128  506183215    1740759767           8 User I/O                                                         
                    8248 utl_file I/O                                                          519612              0        4764         0.01        123          47639673 2804400934    1740759767           8 User I/O                                                         
                    8248 latch: shared pool                                                      9481              0        2537         0.27          4          25368843 2696347763    3875070507           4 Concurrency                                                      
                    8248 latch: row cache objects                                                3696              0         295         0.08          8           2946948 1117386924    3875070507           4 Concurrency                                                      
                    8248 log file switch (private strand flush incomplete)                         63              0         186         2.96         22           1861721  114164561    3290255840           2 Configuration                                                    
                    8248 events in waitclass Other                                               2283              0         147         0.06         15           1467913 1736664284    1893977003           0 Other                                                            
                    8248 log file sync                                                            759              0         143         0.19          4           1429803 1328744198    3386400367           5 Commit                                                           
                    8248 resmgr:internal state change                                              24             11         110         4.58         10           1100285 4043670897    3875070507           4 Concurrency                                                      
                    8248 enq: TX - row lock contention                                              1              0          57        57.06         57            570577  310662678    4217450380           1 Application                                                      
                    8248 library cache: mutex X                                                   996              0          30         0.03          3            302708 1646780882    3875070507           4 Concurrency                                                      
                    8248 undo segment extension                                                    20             16          17         0.85          4            170824 1781586680    3290255840           2 Configuration                                                    
                    8248 read by other session                                                      8              0           9         1.08          2             86326 3056446529    1740759767           8 User I/O                                                         
                    8248 log file switch completion                                                 1              0           1         1.47          1             14721 3834950329    3290255840           2 Configuration                                                    
                    8248 latch: cache buffers chains                                              155              0           1         0.01          0              7911 2779959231    3875070507           4 Concurrency                                                      
                    8248 SQL*Net message from client                                               15              0           1         0.04          0              5498 1421975091    2723168908           6 Idle                                                             
                    8248 resmgr:cpu quantum                                                       256              0           1            0          0             11014 1452455426    2396326234          10 Scheduler                                                        
                    8248 Disk file operations I/O                                                  78              0           0            0          0               751  166678035    1740759767           8 User I/O                                                         
                    8248 buffer busy waits                                                         47              0           0         0.01          0              2844 2161531084    3875070507           4 Concurrency                                                      
                    8248 SQL*Net message to client                                                 15              0           0            0          0                23 2067390145    2000153315           7 Network                                                          
                    8248 latch: In memory undo latch                                               17              0           0         0.01          0              1738 1394127552    3875070507           4 Concurrency                                                      
              
               23 rows selected 
              • 4. Re: Very long running query
                Rob_J
                db file sequential read may or may not be bad. What if a full scan is quicker than (a presumably) index lookup? Maybe the db file sequential read is coming from UNDO? Can you see what blocks, tablespace and segment the session is trying to read?

                How is the response time on the disk? Average wait looks low so I doubt that's a problem but worth checking.

                You say it's not normal, so what is normal? Has something changed in the meantime which could account for the difference? Perhaps a stats gather to change the plan? Perhaps data has been loaded? Anything that could have affected it?

                Just trying to give some suggestions as to what could be the cause here, but maybe you have considered all of these options already.

                Rob
                • 5. Re: Very long running query
                  Dom Brooks
                  The answer surely lies with looking at the code.
                  But without visibility of this, it's hard to comment.
                  If you can't post the code - perfectly understandable - then unfortunately you're largely on your own.

                  What version?
                  Are you licensed for ASH? I presume you are seeing as you've been looking at V$SQL_MONITOR

                  If so, I'd look at V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY to try to get the relationship of the SQL statements involved (although surely looking at the code is a more obvious way to do it).
                  See what you relationships you can establish between SQL_ID, TOP_LEVEL_SQL_ID, PLSQL(_ENTRY)_OBJECT_ID/SUBPROGRAM_ID.

                  And, of course, EVENT - what have you been waiting on (or at least partly waiting on, because ASH is sampled).

                  Also look at historic execution metrics of the sql statements involved using the raw AWR data in DBA_HIST_SQLSTAT.
                  Has something changed? Plan, volume of data, etc, etc.
                  • 6. Re: Very long running query
                    JohnWatson
                    You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.
                    • 7. Re: Very long running query
                      Mark Malakanov (user11181920)
                      >
                      ftr4s16std8zv SELECT * FROM SCHEMANAME2.TABLE1CARD WHERE NOT (PERSON_MINOR 17.09.1974 21:01:54 17227295 PL/SQL CURSOR CACHED
                      b655yk3umpqaf SELECT * FROM SCHEMANAME.TABLE1 WHERE ID=:B1 null 16777216 OPEN-PL/SQL
                      >

                      Did you look into the code?
                      May be it is not infinite loop, but ftr4s16std8zv is executed thousands times for one b655yk3umpqaf fetch?
                      • 8. Re: Very long running query
                        stee1rat
                        >
                        You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.>
                        The question is how to find it? There are deletes in the code (and the code is a little bit complex and big and it will take a lot of time to read and understand it), that i assume should be executing in the loop, but i can't find it in views (v$open_cursor, v$session, v$sql_monitor). Shouldn't it be there?
                        • 9. Re: Very long running query
                          JohnWatson
                          stee1rat wrote:

                          The question is how to find it? There are deletes in the code (and the code is a little bit complex and big and it will take a lot of time to read and understand it), that i assume should be executing in the loop, but i can't find it in views (v$open_cursor, v$session, v$sql_monitor). Shouldn't it be there?
                          Use statspack or (if licensed) AWR reports. It will be obvious.
                          • 10. Re: Very long running query
                            marksmithusa
                            I would guess that you're executing the same SQL many times as part of a cursor. Are you able to take an AWR snapshot and see how many times in a given period that same SQL executed? Have a look towards the SQL Statistics section of the report, especially towards the 'SQL Statements by Execution' (from what I've read on this thread).

                            Somewhat related, on one of the databases I support, I have one SQL statement which can run 75m/hour. Not only that, but it hits a VPD table, so the VPD package gets called 75m times just for that particular statement. Maybe something like that's happening - VPD, auditing, triggers, etc
                            • 11. Re: Very long running query
                              Bawer
                              stee1rat wrote:
                              >
                              You have lost about 24 hours of processing time on waits for single block I/O, and you have a full table scan that is absurdly slow. My guess: the scan is populating a cursor, which you are looping through. Within the loop you are doing index lookups. You probably need to attack the code running within the loop, not the SELECT that is doing the scan.>
                              The question is how to find it?
                              select ss.schemaname,ss.machine,ss.program,ss.logon_time,ss.sql_exec_start,ss.wait_time,
                              sa.first_load_time,sa.application_wait_time,sa.plsql_exec_time,sa.cpu_time,sa.elapsed_time,sa.sql_fulltext
                              from v$session ss, v$sqlarea sa
                              where sa.hash_value=ss.sql_hash_value
                              sa.sql_fulltext returns the query, which is actually run. This can help if the loop is executed through cursor. (If not, you should try to check loops)