Forum Stats

  • 3,782,315 Users
  • 2,254,637 Discussions
  • 7,880,045 Comments

Discussions

question on ORA-1555

jmft2012
jmft2012 Member Posts: 371 Blue Ribbon
edited May 25, 2016 11:04AM in General Database Discussions

Oracle 11.2.0.4.0 on Solaris 10.

ORA-01555 caused by SQL statement below (SQL ID: *****, Query Duration=91392 sec, SCN: ****):

SELECT PC.PROJ_NUMBER, PC.PROJ_NAME, PC.PROGRAM_CODE,

***

***

UNDOTBS1 tablespace 25GB

undo_management: AUTO

undo_retention :126000 second

The query under question, I tried it returns a 20 seconds. but 1555 error told "Query Duration=91392 sec".

and we have undo_retention 126000 sec. the same query caused 1555 three times this week.

Please comment.

TIA

Tagged:
AndrewSayerHemant K Chitale

Best Answer

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited May 21, 2016 12:47PM Accepted Answer
    jmft2012 wrote:
    
    "....
    Do they fetch all rows at once, or can they (as I am guessing) fetch one row, leave their desks for the day, continue fetching from the same cursor the next day?
    ""
    I doubt it. otherwise would have been shown in AWR for sql time /elapsed time etc.
    
    
    

    Your logic is the wrong way round.

    If the SQL really took 91392 seconds to complete then it would show up in the AWR (that is a huge length of time IMO). The situation I described means that the actual cursor isn't being executed for that time, it is left open for that amount of time and then work is continued later.

    Consider this demo:

    set timing off
    drop table as_test_fetch_times;
    create table as_test_fetch_times (column_a number not null, column_b varchar2(30)) tablespace ts1;
    insert into as_test_fetch_times
    select rownum, 'a' from dual connect by rownum <=60;
    commit;
    set timing on DECLARE
    CURSOR quick_cursor IS SELECT column_a, column_b FROM as_test_fetch_times;
    nColumn_a NUMBER;
    sColumn_b VARCHAR2(30);
    BEGIN
      OPEN quick_cursor;
      LOOP 
        dbms_lock.sleep(1);
        FETCH quick_cursor INTO nColumn_a, sColumn_b;
        EXIT WHEN quick_cursor%NOTFOUND;
      END LOOP;
      CLOSE quick_cursor;
    END;
    / Elapsed: 00:01:01.46
    set timing off

    This is the basic situation I'm describing. From the beginning to the end of the fetch the cursor is open for 61 seconds (+the overhead of the actual execution). All rows need to be fetched from blocks that are consistent as of the SCN I start executing the cursor. According to your expectations, this would appear in your AWR report as taking 60 seconds. You can compare this to what v$sql says:


    SELECT  sql_id
           ,parse_calls
           ,SUBSTR(sql_text,1,100) sql_text
           ,elapsed_time
           ,fetches
           ,rows_processed
    FROM    v$sql
    WHERE   upper(sql_text) not like '%V$SQL%'
    AND     upper(sql_text) LIKE UPPER('SELECT column_a, column_b FROM as_test_fetch_times'); SQL_ID        PARSE_CALLS SQL_TEXT                                                                                             ELAPSED_TIME    FETCHES ROWS_PROCESSED
    ------------- ----------- ---------------------------------------------------------------------------------------------------- ------------ ---------- --------------
    bhnfbdavsd31q           1 SELECT COLUMN_A, COLUMN_B FROM AS_TEST_FETCH_TIMES                                                           6784         61             60

    That elapsed time is in microseconds you have to divide by 10^6 to get it in seconds: 0.006784 seconds.

    The actual query duration, which is what the alert was telling you, would have been the full 61 seconds.

    Can we replicate it? Well first I shrunk my undo tablespace down to 50m and made the datafile (one datefile only) non autoextendable. Then I added another block to my table, the rows in this new block will be updated during the sleeps in the pl/sql block, I then rewrite my query to ensure that the rows are fetched from the table in this order (by using first rows and an order by on the column which is indexed). I then did a purposefully abusive loop that generated a lot of undo so that the UNDO generated by my update statement had to be replaced.

    insert /*+append*/into as_test_fetch_times
    select rownum+60, 'x' from dual connect by rownum <=60;
    commit;
    select dbms_rowid.rowid_block_number(rowid), min(column_a) min_col, max(column_a) max_col
    from   as_test_fetch_times
    group by dbms_rowid.rowid_block_number(rowid);
    DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)    MIN_COL    MAX_COL
    ------------------------------------ ---------- ----------
                                     212          1         60
                                     216         61        120 create index as_test_fetch_times_idx on as_test_fetch_times (column_a) tablespace ts1 DECLARE CURSOR quick_cursor IS SELECT column_a, column_b FROM as_test_fetch_times; nColumn_a NUMBER; sColumn_b VARCHAR2(30); BEGIN   OPEN quick_cursor;   LOOP     dbms_lock.sleep(1);     FETCH quick_cursor INTO nColumn_a, sColumn_b;     IF nColumn_a = 55 THEN       dbms_lock.sleep(600);     END IF;     EXIT WHEN quick_cursor%NOTFOUND;   END LOOP;   CLOSE quick_cursor; END; /

    This was my update

    update as_test_fetch_times
    set    column_b  = to_char(i);
    where  column_a > 60;
    commit;
    
    

    And my UNDO abusive block

    create table as_test_big as select * from all_objects;
    delete as_test_big;
    begin
    for i in 1..10
    loop
    insert into as_test_big  select * FROM all_objects ;
    commit;
    delete as_test_big;
    commit;
    end loop;
    end;
    /
    
    

    The result for my querying session (the error came back after the 600 second sleep, it only returns when Oracle goes to fetch the next block).

    ERROR at line 1:

    ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_128326981$" too small

    ORA-06512: at line 9

    And the entry in my alert log:

    Sat May 21 17:30:22 2016

    ORA-01555 caused by SQL statement below (SQL ID: bhnfbdavsd31q, Query Duration=661 sec, SCN: 0x0000.0063b604):

    SELECT sql_fulltext, elapsed_time/1000000 elapsed_time_s from v$sql where sql_id ='bhnfbdavsd31q' and rownum =1;
    
    SQL_FULLTEXT                                                                     ELAPSED_TIME_S
    -------------------------------------------------------------------------------- --------------
    SELECT COLUMN_A, COLUMN_B FROM AS_TEST_FETCH_TIMES                                      .027629
    
    
    

    Notice how the query has been reported as having a duration of 661 seconds due to the way it was fetched but v$sql reports it as taking 0.03 seconds.

    Hopefully that has convinced you of the possibility.

    Hemant K Chitale
«1345

Answers

  • Vsevolod Afanassiev
    Vsevolod Afanassiev Member Posts: 348 Bronze Badge
    edited May 20, 2016 5:25PM

    It is possible that the query did run for 91392 seconds while in manual execution it returns in 20 seconds. For example, it may contain a temporary table: when the application submits this query the temporary table is populated, when you run the same query manually the temporary table is empty so the query returns much faster.

    Regarding undo retention: when autoextend is disabled for datafiles of undo tablespace Oracle ignores parameter undo_retention and tries to keep undo as long as possible.

    Check V$UNDOSTAT:

    - Column MAXQUERYLEN shows runtime of the longest running query. There should be evidence of long-running query that exceeded 90,000 seconds. Something like: 80000, 80600, 81200, 81800, ...

    - Column TUNED_UNDORETENTION shows actual undo retention. At the time of ORA-01555 the values in MAXQUERYLEN and TUNED_UNDORETENTION should be approximately the same.

    ORA-01555 error message contains SQL_ID. Search V$SQL for this SQL_ID, it contains number of buffer gets, disk reads, cpu time, elapsed time. This should confirm that the query took long time to run.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited May 20, 2016 5:31PM
    jmft2012 wrote:
    
    Oracle 11.2.0.4.0 on Solaris 10.
    
    ORA-01555 caused by SQL statement below (SQL ID: *****, Query Duration=91392 sec, SCN: ****):
    SELECT PC.PROJ_NUMBER, PC.PROJ_NAME, PC.PROGRAM_CODE,
    ***
    ***
    
    
    UNDOTBS1 tablespace 25GB
    undo_management: AUTO
    undo_retention :126000 second
    
    The query under question, I tried it returns a 20 seconds. but 1555 error told "Query Duration=91392 sec".
    and we have undo_retention 126000 sec. the same query caused 1555 three times this week.
    
    Please comment.
    
    TIA
    

    Why have you set undo_retention to 35 hours? Can you really fit that much undo in 25 gb?

    Are users genuinely running this query and it takes 25 hours to return? OR do they start fetching rows, leave their desks, come back the next day, continue fetching data using the same cursor? I suspect the latter if it runs in 20 seconds for you.

  • jmft2012
    jmft2012 Member Posts: 371 Blue Ribbon
    edited May 20, 2016 6:05PM

    Thanks for the input!

    I got the query in question from alert log

    looked into the query of objects there are two tables involved, no views , check whether any temporary table involved with

    select TEMPORARY from dba_tables where table_name=***

    Got"N" for no temporary table.

    Check V$UNDOSTAT:

    select  MAXQUERYLEN from  V$UNDOSTAT where MAXQUERYLEN=91392;

    no rows selected

    91392 was the ORA-1555's

    select max(MAXQUERYLEN) from  V$UNDOSTAT;

    MAX(MAXQUERYLEN)

    ----------------

               91023

    select TUNED_UNDORETENTION from  V$UNDOSTAT where MAXQUERYLEN=91023;

    TUNED_UNDORETENTION

    -------------------

                  26375

    this is much less than parameter undo_retention 126000

    Check on V$SQL

    select CPU_TIME,ELAPSED_TIME,DISK_READS,BUFFER_GETS from v$sql where SQL_ID='<1555 query>';

      CPU_TIME ELAPSED_TIME DISK_READS BUFFER_GETS

    ---------- ------------ ---------- -----------

      14535777     14537734     267717      314566

    the time in micro in 1/1000000 sec

    it said the query complete 14 sec, just like i run it manually.

  • jmft2012
    jmft2012 Member Posts: 371 Blue Ribbon
    edited May 20, 2016 6:14PM

    forgot to state that the autoextent is on for the undo datafile.

  • jgarry
    jgarry Member Posts: 13,842
    edited May 20, 2016 6:46PM

    How big is the file?  How big can it get?

    Here's a quote from asktom:

    "The 1555 is fired when a READ to the database cannot find the undo needed to put a block it wants

    back the way it was when that statement began. EVERY modification that happened from the time the

    statement was opened contributes to that -- EVERY one."

    So perhaps the "three times this week" comes because someone runs it, or something else, that uses up undo?

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited May 20, 2016 6:41PM
    jmft2012 wrote:
    
    Oracle 11.2.0.4.0 on Solaris 10.
    
    ORA-01555 caused by SQL statement below (SQL ID: *****, Query Duration=91392 sec, SCN: ****):
    SELECT PC.PROJ_NUMBER, PC.PROJ_NAME, PC.PROGRAM_CODE,
    ***
    ***
    
    
    UNDOTBS1 tablespace 25GB
    undo_management: AUTO
    undo_retention :126000 second
    
    The query under question, I tried it returns a 20 seconds. but 1555 error told "Query Duration=91392 sec".
    and we have undo_retention 126000 sec. the same query caused 1555 three times this week.
    
    Please comment.
    
    TIA
    

    The SQL (generally a long running SELECT) that reports the ORA-01555 error is the victim; not the culprit.

    The problem is that some session is doing DML against the same table as the long running SELECT & does frequent COMMIT; usually inside a LOOP.

    If the COMMIT is inside a LOOP, the fix is to move the COMMIT after END LOOP.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited May 20, 2016 6:42PM

    Undo_retention is just a target unless you guarantee it, Oracle will attempt to keep undo for 35 hours if there is space in the undo tablespace (you say autoextend is on for undo datafile, what is the max size? how many data files?).

    If DML is attempted and there is no space to store the UNDO or if the UNDO tablespace can't extend further then it will either clear out old UNDO (so you run into ora-1555 should you actually try to use it) or it will prevent the DML from occurring (which will cause an error for the users).

    I don't believe you need your undo retention that large (and it seems you are not able to store that much undo anyway). Your problem is that these cursors are being executed and fetched from over large periods of time. You need to consider what is calling these SQLs and how they are being used, then you need to fix the underlying problem, is it an application issue or is it a user issue?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,850 Gold Crown
    edited May 21, 2016 4:48AM

    A different set of values used in the predicate or a change in statistics between the original run and your re-run could explain the change in run-time.

    You can check the latter by checking the last_analyzed_time on the objects involved.

    You could get some corroboration for the former by checking the AWR reports (if you're licenced).  If the query actually ran and was working for 91,000 seconds then it would have to show up somewhere in the AWR report "SQL ordered by" for most of the reports across that time window - check the reports hour by hour.  If you can't see the statement you could still try to pull the statement from the AWR with a call to dbms_xplan.display_awr() to see what that gives you; and if nothing else the normal AWR reports would give you some idea of the "undo records applied" statistics across the time - for the ORA-01555 to appear there must have been a LOT.  There are three statistics that record this information.

    Regards

    Jonathan Lewis

    P.S.  An idle thought - running parallel to Andrew's comment about a quick statement with a long delay between the first and last fetches.  If someone "set transaction read only" (or serializable) and forgot about it, they would eventually end up getting ORA-01555 errors. It's POSSIBLE (but I haven't tested it) that the alter log would report the time the transaction went read only, rather than the time the query started. If that's the case (and I'll leave it to someone else to investigate) then you could have a 20 second query reporting a 91,000 second problem.

  • jmft2012
    jmft2012 Member Posts: 371 Blue Ribbon
    edited May 21, 2016 10:06AM

    "

    A different set of values used in the predicate or a change in statistics between the original run and your re-run could explain the change in run-time.

    You can check the latter by checking the last_analyzed_time on the objects involved.

    "

    I doubt. it does not use bind variable, i think it was coded in an application and i will ask the development. The data should not have been changed over a week.

    it has occurred 3 times last week and i tried it manual each time , it ran less than 20 sec each time.

    As far as whether it shown in AWR, I ran the report for respective time window 25 hours (91392) and it sql_id had not been in. This should be consistent with it was shown in v$sql for the sql_id, which ran 14 sec. because of a such short time , it would not be in AWR's which shown the order of the top  resource used.

    The theory of "set transaction read only" is interesting. I will see if there was relevant logged in alert log and so on. what would be the msg like?

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited May 21, 2016 10:11AM
    jmft2012 wrote:
    
    "
    A different set of values used in the predicate or a change in statistics between the original run and your re-run could explain the change in run-time.
    You can check the latter by checking the last_analyzed_time on the objects involved.
    "
    I doubt. it does not use bind variable, i think it was coded in an application and i will ask the development. The data should not have been changed over a week.
    it has occurred 3 times last week and i tried it manual each time , it ran less than 20 sec each time.
    
    As far as whether it shown in AWR, I ran the report for respective time window 25 hours (91392) and it sql_id had not been in. This should be consistent with it was shown in v$sql for the sql_id, which ran 14 sec. because of a such short time , it would not be in AWR's which shown the order of the top  resource used.
    
    The theory of "set transaction read only" is interesting. I will see if there was relevant logged in alert log and so on. what would be the msg like?
    

    Have you gathered any information from the actual users? What are they actually doing? How do they call this SQL? Do they fetch all rows at once, or can they (as I am guessing) fetch one row, leave their desks for the day, continue fetching from the same cursor the next day?

This discussion has been closed.