12 Replies Latest reply: Sep 6, 2013 11:51 AM by jgarry RSS

    Diagnosing ORA-01555: what table had the unreconstructable block?

    mtefft

      We have some very long-running batch processes (i.e. > 20 hours for a single INSERT statement) and we sometimes get  ORA-01555 (snapshot too old) failures. Our undo_retention is 10800 seconds=3 hours.

       

      We have discussed (with the development team) the combination of events that are needed for this situation to arise. They are unable to think of a situation where any of the tables (in today's incident, there are 7 of them) would have been updated after this job started.

       

      Is there a method for analyzing the trace file to determine which table had the block that could not be reconstructed due to overwritten UNDO? This might give us ideas for how to avoid this via operational changes (i.e. changing the job schedule).

       

      Of course, we are also pursuing tuning efforts for this query.

       

      Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi

      PL/SQL Release 10.2.0.5.0 - Production

      CORE 10.2.0.5.0 Production

      TNS for Solaris: Version 10.2.0.5.0 - Production

      NLSRTL Version 10.2.0.5.0 - Production

       

      Thanks,

      Mike

        • 1. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
          sb92075

          does any COMMIT exist within  LOOP?

          • 2. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
            jgarry

            How big is your undo?  Why don't you have a retention time as long as the process?  Do you use a guarantee?  Is it possible a transaction started by someone else before the batch started was left open?  Does the batch insert or update anything?  Have you considered the undo might be about a related oracle internal rather than the table itself?  What exactly does the alert log say?  Where can I go to get organic...

            • 3. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
              mtefft

              Not within this process. It is one big INSERT /*+ APPEND */. There is a function call in the INSERT , that has a SELECT but no COMMIT.

               

              I don't know if there are other processes running during this time that did any update/insert/delete - that is what I was hoping to pursue with this avenue. If I had a specific table to track down, it would be easier.

              • 4. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                sb92075

                >Not within this process. It is one big INSERT /*+ APPEND */. There is a function call in the INSERT , that has a SELECT but no COMMIT.


                Does the big INSERT contain SELECT clause to provide the data/rows to the INSERT?


                any table from which SELECT operates against can throw ORA-01555 when other session does DML with COMMIT against the same table.

                the COMMIT informs the DB that the UNDO is no longer required & therefore is subject to being overwritten.

                 

                • 5. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                  mtefft

                  The alert log says:

                  Thu Sep 05 06:10:54 EDT 2013

                  ORA-01555 caused by SQL statement below (SQL ID: dxbgc634kbz74, Query Duration=82905 sec, SCN: 0x0986.73cba867):

                  Thu Sep 05 06:10:54 EDT 2013

                  insert /*+ APPEND */ into ODI_WORK.I$_EIN (  ...

                   

                  and the client message was

                  ORA-01555: snapshot too old: rollback segment number 51 with name "_SYSSMU51$" too small

                   

                  UNDO is approx 620 GB.

                  Why isn't our UNDO sized for 20 hours? or 50 hours? We want the developers to tune their stuff, or break it down, so it doesn't run so ridiculously long.

                   

                  > Is it possible a transaction started by someone else before the batch started was left open?

                  I don't know, if I knew the table it would help the investigation.

                  > Does the batch insert or update anything?  Just this one table, which the session created just before the insert started.

                  > Have you considered the undo might be about a related oracle internal rather than the table itself?

                  Seriously, is that a risk?

                   

                  Following MOS doc 467872.1 I can conclude that it is a rollback segment, not LOB retention. Not much else.

                  • 6. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                    mtefft

                    Yes, I understand the scenario that leads to ORA-01555. That is why I said that I was trying to discuss with the developers but 'they are unable to think of a situation where any of the tables (in today's incident, there are 7 of them) would have been updated after this job started.'

                     

                    There are 7 tables involved in teh SELECT query that is driving this INSERT. I want to know which of them was the source of the error.

                    • 7. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                      sb92075

                      >There are 7 tables involved in teh SELECT query that is driving this INSERT. I want to know which of them was the source of the error


                      AUDIT DML against all 7 of those tables.

                      • 8. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                        sb92075

                        mtefft wrote:

                         

                        Yes, I understand the scenario that leads to ORA-01555. That is why I said that I was trying to discuss with the developers but 'they are unable to think of a situation where any of the tables (in today's incident, there are 7 of them) would have been updated after this job started.'

                         

                        There are 7 tables involved in teh SELECT query that is driving this INSERT. I want to know which of them was the source of the error.

                         

                        If you are desperate for an answer, you could use DBMS_LOGMNR to look for DML against any of the 7 tables involved in providing data to the INSERT.

                        • 9. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                          rp0428

                           

                          There are 7 tables involved in teh SELECT query that is driving this INSERT. I want to know which of them was the source of the error.

                          Why not just prevent the problem from occuring in the future?

                           

                          Just put the tables into READ ONLY mode when the batch process starts and that exception can't occur.

                           

                          An 'ALTER TABLE myTable READ ONLY' will succeed if no one is doing DML on the table and will fail if they are. You don't even need to ask the developers what their opinion is.

                          • 10. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                            mtefft

                            Unfortunately ALTER TABLE mytable READ ONLY is an 11g capability...but we may find a way to approximate this.

                            • 11. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                              rp0428

                              Well you could still check to see if there are any uncommitted transactions for those tables:

                               

                              Query the DBA_DML_LOCKS view (and/or DDL view)

                              http://docs.oracle.com/cd/E11882_01/server.112/e25513/statviews_3157.htm

                              select * from DBA_DML_LOCKS

                               

                              SESSION_ID,OWNER,NAME,MODE_HELD,MODE_REQUESTED,LAST_CONVERT,BLOCKING_OTHERS

                              10,SCOTT,EMP_READ_ONLY,Row-X (SX),None,1163,Not Blocking

                              69,SCOTT,EMP_READ_ONLY,Row-X (SX),None,1648,Not Blocking

                              That shows that two session are holding exclusive row lock on the EMP_READ_ONLY TABLE. As a test I inserted a row from two different sessions but did NOT commit it yet.

                               

                              The DDL version of the view can be used in addition.

                               

                              See my reply Sep 4, 2013 12:23 AM  in this thread from yesterday for more info.

                              https://forums.oracle.com/message/11174417#11174417

                               

                              At least in 10g you can tell before you start if there are any issues to resolve. If there are no issues then once you start your query there won't be any issues that could cause the snapshot problem since your query will be first in line.

                              • 12. Re: Diagnosing ORA-01555: what table had the unreconstructable block?
                                jgarry

                                Hopefully by now you understand that the "caused by" in the alert log message is really a misnomer - it should say something like "thrown by," the insert is really a victim of something else (assuming you don't do the commit in loop sb asked about).  I don't see a reference in your MOS note to some of the older standard possibilities, such as delayed block cleanout, see http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:275215756923

                                 

                                So, it could be that some huge table (or seven) has a bunch of blocks getting updated before your huge process even starts, only to be blown off at the knees through no fault of the process.