Forum Stats

  • 3,816,521 Users
  • 2,259,200 Discussions
  • 7,893,502 Comments

Discussions

Issues with "in-doubt" distributed transactions

Weis
Weis Member Posts: 11
edited Jul 11, 2014 11:12AM in SQL & PL/SQL

Hello all,

I could use some advice with the following problem. Hopefully you're able to help me a bit in the right direction.

On several remote locations (each having his own unique instance name, but same configuration/applications) we keep seeing the same behavior. Each location has two instances running on the same database server. Application A is running in Instance 'PROD1' and Application B in instance 'PROD2'. There is a Database Link between 'PROD1' and 'PROD2'.

Although this setup has runned fine for years, since this year occasionaly errors keep coming up in the alert file:

"ORA-24756: transaction does not exist"

Via the trace files we see that these relate to records in the 'dba_2pc_pending' View. In this View we indeed find transactions that are executed via the Database Link. All records have the State 'collecting'. Both in Instance 'PROD1' and 'PROD2'. I could purge them via the command:

execute dbms_transaction.purge_lost_db_entry ('[LOCAL_TRAN_ID]')

That however will not solve our issue in a structural manner, as the FAIL_TIME shows that recently (actually yesterday) more of these "in-doubt" transactions occured. Currently I have no clue about why this happened. If it was a network related issue, I would not expect that it happened at completely seperated environments. There are no signs of that an Oracle process has crashed, or there was any interference. Also it concerns two local instances, running on the same database server. Don't see (yet) what can cause that a distributed transaction would fail.

It would help if I was able to identify what the statements actually where. My question is whether it is possible to use the LOCAL_TRAN_ID, or the GLOBAL_TRAN_ID, to fetch the SQL Text in some way. I don't know whether these values can be related to other available Data Dictionary Views where the SQL Text is stored? Or perhaps you have a suggestion what could be a better approach to determine what is going on. If I need to provide more information, please let me know.

FYI: we use Oracle Database 11.2 on Windows Server 2008 R2 SP1 servers.

Thanks for your replies/suggestions in advance.

Regards,

Weis

Tagged:

Answers

  • Zlatko Sirotic
    Zlatko Sirotic Member Posts: 2,134 Gold Trophy
    edited Jul 8, 2014 12:06PM

    So:

    - you use Oracle-Controlled Distributed Transactions (not XA Based Distributed Transactions ?)

    - you have database server(s) with two databases on each server (without RAC, I guess)

    - local transactions in both databases ends with state 'collecting' (this state usually have databases that are global or local coordinators).

    Are you sure that the distributed transaction does not include a third database, which is located on a remote server?

    Regards

  • Weis
    Weis Member Posts: 11

    Hello Zlatko,

    Thanks for your reply.

    - Yes, this concerns Oracle-Controlled Distributed Transactions.

    - Correct, on each remote location it concerns one database server with two databases ('PROD1', 'PROD2'). We don't use RAC.

    - I'm sure that these are not transactions to another remote database. We see similar occurences in 'dba_2pc_pending' in 'PROD1' and 'PROD2' (with same GLOBAL_TRAN_ID).

    Will also look a bit more into the documentation, don't know exactly how local- and global coordinators work.

    In the meantime, if you have some suggestions about anything I can check, please let me know.

    Regards,

    Weis

  • Zlatko Sirotic
    Zlatko Sirotic Member Posts: 2,134 Gold Trophy
    edited Jul 9, 2014 8:56AM

    > It would help if I was able to identify what the statements actually where.

    See MOSC Note 1327159.1:

    How to check the user and the SQL text (DML) of a distributed transaction found in dba_2pc_pending

    Regards

    Zlatko Sirotic
  • Weis
    Weis Member Posts: 11
    edited Jul 10, 2014 5:03AM

    Hi again,

    Well, I tried the query:

    SELECT a.sql_text
         , s.osuser
         , s.username
      FROM v$transaction t
         , v$session s
         , v$sqlarea a 
     WHERE s.taddr = t.addr
       AND a.address = s.prev_sql_addr
       AND t.xidusn = [LOCAL_TRAN_ID_1]
       AND t.xidslot = [LOCAL_TRAN_ID_2]
       AND t.xidsqn = [LOCAL_TRAN_ID_3]
    ;
    

    However, this would only help if I would be able to catch the issue on the spot. As it are all transactions from days ago until somewhere early this year, I can't identify the statements via v$transaction and v$sqlarea.

    Is there also a way to relate the LOCAL_TRAN_ID to Views/Tables from for example Statspack or AWR (did not find usable documents via Metalink yet)?

    For now I'm going to purge the transactions. They have the State 'collecting', so I assume that this can't lead to any inconsistency. As it hasn't been able to prepare the statements, I don't think anything has been executed so far. Please let me know if you think these assumptions are wrong. The errors keep coming up in the alert files, which makes it more difficult to keep overview.

    Next steps are trying to reproduce this behavior and monitor the environments a bit more closely, so I can check things immediately on the moment this happens.

    Regards,

    Weis

  • Zlatko Sirotic
    Zlatko Sirotic Member Posts: 2,134 Gold Trophy
    edited Jul 10, 2014 8:59AM

    > For now I'm going to purge the transactions.

    It is not necessary to immediately delete the records in dba_2pc_pending.

    I think that they do not interfere with the normal operation of the database.

    > They have the State 'collecting', so I assume that this can't lead to any inconsistency.

    You're right, there should not be a problem when the state is 'collecting' in both databases.

    We assume that a distributed transaction works on only two databases (located on the same server).


    Let's try to imagine a scenario how distributed transaction work in your case.

    Assume the following:

    1. Database PROD1 gives:

    UPDATE local_table ...

    UPDATE [email protected] ...

    COMMIT

    2. Database PROD1 is a (global) Coordinator

    3. Database PROD1 is a Commit Point Site

    (if the parameter COMMIT_POINT_STRENGTH in the database PROD1 is greater than or equal to this parameter in the database PROD2)

    4. Global Coordinator (= PROD1) sends a message to the database PROD2 for preparation

    (local transaction in the database PROD1 has the state 'collecting')

    5. The database PROD2 not prepared (the state of the local transaction is 'collecting', not 'prepared')

    Of course, we do not know why the database PROD2 not prepared.

    I'll try to speculate something: perhaps the problem is the value of parameters CONNECT_TIME / IDLE_TIME within PROD2 database.

    Regards

  • Weis
    Weis Member Posts: 11
    Zlatko Sirotic wrote:
    
    > For now I'm going to purge the transactions.
    It is not necessary to immediately delete the records in dba_2pc_pending.
    I think that they do not interfere with the normal operation of the database.
    

    The reason I wanted to get rid of these records was that the alert logs kept repeating this error ("ORA-24756: transaction does not exist"). This kind of makes it more difficult to keep track of the really urgent issues. So on one of the remote locations I purged the transactions. If this does not cause further problems I'll do the same in the other locations. And then just monitor whether/when it will happen again. This has been very infrequent, so not sure yet what could be that 'external factor' then. I don't think the 'COMMIT_POINT_STRENGTH' or Connect Time / Idle Time play a role here.

    Thanks and have a nice weekend,

    Weis

  • Zlatko Sirotic
    Zlatko Sirotic Member Posts: 2,134 Gold Trophy
    edited Jul 11, 2014 11:12AM

    Of course, COMMIT_POINT_STRENGTH can not be the cause of your problem.

    The node in the transaction with the highest value for COMMIT_POINT_STRENGTH will be the commit point site, and commit point site never enters the state 'prepared'.

    I mentioned COMMIT_POINT_STRENGTH only to depict one possible sequence of events.

    Have a nice weekend.


    Regards

This discussion has been closed.