1 Reply Latest reply on Apr 13, 2011 1:27 PM by 524722

    Open database transactions during election causing error.

      I've been tracking an issue in my replicated database application. At first I thought it was a generic database issue and opened a thread in the Berkeley DB forum here: DBCursor reports "Transaction that opened the DB is still active".

      As I dig deeper, the problem appears to be some sort of interaction between my application threads and the repmgr election thread. I have not yet been able to produce a minimal test case to verify this, but what I have seen is the following:

      1. I start up a threaded, transactional database A, starting the repmgr with DB_REP_ELECTION
      2. I start up a threaded, transactional database B, also with DB_REP_ELECTION
      3. A is MASTER, B is CLIENT
      4. I then kill A, forcing B to become MASTER
      5. I then issue a number of queries against B.
      Following these steps, I can eventually get database B to fail in a call to __db_check_txn; the error string is "Transaction that opened the DB is still active".

      Although my application is multi-threaded, I've reduced it to run with only a single thread. Furthermore, the value of dbp->cur_locker->tid is the same thread id as the thread issuing the DB->get that causes the error.

      I have enough debug logging on to know that the election thread exits before this error appears. However, the error does not seem to appear if no election ever occurs (i.e. if I only start up a single database), and it also does not appear when the MASTER database was not first a CLIENT.

      I am building against libdb.a version 5.1.25.

      The database handles are all opened with DB_AUTO_COMMIT, and the calls to DB->get are passing NULL in the txn parameter.

      Any ideas what could be causing this error?
        • 1. Re: Open database transactions during election causing error.
          Hello Chris,

          Although I have no immediate ideas about the issue, could you please turn on replication verbose messages in your application and reproduce the error with the simplest setup (i.e. 1 thread)? See the dbenv->set_verbose man page and use the DB_VERB_REPLICATION flag.

          That will generate a lot of output. You can then contact me by email using the typical form of firstname.lastname@oracle.com as I've spelled it below. Thanks.

          Sue LoVerso