9 Replies Latest reply: Dec 22, 2009 11:09 AM by 674194

    Committed transaction durability and DB_REP_HANDLE_DEAD

    674194
      From the BDB 4.8 manual "Dead replication handles happen whenever a replication election results in a previously committed transaction becoming invalid. This is an error scenario caused by a new master having a slightly older version of the data than the original master and so all replicas must modify their database(s) to reflect that of the new master. In this situation, some number of previously committed transactions may have to be unrolled."

      In the application I am working on, I can't afford to have committed transactions "unrolled". Suppose I set my application to commit transactions only when a majority of electable peers acknowledges the transaction and stop the application on a DB_EVENT_REP_PERM_FAILED event. Will that guarantee the durability of committed transactions and (equivalently) guarantee that no replica will ever see DB_REP_HANDLE_DEAD error (assuming absence of bugs)?

      Also as I understand it DB_REP_HANDLE_DEAD errors should never be seen on the current master, is this correct? Is there a way to register a callback with the Replication Manager

      -Sanjit
        • 1. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
          524722
          Yes, that should protect you, in particular as long as you set the priority
          of the site early on and don't change it. Elections are always based on a site
          having the latest LSN/log records. However, having sites with 0 priority muddies
          the water, and those sites could still get HANDLE_DEAD. If all sites have non-zero
          priority and the same ack policy then I think what you described will work.

          Do you intend on having all sites electable?

          Sue LoVerso
          Oracle
          • 2. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
            674194
            Yes, I intend on having all sites be electable. Also, as far as I am concerned all sites have equal priority, so if I set them all to the same priority (say 1), never change the priority, and all replicas have a commit policy of waiting for acks from a quorum of electable peers, would that be enough to avoid the HANDLE_DEAD case? Also, what error would I see if I tried to do a write on a replica that is no longer the master?


            -Sanjit
            • 3. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
              524722
              I think it is important to separate the txn commit guarantees from the
              HANDLE_DEAD error return. What you are describing mitigates the
              chance of getting that error, but you can never eliminate it 100%.

              Your app description for your group (all electable, quorum ACKs)
              uses the best scenario for providing the guarantees for txn commit.
              Of course the caveats still remain that you run a risk if you use TXN_NOSYNC
              and if you have total group failure and things in memory are lost.

              Also, it is important to separate making a txn guarantee at the master site
              with getting the HANDLE_DEAD return value at a client site. The
              client can get that error even with all these safeguards in place.

              But, let's assume you have a running group, as you described, and
              you have only the occasional failure of a single site. I will describe
              at least 2 ways a client can get HANDLE_DEAD while your txn integrity
              is still maintained.

              Both examples assume a group of 5 sites, call them A, B, C, D, E
              and site A is the master. You have all sites electable and quorum
              policy.

              In the first example, site E is slower and more remote than the other 4
              sites. So, when A commits a txn, sites B, C, and D quickly apply that
              txn and send an ack. They meet the quorum policy and processing
              on A continues. Meanwhile, E is slow and slowly gets further and
              further behind the rest of the group. At some point, the master runs
              log_archive and removes most of its log files because it has sufficient
              checkpoint history. Then, site E requests a log record from the master
              that is now archived. The master sends a message to E saying it has
              to perform an internal initialization because it is impossible to
              provide that old log record. Site E performs this initialization (under the
              covers and not directly involving the application) but any
              DB handles that were open prior to the initialization will now get
              HANDLE_DEAD because the state of the world has changed and
              they need to be closed and reopened.

              Technically, no txns were lost, the group has still maintained its
              txn integrity because all the other sites have all the txns. But E cannot
              know what may or may not exist as a result of this initialization so
              it must return HANDLE_DEAD.
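The catch-up decision in this first example can be modelled in a few lines. This is a toy sketch, not Berkeley DB code: the function `catch_up_action` and the `(file, offset)` tuples standing in for LSNs are assumptions made purely for illustration.

```python
def catch_up_action(client_next_lsn, master_oldest_lsn):
    """Toy model: decide how a lagging client catches up.

    LSNs are modelled as (file, offset) tuples, which Python compares
    lexicographically. Both arguments are hypothetical inputs.
    """
    if client_next_lsn >= master_oldest_lsn:
        # The master still has the record: the client replays the log.
        return "replay_log"
    # The record was archived: the client must reinitialize, and handles
    # opened before that point would then report HANDLE_DEAD.
    return "internal_init"


# Site E needs a record in log file 3, but the master kept files 7 and up.
print(catch_up_action((3, 120), (7, 0)))   # internal_init
print(catch_up_action((7, 512), (7, 0)))   # replay_log
```

In this model no transaction is lost in either branch; the only difference is whether handles opened before the catch-up remain usable.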

              In the second example, consider that a network partition has happened
              that leaves A and B running on one side, and C, D, and E on the other.
              A commits a txn. B receives the txn and applies it, and sends an ack.
              Site A never hears from C, D, E and quorum is not met and PERM_FAILED
              is returned. In the meantime, C, D, and E notice that they no longer can
              communicate with the master and hold an election. Since they have a
              majority of the sites, they elect one, say C to be a new master. Now,
              since A received PERM_FAILED, it stops. If the network partition
              is resolved, B will find the new master C. However, B still has the
              txn that was not sufficiently ack'ed. So, when B syncs up with C, it
              will unroll that txn. And then HANDLE_DEAD will be returned on B.

              In this case, the unrolled txn was never confirmed as durable by A to
              any application, but B can get the HANDLE_DEAD return. Again, B
              should close and reopen the database.
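The arithmetic behind both examples can be written out as a small sketch. It assumes that a quorum means a simple majority of the 5 sites and that the master counts as one copy of the txn; the thread does not spell out exactly how Berkeley DB counts acks, so treat the helper names and the counting rule as illustrative assumptions.

```python
def majority(group_size):
    """Smallest number of sites that is more than half the group."""
    return group_size // 2 + 1


def commit_has_quorum(group_size, peer_acks):
    # Assumption: the master itself counts as one copy of the txn.
    return 1 + peer_acks >= majority(group_size)


def can_elect(group_size, reachable_sites):
    # Assumption: an election needs a majority of the whole group.
    return reachable_sites >= majority(group_size)


GROUP = 5  # sites A, B, C, D, E

# First example: B, C and D ack quickly, E lags behind.
print(commit_has_quorum(GROUP, 3))  # True

# Second example: A only hears from B, so the commit gets PERM_FAILED.
print(commit_has_quorum(GROUP, 1))  # False

# C, D and E form a majority and may elect a new master; A and B may not.
print(can_elect(GROUP, 3))          # True
print(can_elect(GROUP, 2))          # False
```

Under these assumptions the txn that B later unrolls is exactly one for which `commit_has_quorum` was false, which is why A never reported it as durable.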

              I think what you are describing provides the best guarantees,
              but I don't think you can eliminate the possibility of getting that error
              return on a client. But you can know about your txn durability on the
              master.

              You might also consider master leases. You can find a description of
              them in the Reference Guide. Leases provide additional guarantees
              for replication.

              Sue LoVerso
              Oracle
              • 4. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                674194
                Hi Sue,

                Thanks for the details.

                My application does not use TXN_NOSYNC. Also, I intend to set up my application so that the master process exits if it does not get acks from the quorum within the timeout period (by registering for DB_EVENT_REP_PERM_FAILED events). The replicas exist purely for fault tolerance, so all reads and writes will be served by the master. Failures are not expected very often, so under these circumstances I'm guessing it would be very rare to see the HANDLE_DEAD case, and this could be further avoided by simply reopening the handles on all sites after a new election. As I understand it, I can't reopen the handles in the callback function directly and need to do this in a separate thread. Is that right?

                Thanks,
                Sanjit
                • 5. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                  524761
                  Hi Sanjit,

                  Yes, it is true that you must refrain from invoking any Berkeley DB
                  functions from within a callback function.
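One common way to honor that restriction is to have the callback do nothing except record the event, and let a separate thread react to it. The sketch below models that hand-off with a queue. It does not use the Berkeley DB API: `event_callback`, `worker`, and the event strings are hypothetical stand-ins for whatever the application registers as its event notification function.

```python
import queue
import threading

events = queue.Queue()
handled = []


def event_callback(event_name):
    # Runs on the library's thread in a real application: record and return,
    # without calling back into the database library.
    events.put(event_name)


def worker():
    # Runs on an application thread, where reopening handles would be allowed.
    while True:
        event = events.get()
        if event is None:          # sentinel: shut the worker down
            break
        handled.append(event)      # placeholder for close/reopen logic


thread = threading.Thread(target=worker)
thread.start()

event_callback("DB_EVENT_REP_NEWMASTER")
event_callback("DB_EVENT_REP_PERM_FAILED")
events.put(None)
thread.join()

print(handled)
```

The queue preserves the order in which events arrived, so the worker sees them in the same sequence the callback did.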

                  Even in the scenario that you outline, certainly Sue's first example
                  still applies. So it would still be possible to see HANDLE_DEAD.

                  The recommended behavior is for applications to check for the
                  HANDLE_DEAD return from each operation. When you get the HANDLE_DEAD
                  error return, all you have to do is close the DB handle, and then
                  re-open it and you should be able to retry the operation.
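The close, reopen, retry sequence described above can be sketched as follows. Everything here is a stand-in: `HandleDead` models the HANDLE_DEAD error return, `FakeDb` models a DB handle, and `open_handle` is a hypothetical factory for whatever the application does to open the database again.

```python
class HandleDead(Exception):
    """Stand-in for the HANDLE_DEAD error return."""


class FakeDb:
    """Hypothetical handle that is either usable or dead."""

    def __init__(self, dead):
        self.dead = dead
        self.closed = False

    def get(self, key):
        if self.dead:
            raise HandleDead()
        return "value-for-" + key

    def close(self):
        self.closed = True


def get_with_reopen(handle, open_handle, key, max_attempts=3):
    """Try a read; on HandleDead, close the handle, reopen it and retry.

    Returns (handle, value) so the caller keeps using the fresh handle.
    """
    for _ in range(max_attempts):
        try:
            return handle, handle.get(key)
        except HandleDead:
            handle.close()
            handle = open_handle()
    raise HandleDead("still dead after %d attempts" % max_attempts)


old = FakeDb(dead=True)
new, value = get_with_reopen(old, lambda: FakeDb(dead=False), "k")
print(old.closed, new is not old, value)
```

Returning the new handle alongside the value is a design choice of this sketch, so that the caller never keeps a reference to the closed handle.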

                  Alan Bram
                  Oracle
                  • 6. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                    674194
                    In my application I have multiple threads sharing the same DB handle.
                    Wouldn't doing a close and then an open on the DB handle introduce a race condition? I'd rather not have to use my own mutex to protect all DB handle access to avoid such a case. Assuming that close and open are individually atomic, what error would I see if another thread tried to access the DB using a closed handle? I suppose I could treat both these errors in the same way that I currently treat DB_LOCK_DEADLOCK, which is to wait a random amount of time and retry the operation.

                    Btw is there a way to "atomically" reopen (do close+open in a single atomic step) the handles?

                    -Sanjit
                    • 7. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                      524722
                      No, there is no "reopen" API.

                      You are correct that closing and opening a DB handle introduces the
                      race condition. I suspect most multi-threaded apps have each thread open
                      up its own DB handle, while sharing the DB_ENV handle, for the reason
                      you stated. If you have a pool of worker threads then you would only need
                      to open when the thread starts and close when it exits or when it would
                      get HANDLE_DEAD, which should be rare, but possible. That way each
                      thread could just operate on its own DB handle without conflict.
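The one-handle-per-thread arrangement can be sketched with thread-local storage. This is a model only: `FakeEnv` stands in for the shared environment handle, and `my_handle` and `drop_my_handle` are hypothetical helper names.

```python
import threading


class FakeEnv:
    """Hypothetical stand-in for the one environment handle all threads share."""

    def __init__(self):
        self._lock = threading.Lock()
        self.opened = 0

    def open_db(self):
        # Each call models opening a separate DB handle in the shared env.
        with self._lock:
            self.opened += 1
            return {"handle_id": self.opened}


env = FakeEnv()
_local = threading.local()


def my_handle():
    """Return this thread's private handle, opening it on first use."""
    if not hasattr(_local, "db"):
        _local.db = env.open_db()
    return _local.db


def drop_my_handle():
    """Forget this thread's handle, e.g. after it reported HANDLE_DEAD.

    A real application would close the handle here before dropping it.
    """
    if hasattr(_local, "db"):
        del _local.db


def worker(results, index):
    first = my_handle()
    second = my_handle()          # same object, no second open
    drop_my_handle()              # pretend the handle went dead
    third = my_handle()           # reopened, a different handle
    results[index] = (first is second, first is not third)


results = [None] * 3
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
print(env.opened)
```

Because no handle is visible to more than one thread, a thread can close and reopen its own handle without coordinating with the others.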

                      Otherwise, if you ultimately choose to share a handle, you also have the
                      responsibility to protect its access and modification as needed. A mutex
                      would work or a read/write locking mechanism, where any change in the
                      dbp itself (close/open) would need the write lock and the vast majority of
                      operations (get, put, del, etc) would just need the read lock.
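A minimal sketch of that read/write locking scheme follows. Python's standard library has no reader-writer lock, so the sketch builds a simple one from a condition variable; `ReadWriteLock`, `FakeHandle`, and `SharedHandle` are all hypothetical names, and the lock makes no attempt to prevent writer starvation.

```python
import threading


class ReadWriteLock:
    """Simple reader-writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class FakeHandle:
    """Hypothetical stand-in for an open DB handle."""

    def __init__(self, generation):
        self.generation = generation
        self.closed = False

    def get(self, key):
        if self.closed:
            raise RuntimeError("handle used after close")
        return (self.generation, key)

    def close(self):
        self.closed = True


class SharedHandle:
    """One handle shared by many threads, guarded by the lock above."""

    def __init__(self):
        self._lock = ReadWriteLock()
        self._handle = FakeHandle(0)

    def get(self, key):
        self._lock.acquire_read()       # ordinary operations: read lock
        try:
            return self._handle.get(key)
        finally:
            self._lock.release_read()

    def reopen(self):
        self._lock.acquire_write()      # close/open: exclusive write lock
        try:
            old = self._handle
            old.close()
            self._handle = FakeHandle(old.generation + 1)
        finally:
            self._lock.release_write()


shared = SharedHandle()
print(shared.get("k"))
shared.reopen()
print(shared.get("k"))
```

Since `reopen` swaps in the new handle before releasing the write lock, no reader in this model can ever observe the closed handle.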

                      If you access the DB handle after a close the results are undefined. You
                      are accessing freed memory. It could potentially be reallocated elsewhere,
                      or reinitialized. BDB itself, on close, ultimately calls free() and puts the
                      handle back on the memory heap and any call to malloc() within your
                      process could reallocate the memory that was the handle.

                      Sue LoVerso
                      Oracle
                      • 8. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                        524722
                        I just reread this piece more carefully. The HANDLE_DEAD return value can
                        only be returned on client sites. The master will not get that return value. You
                        state that all read and write operations will be happening on the master. Do you
                        use your clients at all for DB operations?

                        Sue LoVerso
                        Oracle
                        • 9. Re: Committed transaction durability and DB_REP_HANDLE_DEAD
                          674194
                          Thanks for the suggestions, Sue. I use clients only for replication, not for any application read/writes.

                          In addition, for now it is sufficient that no data is lost, and it's acceptable if the application shuts down and all replication sites are manually restarted (after copying an archive from a replica to a fresh machine if required). Given this, I think I can ignore the HANDLE_DEAD case for now.

                          Eventually, I will follow your suggestion and rewrite my code so that threads don't share the DB handle and instead only share the DBEnv. Then I can deal with the HANDLE_DEAD case as it arises on a site which switches from client to master (clients will continue to be used only for replication/durability).

                          -Sanjit