12 Replies Latest reply: Jul 5, 2008 8:42 AM by 597600

    Deadlock handling for beginners

    597600
      Not being the brightest person in the universe, I've spent some time trying to figure out how to handle DB_LOCK_DEADLOCK for my application.

      It's a Web application using the PHP interface from Apache. The application is currently accessing the container in a read-only fashion. There is also an update process, which periodically accesses the container to perform updates.

      The environment is created with DB_INIT_MPOOL | DB_INIT_LOG | DB_INIT_TXN | DB_INIT_LOCK. This is a setup for transactions, as the update process uses transactions.

      During updates, another writer, or another reader, may receive DB_LOCK_DEADLOCK when trying to access data that resides on a page locked by the update.

      I do not want the web application to fail when accessing the container while an update is going on. So I have it catch the XmlException that results from the DB_LOCK_DEADLOCK, pause one second, and retry, up to ten times. (Then, I want it to fail.)

      Now if some process acquires a lock and holds it for 30 seconds, the web application will block on the request until the lock is released. It does not receive DB_LOCK_DEADLOCK until the locker releases its lock.

      I found out that by running the db_deadlock utility with -t 1, I can trigger deadlock detection, so the web application is sent DB_LOCK_DEADLOCK, which it can handle by pausing and retrying the lookup, eventually succeeding or failing.

      Could someone more experienced than I am please confirm this approach is - or is not - completely insane?

      Any suggestions as to how to handle this more professionally would be highly appreciated.

      Would the read-only web application have any benefit from opening the container using DBXML_TRANSACTIONAL in addition to DB_RDONLY?

      Note that mod_db4, which is included in the source code distribution (dbxml-2.4.13/db-4.6.21/mod_db4), is probably not an option at the moment.

      Michael Ludwig
        • 1. Re: Deadlock handling for beginners
          524395
          Michael,

          Why can't you use mod_db4? It was designed to solve exactly this issue. The process model for Apache (or any http server) is essentially to hand off work to a worker process/thread and then to interrupt that process/thread at any time for any number of reasons. That turns out to be really hard on DB for many reasons, which again is why we have mod_db4.

          -greg
          • 2. Re: Deadlock handling for beginners
            597600
            > Michael,
            >
            > Why can't you use mod_db4?

            Hi Greg,

            Thanks for taking the time to reply. Why can't I use mod_db4? The last time I tried it, there were error messages showing up in the Apache log files:

            "Transaction and database from different environments"
            Re: Transactions in Multi-Process Application with PHP

            I remember you could not rule out a bug in mod_db4 (in an email you sent me). Unfortunately, I'm still not up to dealing with this. So while I think I've understood that mod_db4 is conceptually the path to follow, given my current skill set this path won't lead me very far.

            The handle caching done by mod_db4 is certainly good, but in my app I have found environment and container opens to be pretty quick.

            As for recovery, I'll simply have to make sure the need will not arise :-)

            Michael Ludwig
            • 3. Re: Deadlock handling for beginners
              Gmfeinberg-Oracle
              Michael,

              Welcome to the brave new world of transactions!

              This topic can get complicated, which is why mod_db4 was created. Before going into details: are you using transactions because you need concurrency, or because you need recoverability from failures, or both? For example, if your update process is infrequent and you don't mind your readers holding off during updates, you can choose to use the CDS/CDB configuration (single writer, multiple readers). This is much simpler to code, but the problem with CDS is that you do not get recoverability from errors, so you need a backup/recovery strategy.

              Regards,
              George
              • 4. Re: Deadlock handling for beginners
                597600
                George, thanks for taking the time to give us a couple of helpful answers!
                > [...] are you using transactions because you need concurrency or because you need recoverability from failures, or both?
                As far as I've understood, CDS/CDB means there can be either a single writer or multiple readers. This would make the site difficult to use during updates: reading from the database would fail, and the update process might have to wait a long time to be granted a write lock.

                While at the moment this would probably work, in the future it wouldn't. So I prefer using transactions and offer concurrency, if possible.

                Recoverability is also appreciated, because otherwise there needs to be a backup process, plus a process to detect the need for restoring a backup and to carry it out. Recoverability seems better to me. But I may be wrong ...

                So, to answer your question, I guess I want both.

                In the read-only (DB_RDONLY) web app, I've set up a loop to retry an operation in case a deadlock occurs. And it works when db_deadlock is also running. If db_deadlock is not running, DB_LOCK_DEADLOCK is only returned when the conflicting locker exits; in that scenario, I have no control over how much time elapses. So db_deadlock (or an equivalent program using the API) seems to be a necessity.
                  $ts1 = microtime(true);
                  $retries = 0;
                  while (true) {
                    try {
                      $results = $mgr->query($preamble . $qry, $qc, $flags);
                      $ts2 = microtime(true);
                      // Debug timing; flip the condition to disable.
                      if (true) error_log(sprintf("%s ### time: %f", $qry, $ts2 - $ts1));
                      return $results;
                    }
                    catch (XmlException $ex) {
                      error_log(microtime(true) . " " . $ex->what());
                      $err = $ex->getDbErrno();
                      // On deadlock, back off for a second and retry, up to ten times.
                      if ($err === DB_LOCK_DEADLOCK && ++$retries < 10) {
                        error_log("Attempt " . ($retries + 1));
                        sleep(1);
                        continue;
                      }
                      error_log("Giving up.");
                      throw $ex;
                    }
                  }
                Does this look about OK? Note that I'm not using transactions for the read-only web app at the moment. Would there be a benefit? Should I reconsider?

                As for mod_db4 - I had it all set up, but the "Transaction and database from different environments" error messages discouraged me. I couldn't rule out a bug in my code, and Greg couldn't rule out a bug in mod_db4, so I gave up, not having the skill set needed to tackle this.

                Best regards,

                Michael Ludwig
                • 5. Re: Deadlock handling for beginners
                  524395
                  Michael,

                  I forget, what bug didn't I exclude from mod_db4? If there is a bug, I'll make sure we have an SR against it and we fix it in the next release. I don't recall where mod_db4 is broken. Can you explain?

                  -greg
                  • 6. Re: Deadlock handling for beginners
                    Gmfeinberg-Oracle
                    Michael,

                    CDS or transactions, you still need to handle backup in some way. With transactions you also have to manage log files (which will be created and grow forever -- so you need to manage them, usually as part of your backup strategy). Have you read the documentation on using transactions with BDB XML?
                    http://www.oracle.com/technology/documentation/berkeley-db/xml/index.html
                    I assume you have since you referenced it in another post :-)

                    A few points:
                    1. You don't need to use db_deadlock. You can set lock detection in the environment. It seems that this DbEnv interface is not exported in PHP, which would mean that you'd need to use a DB_CONFIG file to set it. That, or I can give you code to export it. The documented interface is set_lk_detect() on the DbEnv (or DB_ENV) object.
                    2. Do you expect your updater to take a really long time? If not, there's no reason to sleep in your loop. If so, why? Long-running transactions are a bad idea in general and worse when there is concurrency.
                    3. You must open your containers using DBXML_TRANSACTIONAL in order to get transactional containers (or pass a transaction to the openContainer() interface). This applies to readers as well as writers. If you do not you will eventually hang somewhere in a reader. This also means your readers need to use transactions.
                    4. Your initial open of the environment needs to handle recovery -- and it must be done in a single-threaded manner (at application/server startup). This means passing the DB_RECOVER flag. There are rules on this. This is a major part of what mod_db4 handles -- single-threaded recovery.
                    5. You will get better read/write concurrency and virtually no deadlocks at all if you use MVCC -- open your containers using DB_MULTIVERSION and use DB_TXN_SNAPSHOT on your read transactions. Your writer transactions just stay the same.
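
                    Roughly, in PHP, points 3-5 come together like this. This is only a sketch -- I'm writing from the C++ API, so verify the exact signatures the PHP binding exposes, and the path and container name are made up:

                      // Sketch only -- PHP binding assumed to mirror the C++ API.
                      $env = new Db4Env();
                      // Point 4: run recovery once, single-threaded, at startup.
                      $env->open("/path/to/dbhome",
                          DB_CREATE|DB_INIT_MPOOL|DB_INIT_LOCK|DB_INIT_LOG|DB_INIT_TXN|DB_RECOVER);
                      $mgr = new XmlManager($env, 0);
                      // Points 3 and 5: transactional, multiversion container -- for readers too.
                      $con = $mgr->openContainer("data.dbxml", DBXML_TRANSACTIONAL|DB_MULTIVERSION);
                      // Point 5: snapshot transactions on the read side.
                      $txn = $mgr->createTransaction(DB_TXN_SNAPSHOT);
                      $qc = $mgr->createQueryContext();
                      $res = $mgr->query($txn, 'collection("data.dbxml")//item', $qc);
                      $txn->commit();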

                    Test all of this well and pass along the results!

                    Regards,
                    George
                    • 7. Re: Deadlock handling for beginners
                      597600
                      Greg Burd wrote:
                      > I forget, what bug didn't I exclude from mod_db4? If there is a bug, I'll make sure we have an SR against it and we fix it in the next release. I don't recall where mod_db4 is broken. Can you explain?
                      Sorry, I can't. It was not a specific bug. There was this error message showing up, "Transaction and database from different environments". I was (and still am) clueless as to what it could mean. I just had a feeling this was going to be a bit too difficult for me. It could just as well have been my own buggy code. Anyway, it discouraged me.

                      Michael Ludwig
                      • 8. Re: Deadlock handling for beginners
                        524395
                        Michael,

                        I'll agree that there is a "bug" in the sense that the error message isn't all that helpful. :) Also, mod_db4 needs some attention in general. We could make it easier to use, document it more, and update it so that it works well with Apache 2.x. So, in that sense I appreciate your frustration.

                        However, it was designed to handle the exact situation that you're trying to manage on your own. A very smart Apache/PHP engineer worked in conjunction with our lead DB engineer for months to get mod_db4 just right. So, as a novice to DB concepts (as you describe yourself), I'd say that you're likely better off fighting to understand mod_db4 (read the code, find the error message) than going it alone and re-inventing a rather tricky wheel.

                        As for our part, we'll work to find resources to update and improve mod_db4's code and docs for our next release. Until then I hope you can make progress and not get discouraged.

                        But that's just my $0.02. Cheers,

                        -greg
                        • 9. Re: Deadlock handling for beginners
                          597600
                          George Feinberg wrote:
                          > CDS or transactions, you still need to handle backup in some way. With transactions you also have to manage log files (which will be created and grow forever -- so you need to manage them, usually as part of your backup strategy).
                          This is easy: db_archive to the rescue! I simply pipe the names of the logs I don't need for normal recovery to STDOUT, scp/tar them to another machine, and then delete them.
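
                          Something along these lines (paths and host names are made up, of course):

                            # names of log files no longer needed for normal recovery
                            db_archive -a -h /path/to/dbhome > old-logs.txt
                            # bundle them up and ship them to another machine
                            tar czf old-logs.tar.gz -T old-logs.txt
                            scp old-logs.tar.gz backuphost:/backup/
                            # then delete them
                            xargs rm < old-logs.txt
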
                          > Have you read the documentation on using transactions with BDB XML?
                          > http://www.oracle.com/technology/documentation/berkeley-db/xml/index.html
                          > I assume you have since you referenced it in another post :-)
                          Well, yes, of course - I always read the documentation :-) There may be some details I have yet to understand, though ...
                          > A few points:
                          > 1. You don't need to use db_deadlock. You can set lock detection in the environment. It seems that this DbEnv interface is not exported in PHP, which would mean that you'd need to use a DB_CONFIG file to set it. That, or I can give you code to export it. The documented interface is set_lk_detect() on the DbEnv (or DB_ENV) object.
                          My DB_CONFIG file:

                          set_cachesize 0 20480000 1
                          set_lk_max_lockers 1000
                          set_lk_max_locks 2000
                          set_lk_max_objects 2000
                          set_txn_timeout 1000
                          set_lock_timeout 1000
                          set_lk_detect DB_LOCK_OLDEST

                          The cache size is 20 MB for a ~ 150 MB database. I think it does no harm to raise that to 100 MB, does it?

                          The locks and lock objects are raised from the default values because I needed just a little bit more than the default. I made the adjustment on account of the "db_stat -c" output.

                          Transaction timeout and lock timeout are 1 ms each. Maybe too little? What should I set this to?

                          Deadlock handling strategy, hm. I don't know. I just picked one. Probably not the best choice. Thinking about it, I don't want the writer to be selected as the victim, so I should probably tell BDB to abort the locker with the fewest write locks, using the MINWRITE criterion. What do you think?

                          As for db_deadlock, I can't see how I could do without it. In my experience so far, it is not sufficient to have set_lk_detect sitting around in the config file. In the GSG XML TXN, Chapter 3, Section "Configuring the Transaction Subsystem", I'm reading the following:

                          "[...] transaction timeouts are checked only when BDB XML examines its lock tables for blocked locks [...] Therefore, a transaction's timeout can have expired, but the application will not be notified until BDB XML has a reason to examine its lock tables."

                          I take this to explain the behaviour I'm seeing. There is a write lock on the container, the web app blocks, but no DB_LOCK_DEADLOCK is arriving. When running db_deadlock, the lock tables are being examined and DB_LOCK_DEADLOCK is returned to the application. Am I on the wrong track here?

                          And the sleep in the loop, well - I'll get rid of it. Sort of strange what I've set up: There is (1) the (ultra-short) transaction timeout in DB_CONFIG, (2) the db_deadlock interval of one second, and (3) one second of sleep in the application loop. Ideally, it would work with just the transaction timeout in DB_CONFIG - but I think I need another thread that checks the lock tables to trigger the timeout (like db_deadlock).
                          > 2. Do you expect your updater to take a really long time? If not, there's no reason to sleep in your loop. If so, why? Long-running transactions are a bad idea in general and worse when there is concurrency.
                          Well, it does take quite some time (substring indexing going on) - but it's not one giant transaction. It basically checks a delivery set of about 50,000 XML documents that are ready for import and compares these to the XML stored in the database, based on ID and mtime. A document is looked up by ID. If it's not there, it's a putDocument() operation. If it's there and has an older timestamp than the new document, it's an updateDocument() operation. Otherwise, it's nothing at all.

                          I bundle a configurable number of put and update operations into one transaction. Currently, that is 10 operations in one transaction. This is probably not such a good idea, as it makes the transaction last longer than necessary. Thinking about it, it's perhaps downright silly, because checking documents in the delivery set and looking them up in the database to compare them also takes time. So I guess it's better to use a dedicated transaction for each document update operation. What do you think?
                          > 3. You must open your containers using DBXML_TRANSACTIONAL in order to get transactional containers (or pass a transaction to the openContainer() interface). This applies to readers as well as writers. If you do not, you will eventually hang somewhere in a reader. This also means your readers need to use transactions.
                          I'm going to examine this. I do not quite understand how using a transaction for a read operation protects against hanging. (On the other hand I've observed that using the dbxml shell with the -t switch to enable transactions gives me a chance to open a container that is undergoing an update bombardment.) I'll get back to you once I've made some progress here.
                          > 4. Your initial open of the environment needs to handle recovery -- and it must be done in a single-threaded manner (at application/server startup). This means passing the DB_RECOVER flag. There are rules on this. This is a major part of what mod_db4 handles -- single-threaded recovery.
                          I cannot have the PHP web application processes run recovery unless I use mod_db4, because in order to do this safely, I'd need to guarantee that only one PHP process runs recovery while the others refrain from accessing the database. (Maybe I could come up with some hack to try and guarantee this, but it would be ugly and fragile. I trust you guys that mod_db4 is the solution to this problem. I simply couldn't get it to work.)

                          So I don't have the web app run recovery. It's a read-only web app; it doesn't modify the data.

                          The initial open of the update process, however, does attempt recovery on encountering an exception. And it works.
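
                          In outline, the updater's open looks about like this (a sketch, assuming the Db4Env class reports failures as exceptions - error handling details elided):

                            $flags = DB_CREATE|DB_INIT_MPOOL|DB_INIT_LOCK|DB_INIT_LOG|DB_INIT_TXN;
                            $env = new Db4Env();
                            try {
                                $env->open($dbhome, $flags);
                            } catch (Exception $ex) {
                                // Run recovery -- safe only while no other process has the
                                // environment open; the updater is a single process.
                                $env->open($dbhome, $flags|DB_RECOVER);
                            }
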
                          > 5. You will get better read/write concurrency and virtually no deadlocks at all if you use MVCC -- open your containers using DB_MULTIVERSION and use DB_TXN_SNAPSHOT on your read transactions. Your writer transactions just stay the same.
                          Same thing as for (3) - I'm going to examine this and get back to you.

                          Thank you very much for your help!

                          Michael Ludwig
                          • 10. Re: Deadlock handling for beginners
                            Gmfeinberg-Oracle
                            Michael,

                            > My DB_CONFIG file:
                            >
                            > set_cachesize 0 20480000 1
                            > set_lk_max_lockers 1000
                            > set_lk_max_locks 2000
                            > set_lk_max_objects 2000
                            > set_txn_timeout 1000
                            > set_lock_timeout 1000
                            With deadlock detection going on I wouldn't use timeouts at all. They don't do what you think and in my experience create more confusion than anything.
                            > set_lk_detect DB_LOCK_OLDEST
                            As you mentioned, you should use MINWRITE if you want to avoid choosing your updater.

                            > The cache size is 20 MB for a ~ 150 MB database. I think it does no harm to raise that to 100 MB, does it?
                            Nope. Generally larger is better for performance if you have enough virtual and real memory.

                            > Transaction timeout and lock timeout are 1 ms each. Maybe too little? What should I set this to?
                            I mentioned this above -- I'd eliminate them.

                            > As for db_deadlock, I can't see how I could do without it. In my experience so far, it is not sufficient to have set_lk_detect sitting around in the config file. In the GSG XML TXN, Chapter 3, Section "Configuring the Transaction Subsystem", I'm reading the following:
                            >
                            > "[...] transaction timeouts are checked only when BDB XML examines its lock tables for blocked locks [...] Therefore, a transaction's timeout can have expired, but the application will not be notified until BDB XML has a reason to examine its lock tables."
                            This is talking about timeouts, not deadlocks. They are very different beasts, which is why I avoid timeouts. If you have deadlock detection set up, you will get notification of deadlock immediately upon attempting to acquire a lock that would cause one.

                            > I take this to explain the behaviour I'm seeing. There is a write lock on the container, the web app blocks, but no DB_LOCK_DEADLOCK is arriving. When running db_deadlock, the lock tables are being examined and DB_LOCK_DEADLOCK is returned to the application. Am I on the wrong track here?
                            Seems like it. This should not be happening given your DB_CONFIG file. It's possible that the timeout settings are interfering. Get rid of them and see what happens. You really shouldn't need db_deadlock.

                            > Well, it does take quite some time (substring indexing going on) - but it's not one giant transaction. It basically checks a delivery set of about 50,000 XML documents that are ready for import and compares these to the XML stored in the database, based on ID and mtime. A document is looked up by ID. If it's not there, it's a putDocument() operation. If it's there and has an older timestamp than the new document, it's an updateDocument() operation. Otherwise, it's nothing at all.
                            >
                            > I bundle a configurable number of put and update operations into one transaction. Currently, that is 10 operations in one transaction. This is probably not such a good idea, as it makes the transaction last longer than necessary. Thinking about it, it's perhaps downright silly, because checking documents in the delivery set and looking them up in the database to compare them also takes time. So I guess it's better to use a dedicated transaction for each document update operation. What do you think?
                            A transaction for each set of N documents is reasonable -- have the transaction surround the read operations/comparisons as well. I.e.:
                            start txn
                            read/compare XML
                            if different, update
                            commit
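
                            A rough PHP sketch of that shape (method names taken from the C++ API -- check what the PHP binding exposes; isOlder() and makeDoc() are hypothetical stand-ins for your comparison and document construction):

                              // One transaction around the reads/comparisons and the
                              // updates for a batch of N documents.
                              $uc = $mgr->createUpdateContext();
                              $txn = $mgr->createTransaction();
                              try {
                                  foreach ($batch as $name => $content) {
                                      try {
                                          $old = $con->getDocument($txn, $name);
                                          if (isOlder($old, $content))   // your ID/mtime comparison
                                              $con->updateDocument($txn, makeDoc($mgr, $name, $content), $uc);
                                      } catch (XmlException $ex) {
                                          // Constant name may differ in PHP; in C++ it is
                                          // XmlException::DOCUMENT_NOT_FOUND.
                                          if ($ex->getExceptionCode() != XmlException_DOCUMENT_NOT_FOUND)
                                              throw $ex;
                                          $con->putDocument($txn, $name, $content, $uc);
                                      }
                                  }
                                  $txn->commit();
                              } catch (XmlException $ex) {
                                  $txn->abort();   // then retry on DB_LOCK_DEADLOCK, as in the reader loop
                                  throw $ex;
                              }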

                            > > 3. You must open your containers using DBXML_TRANSACTIONAL in order to get transactional containers (or pass a transaction to the openContainer() interface). This applies to readers as well as writers. If you do not, you will eventually hang somewhere in a reader. This also means your readers need to use transactions.
                            >
                            > I'm going to examine this. I do not quite understand how using a transaction for a read operation protects against hanging. (On the other hand, I've observed that using the dbxml shell with the -t switch to enable transactions gives me a chance to open a container that is undergoing an update bombardment.) I'll get back to you once I've made some progress here.
                            Trust me, you need the transactions on the reads as well as handling of the deadlock exception. I think I mentioned MVCC though. If you use that (I think you should) you will probably see no deadlocks ever. If you do choose to use MVCC (DB_MULTIVERSION, DB_TXN_SNAPSHOT on read txns) you will need to significantly increase the number of transactions configured (don't ask why, just do it :-).
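
                            For example, in your DB_CONFIG (the number is only illustrative -- size it as discussed in the docs):

                              # leave room for snapshot transactions as well as active ones
                              set_tx_max 2500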

                            > I cannot have the PHP web application processes run recovery unless I use mod_db4, because in order to do this safely, I'd need to guarantee that only one PHP process runs recovery while the others refrain from accessing the database. (Maybe I could come up with some hack to try and guarantee this, but it would be ugly and fragile. I trust you guys that mod_db4 is the solution to this problem. I simply couldn't get it to work.)
                            >
                            > So I don't have the web app run recovery. It's a read-only web app; it doesn't modify the data.
                            >
                            > The initial open of the update process, however, does attempt recovery on encountering an exception. And it works.
                            This will work ONLY if you don't have any readers that have the environment open at the time. Can you guarantee this? If not, you are playing with fire with respect to corrupting your environment.

                            A heads up -- I'm on vacation the rest of this week and won't be online much. Someone may pick up where I left off but no guarantees.

                            Regards,
                            George
                            • 11. Re: Deadlock handling for beginners
                              597600
                              [timeouts, deadlock detection and mode, cache size]

                              I'm going to drop timeouts, then. And set set_lk_detect DB_LOCK_MINWRITE. And be generous with cache.
                              > If you have deadlock detection set up, you will get notification of deadlock immediately upon attempting to acquire a lock that would cause one.
                              >
                              > [No DB_LOCK_DEADLOCK arriving] should not be happening given your DB_CONFIG file. It's possible that the timeout settings are interfering. Get rid of them and see what happens. You really shouldn't need db_deadlock.
                              Changes made: I got rid of both the timeout configuration and db_deadlock. Hm. Not sure this works as intended in PHP. At least, the behaviour is not as you say. But it may be a bug on my side - I'll examine it further and report back.
                              > A transaction for each set of N documents is reasonable -- have the transaction surround the read operations/comparisons as well. I.e.:
                              >
                              > start txn
                              > read/compare XML
                              > if different, update
                              > commit
                              Good, thanks for confirming!
                              > Trust me, you need the transactions on the reads as well as handling of the deadlock exception. I think I mentioned MVCC though. If you use that (I think you should) you will probably see no deadlocks ever. If you do choose to use MVCC (DB_MULTIVERSION, DB_TXN_SNAPSHOT on read txns) you will need to significantly increase the number of transactions configured (don't ask why, just do it :-).
                              This works marvelously! I get no deadlocks any more where I used to be blocked. Perfect! Thanks a lot!
                              > > So I don't have the web app run recovery. It's a read-only web app; it doesn't modify the data.
                              > >
                              > > The initial open of the update process, however, does attempt recovery on encountering an exception. And it works.
                              >
                              > This will work ONLY if you don't have any readers that have the environment open at the time. Can you guarantee this? If not, you are playing with fire with respect to corrupting your environment.
                              I guess I just have to find a way to guarantee this. Could be easy: Lock the whole environment using a suitable chmod to exclude the web server, and have the web server fail gracefully. These restrictive file permissions need only be set during recovery, so that won't be long :-) I hope! The update, which takes longer, can then be done with open permissions so the web server can get to the data again. Does that sound halfway reasonable?
                              > A heads up -- I'm on vacation the rest of this week and won't be online much. Someone may pick up where I left off but no guarantees.
                              We'll get by, don't worry :-) Have a very nice holiday, and thanks for all your help!

                              Michael Ludwig
                              • 12. Re: Deadlock handling for beginners
                                597600
                                George Feinberg wrote:
                                > Trust me, you need the transactions on the reads as well as handling of the deadlock exception. I think I mentioned MVCC though. If you use that (I think you should) you will probably see no deadlocks ever. If you do choose to use MVCC (DB_MULTIVERSION, DB_TXN_SNAPSHOT on read txns) you will need to significantly increase the number of transactions configured (don't ask why, just do it :-).
                                For those who are curious, or who use MVCC and encounter the error message "Unable to allocate memory for transaction detail": you may have forgotten to significantly increase the number of transactions configured.

                                To quote from the BDB FAQ:

                                > Berkeley DB occasionally returns the error: "Unable to allocate memory for transaction detail". What does that mean?
                                >
                                > This error means the maximum number of active transactions configured for Berkeley DB has been reached. The Berkeley DB environment should be configured to support more active transactions. When all of the memory available in the database environment for transactions is in use, calls to begin a transaction will fail until some active transactions complete. By default, the database environment is configured to support at least 20 active transactions.


                                Now you wonder why you get this error when db_stat -t shows there are no active transactions. In the same db_stat -t output, take a look at "Snapshot transactions" and "Maximum snapshot transactions". A lot of them? Alright, but they are not active.

                                Not quite. They are sort of active. Read the C++ API doc for DbEnv::set_tx_max():

                                > Transactions that update multiversion databases are not freed until the last page version that the transaction created is flushed from cache. This means that applications using multi-version concurrency control may need a transaction for each page in cache, in the extreme case.

                                That seems to mean that if you want a large cache and MVCC, you have to configure a large number of transactions.

                                Read these statements by Ron Cohen:

                                > The environment should also be configured for sufficient transactions using DB_ENV->set_tx_max. The maximum number of transactions needs to include all transactions executed concurrently by the application plus all cursors configured for snapshot isolation. Further, the transactions are retained until the last page they created is evicted from cache, so in the extreme case, an additional transaction may be needed for each page in the cache.

                                And, under "In-Memory Logging":

                                > With the addition of Snapshot isolation in Berkeley DB, the maximum number of active transactions must be increased in some cases by a large number. You need to include the snapshot transactions in the value you set.

                                You get some numbers to configure the maximum number of transactions like this:
                                db_stat -m | grep -i size
                                There is a "Total cache size" and then "Page size" for each database (or XML container). To be on the safe side, you'd take the cache size and divide it by the smallest page size. But maybe it's not really necessary to go up to the theoretical maximum.

                                If what I've put together here is inaccurate, I'd be glad to have it corrected by the experts. Thanks,

                                Michael Ludwig