13 Replies. Latest reply: May 29, 2013 3:55 AM by Oracle,CindyZeng

    Stress-testing CDB as used by RPM causes a crash

    982876
      Hi,

      I've run across a problem with RPM (as shipped with el6; it applies to Fedora and Git tip as well) that is easily reproduced by repeatedly installing a package while repeatedly listing the installed packages. I was able to craft a minimal reproducer with an rpmdb containing as little as a single package.

      While tracking down the problem, I wrote a standalone reproducer that mimics RPM's use of Berkeley DB, and it exhibits the problem as well.

      The crash reproducer code is available here:
      http://v3.sk/~lkundrak/bdb-crash/

      Run it as follows:
      $ make test
      cc -g -c -o reader.o -DREADERS test.c
      cc -ldb -lpthread reader.o -o reader
      cc -g -c -o writer.o -DWRITER test.c
      cc -ldb -lpthread writer.o -o writer
      sh test.sh
      Fri Jan 4 09:30:35 CET 2013
      Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
      test.c:46: DB_RUNRECOVERY: Fatal error, run database recoveryWriter died
      test.sh: line 14: 29722 Terminated ( while ./reader; do
      :;
      done; echo 'Reader died'; kill $ONE )
      Fri Jan 4 09:30:36 CET 2013
      $

      I'm wondering whether this is a problem in Berkeley DB itself or in the way RPM uses CDB.
      Any help will be much appreciated!

      Thanks,
      Lubo
        • 1. Re: Stress-testing CDB as used by RPM causes a crash
          656853
          Hi Lubo,

          Could you please let us know which BDB version you are using?
          > Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
          For failchk, you need to supply the process and thread IDs through the DB_ENV->set_thread_id() callback and check them in the DB_ENV->set_isalive() callback. Please refer to http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_thread_id.html and http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_isalive.html.
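          A minimal sketch of what a thread_id callback has to report, assuming a POSIX system where db_threadid_t is pthread_t (an assumption; check your db.h). The helper name current_thread_id is hypothetical: a real callback registered with DB_ENV->set_thread_id() would simply forward its pid and tid out-parameters to it. The callback signature in the comment follows the BDB 5.x API reference linked above.

          ```c
          #define _POSIX_C_SOURCE 200809L
          #include <pthread.h>
          #include <sys/types.h>
          #include <unistd.h>

          /*
           * Hypothetical helper mirroring a thread_id callback for
           * DB_ENV->set_thread_id(). In BDB 5.x the callback is declared as
           *   void (*thread_id)(DB_ENV *dbenv, pid_t *pid, db_threadid_t *tid);
           * Assumption: db_threadid_t is pthread_t on this platform.
           */
          void current_thread_id(pid_t *pid, pthread_t *tid)
          {
              *pid = getpid();        /* identifies the process */
              *tid = pthread_self();  /* identifies the calling thread within it */
          }
          ```

          The pair written here is what failchk later passes back to the is_alive callback, so both callbacks must agree on how a thread of control is identified.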

          Some docs about CDS and failchk:
          - http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envfailchk.html;
          - http://docs.oracle.com/cd/E17076_02/html/programmer_reference/cam_app.html;
          - http://www.oracle.com/technetwork/products/berkeleydb/tutorial-berkeleydb-cds-090013.html

          Hope it helps!

          Regards,
          Emily Fu, Oracle Berkeley DB
          • 2. Re: Stress-testing CDB as used by RPM causes a crash
            982876
            Hi Emily,

            Thank you for your response!

            > Could you please let us know what BDB version are you using?
            I'm using 4.3.29 (Enterprise Linux package db4-4.3.29-10.el5_5.2).
            I've also reproduced the issue with libdb-5.3.21-4.fc19, which ships with Fedora devel.
            > Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
            > For failchk, you will need to specify pid/tid in DB_ENV->set_thread_id() and check them in DB_ENV->set_isalive(). Please refer to http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_thread_id.html and http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_isalive.html.
            I was indeed missing the thread_id callback (and so is RPM).
            Adding it (http://v3.sk/~lkundrak/bdb-crash/0001-Use-thread_id.patch) does not seem to fix the issue, though. Berkeley DB appears to use the actual PID with a bogus TID when the callback is not specified, which works reasonably well here because the application is single-threaded and the is_alive callback only checks the PID.
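            The PID-only liveness check described above can be sketched as follows, assuming a POSIX system. process_is_alive is a hypothetical name; an is_alive callback registered with DB_ENV->set_isalive() could return its result while ignoring the tid and flags arguments, which is only adequate for single-threaded processes such as the reproducer. The signature in the comment follows the BDB 5.x API reference; verify it against your db.h.

            ```c
            #define _POSIX_C_SOURCE 200809L
            #include <errno.h>
            #include <signal.h>
            #include <sys/types.h>

            /*
             * Hypothetical core of an is_alive callback for DB_ENV->set_isalive().
             * In BDB 5.x the callback is declared as
             *   int (*is_alive)(DB_ENV *dbenv, pid_t pid, db_threadid_t tid, u_int32_t flags);
             * and returns non-zero if the thread of control is still running.
             * Single-threaded assumption: checking the process is enough.
             */
            int process_is_alive(pid_t pid)
            {
                /* Signal 0 sends nothing; it only checks that pid could be signalled. */
                if (kill(pid, 0) == 0)
                    return 1;
                /* EPERM: the process exists but belongs to another user. */
                return errno == EPERM;
            }
            ```

            Sending signal 0 with kill() is the conventional POSIX existence check; note that it cannot distinguish a dead process from a new, unrelated one that has reused its PID.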
            • 3. Re: Stress-testing CDB as used by RPM causes a crash
              userBDBDMS-Oracle
              Hi,

              What is your end goal? Are you looking to have your test program corrected so that it works, or to figure out what is possibly incorrect in the RPM implementation? If the latter, how will the information be fed back to the RPM developers?


              thanks
              mike
              • 4. Re: Stress-testing CDB as used by RPM causes a crash
                982876
                user651890 wrote:
                > What is the end goal that you are looking for? Are you looking to have you test pgm corrected so it works correctly, or are looking to figure what is possible incorrect in the RPM implementation? If for the RPM implementation, how is information going to be feed back to them.
                Exactly.

                I guess I'll forward the fix to RPM in the usual way (a Git pull request).
                • 5. Re: Stress-testing CDB as used by RPM causes a crash
                  userBDBDMS-Oracle
                  This thread is a little unclear at the moment. Have the suggestions provided so far addressed the RPM implementation issue? I think a Git pull request is the best way to pass changes on to the RPM folks.

                  thanks
                  mike
                  • 6. Re: Stress-testing CDB as used by RPM causes a crash
                    982876
                    user651890 wrote:
                    > This thread is a little unclear at the moment. Have the suggestions provided so far addressed the RPM implementation issue?
                    No.

                    I've applied the fixes (both to RPM and to the minimal reproducer) and it did not help.

                    The updated crash reproducer code is available here:
                    http://v3.sk/~lkundrak/bdb-crash/

                    Thank you!
                    • 7. Re: Stress-testing CDB as used by RPM causes a crash
                      userBDBDMS-Oracle
                      The release you are using is rather old. If we test it and it works in a more recent release, is it possible to upgrade the version of BDB used by RPM?

                      thanks
                      mike
                      • 8. Re: Stress-testing CDB as used by RPM causes a crash
                        982876
                        user651890 wrote:
                        > The release you are using is rather old -- if we test it and it works successfully in a more recent release, is it possible to upgrade the version of BDB being using in RPM?
                        The problem occurs with the latest release as well:

                        bdb-crash♥ make test
                        cc -g -I/home/lkundrak/src/db-5.3.21/build_unix -c -o reader.o -DREADERS test.c
                        cc -Wl,-rpath=/home/lkundrak/src/db-5.3.21/build_unix/.libs -L/home/lkundrak/src/db-5.3.21/build_unix/.libs -ldb-5.3 reader.o -o reader
                        cc -g -I/home/lkundrak/src/db-5.3.21/build_unix -c -o writer.o -DWRITER test.c
                        cc -Wl,-rpath=/home/lkundrak/src/db-5.3.21/build_unix/.libs -L/home/lkundrak/src/db-5.3.21/build_unix/.libs -ldb-5.3 writer.o -o writer
                        sh test.sh
                        Wed Jan 30 19:34:41 CET 2013
                        test.c:53: BDB0087 DB_RUNRECOVERY: Fatal error, run database recoveryWriter died
                        test.sh: line 14: 28826 Terminated ( while ./reader; do
                        :;
                        done; echo 'Reader died'; kill $ONE )
                        Wed Jan 30 19:35:29 CET 2013
                        bdb-crash♥
                        • 9. Re: Stress-testing CDB as used by RPM causes a crash
                          userBDBDMS-Oracle
                          We will take another look. This is going to take a little time, but we will get back to you.


                          thanks
                          mike
                          • 10. Re: Stress-testing CDB as used by RPM causes a crash
                            982876
                            Is there any news on this?
                            • 11. Re: Stress-testing CDB as used by RPM causes a crash
                              userBDBDMS-Oracle
                              I had thought that this one was resolved. Our apologies. We will take another look.

                              thanks
                              mike
                              • 12. Re: Stress-testing CDB as used by RPM causes a crash
                                Oracle,CindyZeng
                                Hi,

                                I have taken it over and will look at it.

                                Regards,
                                Cindy
                                • 13. Re: Stress-testing CDB as used by RPM causes a crash
                                  Oracle,CindyZeng
                                  Hi,
                                  > The crash reproducer code is available here:
                                  > http://v3.sk/~lkundrak/bdb-crash/
                                  >
                                  > Run it as follows:
                                  > $ make test
                                  > cc -g -c -o reader.o -DREADERS test.c
                                  > cc -ldb -lpthread reader.o -o reader
                                  > cc -g -c -o writer.o -DWRITER test.c
                                  > cc -ldb -lpthread writer.o -o writer
                                  > sh test.sh
                                  > Fri Jan 4 09:30:35 CET 2013
                                  > Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
                                  > test.c:46: DB_RUNRECOVERY: Fatal error, run database recoveryWriter died
                                  > test.sh: line 14: 29722 Terminated ( while ./reader; do
                                  > :;
                                  > done; echo 'Reader died'; kill $ONE )
                                  > Fri Jan 4 09:30:36 CET 2013
                                  > $
                                  I have tried to reproduce the issue with your program, but when I run test.sh, it loops forever and doesn't stop until I terminate it manually.

                                  I added some print statements to the code; here is the test output:
                                  $bash ./test.sh
                                  def WRITER
                                  reader() return -30988
                                  writer() return 0
                                  writer main() return 0
                                  Wed May 29 16:04:29 CST 2013
                                  def WRITER
                                  reader() return 0
                                  writer() return 0
                                  def READERS
                                  reader() return 0
                                  writer main() return 0
                                  reader main() return 0
                                  def WRITER
                                  reader() return 0
                                  def READERS
                                  writer() return 0
                                  reader() return 0
                                  reader main() return 0
                                  writer main() return 0
                                  def READERS
                                  reader() return 0
                                  def WRITER
                                  reader() return 0
                                  writer() return 0
                                  reader main() return 0
                                  def READERS
                                  reader() return 0
                                  writer main() return 0
                                  reader main() return 0
                                  ...
                                  ^C./test.sh: line 14: 23685 Terminated ( while ./writer; do
                                  :;
                                  done; echo 'Writer died' )
                                  ./test.sh: line 15: 23686 Terminated ( while ./reader; do
                                  :;
                                  done; echo 'Reader died'; kill $ONE )

                                  I don't see the "Thread died" error. Could you please verify your reproducer program?

                                  I am using db-5.3.21 and did not apply your patch when running your program.

                                  Regards,
                                  Cindy