This discussion is archived
13 Replies. Latest reply: May 29, 2013 1:55 AM by Oracle,CindyZeng

Stress-testing CDB as used by RPM causes a crash

982876 Newbie
Hi,

I've run across a problem with RPM (as shipped with el6; it applies to Fedora and Git tip as well) that is easily reproduced by repeatedly installing a package while repeatedly listing the installed packages. I was able to craft a minimal reproducer with an rpmdb as small as a single package.

While tracking down the problem, I attempted to mimic RPM's use of Berkeley DB, and the standalone reproducer exhibits the problem as well.

The crash reproducer code is available here:
http://v3.sk/~lkundrak/bdb-crash/

Run it as follows:
$ make test
cc -g -c -o reader.o -DREADERS test.c
cc -ldb -lpthread reader.o -o reader
cc -g -c -o writer.o -DWRITER test.c
cc -ldb -lpthread writer.o -o writer
sh test.sh
Fri Jan 4 09:30:35 CET 2013
Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
test.c:46: DB_RUNRECOVERY: Fatal error, run database recoveryWriter died
test.sh: line 14: 29722 Terminated ( while ./reader; do
:;
done; echo 'Reader died'; kill $ONE )
Fri Jan 4 09:30:36 CET 2013
$

I'm wondering whether this is a problem in Berkeley DB or in the way RPM uses CDB.
Any help will be much appreciated!

Thanks,
Lubo
  • 1. Re: Stress-testing CDB as used by RPM causes a crash
    656853 Explorer
    Hi Lubo,

    Could you please let us know which BDB version you are using?
    > Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
    For failchk, you will need to specify pid/tid in DB_ENV->set_thread_id() and check them in DB_ENV->set_isalive(). Please refer to http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_thread_id.html and http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_isalive.html.

    Some docs about CDS and failchk:
    - http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envfailchk.html
    - http://docs.oracle.com/cd/E17076_02/html/programmer_reference/cam_app.html;
    - http://www.oracle.com/technetwork/products/berkeleydb/tutorial-berkeleydb-cds-090013.html

    Hope it helps!

    Regards,
    Emily Fu, Oracle Berkeley DB
  • 2. Re: Stress-testing CDB as used by RPM causes a crash
    982876 Newbie
    Hi Emily,

    Thank you for your response!

    > Could you please let us know which BDB version you are using?
    I'm now using 4.3.29 (Enterprise Linux package db4-4.3.29-10.el5_5.2).
    I've also reproduced the issue with libdb-5.3.21-4.fc19, which ships with Fedora devel.
    > Thread/process 29738/3078265648 failed: Thread died in Berkeley DB library
    > For failchk, you will need to specify pid/tid in DB_ENV->set_thread_id() and check them in DB_ENV->set_isalive(). Please refer to http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_thread_id.html and http://docs.oracle.com/cd/E17076_01/html/api_reference/C/envset_isalive.html.
    I was indeed missing the thread_id callback (and so is RPM).
    Adding it (http://v3.sk/~lkundrak/bdb-crash/0001-Use-thread_id.patch) does not seem to fix the issue, though. Berkeley DB appears to use the actual PID with a bogus TID when the callback is not specified, which works reasonably well because the application is single-threaded and the is_alive callback checks only the PID.
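    The PID-only liveness test described above can be tried outside Berkeley DB. Below is a minimal sketch, assuming single-threaded processes, of the check such an is_alive callback could perform using only POSIX kill(); the helper name pid_is_alive is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if a process with this PID exists, else 0.
 * kill() with signal 0 performs only the existence/permission checks and
 * delivers no signal. Caveats: an exited but not yet reaped (zombie)
 * process still counts as existing, and a reused PID looks alive.
 */
static int pid_is_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;          /* exists and we may signal it */
    return errno == EPERM; /* EPERM: exists, other owner; ESRCH: gone */
}
```

    Because the TID is ignored, this check cannot tell threads of one process apart, which is harmless only while every process using the environment stays single-threaded.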
  • 3. Re: Stress-testing CDB as used by RPM causes a crash
    userBDBDMS Guru Moderator
    Hi,

    What is the end goal you are looking for? Are you looking to have your test program corrected so that it works, or are you trying to figure out what is possibly incorrect in the RPM implementation? If the latter, how is the information going to be fed back to them?


    thanks
    mike
  • 4. Re: Stress-testing CDB as used by RPM causes a crash
    982876 Newbie
    user651890 wrote:
    > What is the end goal you are looking for? Are you looking to have your test program corrected so that it works, or are you trying to figure out what is possibly incorrect in the RPM implementation? If the latter, how is the information going to be fed back to them?
    Exactly.

    I guess I'll forward the fix to RPM in the usual way (a Git pull request).
  • 5. Re: Stress-testing CDB as used by RPM causes a crash
    userBDBDMS Guru Moderator
    This thread is a little unclear at the moment. Have the suggestions provided so far addressed the RPM implementation issue? I think a Git pull request is the best way to pass changes on to the RPM folks.

    thanks
    mike
  • 6. Re: Stress-testing CDB as used by RPM causes a crash
    982876 Newbie
    user651890 wrote:
    > This thread is a little unclear at the moment. Have the suggestions provided so far addressed the RPM implementation issue?
    No.

    I've applied the fixes (both to RPM and to the minimal reproducer), and they did not help.

    The updated crash reproducer code is available here:
    http://v3.sk/~lkundrak/bdb-crash/

    Thank you!
  • 7. Re: Stress-testing CDB as used by RPM causes a crash
    userBDBDMS Guru Moderator
    The release you are using is rather old. If we test it and it works in a more recent release, would it be possible to upgrade the version of BDB used in RPM?

    thanks
    mike
  • 8. Re: Stress-testing CDB as used by RPM causes a crash
    982876 Newbie
    Currently Being Moderated
    user651890 wrote:
    > The release you are using is rather old. If we test it and it works in a more recent release, would it be possible to upgrade the version of BDB used in RPM?
    The problem occurs with the latest release as well:

    bdb-crash♥ make test
    cc -g -I/home/lkundrak/src/db-5.3.21/build_unix -c -o reader.o -DREADERS test.c
    cc -Wl,-rpath=/home/lkundrak/src/db-5.3.21/build_unix/.libs -L/home/lkundrak/src/db-5.3.21/build_unix/.libs -ldb-5.3 reader.o -o reader
    cc -g -I/home/lkundrak/src/db-5.3.21/build_unix -c -o writer.o -DWRITER test.c
    cc -Wl,-rpath=/home/lkundrak/src/db-5.3.21/build_unix/.libs -L/home/lkundrak/src/db-5.3.21/build_unix/.libs -ldb-5.3 writer.o -o writer
    sh test.sh
    Wed Jan 30 19:34:41 CET 2013
    test.c:53: BDB0087 DB_RUNRECOVERY: Fatal error, run database recoveryWriter died
    test.sh: line 14: 28826 Terminated ( while ./reader; do
    :;
    done; echo 'Reader died'; kill $ONE )
    Wed Jan 30 19:35:29 CET 2013
    bdb-crash♥
  • 9. Re: Stress-testing CDB as used by RPM causes a crash
    userBDBDMS Guru Moderator
    We will take another look. This is going to take a little time, but we will get back to it.


    thanks
    mike
  • 10. Re: Stress-testing CDB as used by RPM causes a crash
    982876 Newbie
    Is there any news on this?
  • 11. Re: Stress-testing CDB as used by RPM causes a crash
    userBDBDMS Guru Moderator
    I had thought that this one was resolved. Our apologies. We will take another look.

    thanks
    mike
  • 12. Re: Stress-testing CDB as used by RPM causes a crash
    Oracle,CindyZeng Newbie
    Hi,

    I have taken this over and will look into it.

    Regards,
    Cindy
  • 13. Re: Stress-testing CDB as used by RPM causes a crash
    Oracle,CindyZeng Newbie
    Hi,
    I have tried to reproduce the issue with your program, but when I run test.sh, it loops indefinitely and does not stop until I terminate it manually.

    I added some print statements to the code; here is the test output:
    $bash ./test.sh
    def WRITER
    reader() return -30988
    writer() return 0
    writer main() return 0
    Wed May 29 16:04:29 CST 2013
    def WRITER
    reader() return 0
    writer() return 0
    def READERS
    reader() return 0
    writer main() return 0
    reader main() return 0
    def WRITER
    reader() return 0
    def READERS
    writer() return 0
    reader() return 0
    reader main() return 0
    writer main() return 0
    def READERS
    reader() return 0
    def WRITER
    reader() return 0
    writer() return 0
    reader main() return 0
    def READERS
    reader() return 0
    writer main() return 0
    reader main() return 0
    ...
    ^C./test.sh: line 14: 23685 Terminated ( while ./writer; do
    :;
    done; echo 'Writer died' )
    ./test.sh: line 15: 23686 Terminated ( while ./reader; do
    :;
    done; echo 'Reader died'; kill $ONE )

    I don't see the "Thread died" error. Could you please verify your reproducer program?

    I am using db-5.3.21 and did not apply your patch when running your program.

    Regards,
    Cindy
