A couple more observations: this does not seem to be caused by anything specific to RHEL. It happens on openSUSE as well.
Looks like there are two ways it can fail:
BDB0087 DB_RUNRECOVERY: Fatal error, run database recoveryReader died
and BDB0113 Thread/process 4388/4388 failed: BDB1507 Thread died in Berkeley DB library
The second one (BDB0113) is easier to hit when running under strace(1), possibly because of the slowdown. There's a fairly obvious race in src/env/env_failchk.c:__env_in_api():
When the check runs while another process is starting and being added to the table, that process's ip->dbth_state changes while the body of the SH_TAILQ_FOREACH(ip...) loop is running. Each if() conditional re-reads the field and can see a different value, so there is a chance that none of them will match.
I'm not sure how to fix that though. A big switch() instead of the conditionals would cause the state to be evaluated only once, but the same issue affects other fields (tid, pid) as well; the other process can reuse the slot, changing those fields to its own identification mid-air.
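To illustrate what I mean by "evaluated only once", here is a minimal stand-alone sketch. It is not the Berkeley DB code: st_t, sim_slot, read_state() and both classify functions are hypothetical names made up for the example, and the concurrent writer is simulated by having every read return the next value from a fixed sequence.

```c
#include <stddef.h>

/* Hypothetical states; NOT the real dbth_state values. */
typedef enum { ST_NOTUSED, ST_ACTIVE, ST_BLOCKED } st_t;

/*
 * Simulated shared slot: every read returns the next value in seq,
 * mimicking another process changing the field between our reads.
 */
typedef struct {
    const st_t *seq;
    size_t len;
    size_t pos;
} sim_slot;

static st_t read_state(sim_slot *s)
{
    st_t v = s->seq[s->pos];
    if (s->pos + 1 < s->len)
        s->pos++;
    return v;
}

/* Racy pattern: the field is re-read for each conditional.
 * Returns 1 if some branch matched, 0 if all of them were missed. */
static int classify_racy(sim_slot *s)
{
    if (read_state(s) == ST_NOTUSED)
        return 1;
    if (read_state(s) == ST_ACTIVE)
        return 1;
    if (read_state(s) == ST_BLOCKED)
        return 1;
    return 0;   /* fell through: no branch saw "its" value */
}

/* Snapshot pattern: read once into a local, then switch on the copy. */
static int classify_snapshot(sim_slot *s)
{
    st_t v = read_state(s);

    switch (v) {
    case ST_NOTUSED:
    case ST_ACTIVE:
    case ST_BLOCKED:
        return 1;
    }
    return 0;
}
```

With the sequence BLOCKED, NOTUSED, ACTIVE the racy version misses every branch, while the snapshot version always matches one. As said above, this only covers the state field; it does nothing for a slot being reused and its tid/pid changing under the checker, which would probably need the slot to be locked or validated as a whole.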
Message was edited by: 982876: The race affects other fields too