There is a bug in db-4.7.25 (already fixed in newer versions) where two processes can enter the critical section for the same mutex at the same time. We were able to pinpoint the cause:
The problem is that __env_open() calls ENV_ENTER() before __mutex_open(). ENV_ENTER() calls __env_set_state(), which can call __env_alloc(), which in turn takes a mutex. Because __mutex_open() has not been called yet, env->mutex_handle == NULL, and that disables locking. As a result, two processes can be inside the critical section at once, and bad things follow.
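To make the failure mode concrete, here is a minimal single-threaded sketch of a lock routine that silently does nothing while the handle is NULL. It is only a model of the behaviour described above, not Berkeley DB code: toy_env, toy_mutex, toy_enter() and toy_mutex_open() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Berkeley DB types. */
struct toy_mutex { int held; };
struct toy_env   { struct toy_mutex *mutex_handle; /* NULL until "mutex open" */ };

enum toy_result { TOY_SKIPPED, TOY_ACQUIRED, TOY_BUSY };

/* Try to enter the critical section guarded by env->mutex_handle. */
static enum toy_result toy_enter(struct toy_env *env)
{
    if (env->mutex_handle == NULL)
        return TOY_SKIPPED;      /* locking disabled: caller proceeds unprotected */
    if (env->mutex_handle->held)
        return TOY_BUSY;         /* a real mutex would block here */
    env->mutex_handle->held = 1;
    return TOY_ACQUIRED;
}

/* Models the effect of opening the mutex subsystem: the handle becomes valid. */
static void toy_mutex_open(struct toy_env *env, struct toy_mutex *m)
{
    assert(m != NULL);
    m->held = 0;
    env->mutex_handle = m;
}
```

In this model, every caller of toy_enter() gets TOY_SKIPPED and walks straight into the critical section until toy_mutex_open() has run. Once the handle is set, a second caller is turned away while the first one holds the mutex, which is the ordering the patch is meant to restore.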
Here is a quick patch that should fix the bug:
diff -uNr db-4.7.25.orig/env/env_open.c db-4.7.25/env/env_open.c
--- db-4.7.25.orig/env/env_open.c 2008-03-26 02:00:28.000000000 +1000
+++ db-4.7.25/env/env_open.c 2013-08-05 10:35:58.471827969 +1000
@@ -339,15 +339,7 @@
* Initialize thread tracking and enter the API.
infop = env->reginfo;
- if ((ret =
- __env_thread_init(env, F_ISSET(infop, REGION_CREATE) ? 1 : 0)) != 0)
- goto err;
- ENV_ENTER(env, ip);
- * Initialize the subsystems.
* Initialize the mutex regions first. There's no ordering requirement,
@@ -357,7 +349,18 @@
if ((ret = __mutex_open(env, create_ok)) != 0)
+ /* The MUTEX_REQUIRED() in __env_alloc() expects this to be set. */
+ infop->mtx_alloc = ((REGENV *)infop->primary)->mtx_regenv;
+ if ((ret =
+ __env_thread_init(env, F_ISSET(infop, REGION_CREATE) ? 1 : 0)) != 0)
+ goto err;
+ ENV_ENTER(env, ip);
+ * Initialize the subsystems.
* We can now acquire/create mutexes: increment the region's reference
However, since the bug is somewhat tricky to reproduce, I would be glad if someone could confirm or refute that the patch actually fixes the issue.
Thanks in advance,
Jan Staněk <firstname.lastname@example.org>