0 Replies. Latest reply: Aug 7, 2013 7:27 AM by jstanek

    Locking bug in db4 & fix validation

    jstanek

      Hi,

      There is a bug in db-4.7.25 (already fixed in newer versions) where two processes can enter the critical section guarded by the same mutex at the same time. We were able to pinpoint the cause of the bug:

      The problem is that __env_open() calls ENV_ENTER() before __mutex_open(). ENV_ENTER() calls __env_set_state(), which can call __env_alloc(), which takes a mutex. Because __mutex_open() has not been called yet, env->mutex_handle == NULL, which disables locking. Hence two processes can be in the critical section at once, and bad things follow.
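      To make the ordering problem concrete, here is a minimal toy model in C. It is only a sketch of the mechanism described above, not Berkeley DB code: `struct toy_env`, `toy_mutex_lock()`, `toy_mutex_open()`, `toy_env_enter()` and `toy_open()` are hypothetical names invented for this illustration, and the only behavior taken from the real library is that a NULL mutex handle turns locking into a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the environment; not the real ENV type. */
struct toy_env {
    void *mutex_handle;  /* NULL until toy_mutex_open() has run */
    int   locks_taken;   /* counts acquisitions that really happened */
};

/* Models the described behavior: with no mutex handle, locking is
 * silently skipped and the caller is told everything is fine. */
static int toy_mutex_lock(struct toy_env *env)
{
    assert(env != NULL);
    if (env->mutex_handle == NULL)
        return 0;        /* reports success, but nothing was locked */
    env->locks_taken++;
    return 0;
}

/* Stand-in for __mutex_open(): after this, locking is effective. */
static void toy_mutex_open(struct toy_env *env)
{
    static int dummy_region;
    env->mutex_handle = &dummy_region;
}

/* Stand-in for the ENV_ENTER() path that ends up taking a mutex. */
static void toy_env_enter(struct toy_env *env)
{
    toy_mutex_lock(env);
    /* the critical section would run here */
}

/* Runs the two steps in the chosen order and returns how many real
 * lock acquisitions protected the critical section. */
int toy_open(int mutex_first)
{
    struct toy_env env = { NULL, 0 };

    if (mutex_first) {
        toy_mutex_open(&env);   /* patched order */
        toy_env_enter(&env);
    } else {
        toy_env_enter(&env);    /* order in db-4.7.25 as described */
        toy_mutex_open(&env);
    }
    return env.locks_taken;
}
```

      In this model, `toy_open(0)` (enter first) returns 0 because the critical section ran without any lock, while `toy_open(1)` (mutex subsystem opened first) returns 1.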

       

      There is also a quick patch for fixing that bug:

      diff -uNr db-4.7.25.orig/env/env_open.c db-4.7.25/env/env_open.c
      --- db-4.7.25.orig/env/env_open.c     2008-03-26 02:00:28.000000000 +1000
      +++ db-4.7.25/env/env_open.c     2013-08-05 10:35:58.471827969 +1000
      @@ -339,15 +339,7 @@
            * Initialize thread tracking and enter the API.
            */
           infop = env->reginfo;
      -     if ((ret =
      -         __env_thread_init(env, F_ISSET(infop, REGION_CREATE) ? 1 : 0)) != 0)
      -          goto err;
      -
      -     ENV_ENTER(env, ip);
       
      -     /*
      -      * Initialize the subsystems.
      -      */
       #ifdef HAVE_MUTEX_SUPPORT
           /*
            * Initialize the mutex regions first.  There's no ordering requirement,
      @@ -357,7 +349,18 @@
            */
           if ((ret = __mutex_open(env, create_ok)) != 0)
                goto err;
      +     /* The MUTEX_REQUIRED() in __env_alloc() expects this to be set. */
      +     infop->mtx_alloc = ((REGENV *)infop->primary)->mtx_regenv;
       #endif
      +     if ((ret =
      +         __env_thread_init(env, F_ISSET(infop, REGION_CREATE) ? 1 : 0)) != 0)
      +          goto err;
      +
      +     ENV_ENTER(env, ip);
      +
      +     /*
      +      * Initialize the subsystems.
      +      */
           /*
            * We can now acquire/create mutexes: increment the region's reference
            * count.


      However, as the bug is somewhat tricky to reproduce, I would be glad if someone could confirm or refute that the patch actually fixes the issue.

       

      Thanks in advance,

       

      Jan Staněk <jstanek@redhat.com>