1 Reply Latest reply on Jan 4, 2017 3:14 PM by userBDBDMS-Oracle

    Berkeley DB hangs after updating to newer glibc

    pkubat

      Hi everyone,

       

      I recently hit an issue with Berkeley DB (specifically version 5.3.28, though my guess is that it is present even in the latest 6.x versions): after updating to a newer version of glibc on a Fedora machine, applications using Berkeley DB hang with the following backtrace:

       

      #0 0x00007fd8520f82c1 in futex_wait (private=<optimized out>, expected=4294967295, futex_word=0x7fd842759c04) at ../sysdeps/unix/sysv/linux/futex-internal.h:61

      #1 futex_wait_simple (private=<optimized out>, expected=4294967295, futex_word=0x7fd842759c04) at ../sysdeps/nptl/futex-internal.h:135

      #2 __pthread_cond_destroy (cond=cond@entry=0x7fd842759be0) at pthread_cond_destroy.c:54

      #3 0x00007fd846d969cf in __db_pthread_mutex_destroy (env=env@entry=0x55ac4daae720, mutex=mutex@entry=340) at ../../src/mutex/mut_pthread.c:757

      #4 0x00007fd846d9610f in __db_tas_mutex_destroy (env=env@entry=0x55ac4daae720, mutex=mutex@entry=340) at ../../src/mutex/mut_tas.c:602

      #5 0x00007fd846e4dda8 in __mutex_free_int (env=0x55ac4daae720, locksys=locksys@entry=1, indxp=indxp@entry=0x7fd84036f550) at ../../src/mutex/mut_alloc.c:248

       

      This is caused by a change in the internal representation of condition variables that came with the glibc update (upstream commit). As far as I know, Berkeley DB does not destroy or reinitialize its pthreads synchronization primitives after the last process using the environment exits (which seems to be undefined behavior according to POSIX). As a result, the next process to open the environment interprets a condition variable that was initialized under the old definition using the newer definition, whose internal representation has changed.

      In my case, this is the unused condition variable that caused the aforementioned hang:

       

      older representation:

      {__data = {__lock = 0, __futex = 0, __total_seq = 0,

        __wakeup_seq = 0, __woken_seq = 0, __mutex =

        0xffffffffffffffff, __nwaiters = 0, __broadcast_seq = 0},

       

      newer representation:

      {__data = {{__wseq = 0, __wseq32 = {__low = 0, __high = 0}}, {

        __g1_start = 0, __g1_start32 = {__low = 0, __high = 0}},

        __g_refs = {0, 0}, __g_size = {0, 0},

        __g1_orig_size = 4294967295, __wrefs = 4294967295,

        __g_signals = {0, 0}},
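Comparing the two dumps suggests why the destroy call never returns: the all-ones `__mutex` field of the older layout lands exactly on `__g1_orig_size` and `__wrefs` in the newer one, and 4294967295 is also the `expected` value in the backtrace. The following is only a rough sketch of that overlap. The struct and function names are hypothetical, the field widths assume x86-64, and the "upper bits of `__wrefs` count waiter references" rule is my reading of the newer glibc sources rather than a documented interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order copied from the "older representation" dump.
   Widths are assumptions for x86-64; these are NOT the real glibc types. */
struct old_cond {
    int32_t  lock;
    uint32_t futex;
    uint64_t total_seq;
    uint64_t wakeup_seq;
    uint64_t woken_seq;
    uint64_t mutex;          /* pointer-sized; 0xffffffffffffffff in the dump */
    uint32_t nwaiters;
    uint32_t broadcast_seq;
};

/* Field order copied from the "newer representation" dump (same caveats). */
struct new_cond {
    uint64_t wseq;
    uint64_t g1_start;
    uint32_t g_refs[2];
    uint32_t g_size[2];
    uint32_t g1_orig_size;
    uint32_t wrefs;
    uint32_t g_signals[2];
};

/* Read bytes that were written under the old layout through the new layout. */
static struct new_cond reinterpret_old(const struct old_cond *oldc)
{
    struct new_cond newc;
    memcpy(&newc, oldc, sizeof newc);   /* both layouts are 48 bytes here */
    return newc;
}

/* Assumed wait condition of the newer destroy path: the low 3 bits of
   wrefs are flags, anything above them counts as outstanding waiters. */
static bool destroy_would_wait(uint32_t wrefs)
{
    return (wrefs >> 3) != 0;
}
```

Filling an `old_cond` with zeros, setting `mutex` to all ones, and passing it through `reinterpret_old` gives `g1_orig_size` and `wrefs` of 4294967295, for which `destroy_would_wait` is true even though nobody is waiting on the variable.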

       

      Is this something that can be fixed on the Berkeley DB side (e.g. by forcing re-initialization of the synchronization primitives when no other process is accessing the environment), or can it be worked around by any means other than rebuilding the old DB environment?
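To make the re-initialization idea concrete, here is a minimal sketch using only standard pthreads calls. It is not Berkeley DB code: `reinit_shared_cond` is a hypothetical helper, and it assumes the caller has already established that no other process is attached to the environment, since overwriting a condition variable that is still in use would itself be invalid.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical helper: discard whatever bytes an older libc left behind
   and initialize the condition variable afresh as process-shared.
   Assumption: no other process or thread is using *cond at this point. */
static int reinit_shared_cond(pthread_cond_t *cond)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) {
        /* Do not call pthread_cond_destroy on the stale bytes;
           that is the call that hangs in the backtrace. */
        memset(cond, 0, sizeof *cond);
        rc = pthread_cond_init(cond, &attr);
    }

    pthread_condattr_destroy(&attr);
    return rc;
}
```

With something along these lines run on the first attach to an otherwise unused environment, a later `pthread_cond_destroy` would operate on a structure laid out by the libc that is actually loaded.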

       

      Thanks, Petr