0 Replies Latest reply: Jul 27, 2012 3:13 PM by 952529

BDB 4.2.52: Threads hang waiting for mutex

952529 Newbie
We have hit a condition in which multiple application threads hang waiting for mutex locks. This is a telecom system that had been running fine for many months before we ran into this issue. Twelve threads are stuck; their backtraces show the following:

11 threads are stuck here:

#0 0xb7cfe1d7 in ?? () from /lib/libc.so.6
#1 0x08195b24 in __os_sleep (dbenv=0x0, secs=0, usecs=10000) at ../../mvl-cge-3.1/src/dist/../os/os_sleep.c:84
#2 0x08195c3a in __os_yield (dbenv=0x0, usecs=10000) at ../../mvl-cge-3.1/src/dist/../os/os_spin.c:112
#3 0x081a045e in __db_tas_mutex_lock (dbenv=0x84a3dc8, mutexp=0x97b3ae90) at ../../mvl-cge-3.1/src/dist/../mutex/mut_tas.c:169
#4 0x0817d156 in __kw_tas_mutex_lock (a=0x84a3dc8, b=0x97b3ae90) at ../../mvl-cge-3.1/src/dist/../dbinc/mutex.h:882
#5 0x0817ee7d in __lock_get_internal (lt=0x84dfbb8, locker=2627936712, flags=0, obj=0x88c2bc4, lock_mode=DB_LOCK_WRITE, timeout=0, lock=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../lock/lock.c:990
#6 0x0817e0da in __lock_get (dbenv=0x84a3dc8, locker=2627936712, flags=0, obj=0x88c2bc4, lock_mode=DB_LOCK_WRITE, lock=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../lock/lock.c:586
#7 0x081eec04 in __db_lget (dbc=0x88c2b58, action=0, pgno=4, mode=DB_LOCK_WRITE, lkflags=0, lockp=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../db/db_meta.c:459
#8 0x0821906d in __ham_lock_bucket (dbc=0x88c2b58, mode=DB_LOCK_WRITE) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:1659
#9 0x08218e08 in __ham_get_cpage (dbc=0x88c2b58, mode=DB_LOCK_WRITE) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:1572
#10 0x082148a8 in __ham_item_next (dbc=0x88c2b58, mode=DB_LOCK_WRITE, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:386
#11 0x0820edf6 in __ham_lookup (dbc=0x88c2b58, key=0x914fe500, sought=0, mode=DB_LOCK_WRITE, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash.c:1706
#12 0x0820bbc5 in __ham_c_get (dbc=0x88c2b58, key=0x914fe500, data=0x914fe050, flags=28, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash.c:478
#13 0x081e1d5b in __db_c_get (dbc_arg=0x88ab2b0, key=0x914fe500, data=0x914fe050, flags=28) at ../../mvl-cge-3.1/src/dist/../db/db_cam.c:643
#14 0x081d9b98 in __db_del (dbp=0x88aabf0, txn=0x8a07f20, key=0x914fe500, flags=0) at ../../mvl-cge-3.1/src/dist/../db/db_am.c:533
#15 0x081e992f in __db_del_pp (dbp=0x88aabf0, txn=0x8a07f20, key=0x914fe500, flags=0) at ../../mvl-cge-3.1/src/dist/../db/db_iface.c:444

1 thread is stuck here:

#0 0xb7cfe1d7 in ?? () from /lib/libc.so.6
#1 0x08195b24 in __os_sleep (dbenv=0x0, secs=0, usecs=25000) at ../../mvl-cge-3.1/src/dist/../os/os_sleep.c:84
#2 0x08195c3a in __os_yield (dbenv=0x0, usecs=25000) at ../../mvl-cge-3.1/src/dist/../os/os_spin.c:112
#3 0x081a045e in __db_tas_mutex_lock (dbenv=0x84a3dc8, mutexp=0x995e6460) at ../../mvl-cge-3.1/src/dist/../mutex/mut_tas.c:169
#4 0x08192c95 in __kw_tas_mutex_lock (a=0x84a3dc8, b=0x995e6460) at ../../mvl-cge-3.1/src/dist/../dbinc/mutex.h:882
#5 0x08192f84 in __memp_sync_int (dbenv=0x84a3dc8, dbmfp=0x0, trickle_max=0, op=DB_SYNC_CACHE, wrotep=0x0) at ../../mvl-cge-3.1/src/dist/../mp/mp_sync.c:247
#6 0x08192bc8 in __memp_sync (dbenv=0x84a3dc8, lsnp=0x0) at ../../mvl-cge-3.1/src/dist/../mp/mp_sync.c:99
#7 0x0819ab25 in __txn_checkpoint (dbenv=0x84a3dc8, kbytes=0, minutes=0, flags=0) at ../../mvl-cge-3.1/src/dist/../txn/txn.c:1387
#8 0x0819a853 in __txn_checkpoint_pp (dbenv=0x84a3dc8, kbytes=0, minutes=0, flags=0) at ../../mvl-cge-3.1/src/dist/../txn/txn.c:1286


Dumping mutexp at frame #3 in every stuck thread shows tas set to 0x1:

(gdb) p/x *mutexp
$2 = {tas = 0x1, locked = 0x0, mutex_set_wait = 0x0, mutex_set_nowait = 0x0, mutex_set_spin = 0x0, mutex_set_spins = 0x0, flags = 0xc}
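
For readers unfamiliar with test-and-set mutexes, the retry pattern visible in frames #1 to #3 can be sketched roughly as below. This is only an illustration, not Berkeley DB's code: sketch_mutex and its functions are hypothetical names, and the 1 ms starting delay and the attempt limit are assumptions. The 10000 and 25000 usec arguments in the traces look consistent with a capped backoff, but that is an inference from the backtraces. The point of the sketch is that a waiter loops for as long as tas stays at 1, so if the holder never clears it (for example, because it died while holding the lock), an unbounded version of this loop never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a test-and-set mutex; NOT Berkeley DB's layout. */
typedef struct {
    atomic_int tas; /* 1 while held, 0 when free */
} sketch_mutex;

static void sketch_mutex_init(sketch_mutex *m) {
    atomic_init(&m->tas, 0);
}

/* One test-and-set attempt: returns true if this caller took the lock. */
static bool sketch_mutex_trylock(sketch_mutex *m) {
    return atomic_exchange_explicit(&m->tas, 1, memory_order_acquire) == 0;
}

static void sketch_mutex_unlock(sketch_mutex *m) {
    assert(atomic_load(&m->tas) == 1); /* unlocking a free mutex is a bug */
    atomic_store_explicit(&m->tas, 0, memory_order_release);
}

static void sketch_sleep_usecs(long usecs) {
    struct timespec ts;
    ts.tv_sec = usecs / 1000000L;
    ts.tv_nsec = (usecs % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Retry with a doubling sleep capped at max_usecs. Unlike an unbounded
 * loop, this gives up after max_attempts so the caller can report the
 * stuck mutex instead of hanging forever.
 */
static bool sketch_mutex_lock_bounded(sketch_mutex *m, long max_usecs,
                                      int max_attempts) {
    long delay = 1000; /* assumed starting delay of 1 ms */
    for (int i = 0; i < max_attempts; i++) {
        if (sketch_mutex_trylock(m))
            return true;
        sketch_sleep_usecs(delay);
        delay *= 2;
        if (delay > max_usecs)
            delay = max_usecs;
    }
    return false;
}
```

In this sketch, a second call to sketch_mutex_lock_bounded on a mutex that is already held returns false once the attempts run out. A loop with no such limit would sit in the sleep call indefinitely, much as the stuck threads above sit in __os_sleep.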


I see a few discussions of similar conditions here, but I didn't see any solution proposed. Does this look like a BDB bug? Any hints would be much appreciated.

Note: We didn't have access to the db files when this occurred. We will try to get them if it happens again.

Thanks,
Peter
