4 Replies. Latest reply: Sep 22, 2014 9:26 AM by 2739213

    Segmentation fault when using snapshot isolation with Berkeley DB 6.1.19 and 5.1.29

    2739213

      Hello,

       

      I have been experimenting with snapshot isolation in Berkeley DB, and I find that it frequently triggers a segmentation fault while write transactions are in progress. The following test program reliably reproduces the problem on Linux with either 5.1.29 or 6.1.19.

       

      https://anl.app.box.com/s/3qq2yiij2676cg3vkgik

       

      Compilation instructions are at the top of the file. The test program creates a temporary directory in /tmp, opens a new environment with the DB_MULTIVERSION flag, and spawns 8 threads. Each thread performs 100 transactional put operations using DB_TXN_SNAPSHOT. When the program crashes, the stack trace generally looks like this:

       

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7ffff7483700 (LWP 11871)]
      0x00007ffff795e190 in __memp_fput ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      (gdb) where
      #0  0x00007ffff795e190 in __memp_fput ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #1  0x00007ffff7883c30 in __bam_get_root ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #2  0x00007ffff7883dca in __bam_search ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #3  0x00007ffff7870246 in ?? () from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #4  0x00007ffff787468f in ?? () from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #5  0x00007ffff79099f4 in __dbc_iput ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #6  0x00007ffff7906c10 in __db_put ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #7  0x00007ffff79191eb in __db_put_pp ()
         from /usr/lib/x86_64-linux-gnu/libdb-5.1.so
      #8  0x0000000000400f14 in thread_fn (foo=0x0)
          at ../tests/transactional-osd/bdb-snapshot-write.c:154
      #9  0x00007ffff7bc4182 in start_thread (arg=0x7ffff7483700)
          at pthread_create.c:312
      #10 0x00007ffff757f38d in clone ()
          at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      

       

      I understand that this test program, with 8 concurrent (and deliberately conflicting) writers, is not an ideal use case for snapshot isolation, but the same crash can be triggered in other scenarios as well.

       

      You can disable snapshot isolation by toggling the value of the USE_SNAP #define near the top of the source. With snapshot isolation disabled, the test program runs without crashing.

       

      Can someone help me to identify the problem?

       

      Many thanks,

      -Phil