9 Replies Latest reply: Mar 4, 2014 12:27 AM by ds283

Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)

ds283 Newbie

I am using Berkeley DB XML (v. 2.5.16 and the bundled underlying Berkeley DB 4.8.26, which I suppose is now fairly old) to manage an XML database which is read by a large number (order 100) of independent worker processes communicating via MPI. These processes only read from the database; a single master process performs writes.

 

Everything works as expected with one or two worker processes. But with three or more, I am experiencing database panics with the error

 

pthread lock failed: Invalid argument

PANIC: Invalid argument
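For anyone decoding the message: "Invalid argument" is the standard text for the POSIX error code EINVAL, and pthread functions return such codes directly rather than setting errno. The following is a minimal sketch, not taken from Berkeley DB, of how a return code from pthread_mutex_lock becomes a message of this form. The helper names describe_pthread_error and lock_and_describe are hypothetical.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <pthread.h>

// Hypothetical helper: turn a pthread return code into readable text.
// pthread functions return the error number directly; they do not set errno.
std::string describe_pthread_error(int rc)
{
    return rc == 0 ? std::string("ok") : std::string(std::strerror(rc));
}

// Hypothetical helper: lock and unlock a mutex, reporting the first failure.
std::string lock_and_describe(pthread_mutex_t* mutex)
{
    int rc = pthread_mutex_lock(mutex);
    if (rc != 0) return "pthread lock failed: " + describe_pthread_error(rc);

    rc = pthread_mutex_unlock(mutex);
    if (rc != 0) return "pthread unlock failed: " + describe_pthread_error(rc);

    return "ok";
}
```

A correctly initialized mutex yields "ok". A lock call that returns EINVAL yields text of the same form as the first line of the panic output above, which suggests (but does not prove) that the library was handed a mutex it did not consider valid.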

 

From searching with Google I can see that issues arising from incorrectly setting up the environment to support concurrency are fairly common. But I have not been able to find a match for this problem, and as far as I can make out from the documentation I am using the correct combination of flags; I use DB_REGISTER and DB_RECOVER to handle the fact that multiple processes join the environment independently. Each process uses a single environment handle, and joins using

 

DB_ENV* env;

db_env_create(&env, 0);

u_int32_t env_flags = DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;

env->open(env, path to environment, env_flags, 0);

 

Although the environment requests DB_INIT_TXN, I am not currently using transactions. There is an intention to implement this later, but my understanding was that concurrent reads would function correctly without the full transaction infrastructure.

 

All workers seem to join the environment correctly, but then fail when an attempt is made to read from the database. They will all try to access the same XML document in the same container (because it gives them instructions about what work to perform). However, the worker processes open each container setting the read-only flag:

 

DbXml::XmlContainerConfig models_config;

models_config.setReadOnly(true);

DbXml::XmlContainer models = this->mgr->openContainer(path to container, models_config);

 

Following the database panic, the stack trace is

 

[lcd-ds283:27730] [ 0] 2   libsystem_platform.dylib            0x00007fff8eed35aa _sigtramp + 26

[lcd-ds283:27730] [ 1] 3   ???                                 0x0000000000000000 0x0 + 0

[lcd-ds283:27730] [ 2] 4   libsystem_c.dylib                   0x00007fff87890bba abort + 125

[lcd-ds283:27730] [ 3] 5   libc++abi.dylib                     0x00007fff83aff141 __cxa_bad_cast + 0

[lcd-ds283:27730] [ 4] 6   libc++abi.dylib                     0x00007fff83b24aa4 _ZL25default_terminate_handlerv + 240

[lcd-ds283:27730] [ 5] 7   libobjc.A.dylib                     0x00007fff89ac0322 _ZL15_objc_terminatev + 124

[lcd-ds283:27730] [ 6] 8   libc++abi.dylib                     0x00007fff83b223e1 _ZSt11__terminatePFvvE + 8

[lcd-ds283:27730] [ 7] 9   libc++abi.dylib                     0x00007fff83b21e6b _ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP17_Unwind_Exception + 0

[lcd-ds283:27730] [ 8] 10  libdbxml-2.5.dylib                  0x000000010f30e4de _ZN5DbXml18DictionaryDatabaseC2EP8__db_envPNS_11TransactionERKNSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEERKNS_15ContainerConfigEb + 1038

[lcd-ds283:27730] [ 9] 11  libdbxml-2.5.dylib                  0x000000010f2f348c _ZN5DbXml9Container12openInternalEPNS_11TransactionERKNS_15ContainerConfigEb + 1068

[lcd-ds283:27730] [10] 12  libdbxml-2.5.dylib                  0x000000010f2f2dec _ZN5DbXml9ContainerC2ERNS_7ManagerERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 492

[lcd-ds283:27730] [11] 13  libdbxml-2.5.dylib                  0x000000010f32a0af _ZN5DbXml7Manager14ContainerStore13findContainerERS0_RKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 175

[lcd-ds283:27730] [12] 14  libdbxml-2.5.dylib                  0x000000010f329f75 _ZN5DbXml7Manager13openContainerERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 101

[lcd-ds283:27730] [13] 15  libdbxml-2.5.dylib                  0x000000010f34cd46 _ZN5DbXml10XmlManager13openContainerERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERKNS_18XmlContainerConfigE + 102
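Frames 8 to 13 above are mangled C++ symbol names, and frames 2 to 7 pass through abort and the C++ terminate handler, which is consistent with an exception going uncaught. As a reading aid, here is a minimal sketch that demangles such names using abi::__cxa_demangle from the Itanium C++ ABI support library shipped with clang and gcc. The wrapper name demangle is my own.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name (the scheme used by clang and gcc).
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // __cxa_demangle allocates the result with malloc; the caller must free it.
    char* readable = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;

    std::string result(readable);
    std::free(readable);
    return result;
}
```

Applied to the symbol in frame 9, this gives a name beginning with DbXml::Container::openInternal, so the trace reads from the bottom up as XmlManager::openContainer, Manager::openContainer, ContainerStore::findContainer, the Container constructor, Container::openInternal, and finally the DictionaryDatabase constructor, where the failure surfaces.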

 

Can I ask if it's clear to anyone what I am doing wrong?

  • 1. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    LaurenFoutz Journeyer

    Since the underlying environment is enabled for transactions, you really need to also enable the container for transactions.  If the underlying environment is not enabled for transactions, then accessing with multiple read-only threads should work.

     

    Also, you need to include the flag "DB_INIT_LOCK" among your environment open flags.

     

    Lauren Foutz

  • 2. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    ds283 Newbie

    Thanks for the very rapid response.

     

    I had inadvertently edited DB_INIT_LOCK out of my quoted code, so I hope that was not the problem. I found that disabling transactions by removing DB_INIT_TXN also required removal of DB_RECOVER and DB_REGISTER, but did not fix the crash.

     

    Enabling transactions on the containers meant that I had to implement transactions for most of the other database operations, because I write documents to the database using XmlContainer::putDocumentAsEventWriter(), which requires an explicit transaction. The positive outcome is that this appears to have fixed the problem: at least, I can now run as many worker processes as I can fit on my laptop. It will take a bit longer to check whether everything scales OK to the full cluster, but at this point there seems no reason to think that it won't.

     

    Many thanks for your help, which has saved me a great deal of time in debugging.

     

    David

  • 3. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    ds283 Newbie

    Unfortunately it seems I was too hasty, and my success yesterday was simply a lucky fluke. A job will sometimes complete successfully, but mostly I am back to the problem of database panics.

     

    The database environment and XmlManager object are initialized using

     

            u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_INIT_TXN | DB_CREATE;

            if(recovery) env_flags = env_flags | DB_RECOVER;          // Only run by master process

            env->open(env, env_path.string().c_str(), env_flags, 0);

     

            mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);


    after which each worker process opens a set of containers (there is some logic in between, but nothing which touches the database)


            // open handles to database containers

            DbXml::XmlContainerConfig int_config;

            int_config.setTransactional(true);

            int_config.setReadOnly(true);

            std::cerr << "Pre call to openContainer for integrations" << std::endl;

            DbXml::XmlContainer integrations = this->mgr->openContainer(this->integrations_path.string().c_str(), int_config);

            std::cerr << "Post call to openContainer for integrations" << std::endl;

     

            DbXml::XmlContainerConfig models_config;

            models_config.setTransactional(true);

            models_config.setReadOnly(true);

            std::cerr << "Pre call to openContainer for models" << std::endl;

            DbXml::XmlContainer models = this->mgr->openContainer(this->packages_path.string().c_str(), models_config);

            std::cerr << "Post call to openContainer for models" << std::endl;

     

    The statements which emit to std::cerr are there to track which API calls complete successfully. With 7 workers, in cases where the job fails, I get

     

         Pre call to openContainer for integrations        Master task opening 'integrations' container

         Post call to openContainer for integrations       Master task successful, call returns

         Pre call to openContainer for models              Master task opening 'models' container

         Post call to openContainer for models             Master task successful, call returns

                                                           DbXml::XmlContainer objects belonging to Master process go out of scope

         Pre call to openContainer for integrations        Worker process attempts to open 'integrations' container      

         Pre call to openContainer for integrations        ...

         Pre call to openContainer for integrations

         Pre call to openContainer for integrations

         Pre call to openContainer for integrations

         Pre call to openContainer for integrations

         Pre call to openContainer for integrations        ...

                                                           in this example, no worker process sees openContainer() return

         pthread lock failed: Invalid argument             failure and database enters panic

         PANIC: Invalid argument

         PANIC: fatal region error detected; run recovery

     

    The precise sequence of events seen by the worker processes can vary. In the case above, the crash occurred before any XmlManager::openContainer() call had returned. In other cases, some of the workers may see an XmlManager::openContainer() return, but the stack backtrace shows that the crash always occurs in a subsequent call.

     

    David

  • 4. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    LaurenFoutz Journeyer

    DB_REGISTER should only be used together with DB_RECOVER.  With both flags set, recovery runs only when the environment actually needs it.  If you use DB_RECOVER without DB_REGISTER, recovery always runs on open, whether it is needed or not.  So it is safe to pass both DB_REGISTER and DB_RECOVER every time you open the environment.

     

    As for your crash: are all the libraries and processes that access the database built with pthreads enabled?  I have seen errors just like the one you are experiencing when BDB was built with pthreads, but the application was built without it.

     

    Lauren Foutz

  • 5. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    ds283 Newbie

    Dear Lauren,

     

    Thanks very much for your help, which is much appreciated.

     

    I haven't been building the application with pthread support enabled. The Berkeley DB XML library is built using the buildall.sh script with no custom settings. I've now tried building the application with -pthread (I assume this is what you had in mind), but it does not fix the crash.

     

    I can reproduce the crash behaviour using this highly stripped-down version

     

    #include <iostream>

    #include <vector>

     

    #include "dbxml/db.h"

    #include "dbxml/dbxml/DbXml.hpp"

     

    #include "boost/mpi.hpp"

     

    int main(int argc, char* argv[])

    {

      boost::mpi::environment mpi_env;

      boost::mpi::communicator mpi_world;

     

      if(mpi_world.rank() == 0)

      {

          DB_ENV* env;

          ::db_env_create(&env, 0);

     

          u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;

          env->open(env, "test", env_flags, 0);

     

          // set up XmlManager object

          DbXml::XmlManager* mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);

     

          // create containers - these will be used by the workers

          DbXml::XmlContainerConfig pkg_config;

          DbXml::XmlContainerConfig int_config;

          pkg_config.setTransactional(true);

          int_config.setTransactional(true);

     

          DbXml::XmlContainer packages = mgr->createContainer("packages.dbxml", pkg_config);

          DbXml::XmlContainer integrations = mgr->createContainer("integrations.dbxml", int_config);

     

          std::vector<boost::mpi::request> reqs(mpi_world.size()-1);

          for(unsigned int i = 1; i < mpi_world.size(); i++)

          {

               reqs[i] = mpi_world.isend(i, 0); // instruct workers to open the environment

          }

     

          // wait for all messages to be received

          boost::mpi::wait_all(reqs.begin(), reqs.end());

     

          // wait for workers to advise successful termination

          unsigned int outstanding_workers = mpi_world.size()-1;

          while(outstanding_workers > 0)

          {

               boost::mpi::status stat = mpi_world.probe();

     

               switch(stat.tag())

               {

                    case 1:    

                    {

                         mpi_world.recv(stat.source(), 1);         

                         outstanding_workers--;

                         break;

                    }

               }

          }

     

          delete mgr; // exit, closing database and environment

       }

      else

      {

          mpi_world.recv(0, 0);

     

          DB_ENV* env;

          ::db_env_create(&env, 0);

     

          u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;

          env->open(env, "test", env_flags, 0);

     

          // set up XmlManager object

          DbXml::XmlManager* mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);

     

          // open containers which were set up by the master

          DbXml::XmlContainerConfig pkg_config;

          DbXml::XmlContainerConfig int_config;

          pkg_config.setTransactional(true);

          pkg_config.setReadOnly(true);

          int_config.setTransactional(true);

          int_config.setReadOnly(true);

     

          DbXml::XmlContainer packages = mgr->openContainer("packages.dbxml", pkg_config);

          DbXml::XmlContainer integrations = mgr->openContainer("integrations.dbxml", int_config);

     

          mpi_world.isend(0, 1);

     

          delete mgr; // exit, closing database and environment

      }

     

      return(EXIT_SUCCESS);

    }

     

    and the compiler command line

     

    clang++ -I/usr/local/include -I/opt/local/include -I/opt/local/include/openmpi -L/opt/local/lib -Lpath-to-dbxml/dbxml-2.5.16/install/lib -std=c++11 -stdlib=libc++ -lboost_system-mt -lboost_mpi-mt -lboost_serialization-mt -ldb_cxx -ldbxml -lmpi -lmpi_cxx -pthread -Ofast -o dbxml-test dbxml-test.cpp

     

    Then openmpiexec -n 8 dbxml-test reproduces the database panic.

  • 6. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    LaurenFoutz Journeyer

    Thanks for posting code that reproduces the bug.  I will use it to look into the bug and get back to you soon.

     

    Lauren Foutz

  • 7. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    LaurenFoutz Journeyer

    I took your program and built it, and I get the following error in the writer after it creates the containers and begins to wait for the readers to end:

    {code}

    [adc2190095:01232] *** Process received signal ***

    [adc2190095:01232] Signal: Segmentation fault (11)

    [adc2190095:01232] Signal code: Invalid permissions (2)

    [adc2190095:01232] Failing at address: 0x49fafc

    [adc2190095:01232] [ 0] /lib/tls/i686/nosegneg/libpthread.so.0 [0x2f4e30]

    [adc2190095:01232] [ 1] dbxml-test [0x8051973]

    [adc2190095:01232] [ 2] dbxml-test [0x8051957]

    [adc2190095:01232] [ 3] dbxml-test [0x8051a37]

    [adc2190095:01232] [ 4] dbxml-test [0x8051c80]

    [adc2190095:01232] [ 5] dbxml-test [0x8051a74]

    [adc2190095:01232] [ 6] dbxml-test [0x80511fd]

    [adc2190095:01232] [ 7] /lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xd3) [0xa9ae93]

    [adc2190095:01232] [ 8] dbxml-test(__gxx_personality_v0+0x91) [0x8050d49]

    [adc2190095:01232] *** End of error message ***

    --------------------------------------------------------------------------

    mpirun noticed that process rank 0 with PID 1232 on node adc2190095 exited on signal 11 (Segmentation fault).

    --------------------------------------------------------------------------

    {/code}

     

    Though strangely the reader processes all executed and exited without a problem, despite the master crash.

     

    If I remove all the DbXml code and run the program below:

    [code]

    #include <iostream>

     

    #include <vector>

     

     

    #include "dbxml/db.h"

     

    #include "dbxml/dbxml/DbXml.hpp"

     

     

    #include "boost/mpi.hpp"

     

     

    int main(int argc, char* argv[])

    {

        boost::mpi::environment mpi_env;

        boost::mpi::communicator mpi_world;

     

        if(mpi_world.rank() == 0)

        {

         

     

            std::vector<boost::mpi::request> reqs(mpi_world.size()-1);

            for(unsigned int i = 1; i < mpi_world.size(); i++)

            {

          reqs[i] = mpi_world.isend(i, 0); //instruct workers to open the environment

            }

     

           // wait for all messages to be received

            boost::mpi::wait_all(reqs.begin(), reqs.end());

     

            //wait for workers to advise successful termination

            unsigned int outstanding_workers = mpi_world.size()-1;

            while(outstanding_workers > 0)

            {

                boost::mpi::status stat = mpi_world.probe();

                switch(stat.tag())

                {

                    case 1:

                    {

                        mpi_world.recv(stat.source(), 1);

                        outstanding_workers--;

                        break;

                    }

                }

            }

     

     

            }

        else

        {

            mpi_world.recv(0, 0);

     

     

            mpi_world.isend(0, 1);

     

     

        }

     

        return(EXIT_SUCCESS);

    }

    [/code]

     

    I get the following crash in the writer.

     

    [code]

    *** glibc detected *** double free or corruption (!prev): 0x09b9baf0 ***

    [adc2190095:01303] *** Process received signal ***

    [adc2190095:01303] Signal: Aborted (6)

    [adc2190095:01303] Signal code:  (-6)

    [adc2190095:01303] [ 0] /lib/tls/i686/nosegneg/libpthread.so.0 [0x287e30]

    [adc2190095:01303] [ 1] /lib/tls/i686/nosegneg/libc.so.6(abort+0xe9) [0x3dd6e9]

    [adc2190095:01303] [ 2] /lib/tls/i686/nosegneg/libc.so.6 [0x4109ba]

    [adc2190095:01303] [ 3] /lib/tls/i686/nosegneg/libc.so.6 [0x41745f]

    [adc2190095:01303] [ 4] /lib/tls/i686/nosegneg/libc.so.6(__libc_free+0x8a) [0x4178ea]

    [adc2190095:01303] [ 5] /usr/lib/libstdc++.so.6(_ZdlPv+0x21) [0x388081]

    [adc2190095:01303] [ 6] justmpi [0x8052edb]

    [adc2190095:01303] [ 7] justmpi [0x8052591]

    [adc2190095:01303] [ 8] justmpi [0x80519fb]

    [adc2190095:01303] [ 9] justmpi [0x8051062]

    [adc2190095:01303] [10] justmpi(__gxx_personality_v0+0x3e6) [0x8050c5e]

    [adc2190095:01303] [11] /lib/tls/i686/nosegneg/libc.so.6(__libc_start_main+0xd3) [0x3c8e93]

    [adc2190095:01303] [12] justmpi(__gxx_personality_v0+0x81) [0x80508f9]

    [adc2190095:01303] *** End of error message ***

    --------------------------------------------------------------------------

    mpirun noticed that process rank 0 with PID 1303 on node adc2190095 exited on signal 6 (Aborted).

    --------------------------------------------------------------------------

    [/code]

     

    Is it possible that the root cause of this is in the MPI code or usage?  If the writer process crashes while holding an active transaction or open database handles, it could leave the environment in an inconsistent state, and the readers would then throw a PANIC error when they notice the inconsistency.

     

    Lauren

  • 8. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    ds283 Newbie

    "Is it possible that the root problem to this is in the MPI code or usage?  Because if the writer process crashes while holding an active transaction or open database handles, it could leave the environment in an inconsistent state that would result in the readers throwing a PANIC error when they notice the inconsistent environment."

     

    Thanks for looking into this.

     

    It looks like there was a small typo in the code I quoted: the loop stored each request in reqs[i] rather than reqs[i - 1], so the last worker's request was written one element past the end of the vector. I think it was this which caused the segmentation fault or memory corruption. Although I checked a few times that the code snippet produced the expected results before posting it, I must have been unlucky that it happened not to cause a segfault on those attempts.
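To show the indexing issue in isolation, here is a minimal sketch without MPI. Worker ranks run from 1 to size - 1, while a vector holding one entry per worker has valid indices 0 to size - 2, so indexing by rank writes one element past the end for the last worker. The function names slot_for_rank and ranks_by_slot are hypothetical and are not part of the test program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: map a worker rank (1 .. world_size - 1) to its slot
// in a vector that holds one entry per worker (indices 0 .. world_size - 2).
std::size_t slot_for_rank(std::size_t rank)
{
    return rank - 1;
}

// Fill one slot per worker with that worker's rank, mirroring a loop that
// stores one request per worker. at() throws std::out_of_range on a bad
// index, whereas operator[] would silently write outside the vector.
std::vector<std::size_t> ranks_by_slot(std::size_t world_size)
{
    if (world_size == 0) return {};  // no processes, so no worker slots

    std::vector<std::size_t> slots(world_size - 1);
    for (std::size_t rank = 1; rank < world_size; ++rank)
    {
        slots.at(slot_for_rank(rank)) = rank;
    }
    return slots;
}
```

With 8 processes this fills 7 slots, one per worker. Replacing slot_for_rank(rank) with rank in the loop would make at() throw for the last worker, which is the same out-of-range write that operator[] performs silently.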

     

    This is a corrected version:

     

    #include <iostream>

    #include <cstdlib>

    #include <string>

    #include <vector>

     

    #include "dbxml/db.h"

    #include "dbxml/dbxml/DbXml.hpp"

     

    #include "boost/mpi.hpp"

     

    static std::string envname = std::string("test");

    static std::string pkgname = std::string("packages.dbxml");

    static std::string intname = std::string("integrations.dbxml");

     

    int main(int argc, char *argv[])

      {

        boost::mpi::environment  mpi_env;

        boost::mpi::communicator mpi_world;

     

        if(mpi_world.rank() == 0)

          {

            std::cerr << "-- Writer creating environment" << std::endl;

     

            DB_ENV *env;

            int dberr = ::db_env_create(&env, 0);

     

            std::cerr << "**   creation response = " << dberr << std::endl;

            if(dberr != 0) std::cerr << "**   " << ::db_strerror(dberr) << std::endl;

     

            std::cerr << "-- Writer opening environment" << std::endl;

     

            u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;

            dberr = env->open(env, envname.c_str(), env_flags, 0);

     

            std::cerr << "**   opening response = " << dberr << std::endl;

            if(dberr != 0) std::cerr << "**   " << ::db_strerror(dberr) << std::endl;

     

            // set up XmlManager object

            DbXml::XmlManager *mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);

     

            // create containers - these will be used by the workers

            DbXml::XmlContainerConfig pkg_config;

            DbXml::XmlContainerConfig int_config;

            pkg_config.setTransactional(true);

            int_config.setTransactional(true);

     

            std::cerr << "-- Writer creating containers" << std::endl;

     

            DbXml::XmlContainer packages       = mgr->createContainer(pkgname.c_str(), pkg_config);

            DbXml::XmlContainer integrations   = mgr->createContainer(intname.c_str(), int_config);

     

            std::cerr << "-- Writer instructing workers" << std::endl;

     

            std::vector<boost::mpi::request> reqs(mpi_world.size() - 1);

            for(unsigned int i = 1; i < mpi_world.size(); i++)

              {

                reqs[i - 1] = mpi_world.isend(i, 0); // instruct workers to open the environment

              }

     

            // wait for all messages to be received

            boost::mpi::wait_all(reqs.begin(), reqs.end());

     

            std::cerr << "-- Writer waiting for termination responses" << std::endl;

     

            // wait for workers to advise successful termination

            unsigned int outstanding_workers = mpi_world.size() - 1;

            while(outstanding_workers > 0)

              {

                boost::mpi::status stat = mpi_world.probe();

     

                switch(stat.tag())

                  {

                    case 1:

                      {

                        mpi_world.recv(stat.source(), 1);

                        outstanding_workers--;

                        break;

                      }

                  }

              }

     

            delete mgr; // exit, closing database and environment

          }

        else

          {

            mpi_world.recv(0, 0);

     

            std::cerr << "++ Reader " << mpi_world.rank() << " beginning work" << std::endl;

     

            DB_ENV *env;

            ::db_env_create(&env, 0);

     

            u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;

            env->open(env, envname.c_str(), env_flags, 0);

     

            // set up XmlManager object

            DbXml::XmlManager *mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);

     

            // open containers which were set up by the master

            DbXml::XmlContainerConfig pkg_config;

            DbXml::XmlContainerConfig int_config;

            pkg_config.setTransactional(true);

            pkg_config.setReadOnly(true);

            int_config.setTransactional(true);

            int_config.setReadOnly(true);

     

            DbXml::XmlContainer packages     = mgr->openContainer(pkgname.c_str(), pkg_config);

            DbXml::XmlContainer integrations = mgr->openContainer(intname.c_str(), int_config);

     

            mpi_world.isend(0, 1);

     

            delete mgr; // exit, closing database and environment

          }

     

        return (EXIT_SUCCESS);

      }

     

    This repeatably causes the crash on OS X Mavericks 10.9.1. Also, I have checked that it repeatably causes the crash on a virtualized OS X Mountain Lion 10.8.5. But I do not see any crashes on a virtualized Ubuntu 13.10. My full code likewise works as expected with a large number of readers under the virtualized Ubuntu. I am compiling with clang and libc++ on OS X, and gcc 4.8.1 and libstdc++ on Ubuntu, but using openmpi in both cases. Edit: I have also compiled with clang and libc++ on Ubuntu, and it works equally well.

     

    Because the virtualized OS X experiences the crash, I hope the fact that it works on Ubuntu is not just an artefact of virtualization. (Unfortunately I don't currently have a physical Linux machine with which to check.) In that case the implication would seem to be that it's an OS X-specific problem. 2nd edit (14 Feb 2014): I have now managed to test on a physical Linux cluster, and it appears to work as expected. Therefore it does appear to be an OS X-specific issue.

     

    In either OS X 10.8 or 10.9, the crash produces this result:

     

    -- Writer creating environment

    **   creation response = 0

    -- Writer opening environment

    **   opening response = 0

    -- Writer creating containers

    ++ Reader 7 beginning work

    -- Writer instructing workers

    -- Writer waiting for termination responses

    ++ Reader 1 beginning work

    ++ Reader 2 beginning work

    ++ Reader 3 beginning work

    ++ Reader 4 beginning work

    ++ Reader 5 beginning work

    ++ Reader 6 beginning work

    pthread lock failed: Invalid argument

    PANIC: Invalid argument

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    PANIC: fatal region error detected; run recovery

    libc++abi.dylib: terminate called throwing an exception

    [mountainlion-test-rig:00319] *** Process received signal ***

    [mountainlion-test-rig:00319] Signal: Abort trap: 6 (6)

    [mountainlion-test-rig:00319] Signal code:  (0)

     

    David

     

    Message was edited by: ds283

  • 9. Re: Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)
    ds283 Newbie

    Thanks for all your help so far!

     

    I was just wondering whether there was a possibility of checking whether this is a reproducible issue on OS X, and if so whether there is any prospect of patching DB XML to avoid it. If it's not possible then we will have to migrate our code to a different database backend, and it would be helpful to know whether we should plan for that.
