2 Replies Latest reply: Jun 28, 2012 3:01 AM by 944738 RSS

    Can multiple threads write to the database?

    944738
      I am a little confused from the statement in the documentation: "Berkeley DB Data Store does not support locking, and hence does not guarantee correct behavior if more than one thread of control is updating the database at a time."

      1. Can multiple threads write to the "Simple Data Store"?
      2. Considering the sample code below which writes to the DB using 5 threads - is there a possibility of data loss?
      3. If the code will cause data loss, will adding DB_INIT_LOCK and/or DB_INIT_TXN in DBENV->open make any difference?

      ----
      #include "stdafx.h"
      #include <stdio.h>
      #include <windows.h>
      #include <db.h>

      static DB *db = NULL;
      static DB_ENV *dbEnv = NULL;

      DWORD WINAPI th_write(LPVOID lpParam)
      {
      DBT key, data;
      char key_buff[32], data_buff[32];
      DWORD i;

      printf("thread(%s) - start\n", lpParam);

      for (i = 0; i < 200; ++i)
      {
      memset(&key, 0, sizeof(key));
      memset(&data, 0, sizeof(data));

      sprintf(key_buff, "K:%s", lpParam);
      sprintf(data_buff, "D:%s:%8d", lpParam, i);

      key.data = key_buff;
      key.size = strlen(key_buff);
      data.data = data_buff;
      data.size = strlen(data_buff);

      db->put(db, NULL, &key, &data, 0);
      Sleep(5);
      }

      printf("thread(%s) - End\n", lpParam);

      return 0;
      }

      int main()
      {
      db_env_create(&dbEnv, 0);

      dbEnv->open(dbEnv, NULL, DB_CREATE | DB_INIT_MPOOL | DB_THREAD, 0);

      db_create(&db, dbEnv, 0);
      db->open(db, NULL, "test.db", NULL, DB_BTREE, DB_CREATE, 0);

      CreateThread(NULL, 0, th_write, "A", 0, 0);
      CreateThread(NULL, 0, th_write, "B", 0, 0);
      CreateThread(NULL, 0, th_write, "B", 0, 0);
      CreateThread(NULL, 0, th_write, "C", 0, 0);
      th_write("C");
      Sleep(2000);
      }
        • 1. Re: Can multiple threads write to the database?
          dmarega-Oracle
          Here some clarification about BDB Lock and Multi threads behavior

          Question 1. Can multiple threads write to the "Simple Data Store"?

          Answer 1.

          Please Refer to http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro_products.html

          A Data Store (DS) set up
          (so not using an environment or using one, but without any of the DB_INIT_LOCK, DB_INIT_TXN, DB_INIT_LOG environment regions related flags specified
          each corresponding to the appropriate subsystem, locking, transaction, logging)
          will not guard against data corruption due to accessing the same database page and overwriting the same records, corrupting the internal structure of the database etc.
          (note that in the case of the Btree, Hash and Recno access methods we lock at the database page level, only for the Queue access method we lock at record level)
          So,
          if You want to have multiple threads in the application writing concurrently or in parallel to the same database You need to use locking (and properly handle any potential deadlocks),
          otherwise You risk corrupting the data itself or the database (its internal structure).
          Of course , If You serialize at the application level the access to the database, so that no more one threads writes to the database at a time, there will be no need for locking.
          But obviously this is likely not the behavior You want.
          Hence, You need to use either a CDS (Concurrent Data Store) or TDS (Transactional Data Store) set up.

          See the table comparing the various set ups, here: http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro_products.html
          Berkeley DB Data Store

          The Berkeley DB Data Store product is an embeddable, high-performance data store. This product supports multiple concurrent threads of control, including multiple processes and multiple threads of control within a process. However, Berkeley DB Data Store does not support locking, and hence does not guarantee correct behavior if more than one thread of control is updating the database at a time. The Berkeley DB Data Store is intended for use in read-only applications or applications which can guarantee no more than one thread of control updates the database at a time.
          Berkeley DB Concurrent Data Store

          The Berkeley DB Concurrent Data Store product adds multiple-reader, single writer capabilities to the Berkeley DB Data Store product. This product provides built-in concurrency and locking feature. Berkeley DB Concurrent Data Store is intended for applications that need support for concurrent updates to a database that is largely used for reading.
          Berkeley DB Transactional Data Store

          The Berkeley DB Transactional Data Store product adds support for transactions and database recovery. Berkeley DB Transactional Data Store is intended for applications that require industrial-strength database services, including excellent performance under high-concurrency workloads of read and write operations, the ability to commit or roll back multiple changes to the database at a single instant, and the guarantee that in the event of a catastrophic system or hardware failure, all committed database changes are preserved.
          So, clearly DS is not a solution for this case, where multiple threads need to write simultaneously to the database.
          CDS (Concurrent Data Store) provides locking features, but only for multiple-reader/single-writer scenarios. You use CDS when you specify the DB_INIT_CDB flag when opening the BDB environment: http://docs.oracle.com/cd/E17076_02/html/api_reference/C/envopen.html#envopen_DB_INIT_CDB
          TDS (Transactional Data Store) provides locking features, adds complete ACID support for transactions and offers recoverability guarantees. You use TDS when you specify the DB_INIT_TXN and DB_INIT_LOG flags when opening the environment. To have locking support, you would need to also specify the DB_INIT_LOCK flag.

          Now, since the requirement is to have multiple writers (multi-threaded writes to the database),
          then TDS would be the way to go (CDS is useful only in single-writer scenarios, when there are no needs for recoverability).

          To Summarize

          The best way to have an understanding of what set up is needed, it is to answer the following questions:
          - What is the data access scenario? Is it multiple writer threads? Will the writers access the database simultaneously?
          - Are recoverability/data durability, atomicity of operations and data isolation important for the application? http://docs.oracle.com/cd/E17076_02/html/programmer_reference/transapp_why.html
          If the answers are yes, then TDS should be used, and the environment should be opened like this:
          dbEnv->open(dbEnv, ENV_HOME, DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_TXN | DB_INIT_LOG | DB_RECOVER | DB_THREAD, 0);
          (where ENV_HOME is the filesystem directory where the BDB environment will be created)




          Question 2. Considering the sample code below which writes to the DB using 5 threads - is there a possibility of data loss?

          Answer 2.
          Definitely yes, You can see data loss and/or data corruption.
          You can check the behavior of your testcase in the following way
          1. Run your testcase
          2.After the program exits
          run db_verify to verify the database (db_verify -o test.db).

          You will likely see db_verify complaining, unless the thread scheduler on Windows weirdly starts each thread one after the other,
          IOW no two or ore threads write to the database at the same time -- kind of serializing the writes




          Question 3. If the code will cause data loss, will adding DB_INIT_LOCK and/or DB_INIT_TXN in DBENV->open make any difference?

          Answer 3.
          In Your case the TDS should be used, and the environment should be opened like this:
          dbEnv->open(dbEnv, ENV_HOME, DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_TXN | DB_INIT_LOG | DB_RECOVER | DB_THREAD, 0);
          (where ENV_HOME is the filesystem directory where the BDB environment will be created)

          doing this You have proper deadlock handling in place and proper transaction usage
          so
          You are protected against potential data corruption/data loss.

          see http://docs.oracle.com/cd/E17076_02/html/gsg_txn/C/BerkeleyDB-Core-C-Txn.pdf

          Multi-threaded and Multi-process Applications
          DB is designed to support multi-threaded and multi-process applications, but their usage
          means you must pay careful attention to issues of concurrency. Transactions help your
          application's concurrency by providing various levels of isolation for your threads of control. In
          addition, DB provides mechanisms that allow you to detect and respond to deadlocks.
          Isolation means that database modifications made by one transaction will not normally be
          seen by readers from another transaction until the first commits its changes. Different threads
          use different transaction handles, so this mechanism is normally used to provide isolation
          between database operations performed by different threads.
          Note that DB supports different isolation levels. For example, you can configure your
          application to see uncommitted reads, which means that one transaction can see data that
          has been modified but not yet committed by another transaction. Doing this might mean
          your transaction reads data "dirtied" by another transaction, but which subsequently might
          change before that other transaction commits its changes. On the other hand, lowering your
          isolation requirements means that your application can experience improved throughput due
          to reduced lock contention.
          For more information on concurrency, on managing isolation levels, and on deadlock
          detection, see Concurrency (page 32).
          • 2. Re: Can multiple threads write to the database?
            944738
            dmarega, Thanks for the excellently worded and detailed response.