This discussion is archived
2 Replies Latest reply: Jun 28, 2012 1:01 AM by 944738 RSS

Can multiple threads write to the database?

944738 Newbie
Currently Being Moderated
I am a little confused from the statement in the documentation: "Berkeley DB Data Store does not support locking, and hence does not guarantee correct behavior if more than one thread of control is updating the database at a time."

1. Can multiple threads write to the "Simple Data Store"?
2. Considering the sample code below which writes to the DB using 5 threads - is there a possibility of data loss?
3. If the code will cause data loss, will adding DB_INIT_LOCK and/or DB_INIT_TXN in DBENV->open make any difference?

----
#include "stdafx.h"
#include <stdio.h>
#include <windows.h>
#include <db.h>

static DB *db = NULL;
static DB_ENV *dbEnv = NULL;

DWORD WINAPI th_write(LPVOID lpParam)
{
DBT key, data;
char key_buff[32], data_buff[32];
DWORD i;

printf("thread(%s) - start\n", lpParam);

for (i = 0; i < 200; ++i)
{
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));

sprintf(key_buff, "K:%s", lpParam);
sprintf(data_buff, "D:%s:%8d", lpParam, i);

key.data = key_buff;
key.size = strlen(key_buff);
data.data = data_buff;
data.size = strlen(data_buff);

db->put(db, NULL, &key, &data, 0);
Sleep(5);
}

printf("thread(%s) - End\n", lpParam);

return 0;
}

int main()
{
db_env_create(&dbEnv, 0);

dbEnv->open(dbEnv, NULL, DB_CREATE | DB_INIT_MPOOL | DB_THREAD, 0);

db_create(&db, dbEnv, 0);
db->open(db, NULL, "test.db", NULL, DB_BTREE, DB_CREATE, 0);

CreateThread(NULL, 0, th_write, "A", 0, 0);
CreateThread(NULL, 0, th_write, "B", 0, 0);
CreateThread(NULL, 0, th_write, "B", 0, 0);
CreateThread(NULL, 0, th_write, "C", 0, 0);
th_write("C");
Sleep(2000);
}
  • 1. Re: Can multiple threads write to the database?
    dmarega Newbie
    Currently Being Moderated
    Here some clarification about BDB Lock and Multi threads behavior

    Question 1. Can multiple threads write to the "Simple Data Store"?

    Answer 1.

    Please Refer to http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro_products.html

    A Data Store (DS) set up
    (so not using an environment or using one, but without any of the DB_INIT_LOCK, DB_INIT_TXN, DB_INIT_LOG environment regions related flags specified
    each corresponding to the appropriate subsystem, locking, transaction, logging)
    will not guard against data corruption due to accessing the same database page and overwriting the same records, corrupting the internal structure of the database etc.
    (note that in the case of the Btree, Hash and Recno access methods we lock at the database page level, only for the Queue access method we lock at record level)
    So,
    if You want to have multiple threads in the application writing concurrently or in parallel to the same database You need to use locking (and properly handle any potential deadlocks),
    otherwise You risk corrupting the data itself or the database (its internal structure).
    Of course , If You serialize at the application level the access to the database, so that no more one threads writes to the database at a time, there will be no need for locking.
    But obviously this is likely not the behavior You want.
    Hence, You need to use either a CDS (Concurrent Data Store) or TDS (Transactional Data Store) set up.

    See the table comparing the various set ups, here: http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro_products.html
    Berkeley DB Data Store

    The Berkeley DB Data Store product is an embeddable, high-performance data store. This product supports multiple concurrent threads of control, including multiple processes and multiple threads of control within a process. However, Berkeley DB Data Store does not support locking, and hence does not guarantee correct behavior if more than one thread of control is updating the database at a time. The Berkeley DB Data Store is intended for use in read-only applications or applications which can guarantee no more than one thread of control updates the database at a time.
    Berkeley DB Concurrent Data Store

    The Berkeley DB Concurrent Data Store product adds multiple-reader, single writer capabilities to the Berkeley DB Data Store product. This product provides built-in concurrency and locking feature. Berkeley DB Concurrent Data Store is intended for applications that need support for concurrent updates to a database that is largely used for reading.
    Berkeley DB Transactional Data Store

    The Berkeley DB Transactional Data Store product adds support for transactions and database recovery. Berkeley DB Transactional Data Store is intended for applications that require industrial-strength database services, including excellent performance under high-concurrency workloads of read and write operations, the ability to commit or roll back multiple changes to the database at a single instant, and the guarantee that in the event of a catastrophic system or hardware failure, all committed database changes are preserved.
    So, clearly DS is not a solution for this case, where multiple threads need to write simultaneously to the database.
    CDS (Concurrent Data Store) provides locking features, but only for multiple-reader/single-writer scenarios. You use CDS when you specify the DB_INIT_CDB flag when opening the BDB environment: http://docs.oracle.com/cd/E17076_02/html/api_reference/C/envopen.html#envopen_DB_INIT_CDB
    TDS (Transactional Data Store) provides locking features, adds complete ACID support for transactions and offers recoverability guarantees. You use TDS when you specify the DB_INIT_TXN and DB_INIT_LOG flags when opening the environment. To have locking support, you would need to also specify the DB_INIT_LOCK flag.

    Now, since the requirement is to have multiple writers (multi-threaded writes to the database),
    then TDS would be the way to go (CDS is useful only in single-writer scenarios, when there are no needs for recoverability).

    To Summarize

    The best way to have an understanding of what set up is needed, it is to answer the following questions:
    - What is the data access scenario? Is it multiple writer threads? Will the writers access the database simultaneously?
    - Are recoverability/data durability, atomicity of operations and data isolation important for the application? http://docs.oracle.com/cd/E17076_02/html/programmer_reference/transapp_why.html
    If the answers are yes, then TDS should be used, and the environment should be opened like this:
    dbEnv->open(dbEnv, ENV_HOME, DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_TXN | DB_INIT_LOG | DB_RECOVER | DB_THREAD, 0);
    (where ENV_HOME is the filesystem directory where the BDB environment will be created)




    Question 2. Considering the sample code below which writes to the DB using 5 threads - is there a possibility of data loss?

    Answer 2.
    Definitely yes, You can see data loss and/or data corruption.
    You can check the behavior of your testcase in the following way
    1. Run your testcase
    2.After the program exits
    run db_verify to verify the database (db_verify -o test.db).

    You will likely see db_verify complaining, unless the thread scheduler on Windows weirdly starts each thread one after the other,
    IOW no two or ore threads write to the database at the same time -- kind of serializing the writes




    Question 3. If the code will cause data loss, will adding DB_INIT_LOCK and/or DB_INIT_TXN in DBENV->open make any difference?

    Answer 3.
    In Your case the TDS should be used, and the environment should be opened like this:
    dbEnv->open(dbEnv, ENV_HOME, DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_TXN | DB_INIT_LOG | DB_RECOVER | DB_THREAD, 0);
    (where ENV_HOME is the filesystem directory where the BDB environment will be created)

    doing this You have proper deadlock handling in place and proper transaction usage
    so
    You are protected against potential data corruption/data loss.

    see http://docs.oracle.com/cd/E17076_02/html/gsg_txn/C/BerkeleyDB-Core-C-Txn.pdf

    Multi-threaded and Multi-process Applications
    DB is designed to support multi-threaded and multi-process applications, but their usage
    means you must pay careful attention to issues of concurrency. Transactions help your
    application's concurrency by providing various levels of isolation for your threads of control. In
    addition, DB provides mechanisms that allow you to detect and respond to deadlocks.
    Isolation means that database modifications made by one transaction will not normally be
    seen by readers from another transaction until the first commits its changes. Different threads
    use different transaction handles, so this mechanism is normally used to provide isolation
    between database operations performed by different threads.
    Note that DB supports different isolation levels. For example, you can configure your
    application to see uncommitted reads, which means that one transaction can see data that
    has been modified but not yet committed by another transaction. Doing this might mean
    your transaction reads data "dirtied" by another transaction, but which subsequently might
    change before that other transaction commits its changes. On the other hand, lowering your
    isolation requirements means that your application can experience improved throughput due
    to reduced lock contention.
    For more information on concurrency, on managing isolation levels, and on deadlock
    detection, see Concurrency (page 32).
  • 2. Re: Can multiple threads write to the database?
    944738 Newbie
    Currently Being Moderated
    dmarega, Thanks for the excellently worded and detailed response.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points