This content has been marked as final. Show 2 replies
In this scenario it seems that a single database handle will not be accessed by more than one thread at a time. In other words, one thread will only perform operations on certain databases, and those databases will only be accessed and modified by this thread.
The way transactions should access the databases, in what order, what particular operations you should perform, whether you should only access a single database, depends on the application needs. For example, if certain operations that should be part of the same set of changes, and which should be performed atomically, hence within a transaction, imply changes to multiple databases, then it is clear that one transaction cannot solely operate on a single database.
For instance holding an open transaction and trying to get another one during election (where the environment is locked).This indicates that you are using HA / Replication. Most likely Replication Manager (repmgr) in BDB. During an election, actually after an election is completed, if the role needs to be changed (master -> client, client -> master), repmgr will lock-out the environment. Repmgr also locks-out the environment when there is a a major system change (recovery, internal init or role change). When locking-out the environment BDB will wait for existing active transactions to complete (completing a transaction means committing or aborting it). While BDB waits for these active transactions to be completed/resolved, new transactions cannot be started (the lockout prevents new transactions and cursors from starting, until all existing active transactions and cursors have been resolved / completed).
This is why you see new transactions waiting. If you are concerned with this waiting, that new transactions cannot be started and have to wait, then you can avoid the waiting by setting the DB_REP_CONF_NOWAIT flag from DbEnv::rep_set_config():
If you set the DB_REP_CONF_NOWAIT flag, then the DbEnv::txn_begin() call will return with DB_REP_LOCKOUT instead of waiting.
However, you would still need to resolve the existing active transactions. It might be the case that this these transactions are just sitting around, without completing. You need to inspect the application code and see what these transactions are doing.
If they are in-progress, any further BDB API call, like Db::put() or Db::get(), will return DB_LOCK_DEADLOCK so that the application will immediately resolve the txns that are holding everything up. Thus, even long running txns should resolve quickly, on the next API usage hopefully.
Is it right to say that the best way to use transactions is to hold one each time for all DBs within environment?This sounds like bad design, specifically because I doubt the operations that the transaction should perform atomically are all affecting all the databases in the environment. Doing things this way certainly is a way of degrading performance.
Does multithread like described above can harm that?
You should use transactions as before, and work on correctly identifying the places in the application where deadlocks are happening and handle the deadlocks properly.
But again, what operations you enclose in a transaction depends on what changes need to be performed atomically and on what databases.
As reference, you can review the following chapters in the BDB Reference Guide:
Can nested transactions be a solution approach here? If so how?Nested / child transactions are explained here:
I do not see how nested transactions would help here. They are useful when there is a need to decompose a large long running txn into smaller pieces, which enclose changes that can be independently (independent of the other changes in the parent txn) aborted if needed. Nested transactions inherit the locks from the parent txn so the locking issues will still exist.
I doubt that you do this, but just mention it. A transaction may only span multiple threads if the access to the txn handle is serialized (the txn handle cannot be used simultaneously in more than one thread at a time).
Hope this helps
Hi, thanks for the detailed answer.
1. If we can't modify one txn handle in more than one thread and also can't modify DB handle in more than one thread this implies that
all txns that operates on more than one DB must work in a single thread. This affects the performance of the application.
Our intention on modified each DB in a separated thread was to cope with high rate of DB operations - loosing automicity on operations to be done in more than one DB.
What is the recommended solution?
We have Replication environment on 2 machines cluster, with repmgr. We do use DB_REP_CONF_NOWAIT and handle
DB_REP_LOCKOUT and DB_LOCK_DEADLOCK errors as following:
Since we don't know on application when election/sync will start (or any operation that may lock the env, in version 5.1.19 there is no event to tell that) once we get the above errors we abort all opened txns and forbid getting new ones until election/sync completes or 20 sec of timeout. We forbid DB operations like put del etc as well.
When we get the above errors during election and despite the handling, the machine which holds the Master role gives up his role and become Client. So the new machine gets to be the Master even before the DB was totally replicated to it from the first Master.
Therefor this is a critical issue.
How to prevent that to happen?
Additional info: for each DB we got a dedicated thread that make operations on it, in Master machine only.
the open/close of env and DBs is done by the main thread.
we got also a checkpoint thread that clean unused logs (log_archive) every 30 sec and calls get_site_list to determinize connection in the cluster every 1 sec.