Hi BDB experts,
I am writing a database HA application based on BDB version 4.6.21. Two daemons run on two machines: one as the master, which reads and writes the db, and one as the backup, which only reads the db. My question: when is it safe to call db->open on the backup machine?
Since the backup first syncs the db from the master, and the sync can take several minutes for a big db (as seen in my tests), db->open often fails with "DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock" while the sync is still in progress. The only method I have at the moment is to loop on calling db->open. Is there a better way?
Your code needs to be able to handle a possible DB_LOCK_DEADLOCK return code from the DB->open() method regardless of replication. When you get this return value, you need to close your cursors and abort any transaction you have explicitly started. The usual course of action is to retry the operation.
If you want to reduce this polling on your backup site and also protect against reading possibly stale or inconsistent data during sync, you can create an event handler (DB_ENV->set_event_notify()) and wait until you have seen the DB_EVENT_REP_STARTUPDONE event before attempting to open your databases.
About the DB_EVENT_REP_STARTUPDONE event: in my tests I found that sometimes the replica did not get this event when it started up. As this happens rarely, I have not worked out how to reproduce it. When it happened, the only information I got from BDB was the following:
CLIENT: ../client_db/ rep_send_message: msgv = 4 logv 13 gen = 2 eid -1, type alive_req, LSN  nogroup nobuf
CLIENT: ../client_db/ rep_process_message: msgv = 4 logv 13 gen = 2 eid 2, type alive, LSN 
CLIENT: Received ALIVE egen of 3, mine 3
Usually when the replica starts, much more information is produced, ending with something like "CLIENT: Start-up is done".
Could you help explain why this happened?
A restarting client site sends an ALIVE_REQ message when it had previously been started as a client (without an intervening recovery) and it thinks it has a valid master. Under these circumstances, it is simply rejoining the replication group and it might not generate a STARTUPDONE event. The restarting client requires some master activity (i.e. new master transactions) to verify that it is up-to-date with the master. After this master activity, if the client determines that there is now a "different" master (either the same master site has been restarted or there is a different master site), it performs a client synchronization and this will generate a STARTUPDONE event.
If your application needs to guarantee a STARTUPDONE event every time the client starts up, one way to do this would be to make sure you always perform a recovery on the client before restarting replication. You can perform the recovery using the db_recover standalone utility or by using the DB_RECOVER flag when opening the environment. This clears the internal state recording that the site was already a client and which master it knew about.
I coded it so that the client always opens the environment with the DB_RECOVER flag from the beginning. I tested many times and found that this cannot guarantee a STARTUPDONE event or that a recovery is always performed. Is there something else that should be done? The client has to open the database for reads. If there is no STARTUPDONE event, is there some other good time to open the replica db?
I'm not sure why DB_RECOVER would sometimes not perform a recovery in 4.6. There could be valid reasons for this and I'll see if I can find out more.
You don't need to wait for client sync to finish before opening your databases. You can simply open them after a successful rep_start() call. In the ex_rep example we open the database after the rep_start() call without tying it to the STARTUPDONE event. We try to open the database in a loop to allow for the fact that it may not exist yet if this is the first time the client is starting up.
But if you really would like to avoid opening your database until the client has completed its sync, there is a different test you can try. You can check the st_startup_complete statistic using the rep_stat() call. Its value is 0 during a client sync and gets set to 1 when the client sync completes. In cases where there is no STARTUPDONE event, I believe the st_startup_complete value will already be 1.
Thank you for the help. I will use st_startup_complete to check the replica state. What will happen if the client/replica exits abnormally (e.g., receives a signal or loses power) before the sync completes? Does the client get part of the db, so that those records can be read? And what will happen if, after the previous client exited with its sync unfinished, its home directory is opened as the master and the previous master's env path is opened as the client?
What will happen if the client/replica exits abnormally (e.g., receives a signal or loses power) before the sync completes? Does the client get part of the db, so that those records can be read?
The st_startup_complete stat value is set to 0 at the beginning of a sync and it will still be 0 if the system goes down before the sync is complete. If the system goes down during a sync, we have logic when we restart to detect the incomplete sync and clean it up before starting a new sync. The fragments of the incomplete sync are most likely inconsistent, which is why we clean them up.
What will happen if, after the previous client exited with its sync unfinished, its home directory is opened as the master and the previous master's env path is opened as the client?
I need to create a more specific example here. You have a master A and a client B that goes down during its client sync. Then A goes down as well? Then you restart B as a master and A as a client?
In this case restarting B as a master could be very bad because B might be in an inconsistent state and we would clean it up, leaving you with an empty master. If you then start A as a client, all of its previous data would be cleaned up so that it can sync with your now empty master B.
One reason to use elections if at all possible is to prevent cases like this. If there was an election between A and B in the case above, we would detect if B is in an inconsistent state and make sure it doesn't win the election.
If you can't use elections, you need to be aware of this possibility when designing your logic for appointing a master.
Can rep_start be called multiple times on the client?
It is not harmful to call rep_start multiple times in the client.
How can I force a sync when a client disconnects from the master and reconnects to it later?
If you are not using an election to discover your master, the client requires some master activity to determine that it is missing information and needs to sync. Master activity would be additional transactions or checkpoints on the master. Once the restarted client gets the master information from this latest master activity, it will automatically do a sync if necessary.
If you are using an election to discover your master and there is already a master, the election will confirm that master and the client will get the master information it needs to automatically do a sync if necessary.
There are more than 5 databases in the HA master env. During the startup sync, is it possible on the client to open one db before all of them have synced to the client?
There is nothing to stop you from opening one or more of your databases on the client during client sync. But there is also nothing to guarantee that any one of your databases is internally consistent or is caught up with the master until client sync is finished.