This discussion is archived
2 Replies · Latest reply: Mar 1, 2012 8:24 AM by 884963

DB size for HA replication

884963 Newbie
Do you have any recommendation on the db file size for the HA replication?

Say I have a 2 TB Berkeley DB. How long does it take to replicate to a slave? When a new node is added to the group, how long does it take to replicate the 2 TB of data?

Also, if I want to refresh the DB, what are the options? For example, I want to clean up the Berkeley DB files and create a new 2 TB file on the master. How does that impact the slaves?

Is there any other way to do the DB refresh?

Is Berkeley DB a good solution for this kind of use case?

Thanks
  • 1. Re: DB size for HA replication
    Ashok_Ora Explorer
    Hello,

    Berkeley DB replication is log-based. Every time data changes on the master, the log records are sent to the replicas, which apply them in order to stay current with the master. So, during normal operation, the rate of change at the master determines the performance of replication: the higher the rate of change, the more processing is required.

    If you want to add a new replica, there are two options:
    You can take a recent backup of the data from the master, restore it on the newly added replica, and then "add" the replica to the HA group. The master will detect that a new replica has been added and start sending it log records; the replica will catch up (become current) and then come "online". We call this mechanism "internal init".
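A rough sketch of that backup-and-restore path, using the standard BDB `db_hotbackup` and `db_recover` utilities (all paths and the hostname here are hypothetical; check the flags against the documentation for your BDB release):

```shell
# On the master: take a hot backup of the database environment.
db_hotbackup -h /master/env -b /backup/env

# Ship the backup to the new node (hostname is illustrative).
rsync -a /backup/env/ new-replica:/replica/env/

# On the new replica: run catastrophic recovery against the restored
# environment, then start your replicated application so the node joins
# the HA group and catches up from the master's log stream.
db_recover -c -h /replica/env
```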

    A second choice is for the new replica to request the data from the master over the network. The time it takes to transfer the data from the master to the new replica will depend on the available network bandwidth, among other factors.

    For a 2 TB file, I'd suspect that the former approach (restore from a recent backup) would work better.
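To put the 2 TB figure in perspective, here is a back-of-envelope estimate of the network-transfer option. The link speed and utilization figures are purely illustrative assumptions, not a benchmark:

```shell
# Hypothetical: a dedicated 1 Gbit/s link at ~80% effective throughput.
DB_BYTES=$((2 * 1000 ** 4))              # 2 TB (decimal)
LINK_BPS=$((1000 ** 3))                  # 1 Gbit/s raw
EFFECTIVE_BPS=$((LINK_BPS * 80 / 100))   # assume ~80% utilization
XFER_SECONDS=$((DB_BYTES * 8 / EFFECTIVE_BPS))
XFER_HOURS=$((XFER_SECONDS / 3600))
echo "~${XFER_HOURS} hours to ship 2 TB" # about 5 hours under these assumptions
```

On a shared or slower link, or with a busy master, the real catch-up time would be considerably longer, which is why restoring from a backup taken out-of-band tends to be attractive at this size.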

    Please refer to the BDB documentation for more details on this topic.

    If you want to "refresh" the master, you could gracefully shut down the master (BDB HA will automatically transfer mastership to an existing replica), refresh it, and then bring it back online.

    In order to do this correctly, you need to make sure there are enough replicas so that the election of the new master can succeed (it needs a quorum).
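The quorum rule is simple majority voting: an election needs strictly more than half of the electable sites. A tiny sketch (the `quorum` helper is just illustrative arithmetic, not a BDB API):

```shell
# Minimum votes needed to elect a master from n electable sites:
# strictly more than half, i.e. floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # -> 2: a 3-site group can elect with one node down
quorum 5   # -> 3: a 5-site group can elect with two nodes down
```

This is why having enough replicas matters: in a two-site group, the surviving node cannot form a majority by itself once the master is taken down.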

    So, assuming that you use Berkeley DB HA the way it was designed, I'd venture to say that BDB is a good solution for the use case you've described. Of course, the more detail you provide about your use case, the better we can assist you.

    I'd encourage you to read the HA documentation to understand internal init, online addition of a new replica, elections etc.

    Hope this helps.
    Thanks and warm regards.
    ashok
  • 2. Re: DB size for HA replication
    884963 Newbie
    Thanks for the response.

    Do you have any performance stats for Berkeley DB published anywhere?

    I have a question on the below statement:
    -----------------------------------------------

    If you want to "refresh" the master, you could gracefully shut down the master (BDB HA will automatically transfer mastership to an existing replica), refresh it, and then bring it back online.

    In order to do this correctly, you need to make sure there are enough replicas so that the election of the new master can succeed (it needs a quorum).
    ----------------------------------------------

    So let's say I shut down the master and load it with a brand-new BDB file, then bring it back online. Do I need to bring it back as the master? And how does this replicate to the slaves? Since it's a brand-new BDB, how does replication happen? I believe the whole new BDB file has to be replicated to the slaves, rather than just updates? How long would that take, assuming production-standard network bandwidth? While that replication is in progress, how are existing reads handled? Are any stats published on slave catch-up time?
