4 Replies Latest reply: Jul 23, 2013 2:17 AM by user605606

    Slave instance cannot synchronize with the master for a long time

    902588
      Hi,

      The slave instance has been unable to synchronize with the master for a long time, and the LSN gap between slave and master keeps growing. The transfer rate between master and slave is 118 MB/s, which is the upper limit of the Ethernet card. Why?

      ENVIRONMENT:
      1. We have a network program, which we call mcdb, based on BDB 4.8.30. It accepts get/set requests and then queries data from BDB or saves data to BDB.
      2. mcdb implements replication with the BDB Replication Manager API. The default replication start policy is DB_REP_ELECTION, the ack policy is DB_REPMGR_ACKS_ONE_PEER, and the priority is 100.
      3. The BDB data files, log files, region files, and rep files are all in the same home directory.
      4. There are two mcdb instances, each on its own standalone server that runs no other programs. The two instances form one replication group, which elects a master automatically.
      5. The master instance is online and serves many requests (get, set, delete).


      ACTIONS:
      1. Start two mcdb instances on two servers so that they form a replication group. At this point the slave is synchronized with the master.
      2. Stop the slave for a long time (more than 20 hours).
      3. Start the slave instance so that it synchronizes data with the master.

      RESULT:
      1. The LSN gap between slave and master keeps growing.
      2. db_stat output of master and slave:

      master db_stat:
      467170     Number of PERM messages not acknowledged
      9245     Number of messages queued due to network delay
      172415     Number of messages discarded due to queue length
      25880     Number of existing connections dropped
      3407     Number of failed new connection attempts
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      DB_REPMGR site information:
      10.67.15.146 (eid: 0, port: 30011)
      Environment configured as a replication master
      331726/3574994     Next LSN to be used
      0/0     Not waiting for any missed log records
      328872/466092     Maximum permanent LSN
      0     Next page number expected
      0     Not waiting for any missed pages
      0     Number of duplicate master conditions originally detected at this site
      2147M     Current environment ID (2147483647)
      100     Current environment priority
      49     Current generation number
      50     Election generation number for the current or next election
      2323     Number of duplicate log records received
      0     Number of log records currently queued
      6768     Maximum number of log records ever queued at once
      55284     Total number of log records queued
      120M     Number of log records received and appended to the log (120475988)
      111     Number of log records missed and requested
      2147M     Current master ID (2147483647)
      2     Number of times the master has changed
      0     Number of messages received with a bad generation number
      8505306     Number of messages received and processed
      12     Number of messages ignored due to pending recovery
      471869     Number of failed message sends
      12M     Number of messages sent (12959945)
      0     Number of new site messages received
      1     Number of environments believed to be in the replication group
      990543     Transmission limited
      0     Number of outdated conditions detected
      0     Number of duplicate page records received
      0     Number of page records received and added to databases
      0     Number of page records missed and requested
      Startup complete
      6244678     Number of transactions applied
      0     Number of startsync messages delayed
      1     Number of elections held
      1     Number of elections won
      No election in progress
      0.057097     Duration of last election (seconds)
      8944103     Number of bulk buffer sends triggered by full buffer
      0     Number of single records exceeding bulk buffer size
      5273M     Number of records added to a bulk buffer (5273592170)
      10M     Number of bulk buffers sent (10490865)
      0     Number of re-request messages received
      0     Number of request messages this client failed to process
      0     Number of request messages received by this client
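
For reference when reading the master output above: an LSN is printed as file/offset, so two LSNs can be compared as (file, offset) pairs. The following is only a minimal Python sketch of that comparison. `parse_lsn` and `file_gap` are hypothetical helper names, not part of BDB, and the two LSN strings are copied from the master db_stat above. It shows how many log files lie between the master's write position and the position acknowledged as permanent.

```python
def parse_lsn(text):
    """Split a db_stat LSN such as '331726/3574994' into (file, offset)."""
    file_no, offset = text.split("/")
    return int(file_no), int(offset)


def file_gap(ahead, behind):
    """Number of log files between two LSNs (offsets are ignored)."""
    return ahead[0] - behind[0]


# Values copied from the master db_stat output.
master_next = parse_lsn("331726/3574994")   # Next LSN to be used
master_perm = parse_lsn("328872/466092")    # Maximum permanent LSN

master_unacked_files = file_gap(master_next, master_perm)
print(master_unacked_files)
```

Turning the file count into bytes would need the configured log file size, which is not stated in this post, so the sketch stops at whole files.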

      slave db_stat:
      0     Number of PERM messages not acknowledged
      0     Number of messages queued due to network delay
      0     Number of messages discarded due to queue length
      1454     Number of existing connections dropped
      0     Number of failed new connection attempts
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      DB_REPMGR site information:
      10.67.15.147 (eid: 0, port: 30011)
      Environment configured as a replication client
      329544/2916048     Next LSN expected
      330433/2013808     LSN of first log record we have after missed log records
      329543/5500501     Maximum permanent LSN
      0     Next page number expected
      0     Not waiting for any missed pages
      0     Number of duplicate master conditions originally detected at this site
      2147M     Current environment ID (2147483647)
      100     Current environment priority
      49     Current generation number
      50     Election generation number for the current or next election
      5256M     Number of duplicate log records received (5256599432)
      3925284     Number of log records currently queued
      3925285     Maximum number of log records ever queued at once
      4880561     Total number of log records queued
      3578038     Number of log records received and appended to the log
      1912297     Number of log records missed and requested
      0     Current master ID
      1     Number of times the master has changed
      0     Number of messages received with a bad generation number
      12M     Number of messages received and processed (12980442)
      2     Number of messages ignored due to pending recovery
      0     Number of failed message sends
      1912307     Number of messages sent
      0     Number of new site messages received
      0     Number of environments believed to be in the replication group
      0     Transmission limited
      0     Number of outdated conditions detected
      0     Number of duplicate page records received
      0     Number of page records received and added to databases
      0     Number of page records missed and requested
      Startup incomplete
      110568     Number of transactions applied
      80     Number of startsync messages delayed
      0     Number of elections held
      0     Number of elections won
      No election in progress
      0     Number of bulk buffer sends triggered by full buffer
      0     Number of single records exceeding bulk buffer size
      0     Number of records added to a bulk buffer
      0     Number of bulk buffers sent
      0     Number of re-request messages received
      0     Number of request messages this client failed to process
      0     Number of request messages received by this client
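
To make the slave numbers easier to compare, here is a rough Python sketch that parses a few of the db_stat lines above and derives three figures: duplicate records per appended record, the number of log files spanned by the hole in the slave's log, and how many log files the slave is behind the master. `parse_stats` is a hypothetical helper written for this post, not a BDB tool, and the master LSN is copied by hand from the master output.

```python
import re

# A few lines copied from the slave db_stat output.
SLAVE_STAT = """\
329544/2916048     Next LSN expected
330433/2013808     LSN of first log record we have after missed log records
5256M     Number of duplicate log records received (5256599432)
3925284     Number of log records currently queued
3578038     Number of log records received and appended to the log
1912297     Number of log records missed and requested
"""


def parse_stats(text):
    """Map each db_stat description to its value.

    LSNs become (file, offset) tuples. Counters become ints, preferring the
    exact figure in trailing parentheses over an abbreviation like '5256M'.
    """
    stats = {}
    for line in text.splitlines():
        value, _, label = line.strip().partition(" ")
        label = label.strip()
        exact = re.search(r"\((\d+)\)$", label)
        if exact:
            stats[label[:exact.start()].strip()] = int(exact.group(1))
        elif "/" in value:
            file_no, offset = value.split("/")
            stats[label] = (int(file_no), int(offset))
        elif value.isdigit():
            stats[label] = int(value)
    return stats


slave = parse_stats(SLAVE_STAT)
master_next = (331726, 3574994)  # master's "Next LSN to be used"

dup_per_append = (slave["Number of duplicate log records received"]
                  / slave["Number of log records received and appended to the log"])
hole_files = (slave["LSN of first log record we have after missed log records"][0]
              - slave["Next LSN expected"][0])
behind_files = master_next[0] - slave["Next LSN expected"][0]

print(f"duplicates per appended record: {dup_per_append:.0f}")
print(f"log files spanned by the hole: {hole_files}")
print(f"log files behind the master: {behind_files}")
```

With the figures posted here, the slave has received roughly 1469 duplicate log records for every record it appended, which suggests that most of the 118 MB/s is going to records the slave already has rather than to the records it is missing.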

      Edited by: 899585 on 2013-1-4 10:54 PM

      Edited by: 899585 on 2013-1-4 10:55 PM