6 Replies Latest reply: Jun 15, 2011 10:54 AM by 524722

    Election problem after repeated split-brains with two nodes

    857786
      Hi

      I'm using customized source code based on the excxx_repquote example from BDB 5.1.19,
      with two sites: one MASTER and one SLAVE.

      nsites=2
      ack=quorum

      - The master writes to quotedb at a rate of 10 txn per second.
      - The test isolates the client from the master (split brain) and reconnects it after a random delay of between 1 and 10 seconds.

      The test runs fine about 10 times, but at some point the slave process receives DB_EVENT_REP_ELECTION_FAILED,
      and the master enters election mode and never leaves CLIENT mode. Note that, to freeze the client in that state, I have it kill itself (kill -9 on its own pid) when it receives that event.


      Here is the verbose log on the master:

      [1307872770:871621][6510/47655809107168] MASTER: rep_send_function returned: 110
      [1307872770:973655][6510/47655809107168] MASTER: bulk_msg: Send buffer after copy due to PERM
      [1307872770:973667][6510/47655809107168] MASTER: send_bulk: Send 266 (0x10a) bulk buffer bytes
      [1307872770:973672][6510/47655809107168] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type bulk_log, LSN [21][986648] perm
      [1307872770:973693][6510/47655809107168] MASTER: will await acknowledgement: need 1
      [1307872771:26623][6510/47655809107168] MASTER: rep_send_function returned: 110
      [1307872771:126380][6510/1162996032] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type log, LSN [21][946345]
      [1307872771:126407][6510/1162996032] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type dupmaster, LSN [0][0] nobuf
      [1307872771:126695][6510/1162996032] MASTER: rep_start: Found old version log 17
      [1307872771:126753][6510/1162996032] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type newclient, LSN [0][0] nobuf
      [1307872771:126833][6510/1183975744] CLIENT: starting election thread
      [1307872771:126876][6510/1183975744] CLIENT: Start election nsites 2, ack 1, priority 100
      [1307872771:126890][6510/1183975744] CLIENT: Election thread owns egen 69
      [1307872771:127423][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type newclient, LSN [0][0]
      [1307872771:130079][6510/1183975744] CLIENT: Tallying VOTE1[0] (2147483647, 69)
      [1307872771:130113][6510/1183975744] CLIENT: Beginning an election
      [1307872771:130134][6510/1183975744] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type vote1, LSN [21][986728] nobuf
      [1307872771:130147][6510/1173485888] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type master_req, LSN [0][0] nobuf
      [1307872771:130438][6510/1152506176] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][946437]
      [1307872771:130460][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728]
      [1307872771:130467][6510/1152506176] CLIENT: Updating gen from 68 to 70
      [1307872771:130482][6510/1162996032] CLIENT: Received ALIVE egen of 71, mine 69
      [1307872771:130503][6510/1162996032] CLIENT: Election finished in 0.003602000 sec
      [1307872771:130515][6510/1162996032] CLIENT: Election done; egen 70
      [1307872771:130534][6510/1152506176] CLIENT: Received vote1 egen 71, egen 71
      [1307872771:130581][6510/1152506176] CLIENT: Tallying VOTE1[0] (0, 71)
      [1307872771:130593][6510/1089075520] CLIENT: starting election thread
      [1307872771:130619][6510/1152506176] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,946437]
      [1307872771:130642][6510/1152506176] CLIENT: Not in election, but received vote1 0x282c 0x8
      [1307872771:130674][6510/1089075520] CLIENT: Start election nsites 2, ack 1, priority 100
      [1307872771:130692][6510/1089075520] CLIENT: Election thread owns egen 71
      [1307872771:130704][6510/1194465600] CLIENT: starting election thread
      [1307872771:130733][6510/1194465600] CLIENT: Start election nsites 2, ack 1, priority 100
      [1307872771:132922][6510/1089075520] CLIENT: Tallying VOTE1[1] (2147483647, 71)
      [1307872771:132949][6510/1089075520] CLIENT: Accepting new vote
      [1307872771:132958][6510/1089075520] CLIENT: Beginning an election
      [1307872771:132973][6510/1089075520] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][986728] nobuf
      [1307872771:132985][6510/1194465600] CLIENT: election thread is exiting
      [1307872771:133012][6510/1089075520] CLIENT: Tallying VOTE2[0] (2147483647, 71)
      [1307872771:133037][6510/1089075520] CLIENT: Counted my vote 1
      [1307872771:133048][6510/1089075520] CLIENT: Skipping phase2 wait: already got 1 votes
      [1307872771:133060][6510/1089075520] CLIENT: Got enough votes to win; election done; (prev) gen 70
      [1307872771:133071][6510/1089075520] CLIENT: Election finished in 0.002367000 sec
      [1307872771:133084][6510/1089075520] CLIENT: Election done; egen 72
      [1307872771:133111][6510/1089075520] CLIENT: Ended election with 0, e_th 1, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
      [1307872771:133170][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0]
      [1307872771:133187][6510/1173485888] CLIENT: Racing replication msg lockout, ignore message.
      [1307872771:173744][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0]
      [1307872771:173769][6510/1162996032] CLIENT: Racing replication msg lockout, ignore message.
      [1307872771:231593][6510/1183975744] CLIENT: Ended election with 0, e_th 0, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
      [1307872771:231629][6510/1183975744] CLIENT: election thread is exiting
      [1307872777:443794][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307872971:644194][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307873165:844583][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307873360:44955][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307873554:245347][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307873748:445736][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307873942:646117][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
      [1307874136:846509][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115

      ... and it stays in this state indefinitely.
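The spacing of the "init connection" lines can be checked mechanically. The sketch below is a small standalone helper, not part of Berkeley DB; the function names are made up for illustration. It assumes each verbose line starts with a "[seconds:microseconds]" stamp and that the microsecond field is a plain integer without zero padding (as in "[1307872771:26623]"). On Linux, errno 110 is ETIMEDOUT and 115 is EINPROGRESS, which is how the "returned: 110" and "result 115" values above would normally be read.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Parse the leading "[seconds:microseconds]" stamp of one verbose log line
// into a single microsecond count. Returns false if no stamp is present.
static bool parse_stamp_usec(const std::string &line, long long &out_usec) {
    long long sec = 0, usec = 0;
    if (std::sscanf(line.c_str(), " [%lld:%lld]", &sec, &usec) != 2)
        return false;
    out_usec = sec * 1000000LL + usec;
    return true;
}

// Gaps, in microseconds, between consecutive stamped lines.
static std::vector<long long> stamp_gaps_usec(const std::vector<std::string> &lines) {
    std::vector<long long> gaps;
    long long prev = 0, cur = 0;
    bool have_prev = false;
    for (const std::string &line : lines) {
        if (!parse_stamp_usec(line, cur))
            continue;               // skip lines without a timestamp
        if (have_prev)
            gaps.push_back(cur - prev);
        prev = cur;
        have_prev = true;
    }
    return gaps;
}
```

Fed the "init connection" lines above, stamp_gaps_usec() gives gaps of roughly 194.2 seconds between attempts, even though the source below sets DB_REP_CONNECTION_RETRY to 5 seconds.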



      My question is: why does the master suddenly turn into a CLIENT, and why does it never return to MASTER?
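One way to look at the role change is to follow the "gen" field across messages. In the master log above, the site sends with gen = 68, then processes a `log` message from eid 0 carrying gen = 70, and on the next line sends `dupmaster` and starts logging as CLIENT. That is consistent with both sites acting as master at the same time, although the log alone does not prove it. The sketch below is a standalone helper for pulling those fields out of a line; the struct and function names are illustrative and are not Berkeley DB APIs, and the line layout is assumed to match the logs in this post.

```cpp
#include <cstdlib>
#include <string>

// Fields pulled from one rep_send_message / rep_process_message log line.
// The struct and function names are illustrative, not part of Berkeley DB.
struct RepLogMsg {
    std::string role;  // "MASTER" or "CLIENT": role printed by the logging site
    bool sent;         // true for rep_send_message, false for rep_process_message
    int gen;           // number following "gen = "
    std::string type;  // word following "type ", e.g. "log" or "dupmaster"
};

static bool parse_rep_log_msg(const std::string &line, RepLogMsg &out) {
    const std::string::size_type npos = std::string::npos;

    // Direction: which of the two function names appears on the line.
    if (line.find("rep_send_message:") != npos)
        out.sent = true;
    else if (line.find("rep_process_message:") != npos)
        out.sent = false;
    else
        return false;

    // Role: the word between the "[pid/thread] " prefix and the next ':'.
    std::string::size_type start = line.find("] ");
    if (start == npos)
        return false;
    start += 2;
    std::string::size_type end = line.find(':', start);
    if (end == npos)
        return false;
    out.role = line.substr(start, end - start);

    // Generation number.
    std::string::size_type g = line.find("gen = ");
    if (g == npos)
        return false;
    out.gen = std::atoi(line.c_str() + g + 6);

    // Message type: ends at the following comma.
    std::string::size_type t = line.find("type ");
    if (t == npos)
        return false;
    t += 5;
    std::string::size_type comma = line.find(',', t);
    if (comma == npos)
        return false;
    out.type = line.substr(t, comma - t);
    return true;
}
```

Filtering both logs for lines where a processed message carries a higher gen than the site's own sent messages is a quick way to find the point where the two sites diverged.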



      Thanks in advance ...



      Here is the log for the client:

      [1307872315:455113][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984396]
      [1307872315:455134][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984483] perm
      [1307872315:609962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984733] perm
      [1307872315:764958][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984986] perm
      [1307872315:919962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985238] perm
      [1307872316:75018][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985491] perm
      [1307872316:229959][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985741] perm
      [1307872316:384949][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985993] perm
      [1307872316:499899][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986141] perm
      [1307872316:539895][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986221]
      [1307872316:540078][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986307]
      [1307872316:540100][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986394] perm
      [1307872316:694950][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986648] perm
      [1307872316:847349][1282/1129134400] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type log, LSN [21][946345]
      [1307872316:847698][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type dupmaster, LSN [0][0]
      [1307872316:847999][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type newclient, LSN [0][0]
      [1307872316:848168][1282/1171093824] MASTER: rep_start: Found old version log 17
      [1307872316:848222][1282/1181583680] CLIENT: Racing replication msg lockout, ignore message.
      [1307872316:848398][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type newclient, LSN [0][0] nobuf
      [1307872316:848504][1282/1192073536] CLIENT: starting election thread
      [1307872316:848542][1282/1192073536] CLIENT: Start election nsites 2, ack 1, priority 100
      [1307872316:848566][1282/1192073536] CLIENT: Election thread owns egen 71
      [1307872316:849634][1282/1192073536] CLIENT: Tallying VOTE1[0] (2147483647, 71)
      [1307872316:849654][1282/1192073536] CLIENT: Beginning an election
      [1307872316:849680][1282/1192073536] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][946437] nobuf
      [1307872316:851403][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type vote1, LSN [21][986728]
      [1307872316:851448][1282/1160603968] CLIENT: Received vote1 egen 69, egen 71
      [1307872316:851470][1282/1160603968] CLIENT: Received old vote 69, egen 71, ignoring vote1
      [1307872316:851481][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728] nobuf
      [1307872316:851538][1282/1171093824] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type master_req, LSN [0][0]
      [1307872316:851558][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0] nobuf
      [1307872316:854254][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][986728]
      [1307872316:854275][1282/1160603968] CLIENT: Received vote1 egen 71, egen 71
      [1307872316:854317][1282/1160603968] CLIENT: Tallying VOTE1[1] (0, 71)
      [1307872316:854339][1282/1160603968] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,986728]
      [1307872316:854353][1282/1160603968] CLIENT: Existing vote: (eid)2147483647 (pri)100 (gen)70 (sites)2 [21,946437]
      [1307872316:854369][1282/1160603968] CLIENT: Accepting new vote
      [1307872316:854379][1282/1160603968] CLIENT: Phase1 election done
      [1307872316:854395][1282/1160603968] CLIENT: Voting for 0
      [1307872316:854407][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0] nobuf
      [1307872317:960344][1282/1192073536] CLIENT: After phase 2: votes 0, nvotes 1, nsites 2
      [1307872317:960389][1282/1192073536] CLIENT: Election finished in 1.111809000 sec
      [1307872317:960401][1282/1192073536] CLIENT: Election done; egen 72
      [1307872317:960412][1282/1192073536] CLIENT: Ended election with -30974, e_th 0, egen 72, flag 0x282c, e_fl 0x0, lo_fl 0x0
      Kill me !!
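The vote counts in the two logs can be compared with a tiny model. Both sites log "Start election nsites 2, ack 1"; the master side then logs "Counted my vote 1" and "Got enough votes to win", while the client side logs "After phase 2: votes 0, nvotes 1, nsites 2" and ends with -30974. The sketch below assumes that "ack 1" / "nvotes 1" is the number of votes required to win, and that a two-site group without the strict-majority rule (the "Two-site strict majority rule" mentioned in the sample's comments, DB_REPMGR_CONF_2SITE_STRICT) needs only one vote. Both functions are hypothetical and are not Berkeley DB code.

```cpp
// Hypothetical model of the vote arithmetic seen in the logs.
// Neither function is part of Berkeley DB.

// Votes needed to win: a strict majority, except that a two-site group
// without the strict rule is assumed to need only a single vote.
static int votes_needed(int nsites, bool two_site_strict) {
    if (nsites == 2 && !two_site_strict)
        return 1;
    return nsites / 2 + 1;
}

// An election is won once the tallied votes reach the required count.
static bool election_won(int votes, int nvotes) {
    return votes >= nvotes;
}
```

Under these assumptions, election_won(1, votes_needed(2, false)) is true, matching the master-side log, and election_won(0, votes_needed(2, false)) is false, matching the client-side log. With the strict rule, votes_needed(2, true) is 2, so neither site could win alone during a split.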



      --- my source

      On the master I manually run:

      txn_rate 1
      loop_rate 10
      loop 1 20000

      ------
      /*-
      * See the file LICENSE for redistribution information.
      *
      * Copyright (c) 2001, 2010 Oracle and/or its affiliates. All rights reserved.
      *
      * $Id$
      */

      /*
      * In this application, we specify all communication via the command line. In
      * a real application, we would expect that information about the other sites
      * in the system would be maintained in some sort of configuration file. The
      * critical part of this interface is that we assume at startup that we can
      * find out
      *      1) what our Berkeley DB home environment is,
      *      2) what host/port we wish to listen on for connections; and
      *      3) an optional list of other sites we should attempt to connect to.
      *
      * These pieces of information are expressed by the following flags.
      * -h home (required; h stands for home directory)
      * -l host:port (required; l stands for local)
      * -C or -M (optional; start up as client or master)
      * -r host:port (optional; r stands for remote; any number of these may be
      *     specified)
      * -R host:port (optional; R stands for remote peer; only one of these may
      * be specified)
      * -a all|quorum (optional; a stands for ack policy)
      * -b (optional; b stands for bulk)
      * -n nsites (optional; number of sites in replication group; defaults to 0
      *     to try to dynamically compute nsites)
      * -p priority (optional; defaults to 100)
      * -v (optional; v stands for verbose)
      */

      #include <cstdlib>
      #include <cstring>
      #include <ctime>

      #include <iostream>
      #include <string>
      #include <sstream>

      #include <sys/types.h>
      #include <signal.h>


      #include <db_cxx.h>
      #include "RepConfigInfo.h"
      #include "dbc_auto.h"

      using std::cout;
      using std::cin;
      using std::cerr;
      using std::endl;
      using std::ends;
      using std::flush;
      using std::istream;
      using std::istringstream;
      using std::ostringstream;
      using std::string;
      using std::getline;

      #include <stdio.h>
      #include <readline/readline.h>
      #include <readline/history.h>

      #define     CACHESIZE     (10 * 1024 * 1024)
      #define     DATABASE     "quote.db"
      #define     DATABASE2     "quote2.db"

      const char *progname = "excxx_repquote";

      #include <errno.h>
      #ifdef _WIN32
      #define WIN32_LEAN_AND_MEAN
      #include <windows.h>
      #define     snprintf          _snprintf
      #define     sleep(s)          Sleep(1000 * (s))

      extern "C" {
      extern int getopt(int, char * const *, const char *);
      extern char *optarg;
      }

      typedef HANDLE thread_t;
      typedef DWORD thread_exit_status_t;
      #define     thread_create(thrp, attr, func, arg)                    \
      (((*(thrp) = CreateThread(NULL, 0,                         \
           (LPTHREAD_START_ROUTINE)(func), (arg), 0, NULL)) == NULL) ? -1 : 0)
      #define     thread_join(thr, statusp)                         \
      ((WaitForSingleObject((thr), INFINITE) == WAIT_OBJECT_0) &&          \
      GetExitCodeThread((thr), (LPDWORD)(statusp)) ? 0 : -1)
      #else /* !_WIN32 */
      #include <pthread.h>

      typedef pthread_t thread_t;
      typedef void* thread_exit_status_t;
      #define     thread_create(thrp, attr, func, arg)                    \
      pthread_create((thrp), (attr), (func), (arg))
      #define     thread_join(thr, statusp) pthread_join((thr), (statusp))
      #endif

      // Struct used to store information in Db app_private field.
      typedef struct {
           bool app_finished;
           bool in_client_sync;
           bool is_master;
           bool no_dummy_wr;
      } APP_DATA;

      static void log(const char *);
      void *checkpoint_thread(void *);
      void *log_archive_thread(void *);
      void *dummy_write_thread(void *);

      class RepQuoteExample {
      public:
           RepQuoteExample();
           void init(RepConfigInfo* config);
           void doloop();
           int terminate();

           static void event_callback(DbEnv* dbenv, u_int32_t which, void *info);
           void print_stocks_size(Db *dbp);

      private:
           // disable copy constructor.
           RepQuoteExample(const RepQuoteExample &);
           void operator = (const RepQuoteExample &);

           // internal data members.
           APP_DATA          app_data;
           RepConfigInfo *app_config;
           DbEnv          cur_env;
           thread_t ckp_thr;
           thread_t lga_thr;
           thread_t dmy_thr;

           // private methods.
           void print_stocks(Db *dbp);
           void print_env(DbEnv *dbenv);

           void prompt();
      };

      RepQuoteExample *g_runner=NULL;
      RepConfigInfo *g_config=NULL;

      class DbHolder {
      public:
           DbHolder(DbEnv *env, const char *_dbname) : env(env)
           {
                dbp = 0;
                if (_dbname) dbname=_dbname;
                else dbname=DATABASE;
           }

           ~DbHolder() {
           try {
                close();
           } catch (...) {
                // Ignore: this may mean another exception is pending
           }
           }

           bool ensure_open(bool creating) {
           if (dbp)
                return (true);
           dbp = new Db(env, 0);

           u_int32_t flags = DB_AUTO_COMMIT;
           if (creating)
                flags |= DB_CREATE;
           try {
                //dbp->open(NULL, DATABASE, NULL, DB_BTREE, flags, 0);
                //dbp->open(NULL, dbname, NULL, DB_BTREE, flags, 0);
                dbp->open(NULL, NULL, dbname, DB_BTREE, flags, 0);
                return (true);
           } catch (DbDeadlockException e) {
           } catch (DbRepHandleDeadException e) {
           } catch (DbException e) {
                if (e.get_errno() == DB_REP_LOCKOUT) {
                // Just fall through.
                } else if (e.get_errno() == ENOENT && !creating) {
                // Provide a bit of extra explanation.
                //
                log("Stock DB does not yet exist");
                } else
                throw;
           }

           // (All retryable errors fall through to here.)
           //
           log("please retry the operation");
           close();
           return (false);
           }

           void close() {
           if (dbp) {
                try {
                dbp->close(0);
                delete dbp;
                dbp = 0;
                } catch (...) {
                delete dbp;
                dbp = 0;
                throw;
                }
           }
           }

           operator Db *() {
           return dbp;
           }

           Db *operator->() {
           return dbp;
           }

      private:
           Db *dbp;
           DbEnv *env;
           const char *dbname;
      };

      class StringDbt : public Dbt {
      public:
      #define GET_STRING_OK 0
      #define GET_STRING_INVALID_PARAM 1
      #define GET_STRING_SMALL_BUFFER 2
      #define GET_STRING_EMPTY_DATA 3
           int get_string(char **buf, size_t buf_len)
           {
                size_t copy_len;
                int ret = GET_STRING_OK;
                if (buf == NULL) {
                     cerr << "Invalid input buffer to get_string" << endl;
                     return GET_STRING_INVALID_PARAM;
                }

                // make sure the string is null terminated.
                memset(*buf, 0, buf_len);

                // if there is no string, just return.
                if (get_data() == NULL || get_size() == 0)
                     return GET_STRING_OK;

                if (get_size() >= buf_len) {
                     ret = GET_STRING_SMALL_BUFFER;
                     copy_len = buf_len - 1; // save room for a terminator.
                } else
                     copy_len = get_size();
                memcpy(*buf, get_data(), copy_len);

                return ret;
           }
           size_t get_string_length()
           {
                if (get_size() == 0)
                     return 0;
                return strlen((char *)get_data());
           }
           void set_string(char *string)
           {
                set_data(string);
                set_size((u_int32_t)strlen(string));
           }

           StringDbt(char *string) :
           Dbt(string, (u_int32_t)strlen(string)) {};
           StringDbt() : Dbt() {};
           ~StringDbt() {};

           // Don't add extra data to this sub-class since we want it to remain
           // compatible with Dbt objects created internally by Berkeley DB.
      };

      Db *g_repquote=NULL;

      RepQuoteExample::RepQuoteExample() : app_config(0), cur_env(0) {
           app_data.app_finished = 0;
           app_data.in_client_sync = 0;
           app_data.is_master = 0; // assume I start out as client
           app_data.no_dummy_wr = 0 ; //prevent to run dummy write
      }

      /*
      int (*old_rep_process_message)
                __P((DB_ENV *, DBT *, DBT *, int, DB_LSN *));

      int my_rep_process_message __P((DB_ENV *arg1, DBT *arg2, DBT *arg3, int arg4, DB_LSN *arg5))
      {
           printf("EZ->>> my_rep_process_message:%p\n", (void *)arg5);
           return old_rep_process_message(arg1, arg2, arg3, arg4, arg5);
      }
      */

      void RepQuoteExample::init(RepConfigInfo *config) {
           app_config = config;

           cur_env.set_app_private(&app_data);
           cur_env.set_errfile(stderr);
           app_data.no_dummy_wr=config->no_dummy_wr;
           if (app_data.no_dummy_wr)
                printf("No dummy !!!\n");

           //EZ->cur_env.set_errpfx(progname);
           cur_env.set_event_notify(event_callback);

           // Configure bulk transfer to send groups of records to clients
           // in a single network transfer. This is useful for master sites
           // and clients participating in client-to-client synchronization.
           //
           if (app_config->bulk)
                cur_env.rep_set_config(DB_REP_CONF_BULK, 1);

           // Set the total number of sites in the replication group.
           // This is used by repmgr internal election processing.
           //
           if (app_config->totalsites > 0)
                cur_env.rep_set_nsites(app_config->totalsites);

           // Turn on debugging and informational output if requested.
           if (app_config->verbose)
                cur_env.set_verbose(DB_VERB_REPLICATION, 1);

           // Debugging: force these verbose categories on regardless of -v.
           cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
           cur_env.set_verbose(DB_VERB_RECOVERY, 1);
           cur_env.set_verbose(DB_VERB_REPLICATION, 1);
           cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
           cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
           cur_env.set_verbose(DB_VERB_REP_SYNC, 1);

           // Set replication group election priority for this environment.
           // An election first selects the site with the most recent log
           // records as the new master. If multiple sites have the most
           // recent log records, the site with the highest priority value
           // is selected as master.
           //
           cur_env.rep_set_priority(app_config->priority);

           // Set the policy that determines how master and client sites
           // handle acknowledgement of replication messages needed for
           // permanent records. The default policy of "quorum" requires only
           // a quorum of electable peers sufficient to ensure a permanent
           // record remains durable if an election is held. The "all" option
           // requires all clients to acknowledge a permanent replication
           // message instead.
           //
           cur_env.repmgr_set_ack_policy(app_config->ack_policy);

           // Set the threshold for the minimum and maximum time the client
           // waits before requesting retransmission of a missing message.
           // Base these values on the performance and load characteristics
           // of the master and client host platforms as well as the round
           // trip message time.
           //
           cur_env.rep_set_request(20000, 500000);

           // Configure deadlock detection to ensure that any deadlocks
           // are broken by having one of the conflicting lock requests
           // rejected. DB_LOCK_DEFAULT uses the lock policy specified
           // at environment creation time or DB_LOCK_RANDOM if none was
           // specified.
           //
           cur_env.set_lk_detect(DB_LOCK_DEFAULT);

           // The following base replication features may also be useful to your
           // application. See Berkeley DB documentation for more details.
           // - Master leases: Provide stricter consistency for data reads
           // on a master site.
           // - Timeouts: Customize the amount of time Berkeley DB waits
           // for such things as an election to be concluded or a master
           // lease to be granted.
           // - Delayed client synchronization: Manage the master site's
           // resources by spreading out resource-intensive client
           // synchronizations.
           // - Blocked client operations: Return immediately with an error
           // instead of waiting indefinitely if a client operation is
           // blocked by an ongoing client synchronization.

           cur_env.repmgr_set_local_site(app_config->this_host.host,
           app_config->this_host.port, 0);

           for ( REP_HOST_INFO *cur = app_config->other_hosts; cur != NULL;
                cur = cur->next) {
                cur_env.repmgr_add_remote_site(cur->host, cur->port,
                NULL, cur->peer ? DB_REPMGR_PEER : 0);
           }

           // Configure heartbeat timeouts so that repmgr monitors the
           // health of the TCP connection. Master sites broadcast a heartbeat
           // at the frequency specified by the DB_REP_HEARTBEAT_SEND timeout.
           // Client sites wait for message activity the length of the
           // DB_REP_HEARTBEAT_MONITOR timeout before concluding that the
           // connection to the master is lost. The DB_REP_HEARTBEAT_MONITOR
           // timeout should be longer than the DB_REP_HEARTBEAT_SEND timeout.
           //
           cur_env.rep_set_timeout(DB_REP_HEARTBEAT_SEND, 5000000);
           cur_env.rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 10000000);

           // The following repmgr features may also be useful to your
           // application. See Berkeley DB documentation for more details.
           // - Two-site strict majority rule - In a two-site replication
           // group, require both sites to be available to elect a new
           // master.
           // - Timeouts - Customize the amount of time repmgr waits
           // for such things as waiting for acknowledgements or attempting
           // to reconnect to other sites.
           // - Site list - return a list of sites currently known to repmgr.

           // We can now open our environment, although we're not ready to
           // begin replicating. However, we want to have a dbenv around
           // so that we can send it into any of our message handlers.
           //
           cur_env.set_cachesize(0, CACHESIZE, 0);
           cur_env.set_flags(DB_REP_PERMANENT, 1);
           //cur_env.set_flags(DB_TXN_WRITE_NOSYNC, 1);

      /*     u_int32_t maxlocks=300000;
           if (maxlocks != 0)
                cur_env.set_lk_max_locks(maxlocks);

           u_int32_t maxlocks_o=300000;
           if (maxlocks_o != 0)
                cur_env.set_lk_max_objects(maxlocks_o);
           
           u_int32_t maxmutex=300000;
           if (maxmutex != 0)
                cur_env.mutex_set_max(maxmutex);
      */

           DbEnv          *m_env=&cur_env;
           m_env->set_flags(DB_TXN_NOSYNC, 1);
           m_env->set_lk_max_lockers(60000);
           m_env->set_lk_max_objects(60000);
           m_env->set_lk_max_locks(60000);
           m_env->set_tx_max(60000);
           
           //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
           
           m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
           m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
           //m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
           m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 1 * 1000 * 1000); // 1 second
           m_env->rep_set_timeout(DB_REP_FULL_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
           m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 5 * 1000 * 1000);
           //m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
           
           //m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
           //m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 500 * 1000); //500 milli seconds

           // The minimum number of microseconds a client waits before requesting retransmission.
           u_int32_t rep_req_min = 40000; // 40,000 microseconds = 40 ms
           // The maximum number of microseconds a client waits before requesting retransmission.
           u_int32_t rep_req_max = 1280000; // 1,280,000 microseconds = 1.28 s
           
           u_int32_t rep_limit_gbytes = 0;
           u_int32_t rep_limit_bytes = 100 * 1024 * 1024; // 100MB
           m_env->rep_set_request(rep_req_min, rep_req_max);
           m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);


           cur_env.open(app_config->home, DB_CREATE | DB_RECOVER |
           DB_THREAD | DB_INIT_REP | DB_INIT_LOCK | DB_INIT_LOG |
           DB_INIT_MPOOL | DB_INIT_TXN , 0);

           //keep old function for chain
           //old_rep_process_message=cur_env.get_DB_ENV()->rep_process_message;
           //derouting
           //cur_env.get_DB_ENV()->rep_process_message=my_rep_process_message;

           /*int _i;
           cur_env.log_get_config(DB_LOG_DIRECT, &_i);printf ("DB_LOG_DIRECT = %d\n",_i);
           cur_env.log_get_config(DB_LOG_DSYNC, &_i);printf ("DB_LOG_DSYNC = %d\n",_i);
           cur_env.log_get_config(DB_LOG_AUTO_REMOVE, &_i);printf ("DB_LOG_AUTO_REMOVE = %d\n",_i);
           cur_env.log_get_config(DB_LOG_IN_MEMORY, &_i);printf ("DB_LOG_IN_MEMORY = %d\n",_i);
           cur_env.log_get_config(DB_LOG_ZERO,&_i);printf ("DB_LOG_ZERO = %d\n",_i);
           */

           // Start checkpoint and log archive support threads.
           (void)thread_create(&ckp_thr, NULL, checkpoint_thread, &cur_env);
           (void)thread_create(&lga_thr, NULL, log_archive_thread, &cur_env);
           (void)thread_create(&dmy_thr, NULL, dummy_write_thread, &cur_env);

           cur_env.repmgr_start(3, app_config->start_policy);
      }
        • 1. Election problem after repeated split-brains with two nodes <following>
          857786
          int RepQuoteExample::terminate() {
               try {
                    // Wait for the checkpoint, log archive, and dummy write threads to finish.
                    // Windows does not allow NULL pointer for exit code variable.
                    thread_exit_status_t exstat;

                    (void)thread_join(lga_thr, &exstat);
                    (void)thread_join(ckp_thr, &exstat);
                    (void)thread_join(dmy_thr, &exstat);

                    // We have used the DB_TXN_NOSYNC environment flag for
                    // improved performance without the usual sacrifice of
                    // transactional durability, as discussed in the
                    // "Transactional guarantees" page of the Reference
                    // Guide: if one replication site crashes, we can
                    // expect the data to exist at another site. However,
                    // in case we shut down all sites gracefully, we push
                    // out the end of the log here so that the most
                    // recent transactions don't mysteriously disappear.
                    //
                    cur_env.log_flush(NULL);

                    cur_env.close(0);
               } catch (DbException &dbe) {
                    cout << "error closing environment: " << dbe.what() << endl;
               }
               return 0;
          }

          void RepQuoteExample::prompt() {
               cout << "QUOTESERVER";
               if (!app_data.is_master)
                    cout << "(read-only)";
               cout << "> " << flush;
          }

          void log(const char *msg) {
               // Get the current time and format it as text.
               time_t currentTime;
               time(&currentTime);
               char buff[255];
               strncpy(buff, ctime(&currentTime), sizeof(buff) - 1);
               buff[sizeof(buff) - 1] = '\0'; // strncpy may not terminate
               // Strip the trailing newline that ctime() appends.
               char *p;
               for (p = buff; *p != '\0' && *p != '\n'; p++);
               *p = '\0';

               cerr << buff << " - " << msg << endl;
          }
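
A possible alternative to trimming the `ctime()` newline by hand is `strftime()`, which writes into a bounded buffer and appends no newline. This is only a minimal sketch: the helper name `format_timestamp` and the format string are assumptions, and it uses UTC (via POSIX `gmtime_r`) where `log()` above uses local time.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <time.h>

// Hypothetical helper (not part of the posted code): format a time_t as
// "YYYY-MM-DD HH:MM:SS" in UTC without any trailing newline.
std::string format_timestamp(std::time_t t) {
    char buf[32];
    struct tm tmv;
    gmtime_r(&t, &tmv);  // POSIX, thread-safe variant of gmtime()
    size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tmv);
    return std::string(buf, n);  // n is 0 if the buffer was too small
}
```

With something like this, `log()` could print `format_timestamp(time(NULL))` followed by the message.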

          // Simple command-line user interface:
          // - enter "<stock symbol> <price>" to insert or update a record in the
          //     database;
          // - just press Return (i.e., blank input line) to print out the contents of
          //     the database;
          // - enter "quit" or "exit" to quit.
          //
          void RepQuoteExample::doloop() {
               DbHolder dbh1(&cur_env,DATABASE);
               DbHolder dbh2(&cur_env,DATABASE2);
               DbHolder *dbh=&dbh1;
               DbTxn *txn;
               string input;
          bool truncate = false;
               char *c;
               using_history();
               g_repquote=*dbh;
               int loop_rate = 0;
               int txn_rate = 500;
               while (prompt(), /*getline(cin, input)*/c=readline(NULL)) {
                    input=std::string(c);
                    add_history(c);
                    free(c);
                    int start_loop = 0;
                    int end_loop = 0;
                    int start_loop_d = 0;
                    int end_loop_d = 0;
                    istringstream is(input);
                    string token1, token2, token3;
          truncate = false;
          start_loop = 0;
          end_loop = 0;

                    // Read up to 3 tokens from the input.
                    //
                    int count = 0;
                    if (is >> token1) {
                         count++;
                         if (is >> token2)
                              count++;
                         if (is >> token3)
                              count++;
                    }

                    if (count == 1) {
               if (token1 == "truncate" ) {
                              truncate = true;     
                         }
                         else if (token1 == "env" ){
                              print_env(&cur_env);
                              continue;
                         }
               else if (token1 == "verbose" ) {
                              app_config->verbose = !app_config->verbose;
                              if (app_config->verbose)
                              {
                                   cur_env.set_verbose(DB_VERB_REPLICATION, 1);
                                   cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
                                   cur_env.set_verbose(DB_VERB_RECOVERY, 1);
                                   cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
                                   cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
                                   cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
                                   cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
                                   log("verbose is on");
                              }
                              else
                              {
                                   cur_env.set_verbose(DB_VERB_REPLICATION, 0);
                                   cur_env.set_verbose(DB_VERB_REPMGR_MISC, 0);
                                   cur_env.set_verbose(DB_VERB_RECOVERY, 0);
                                   cur_env.set_verbose(DB_VERB_REP_ELECT, 0);
                                   cur_env.set_verbose(DB_VERB_REP_LEASE, 0);
                                   cur_env.set_verbose(DB_VERB_REP_SYNC, 0);
                                   cur_env.set_verbose(DB_VERB_REPMGR_MISC, 0);
                                   log("verbose is off");
                              }
                              continue;
                         }
               else if (token1 == "print" ) {
                         print_stocks(*dbh);
                              count = 0;      
                         }
               else if (token1 == "db1" ) {
                              dbh=&dbh1;
                              g_repquote=*dbh;
                              log( "switch to Db1");
                              count = 0;      
                         }
               else if (token1 == "db2" ) {
                              dbh=&dbh2;
                              g_repquote=*dbh;
                              log( "switch to Db2");
                              count = 0;      
                         }
                         else if (token1 == "exit" || token1 == "quit") {
                              app_data.app_finished = 1;
                              break;
                         } else {
                              log("Format: <stock> <price>");
                              continue;
                         }
                    }
          else if (count == 2)
                    {
                         if (token1 == "loop_rate" ){
               loop_rate = atoi(token2.c_str());
                              continue;
                         }
                         if (token1 == "txn_rate" ){
               txn_rate = atoi(token2.c_str());
                              continue;
                         }
          }
          else if (count == 3)
                    {
          if (token1 == "loop" ) {
          start_loop = atoi(token2.c_str());
          end_loop = start_loop + atoi(token3.c_str());
          }
          if (token1 == "delete" ) {
          start_loop_d = atoi(token2.c_str());
          end_loop_d = start_loop_d + atoi(token3.c_str());
          }
                    }
                    // Here count is 0 (print size), 1 (truncate), 2 (put) or
                    // 3 (loop/delete), so we're about to try a DB operation.
                    //
                    // Open database with DB_CREATE only if this is a master
                    // database. A client database uses polling to attempt
                    // to open the database without DB_CREATE until it is
                    // successful.
                    //
                    // This DB_CREATE polling logic can be simplified under
                    // some circumstances. For example, if the application can
                    // be sure a database is already there, it would never need
                    // to open it with DB_CREATE.
                    //
                    if (!dbh->ensure_open(app_data.is_master))
                         continue;

                    try {
                         if (count == 0)
                              if (app_data.in_client_sync)
                                   log( "Cannot read data during client initialization - please try again.");
                              else
                                   print_stocks_size(*dbh);
                         else if (!app_data.is_master)
                              log("Can't update at client");
                         else {
                              if (truncate)
                              {
                                   u_int32_t no_remove;
                                   txn = NULL;
                                   cur_env.txn_begin(NULL, &txn, DB_TXN_NOWAIT);
                                   try
                                   {
                                        (*dbh)->truncate(txn, &no_remove, 0);
                                        // The handle is invalid once commit() has been
                                        // called, even if it fails, so clear it first.
                                        DbTxn *t = txn;
                                        txn = NULL;
                                        t->commit(0);
                                   } catch (DbException &e) {
                                        std::cout << "Error on truncate or commit: " << e.what() << std::endl;
                                        // Abort only if truncate itself failed.
                                        if (txn != NULL)
                                             (void)txn->abort();
                                   }
                              }
          else if (start_loop)
                              {

          int j=0;
          for (int i=start_loop; i<=end_loop; i=i+txn_rate)
                              {
          //transaction begin
                         txn = NULL;
                         cur_env.txn_begin(NULL, &txn, 0);

          for (j=i; j<=end_loop && j<=(i+txn_rate); j++)
          {
                                        Dbt key, value;
               std::string key1, value1;
               std::stringstream sstrm;

               sstrm << "key" << j << ends;
               key1 = sstrm.str();
                         key.set_data((void *)key1.c_str());
                         key.set_size((u_int32_t)strlen(key1.c_str()));

               sstrm.str("");
               int payload = rand() + j;
                                        sstrm << "price" << payload << ends;
               value1 = sstrm.str();
                         value.set_data((void *)value1.c_str());
                         value.set_size((u_int32_t)strlen(value1.c_str()));

               // Perform the database put
               (*dbh)->put(txn, &key, &value, 0);
                                   }
                                   /*
                                   printf("Kill me !!\n");
                                   kill(getpid(),-9);
                                   exit(0);
                                   */
               try
                                   {
                                        // commit
                              txn->commit(0);
                              txn = NULL;
                         } catch (DbException &e) {
                              std::cout << "Error on txn commit: " << e.what() << std::endl;
                         }
                                   if (loop_rate>0)
                                        usleep(txn_rate * 1000 * 1000 / loop_rate);

                              }
                              }
                              else if (start_loop_d)
                              {
          int j=0;
          for (int i=start_loop_d; i<=end_loop_d; i=i+100)
                              {
          //transaction begin
                         txn = NULL;
                         cur_env.txn_begin(NULL, &txn, 0);

          for (j=i; j<=end_loop_d && j<=(i+100); j++)
          {
                                        Dbt key, value;
               std::string key1, value1;
               std::stringstream sstrm;

               sstrm << "key" << j << ends;
               key1 = sstrm.str();
                         key.set_data((void *)key1.c_str());
                         key.set_size((u_int32_t)strlen(key1.c_str()));


               // Perform the database delete
               (*dbh)->del(txn, &key, 0);
                                   }
               try
                                   {
                                        // commit
                              txn->commit(0);
                              txn = NULL;
                         } catch (DbException &e) {
                              std::cout << "Error on txn commit: " << e.what() << std::endl;
                         }
                              }
                              }
                              else
                              {
                                   const char *symbol = token1.c_str();
                                   StringDbt key(const_cast<char*>(symbol));

                                   const char *price = token2.c_str();
                                   StringDbt data(const_cast<char*>(price));

                                   (*dbh)->put(NULL, &key, &data, 0);
                              }
                         }
                    } catch (DbDeadlockException &e) {
                         log("please retry the operation");
                         dbh->close();
                    } catch (DbRepHandleDeadException &e) {
                         log("please retry the operation");
                         dbh->close();
                    } catch (DbException &e) {
                         if (e.get_errno() == DB_REP_LOCKOUT) {
                              log("please retry the operation");
                              dbh->close();
                         } else
                              throw;
                    }
               }

               dbh->close();
          }
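
A note on the `loop` and `delete` commands: they split a key range into one transaction per batch. When the inner bound is inclusive (`j <= i + txn_rate`) and the outer loop steps by `txn_rate`, adjacent batches share their boundary key, so each batch holds one record more than intended. The following is only a sketch of half-open batching; `make_batches` is a hypothetical helper and not part of the posted code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: split the inclusive key range [first, last] into
// consecutive batches of at most batch_size keys. Each pair is a
// half-open range [begin, end), so no key appears in two batches.
std::vector<std::pair<int, int> > make_batches(int first, int last, int batch_size) {
    std::vector<std::pair<int, int> > batches;
    if (batch_size <= 0)
        return batches;  // nothing sensible to do
    for (int i = first; i <= last; i += batch_size) {
        int end = i + batch_size;
        if (end > last + 1)
            end = last + 1;  // clip the final batch
        batches.push_back(std::make_pair(i, end));
    }
    return batches;
}
```

Each returned pair would then map to one `txn_begin` / `commit` cycle, with the inner loop running `for (j = begin; j < end; j++)`.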

          void RepQuoteExample::event_callback(DbEnv* dbenv, u_int32_t which, void *info)
          {
               static char buf[256];
               APP_DATA *app = (APP_DATA *)dbenv->get_app_private();

               info = NULL;          /* Currently unused. */

               switch (which) {
               case DB_EVENT_REP_CLIENT:
                    app->is_master = 0;
                    app->in_client_sync = 1;
                    sprintf(buf,"%s - %s",progname,"CLIENT");
                    //EZ->dbenv->set_errpfx(buf);
                    log("DB_EVENT_REP_CLIENT.");
                    break;
               case DB_EVENT_REP_MASTER:
                    app->is_master = 1;
                    app->in_client_sync = 0;
                    sprintf(buf,"%s - %s",progname,"MASTER");
                    //EZ->dbenv->set_errpfx(buf);
                    log("DB_EVENT_REP_MASTER.");
                    break;
               case DB_EVENT_REP_NEWMASTER:
                    log("DB_EVENT_REP_NEWMASTER.");
                    app->in_client_sync = 1;
                    break;
               case DB_EVENT_REP_PERM_FAILED:
                    // Did not get enough acks to guarantee transaction
                    // durability based on the configured ack policy. This
                    // transaction will be flushed to the master site's
                    // local disk storage for durability.
                    //
                    log("DB_EVENT_REP_PERM_FAILED.");
                    log("Insufficient acknowledgements to guarantee transaction durability.");
                    break;
               case DB_EVENT_REP_STARTUPDONE:
                    app->in_client_sync = 0;
                    log("DB_EVENT_REP_STARTUPDONE.");
                    break;
               case DB_EVENT_REP_ELECTION_FAILED:
                    log("DB_EVENT_REP_ELECTION_FAILED.");
                    //g_runner->init(g_config);


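                    printf("Kill me !!\n");
                    // Signal numbers are positive: kill(pid, -9) fails with
                    // EINVAL, so use SIGKILL (needs <signal.h>).
                    kill(getpid(), SIGKILL);
                    exit(0);
                    break;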
               case DB_EVENT_REP_DUPMASTER:
                    log("DB_EVENT_REP_DUPMASTER.");
                    break;
               default:
                    dbenv->errx("ignoring event %d", which);
               }
          }
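
One detail worth checking when a process is supposed to kill itself abruptly: `kill -9` on the shell command line and `kill(pid, -9)` in C are not the same thing. The second argument of `kill(2)` is a signal number, and a negative value is not a valid signal, so the call fails and execution simply continues with whatever follows it. A small sketch to verify this on a given system; `probe_kill` is a hypothetical helper.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper: call kill() on our own process with the given
// signal number and return 0 on success, or the resulting errno.
// Only pass 0 or an invalid number here; signal 0 performs the
// validity and permission checks without delivering anything, while a
// real signal such as SIGKILL would terminate the caller.
int probe_kill(int sig) {
    errno = 0;
    if (kill(getpid(), sig) == 0)
        return 0;
    return errno;
}
```

On Linux, `probe_kill(-9)` is expected to report `EINVAL`, whereas `kill(getpid(), SIGKILL)` does not return at all.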


          void RepQuoteExample::print_stocks_size(Db *dbp) {

               DB_BTREE_STAT *statp;
          dbp->stat(NULL, &statp, 0);
               log("db_stat");
          cout << "***************************************** >>>>>>>>>>> : database contains " << (u_long)statp->bt_ndata << " records\n";
               free(statp); // Db::stat() allocates the structure; the caller frees it
          }

          void RepQuoteExample::print_env(DbEnv *dbenv) {

               dbenv->stat_print(DB_STAT_ALL);
          }

          void RepQuoteExample::print_stocks(Db *dbp) {
               StringDbt key, data;
          #define     MAXKEYSIZE     10
          #define     MAXDATASIZE     20
               char keybuf[MAXKEYSIZE + 1], databuf[MAXDATASIZE + 1];
               char *kbuf, *dbuf;

               memset(&key, 0, sizeof(key));
               memset(&data, 0, sizeof(data));
               kbuf = keybuf;
               dbuf = databuf;

               DbcAuto dbc(dbp, 0, 0);
               cout << "\tSymbol\tPrice" << endl
                    << "\t======\t=====" << endl;
          int no_records =0;
               for (int ret = dbc->get(&key, &data, DB_FIRST);
                    ret == 0;
                    ret = dbc->get(&key, &data, DB_NEXT)) {
                    key.get_string(&kbuf, MAXKEYSIZE);
                    data.get_string(&dbuf, MAXDATASIZE);
          no_records++;
                    cout << "\t" << keybuf << "\t" << databuf << endl;
               }
          cout << "********************** NO Records " << no_records << endl;
               cout << endl << flush;
               dbc.close();
          }

          static void usage() {
               cerr << "usage: " << progname << " -h home -l host:port [-CM]"
               << "[-r host:port][-R host:port]" << endl
               << " [-a all|quorum][-b][-n nsites][-p priority][-v]" << endl;

               cerr << "\t -h home (required; h stands for home directory)" << endl
               << "\t -l host:port (required; l stands for local)" << endl
               << "\t -C or -M (optional; start up as client or master)" << endl
               << "\t -r host:port (optional; r stands for remote; any "
               << "number of these" << endl
               << "\t may be specified)" << endl
               << "\t -R host:port (optional; R stands for remote peer; only "
               << "one of" << endl
               << "\t these may be specified)" << endl
               << "\t -a all|quorum (optional; a stands for ack policy)" << endl
               << "\t -b (optional; b stands for bulk)" << endl
               << "\t -n nsites (optional; number of sites in replication "
               << "group; defaults " << endl
               << "\t     to 0 to try to dynamically compute nsites)" << endl
               << "\t -p priority (optional; defaults to 100)" << endl
               << "\t -v (optional; v stands for verbose)" << endl;

               exit(EXIT_FAILURE);
          }

          int main(int argc, char **argv) {
               RepConfigInfo config;
               int ch; // getopt() returns int; a char may never compare equal to EOF
               char *portstr, *tmphost;
               int tmpport;
               bool tmppeer;

               config.no_dummy_wr = false;

               // Extract the command line parameters
               while ((ch = getopt(argc, argv, "E:a:bCh:l:Mn:p:R:r:vw")) != EOF) {
                    tmppeer = false;
                    switch (ch) {
                    case 'a':
                         if (strncmp(optarg, "all", 3) == 0)
                              config.ack_policy = DB_REPMGR_ACKS_ALL;
                         else if (strncmp(optarg, "quorum", 6) != 0)
                              usage();
                         break;
                    case 'b':
                         config.bulk = true;
                         break;
                    case 'C':
                         config.start_policy = DB_REP_CLIENT;
                         break;
                    case 'E':
          config.start_policy = DB_REP_ELECTION;
          break;
                    case 'h':
                         config.home = optarg;
                         break;
                    case 'l':
                         config.this_host.host = strtok(optarg, ":");
                         if ((portstr = strtok(NULL, ":")) == NULL) {
                              cerr << "Bad host specification." << endl;
                              usage();
                         }
                         config.this_host.port = (unsigned short)atoi(portstr);
                         config.got_listen_address = true;
                         break;
                    case 'M':
                         config.start_policy = DB_REP_MASTER;
                         break;
                    case 'n':
                         config.totalsites = atoi(optarg);
                         break;
                    case 'p':
                         config.priority = atoi(optarg);
                         break;
                    case 'R':
                         tmppeer = true; // FALLTHROUGH
                    case 'r':
                         tmphost = strtok(optarg, ":");
                         if ((portstr = strtok(NULL, ":")) == NULL) {
                              cerr << "Bad host specification." << endl;
                              usage();
                         }
                         tmpport = (unsigned short)atoi(portstr);

                         config.addOtherHost(tmphost, tmpport, tmppeer);

                         break;
                    case 'v':
                         config.verbose = true;
                         break;
                    case 'w':
                         config.no_dummy_wr = true;
                         //config.priority = 2;
                         break;
                    case '?':
                    default:
                         usage();
                    }
               }

               // Error check command line.
               if ((!config.got_listen_address) || config.home == NULL)
                    usage();

               RepQuoteExample runner;
               g_runner=&runner;
               g_config=&config;


               try {
                    runner.init(&config);
                    runner.doloop();
               } catch (DbException &dbe) {
                    cerr << "Caught an exception during initialization or"
                         << " processing: " << dbe.what() << endl;
               }
               runner.terminate();
               return 0;
          }

          // This is a very simple thread that performs checkpoints at a fixed
          // time interval. For a master site, the time interval is one minute
          // plus the duration of the checkpoint_delay timeout (30 seconds by
          // default.) For a client site, the time interval is one minute.
          //
          void *checkpoint_thread(void *args)
          {
               DbEnv *env;
               APP_DATA *app;
               int i, ret;

               env = (DbEnv *)args;
               app = (APP_DATA *)env->get_app_private();

               for (;;) {
                    // Wait for one minute, polling once per second to see if
                    // application has finished. When application has finished,
                    // terminate this thread.
                    //
                    for (i = 0; i < 60; i++) {
                         sleep(1);
                         if (app->app_finished == 1)
                              return ((void *)EXIT_SUCCESS);
                    }

                    // Perform a checkpoint.

                    // original line
                    if ((ret = env->txn_checkpoint(0, 0, 0)) != 0) {
                    //if ((ret = env->txn_checkpoint(0, 0, DB_FORCE)) != 0) {
                         env->err(ret, "Could not perform checkpoint.\n");
                         return ((void *)EXIT_FAILURE);
                    }
               }
          }

          // This is a simple log archive thread. Once per minute, it removes all but
          // the most recent 3 logs that are safe to remove according to a call to
          // DBENV->log_archive().
          //
          // Log cleanup is needed to conserve disk space, but aggressive log cleanup
          // can cause more frequent client initializations if a client lags too far
          // behind the current master. This can happen in the event of a slow client,
          // a network partition, or a new master that has not kept as many logs as the
          // previous master.
          //
          // The approach in this routine balances the need to mitigate against a
          // lagging client by keeping a few more of the most recent unneeded logs
          // with the need to conserve disk space by regularly cleaning up log files.
          // Use of automatic log removal (DBENV->log_set_config() DB_LOG_AUTO_REMOVE
          // flag) is not recommended for replication due to the risk of frequent
          // client initializations.
          //
          void *log_archive_thread(void *args)
          {
               DbEnv *env;
               APP_DATA *app;
               char **begin, **list;
               int i, listlen, logs_to_keep, minlog, ret;

               env = (DbEnv *)args;
               app = (APP_DATA *)env->get_app_private();
               logs_to_keep = 3;

               for (;;) {
                    // Wait for one minute, polling once per second to see if
                    // application has finished. When application has finished,
                    // terminate this thread.
                    //
                    for (i = 0; i < 60; i++) {
                         sleep(1);
                         if (app->app_finished == 1)
                              return ((void *)EXIT_SUCCESS);
                    }

                    // Get the list of unneeded log files.
                    if ((ret = env->log_archive(&list, DB_ARCH_ABS)) != 0) {
                         env->err(ret, "Could not get log archive list.");
                         return ((void *)EXIT_FAILURE);
                    }
                    if (list != NULL) {
                         listlen = 0;
                         // Get the number of logs in the list.
                         for (begin = list; *begin != NULL; begin++, listlen++);
                         // Remove all but the logs_to_keep most recent
                         // unneeded log files.
                         //
                         minlog = listlen - logs_to_keep;
                         for (begin = list, i= 0; i < minlog; list++, i++) {
                              if ((ret = unlink(*list)) != 0) {
                                   env->err(ret,
                                   "logclean: remove %s", *list);
                                   env->errx(
                                   "logclean: Error remove %s", *list);
                                   free(begin);
                                   return ((void *)EXIT_FAILURE);
                              }
                         }
                         free(begin);
                    }
               }
          }
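
The retention rule described in the comment above ("remove all but the most recent 3 removable logs") can be isolated from the file I/O and checked on its own. This is a sketch under the same assumption the thread makes, namely that the list returned by `log_archive()` is ordered oldest first; `logs_to_remove` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the removable log files ordered oldest
// first, return the ones to delete so that the newest `keep` files
// stay on disk. Returns an empty list if there are `keep` or fewer.
std::vector<std::string> logs_to_remove(const std::vector<std::string> &removable,
                                        std::size_t keep) {
    if (removable.size() <= keep)
        return std::vector<std::string>();
    return std::vector<std::string>(
        removable.begin(),
        removable.end() - static_cast<std::ptrdiff_t>(keep));
}
```

Keeping the selection separate from `unlink()` makes it easier to experiment with a larger `keep` value when clients are re-initialized too often after a partition.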

          #define DATABASE_DUMMY "dummy.db"
          void create_dummy_db(DB_ENV *env, DB **dbp)
          {
          DB_ENV *dbenv=env;
          int ret;
          u_int32_t db_flags;

          if ((ret = db_create(dbp, dbenv, 0)) != 0)
          {
          dbenv->err(dbenv, ret, "create_dummy_db: db_create");
          }

          db_flags = DB_AUTO_COMMIT | DB_CREATE;
          //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
          if ((ret = (*dbp)->open(*dbp,NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          {
          dbenv->err(dbenv, ret, "create_dummy_db: DB->open");
          }
          }

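          void reopen_dummy_db(DB_ENV *env, DB **dbp)
          {
          DB_ENV *dbenv=env;
          int ret;
          u_int32_t db_flags;

          // Discard the dead handle before creating a new one.
          if (*dbp != NULL)
          {
          (void)(*dbp)->close(*dbp, 0);
          *dbp = NULL;
          }

          if ((ret = db_create(dbp, dbenv, 0)) != 0)
          {
          dbenv->err(dbenv, ret, "reopen_dummy_db: db_create");
          }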

          db_flags = DB_AUTO_COMMIT | DB_CREATE;
          //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
          if ((ret = (*dbp)->open(*dbp,NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          {
          dbenv->err(dbenv, ret, "reopen_dummy_db: DB->open");
          }
          }

          void perform_db_operation(DB_ENV *env, DB **dbp, bool bRead)
          {
          //main loop
          //DB *dbp=NULL;
          DB_ENV *dbenv=env;
          int ret;
          u_int32_t db_flags;
          DBT key, data;
          char buf[20]="dummy", *rbuf;
          rbuf=buf;

          if (*dbp == NULL)
          {
          create_dummy_db(dbenv, dbp);
          }
          if (! bRead)
          {
               memset(&key, 0, sizeof(key));
               memset(&data, 0, sizeof(data));
               key.data = buf;
               key.size = (u_int32_t)strlen(buf);
               
               data.data = rbuf;
               data.size = (u_int32_t)strlen(rbuf);
               
               if ((ret = (*dbp)->put(*dbp, NULL, &key, &data, 0)) != 0)
               {
                    if (ret == DB_REP_HANDLE_DEAD)
                    {
                         //create_dummy_db(dbenv, dbp);
                         reopen_dummy_db(dbenv, dbp);
                         (*dbp)->err(*dbp, ret, "DB->put :");
                    }
                    else
                    {
                    if (ret != DB_KEYEXIST)
                         (*dbp)->err(*dbp, ret, "perform_db_operation: DB->put");
                    }
               }

               }
               else
               {
                    DB_BTREE_STAT *statp;
                    (*dbp)->stat(*dbp,NULL, &statp, 0);
                    std::cout<<"dbp read stats: key#"<< statp->bt_nkeys <<std::endl;
               }

          }

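          void *dummy_write_thread(void *args)
          {
               DbEnv *env;
               APP_DATA *app;
               DB *m_dbp = NULL; // opened lazily by perform_db_operation()

               env = (DbEnv *)args;
               app = (APP_DATA *)env->get_app_private();

               for (;;) {
                    // Stop when the application has finished, so that
                    // terminate() can join this thread.
                    if (app->app_finished == 1) {
                         if (m_dbp != NULL)
                              (void)m_dbp->close(m_dbp, 0);
                         return ((void *)EXIT_SUCCESS);
                    }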
                    if (! app->no_dummy_wr)
                    {
                         if (app->is_master)
                         {     
                         perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                              //env->txn_checkpoint(0, 0, DB_FORCE);
                         }
                    usleep(1 * 1000 * 1000);
                    }
                    else
                    {
                         if (app->is_master)
                         {     
                              //DB *db_quote=g_repquote->get_DB();
                              //perform_db_operation(env->get_DB_ENV(),&db_quote,true);
                              //if (g_repquote)
                              //     g_runner->print_stocks_size(g_repquote);
                              //env->txn_checkpoint(0, 0, DB_FORCE);
                              //perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                              env->rep_flush();
                         }
                    usleep(4 * 1000 * 1000);
                    }
               }
          }

          --------
          My script to simulate the split brain:
          ---
          #!/bin/bash
          [ -z "$node1" ] && node1=10.10.32.121
          [ -z "$node2" ] && node2=10.10.32.91


          # SIGKILL (9) cannot be trapped, so it is not listed.
          trap myend 0 1 2 3 6 14 15

          myend()
          {
               echo "Receive signal to stop test..."
               un_split_brain
               echo "done"
               exit 1
          }

          split_brain()
          {
               echo -n "Split-Brain at node $node..."
               snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 2 >/dev/null 2>&1
               echo "done"
          }

          un_split_brain()
          {
               echo -n "Undo Split-Brain at node $node..."
               snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 1 >/dev/null 2>&1
               echo "done"
          }

          is_slave()
          {
               local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c CLIENT)
               [ $r -gt 1 ] && ret=1 || ret=0
               return $ret
          }

          is_master()
          {
               local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c MASTER)
               [ $r -gt 1 ] && ret=1 || ret=0
               return $ret
          }

          wait_for_master()
          {
               echo -n "Waiting for MASTER at node $node ... "
               is_master $node
               r=$?
               while [ "$r" -ne 1 ]
               do
                    usleep 500000
                    is_master $node
                    r=$?
                    echo -n "."
               done
               echo "done"
          }

          wait_for_slave()
          {
               local r
               local tm
               tm=0
               echo -n "Waiting for SLAVE at node $node ... "
               is_slave $node
               r=$?
               while ( [ ! $r -eq 1 ] )
               do
                    usleep 500000
                    is_slave $node
                    r=$?
                    echo -n "."
                    tm=$((tm+1))
                    [ $tm -gt 120 ] && break
               done
               [ $tm -gt 120 ] && ret=0 || ret=1
               echo "done"
               return $ret
          }

          run_test_split_brain()
          {
               local nt
               nt=1
               nfails=0
               x=4
               node=${1:-$node2}
               while ((1))
               do
                    printf "*************** TEST [%02d] ********************\n" $nt
                    split_brain
                    wait_for_master
                    x=$((RANDOM % 10 + 1))     # isolate the client for 1 to 10 seconds
                    echo -n " waiting $x sec ..."
                    sleep $x
                    echo "done"
                    un_split_brain
                    wait_for_slave
                    r=$?
                    if [ "$r" -eq 1 ]; then
                         echo "$(date) - test [$nt] - OK"
                    else
                         echo "$(date) - test [$nt] - FAILED"
                         nfails=$((nfails+1))
                    fi
                    perc_success=$(echo "100.0 - $nfails / $nt * 100.0" | bc -l)
                    echo "************************************************ [ success rate: $perc_success % ]"
                    nt=$((nt+1))
                    
                    x=$((RANDOM%9))
                    echo -n " waiting $x sec ..."
                    sleep $x
               done
          }

          run_test_split_brain

          ------
          Here is the makefile I use to run the two environments.

          I run:
          - make run
          and, in another window, sh test_split_brain.sh

          ----

          node1?=10.10.32.121
          node2?=10.10.32.91
          nsite?=2
          debug?=0

          all: RepQuoteExampleEric install

          RepConfigInfo.o: RepConfigInfo.cpp RepConfigInfo.h
               g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 -c RepConfigInfo.cpp -o RepConfigInfo.o

          RepQuoteExampleEric: RepQuoteExampleEric.cpp RepConfigInfo.o
               g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 RepQuoteExampleEric.cpp RepConfigInfo.o -o RepQuoteExampleEric -L /usr/local/BerkeleyDB.5.1/lib/ -lreadline -lcurses -ldb_cxx
          kill:
               -ssh -X root@$(node1) "killall -9 /root/RepQuoteExampleEric"
               -ssh -X root@$(node2) "killall -9 /root/RepQuoteExampleEric"

          run: RepQuoteExampleEric kill install clean_env
               ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v | tee /tmp/BDB.log\"" &
               ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &

          run_node2: clean_env2
               ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &


          debug_node2: clean_env2
               ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
               sleep 3
               ssh -X root@$(node2) /sbin/pidof RepQuoteExampleEric >/tmp/pid
               ssh -X root@$(node2) ~/kdbg /root/db-5.1.19/examples/cxx/excxx_repquote/RepQuoteExampleEric -p `cat /tmp/pid`

               
          run_debug_node1: RepQuoteExampleEric kill install clean_env
               ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\" " &
               ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v\"" &

          run_debug_node2: RepQuoteExampleEric kill install clean_env
               ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v\" " &
               ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\"" &

          install: RepQuoteExampleEric
               scp RepQuoteExampleEric root@$(node1):~
               scp RepQuoteExampleEric root@$(node2):~

          clean_env: clean_env1 clean_env2

          clean_env1:
               ssh -X root@$(node1) rm -rf /opt/bdb/*

          clean_env2:
               ssh -X root@$(node2) rm -rf /opt/bdb/*
          • 2. Re: Election problem after repeated split-brains with two nodes
            524722
            The verbose output shows that the site wins the election and the
            RepMgr code calls rep_start to become the master, but it is stuck
            waiting for either a txn or a cursor to resolve before that state change
            can occur. I did not study your application code. But if you have a
            cursor or txn in progress at the time, then you must resolve it before
            the site can change state.
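
            To illustrate why a single unresolved handle is enough to block
            the state change, here is a toy model in C++. It is not Berkeley DB
            code: ToyEnv, op_begin, op_end and try_role_change are invented
            names, and the bounded polling loop is an assumption made so that
            the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy model only, NOT Berkeley DB internals. It mimics the idea that a role
// change has to wait until no txn or cursor is still in flight.
class ToyEnv {
public:
    // Called when a txn begins or a cursor is opened.
    void op_begin() { ++op_cnt_; }
    // Called when the txn commits/aborts or the cursor is closed.
    void op_end() { --op_cnt_; }
    int in_flight() const { return op_cnt_.load(); }

    // Poll until every operation is resolved, giving up after max_polls.
    // Returns true if the role change could proceed.
    bool try_role_change(int max_polls) {
        for (int i = 0; i < max_polls; ++i) {
            if (op_cnt_.load() == 0) return true;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return op_cnt_.load() == 0;
    }

private:
    std::atomic<int> op_cnt_{0};
};

// A handle that is opened but never resolved blocks the role change;
// resolving it lets the change go through.
inline void demo() {
    ToyEnv env;
    env.op_begin();                    // txn begun, abort() skipped by an exception
    assert(!env.try_role_change(3));   // count never reaches zero
    env.op_end();                      // handle finally resolved
    assert(env.try_role_change(3));    // now the change can proceed
}
```

            In this model the only way out of the wait is for the application
            to resolve the handle it left open.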

            If you can reproduce this, it would be useful to see stack traces
            of the threads. That would confirm or refute what I think I see.

            Sue LoVerso
            Oracle
            • 3. Re: Election problem after repeated split-brains with two nodes
              857786
              Thank you.

              It was a txn write that threw an exception that wasn't handled correctly.
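
              One way to make sure a transaction is always resolved, even
              when a write throws, is a small RAII guard. This is only a
              sketch under assumptions: TxnGuard, FakeTxn and write_once are
              hypothetical names, and FakeTxn stands in for DbTxn so that the
              example compiles without Berkeley DB. The guard relies only on
              abort() and commit(flags), which DbTxn provides. It needs C++11.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard: aborts the transaction unless commit() was called.
// Txn is any type with abort() and commit(unsigned).
template <class Txn>
class TxnGuard {
public:
    explicit TxnGuard(Txn *txn) : txn_(txn) {}
    TxnGuard(const TxnGuard &) = delete;
    TxnGuard &operator=(const TxnGuard &) = delete;

    ~TxnGuard() {
        if (txn_ == nullptr) return;   // already committed
        try {
            txn_->abort();
        } catch (...) {
            // Destructors must not throw; the handle is unusable either way.
        }
    }

    void commit() {
        assert(txn_ != nullptr && "commit() called twice");
        Txn *t = txn_;
        txn_ = nullptr;   // a txn handle must not be reused after commit
        t->commit(0);
    }

private:
    Txn *txn_;
};

// Stand-in for DbTxn that just counts calls.
struct FakeTxn {
    int aborts = 0;
    int commits = 0;
    int abort() { ++aborts; return 0; }
    int commit(unsigned) { ++commits; return 0; }
};

// A write that may fail half-way, like a put that throws.
inline void write_once(FakeTxn &txn, bool fail) {
    TxnGuard<FakeTxn> guard(&txn);
    if (fail) throw std::runtime_error("put failed");
    guard.commit();
}
```

              With the guard in place, the failing path leaves no open
              transaction behind: the destructor aborts it while the
              exception propagates.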
              • 4. Re: Election problem after repeated split-brains with two nodes
                857786
                I corrected the exception handling and reran the test.

                On the MASTER I now get this log:
                -------------------
                Tue Jun 14 10:44:54 2011 - DB_EVENT_REP_PERM_FAILED.
                Tue Jun 14 10:44:54 2011 - Insufficient acknowledgements to guarantee transaction durability.
                [1308037494:377314][3955/47880320165088] MASTER: rep_send_function returned: 110
                [1308037494:475676][3955/1147312448] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type log, LSN [22][3494212]
                [1308037494:475693][3955/1147312448] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 6524 eid -1, type dupmaster, LSN [0][0] nobuf
                [1308037494:475932][3955/1147312448] MASTER: rep_start: Found old version log 17
                [1308037494:476005][3955/1147312448] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 6524 eid -1, type newclient, LSN [0][0] nobuf
                Tue Jun 14 10:44:54 2011 - DB_EVENT_REP_CLIENT.
                Tue Jun 14 10:44:54 2011 - DB_EVENT_REP_DUPMASTER.
                [1308037494:476094][3955/1168292160] CLIENT: starting election thread
                DB_ENV->rep_elect:WARNING: nvotes (1) is sub-majority with nsites (2)
                [1308037494:476136][3955/1168292160] CLIENT: Start election nsites 2, ack 1, priority 100
                [1308037494:476149][3955/1168292160] CLIENT: Election thread owns egen 6525
                [1308037494:476684][3955/1157802304] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type newclient, LSN [0][0]
                [1308037494:479128][3955/1168292160] CLIENT: Tallying VOTE1[0] (2147483647, 6525)
                [1308037494:479158][3955/1168292160] CLIENT: Beginning an election
                [1308037494:479172][3955/1168292160] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 6524 eid -1, type vote1, LSN [22][3573506] nobuf
                [1308037494:479181][3955/1157802304] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 6524 eid -1, type master_req, LSN [0][0] nobuf
                DB->put: attempt to modify a read-only database
                QUOTESERVER(read-only)> [1308037494:479386][3955/1136822592] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type vote1, LSN [22][3494304]
                [1308037494:479415][3955/1136822592] CLIENT: Updating gen from 6524 to 6526
                [1308037494:479429][3955/1136822592] CLIENT: Received vote1 egen 6527, egen 6525
                [1308037494:479440][3955/1136822592] CLIENT: Received VOTE1 from egen 6527, my egen 6525
                [1308037494:479448][3955/1136822592] CLIENT: Election finished in 0.003289000 sec
                [1308037494:479459][3955/1136822592] CLIENT: Election done; egen 6526
                [1308037494:479466][3955/1136822592] CLIENT: Tallying VOTE1[0] (0, 6527)
                [1308037494:479477][3955/1136822592] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)6526 (egen)6527 [22,3494304]
                [1308037494:479489][3955/1136822592] CLIENT: Not in election, but received vote1 0x282c 0x8
                [1308037494:479504][3955/1157802304] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type alive, LSN [22][3573506]
                [1308037494:479532][3955/1178782016] CLIENT: starting election thread
                DB_ENV->rep_elect:WARNING: nvotes (1) is sub-majority with nsites (2)
                [1308037494:479553][3955/1178782016] CLIENT: Start election nsites 2, ack 1, priority 100
                [1308037494:479562][3955/1178782016] CLIENT: Election thread owns egen 6527
                [1308037494:481586][3955/1178782016] CLIENT: Tallying VOTE1[1] (2147483647, 6527)
                [1308037494:481608][3955/1178782016] CLIENT: Accepting new vote
                [1308037494:481621][3955/1178782016] CLIENT: Beginning an election
                [1308037494:481634][3955/1178782016] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 6526 eid -1, type vote1, LSN [22][3573506] nobuf
                [1308037494:481665][3955/1178782016] CLIENT: Tallying VOTE2[0] (2147483647, 6527)
                [1308037494:481677][3955/1178782016] CLIENT: Counted my vote 1
                [1308037494:481688][3955/1178782016] CLIENT: Skipping phase2 wait: already got 1 votes
                [1308037494:481699][3955/1178782016] CLIENT: Got enough votes to win; election done; (prev) gen 6526
                [1308037494:481710][3955/1178782016] CLIENT: Election finished in 0.002132000 sec
                [1308037494:481726][3955/1178782016] CLIENT: Election done; egen 6528
                ignoring event 5
                [1308037494:481746][3955/1178782016] CLIENT: Ended election with 0, e_th 1, egen 6528, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
                [1308037494:481821][3955/1136822592] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type alive, LSN [0][0]
                [1308037494:481842][3955/1136822592] CLIENT: Racing replication msg lockout, ignore message.
                [1308037494:522443][3955/1157802304] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type vote2, LSN [0][0]
                [1308037494:522466][3955/1157802304] CLIENT: Racing replication msg lockout, ignore message.
                [1308037494:580289][3955/1168292160] CLIENT: Ended election with 0, e_th 0, egen 6528, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
                [1308037494:580316][3955/1168292160] CLIENT: election thread is exiting
                [1308037518:697085][3955/1147312448] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type newclient, LSN [0][0]
                [1308037518:697133][3955/1147312448] CLIENT: Racing replication msg lockout, ignore message.
                [1308037518:697280][3955/1157802304] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type vote1, LSN [22][3494304]
                [1308037518:697302][3955/1136822592] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 6526 eid 0, type bulk_log, LSN [0][0]
                [1308037518:697322][3955/1157802304] CLIENT: Racing replication msg lockout, ignore message.
                ...
                Even when I stop the other node, which was also in CLIENT mode, I get the following:
                ...
                [1308056228:398423][3955/1157802304] CLIENT: Racing replication msg lockout, ignore message.
                EOF on connection from site 2.0.0.210:12345
                [1308056236:713163][3955/1115842880] CLIENT: init connection to site 2.0.0.210:12345 with result 115
                connecting to site 2.0.0.210:12345: Connection refused
                [1308056241:914285][3955/1115842880] CLIENT: init connection to site 2.0.0.210:12345 with result 115
                connecting to site 2.0.0.210:12345: Connection refused
                [1308056247:115430][3955/1115842880] CLIENT: init connection to site 2.0.0.210:12345 with result 115
                connecting to site 2.0.0.210:12345: Connection refused
                ... and so on indefinitely ...
                -----
                I attached gdb to the process and found the election thread in this state:

                (gdb) thre 8
                [Switching to thread 8 (process 19702)]#0 0x000000368dccced2 in select () from /lib64/libc.so.6
                (gdb) bt
                #0 0x000000368dccced2 in select () from /lib64/libc.so.6
                #1 0x00002b8c0153b8da in __os_sleep (env=0x19a187a0, secs=1, usecs=0) at ../src/os/os_yield.c:90
                #2 0x00002b8c0153b889 in __os_yield (env=0x19a187a0, secs=1, usecs=0) at ../src/os/os_yield.c:48
                #3 0x00002b8c01458581 in __rep_lockout_int (env=0x19a187a0, rep=0x2b8c01355348, fieldp=0x2b8c013553f4, field_val=0,
                msg=0x2b8c0155b45f "op_cnt", lockout_flag=16) at ../src/rep/rep_util.c:1508
                #4 0x00002b8c01458396 in __rep_lockout_api (env=0x19a187a0, rep=0x2b8c01355348) at ../src/rep/rep_util.c:1429
                #5 0x00002b8c01442da7 in __rep_start_int (env=0x19a187a0, dbt=0x4642bf30, flags=2) at ../src/rep/rep_method.c:530
                #6 0x00002b8c01468860 in __repmgr_repstart (env=0x19a187a0, flags=2) at ../src/repmgr/repmgr_util.c:480
                #7 0x00002b8c0145d4fa in __repmgr_elect (env=0x19a187a0, nsites=2, nvotes=1, failtimep=0x4642c070)
                at ../src/repmgr/repmgr_elect.c:464
                #8 0x00002b8c0145cc1b in __repmgr_elect_main (env=0x19a187a0, th=0x2aaab0000a90) at ../src/repmgr/repmgr_elect.c:168
                #9 0x00002b8c0145ca86 in __repmgr_elect_thread (argsp=0x2aaab0000a90) at ../src/repmgr/repmgr_elect.c:102
                #10 0x000000368e4064a7 in start_thread () from /lib64/libpthread.so.0
                #11 0x000000368dcd3c2d in clone () from /lib64/libc.so.6
                (gdb)


                Could you explain this state to me?
                • 5. Re: Election problem after repeated split-brains with two nodes
                  857786
                  I added some new traces after correcting the program according to your instructions, but the problem is still there.
                  Perhaps I didn't understand your instructions?
                  • 6. Re: Election problem after repeated split-brains with two nodes
                    524722
                    I noticed that you posted your test program earlier in this thread.
                    It does not appear that you actively use txns or cursors. Therefore,
                    although the code is stuck waiting for either a txn or cursor to
                    resolve or close, it isn't clear where that would be from.

                    Do you have DIAGNOSTIC configured? You should be seeing
                    a message that says: "Waiting for op_cnt (#) to complete lockout..."
                    in the replication verbose messages. The first one would appear
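
                    If the verbose output is saved to a file, the first such
                    message can be located with a small scanner. This is a
                    rough sketch, assuming the line contains the literal text
                    "Waiting for op_cnt" followed by the count in parentheses.
                    LockoutWait and first_lockout_wait are hypothetical names,
                    and the sample lines in demo() are made up for the check.

```cpp
#include <cassert>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// Result of scanning a replication verbose log.
struct LockoutWait {
    int line_no = 0;    // 1-based line of the first match, 0 if none
    long op_cnt = -1;   // number parsed from "(N)", -1 if not parseable
};

// Finds the first line containing "Waiting for op_cnt" and parses the count.
inline LockoutWait first_lockout_wait(std::istream &log) {
    const std::string marker = "Waiting for op_cnt";
    const std::string::size_type npos = std::string::npos;
    LockoutWait result;
    std::string line;
    int n = 0;
    while (std::getline(log, line)) {
        ++n;
        const std::string::size_type pos = line.find(marker);
        if (pos == npos) continue;
        result.line_no = n;
        const std::string::size_type open = line.find('(', pos);
        const std::string::size_type close =
            (open == npos) ? npos : line.find(')', open);
        if (close != npos && close > open + 1) {
            try {
                result.op_cnt = std::stol(line.substr(open + 1, close - open - 1));
            } catch (const std::exception &) {
                // Count was not numeric; keep -1.
            }
        }
        break;   // only the first occurrence is of interest
    }
    return result;
}

// Self-check on synthetic input (not real Berkeley DB output).
inline void demo() {
    std::istringstream log(
        "[1:2][3/4] CLIENT: starting election thread\n"
        "[1:3][3/4] CLIENT: Waiting for op_cnt (2) to complete lockout...\n");
    const LockoutWait w = first_lockout_wait(log);
    assert(w.line_no == 2 && w.op_cnt == 2);
}
```

                    Passing a std::ifstream opened on the saved log (for
                    example /tmp/BDB.log) gives the line to start reading from.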

                    This may require additional rounds of debugging back and forth
                    and we can take that offline. Please contact me at

                    firstname.lastname@oracle.com

                    using the first and last name shown below and we can debug this further.
                    Thanks!

                    Sue LoVerso
                    Oracle