9 Replies. Latest reply: Apr 21, 2011 1:03 PM by 524761

    What causes HA to throw the error "local site address has not yet been set"?

    776130
      HA worked well until we upgraded Berkeley DB from 5.0 to 5.1.25.

      But now we often get the error "local site address has not yet been set".

      Can anyone tell us what is happening?
        • 1. Re: What causes HA to throw the error "local site address has not yet been set"?
          524761
          That error message comes from the DB_ENV->repmgr_get_local_site() method, which is new in 5.1. The application should only call this method after the local site address has been defined, by calling DB_ENV->repmgr_set_local_site() (or the equivalent entry in the DB_CONFIG file).
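          The ordering rule can be illustrated with a toy model. This is a sketch, not Berkeley DB code: the class LocalSiteModel and its set()/get() methods are hypothetical stand-ins for DbEnv::repmgr_set_local_site() and DbEnv::repmgr_get_local_site(), and returning EINVAL when nothing has been set is an assumption made for illustration. It shows only the contract: a query issued before the address is configured fails with the same message, while the same query after configuration succeeds.

```cpp
#include <cerrno>
#include <string>

// Toy model of the "set the local site before reading it back" contract.
// NOT the Berkeley DB API: LocalSiteModel is a hypothetical stand-in.
class LocalSiteModel {
public:
    // Stand-in for repmgr_set_local_site(host, port, flags).
    int set(const std::string& host, unsigned port) {
        host_ = host;
        port_ = port;
        isSet_ = true;
        return 0;
    }

    // Stand-in for repmgr_get_local_site(&host, &port).
    // Returning EINVAL before set() is an assumption for illustration.
    int get(std::string* host, unsigned* port) const {
        if (!isSet_) {
            lastError_ = "local site address has not yet been set";
            return EINVAL;
        }
        *host = host_;
        *port = port_;
        return 0;
    }

    // Text of the most recent failure, empty if none occurred.
    const std::string& lastError() const { return lastError_; }

private:
    std::string host_;
    unsigned port_ = 0;
    bool isSet_ = false;
    mutable std::string lastError_;
};
```

          In the real application the same rule means calling repmgr_set_local_site() (or supplying the equivalent DB_CONFIG entry) before anything that reads the local site back.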
          • 2. Re: What causes HA to throw the error "local site address has not yet been set"?
            524761
            However, if you're using the Java API, it makes this call internally whenever the application calls Environment.getConfig() or Environment.setConfig(); so that's a bug in Berkeley DB. Other than the annoyance of the unwanted error message, it is harmless.
            • 3. Re: What causes HA to throw the error "local site address has not yet been set"?
              776130
              Dear Alan,

              The application is written in C++.

              This error prevents our application from running. Do you have any advice?

              Thanks
              • 4. Re: What causes HA to throw the error "local site address has not yet been set"?
                524761
                Is the application calling DbEnv::repmgr_get_local_site()?
                • 5. Re: What causes HA to throw the error "local site address has not yet been set"?
                  776130
                  The application never explicitly invokes get_local_site.

                  Here is what the application does before calling set_local_site:

                       m_envFlags = m_envFlags |
                            DB_CREATE | // Create the environment if it does not exist
                            DB_RECOVER | // Run normal recovery.
                            DB_INIT_LOCK | // Initialize the locking subsystem
                            DB_INIT_LOG | // Initialize the logging subsystem, which provides a high degree of recoverability if the application crashes.
                            DB_INIT_TXN | // Initialize the transactional subsystem. This also turns on logging.
                                           // wyb: recovery requires transaction support, so DB_INIT_TXN is a must
                            DB_THREAD | // Cause the environment to be free-threaded (not OK with db->get(), but OK with dbcursor->get())
                            DB_INIT_MPOOL; // Initialize the memory pool (in-memory cache)
                            
                       m_env = new DbEnv(0);
                       if (m_env == NULL) {
                            m_logger.error("new DbEnv error: %m");
                            throw OWException("new DbEnv error");
                       }

                       if (shm_key == -1) {   
                            printf("create shm_key failed!\n");
                            exit(1) ;
                       }

                       m_env->set_errpfx("owbdb");
                       m_env->set_errcall(DBEnv::errorHandler); //bdb errors will be sent to the callback function.
                       m_env->set_msgcall(DBEnv::msgHandler); //bdb msgs will be sent to the callback function.

                       m_env->set_flags(DB_TXN_NOSYNC, 1);

                       m_env->set_lg_max(10*1024*1024); //disk log size (default: 10M)

                       m_env->set_cachesize(0, m_cacheSize, 0);

                       uint32 logSize = 32*1024; //default: 32*1024, ori: m_cacheSize
                       m_env->set_lg_bsize(logSize);
                       m_env->set_lk_max_lockers(20000);
                       m_env->set_lk_max_objects(20000);
                       m_env->set_lk_max_locks(20000);

                       // set the maximum number of simultaneous transactions
                       m_env->set_tx_max(10000);

                       //m_env->set_app_private(&m_appData);
                       m_env->set_event_notify(DBEnv::eventCallback);

                       // ack policy can have a great impact on performance, latency and consistency
                       //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_ONE_PEER); //ori: DB_REPMGR_ACKS_ALL
                       m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
                       //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_QUORUM);

                       // timeout configs
                       m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
                       m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
                       m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
                       m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
                       m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds

                       m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
                       m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 60 * 1000 * 1000); //60 seconds

                       m_env->rep_set_priority(priority);

                       uint32 rep_req_min = 40000;
                       uint32 rep_req_max = 1280000;
                       uint32 rep_limit_gbytes = 0;
                       uint32 rep_limit_bytes = 10 * 1024 * 1024; // 10MB
                       m_env->rep_set_request(rep_req_min, rep_req_max);
                       m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);

                       // set local site
                       if ((ret = m_env->repmgr_set_local_site(localIp.c_str(), localPort, 0)) != 0) {
                            string strError = "Could not set bdb local on " + localIp + ":" + toStr(localPort) + " : ";
                            m_env->err(ret, strError.c_str());
                            throw OWException(strError);
                       }
                  • 6. Re: What causes HA to throw the error "local site address has not yet been set"?
                    524722
                    It looks like your m_envFlags is missing DB_INIT_REP.
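                    A missing flag like this can be caught before DbEnv::open() with a small bitmask check. This is a sketch under stated assumptions: the constants below are hypothetical placeholder bit values, not Berkeley DB's (real code would use DB_INIT_REP and the other DB_* constants from <db_cxx.h>), and NamedFlag and missingFlags are illustrative names. The caller decides which flags count as required.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical placeholder bit values for illustration only.
// Real code should use the DB_* constants from <db_cxx.h> instead.
constexpr std::uint32_t kCreate    = 1u << 0;  // stands in for DB_CREATE
constexpr std::uint32_t kRecover   = 1u << 1;  // stands in for DB_RECOVER
constexpr std::uint32_t kInitLock  = 1u << 2;  // stands in for DB_INIT_LOCK
constexpr std::uint32_t kInitLog   = 1u << 3;  // stands in for DB_INIT_LOG
constexpr std::uint32_t kInitTxn   = 1u << 4;  // stands in for DB_INIT_TXN
constexpr std::uint32_t kThread    = 1u << 5;  // stands in for DB_THREAD
constexpr std::uint32_t kInitMpool = 1u << 6;  // stands in for DB_INIT_MPOOL
constexpr std::uint32_t kInitRep   = 1u << 7;  // stands in for DB_INIT_REP

// A flag bit paired with the name to report when it is absent.
struct NamedFlag {
    std::uint32_t bit;
    const char* name;
};

// Returns the names of required flags that are not set in `flags`,
// so a startup routine can log them before opening the environment.
std::vector<std::string> missingFlags(std::uint32_t flags,
                                      const std::vector<NamedFlag>& required) {
    std::vector<std::string> missing;
    for (const NamedFlag& f : required) {
        if ((flags & f.bit) == 0) {
            missing.push_back(f.name);
        }
    }
    return missing;
}
```

                    For example, checking the flag set from the snippet above (create, recover, lock, log, txn, thread, mpool) against a required list of DB_INIT_LOCK, DB_INIT_LOG, DB_INIT_TXN, DB_INIT_MPOOL and DB_INIT_REP reports DB_INIT_REP as the only missing entry.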

                    Sue LoVerso
                    Oracle
                    • 7. Re: What causes HA to throw the error "local site address has not yet been set"?
                      524761
                      user13177882 wrote:
                      The application never explicitly invokes get_local_site.
                      In that case it is a mystery how it got invoked, since AFAIK it is never invoked internally (other than by the Java API, as I mentioned earlier).

                      Therefore, can you please try running the program in a debugger, with a breakpoint on the repmgr_get_local_site function at the point where that message is produced, and when you hit the breakpoint print a stack trace?
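                      As a complement to the debugger approach, the application's own error callback (the one registered with m_env->set_errcall(DBEnv::errorHandler)) could print a stack trace whenever this message arrives. A minimal sketch, assuming a glibc or BSD-like platform that provides <execinfo.h>; the helper name dumpStackIfLocalSiteError is hypothetical, and calling it from DBEnv::errorHandler with the prefix and message that handler receives is left to the application. The substring match is used because the exact text Berkeley DB passes may carry extra context.

```cpp
#include <cstdio>
#include <cstring>
#include <execinfo.h>  // backtrace(); glibc/BSD-specific, not portable everywhere
#include <unistd.h>    // STDERR_FILENO

// Hypothetical helper for an errcall-style handler: if the message is the
// "local site address" error, print it and the current call stack to stderr.
// Returns the number of stack frames written, or 0 if the message did not match.
int dumpStackIfLocalSiteError(const char* errpfx, const char* msg) {
    if (msg == nullptr ||
        std::strstr(msg, "local site address has not yet been set") == nullptr) {
        return 0;
    }
    std::fprintf(stderr, "%s: %s\n", errpfx != nullptr ? errpfx : "", msg);

    void* frames[64];
    int n = backtrace(frames, 64);                  // capture up to 64 return addresses
    backtrace_symbols_fd(frames, n, STDERR_FILENO); // write them without calling malloc
    return n;
}
```

                      On glibc, linking with -rdynamic makes the printed frames show function names rather than bare addresses.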
                      • 8. Re: What causes HA to throw the error "local site address has not yet been set"?
                        776130
                        I have debugged the source code and adjusted my code as follows:

                        m_env->set_event_notify(DBEnv::eventCallback);
                        m_env->rep_set_config(DB_REP_CONF_BULK, 1);
                        m_env->set_verbose(DB_VERB_REPLICATION, 1);
                        // set local site
                        m_env->repmgr_set_local_site(localIp.c_str(), localPort, 0);
                        // set remote site
                        m_env->repmgr_add_remote_site(peerIp.c_str(), peerPort, NULL, 0);

                        uint32 rep_req_min = 40000;
                        uint32 rep_req_max = 1280000;
                        uint32 rep_limit_gbytes = 0;
                        uint32 rep_limit_bytes = 10 * 1024 * 1024; // 10MB
                        m_env->rep_set_request(rep_req_min, rep_req_max);
                        // timeout configs
                        m_env->rep_set_priority(priority);
                        m_env->rep_set_nsites(2);


                        setEnvParameters();
                        m_envFlags = m_envFlags |DB_INIT_REP;
                        m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_ALL);
                        m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
                        // m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
                        // m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
                        // m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
                        // m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
                        //m_env->rep_set_priority(priority);
                        m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);

                        m_logger.info("bdb environment, db home dir: %s, m_envFlags: %x ", m_homeDir.c_str(), m_envFlags);
                        int ret = m_env->open(m_homeDir.c_str(), m_envFlags, 0);
                        m_logger.info("open db env(%s) return %s", m_homeDir.c_str(), (ret==0 ? "ok":toStr(ret).c_str()) );
                        if ((ret = m_env->repmgr_start(3, startPolicy)) != 0) {
                            string strError = "bdb repmgr_start failed: " + toStr(db_strerror(ret));
                            m_logger.error(strError.c_str());
                            throw OWException(strError);
                        }

                        When the application invokes methods such as m_env->rep_set_config(DB_REP_CONF_BULK, 1), m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_ALL), m_env->rep_set_priority(priority), m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000) and so on, the application crashes with SIGSEGV.
                        • 9. Re: What causes HA to throw the error "local site address has not yet been set"?
                          524761
                          Where did the SEGV occur? Can you provide a stack trace, please?

                          What do you mean by "and so on"? Which one resulted in the SEGV?