6 Replies. Latest reply: Oct 21, 2013 1:18 AM by 4246bac3-3a09-4031-8891-51bb8a451ffc

    how to handle event "DB_EVENT_PANIC" on replica

    4246bac3-3a09-4031-8891-51bb8a451ffc

      Hi BDB experts,

       

      I am writing a DB HA application based on BDB version 4.6.21. Two daemons run on two machines: one is the master, which reads and writes the database, and the other is the backup, which only reads it. The backup sometimes receives the event "DB_EVENT_PANIC" for reasons I don't know. What should I do on receiving this event? Should the daemon exit, restart, and then open the environment again for recovery? Or can a process reopen the environment for recovery without exiting first?

       

      Another question, unrelated to the issue above: when using BDB HA (Base API), can there be three processes, where process1 is neither master nor client but just writes to the database, process2 is the replication master that opens the same environment (just for HA; it will not write to the database), and process3 is a client that connects to the master to keep its database in sync?

       

      Thanks,

       

      Min

        • 1. Re: how to handle event "DB_EVENT_PANIC" on replica
          Paula B-Oracle

          I am writing a DB HA application based on BDB version 4.6.21. Two daemons run on two machines: one is the master, which reads and writes the database, and the other is the backup, which only reads it. The backup sometimes receives the event "DB_EVENT_PANIC" for reasons I don't know. What should I do on receiving this event? Should the daemon exit, restart, and then open the environment again for recovery? Or can a process reopen the environment for recovery without exiting first?

          After a panic you should exit from all your processes and then run a recovery. You can run the recovery using the db_recover utility or by performing a DB_ENV->open() with the DB_RECOVER flag. See the Reference Guide section "Recovery procedures" for more details.
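The exit-then-recover sequence can be sketched as a small supervisor, shown here only as an illustration under stated assumptions. Everything named below (`supervise`, `EXIT_ENV_PANIC`, `worker_fn`, `recover_fn`, and the `demo_*` helpers) is hypothetical and not part of Berkeley DB. The assumption is that the worker exits with an agreed code once it has seen DB_EVENT_PANIC, and that the recovery callback wraps either the db_recover utility or DB_ENV->open() with DB_RECOVER. The sketch handles a single worker; with several processes sharing the environment, all of them would have to exit before recovery runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit code meaning "the environment panicked". */
#define EXIT_ENV_PANIC 70

typedef int (*worker_fn)(int attempt);       /* runs in the child process */
typedef int (*recover_fn)(const char *home); /* runs in the supervisor */

/*
 * Run the worker in a child process. If it exits with EXIT_ENV_PANIC,
 * run recovery from the supervisor (the child is gone by then) and start
 * a fresh worker. Returns the worker's final exit code, or -1 on failure
 * or when max_attempts is exhausted.
 */
int supervise(const char *home, worker_fn worker, recover_fn recover,
              int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt)); /* child: do the work, then exit */

        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
            return -1; /* abnormal termination is not handled in this sketch */
        if (WEXITSTATUS(status) != EXIT_ENV_PANIC)
            return WEXITSTATUS(status);

        /* Assumption: recover() wraps db_recover or an open with DB_RECOVER. */
        if (recover(home) != 0)
            return -1;
    }
    return -1;
}

/* Demo stand-ins: the first worker run "panics", the second succeeds. */
static int demo_recover_calls = 0;

static int demo_worker(int attempt)
{
    return attempt == 0 ? EXIT_ENV_PANIC : 0;
}

static int demo_recover(const char *home)
{
    (void)home;
    demo_recover_calls++;
    return 0;
}
```

With the demo helpers, `supervise("/tmp/demo-env", demo_worker, demo_recover, 3)` restarts the worker once after a single recovery pass. The path is only a placeholder.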

           

          Another question, unrelated to the issue above: when using BDB HA (Base API), can there be three processes, where process1 is neither master nor client but just writes to the database, process2 is the replication master that opens the same environment (just for HA; it will not write to the database), and process3 is a client that connects to the master to keep its database in sync?

          For most replication applications, the master environment and the client environment run on different sites/machines. It is also possible to run the master environment and the client environment on the same machine, but they must still be separate environments with separate home directories.

           

          All database writes must be performed on the master environment. So your process1 must use the same environment as process2. Your process2 should run first to configure and start the environment as a replication master. Then your process1 can start up and perform its writes. You can start your client/process3 at any time. See the Reference Guide section "Building replicated applications", particularly the paragraph starting with "Consider the case of multiple processes...", for more information.

           

          Paula Bingham

          Oracle

          • 2. Re: how to handle event "DB_EVENT_PANIC" on replica
            4246bac3-3a09-4031-8891-51bb8a451ffc

            Hi Paula,

            I very much appreciate your clear answer. One more question: when running recovery, should all the files in the environment path be removed first, or is it enough to run recovery on the current environment path as it is?


            Thanks,

            Min

            • 3. Re: how to handle event "DB_EVENT_PANIC" on replica
              Paula B-Oracle

              You don't need to remove the files yourself.

               

              Paula Bingham

              Oracle

              • 4. Re: how to handle event "DB_EVENT_PANIC" on replica
                4246bac3-3a09-4031-8891-51bb8a451ffc

                Hi Paula,

                 

                 

                For replicated applications that have multiple processes, I have read the section of the manual that you mentioned.

                Could you help confirm whether my understanding is right?

                That is, in an HA environment where N processes write to the database, each process should establish a connection with the replica and send HA messages to the replica when it writes to the database, right?

                The manual says "Subsequent replication processes must at least call the DB_ENV->rep_set_transport method. Those processes may call the DB_ENV->rep_start method". I am confused about why rep_start is not mandatory. If rep_start is not called, BDB will not start the HA threads and the process can't send messages to the replica, right?

                 

                If my understanding is right, I have a question about the N-process HA case:

                Each process used to be single-threaded; it runs for a few seconds and exits after finishing its task. Which BDB event will the main thread receive to indicate that the sync to the replica is done? Since the process is not a daemon, I am afraid that if it exits as in the existing logic, the BDB HA thread may not have finished syncing.

                 

                 

                Thanks,

                Min

                • 5. Re: how to handle event "DB_EVENT_PANIC" on replica
                  Paula B-Oracle

                  That is, in an HA environment where N processes write to the database, each process should establish a connection with the replica and send HA messages to the replica when it writes to the database, right?

                  The manual says "Subsequent replication processes must at least call the DB_ENV->rep_set_transport method.

                  Yes, each process should establish a connection to the replica. Each process also needs the DB_ENV->rep_set_transport call so that it will invoke your application's send function to send each HA message to the replica.

                   

                  Those processes may call the DB_ENV->rep_start method". I am confused about why rep_start is not mandatory. If rep_start is not called, BDB will not start the HA threads and the process can't send messages to the replica, right?

                  I am assuming from your earlier information that you are using Base API calls. The Base API calls do not create their own threads. When using the Base API, it is the application's responsibility to create and manage its own threads.

                   

                  One possible way to design an application is to have a main replication process on the master that calls rep_set_transport and rep_start and creates one or more threads to handle all incoming messages from other sites.

                  In this design, there can be N additional processes on the master that perform database writes and only need to send the logging information about these writes to the other sites. The call to rep_set_transport provides the information needed to do this, so there is no need for these N additional processes to call rep_start. Our documentation allows for cases like this.

                   

                  You do not have to design your application this way and you can make rep_start calls in more than one process as long as they each supply the same DB_REP_CLIENT or DB_REP_MASTER value.
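The division of labour in this design can be sketched as a per-role startup sequence. This is an illustration only: `proc_role`, `rep_ops`, `start_process`, and the `demo_*` helpers are hypothetical names, and the three callbacks stand in for application code that would wrap DB_ENV->rep_set_transport, DB_ENV->rep_start, and the application's own message-handling threads. Calling rep_start before creating the threads is an assumption of the sketch, not something the thread prescribes.

```c
#include <assert.h>
#include <stddef.h>

enum proc_role {
    ROLE_MAIN_REPLICATION, /* long-lived; handles incoming messages */
    ROLE_WRITER            /* short-lived; only writes and sends log records */
};

/* Hypothetical wrappers supplied by the application; 0 means success. */
struct rep_ops {
    int (*set_transport)(void *ctx);         /* would wrap rep_set_transport */
    int (*rep_start)(void *ctx);             /* would wrap rep_start */
    int (*start_message_threads)(void *ctx); /* application-managed threads */
};

/* Perform the replication setup steps appropriate for the given role. */
int start_process(enum proc_role role, const struct rep_ops *ops, void *ctx)
{
    assert(ops != NULL);

    /* Every process needs the transport so its writes reach other sites. */
    if (ops->set_transport(ctx) != 0)
        return -1;
    if (role == ROLE_WRITER)
        return 0; /* writers stop here: no rep_start, no threads */

    if (ops->rep_start(ctx) != 0)
        return -1;
    return ops->start_message_threads(ctx);
}

/* Demo stand-ins that record which steps ran, in order. */
struct demo_trace {
    char calls[8];
    size_t n;
};

static int demo_record(void *ctx, char step)
{
    struct demo_trace *t = ctx;
    if (t->n + 1 >= sizeof t->calls)
        return -1;
    t->calls[t->n++] = step;
    t->calls[t->n] = '\0';
    return 0;
}

static int demo_set_transport(void *ctx) { return demo_record(ctx, 'T'); }
static int demo_rep_start(void *ctx) { return demo_record(ctx, 'S'); }
static int demo_threads(void *ctx) { return demo_record(ctx, 'M'); }

static const struct rep_ops demo_ops = {
    demo_set_transport, demo_rep_start, demo_threads
};
```

With the demo callbacks, a writer records only the transport step, while the main replication process records all three steps in order.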

                   

                  If my understanding is right, I have a question about the N-process HA case:

                  Each process used to be single-threaded; it runs for a few seconds and exits after finishing its task. Which BDB event will the main thread receive to indicate that the sync to the replica is done? Since the process is not a daemon, I am afraid that if it exits as in the existing logic, the BDB HA thread may not have finished syncing.

                  This is a reason to consider a design with a main replication process that remains running and handles all incoming messages. Then you can have multiple additional short-lived processes that perform database writes and exit.

                   

                  If your application cannot follow this model, then you will need to design your application to make sure each process survives long enough to get whatever synchronization from the replica it requires. You should look at the Reference Guide sections "Building the communications infrastructure" and "Transactional guarantees" for more details.
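One way a short-lived writer might "survive long enough" is to wait, with a timeout, until the replica has acknowledged the last log position it wrote. The sketch below assumes the application's own communications layer can report such acknowledgements; Berkeley DB is not shown providing this. `log_pos`, `acked_pos_fn`, `wait_for_replica`, and the `demo_*` helpers are hypothetical names introduced for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep */
#endif
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for a log position (log file number plus offset); not a BDB type. */
struct log_pos {
    unsigned file;
    unsigned offset;
};

/* Hypothetical hook: highest position the replica has acknowledged so far. */
typedef struct log_pos (*acked_pos_fn)(void *ctx);

static int log_pos_cmp(struct log_pos a, struct log_pos b)
{
    if (a.file != b.file)
        return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Poll until the replica has acknowledged `target` or roughly `timeout_ms`
 * has been spent sleeping. Returns true if it is safe for the writer to exit.
 */
bool wait_for_replica(struct log_pos target, acked_pos_fn acked, void *ctx,
                      int timeout_ms, int poll_ms)
{
    int waited_ms = 0;

    assert(acked != NULL);
    if (poll_ms <= 0)
        poll_ms = 1; /* avoid spinning forever on a zero interval */

    for (;;) {
        if (log_pos_cmp(acked(ctx), target) >= 0)
            return true;
        if (waited_ms >= timeout_ms)
            return false;

        struct timespec pause = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
        nanosleep(&pause, NULL);
        waited_ms += poll_ms;
    }
}

/* Demo stand-in: a replica that advances `step` offset units per poll. */
struct demo_replica {
    struct log_pos acked;
    unsigned step;
};

static struct log_pos demo_acked(void *ctx)
{
    struct demo_replica *r = ctx;
    r->acked.offset += r->step;
    return r->acked;
}
```

A writer would call `wait_for_replica` just before exiting and treat a `false` result as "the replica may not have this write yet". Which guarantee is actually required depends on the application, as covered in the "Transactional guarantees" section mentioned above.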

                   

                  Just out of curiosity, why are you using 4.6? That is not a very recent release.

                   

                  Paula Bingham

                  Oracle

                  • 6. Re: how to handle event "DB_EVENT_PANIC" on replica
                    4246bac3-3a09-4031-8891-51bb8a451ffc

                    Hi Paula,

                     

                    Thank you very much! I'll carefully read the manual sections you suggested. I am developing software at my company, and the version the company uses is 4.6 (it may have been bought several years ago), so I have to use this version.

                     

                    Min