26 Replies. Latest reply: May 2, 2007 10:49 AM by 524761
      • 15. Re: Regenerate a replica
        566200
        Never mind. I don't know how or why, but rebooting the client solved the DB_LOCK_DEADLOCK problem. Now I'll have a look at the 'Operation not permitted' message again.
        • 16. Re: Regenerate a replica
          566200
          With DB_VERB_REPLICATION enabled, I see the following on the master just before the PANIC arrives:

          ----- s n i p -----
          MASTER: <dbdir> rep_process_message: msgv = 3 logv 12 gen = 1 eid 0, type all_req, LSN [0][0]
          MASTER: <dbdir> rep_send_message: msgv = 3 logv 12 gen = 1 eid 0, type newfile, LSN [0][0] nobuf
          MASTER: ignoring message sent to unavailable site
          MASTER: rep_send_function returned: -30976
          MASTER: <dbdir> rep_send_message: msgv = 3 logv 12 gen = 1 eid 0, type log, LSN [1][28] nobuf
          MASTER: ignoring message sent to unavailable site
          MASTER: rep_send_function returned: -30976
          ----- s n i p -----
          • 17. Re: Regenerate a replica
            524761
            Turbo,

            Thank you for the verbose messages listing. It was very helpful.

            I think I've discovered the bug that causes this problem. It's actually pretty interesting, because it's in a rather unrelated area: enforcing the message limits imposed by the throttling feature.

            Anyway, this is triggered by the master trying to process the ALL_REQ request from the client, and meanwhile the client has gone away. So the response messages cannot be sent. (That's what "rep_send_function returned: -30976" means.)
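
For anyone triaging a long verbose log, the failed sends can be picked out mechanically. The following is only a rough sketch, assuming the log lines have the same shape as the snippet quoted earlier in this thread; the function name `failed_sends` and the regular expressions are illustrative and are not part of Berkeley DB.

```python
import re

# Patterns inferred from the verbose lines quoted in this thread (an assumption,
# not a documented log format).
SEND_RE = re.compile(r"rep_send_message: .*type (\w+), LSN \[(\d+)\]\[(\d+)\]")
FAIL_RE = re.compile(r"rep_send_function returned: (-?\d+)")

SAMPLE = """\
MASTER: <dbdir> rep_process_message: msgv = 3 logv 12 gen = 1 eid 0, type all_req, LSN [0][0]
MASTER: <dbdir> rep_send_message: msgv = 3 logv 12 gen = 1 eid 0, type newfile, LSN [0][0] nobuf
MASTER: ignoring message sent to unavailable site
MASTER: rep_send_function returned: -30976
MASTER: <dbdir> rep_send_message: msgv = 3 logv 12 gen = 1 eid 0, type log, LSN [1][28] nobuf
MASTER: ignoring message sent to unavailable site
MASTER: rep_send_function returned: -30976
"""


def failed_sends(lines):
    """Pair each 'rep_send_function returned' line with the preceding send attempt.

    Returns a list of (message_type, lsn_file, lsn_offset, return_code) tuples.
    Fields are None when no send attempt was seen before the failure line.
    """
    failures = []
    last_send = None
    for line in lines:
        send = SEND_RE.search(line)
        if send:
            # Remember the most recent send attempt and its LSN.
            last_send = (send.group(1), int(send.group(2)), int(send.group(3)))
            continue
        fail = FAIL_RE.search(line)
        if fail:
            msg_type, lsn_file, lsn_offset = last_send or (None, None, None)
            failures.append((msg_type, lsn_file, lsn_offset, int(fail.group(1))))
            last_send = None
    return failures
```

Calling `failed_sends(SAMPLE.splitlines())` on the lines above reports one failed `newfile` message and one failed `log` message, both with return code -30976.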

            If you avoid disconnecting (including shutting down) the client during its sync-up with the master, I expect you can avoid this problem. But in any case, I will certainly fix this bug. If you would like a patch for version 4.5.20, please let me know where I can send it.

            Thanks again for your patience in working through to this solution!

            Alan Bram
            Oracle
            • 18. Re: Regenerate a replica
              566200
              I'd very much like a patch. How do I tell you where to send it? Posting my address in the forum doesn't seem like the best of ideas :). Can you put it somewhere it can be retrieved from, via the web or FTP?
              • 19. Re: Regenerate a replica
                566200
                I think I found the patch on the download page: http://www.oracle.com/technology/products/berkeley-db/db/update/4.5.20/patch.4.5.20.html

                Now the 'server crash on (unclean) client exit' is fixed, but I still can't get replication to work...
                The client will 'get a couple of megs' (a different amount each time), and then just stop.
                One thing that might be a clue is that the __db.rep.db file almost always ends up at 14.7 MB. Each time I restart the client, the file is recreated and grows to 14.7 MB. If I'm quick enough I can see the file size fluctuate (temporarily shrink somewhat and then grow again). The shrinking corresponds with growth of the database table(s).
                • 20. Re: Regenerate a replica
                  566200
                  Another thing that might have something to do with it: only the first log file is created on the client. It grows to the maximum allowed size (default 10485760 bytes). The __db.rep.db file keeps growing for a couple of seconds after this, but then everything stops. The size of __db.rep.db ends up anywhere from 13.6 MB to 14.7 MB...
                  • 21. Re: Regenerate a replica
                    566200
                    Checking the error log while running with verbose messages enabled (DB_VERB_DEADLOCK|DB_VERB_RECOVERY|DB_VERB_REPLICATION|DB_VERB_WAITSFOR), I see that I get 'EOF on connection from site <client:port>' almost immediately. The __db.rep.db file and one table file continue to grow for a few seconds after this. When the client is shut down, I get another EOF from the client.
                    That sometimes doesn't seem to matter, because I've been able to retrieve almost all the tables. But not all of them...
                    • 22. Re: Regenerate a replica
                      566200
                      Finally I managed to get the whole database (all tables) onto the client. But when I tried it again, to see if I could reproduce the success, the client died because it couldn't open a table (no such file or directory) and the master received a PANIC. I don't know exactly which came first, though.

                      I can't reproduce this problem 100% of the time, though. Most of the time everything seems to work, and every now and then the PANIC occurs (and I have to run db_recover). It seems the patch at the URL above isn't the correct one. Or it is, but the problem isn't 100% fixed (it seems to occur less often now, but that might just be my imagination :). Alan, do you have the patch? I could really use it!

                      What's weird, though, is that I have made absolutely NO changes to the code, except enabling the DB_VERB_* messages! I'll try again without them.
                      Without the DB_VERB_* messages, I get PANIC on the master every time!

                      UPDATE: Enabling ONLY the DB_VERB_REPLICATION message, it works again. What is this!?!?

                      UPDATE: But a Ctrl-C on the client still gives a PANIC on the master with the last couple of lines (the queuing lines are repeated quite a lot of times before this):

                      MASTER: <datadir> rep_send_message: msgv = 3 logv 12 gen = 2 eid 0, type log, LSN [9][2350260] nobuf
                      MASTER: msg to site <client:port> to be queued
                      MASTER: queue limit exceeded
                      MASTER: <datadir> rep_send_message: msgv = 3 logv 12 gen = 2 eid 0, type log, LSN [9][2350328] nobuf
                      MASTER: ignoring message sent to unavailable site
                      MASTER: rep_send_function returned: -30976

                      Message was edited by: Turbo
                      • 23. Re: Regenerate a replica
                        524761
                        Hi,

                        I will post the patch on our ftp server when it is ready (not yet), and will let you know.

                        Alan Bram
                        Oracle
                        • 24. Re: Regenerate a replica
                          566200
                          Thanks. But do you have any idea why it seems to work with DB_VERB_REPLICATION enabled, but almost always fails without it!?
                          • 25. Re: Regenerate a replica
                            566200
                            Still no patch? My boss(es) are starting to get a little ... annoyed with me/replication.
                            • 26. Re: Regenerate a replica
                              524761
                              The patch was posted to the FTP site on April 18. I thought I posted a message to this forum with instructions at that time, but it must have gotten lost; I don't see it here now.

                              Anyway, download the patch from:

                              ftp://ftp.sleepycat.com/hidden/patch.15436

                              To apply the patch, change to the top level Berkeley DB source code directory, and use the patch command with the "-p1" flag.
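
For anyone scripting the step above, here is a minimal sketch, assuming the patch file has already been downloaded. The helper names `patch_command` and `apply_patch` are made up for illustration, and the only `patch` options used are `-p1` (as stated above) and the standard `-i` option for naming the input file.

```python
import os
import subprocess


def patch_command(patch_file):
    """Build the argument list for `patch -p1`, reading the diff from patch_file."""
    # Use an absolute path because the command runs from inside the source tree.
    return ["patch", "-p1", "-i", os.path.abspath(patch_file)]


def apply_patch(source_dir, patch_file):
    """Run patch from the top-level source directory; raises if patch fails."""
    subprocess.run(patch_command(patch_file), cwd=source_dir, check=True)
```

A call would look like `apply_patch("db-4.5.20", "patch.15436")`, where `db-4.5.20` is a hypothetical name for the unpacked top-level source directory.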

                              Alan Bram
                              Oracle