5 Replies Latest reply on Aug 6, 2010 12:43 AM by 774396

    Database LSN out of Sync with transaction logs LSN

    749004
      I am using BDB 4.7 as a back end for my server. On some customer installations, the transaction log LSN falls behind the database LSN. The specific error I'm getting is

      "file unknown has LSN 1/6437062, past end of log at 1/6222368"


      What could cause this? I have verified that the database folder is excluded from any virus scan or automatic backup software.
        • 1. Re: Database LSN out of Sync with transaction logs LSN
          694549
          Hi,

          It's funny: when I came to post my question, the first question on the forum was the same. I am including the steps I took to solve the problem.

          We have been using BDB in our production environment for a while. The library version is 4.7.25. We use BDB in a Java-based stack running in Tomcat, and we have configured it to run recovery on every startup. Today, on one of the instances, we started getting this error after starting the environment.

          ====
          recovery 0% completedb_recover: file unknown has LSN 504/20536642, past end of log at 504/1878386
          db_recover: Commonly caused by moving a database from one database environment
          db_recover: to another without clearing the database LSNs, or by removing all of
          db_recover: the log files from a database environment
          ====

          We did not notice the error for a while, as it's not one of the heavily used environments. Looking at the logs, I found that the ID reported for the log file started at something like "past end of log at 504/1877026" and kept increasing with every transaction the environment tried to execute and failed, while the ID in the part "file unknown has LSN 504/20536642" remained the same.

          I have the following questions. It would be great if anyone could help me with them.

          1) As far as I understand, this LSN format is latest_log_file_number/transaction_id. Is that correct?
          2) Looking at "file unknown has LSN 504/20536642", it seems that there is some logging issue going on here as well, since it's not able to print the file name.
          3) I tried running recovery, but that didn't help either. Here is the recovery log:

          =====
          $>/usr/local/BerkeleyDB.4.7/bin/db_recover -c -f -v
          Finding last valid log LSN: file: 504 offset 1878386
          recovery 0% completedb_recover: file unknown has LSN 504/20536642, past end of log at 504/1878386
          db_recover: Commonly caused by moving a database from one database environment
          db_recover: to another without clearing the database LSNs, or by removing all of
          db_recover: the log files from a database environment
          recovery 0% completedb_recover: file unknown has LSN 504/20536642, past end of log at 504/1878386
          db_recover: Commonly caused by moving a database from one database environment
          db_recover: to another without clearing the database LSNs, or by removing all of
          db_recover: the log files from a database environment
          recovery 26% completeRecovery starting from [500]
          klahiri: recovery 26% completeRecovery starting from [500][28]
          recovery 93% completeRecovery complete at Sun Jan 24 13:10:10 2010
          Maximum transaction ID 8000ffc7 Recovery checkpoint [504][1878386]
          =====

          4) I guessed that "unknown" stands in for the DB file name, since I know that database pages also hold references to LSNs. I then reset the LSNs using the db_load command and ran recovery again. This time it went through without any complaints. I know that neither of the two scenarios mentioned in the error message occurred in this case: we have not moved the database since its creation, and we have not moved any log files out of this environment, as it is not heavily used. Can anyone help me identify what could have caused this? I was able to resolve this instance, but I want to find the root cause: why would a DB file have a higher LSN than the log file, assuming my guess about the file name is correct?

          Thanks a lot in advance for all the help.

          Shishir
          • 2. Re: Database LSN out of Sync with transaction logs LSN
            "Oracle, Sandra Whitman-Oracle"
            Hello,

            The LSN is a log sequence number which specifies a unique location in a log file. A DB_LSN consists of two unsigned 32-bit integers -- one specifies the log file number, and the other specifies an offset in the log file. The environment log_file() method maps DB_LSN structures to filenames, returning the name of the file containing the record named by lsn. Further documentation is at:

            http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/lsn.html
            and
            http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/logfile.html

            As you can see from the LSNs in the error message, "file unknown has LSN X, past end of log at Y" says that there is an LSN in the database that is for a record which is not yet in the log. That could mean that the database was copied from another environment, or that all the log files in the environment were removed. It sounds like neither condition is the case here.
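
To make the comparison in that message concrete, here is a minimal sketch in Python that parses the "file/offset" form printed by db_recover and checks whether one LSN lies past another. The names `LSN`, `parse_lsn`, and `is_past_end_of_log` are hypothetical helpers written for this illustration, not part of the Berkeley DB API; the ordering assumed here is file number first, then offset.

```python
from typing import NamedTuple


class LSN(NamedTuple):
    """Hypothetical stand-in for a DB_LSN: (log file number, offset in that file)."""
    file: int
    offset: int


def parse_lsn(text: str) -> LSN:
    """Parse the 'file/offset' form that appears in the db_recover messages."""
    file_part, offset_part = text.strip().split("/")
    return LSN(int(file_part), int(offset_part))


def is_past_end_of_log(page_lsn: LSN, end_of_log: LSN) -> bool:
    """Return True if page_lsn refers to a position beyond end_of_log.

    Tuples compare field by field, so the file number is compared first
    and the offset only breaks ties within the same log file.
    """
    return page_lsn > end_of_log


# Values taken from the error message quoted earlier in the thread.
page_lsn = parse_lsn("504/20536642")
end_of_log = parse_lsn("504/1878386")
print(is_past_end_of_log(page_lsn, end_of_log))  # True
```

Both LSNs are in log file 504, so the offsets decide: the database page refers to offset 20536642, while the log ends at 1878386.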

            Is there any way that db_checkpoint and db_archive -d (to remove log files) are being used?

            What else can you provide about what the applications are doing? How are the databases and environment opened? For example, is DB_REGISTER in use, or in-memory logging, etc.? Also what platform is this?

            Thanks,
            Sandra
            • 3. Re: Database LSN out of Sync with transaction logs LSN
              749004
              The database is opened with DB_RECOVER, DB_INIT_LOG, DB_INIT_TXN, DB_PRIVATE, and DB_THREAD. The checkpoint threshold is set to 1 MB. db_checkpoint and db_archive are not being used. The log files are stored on the file system, and we are not using in-memory logging.

              One of the customers that ran into this problem had a virus scanner scanning the database (I told him not to do that).

              I discovered that if the transaction log file is deleted while the database is open, BDB will continue to write transactions to the database even though it doesn't have a transaction log file to update. I believe that what is happening is that the virus scanner is locking the log file, preventing BDB from updating it, while BDB continues to update the database. When the application is restarted, the open fails because the database LSN is higher than the log file LSN.

              Is there a way to configure BDB so that it can detect when the log file is locked/missing?
              • 4. Re: Database LSN out of Sync with transaction logs LSN
                "Oracle, Sandra Whitman-Oracle"
                Thanks for the information.

                Yes, that could cause exactly the error message reported. I know of no way to configure BDB to avoid that.
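
Since no such BDB setting is known here, one option is a check made from outside the library. The following is only a rough sketch in Python under stated assumptions: it assumes the on-disk log files use the usual `log.NNNNNNNNNN` naming (ten digits), and the helpers `log_file_numbers` and `newest_log_missing` are hypothetical names invented for this example. It can notice a deleted log file, but it cannot tell whether a file is locked by another process.

```python
import os
import re
import tempfile

# Assumed naming of Berkeley DB log files on disk: "log." plus ten digits.
LOG_NAME = re.compile(r"^log\.(\d{10})$")


def log_file_numbers(env_dir):
    """Return the sorted log file numbers found in the environment directory."""
    numbers = []
    for name in os.listdir(env_dir):
        match = LOG_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def newest_log_missing(env_dir, last_seen):
    """Return True if the newest log file seen earlier is gone.

    last_seen is the highest log file number observed on a previous check.
    Older files may legitimately disappear through archiving, so only the
    newest one is considered.
    """
    numbers = log_file_numbers(env_dir)
    return not numbers or numbers[-1] < last_seen


# Small demonstration in a temporary directory standing in for the environment.
with tempfile.TemporaryDirectory() as env_dir:
    for number in (503, 504):
        open(os.path.join(env_dir, "log.%010d" % number), "w").close()
    last_seen = log_file_numbers(env_dir)[-1]
    before = newest_log_missing(env_dir, last_seen)
    os.remove(os.path.join(env_dir, "log.0000000504"))
    after = newest_log_missing(env_dir, last_seen)
    print(before, after)  # False True
```

An application could run a check like this periodically and stop accepting writes when it reports a missing file, rather than discovering the mismatch at the next restart.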

                Sandra
                • 5. Re: Database LSN out of Sync with transaction logs LSN
                  774396
                  Are you suggesting that db_checkpoint followed by db_archive -d can lead to this kind of error?
                  If so, is there a safer way to run db_checkpoint and db_archive -d? I would not like to retain the log files indefinitely.