8 Replies Latest reply: Aug 6, 2008 3:05 PM by greybird RSS

    Questions re: opening a large BDB

    628326
      I am opening a 149GB BDB consisting of 17,041 .jdb files on a 64-bit Linux box.

      It takes approximately 5 minutes to go from instantiating the environment to completing the open operations (the database in question has one primary and one secondary key).

      Is this to be expected? Is it going to get worse exponentially? Do I need to increase the .jdb file size dramatically to get the number of files down?

      I am also using com.sleepycat.je.Database.count() as part of the open operation (and again at close time, to compare). I read somewhere that count() is very fast, but could someone confirm (or deny) that?

      thanks
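      One way to see where the five minutes actually go is to time each startup phase separately. A minimal sketch, with the real JE calls (new Environment, openDatabase, openSecondaryDatabase, count) replaced by placeholder Runnables since they depend on your environment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    // Runs each named phase in order and reports elapsed milliseconds per phase.
    public static Map<String, Long> time(Map<String, Runnable> phases) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : phases.entrySet()) {
            long start = System.nanoTime();
            e.getValue().run();
            results.put(e.getKey(), (System.nanoTime() - start) / 1_000_000);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Runnable> phases = new LinkedHashMap<>();
        // In real code these would wrap: new Environment(...), openDatabase(...),
        // openSecondaryDatabase(...), and Database.count().
        phases.put("openEnvironment", () -> {});
        phases.put("openPrimary", () -> {});
        phases.put("openSecondary", () -> {});
        phases.put("count", () -> {});
        System.out.println(time(phases));
    }
}
```

      Whichever phase dominates tells you whether to look at recovery or at count().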
        • 1. Re: Questions re: opening a large BDB
          Linda Lee
          i am also using com.sleepycat.je.Database.count() as
          part of the open operation (and later at close time
          to compare). i read somewhere that count() is very
          fast but could someone confirm (or deny)
          Database.count() is pretty darned slow. At one point, we made it somewhat faster, and calling Database.count() is faster than scanning the database record by record yourself, but it's still a pretty expensive operation. That's very likely to be your culprit.

          If you close the environment cleanly (by calling Environment.close()), the subsequent open should be very fast, and should only need to read a bit of the last .jdb file or two, so the number of files is not a factor. If your environment did not shut down cleanly, you might have to read a bit more, depending on the length of your last checkpoint, but it still should be many orders of magnitude less than 5 minutes.

          Linda
          • 2. Re: Questions re: opening a large BDB
            628326
            Database.count() is pretty darned slow. At one point,
            we made it somewhat faster, and calling
            Database.count() is faster than scanning the database
            record by record yourself, but it's still a pretty
            expensive operation. That's very likely to be your
            culprit.

            If you close the environment cleanly (by calling
            Environment.close()), the subsequent open should be
            very fast, and should only need to read a bit of the
            last .jdb file or two, so the number of files is not
            a factor. If your environment did not shut down
            cleanly, you might have to read a bit more, depending
            on the length of your last checkpoint, but it still
            should be many orders of magnitude less than 5
            minutes.
            Thank you for the info. I instrumented our code, and we found that while .count() was indeed slow, it was not the main culprit: environment open is by far the most costly operation. We're closing the environment cleanly, yet opening it takes a very long time, and as our database grows, so does the startup time. It actually took 15 minutes this time (the database has grown substantially since our last startup).

            Any ideas? This is pretty much a write-once, read-many database. We almost never update or delete records.
            • 3. Re: Questions re: opening a large BDB
              Linda Lee
              Ok, that's surprising news. Let me explain what happens when you open an environment. That will give you the rationale for some next steps to try.

              When you instantiate the environment, JE runs what is often called "recovery" in database parlance. It reads the transactional history to recreate a consistent view of the data held within. Like many database products, JE periodically writes what are called checkpoint records into the logs, which are points of known consistency. Checkpoints speed up recovery because they reduce the amount of transactional history that must be perused at recovery. Instead of having to read the entire history, JE starts at a checkpoint, and then only reads what follows that checkpoint. Any unfinished transactions that follow the checkpoint are deemed dead, and are rolled back. Any finished, committed transactions after the checkpoint are re-applied. After these steps, JE has a consistent view of the data. These steps are taken even for an environment opened in non-transactional mode. (There are still operations that need to be reapplied in non-transactional mode).
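              The checkpoint-based replay described above can be sketched as a toy model (an illustration only, not JE's actual recovery code): find the last checkpoint, re-apply committed transactions that follow it, and discard aborted or unfinished ones.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecoverySketch {
    // Entry kinds in a toy transactional log.
    enum Kind { CHECKPOINT, PUT, COMMIT, ABORT }

    record Entry(Kind kind, int txn, String key, String value) {}

    // Replays the log from the last checkpoint onward. 'atCheckpoint' stands in
    // for the consistent state the checkpoint represents.
    static Map<String, String> recover(List<Entry> log, Map<String, String> atCheckpoint) {
        int start = 0;
        for (int i = 0; i < log.size(); i++) {
            if (log.get(i).kind() == Kind.CHECKPOINT) {
                start = i + 1;   // recovery begins at the last checkpoint
            }
        }
        Map<String, String> state = new HashMap<>(atCheckpoint);
        Map<Integer, Map<String, String>> pending = new HashMap<>();
        for (Entry e : log.subList(start, log.size())) {
            switch (e.kind()) {
                case PUT -> pending.computeIfAbsent(e.txn(), t -> new LinkedHashMap<>())
                                   .put(e.key(), e.value());
                case COMMIT -> {
                    Map<String, String> writes = pending.remove(e.txn());
                    if (writes != null) state.putAll(writes);   // redo committed work
                }
                case ABORT -> pending.remove(e.txn());          // roll back
                default -> { }   // no checkpoints appear after 'start'
            }
        }
        // Anything left in 'pending' was in progress at the crash: deemed dead.
        return state;
    }
}
```

              The cost of this replay is proportional to how much log follows the last checkpoint, which is why a clean close (checkpoint at the very end) makes the next open fast.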

              So when you say that the environment open is very slow, the first thing I wonder is where the last checkpoint was. If you call Environment.close() in a read/write environment, a checkpoint should always be written at the very end of the log, which makes the next open very fast. Perhaps you're always opening the environment in read/only mode, and that last checkpoint isn't going out? One easy experiment to try is to open your environment in read/write mode, call Environment.close(), and then try a re-open.

              If that doesn't do the trick, we have to figure out where your last checkpoint is. If you do this command:

              java -jar <your je.jar> DbPrintLog -h <environment directory> -ty20 > foo.xml

              you'd see a whole bunch of log dump, with lines that start like this:

              <entry lsn="0x2/0x442b2" type="CkptEnd/6" ...

              Look at the last record in foo.xml. That "lsn" value is the location of the checkpoint records -- in this case, in file 2, at byte 0x442b2. If there are many bytes between it and the end of the log, (like more than four or five files worth), then that's a smoking gun as to why the recovery is so slow. Then we have to figure out why closing the environment doesn't give you a checkpoint end record at the very end.
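              Decoding that lsn by hand is error prone; a small helper can do it, and can also estimate how many bytes of log follow the checkpoint. This is a sketch, assuming the file/offset hex format shown above and a uniform maximum .jdb file size (JE's je.log.fileMax setting):

```java
public class LsnGap {
    // Splits an lsn like "0x2/0x442b2" into {fileNumber, byteOffset}.
    static long[] parse(String lsn) {
        String[] parts = lsn.split("/");
        return new long[] {
            Long.parseLong(parts[0].substring(2), 16),
            Long.parseLong(parts[1].substring(2), 16)
        };
    }

    // Rough byte count between the checkpoint and the end of the log, where
    // (lastFile, lastOffset) locate the final byte written and fileMax is the
    // maximum size of each .jdb file.
    static long bytesAfter(String ckptLsn, long lastFile, long lastOffset, long fileMax) {
        long[] p = parse(ckptLsn);
        return (lastFile - p[0]) * fileMax + (lastOffset - p[1]);
    }
}
```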


              Linda
              • 4. Re: Questions re: opening a large BDB
                628326
                Thanks again, Linda. I'll let you know how this goes during our nighttime maintenance window. In the meantime, what's the best place to learn more about all these utility classes? (You mentioned DbPrintLog, and I was reading about DbSpace earlier.)
                • 5. Re: Questions re: opening a large BDB
                  628326
                  Hi again. I forgot to mention that our database is being opened for transactional access. We did some testing and found that a non-transactional open is much faster. Does everything you said apply to both transactional and non-transactional BDBs?
                  • 6. Re: Questions re: opening a large BDB
                    Linda Lee
                    The same basic steps are taken in a non-transactional environment, but there are never any operations to undo, because there are never any transactions left in progress and uncommitted at close. Otherwise there should be no difference at open that I can think of, because the determining factor is what happened before the open.

                    For example, suppose at t1, you opened and operated the environment transactionally. Then at t100, you re-opened the environment, non-transactionally. What happened from t1-t99 is what determines what has to be done at t100 -- it should be the same, no matter how you open at t100, because the basic task is still to recreate a consistent environment.

                    Linda
                    • 7. Re: Questions re: opening a large BDB
                      607790
                      Hi Linda,

                      I have a BDB/JE with a table in it that is taking about 45 minutes to open. It is about 23GB of data, and there are around 20,000,000 records of the problem type. I'm using the DPL, and all the time is taken up in the getPrimaryIndex method for this type.

                      I'm assuming that this is the point where the DB needs to do its consistency check, and I suspect that this particular database was not closed cleanly. However, based on what I've read here, it seems like even if the database wasn't closed cleanly, there should be a checkpoint somewhere near the end of the DB that allows it to open fairly quickly, if not as fast as it would normally.

                      I'm fairly certain this DB was created by a program that did a whole lot of puts of new objects and not any updating/deleting. It is possible that some long-running read-only transactions from cursors could have been interrupted when the DB was closed. Could that kind of thing explain this excessively long time?

                      I tried running the DbPrintLog command above, although I could only run it without the "-ty20" option; with it, I got an error. It gave me 100 million lines of XML, and I can tell you that there is a "CkptEnd" line within the last 20,000 lines of the file (I don't have an exact number; it's a lot of data to filter!).
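                      For what it's worth, a dump that size doesn't need to be loaded or filtered by hand; streaming it and keeping only the last matching line finds the final CkptEnd in one pass with constant memory (a sketch, assuming the entry format from the example above):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class LastCkptEnd {
    // Streams a DbPrintLog dump and returns the last line containing a
    // CkptEnd entry, or null if none is found.
    static String find(Reader dump) throws IOException {
        BufferedReader in = new BufferedReader(dump);
        String line, last = null;
        while ((line = in.readLine()) != null) {
            if (line.contains("CkptEnd")) last = line;
        }
        return last;
    }
}
```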

                      I'd love to have the DB open faster than this even following a failure. Even 5 minutes is a lot better than 45! Is there anything I can be doing to help that? Any patterns to avoid that would cause it to take this long?

                      Thanks!
                      Jeff
                      • 8. Re: Questions re: opening a large BDB
                        greybird
                        Hi Jeff,

                        A performance issue with getPrimaryIndex is not related to recovery, which is performed in the Environment constructor. I've replied to this question on a new thread, so as not to confuse this one.
                        Please see my reply on the new thread:

                        getPrimaryIndex is slow

                        Thanks,
                        --mark