This content has been marked as final. Show 8 replies
i am also using com.sleepycat.je.Database.count() asDatabase.count() is pretty darned slow. At one point, we made it somewhat faster, and calling Database.count() is faster than scanning the database record by record yourself, but it's still a pretty expensive operation. That's very likely to be your culprit.
part of the open operation (and later at close time
to compare). i read somewhere that count() is very
fast but could someone confirm (or deny)
If you close the environment cleanly (by calling Environment.close()), the subsequent open should be very fast, and should only need to read a bit of the last .jdb file or two, so the number of files is not a factor. If your environment did not shut down cleanly, you might have to read a bit more, depending on the length of your last checkpoint, but it still should be many orders of magnitude less than 5 minutes.
Database.count() is pretty darned slow. At one point,Thank you for the info. I instrumented our code and we found that while .count() was indeed slow it was not the main culprit. Environment open is by far the most costly operation. We're closing the environment cleanly and when opening it takes a very very long time. As our database grows so does the startup time. It actually took 15 minutes this time (the db has grown substantially since our last startup)
we made it somewhat faster, and calling
Database.count() is faster than scanning the database
record by record yourself, but it's still a pretty
expensive operation. That's very likely to be your
If you close the environment cleanly (by calling
Environment.close()), the subsequent open should be
very fast, and should only need to read a bit of the
last .jdb file or two, so the number of files is not
a factor. If your environment did not shut down
cleanly, you might have to read a bit more, depending
on the length of your last checkpoint, but it still
should be many orders of magnitude less than 5
Any ideas? This is pretty much a write once read many database. We almost never update records or delete them.
Ok, that's surprising news. Let me explain what happens when you open an environment. That will give you the rationale for some next steps to try.
When you instantiate the environment, JE runs what is often called "recovery" in database parlance. It reads the transactional history to recreate a consistent view of the data held within. Like many database products, JE periodically writes what are called checkpoint records into the logs, which are points of known consistency. Checkpoints speed up recovery because they reduce the amount of transactional history that must be perused at recovery. Instead of having to read the entire history, JE starts at a checkpoint, and then only reads what follows that checkpoint. Any unfinished transactions that follow the checkpoint are deemed dead, and are rolled back. Any finished, committed transactions after the checkpoint are re-applied. After these steps, JE has a consistent view of the data. These steps are taken even for an environment opened in non-transactional mode. (There are still operations that need to be reapplied in non-transactional mode).
So when you say that the environment open is very slow, the first thing I wonder is where the last checkpoint was. If you call Environment.close() in a read/write environment, a checkpoint should always be written at the very end of the log, which makes the next open very fast. Perhaps you're always opening the environment in read/only mode, and that last checkpoint isn't going out? One easy experiment to try is to open your environment in read/write mode, call Environment.close(), and then try a re-open.
If that doesn't do the trick, we have to figure out where your last checkpoint is. If you do this command:
java -jar <your je.jar> DbPrintLog -h <environment directory> -ty20 > foo.xml
you'd see a whole bunch of log dump, with lines that start like this.
<entry lsn="0x2/0x442b2" type="CkptEnd/6" ...
Look at the last record in foo.xml. That "lsn" value is the location of the checkpoint records -- in this case, in file 2, at byte 0x442b2. If there are many bytes between it and the end of the log, (like more than four or five files worth), then that's a smoking gun as to why the recovery is so slow. Then we have to figure out why closing the environment doesn't give you a checkpoint end record at the very end.
thanks again linda, i'll let you know how this goes during our nighttime maint window. in the meantime, what's the best place to learn more about all these utility classes (you mentioned DbPrintLog and i was reading about DbSpace earlier).
Hi again, I forgot to mention that our database is being opened for transactional access. We did some testing and found that a non-transactional open is much faster. Did everything you had to say apply to both transactional and non-transactional bdbs?
The same basic steps are taken in a non-transactional environment, but there's never any operations to undo, because there are never any in-progress-not-committed-at-close transactions. But there should be no difference at open that I can think of, because the determining thing is what happened before the open.
For example, suppose at t1, you opened and operated the environment transactionally. Then at t100, you re-opened the environment, non-transactionally. What happened from t1-t99 is what determines what has to be done at t100 -- it should be the same, no matter how you open at t100, because the basic task is still to recreate a consistent environment.
I have a BDB/JE with a table in it that is taking about 45 minutes to open. It is about 23GB of data, and there are around 20,000,000 records of the problem type. I'm using the DPL and all the time is taken up in the getPrimaryIndex method for this type. I'm assuming that this is the point where the DB needs to do its consistency check, and I suspect that this particular database was not closed cleanly. However, based on what I've read here, it seems like even if the database wasn't closed cleanly, there should be a checkpoint somewhere near the end of the DB that allows it to open fairly quickly, if not as fast as it would normally. I'm fairly certain this DB was created by a program that did a whole lot of puts of new objects and not any updating/deleting. It is possible that some long running read-only transactions from cursors could have been interrupted when the DB was closed. Could that kind of thing explain this excessively long time?
I tried running the DbPrintLog command above, although I could only run it without the "-ty20" command without an error. It gave me 100 million lines of XML, and I can tell you that there is a "CkptEnd" line within the last 20,000 lines of the file (I don't have an exact number -- it's a lot of data to filter!).
I'd love to have the DB open faster than this even following a failure. Even 5 minutes is a lot better than 45! Is there anything I can be doing to help that? Any patterns to avoid that would cause it to take this long?
A performance issue with getPrimaryIndex is not related to recovery, which is performed in the Environment constructor. I've replied to this question on a new thread, so as not to confuse this one.
Please see my reply on the new thread:
getPrimaryIndex is slow