We are using a BDB instance as a task queue. When the process is killed while the queue is under heavy use (writing new tasks to it and reading the next ones from it), it will not start up again and reports this error:
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception:
(JE 4.0.103) master(-1):/var/data/bdb/store recoveryTracker
should overlap or follow on disk last VLSN of 52,678 recoveryFirst= 52,680 UNEXPECTED_STATE_FATAL: Unexpected
internal state, unable to continue. Environment is invalid and must be closed.
... 35 more
The error is always the same: recoveryFirst is 2 ahead of VLSN, although it should be the same or smaller.
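To make the failed check concrete, here is a minimal sketch of one way to read the message. The class and method names are hypothetical, and the condition is an assumption inferred from the wording "should overlap or follow", not taken from the JE source: under that reading, recoveryFirst may be at most one past the last VLSN on disk.

```java
// Illustrative only: a reading of the error message, not JE's actual code.
final class VlsnContinuityCheck {

    /**
     * Returns true if the VLSNs seen during recovery either overlap the
     * range already on disk or start directly after it (assumed rule).
     */
    static boolean overlapsOrFollows(long lastOnDisk, long recoveryFirst) {
        return recoveryFirst <= lastOnDisk + 1;
    }

    public static void main(String[] args) {
        // Values taken from the stack trace above.
        long lastOnDisk = 52_678L;
        long recoveryFirst = 52_680L;

        // Prints false: VLSN 52,679 is missing between the two ranges.
        System.out.println(overlapsOrFollows(lastOnDisk, recoveryFirst));

        // Prints true: recovery starts directly after the last on-disk VLSN.
        System.out.println(overlapsOrFollows(lastOnDisk, lastOnDisk + 1));
    }
}
```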
We are using the replication settings, but with only one node so far.
Any idea what the cause could be, or how to get the BDB open anyway?
Edited by: 846360 on 22.03.2011 06:25
Our apologies for this problem! This is either a bug or an overly aggressive assertion. We'll need some log files to diagnose this and make further recommendations.
Can you pull together the following information and email it to linda dot q dot lee at oracle dot com? At that point, we'll take this offline and report the resolution back to the forum.
1. Is this problem always reproducible? If so, does it happen on the master, the replica, or both?
2. Can you do an " ls -ls " of an environment that has had this problem, and include it in the email?
3. Can you execute this command:
java -jar <je.jar> DbPrintLog -h <environment directory> -ty 19,20 > info.txt
and include it in the email? This last command will show where the checkpoint start and end entries are in the log, which tells us which portion of the log is read when the environment is being re-opened, or recovered. We are looking for the last complete checkpoint start/checkpoint end pair. Then we'll ask you to send us the .jdb files that correspond to that portion. It's usually the last few .jdb files of the environment.
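For collecting "the last few .jdb files", a small helper along these lines may be handy. It is only a sketch: the class name, the method name and the choice of three files are made up for illustration, and it assumes that JE log files carry fixed-width, lowercase hexadecimal names (such as 00000000.jdb) so that sorting by name matches log order.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LastJdbFiles {

    /** Picks the last n ".jdb" names, assuming name order matches log order. */
    static List<String> lastLogFiles(List<String> names, int n) {
        List<String> jdb = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(".jdb")) {
                jdb.add(name); // ignore lock files and other non-log entries
            }
        }
        Collections.sort(jdb);
        int from = Math.max(0, jdb.size() - n);
        return new ArrayList<>(jdb.subList(from, jdb.size()));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: LastJdbFiles <environment directory>");
            return;
        }
        String[] names = new File(args[0]).list();
        if (names == null) {
            System.err.println("Not a readable directory: " + args[0]);
            return;
        }
        // Three is an arbitrary choice; use the DbPrintLog output to decide
        // how far back the last complete checkpoint really reaches.
        for (String name : lastLogFiles(Arrays.asList(names), 3)) {
            System.out.println(name);
        }
    }
}
```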
I just wanted to note to you and to others on the forum that when we ask for log files, we are usually interested in the metadata in the log, and the headers of log entries. If there are any restrictions with sharing your application's data with us when we request log files, we can instruct you on how to dump logs in such a way that the application data will be obfuscated or omitted, leaving behind the log entry headers and JE internal metadata.
Thought it might be of interest to all!
The problem is still there :( What we did:
We are pushing a lot of data to the BDB (write only) and reading it simultaneously (opening cursors and walking over them).
We then shut Tomcat down by doing a Tomcat restart.
While trying to come back up, it produces the known error (after a few tries).
Right now, to prevent this, we first have to let the BDB settle by not sending any write requests to it, wait until it is finished, and only then stop the service. Possible, but ugly :)
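The "stop writing, wait, then stop the service" workaround could be sketched roughly as below. This is not our actual code: QueueShutdownGate and its methods are hypothetical names, and the close action is left as a plain Runnable, so the sketch runs without JE on the classpath. It also assumes every queue read and write goes through the gate.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class QueueShutdownGate {
    // Fair mode: operations arriving after shutdown() started queue behind it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private boolean closed = false; // read under read lock, written under write lock

    /** Runs one queue operation unless the gate is closed; returns false if rejected. */
    boolean runIfOpen(Runnable operation) {
        lock.readLock().lock();
        try {
            if (closed) {
                return false;
            }
            operation.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Waits up to the timeout for running operations to finish, then closes
     * the gate and runs closeAction. Returns false if the wait timed out.
     */
    boolean shutdown(Runnable closeAction, long timeout, TimeUnit unit)
            throws InterruptedException {
        if (!lock.writeLock().tryLock(timeout, unit)) {
            return false; // operations are still running; nothing was closed
        }
        try {
            if (!closed) {
                closed = true;
                closeAction.run(); // e.g. close cursors, databases, environment
            }
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In the web application, the close action would close the open cursors and databases and then the environment, and shutdown() could be called from the webapp's shutdown path. This only covers an orderly stop; it does nothing for a process that is killed outright.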
By the way: we also managed to get the queue into this unstable state by killing Tomcat with kill -9, so that is not a way around this :(
Thanks for your help.
Edited by: kurellajunior on 30.03.2011 01:55
With your help, we've just figured out the cause of the problem. It's a BDB JE bug, caused by a race condition which, unfortunately, you seem able to trigger. We expect the fix to be small in terms of lines of code, but subtle enough that we need to review and test it very well. Our next steps are to decide on the fix and write a test that can trigger the problem.
The SR# for this is [#19754], and the fix will appear in the change log under that tag. However, we'll keep you updated specifically, and in fact, it's likely that we'll ask you to verify it at your site, since you are able to trigger the problem.
Thank you very much for your assistance in finding this bug.
Just for the notes to the team:
This was very convincing support! If you ever come across some new boss who wants to save some money by cutting down support resources, slap him from me and show him this post. :)
Your support convinced me that my colleague's decision for this key-value store was the right one for a production environment. A wave, and see you soon (or at least your product).
Jan Kurella, NOKIA Gate5 GmbH