In addition to the scenarios described in the documentation, are there any other known ways of producing corrupt secondaries? Particularly, what happens if an explicit transaction is left open and not aborted because of a failure of some sort?
Just to double check, your secondary DBs are transactional also, correct?
When a transaction is left open, it is the same as aborting it. It will be effectively aborted by recovery, when you next open the environment.
The only thing I can think of is that your secondary key creator has a bug of some kind, perhaps related to concurrency. The key creator must be thread-safe. You could post your key creator code to see if anyone can spot an error.
Also, any suggestions as to methods for tracking down the origin of the problem--for catching it in the act?
If you save your .jdb files when the corruption is first detected, it would be possible to analyze the log (using DbPrintLog to get human readable output) and see what transaction is responsible for the corruption. To ensure that the relevant .jdb files have not been cleaned and deleted, you would need to catch the problem early, perhaps by scanning by secondary key after a crash. Of course, that may not be possible in a production app, but maybe you can take a snapshot of the logs after a crash and do the scan offline. Or you could set EnvironmentConfig.CLEANER_EXPUNGE to false so that no log files are deleted, but this is also problematic in a production app (you may run out of disk space).
Also, the analysis requires knowledge of the internal log format. If you want to try this, I'll give you some pointers. However, I don't recommend trying this unless you're familiar with the basic idea of write-ahead logging for transactions. This is probably best only as a last resort.
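To make the "scan by secondary key" idea concrete, here is a minimal sketch of the consistency check itself. It deliberately does not use the JE API: plain in-memory maps stand in for the primary and secondary databases, the secondary is assumed to be one-to-one with unique keys, and the names SecondaryScanSketch and findInconsistencies are hypothetical. In a real application the same comparisons would be done while iterating the actual databases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper: maps stand in for the primary and secondary databases.
final class SecondaryScanSketch {

    /**
     * primary:    primary key   -> record data
     * secondary:  secondary key -> primary key (assumed one-to-one, unique keys)
     * keyCreator: record data   -> secondary key (stand-in for the real key creator)
     * Returns one description per inconsistency found; empty if the two agree.
     */
    static List<String> findInconsistencies(Map<String, String> primary,
                                            Map<String, String> secondary,
                                            Function<String, String> keyCreator) {
        List<String> problems = new ArrayList<>();

        // Pass 1: every secondary entry must point at an existing primary
        // record whose recomputed secondary key matches the stored one.
        for (Map.Entry<String, String> e : secondary.entrySet()) {
            String secKey = e.getKey();
            String priKey = e.getValue();
            String data = primary.get(priKey);
            if (data == null) {
                problems.add("orphan: " + secKey + " -> " + priKey);
            } else if (!Objects.equals(secKey, keyCreator.apply(data))) {
                problems.add("mismatch: " + secKey + " -> " + priKey);
            }
        }

        // Pass 2: every primary record that yields a secondary key must be
        // reachable through that key.
        for (Map.Entry<String, String> e : primary.entrySet()) {
            String expected = keyCreator.apply(e.getValue());
            if (expected != null && !e.getKey().equals(secondary.get(expected))) {
                problems.add("missing: " + expected + " -> " + e.getKey());
            }
        }
        return problems;
    }
}
```

Running a check like this right after a crash, or offline against a snapshot, narrows down which records are affected before anyone has to read the log output.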
Thanks for the quick response. Yes, the secondaries are also transactional. I will scrutinize the key creator some more to make sure it's really as thread-safe as I think it is. If I can rule that out, I will post the code as well.
Thanks for the suggestions. After studying this for a while, I've concluded that the issue was, indeed, related to the key creator. I believe the problem was that it did not consistently generate the same values under all circumstances, rather than a concurrency issue.
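For anyone hitting something similar, a rough way to test for this kind of inconsistency is to call the key derivation logic repeatedly, from several threads, on the same record and compare the results. The sketch below is only an illustration: it takes the derivation as a plain Function over byte arrays instead of the JE key creator interface, and the names KeyCreatorCheck and isConsistent are made up. A passing run does not prove thread safety, since a race may simply not trigger, but a failing run is a clear signal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical test harness, not part of any library.
final class KeyCreatorCheck {

    /**
     * Calls deriveKey on copies of the same record from several threads and
     * returns true only if every call produced the same bytes as the first.
     */
    static boolean isConsistent(Function<byte[], byte[]> deriveKey,
                                byte[] record, int threads, int callsPerThread) {
        // Reference value from a single call made before any concurrency.
        final byte[] expected = deriveKey.apply(record.clone());

        Callable<Boolean> task = () -> {
            for (int i = 0; i < callsPerThread; i++) {
                if (!Arrays.equals(expected, deriveKey.apply(record.clone()))) {
                    return false;
                }
            }
            return true;
        };

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(task));
            }
            boolean ok = true;
            for (Future<Boolean> f : results) {
                ok &= f.get();
            }
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("key derivation threw", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A derivation that depends on anything other than the record bytes (a counter, the clock, shared mutable buffers) is the usual suspect when a check like this fails.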
Glad it's resolved. I appreciate you letting us know, so we're not wondering about a possible bug.