Thank you for your bug report and analysis. I have opened SR [#21786] for this.
You are right, and have detected a flaw in the implementation of this method. It is indeed trying to do the entire delete in one transaction, and is therefore vulnerable to accumulating a huge set of changes, requiring a lot of memory. We certainly need to make this more constrained, so that the deletion occurs as a series of transactions. The change is slightly trickier than meets the eye because we need to reverse the order in which the metadata is traversed for deletion: right now it is deleted end -> front, and to chunk it, we need to go front -> end. That may have some dependencies, which we will have to think about and test.
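To illustrate the kind of chunking we have in mind, here is a rough sketch. This is not JE code, and every name in it is made up; it just shows how a single large delete can be broken into a series of small commits, traversing front -> end, so that no one transaction holds a huge change set in memory:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: chunk a large delete into a series of small
// "transactions" so no single transaction accumulates a huge change set.
// Traversal is front -> end, so each committed chunk removes a consistent
// prefix. All names here are hypothetical, not JE APIs.
public class ChunkedDelete {

    /** Deletes the first {@code count} entries in chunks, one commit per chunk. */
    static int deleteInChunks(List<String> entries, int count, int chunkSize) {
        int commits = 0;
        int deleted = 0;
        while (deleted < count) {
            int n = Math.min(chunkSize, count - deleted);
            // begin "transaction": stage the next n deletions from the front
            for (int i = 0; i < n; i++) {
                entries.remove(0); // front -> end order
            }
            // commit: at most chunkSize changes were held at once
            commits++;
            deleted += n;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 10; i++) log.add("vlsn-" + i);
        int commits = deleteInChunks(log, 7, 3);
        // 7 deletions in chunks of 3 -> 3 commits, 3 entries remain
        System.out.println(commits + " commits, " + log.size() + " entries left");
    }
}
```

The memory bound comes from the chunk size: each commit releases the changes staged so far, which is exactly what the current single-transaction implementation fails to do.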
One reason we have never noticed this before is that it can only occur when you delete a really large amount of data in one attempt. The JE log cleaner is designed to work steadily in the background, cleaning in small increments, so this kind of sudden burst of cleaning is unexpected. The VLSNIndex problem is a bug that we will fix, but heavy log cleaning like this is also going to cause your application performance problems, because disk space is not being incrementally reclaimed. Does it make sense to you that there is such a sudden burst of log cleaning? Are there some characteristics of your application that might cause this?
In the meantime, do you need to recover this environment? It might take a little while to work out a fix, and perhaps there is a way to force incremental cleaning of your log, so that the VLSNIndex deletion also ends up being incremental. I would have to do some investigation to see if there is a workaround; let me know if you need that.
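If a workaround along those lines pans out, the shape of it would likely be a loop that drives the cleaner in small increments. The sketch below is hypothetical: `CleanerHandle` is a stand-in interface, not a JE class, though with real JE the equivalent calls would be `Environment.cleanLog()` (which returns the number of files cleaned) and `Environment.checkpoint(CheckpointConfig)`:

```java
// Hedged sketch of a possible workaround: drive cleaning in small
// increments so reclamation (and any associated VLSNIndex trimming)
// happens gradually instead of in one burst. CleanerHandle is a
// hypothetical stand-in, not a JE API.
public class IncrementalCleanLoop {

    interface CleanerHandle {
        int cleanLog();      // cleans a batch; returns files cleaned, 0 when done
        void checkpoint();   // makes reclaimed space durable / deletable
    }

    /** Calls cleanLog() until no work remains, checkpointing periodically. */
    static int drainCleaner(CleanerHandle env, int filesPerCheckpoint) {
        int total = 0, sinceCheckpoint = 0;
        int cleaned;
        while ((cleaned = env.cleanLog()) > 0) {
            total += cleaned;
            sinceCheckpoint += cleaned;
            if (sinceCheckpoint >= filesPerCheckpoint) {
                env.checkpoint();  // bound the work outstanding between checkpoints
                sinceCheckpoint = 0;
            }
        }
        env.checkpoint(); // final checkpoint so cleaned files can be deleted
        return total;
    }

    public static void main(String[] args) {
        // Fake environment with 5 cleanable files, one cleaned per call.
        CleanerHandle fake = new CleanerHandle() {
            int remaining = 5;
            public int cleanLog() { return remaining-- > 0 ? 1 : 0; }
            public void checkpoint() { /* no-op in the sketch */ }
        };
        System.out.println("cleaned " + drainCleaner(fake, 2) + " files");
    }
}
```

Again, whether this actually keeps the VLSNIndex deletion incremental is exactly what I would need to investigate; treat it as a sketch of the idea, not a confirmed fix.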
Thank you for the bug report,
You're right that a truncation would make a lot of data obsolete. Nevertheless, the log cleaner would still process the JE .jdb files incrementally. However, what you said makes me realize that there is a difference between the way the actual log cleaning works and the way the VLSN metadata (which is used for JE HA) is maintained. The VLSN metadata is being truncated, or really beheaded, in one operation, rather than following the same incremental pattern as the log cleaning.
Well, one way or another, it's our bug.
Thank you for your timely reply, and for the reminder about log cleaning performance in my case.
Yes, this kind of sudden burst of cleaning can occur in my application:
My application is a distributed HA cluster. When we add a "new" group (a two-node BDB JE group) to the cluster, a lot of data is moved out of the "old" groups into the "new" groups.
After rebalancing the data, we remove the obsolete data from the "old" groups; that is when the sudden burst of cleaning occurs.
Thank you for asking about recovery of my BDB environment.
Because I found this problem in a test environment, I do not need to recover it at this time; I look forward to your next release fixing this little bug.
Thank you for your reply,