We have a setup of 2 replication groups (RGs) with 3 replication nodes (RNs) each. We tried to load around 10,000,000 keys with relatively small values. Unfortunately, this caused our whole cluster to fail. The problem was that the replica nodes filled their 40 GB data partitions and gave up, while the two masters were still running with only around 5 GB of space used in their partitions. What could be the reason for this discrepancy between a master and its replicas? The replicas contained warnings like this in their log files:
121108 17:05:06:073 WARNING [rg1-rn1] Cleaner has 33 files not deleted because they are protected by replication.
I checked that each replica had 37 JDB files in its env directory, while the master had just 4 JDB files, so I assume that is where all the storage went. Does anyone have an idea what the reason for such behaviour could be? If this happened in production (which it will sooner or later if we do not find the cause), it would be a disaster.
Unfortunately, I no longer have access to the logs because our admins were very quick to clean up the whole mess and set up the KV store anew.
I suspect that the problem is something we know about and have been actively working on. It comes up when a store is run with significantly undersized caches and the application does a large number of updates but then ceases all write activity. In the cases we've looked at, the store's underlying log cleaning falls behind during the heavy application load. It would catch up, except that a bug in some metadata maintenance is blocking the log cleanup.
Your case sounds like that, except that you see asymmetrical behavior on the part of the master node. Does the application load consist of updates only, or of mixed updates and reads? That may have some bearing on the asymmetry.
Our R2 pre-release has some improvements for this problem, and we are actively working on a complete solution. But there are really two issues at hand. What you've seen is poor handling of the case where log cleaning falls behind, and we will be fixing that because it can cause the sort of catastrophic out-of-disk failure you saw. More fundamentally, though, the store may also not be optimally configured for your load, so fixing the log cleaning issue might still leave you with performance that's not optimal.
We've got some documentation in the Admin Guide on how to come up with starting-point configurations that best support the application load and the hardware. If you post more information on your application's key and data sizes, and on your hardware, we can comment on what might work. For example, it sounds like your application might have large keys and small data; we find that smaller keys are generally more efficient in the NoSQL caches.
We'd be interested in getting more details about your application so that we can use that as a test case for the fix for the log cleaning issue. In our current test cases, the nodes of the cluster all have symmetrical behavior, unlike what you experienced. If that's possible, please contact me at linda dot q dot lee at oracle dot com.
The load consisted of sequences of bulk inserts of large amounts of data, reads, and then bulk deletes. I am not sure when it happened for the first time. However, I was able to reproduce the behaviour after a high-load bulk insert into a newly set-up KV store. The RN processes were under heavy load even hours after the actual inserts had finished (my guess is a lot of Java garbage collection).
Thanks to your hint, we increased the heap sizes and the BDB JE cache sizes of the replication nodes and were able to run the same load without any problems, and with improved response times, throughput and hard disk footprint.
I guess the problem was partly the somewhat misleading Admin Guide, which led us to believe that we could set the heap and cache sizes using the set policy command. Instead, we had to use plan -execute change-all-repnode-params "cacheSize=.." and plan -execute change-all-repnode-params "javaMiscParams=..".
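For anyone hitting the same issue, here is a sketch of what our Admin CLI session looked like, following the command syntax above. The cache and heap values shown are illustrative placeholders only, not recommendations; you should size them for your own hardware and key/value distribution (the JE cache should comfortably hold the working set of index metadata, and the JVM heap should be larger than the cache):

```
plan -execute change-all-repnode-params "cacheSize=3000000000"
plan -execute change-all-repnode-params "javaMiscParams=-Xms4g -Xmx4g"
```

The first command changes the BDB JE cache size on all replication nodes; the second changes the RN JVM heap by passing standard Java flags. Note that set policy only affects nodes deployed afterwards, which is why the change-all-repnode-params plans were needed for our already-running nodes.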