My application infrastructure looks like this: 4 separate physical machines. Two of them run the applications that use the Coherence cache, and the other two run the standalone cache servers, so my application machines get the Coherence cache service from the other two. All 4 machines are powerful and healthy, and the application nodes are all storage-disabled.
My problem is that my standalone cache servers get slower and slower, and after some time (around 6 days) they stop responding. When this happens, my whole system freezes. After I restart the cache servers, everything goes back to the normal speed I expect.
There are no CPU, memory or network problems, and according to the Coherence reports there is nothing exceptional. But after some time, every cache operation (get(), putAll(), invokeAll(), etc.) stops returning a response.
Any ideas are appreciated.
What version of Coherence are you running? Is there any chance your worker threads are all busy, leaving no threads free to do new work?
When the servers are slow and non-responsive, taking a thread dump of them will certainly help identify what they are doing. If they aren't spending their time in GC, they are probably busy with something else, and the thread dump will show you what each thread is doing.
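The usual way to get a thread dump from outside the process is the JDK's `jstack <pid>` against the cache server JVM. If you would rather capture one from inside the server, for example from a hypothetical admin hook you add yourself, here is a minimal sketch using only the standard `java.lang.management` API. It prints every thread with its state, the lock it is waiting on and who holds that lock, plus the full stack. It also counts threads by state for a name fragment. The fragment `"Worker"` is an assumption about how Coherence names its service worker threads, so check it against a real dump first. If most matching threads show up as `BLOCKED` or `WAITING` on the same lock while the server is unresponsive, that points to the worker-starvation scenario described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadDumpSketch {

    /** Captures every live thread with its state, lock info and full stack trace. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details if the JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on lock ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            sb.append('\n');
            // ThreadInfo.toString() truncates stacks, so print every frame ourselves.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Counts threads whose name contains the given fragment, grouped by state. */
    public static Map<Thread.State, Integer> summarize(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadName().contains(nameFragment)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        // "Worker" is an assumed name fragment for Coherence service worker threads.
        System.out.println("Worker threads by state: " + summarize("Worker"));
    }
}
```

Take two or three dumps a few seconds apart. Threads stuck on the same frame across all of them are the ones worth investigating.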