Stopping cluster due to unhandled exception

Hi Coherence Experts,
During a recent performance test, my Coherence cluster (v. 12.2.1.2.0) unexpectedly shut down with the following stack trace in the logs:
2017-08-31 23:25:38.143/521798.489 Oracle Coherence GE 12.2.1.2.0 <Error> (thread=Transport:TransportService, member=5): Stopping cluster due to unhandled exception: java.lang.ThreadDeath
at java.lang.Thread.stop(Thread.java:850)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:890)
at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:845)
at com.tangosol.coherence.component.util.Daemon.halt(Daemon.CDB:11)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.halt(Service.CDB:1)
at com.tangosol.coherence.component.net.Cluster$TransportService.halt(Cluster.CDB:3)
at com.tangosol.coherence.component.net.Cluster.halt(Cluster.CDB:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:890)
at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:845)
at com.tangosol.internal.net.cluster.DefaultServiceFailurePolicy.onServiceFailed(DefaultServiceFailurePolicy.java:120)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
at com.tangosol.internal.net.cluster.DefaultServiceFailurePolicy.onGuardableTerminate(DefaultServiceFailurePolicy.java:89)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$WrapperGuardable.terminate(Grid.CDB:14)
at com.tangosol.net.GuardSupport$Context$2.run(GuardSupport.java:697)
at java.lang.Thread.run(Thread.java:745)
At that moment, the Task Count was ~90,000 on each Coherence storage node:
The Task Backlog had increased to 1,000:
Cache statistics:
Could anybody please advise how to investigate and fix the issue?
Answers
Hi Viktar,
This is the Coherence guardian terminating the node because a critical service had become unresponsive. The Coherence logs should have more detail about which service it was, as well as a full thread dump to help us identify where that thread was stuck. Can you please provide these? It would likely be best to open an SR as well, so this can be tracked properly.
thanks,
Mark
Oracle Coherence
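
If the thread dump mentioned above cannot be found in the logs, one can be captured from a running JVM with the standard `jstack <pid>` tool, or programmatically through the JDK's `ThreadMXBean`. The following is only a minimal sketch of the programmatic route using plain JDK APIs. It is not part of Coherence, and the class name `ThreadDumpSketch` is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not a Coherence API): dumps all live threads of this JVM.
public class ThreadDumpSketch {

    /** Returns a textual dump of every live thread, with full stack traces. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only when the JVM supports collecting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip threads that vanished during the dump
            }
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" state=").append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock or condition this thread is blocked or waiting on.
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

In the resulting output, the threads to look at first are the service and worker threads that stay in the same frames across several dumps taken a few seconds apart, since those are the likely candidates for the stuck thread.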