
Stopping cluster due to unhandled exception

Viktar C
edited Sep 8, 2017 9:43AM in Coherence Support

Hi Coherence Experts,

During a recent performance test, my Coherence cluster (v12.2.1.2.0) unexpectedly shut down with the following stack trace in the logs:

2017-08-31 23:25:38.143/521798.489 Oracle Coherence GE 12.2.1.2.0 <Error> (thread=Transport:TransportService, member=5): Stopping cluster due to unhandled exception: java.lang.ThreadDeath
       at java.lang.Thread.stop(Thread.java:850)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:497)
       at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:890)
       at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:845)
       at com.tangosol.coherence.component.util.Daemon.halt(Daemon.CDB:11)
       at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.halt(Service.CDB:1)
       at com.tangosol.coherence.component.net.Cluster$TransportService.halt(Cluster.CDB:3)
       at com.tangosol.coherence.component.net.Cluster.halt(Cluster.CDB:37)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:497)
       at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:890)
       at com.tangosol.util.ClassHelper.invoke(ClassHelper.java:845)
       at com.tangosol.internal.net.cluster.DefaultServiceFailurePolicy.onServiceFailed(DefaultServiceFailurePolicy.java:120)
       at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
       at com.tangosol.internal.net.cluster.DefaultServiceFailurePolicy.onGuardableTerminate(DefaultServiceFailurePolicy.java:89)
       at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$WrapperGuardable.terminate(Grid.CDB:14)
       at com.tangosol.net.GuardSupport$Context$2.run(GuardSupport.java:697)
       at java.lang.Thread.run(Thread.java:745)
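Read from the bottom up, the trace shows the chain of events: a GuardSupport context thread calls Grid$WrapperGuardable.terminate, which goes through DefaultServiceFailurePolicy.onGuardableTerminate and Grid$Guard.terminate to DefaultServiceFailurePolicy.onServiceFailed; that policy calls Cluster.halt, and halting the TransportService daemon ends in Thread.stop, which is what raises java.lang.ThreadDeath. In other words, the ThreadDeath is a consequence of the node being halted, not the root cause. To make the heartbeat/timeout idea behind such a guardian concrete, here is a hypothetical, much-simplified watchdog. SimpleGuardian, heartbeat, release and onTimeout are invented names for illustration only; this is not Coherence's GuardSupport API, and the real guardian is considerably more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical, much-simplified watchdog illustrating the heartbeat/timeout idea
 * behind a service guardian. This is NOT Coherence's GuardSupport API.
 */
final class SimpleGuardian implements AutoCloseable {
    private final Map<String, Long> lastBeatNanos = new ConcurrentHashMap<>();
    private final long timeoutNanos;
    private final Consumer<String> onTimeout;
    private final ScheduledExecutorService checker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "guardian-checker");
                t.setDaemon(true); // never keep the JVM alive just for the watchdog
                return t;
            });

    /**
     * @param timeoutMillis how long a guarded task may go without a heartbeat
     * @param onTimeout     called once with the task name when it goes silent
     *                      (the point where a real guardian would apply its failure policy)
     */
    SimpleGuardian(long timeoutMillis, Consumer<String> onTimeout) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.onTimeout = onTimeout;
        // Check several times per timeout window to keep detection latency low.
        long periodMillis = Math.max(1, timeoutMillis / 4);
        checker.scheduleAtFixedRate(this::check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Registers a task on its first call; later calls prove it is still making progress. */
    void heartbeat(String task) {
        lastBeatNanos.put(task, System.nanoTime());
    }

    /** Stops guarding a task, e.g. when it completes normally. */
    void release(String task) {
        lastBeatNanos.remove(task);
    }

    private void check() {
        long now = System.nanoTime();
        lastBeatNanos.forEach((task, last) -> {
            // remove(key, value) succeeds only if no new heartbeat arrived meanwhile,
            // so each stall is reported exactly once.
            if (now - last > timeoutNanos && lastBeatNanos.remove(task, last)) {
                try {
                    onTimeout.accept(task);
                } catch (RuntimeException e) {
                    // An exception escaping here would silently cancel the scheduled check.
                    System.err.println("onTimeout failed for " + task + ": " + e);
                }
            }
        });
    }

    @Override
    public void close() {
        checker.shutdownNow();
    }
}
```

A worker would call heartbeat(...) from inside its processing loop; a task blocked on a lock or stuck in a long-running operation stops beating and gets reported, which is the kind of situation the guardian detected here.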

At that moment, the Task Count was about 90,000 on each Coherence storage node:

[Screenshot: Task Count on each storage node]

The Task Backlog grew to 1,000:

[Screenshot: Task Backlog]

Cache statistics:

[Screenshot: cache statistics]
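Coherence exposes per-service figures like the Task Count and Task Backlog above through JMX, so a backlog of this size can be watched for before the guardian steps in. The following is a minimal sketch, assuming the ObjectName pattern `Coherence:type=Service,*` and the attribute name `TaskBacklog` (both are assumptions to check against the ServiceMBean documentation for your release); ServiceBacklogProbe, readBacklogs and atOrAbove are hypothetical names. It reads the backlog of every matching service on a JMX connection and picks out those at or above a threshold.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.AttributeNotFoundException;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical probe for per-service task backlogs exposed over JMX. */
final class ServiceBacklogProbe {
    // Assumed ObjectName pattern for Coherence service MBeans.
    private static final String SERVICE_PATTERN = "Coherence:type=Service,*";

    /** Reads the (assumed) TaskBacklog attribute from every matching service MBean. */
    static Map<String, Long> readBacklogs(MBeanServerConnection conn) {
        Map<String, Long> backlogs = new TreeMap<>();
        try {
            for (ObjectName name : conn.queryNames(new ObjectName(SERVICE_PATTERN), null)) {
                try {
                    Object value = conn.getAttribute(name, "TaskBacklog");
                    if (value instanceof Number) {
                        backlogs.put(name.toString(), ((Number) value).longValue());
                    }
                } catch (AttributeNotFoundException | InstanceNotFoundException e) {
                    // Service does not expose the attribute, or the MBean vanished (member left): skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (JMException e) {
            throw new IllegalStateException("JMX query failed", e);
        }
        return backlogs;
    }

    /** Keeps only the services whose backlog is at or above the threshold. */
    static Map<String, Long> atOrAbove(Map<String, Long> backlogs, long threshold) {
        Map<String, Long> hot = new TreeMap<>();
        backlogs.forEach((name, backlog) -> {
            if (backlog >= threshold) {
                hot.put(name, backlog);
            }
        });
        return hot;
    }
}
```

For a remote node, the connection could come from `JMXConnectorFactory.connect(new JMXServiceURL(...)).getMBeanServerConnection()`; `atOrAbove(readBacklogs(conn), 1000)` would then flag a backlog like the one described above. Keeping the threshold check separate from the JMX call makes it easy to test without a running cluster.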

Could anybody please advise how to investigate and fix the issue?

Answers

  • Mfalco-Oracle
    edited Sep 8, 2017 9:43AM

    Hi Viktar,

    This is the Coherence guardian terminating the node because a critical service had become unresponsive. The Coherence logs should contain more detail about which service it was, as well as a full thread dump that will help us identify where that thread was stuck. Could you please provide these? It would also be best to open an SR (service request) so this can be tracked properly.
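    A full thread dump is usually taken from outside the process with `jstack <pid>`. As a programmatic alternative, here is a minimal sketch using the standard java.lang.management API; ThreadDumper is a hypothetical helper name, and this is not how Coherence produces the dump it writes to its own log. It prints every frame of every live thread, what each blocked thread is waiting on, and any deadlocked thread ids.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

/** Hypothetical helper that renders a full thread dump of the current JVM as text. */
final class ThreadDumper {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request held monitors/synchronizers only where supported (avoids UnsupportedOperationException).
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" id=")
               .append(info.getThreadId()).append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                // The thread is blocked or waiting on this lock.
                out.append(" on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            out.append('\n');
            // ThreadInfo.toString() caps the stack at 8 frames, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        // A deadlock is one common reason a guarded thread stops responding.
        long[] deadlocked = synchronizers ? mx.findDeadlockedThreads() : mx.findMonitorDeadlockedThreads();
        if (deadlocked != null) {
            out.append("Deadlocked thread ids: ").append(Arrays.toString(deadlocked)).append('\n');
        }
        return out.toString();
    }
}
```

    Writing `ThreadDumper.dump()` to a file when a backlog alarm fires would capture where the service threads were stuck before the guardian halts the node.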

    thanks,

    Mark

    Oracle Coherence
