One of the nodes in the cluster fails to come up with errors

User_VA8NF Member Posts: 10 Employee
edited Jan 3, 2018 12:03AM in Coherence Support

Hi,

We are using Coherence 12.2.1.0.4, and one of the nodes in the cluster failed to come up with the errors below. The issue resolved itself after multiple retries; no changes were made to the system. Could you please tell us whether this is a known issue and how it can be resolved?

2017-12-08 09:50:22.609 AEDT ERROR -  -  -  - Oracle Coherence 12.2.1.0.4 (thread=SelectionService(channels=16, selector=MultiplexedSelector([email protected]), id=922871524), member=39): Stopping cluster due to unhandled exception: java.lang.ArrayIndexOutOfBoundsException: -4350
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.getService(ClusterService.CDB:4)
        at com.tangosol.coherence.component.net.Cluster$TransportService$MessageHandler.getServiceById(Cluster.CDB:1)
        at com.tangosol.coherence.component.net.MessageHandler$Connection.prepareMessage(MessageHandler.CDB:16)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.TransportService$MessageHandler$Connection.prepareMessage(TransportService.CDB:3)
        at com.tangosol.coherence.component.net.MessageHandler.processMessage(MessageHandler.CDB:26)
        at com.tangosol.coherence.component.net.MessageHandler$EventCollector.add(MessageHandler.CDB:51)
        at com.oracle.common.internal.net.socketbus.AbstractSocketBus.addEvent(AbstractSocketBus.java:779)
        at com.oracle.common.internal.net.socketbus.SocketMessageBus$MessageConnection$ReadBatch.onReady(SocketMessageBus.java:546)
        at com.oracle.common.internal.net.socketbus.SocketMessageBus$MessageConnection$ReadBatch.read(SocketMessageBus.java:449)
        at com.oracle.common.internal.net.socketbus.SocketMessageBus$MessageConnection.processReads(SocketMessageBus.java:190)
        at com.oracle.common.internal.net.socketbus.BufferedSocketBus$BufferedConnection.onReadySafe(BufferedSocketBus.java:666)
        at com.oracle.common.internal.net.socketbus.AbstractSocketBus$Connection.onReady(AbstractSocketBus.java:2016)
        at com.oracle.common.internal.net.RunnableSelectionService.process(RunnableSelectionService.java:401)
        at com.oracle.common.internal.net.RunnableSelectionService.run(RunnableSelectionService.java:274)
        at com.oracle.common.internal.net.ResumableSelectionService.run(ResumableSelectionService.java:133)
        at java.lang.Thread.run(Thread.java:748)

Thanks & Regards

Shashidhar

Answers

  • Tmiddlet-Oracle
    Tmiddlet-Oracle Member Posts: 125
    edited Dec 18, 2017 12:32AM

    Hi Shashidhar

    The most recent patch for 12.2.1.0 is 12.2.1.0.6 (see https://updates.oracle.com/Orion/PatchDetails/process_form?patch_num=26790749).

    I would suggest that you apply this patch and see whether it resolves your issue, as the readme mentions an issue similar to yours.

    A couple of additional questions:

    1) Is this reproducible?

    2) What OS are you running on?

    Thanks

    Tim

  • User_VA8NF
    User_VA8NF Member Posts: 10 Employee
    edited Dec 18, 2017 1:01AM

    Thanks, Tim! I am running on RHEL 6.7. I need to check whether the issue is reproducible and will confirm.

    Regards

    Shashidhar

  • User_VA8NF
    User_VA8NF Member Posts: 10 Employee
    edited Dec 18, 2017 6:59AM

    Hi Tim,

    The issue has not been reproduced since the last restart. Also, the issues mentioned in the README for the suggested patch seem to be different from the current issue. The issue fixed in 12.2.1.0.6 happened during the recovery of snapshots, whereas we are facing an issue during node start. Do you still recommend the patch upgrade?

    Thanks & Regards

    Shashidhar

  • Tmiddlet-Oracle
    Tmiddlet-Oracle Member Posts: 125
    edited Dec 18, 2017 8:24AM

    The bug I'm referring to is Bug 26608557, which is fixed in 12.2.1.0.5.

    The bug you are referring to is 23211759, which is fixed in 12.2.1.0.2.

    I would suggest applying the patch.

    Thanks

    Tim

  • Mfalco-Oracle
    Mfalco-Oracle Member Posts: 503
    edited Dec 18, 2017 10:50AM

    Bug 26608557, while similar sounding and in a related area of code, would unfortunately not be able to trigger the reported error. Technically, the only way I can see this occurring is from packet or memory corruption, and if either of those was the cause, it seems quite unlikely that it would have been reproducible at all. I'm curious: while it was reproducing, were you getting the exact same IndexOutOfBoundsException at index -4350 each time?

    thanks,

    Mark

    Oracle Coherence
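
The corruption scenario Mark describes can be illustrated with a small sketch. This is not Coherence's code: the `SERVICES` table, the `lookup` and `decodeId` helpers, and the assumption that a service id travels as a signed 16-bit big-endian value are all hypothetical, chosen only to show how a damaged id on the wire could surface as a negative array index such as -4350.

```java
import java.nio.ByteBuffer;

class CorruptIdSketch {
    // Hypothetical registry: slot i holds the name of the service with id i.
    static final String[] SERVICES = {"Cluster", "Management", "DistributedCache"};

    // Hypothetical lookup that indexes the table directly, with no bounds check.
    static String lookup(int id) {
        return SERVICES[id];
    }

    // Decode a signed 16-bit id from two bytes (big-endian, ByteBuffer's default order).
    // The 16-bit encoding is an assumption made for this illustration only.
    static int decodeId(byte hi, byte lo) {
        return ByteBuffer.wrap(new byte[] {hi, lo}).getShort();
    }

    // Returns true when the lookup throws ArrayIndexOutOfBoundsException.
    static boolean lookupFails(int id) {
        try {
            lookup(id);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int good = decodeId((byte) 0x00, (byte) 0x02); // intact bytes decode to id 2
        int bad = decodeId((byte) 0xEF, (byte) 0x02);  // damaged high byte decodes to -4350
        System.out.println(good + " -> " + lookup(good));
        System.out.println(bad + " fails: " + lookupFails(bad));
    }
}
```

Under these assumptions, a single damaged byte is enough to turn a valid id into an out-of-range one. Random corruption would normally yield a different value on each occurrence, which is why the question of whether the index repeats matters.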

  • User_VA8NF
    User_VA8NF Member Posts: 10 Employee
    edited Dec 18, 2017 11:36PM

    Hi Mark,

    We have observed this issue again today.

    1. Got the ArrayIndexOutOfBoundsException.
    2. Stopped and restarted with no change.
    3. The node hung for a couple of hours.
    4. Stopped and restarted again. It then worked fine.

    We are getting the same IndexOutOfBoundsException at index -4350 each time.

    Thanks & Regards

    Shashidhar
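
Whether the failing index is identical on every occurrence can be checked mechanically instead of by eye. The sketch below is one possible approach, not an Oracle-provided tool: the class name `AioobeIndexCounter` is hypothetical, and the regular expression assumes the Java 8 style message format seen in the traces in this thread (`ArrayIndexOutOfBoundsException: <index>`). It counts how often each index appears in a list of log lines.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AioobeIndexCounter {
    // Matches the Java 8 style message, e.g. "java.lang.ArrayIndexOutOfBoundsException: -4350".
    private static final Pattern AIOOBE =
            Pattern.compile("java\\.lang\\.ArrayIndexOutOfBoundsException: (-?\\d+)");

    // Counts occurrences of each reported index, in order of first appearance.
    static Map<Integer, Integer> countIndices(List<String> lines) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = AIOOBE.matcher(line);
            while (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input: message fragments copied from the traces posted in this thread.
        List<String> sample = Arrays.asList(
                "Stopping cluster due to unhandled exception: java.lang.ArrayIndexOutOfBoundsException: -4350",
                "Stopping cluster due to unhandled exception: java.lang.ArrayIndexOutOfBoundsException: -4350",
                "java.lang.ArrayIndexOutOfBoundsException: 1");
        System.out.println(countIndices(sample));
    }
}
```

In practice the lines would come from the node's log file, for example via `java.nio.file.Files.readAllLines`. A single index dominating the counts points away from random corruption.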

  • Mfalco-Oracle
    Mfalco-Oracle Member Posts: 503
    edited Dec 19, 2017 12:49PM

    Hi Shashidhar,

    At this point it would be best to move this to a proper SR where we can continue the investigation.

    thanks,

    Mark

    Oracle Coherence

  • User_VA8NF
    User_VA8NF Member Posts: 10 Employee
    edited Dec 22, 2017 1:55AM

    Thanks Mark, SR 3-16476212001 has been created.

    Regards

    Shashidhar

  • User_VA8NF
    User_VA8NF Member Posts: 10 Employee
    edited Jan 3, 2018 12:03AM

    Hi Mark,

    The SR resolution suggested an upgrade to 12.2.1.0.6. However, even after the upgrade, we see the same error. Could you check and let us know what the issue might be?

    java.lang.ArrayIndexOutOfBoundsException: 1
            at com.tangosol.net.ConfigurableAddressProvider$2.next(ConfigurableAddressProvider.java:477)
            at com.tangosol.net.ConfigurableAddressProvider$2.next(ConfigurableAddressProvider.java:465)
            at com.tangosol.net.ConfigurableAddressProvider.getNextAddress(ConfigurableAddressProvider.java:150)
            at com.tangosol.net.internal.SubstitutionAddressProvider.getNextAddress(SubstitutionAddressProvider.java:37)
            at com.tangosol.net.CompositeAddressProvider$AddressIterator.advance(CompositeAddressProvider.java:342)
            at com.oracle.common.collections.AbstractStableIterator.hasNext(AbstractStableIterator.java:41)
            at com.tangosol.net.CompositeAddressProvider.ensureInternalIterator(CompositeAddressProvider.java:498)
            at com.tangosol.net.CompositeAddressProvider.getNextAddress(CompositeAddressProvider.java:109)
            at com.tangosol.net.CompositeAddressProvider$AddressIterator.advance(CompositeAddressProvider.java:342)
            at com.oracle.common.collections.AbstractStableIterator.hasNext(AbstractStableIterator.java:41)
            at java.util.AbstractCollection.clear(AbstractCollection.java:434)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.onTimerRunning(ClusterService.CDB:27)
            at com.tangosol.coherence.component.net.Cluster$ClusterService.onTimerRunning(Cluster.CDB:5)
            at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ClusterService.onNotify(ClusterService.CDB:19)
            at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:45)

    Thanks & Regards

    Shashidhar

This discussion has been closed.