3 Replies. Latest reply: Sep 11, 2012 7:04 AM by 908890

    Cache deadlock

    837934
      We have had a few occurrences of a deadlock on one of our caches.

      One of the threads seems to be stuck in the com.tangosol.net.internal.StorageVersion.waitForPendingUpdates method. Any clues?

      The stack traces for the worker threads on the offending node (two are configured) are:

      Thread[WorkflowEntityDistributedSchemeWorker:0,5,WorkflowEntityDistributedScheme]
      INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Object.wait(Native Method)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ResourceCoordinator.lock(PartitionedCache.CDB:4)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.lockKey(PartitionedCache.CDB:7)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvocationContext.lockEntry(PartitionedCache.CDB:19)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.createQueryResult(PartitionedCache.CDB:59)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.query(PartitionedCache.CDB:72)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:55)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.run(PartitionedCache.CDB:1)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:63)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
      INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Thread.run(Thread.java:662)
      INFO | 2012/01/24 08:40:32 | jvm 1 |
      INFO | 2012/01/24 08:40:32 | jvm 1 | Thread[WorkflowEntityDistributedSchemeWorker:1,5,WorkflowEntityDistributedScheme]
      INFO | 2012/01/24 08:40:32 | jvm 1 | sun.misc.Unsafe.park(Native Method)
      INFO | 2012/01/24 08:40:32 | jvm 1 | java.util.concurrent.locks.LockSupport.park(LockSupport.java:283)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.net.internal.StorageVersion.waitForPendingUpdates(StorageVersion.java:200)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.reevaluateQueryResults(PartitionedCache.CDB:39)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.checkIndexConsistency(PartitionedCache.CDB:52)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.createQueryResult(PartitionedCache.CDB:94)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.query(PartitionedCache.CDB:72)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:55)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.run(PartitionedCache.CDB:1)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:63)
      INFO | 2012/01/24 08:40:32 | jvm 1 | com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
      INFO | 2012/01/24 08:40:32 | jvm 1 | java.lang.Thread.run(Thread.java:662)
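If the hang recurs, it may help to capture the stuck workers automatically before killing the node. A minimal sketch, assuming it runs inside the storage-node JVM and that the worker threads keep the WorkflowEntityDistributedSchemeWorker name prefix seen above; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds live threads whose name starts with a prefix and whose
// current stack contains a frame for the given method name.
public class StuckFrameFinder {

    public static List<String> findThreadsInMethod(String namePrefix, String methodName) {
        List<String> matches = new ArrayList<>();
        // Snapshot of every live thread and its current stack.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue;
            }
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getMethodName().equals(methodName)) {
                    // Record the thread name plus the first matching frame.
                    matches.add(t.getName() + " at " + frame);
                    break;
                }
            }
        }
        return matches;
    }
}
```

For example, findThreadsInMethod("WorkflowEntityDistributedSchemeWorker", "waitForPendingUpdates") would list a worker parked in StorageVersion.waitForPendingUpdates, if there is one, so the evidence can be logged before the node is restarted.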





      We have the service guardian disabled at the moment, but we resolved the issue by killing the offending node.
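For context on what the disabled guardian is meant to do: a guardian tracks whether each service thread is still making progress and reacts when one stays silent for longer than a timeout. The sketch below is a hypothetical, simplified illustration of that heartbeat idea in plain Java; it is not Coherence's guardian API, and HeartbeatWatchdog and its methods are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guardian-like watchdog (NOT Coherence's API): workers report heartbeats,
// and a periodic check lists workers that have been silent longer than the timeout.
public class HeartbeatWatchdog {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    public HeartbeatWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by a worker whenever it makes progress. The current time is passed in
    // so the check is deterministic and easy to test.
    public void heartbeat(String worker, long nowMillis) {
        lastHeartbeat.put(worker, nowMillis);
    }

    // Returns the workers whose last heartbeat is older than the timeout. A guardian
    // would then react according to its policy, e.g. by trying to recover the thread
    // or by stopping the service or node.
    public List<String> findStalled(long nowMillis) {
        List<String> stalled = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                stalled.add(e.getKey());
            }
        }
        return stalled;
    }
}
```

Passing the clock in explicitly keeps the check testable; a real watchdog would call findStalled from a scheduled task.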
        • 1. Re: Cache deadlock
          MagnusE
          Just watch out that this doesn't occur on two or more nodes at the "same" time: the guardian may then cause data loss by killing the second node before a backup of the first terminated node's data has been created.

          Under what circumstances does this occur? Do you use invocables, entry processors, transactions, etc.?

          /Magnus

          Edited by: MagnusE on Jan 24, 2012 2:40 PM
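Magnus's warning amounts to a simple rule: do not kill another storage node while some partitions still lack a backup. A hedged sketch, assuming you already obtain the service's high-availability status as a string (Coherence's ServiceMBean reports it in a StatusHA attribute, with values such as ENDANGERED, NODE-SAFE and MACHINE-SAFE); the JMX lookup itself is left out, and KillGuard is a hypothetical name:

```java
// Hypothetical pre-check before deliberately killing another storage node.
// The status strings mirror the StatusHA values; fetching the value is not shown.
public final class KillGuard {

    private KillGuard() {
    }

    // ENDANGERED: losing a node running the service may lose data.
    // NODE-SAFE or stronger: any single node could be stopped without data loss.
    public static boolean safeToStopAnotherNode(String statusHA) {
        if (statusHA == null) {
            return false; // unknown status: be conservative
        }
        switch (statusHA) {
            case "NODE-SAFE":
            case "MACHINE-SAFE":
            case "RACK-SAFE":
            case "SITE-SAFE":
                return true;
            default:
                // ENDANGERED and any unrecognized value are treated as unsafe.
                return false;
        }
    }
}
```

Failing closed on unknown values means a typo or a missing reading never authorizes a second kill.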
          • 2. Re: Cache deadlock
            837934
            We have not been able to correlate the deadlock with anything else yet. This cache is accessed with put/get/keySet(filter), and an entry processor is used for eviction.
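Both worker stack traces in the first post go through onInvokeFilterRequest, which is consistent with a filter-based entry processor such as the eviction processor described here, and Worker:0 is waiting to lock an entry. This thread does not establish what Coherence is waiting on internally, so the following is only a generic illustration of one common deadlock cause and its usual fix: callers that lock overlapping key sets in different orders can wait on each other forever, while always locking keys in one global order rules that cycle out. OrderedKeyLocks is hypothetical and is not how Coherence locks entries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key locking that always acquires locks in natural key order.
public class OrderedKeyLocks<K extends Comparable<K>> {
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(K key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Sorting first means two callers with overlapping key sets contend in the
    // same order, so neither can hold a lock the other needs while waiting.
    public List<ReentrantLock> lockAll(Iterable<K> keys) {
        SortedSet<K> ordered = new TreeSet<>();
        for (K k : keys) {
            ordered.add(k);
        }
        List<ReentrantLock> held = new ArrayList<>();
        for (K k : ordered) {
            ReentrantLock l = lockFor(k);
            l.lock();
            held.add(l);
        }
        return held;
    }

    // Releases in reverse acquisition order.
    public static void unlockAll(List<ReentrantLock> held) {
        for (int i = held.size() - 1; i >= 0; i--) {
            held.get(i).unlock();
        }
    }

    // Runs an action while holding the locks for all given keys.
    public void withLocks(Iterable<K> keys, Runnable action) {
        List<ReentrantLock> held = lockAll(keys);
        try {
            action.run();
        } finally {
            unlockAll(held);
        }
    }
}
```

For example, one caller passing keys [a, b] and another passing [b, a] both lock a before b, so they serialize instead of deadlocking.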
            • 3. Re: Cache deadlock
              908890
              I am facing a similar deadlock, where all threads are in the following state:

              "pool-3-thread-5" - Thread t@97
              java.lang.Thread.State: TIMED_WAITING on com.tangosol.util.SegmentedConcurrentMap$LockableEntry@2b8f4fb4
              at java.lang.Object.wait(Native Method)
              at com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
              at com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
              at com.tangosol.net.cache.CachingMap.get(CachingMap.java:462)
              at com.nima.app.generic.CacheRepositoryImpl.getSingleByFilter(CacheRepositoryImpl.java:333)
              at com.nima.app.bdm.configurationdata.businessdate.BusinessDateRepositoryImpl.getBusinessDateByCalendarDate(BusinessDateRepositoryImpl.java:108)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.getBusinessDate(DefaultProcessVariableEnricher.java:82)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.isMonthEnd(DefaultProcessVariableEnricher.java:73)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichBusinessDate(DefaultProcessVariableEnricher.java:63)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichProcessVariables(DefaultProcessVariableEnricher.java:53)
              at com.nima.app.bpm.process.JBpmProcessExecutor.executeProcess(JBpmProcessExecutor.java:211)
              at com.nima.app.bpm.message.XmlMessageProcessor.processMessage(XmlMessageProcessor.java:35)
              at com.nima.app.bpm.message.JmsTextMessageListener.onMessage(JmsTextMessageListener.java:39)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:440)
              at org.springframework.jms.listener.SimpleMessageListenerContainer.processMessage(SimpleMessageListenerContainer.java:340)
              at org.springframework.jms.listener.SimpleMessageListenerContainer$1$1.run(SimpleMessageListenerContainer.java:307)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)

              Locked ownable synchronizers:
              - locked java.util.concurrent.locks.ReentrantLock$NonfairSync@550d9e8e

              "pool-3-thread-4" - Thread t@96
              java.lang.Thread.State: TIMED_WAITING on com.tangosol.util.SegmentedConcurrentMap$LockableEntry@2b8f4fb4
              at java.lang.Object.wait(Native Method)
              at com.tangosol.util.SegmentedConcurrentMap$LockableEntry.waitForNotify(SegmentedConcurrentMap.java:939)
              at com.tangosol.util.SegmentedConcurrentMap.lock(SegmentedConcurrentMap.java:370)
              at com.tangosol.net.cache.CachingMap.get(CachingMap.java:462)
              at com.nima.app.generic.CacheRepositoryImpl.getSingleByFilter(CacheRepositoryImpl.java:333)
              at com.nima.app.bdm.configurationdata.businessdate.BusinessDateRepositoryImpl.getBusinessDateByCalendarDate(BusinessDateRepositoryImpl.java:108)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.getBusinessDate(DefaultProcessVariableEnricher.java:82)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.isMonthEnd(DefaultProcessVariableEnricher.java:73)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichBusinessDate(DefaultProcessVariableEnricher.java:63)
              at com.nima.app.bpm.process.parameter.DefaultProcessVariableEnricher.enrichProcessVariables(DefaultProcessVariableEnricher.java:53)
              at com.nima.app.bpm.process.JBpmProcessExecutor.executeProcess(JBpmProcessExecutor.java:211)
              at com.nima.app.bpm.message.XmlMessageProcessor.processMessage(XmlMessageProcessor.java:35)
              at com.nima.app.bpm.message.JmsTextMessageListener.onMessage(JmsTextMessageListener.java:39)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
              at org.springframework.jms.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:440)
              at org.springframework.jms.listener.SimpleMessageListenerContainer.processMessage(SimpleMessageListenerContainer.java:340)
              at org.springframework.jms.listener.SimpleMessageListenerContainer$1$1.run(SimpleMessageListenerContainer.java:307)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)

              Locked ownable synchronizers:
              - locked java.util.concurrent.locks.ReentrantLock$NonfairSync@7124a841

              Any ideas?
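These dumps show threads in TIMED_WAITING inside SegmentedConcurrentMap.lock while each holds a different ReentrantLock. ThreadMXBean.findDeadlockedThreads() only detects cycles of threads waiting to acquire monitors or ownable synchronizers, so threads sitting in Object.wait(), as here, will generally not be reported by it. A sketch that instead lists waiting threads in a given frame together with the synchronizers they hold, using only java.lang.management; WaitingThreadReport is a hypothetical name:

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: one line per waiting thread whose stack contains a frame
// "Class.method" ending with frameSuffix, plus the ownable synchronizers it holds.
public class WaitingThreadReport {

    public static List<String> report(String frameSuffix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (true, true): include locked monitors and locked ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            boolean inFrame = false;
            for (StackTraceElement f : info.getStackTrace()) {
                if ((f.getClassName() + "." + f.getMethodName()).endsWith(frameSuffix)) {
                    inFrame = true;
                    break;
                }
            }
            if (!inFrame) {
                continue;
            }
            StringBuilder sb = new StringBuilder(info.getThreadName()).append(" ").append(state);
            for (LockInfo held : info.getLockedSynchronizers()) {
                sb.append(" holds ").append(held);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

Calling report("SegmentedConcurrentMap.lock") in the affected JVM would give one line per blocked thread; comparing a few samples taken seconds apart can show whether the same threads and locks stay stuck.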