EPM 11.1.2.4 WebLogic 10.3.6 Stuck Thread but APP still running

User_AAD34 Member Posts: 43 Red Ribbon
edited Jul 16, 2018 9:55PM in EPM System Infrastructure

Hi,

I'm doing a health check for EPM 11.1.2.4 (HFM) deployed in WebLogic 10.3.6.

Everything seems to be working fine (i.e. the application can be accessed, all managed servers are in the Running state, and their health is OK), but if I navigate in the WebLogic console to:

     Environment > Servers > EPMServer0 > Monitoring > Health

I can see that the health shows "Warning" with the reason "ThreadPool has stuck threads".

I've made the following changes, as suggested in several blogs:

1. Increased database connection pool

     Services > Data Sources > EPMSystemRegistry > Configuration > Connection Pool

     Maximum Capacity: from 15 to 500

2. Increased Max Stuck Thread

     Navigate to Environment > Servers > EPMServer0 > Configuration > Tuning

     Stuck Thread Max Time: from 600 to 3000

     Stuck Thread Timer Interval: from 600 to 3000

3. Increased JVM (Java Heap)

     Open Regedit

     Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Hyperion Solutions\EPMServer0\HyS9EPMServer_epmsystem1

     Increase the JVM maximum heap to -Xmx8000m
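To check what heap limit a JVM actually picked up from an -Xmx flag, Runtime.getRuntime().maxMemory() is one option. The standalone sketch below (the class name HeapCheck is hypothetical) only reports on the JVM it runs in, not on EPMServer0, so it is a way to sanity-check the flag syntax rather than the managed server itself. Depending on the garbage collector, the reported value may be somewhat below the -Xmx figure.

```java
class HeapCheck {
    // Returns the maximum heap this JVM will attempt to use, in megabytes.
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Example launch: java -Xmx8000m HeapCheck
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```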

After all of these changes and a server reboot, I still encounter the stuck thread, now after 3000 seconds.

Below is the EPMServer0 log for reference:

<Jul 10, 2018 9:01:01 AM> <Error> <Diagnostics> <BEA-320142> <An error was encountered while performing size based data retirement on archive EventsDataArchive
weblogic.diagnostics.accessor.DiagnosticDataAccessException: weblogic.store.PersistentStoreException: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.deleteDataRecords(PersistentStoreDataArchive.java:1368)
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.retireOldestRecords(PersistentStoreDataArchive.java:1211)
    at weblogic.diagnostics.archive.DataRetirementByQuotaTaskImpl.performDataRetirement(DataRetirementByQuotaTaskImpl.java:92)
    at weblogic.diagnostics.archive.DataRetirementByQuotaTaskImpl.run(DataRetirementByQuotaTaskImpl.java:49)
    at weblogic.diagnostics.archive.DataRetirementTaskImpl.run(DataRetirementTaskImpl.java:261)
    Truncated. see log file for complete stacktrace
Caused By: weblogic.store.PersistentStoreException: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:698)
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:668)
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.getWrapper(PersistentStoreDataArchive.java:1767)
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.removeGarbageInPage(PersistentStoreDataArchive.java:1813)
    at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.cleanupPages(PersistentStoreDataArchive.java:1697)
    Truncated. see log file for complete stacktrace
Caused By: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
    at weblogic.store.io.file.FileStoreIO$TypeRecord.getSlot(FileStoreIO.java:1097)
    at weblogic.store.io.file.FileStoreIO.readInternal(FileStoreIO.java:262)
    at weblogic.store.io.file.FileStoreIO.read(FileStoreIO.java:253)
    at weblogic.store.internal.ReadRequest.run(ReadRequest.java:34)
    at weblogic.store.internal.StoreRequest.doTheIO(StoreRequest.java:64)
    Truncated. see log file for complete stacktrace
>

<Jul 10, 2018 9:22:15 AM> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,039" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3039944 ms
", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
Thread-206 "[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
    sun.misc.Unsafe.park(Unsafe.java:???)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
    com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
    com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
    weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
    weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
    weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
}
>

<Jul 10, 2018 9:22:15 AM> <Notice> <Diagnostics> <BEA-320068> <Watch 'StuckThread' with severity 'Notice' on server 'EPMServer0' has triggered at Jul 10, 2018 9:22:15 AM. Notification details:
WatchRuleType: Log
WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-000337') OR (MSGID = 'BEA-000337'))
WatchData: DATE = Jul 10, 2018 9:22:15 AM SERVER = EPMServer0 MESSAGE = [STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,039" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3039944 ms
", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
Thread-206 "[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
    sun.misc.Unsafe.park(Unsafe.java:???)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
    com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
    com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
    weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
    weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
    weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
}
SUBSYSTEM = WebLogicServer USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = BEA-000337 MACHINE = APP01 TXID =  CONTEXTID =  TIMESTAMP = 1531228935954
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 600000
>


Does anyone have suggestions on how to fix this?

Any hints would be much appreciated.

Thanks!


Answers

  • SureshM-Oracle
    Member Posts: 407 Employee
    edited Jul 11, 2018 4:55AM

    If the issue is specific to the HFM application, you can try increasing the stuck thread time for HFMWeb0 in the WebLogic console.

    For the steps, please see KM Doc ID 2272709.1.

    Thanks.

    Suresh.

  • Vikram VK
    Member Posts: 1,204
    edited Jul 11, 2018 10:51AM

    Increasing the stuck thread time only changes when the stuck thread is reported. In other words, if it is reported after 600 seconds now, after the increase it will be reported after 1200 seconds (or whatever value you set).
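This point can be made concrete with a little arithmetic. In the WebLogic console, Stuck Thread Timer Interval is how often the server scans its threads, and Stuck Thread Max Time is how long a thread must be continually working before a scan flags it. The sketch below is a hypothetical model of that behaviour, not WebLogic code: it assumes the first scan happens a fixed offset after the thread became busy and that a thread which never returns to the pool is flagged at every later scan.

```java
import java.util.ArrayList;
import java.util.List;

class StuckThreadTiming {
    // Hypothetical model of the stuck-thread check: scans run every intervalSec seconds,
    // the first one firstScanSec seconds after the thread became busy, and the thread is
    // reported at every scan where it has been busy for more than maxTimeSec seconds.
    // Returns the busy time (seconds) at each of the first `count` reports.
    static List<Long> reportTimes(long maxTimeSec, long intervalSec, long firstScanSec, int count) {
        if (intervalSec <= 0) {
            throw new IllegalArgumentException("intervalSec must be positive");
        }
        List<Long> reports = new ArrayList<>();
        for (long t = firstScanSec; reports.size() < count; t += intervalSec) {
            if (t > maxTimeSec) {
                reports.add(t);
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        // 43 s is an assumed offset between the thread starting and the first scan.
        System.out.println("max 600, interval 600:   " + reportTimes(600, 600, 43, 2));
        System.out.println("max 1200, interval 600:  " + reportTimes(1200, 600, 43, 2));
        System.out.println("max 3000, interval 3000: " + reportTimes(3000, 3000, 43, 2));
    }
}
```

Raising the maximum from 600 to 1200 moves the first report from 643 s to 1243 s but does not prevent it. With the 3000/3000 settings the model lines up with the 3,043 s and 6,043 s reports for ExecuteThread '0' in the later log in this thread, but only because the 43 s offset was chosen to match that log.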

    You should investigate why you are getting stuck threads by checking the log files of the EPM products. For example, if it is happening for Foundation, check all the log files for FoundationServices0, and so on.
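When going through the server logs, one option is to pull out every BEA-000337 entry and note which execute thread was reported and for how long. A minimal sketch, assuming the message text matches the format of the logs quoted in this thread; the class, record, and method names are hypothetical, and the regular expression was written against those samples only. Requiring the `<BEA-000337> <` prefix skips the copy of the same message that the StuckThread watch notification repeats after `MESSAGE =`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StuckThreadScanner {
    // Matches e.g.: <BEA-000337> <[STUCK] ExecuteThread: '6' for queue: '...' has been busy for "3,039" seconds
    private static final Pattern STUCK = Pattern.compile(
        "<BEA-000337> <\\[STUCK\\] ExecuteThread: '(\\d+)' for queue: '([^']+)' has been busy for \"([\\d,]+)\" seconds");

    record StuckReport(int threadId, String queue, long busySeconds) {}

    // Extracts one report per BEA-000337 message found in the given log text.
    static List<StuckReport> scan(String logText) {
        List<StuckReport> reports = new ArrayList<>();
        Matcher m = STUCK.matcher(logText);
        while (m.find()) {
            long seconds = Long.parseLong(m.group(3).replace(",", "")); // "3,039" -> 3039
            reports.add(new StuckReport(Integer.parseInt(m.group(1)), m.group(2), seconds));
        }
        return reports;
    }

    public static void main(String[] args) throws Exception {
        // Usage (log path is an example): java StuckThreadScanner.java EPMServer0.log
        String text = Files.readString(Path.of(args[0]));
        for (StuckReport r : scan(text)) {
            System.out.println(r);
        }
    }
}
```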

    Thanks.

  • user9928941
    Member Posts: 222
    edited Jul 11, 2018 11:21AM

    Hi,

    Can you describe what actions are being performed before you see the stuck threads?

    As Vikram pointed out, even though you increased your time-out setting, your request is still taking more than 3000 seconds, so the change is not helping here.

    Thanks,

    Pawan.

  • Madhusudhan. M
    Principal Consultant Member Posts: 1,908 Silver Trophy
    edited Jul 11, 2018 2:07PM

    It looks like you deployed to a single managed server, EPMServer0. Going through the logs, you can find out exactly what is happening.

    Check the domain logs; you may find a clue about this issue there.

    Thanks,

    Mady

  • User_AAD34
    Member Posts: 43 Red Ribbon
    edited Jul 12, 2018 3:15AM

    Thanks Suresh,

    I did increase the stuck thread time for EPMServer0, as the stuck thread is on this managed server rather than HFMWeb0:

    2. Increased Max Stuck Thread
       Navigate to Environment > Servers > EPMServer0 > Configuration > Tuning
       Stuck Thread Max Time: from 600 to 3000
       Stuck Thread Timer Interval: from 600 to 3000

    I noted from the other answers that increasing the stuck thread time only delays when the stuck thread is reported; it won't fix the issue.

    Thanks!

  • User_AAD34
    Member Posts: 43 Red Ribbon
    edited Jul 12, 2018 3:16AM

    Thanks Vikram,

    I've checked the EPMServer0 log files (as the stuck thread is on EPMServer0). The only hint I could see in the log besides the stuck thread is the persistent store error.

    I am currently trying to fix that error first to see how it goes.

    Thanks!

  • User_AAD34
    Member Posts: 43 Red Ribbon
    edited Jul 12, 2018 3:17AM

    Thanks Pawan,

    Before this started, we upgraded the server from a very old EPM version to the latest version, EPM 11.1.2.4.

    What is interesting is that it only occurs in the production environment; we can't replicate the issue in the UAT environment.

    I've also logged this with Oracle Support, but have had no response so far.

    They did ask me to change some settings, though:

    1. Change the logging level from the default to Trace-32 (more detailed logs)

    2. Change DSStartupOption from 2 to 0

    Thanks!

  • User_AAD34
    Member Posts: 43 Red Ribbon
    edited Jul 12, 2018 3:18AM

    Hi Mady,

    Yes, correct: some components are deployed to the single managed server EPMServer0. As I mentioned to Vikram, the only clue is the persistent store error. I will try to fix it and get back to you all!

    Thanks!

  • User_AAD34
    Member Posts: 43 Red Ribbon
    edited Jul 16, 2018 11:14AM

    Hi,

    After fixing the persistent store issue, I am still encountering the stuck thread:

    Below is the log:

    <Jul 15, 2018 11:36:14 AM> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,043" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3043702 ms", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
    Thread-14 "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
        sun.misc.Unsafe.park(Unsafe.java:???)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
        java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
        java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
        com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
        com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
        weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
    }>
    <Jul 15, 2018 11:36:14 AM> <Notice> <Diagnostics> <BEA-320068> <Watch 'StuckThread' with severity 'Notice' on server 'EPMServer0' has triggered at Jul 15, 2018 11:36:14 AM. Notification details:
    WatchRuleType: Log
    WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-000337') OR (MSGID = 'BEA-000337'))
    WatchData: DATE = Jul 15, 2018 11:36:14 AM SERVER = EPMServer0 MESSAGE = [STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,043" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3043702 ms", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
    Thread-14 "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
        sun.misc.Unsafe.park(Unsafe.java:???)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
        java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
        java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
        com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
        com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
        weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
    }
    SUBSYSTEM = WebLogicServer USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = BEA-000337 MACHINE = APP01 TXID =  CONTEXTID =  TIMESTAMP = 1531668974062
    WatchAlarmType: AutomaticReset
    WatchAlarmResetPeriod: 600000
    >
    <Jul 16, 2018 12:26:14 PM> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "6,043" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 6043720 ms", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
    Thread-14 "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
        sun.misc.Unsafe.park(Unsafe.java:???)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
        java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
        java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
        com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
        com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
        weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
    }>
    <Jul 16, 2018 12:26:14 PM> <Notice> <Diagnostics> <BEA-320068> <Watch 'StuckThread' with severity 'Notice' on server 'EPMServer0' has triggered at Jul 16, 2018 12:26:14 PM. Notification details:
    WatchRuleType: Log
    WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-000337') OR (MSGID = 'BEA-000337'))
    WatchData: DATE = Jul 16, 2018 12:26:14 PM SERVER = EPMServer0 MESSAGE = [STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "6,043" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 6043720 ms", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
    Thread-14 "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
        sun.misc.Unsafe.park(Unsafe.java:???)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
        java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
        java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
        com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
        com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
        weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
    }
    SUBSYSTEM = WebLogicServer USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = BEA-000337 MACHINE = APP01 TXID =  CONTEXTID =  TIMESTAMP = 1531671974066
    WatchAlarmType: AutomaticReset
    WatchAlarmResetPeriod: 600000
    >

    Below is also the thread dump showing the stuck thread:

    "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=14 idx=0x50 tid=10960 prio=1 alive, parked, native_blocked, daemon                          -- Parking to wait for: java/util/concurrent/locks/[email protected]                          at jrockit/vm/Locks.park0(J)V(Native Method)                          at jrockit/vm/Locks.park(Locks.java:2230)                          at sun/misc/Unsafe.park(ZJ)V(Native Method)                          at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)                          at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)                          at java/util/concurrent/LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)                          at com/hyperion/calcmgr/database/cache/CacheEventThread.run(CacheEventThread.java:40)                          at com/hyperion/calcmgr/thread/WorkDelegate.run(WorkDelegate.java:30)                          at weblogic/work/j2ee/J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:184)                          at weblogic/work/ExecuteThread.execute(ExecuteThread.java:256)                          at weblogic/work/ExecuteThread.run(ExecuteThread.java:221)                          at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)                          -- end of trace

    Thanks
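Both traces in this post end in LinkedBlockingQueue.take() called from com.hyperion.calcmgr.database.cache.CacheEventThread.run, with the thread reported as "parked". That pattern usually means a long-lived consumer thread idling until an event arrives, rather than a thread busy doing work; this is an inference from the frames, not something confirmed from Calculation Manager's source. The standalone sketch below (all names, including the STOP sentinel, are hypothetical) reproduces the pattern: a worker blocked in take() on an empty queue reports Thread.State.WAITING and is released by putting a sentinel on the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ParkedConsumerDemo {
    // Sentinel value that tells the consumer loop to exit (hypothetical convention).
    static final String STOP = "STOP";

    // Starts a consumer that blocks in take(), samples its state while idle, then stops it.
    static Thread.State sampleIdleState() throws InterruptedException {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String event = events.take(); // parks here while the queue is empty
                    if (STOP.equals(event)) {
                        return;
                    }
                    // a real consumer would handle the event here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "cache-event-consumer");
        consumer.setDaemon(true);
        consumer.start();

        // Poll (for up to about 2 s) until the consumer has parked inside take().
        Thread.State state = consumer.getState();
        for (int i = 0; i < 200 && state != Thread.State.WAITING; i++) {
            Thread.sleep(10);
            state = consumer.getState();
        }

        events.put(STOP); // release the consumer so it exits cleanly
        consumer.join(1000);
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Idle consumer state: " + sampleIdleState());
    }
}
```

If a loop like this runs on a WebLogic execute thread, the thread never returns to the pool, so the stuck-thread check would presumably keep flagging it at every scan; that would fit ExecuteThread '0' being reported at 3,043 s and again at 6,043 s.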

  • Vikram VK
    Member Posts: 1,204
    edited Jul 16, 2018 3:31PM

    OK, I understand that you are getting stuck threads. By the way, do you see any impact on the Hyperion side, for example when navigating in Workspace?

    Thanks.

This discussion has been closed.