We have an ADF application deployed on WebLogic 10.3.2.
We have a few complex PL/SQL procedures and SQL queries that can take around 5 minutes to complete. We also run a script in the database that kills long-running queries, which helps us get rid of WebLogic stuck threads to some extent.
Whenever a SQL statement runs for a long time, we see a stuck thread, the heap grows very fast, and the server crashes.
I know that a thread is marked as stuck once it has been working on a request beyond a certain time, but why does it fill the heap? Do stuck threads always cause the heap to grow, or is it just my specific case?
Any pointers on how to debug this issue and find the root cause?
WebLogic tunes itself to create new threads whenever needed. If threads are stuck, then depending on the number of stuck threads and the number of available threads, WebLogic will create new threads to serve the incoming requests.
Each thread itself consumes some heap space, but usually not enough to cause an out-of-memory error (it depends on the heap size, the number of threads, etc.). Having said that, while a thread is active or stuck, some of the objects it instantiated might not be released. This could be one of the reasons why you are running out of memory.
Another possibility is that your stored procedure is returning huge result sets. Could this be the case? Is it returning cursors or types?
I think you should look into the stored procedures and fix the issue there.
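If any of these queries go through hand-written JDBC, one way to guard against oversized result sets is to put limits on the statement before running it. This is only a minimal sketch: the class name QueryGuards and the numbers in the usage note are made up for illustration, and an ADF application often issues its queries through the framework rather than raw JDBC, so treat it as the idea rather than a drop-in fix. How far a driver honors these settings for cursors returned by stored procedures is also something to verify.

```java
import java.sql.SQLException;
import java.sql.Statement;

class QueryGuards {
    // Hypothetical helper: applies defensive limits to a statement before it runs.
    static void applyLimits(Statement stmt, int maxRows, int fetchSize, int timeoutSeconds)
            throws SQLException {
        // Upper bound on the rows any ResultSet from this statement can contain;
        // rows beyond the limit are silently dropped.
        stmt.setMaxRows(maxRows);
        // Hint for how many rows the driver fetches per round trip.
        stmt.setFetchSize(fetchSize);
        // Ask the driver to give up on the execution after this many seconds.
        stmt.setQueryTimeout(timeoutSeconds);
    }
}
```

You would call something like QueryGuards.applyLimits(stmt, 5000, 200, 60) right before executeQuery(). PreparedStatement and CallableStatement both extend Statement, so the same helper accepts them.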
In the meantime, if you want to protect the server, you could use Work Managers to limit the number of threads running in parallel. This will not resolve the issue. It means that WebLogic will hold back new requests until one of the (configurable number of) threads becomes available, but, depending on how you configure it, it will protect the server against running out of memory.
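To make the idea of capping parallel work concrete, here is a minimal sketch in plain java.util.concurrent. It is not the WebLogic Work Manager API (in WebLogic you would configure this on the Work Manager, not write code); the class name BoundedWork and the numbers are assumptions for illustration. It only shows that with a cap in place, no more than the allowed number of slow tasks run at once, however many are submitted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWork {
    // Submits `tasks` simulated slow requests to a pool of `poolSize` threads,
    // but lets at most `limit` of them run the slow section at the same time.
    // Returns the highest concurrency that was actually observed.
    static int peakConcurrency(int tasks, int poolSize, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire(); // wait for a free slot
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(20); // stands in for a slow query
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    permits.release();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

For example, BoundedWork.peakConcurrency(10, 8, 2) never reports more than 2, even though 8 pool threads are available. As with a Work Manager, this only limits the damage; the slow queries are still slow.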
What is the current JVM heap size?
What kind of load are you handling?
How many active threads and stuck threads do you see in the worst case scenario?
Thanks for such a good explanation.
We have 6 GB RAM and the application runs on 4 servers. During peak time we have around 100 users using the system.
I will investigate further and see if my procedure or SQL is returning huge sets of data, which might be causing the out-of-memory condition.
Edited by: Naresh on Apr 1, 2013 11:59 AM
You can try creating a Work Manager in the WebLogic console (path: Home > Summary of Work Managers). After creating it, go to the Configuration tab and check Ignore Stuck Threads. I had a similar problem earlier and this helped me a lot.
100 users across 4 servers should generate a relatively small load. Having said that, I've seen WebLogic run out of memory with a single user using an application, because the result set size wasn't being limited.
Just to confirm, the 6GB of RAM is the Java Heap size, right?
We have 6 GB RAM for Java Heap.
The application is relatively big, with around 120 pages, and the database tables have millions of records.
At times we have long-running queries in the database (due to sorting, filtering, or some other reason), which make threads go into stuck mode. I tried to replicate the scenario in the local and DEV environments by creating stuck threads (invoking the long-running queries that were running during the production crash). When I do this, I don't see the heap growing abnormally in either DEV or local.
Please suggest how to proceed to find the root cause.
I think this setup needs profiling. To start off, I would recommend collecting a heap dump, or a JFR recording (JRockit Flight Recorder, if you are on JRockit). Instructions for both are available on the net; they vary with JVM version and vendor.
After that, look for the thread which you believe is stuck, and for a possible cause, using OQL in Memory Analyzer Tool over the heap dumps.
Conceptually, as long as an object (or data, if you like) is being referenced, the JVM cannot garbage collect it. So I would say we need to figure out what the heap actually contains before concluding that the stuck threads are causing the issue. I say this because I have often seen stuck threads being suspected when it turned out we were "knocking on the wrong door".
Long-running threads (STUCK threads in WLS) will normally cause memory issues only if the task they are performing holds a large amount of local data (i.e. data scoped to the thread). Until the thread is released, that data stays alive, irrespective of whether it is still being used.
Now, my first thought on this situation is that the stuck thread is not the issue, considering that you are not able to replicate it in your test environments by simulating stuck threads. So please collect some heap dumps (maybe when you start observing the memory increase) and then compare them; I think that should be helpful.
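A small experiment shows the point. This is only a sketch: the class name StuckThreadDemo, the 8 MB buffer, and the latch standing in for a slow query are all assumptions for illustration. A worker thread allocates a buffer, blocks, and we request a GC while it is blocked. Because the worker still needs the buffer after the wait, the buffer is strongly reachable and cannot be collected, which we observe through a WeakReference.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class StuckThreadDemo {
    // Returns true if the worker's buffer is still reachable while the worker is blocked.
    static boolean dataAliveWhileBlocked() {
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<WeakReference<byte[]>> probe = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            byte[] rows = new byte[8 * 1024 * 1024]; // stands in for a large result set
            probe.set(new WeakReference<>(rows));    // lets us observe it without pinning it
            holding.countDown();
            try {
                release.await();                     // stands in for waiting on a slow query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            rows[0] = 1; // still used after the wait, so it must stay reachable meanwhile
        }, "stuck-sim");
        worker.start();

        boolean alive = false;
        try {
            holding.await();
            System.gc();                             // request a collection while the worker is blocked
            alive = probe.get().get() != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();                     // always let the worker finish
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return alive;
    }
}
```

StuckThreadDemo.dataAliveWhileBlocked() returns true: no amount of garbage collection frees the buffer while the blocked thread still holds and needs it. Multiply that by the size of a real result set and the number of stuck threads, and you get the kind of heap growth described above.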
The suggestion given by AJ makes total sense.
At this stage, it's not even possible to correlate the Stuck thread with the OOM.
How difficult it is to find the issue will depend on how often it happens in production. How often does it happen?
As AJ mentioned, you can use JRockit Mission Control or take a heap dump. Usually you would take a heap dump on demand, and use JRockit Mission Control for a limited period of time. You should definitely do this.
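For reference, on a HotSpot JVM a heap dump can also be triggered from inside the process through the com.sun.management.HotSpotDiagnosticMXBean. This is a sketch under that assumption only: it does not apply to JRockit, the class name HeapDumper and the file name are made up, and in practice people more often use an external tool such as jmap.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    // Writes a heap dump of the current JVM into a fresh temp directory
    // and returns the path of the .hprof file.
    static Path dumpLiveObjects() {
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("on-demand.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // true = dump only objects that are still reachable
            bean.dumpHeap(file.toString(), true);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting file can be opened in Memory Analyzer Tool. Keep in mind that a dump of a 6 GB heap is large and pauses the JVM while it is written, so on production it is something to do deliberately.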
Additionally, you should:
1) Log GC, as follows: http://docs.oracle.com/cd/E15289_01/doc.40/e15061/verbose.htm
2) Use the WebLogic dashboard. You can easily monitor the heap for your servers in real time: http://<ADMINSERVER>/console/dashboard
3) Run a realistic performance test in either UAT or preprod, and monitor the heap. Testing with one user wouldn't help because, for example, sessions are created per user.
4) Talk to the developers: What is being stored in the session? Is data being cached? A code review might help.
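If you want heap numbers in your own application logs next to the GC log and the dashboard, the standard java.lang.management API exposes them. A minimal sketch, with the class name HeapSampler being an assumption for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapSampler {
    // Returns {used, committed, max} in bytes for the heap of the current JVM.
    // max is -1 if the JVM does not define a maximum.
    static long[] sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new long[] { heap.getUsed(), heap.getCommitted(), heap.getMax() };
    }
}
```

Logging HeapSampler.sample() every minute or so, together with the number of active sessions, makes it easier to see afterwards whether the heap grew gradually or jumped at the moment a particular request ran.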
Thanks for all your suggestions.
We enabled flags to collect a heap dump when the production server crashes. Once I get the heap dump, I should be able to see which live object instances are occupying memory.
We see the production server crash 2-3 times a month, so we never know when it is going to happen again. I am the developer of the app, and we don't store much data in the session.
I will post results once I get heap dump.