I am stress testing my web application on WLS containers (WebLogic 10.3g) by sending multiple HTTP POST requests to the server. After some time the server just hangs and cannot accept any request from the browser client. Even other web applications deployed on the same container stop responding, while services deployed on a different container keep working fine. After a while the server recovers by itself. Looking at the servers, I see no signs of OOM. My initial guess is that the threads are being used up by my application, but the thread dump doesn't show that.
I am using the AbstractAsynchronousServlets provided by WebLogic, along with weblogic.timers. Could this cause a problem?
Thread dump summary:
11 occurrences of the following thread:
"Thread-11" daemon prio=10 tid=0x00002aaab6509000 nid=0x6e8f waiting on condition [0x00000000425d2000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000e177fbb8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    at weblogic.utils.concurrent.JDK15ConcurrentBlockingQueue.take(JDK15ConcurrentBlockingQueue.java:89)
    at weblogic.store.internal.PersistentStoreImpl.getOutstandingWork(PersistentStoreImpl.java:650)
    at weblogic.store.internal.PersistentStoreImpl.run(PersistentStoreImpl.java:707)
    at weblogic.store.internal.PersistentStoreImpl$2.run(PersistentStoreImpl.java:464)
7 Timer threads:
"Timer-1" daemon prio=10 tid=0x00002aaab537e800 nid=0x6e81 in Object.wait() [0x0000000041d40000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:509)
    - locked <0x00000000e149e290> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:462)
3 occurrences of weblogic.socket.Muxer
It seems that you might need to tune a few parameters on the WebLogic side, as shown below, and then test again to see whether you get any improvement.

Console Path: Servers > Configuration (tab) > Tuning (sub-tab)
Tune the following parameter:
Accept Backlog: The number of backlogged, new TCP connection requests that should be allowed for this server's regular and SSL ports. Setting the backlog to 0 may prevent this server from accepting any incoming connection on some operating systems.

Console Path: Servers > Protocols (tab) > HTTP (sub-tab)
Tune the following two parameters:
Post Timeout: The amount of time this server waits between receiving chunks of data in an HTTP POST before it times out. This is used to prevent denial-of-service attacks that attempt to overload the server with POST data.
Duration: The amount of time this server waits before closing an inactive HTTP connection, that is, the number of seconds to maintain HTTP keep-alive before timing out the request.

Topic: Tune the Chunk Parameters

However, the right values for these parameters depend entirely on your end-to-end environment.
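To make the Accept Backlog setting more concrete, here is a minimal Java sketch of what a listen backlog does at the plain socket level. It is not WebLogic code: the class and method names (BacklogDemo, connectWithoutAccepting) are made up for illustration, and it assumes that Accept Backlog corresponds to the ordinary TCP listen backlog. The listener never calls accept(), so once the operating system's queue is full, further connects stall or fail. The exact count depends on the OS, so no expected output is given.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class BacklogDemo {

    /**
     * Opens a listener with the given backlog, never calls accept(), and
     * returns how many of the attempted client connects completed in time.
     */
    static int connectWithoutAccepting(int backlog, int attempts, int timeoutMs)
            throws IOException {
        List<Socket> clients = new ArrayList<>();
        int connected = 0;
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; backlog is only a hint to the OS.
        try (ServerSocket listener = new ServerSocket(0, backlog, loopback)) {
            InetSocketAddress target =
                    new InetSocketAddress(loopback, listener.getLocalPort());
            for (int i = 0; i < attempts; i++) {
                Socket client = new Socket();
                clients.add(client);
                try {
                    client.connect(target, timeoutMs);
                    connected++;
                } catch (IOException refusedOrTimedOut) {
                    // The pending-connection queue is full (or the OS refused).
                }
            }
        } finally {
            for (Socket c : clients) {
                try {
                    c.close();
                } catch (IOException ignored) {
                    // Nothing useful to do while cleaning up.
                }
            }
        }
        return connected;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected = " + connectWithoutAccepting(2, 8, 300));
    }
}
```

A server that is slow to accept under load behaves like this listener: a larger backlog absorbs short bursts, but it does not help if the server stays stuck.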
I hope the above information helps.
I would also recommend that you troubleshoot further to confirm whether your problem really is an HTTP request (socket) backlog.
In addition to the suggestions above, please generate a few netstat snapshots (netstat or netstat -an) on your application server during the load test. This will let you track any growing socket backlog: look for an excessive number of sockets attached to your WebLogic server's listen port. You can also compare snapshots taken before and after any tuning.
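To compare netstat snapshots, it can help to reduce each one to a count of TCP states for the server's listen port. The following is a rough Java sketch, assuming the Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The class name NetstatSummary, the sample lines and the port 7001 are illustrative assumptions, so substitute your own listen port.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NetstatSummary {

    /** Counts TCP connection states for sockets whose local address ends in :port. */
    static Map<String, Integer> countStates(List<String> netstatLines, int port) {
        Map<String, Integer> counts = new TreeMap<>();
        String suffix = ":" + port;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue; // header lines, UDP and UNIX sockets are skipped
            }
            if (!cols[3].endsWith(suffix)) {
                continue; // socket belongs to a different local port
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample snapshot; in practice read the saved netstat output.
        List<String> snapshot = Arrays.asList(
                "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
                "tcp        0      0 0.0.0.0:7001            0.0.0.0:*               LISTEN",
                "tcp        0      0 10.0.0.5:7001           10.0.0.9:51234          ESTABLISHED",
                "tcp        0      0 10.0.0.5:7001           10.0.0.9:51235          ESTABLISHED",
                "tcp        0      0 10.0.0.5:7001           10.0.0.9:51236          CLOSE_WAIT",
                "tcp        0      0 10.0.0.5:22             10.0.0.9:40000          ESTABLISHED");
        System.out.println(countStates(snapshot, 7001));
        // prints {CLOSE_WAIT=1, ESTABLISHED=2, LISTEN=1}
    }
}
```

A count that keeps growing between snapshots in one state, for example CLOSE_WAIT or SYN_RECV, is the kind of backlog worth investigating.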
Also, check whether this exception appears in the logs: Too many open files. If it does, you have reached the maximum number of file descriptors / sockets available in your load-testing environment.
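If you want to see how close a JVM is to its file descriptor limit before that exception shows up, one option is the JDK's UnixOperatingSystemMXBean. This is a sketch under some assumptions: it only works on Unix-like systems with a HotSpot-style JDK, it reports the descriptors of the JVM it runs in (so it would have to run inside the server JVM, for example from a test servlet, to measure WebLogic itself), and the class name FdHeadroom and helper usedFraction are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdHeadroom {

    /** Fraction of the descriptor limit in use; 0.0 if the limit is unknown. */
    static double usedFraction(long open, long max) {
        return max <= 0 ? 0.0 : (double) open / max;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            System.out.printf("open=%d max=%d used=%.1f%%%n",
                    open, max, 100.0 * usedFraction(open, max));
        } else {
            System.out.println("File descriptor counts are not available on this platform.");
        }
    }
}
```

Sockets count against the same limit as files, so a value that climbs toward 100% during the load test points at descriptor exhaustion and not at thread starvation.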