We have a problem in our environment.
We have two AM nodes, and periodically one or the other degrades, with response times above 30 seconds. Everything looks fine until a bit more traffic arrives. The strange part is that some requests complete normally while others take more than 30 seconds. We have looked for problems such as a memory leak, but according to our support team everything is fine, and the machine itself is healthy and not saturated.
We found nothing in server.log. It is as if nothing is happening, yet our request times are so high that the notification queue on the other node fills up to its limit very quickly.
What could it be?
Could it be a problem with our LDAP connections queuing up and affecting the listener?
While the problem lasts, even curl requests against the AM port take more than 30 seconds to answer, but it seems to clear up after a few minutes, so we suspect high traffic. The strange thing is that the other node is perfectly fine: no problems, no saturation. The only logs we have (a lot of them) are in the appserv server.log of the affected node, like:
|SEVERE|sun-appserver-ee8.2|javax.enterprise.system.container.web|_ThreadID=73;|failure (11210): HTTP3068: Error
receiving request from xxxxx (Not connected)
And in the PA we have some logs that suggest saturation or high load:
"Error 30678:8779160 SSOTokenService::getSessionInfo(): Error 18 for sso token ID"
But as I said, the other node does not show that, and the load is basically the same every day.
Could we find some answers with a kill -3 of the instance?
We have already ruled out problems with the ndsltimeout of LDAP, firewall or balancer.
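Since even curl against the AM port is slow during an incident, one way to narrow this down without debug logs might be to time the TCP connect and the first response byte separately, on both nodes. A slow connect would suggest a full listen queue, while a fast connect followed by a slow first byte would suggest that the worker threads are busy. A minimal sketch in Python, assuming plain HTTP on the AM port; the function name and the host and port in the usage note are placeholders, not values from this thread:

```python
import socket
import time


def probe(host, port, path="/", timeout=60.0):
    """Time one HTTP GET in two phases.

    Returns (connect_seconds, first_byte_seconds). A value is None when
    that phase failed or timed out.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return None, None
    connect_seconds = time.monotonic() - start
    try:
        with sock:
            request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            # Block until the server sends the first byte of its response.
            if not sock.recv(1):
                return connect_seconds, None
    except OSError:
        return connect_seconds, None
    return connect_seconds, time.monotonic() - start
```

Calling something like `probe("am-node1.example.com", 8080)` every few seconds against each node (hypothetical host and port) and logging both numbers with a timestamp would show which phase is slow when the degradation starts.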
The information you provided isn't enough to troubleshoot your problems. You will need to enable debug logs to get more detailed error messages. My guess would be that you have run out of ldap connections, either no available network sockets or threads on the listeners which results in your user being unable to login. This often happens when your ldap server is not sufficiently tuned, but this is only a guess. You will need to enable the debug logs to see what the problem is.
Thanks for the answer.
We have a problem with the debug logs on the nodes: they generate so much traffic that they have sometimes caused problems in our production environment. So do we have no chance of finding the cause without this level of logging?
If the cause of the problem is the LDAP server, could we find some traces of it in a kill -3, for example a huge number of threads on LDAP connections or similar?
Thanks again, we will take a look at our LDAP connectors, pools and servers.
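To check a kill -3 dump for a pile-up of LDAP threads, something like the following could count the threads whose stack contains a keyword and group them by their top frame. This is only a sketch: it assumes the usual HotSpot text layout (a quoted thread name on the header line, then tab-indented "at ..." frames), and the function names are made up for the example.

```python
import re
from collections import Counter

# A thread entry starts with the thread name in double quotes.
HEADER = re.compile(r'^"(?P<name>[^"]*)"')


def split_threads(dump_text):
    """Split a thread dump into a list of {"name", "frames"} dicts."""
    threads = []
    current = None
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {"name": match.group("name"), "frames": []}
            threads.append(current)
        elif current is not None:
            stripped = line.strip()
            if stripped.startswith("at "):
                current["frames"].append(stripped[3:])
            elif not stripped:
                # A blank line ends the current thread entry.
                current = None
    return threads


def summarize(dump_text, keyword="ldap"):
    """Return (total threads, threads matching keyword, Counter of their top frames)."""
    threads = split_threads(dump_text)
    needle = keyword.lower()
    matching = [
        t for t in threads
        if any(needle in frame.lower() for frame in t["frames"])
    ]
    top_frames = Counter(t["frames"][0] for t in matching)
    return len(threads), len(matching), top_frames
```

If most of the matching threads share the same top frame (for example a wait inside a connection pool), that would be a hint that they are queuing for LDAP connections. Comparing two or three dumps taken a few seconds apart helps to tell stuck threads from busy ones.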
Thank you! It seems to work in our environment. I will keep this command and use it during the next degradation. I hope we can find the cause, and I will report the result in this thread.
Thanks again, very helpful
We found one cause of this degradation. Is it possible that, if one Policy Agent does not answer because its appserv (WebLogic in this case) is saturated, the AM is affected as well? I mean, the appserv where the PA is installed depends on a back-end system, and this system is generating timeouts. The appserv is affected, and so is the PA installed in it, and then all requests from the Access Manager take 30, 40 or 50 seconds.
Can we solve this by configuring some settings in the AM or in the Solaris connections, such as timeouts or request time limits?
I think this is the problem, because this PA handles most of the users' requests.
Anyway, we thought we could fight this by tuning some Solaris settings such as tcp_conn_req_max_q or tcp_conn_req_max_q0 and some timeouts such as -Dsun.net.client.defaultConnectTimeout=10000 -Dsun.net.client.defaultReadTimeout=10000, but that does not seem to be enough, because we are suffering these problems again.
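One possible reason a 10-second timeout alone may not be enough: the average number of request threads held at any moment is roughly the arrival rate multiplied by the time each request is held (Little's law). The sketch below uses made-up numbers purely for illustration; the rate of 20 requests per second is an assumption, not a measurement from our system.

```python
def busy_threads(requests_per_second, hold_seconds):
    """Little's law: average requests in flight = arrival rate * time in system."""
    return requests_per_second * hold_seconds


# Hypothetical load: 20 requests per second reaching the slow agent.
for hold_seconds in (10, 30):
    print(hold_seconds, "s held ->", busy_threads(20, hold_seconds), "threads busy")
```

With these assumed figures, a 10-second timeout still keeps about 200 threads occupied, so if the request thread pool is smaller than that, the listener can still fill up even though each individual call is cut off earlier than before.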