I have been facing a sporadic issue with my RMI server for the past few months. I have searched the internet but have not come across any solution that helps. Any help from you on this would be much appreciated!
Our RMI server is accessed by 100-200 clients at any point in time. 99% of the time the server works fine, but occasionally it hangs, i.e., no new connections can be made from clients. The connection attempts from clients fail with the error "SocketTimeoutException: Read timed out", and on the server the TCP connections are observed to be in the SYN_RECV state (from netstat output). The server runs on Red Hat Linux, and clients may run on Windows, Linux, or Mac. The frequency of the issue is random, ranging from once in a few months to multiple times in a single day; the average is 2-3 times a month. After restarting the server, everything works fine again.
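For anyone wanting to track how many connections are stuck in SYN_RECV during such a hang, here is a minimal sketch that tallies TCP states per local port from netstat-style output. It assumes the column layout of Linux `netstat -ant`. The class name, the port 1099, and the sample addresses are hypothetical and not taken from the server described above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NetstatStateCounter {
    // Counts TCP states for sockets whose local address ends with the given port.
    // Assumed column layout (Linux `netstat -ant`):
    // Proto Recv-Q Send-Q Local-Address Foreign-Address State
    static Map<String, Integer> countStates(List<String> lines, int localPort) {
        Map<String, Integer> counts = new TreeMap<>();
        String portSuffix = ":" + localPort;
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip header lines, non-TCP sockets, and malformed rows.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) continue;
            // Keep only sockets bound to the port of interest.
            if (!cols[3].endsWith(portSuffix)) continue;
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows; in practice, feed in captured netstat output.
        List<String> sample = List.of(
            "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
            "tcp        0      0 0.0.0.0:1099            0.0.0.0:*               LISTEN",
            "tcp        0      0 192.0.2.10:1099         198.51.100.7:51000      SYN_RECV",
            "tcp        0      0 192.0.2.10:1099         198.51.100.8:51001      SYN_RECV",
            "tcp        0      0 192.0.2.10:1099         198.51.100.9:51002      ESTABLISHED"
        );
        System.out.println(countStates(sample, 1099));
    }
}
```

Logging these counts periodically would show whether SYN_RECV entries build up gradually before a hang or appear all at once.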
During one such event, we were able to get our network team to capture tcpdump traces on the server port. Analyzing the dump shows the following behavior:
client -> SYN -> server (server accepts)
server -> SYN/ACK -> client (client accepts)
client -> ACK -> server (server does not accept)
server -> SYN/ACK -> client
client -> ACK -> server (server does not accept). This exchange is repeated 6 times, based on the TCP retry setting
server -> RST -> client (client gets timed out error at this point)
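To put rough timings on the retransmissions in the trace above, here is a small sketch of an exponential backoff schedule. It assumes the "tcp retry setting" is Linux's net.ipv4.tcp_synack_retries (the post does not name the setting), an initial retransmission timeout of 1 second, and a timeout that doubles on every retry. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SynAckBackoff {
    // Returns cumulative offsets in seconds, measured from the first SYN/ACK.
    // The first `retries` entries are the retransmission times; the final entry
    // is when the half-open connection would be abandoned.
    // Assumption: the timeout starts at initialRtoSeconds and doubles each time.
    static long[] schedule(int retries, long initialRtoSeconds) {
        long[] offsets = new long[retries + 1];
        long elapsed = 0;
        long rto = initialRtoSeconds;
        for (int i = 0; i <= retries; i++) {
            elapsed += rto;      // wait out the current timeout
            offsets[i] = elapsed;
            rto *= 2;            // double the timeout for the next attempt
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 5 retries and a 1 second initial timeout are assumed values.
        System.out.println(Arrays.toString(schedule(5, 1)));
    }
}
```

Comparing these offsets with the timestamps in the tcpdump capture would show whether the server-side retransmissions follow this pattern.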
I am no TCP expert, so what seems strange to me is that the ACK message arrives at the server port, yet the server keeps resending SYN/ACK messages. Could you tell me what this behavior could mean? Any guidance on debugging this issue would be really helpful.
Thanks in advance
PS: I am new to the forum, so please feel free to let me know if my post has any issues with respect to forum etiquette, guidelines, etc.