We have a WebLogic application that uses the JDMK package to receive and process SNMP packets. The processor runs on a separate thread started from an EJB. We recently encountered an issue where, after processing several packets, the thread pauses in the middle of processing for about 28 seconds, resumes for about a second, and then pauses again. This keeps repeating. Note that the pause duration varies, and so does the point in the processing where the pause occurs.
Moreover, depending on which Linux system WebLogic is running on, the problem may go away. In all cases, CPU utilization on the Linux systems never exceeds 20%.
28 seconds is very close to 30 s, which is a common value for socket and other network timeouts, so this smells like a network call timing out.
If you are using JRockit, JRMC (JRockit Mission Control) has an outstanding Latency Analyzer, which lets you pick up even the tiniest wait time in every thread.
However, since 28 seconds is quite a human-scale time, I would simply take thread dumps at 5-second intervals (search for "WebLogic thread dump") and see where the thread(s) are blocking, most likely on a network I/O operation.
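If triggering dumps from outside the JVM (for example with jstack or kill -3 on the server process) is inconvenient, the same sampling idea can be sketched in Java with the standard ThreadMXBean API. This is only a rough sketch: the class name ThreadDumpSampler, the six samples, and the 5-second interval are illustrative assumptions, and the code only sees threads of the JVM it runs in, so it would have to be invoked from inside the server process to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Date;

// Hypothetical helper, not part of WebLogic or JDMK.
public class ThreadDumpSampler {

    // Build a text snapshot of every live thread in the current JVM.
    static String snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details to keep each sample cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = 6;          // assumed: about 30 s of coverage
        long intervalMs = 5000L;  // assumed: the 5-second interval suggested above
        for (int i = 0; i < samples; i++) {
            System.out.println("=== sample " + i + " at " + new Date() + " ===");
            System.out.println(snapshot());
            Thread.sleep(intervalMs);
        }
    }
}
```

A thread that shows the same stack in several consecutive samples is the one to look at; the top frames tell you which call it is stuck in.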
It turned out the issue was related to an OS-level I/O timeout on the SNMP port. Routing problems were causing DNS lookups to take a long time to resolve. This in turn caused delays of 28+ seconds in the JDMK SNMP manager during some snmpGet calls on OIDs and MIB tables. Because the same JDMK SNMP manager is used by both the trap recipient and the listener, trap processing was delayed as well. We solved the issue by updating resolv.conf to a configuration that worked, and verified the fix with the nslookup and traceroute commands on Linux.
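For anyone hitting something similar, slow name resolution can also be checked from inside the JVM, since that is the path the application actually uses. Below is a minimal sketch that times a lookup with the standard InetAddress API. The class name DnsLookupTimer and the default host "localhost" are illustrative assumptions, and I am not claiming this is the exact call JDMK makes internally.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of JDMK.
public class DnsLookupTimer {

    // Return the elapsed milliseconds for one lookup, whether it succeeds or fails.
    static long timeLookupMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            // A failed lookup still shows how long the resolver blocked.
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Pass the agent host names as arguments; "localhost" is only a placeholder.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String host : hosts) {
            System.out.println(host + ": " + timeLookupMillis(host) + " ms");
        }
    }
}
```

The JVM caches successful lookups, so only the first call for a given name in a fresh JVM reflects the real resolver latency. Lookups that take many seconds point at the resolver configuration rather than at the SNMP code.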