4 Replies. Latest reply: Feb 12, 2013 7:00 AM by user123799

    Ping timeouts at server and broken pipes at client

    985743
      I am experiencing unexpected broken pipe exceptions on the client side of a Coherence server. These exceptions prevent our code from establishing a connection to the cache server.

      Due to network restrictions, we are connecting client and server through Coherence*Extend.

      At the server side, the logs show the following message:

      DEBUG Coherence:3 - 2013-02-01 11:12:20.584/85.322 Oracle Coherence GE 3.7.1.5 <D6> (thread=Proxy:TcpProxyServicePof:TcpAcceptor, member=1): Closed: TcpConnection(Id=0x0000013C953D8F150AA202D9F632E5EAFD6BDDE1A713F026C65BC1E7CC1E5952, Open=false, Member(Id=0, Timestamp=2013-02-01 11:11:44.404, Address=127.0.0.1:0, MachineId=0, Location=site:,process:1612, Role=WeblogicServer), LocalAddress=10.162.2.217:28088, RemoteAddress=10.162.2.231:45202) due to:
      com.tangosol.net.messaging.ConnectionException: TcpConnection(Id=0x0000013C953D8F150AA202D9F632E5EAFD6BDDE1A713F026C65BC1E7CC1E5952, Open=true, Member(Id=0, Timestamp=2013-02-01 11:11:44.404, Address=127.0.0.1:0, MachineId=0, Location=site:,process:1612, Role=WeblogicServer), LocalAddress=10.162.2.217:28088, RemoteAddress=10.162.2.231:45202): did not receive a response to a ping within 500 millis
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.checkPingTimeout(Peer.CDB:12)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.peer.Acceptor.checkPingTimeouts(Acceptor.CDB:7)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.onNotify(Peer.CDB:115)
      at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
      at java.lang.Thread.run(Thread.java:662)

      And at the client side:

      ERROR (01/02/2013) 11:12:20 [threadsafe]Thread-15/AbstractComponentHandler Unable to create new instance 45067 ms
      com.tangosol.net.messaging.ConnectionException: TcpConnection(Id=0x0000013C953D8F150AA202D9F632E5EAFD6BDDE1A713F026C65BC1E7CC1E5952, Open=true, Member(Id=0, Timestamp=2013-02-01 11:11:44.404, Address=127.0.0.1:0, MachineId=0, Location=site:,process:1612, Role=WeblogicServer), LocalAddress=10.162.2.231:45202, RemoteAddress=10.162.2.217:28088)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.peer.initiator.TcpInitiator$TcpConnection.send(TcpInitiator.CDB:35)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.send(Peer.CDB:29)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.post(Peer.CDB:23)
      at com.tangosol.coherence.component.net.extend.Channel.post(Channel.CDB:25)
      at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:18)
      at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:1)
      at com.tangosol.coherence.component.net.extend.RemoteNamedCache$BinaryCache.putAll(RemoteNamedCache.CDB:10)
      at com.tangosol.util.ConverterCollections$ConverterMap.putAll(ConverterCollections.java:1708)
      at com.tangosol.coherence.component.net.extend.RemoteNamedCache.putAll(RemoteNamedCache.CDB:1)
      at com.tangosol.coherence.component.util.SafeNamedCache.putAll(SafeNamedCache.CDB:1)
      at com.tangosol.net.cache.CachingMap.putAll(CachingMap.java:1023)
      ...
      Caused by: java.net.SocketException: Write failed: Broken pipe
      at jrockit.net.SocketNativeIO.writeBytesPinned(Native Method)
      at jrockit.net.SocketNativeIO.socketWrite(SocketNativeIO.java:46)
      at java.net.SocketOutputStream.socketWrite0(SocketOutputStream.java)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.peer.initiator.TcpInitiator$TcpConnection.send(TcpInitiator.CDB:27)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.send(Peer.CDB:29)
      at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Peer.post(Peer.CDB:23)
      at com.tangosol.coherence.component.net.extend.Channel.post(Channel.CDB:25)
      at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:18)
      at com.tangosol.coherence.component.net.extend.Channel.request(Channel.CDB:1)
      at com.tangosol.coherence.component.net.extend.RemoteNamedCache$BinaryCache.putAll(RemoteNamedCache.CDB:10)
      at com.tangosol.util.ConverterCollections$ConverterMap.putAll(ConverterCollections.java:1703)
      at com.tangosol.coherence.component.net.extend.RemoteNamedCache.putAll(RemoteNamedCache.CDB:1)
      at com.tangosol.coherence.component.util.SafeNamedCache.putAll(SafeNamedCache.CDB:1)
      at com.tangosol.net.cache.CachingMap.putAll(CachingMap.java:1023)
      ...

      I can ping the client machine from the server without a problem.

      Has anybody seen this before?

      Edited by: 982740 on Feb 1, 2013 2:29 AM
        • 1. Re: Ping timeouts at server and broken pipes at client
          Angel.Ruiz
          Hi there,

          did you find a solution/root cause for this?

          It looks like your proxy service closed the connection to the client, probably after some kind of network outage.

          You could try to adjust the ping intervals and timeouts either on client or server (proxy service) side. More info in the configuration reference:

          http://docs.oracle.com/cd/E24290_01/coh.371/e22837/appendix_cacheconfig.htm#BABHAFCB
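
          As a rough sketch of what that could look like on the proxy side, assuming the proxy is declared with a proxy-scheme in the cache configuration file. The service name and listen address are taken from the log above; the interval and timeout values are placeholders for illustration, not recommendations:

          ```xml
          <proxy-scheme>
            <service-name>TcpProxyServicePof</service-name>
            <acceptor-config>
              <tcp-acceptor>
                <local-address>
                  <address>10.162.2.217</address>
                  <port>28088</port>
                </local-address>
              </tcp-acceptor>
              <outgoing-message-handler>
                <!-- How often the proxy pings each connected client (assumed value) -->
                <heartbeat-interval>10s</heartbeat-interval>
                <!-- How long the proxy waits for the ping response before closing
                     the connection; the log above shows 500 millis (assumed value) -->
                <heartbeat-timeout>5s</heartbeat-timeout>
              </outgoing-message-handler>
            </acceptor-config>
            <autostart>true</autostart>
          </proxy-scheme>
          ```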

          Hope this helps.
          • 2. Re: Ping timeouts at server and broken pipes at client
            user123799
            Hi user,

            It seems from the log that when the proxy node accepts your client connection the client fails to respond to a ping request down the same pipe. Does this happen consistently? If so, no putAll would ever succeed.

            I have to admit I've never seen this behaviour before. I can't see how this would be a firewall port issue, as the connection has been established... the only things I can think of would be a) your WebLogic client node is running way too hot and doesn't respond to the grid's ping request in time, b) some very aggressive network infrastructure is killing the connection just after it's created, or c) aliens are interfering with your system.

            Sorry I can't be of more help,

            Andy
            • 3. Re: Ping timeouts at server and broken pipes at client
              985743
              Thanks to both of you for the information. It looks like the heartbeat rate was too high for the network and server load in the test environment. After slightly increasing both the heartbeat interval and the heartbeat timeout, the errors stopped appearing.
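
              For anyone who wants to tune the client side as well, here is a minimal sketch of the same settings on the Extend client, assuming a remote-cache-scheme in the client cache configuration. The scheme name, service name, and the values are hypothetical and only illustrate where the elements go; the address is the proxy address from the log above:

              ```xml
              <remote-cache-scheme>
                <!-- Hypothetical scheme and service names -->
                <scheme-name>extend-remote</scheme-name>
                <service-name>ExtendTcpCacheService</service-name>
                <initiator-config>
                  <tcp-initiator>
                    <remote-addresses>
                      <socket-address>
                        <address>10.162.2.217</address>
                        <port>28088</port>
                      </socket-address>
                    </remote-addresses>
                  </tcp-initiator>
                  <outgoing-message-handler>
                    <!-- Illustrative values only; pick them for your own network and load -->
                    <heartbeat-interval>10s</heartbeat-interval>
                    <heartbeat-timeout>5s</heartbeat-timeout>
                  </outgoing-message-handler>
                </initiator-config>
              </remote-cache-scheme>
              ```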

              I was starting to think that the alien theory was plausible :-D
              • 4. Re: Ping timeouts at server and broken pipes at client
                user123799
                Tin foil around your servers is the only cure I've found so far.

                Glad it's all sorted.