4 Replies Latest reply on Jan 10, 2020 2:58 AM by Chris San Buenaventura

    Federation ActiveActive Topology: Replication delay backing up local writes

    Chris San Buenaventura

      Coherence version: 12.2.1.4.0

       

      We are using ActiveActive federation across two clusters (one cluster in London and one cluster in New York). Today we had a prolonged network glitch which slowed down communication between our London and New York servers.
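
      For readers unfamiliar with this kind of setup, a two-participant active-active topology is usually declared in the operational override file roughly as sketched below. This is only an illustration: the participant names, host names, port, and topology name are placeholders (assumptions), not the configuration actually in use here.

```xml
<!-- Illustrative sketch only: names, addresses and port are placeholders -->
<coherence>
  <federation-config>
    <participants>
      <participant>
        <name>London</name> <!-- assumed participant (cluster) name -->
        <remote-addresses>
          <socket-address>
            <address>london-host.example.com</address> <!-- placeholder -->
            <port>7574</port> <!-- placeholder port -->
          </socket-address>
        </remote-addresses>
      </participant>
      <participant>
        <name>NewYork</name> <!-- assumed participant (cluster) name -->
        <remote-addresses>
          <socket-address>
            <address>newyork-host.example.com</address> <!-- placeholder -->
            <port>7574</port> <!-- placeholder port -->
          </socket-address>
        </remote-addresses>
      </participant>
    </participants>
    <topology-definitions>
      <active-active>
        <name>LondonNewYork</name> <!-- hypothetical topology name -->
        <active>London</active>
        <active>NewYork</active>
      </active-active>
    </topology-definitions>
  </federation-config>
</coherence>
```

      With a topology like this, each cluster accepts local writes and federates them to the other participant, which is the replication flow the question is about.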

       

      What we observed is that, because of the cross-Atlantic network delay, local writes to a cluster were backing up as well. Can you please check whether this is a bug or expected behaviour? If it is the latter, is there any way we can configure Federation so that the replication flow does not back up local writes?


      We were getting logs like the ones below during the network glitch:

       

      2020-01-05T23:41:08,356 WARN  [Logger@9237753 12.2.1.4.0][Coherence] (thread=SelectionService(channels=7, selector=MultiplexedSelector(sun.nio.ch.EPollSelectorImpl@7e905460), id=150693841), member=4) tmb://10.53.200.125:9300.45391 accepted connection migration with tmb://10.53.200.126:9300.54452 on MultiplexedSocketChannel(MultiplexedSocket{Socket[addr=/10.53.200.126,port=9300,localport=52414]}): peer=tmb://10.53.200.126:9300.54452, state=ACTIVE, socket=MultiplexedSocket{Socket[addr=/10.53.200.126,port=9300,localport=52414]}, migrations=6, bytes(in=13683513, out=23235313), flushlock false, bufferedOut=6.95KB, unflushed=0B, delivered(in=54117, out=58173), timeout(ack=7.49s), interestOps=1, unflushed receipt=0, receiptReturn 0, isReceiptFlushRequired false, bufferedIn(), msgs(in=28914, out=29393/29412)

      java.io.IOException: ack timeout after 15s

              at com.oracle.common.internal.net.socketbus.BufferedSocketBus$BufferedConnection.checkHealth(BufferedSocketBus.java:890)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$5.lambda$run$0(AbstractSocketBus.java:644)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$5$$Lambda$208/626754434.accept(Unknown Source)

              at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$5.run(AbstractSocketBus.java:644)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$3.run(AbstractSocketBus.java:426)

              at com.oracle.common.internal.net.RunnableSelectionService.processRunnables(RunnableSelectionService.java:533)

              at com.oracle.common.internal.net.RunnableSelectionService.process(RunnableSelectionService.java:349)

              at com.oracle.common.internal.net.RunnableSelectionService.run(RunnableSelectionService.java:274)

              at com.oracle.common.internal.net.ResumableSelectionService.run(ResumableSelectionService.java:133)

              at java.lang.Thread.run(Thread.java:745)

       

      2020-01-05T23:41:33,974 WARN  [Logger@9237753 12.2.1.4.0][Coherence] (thread=SelectionService(channels=15, selector=MultiplexedSelector(sun.nio.ch.EPollSelectorImpl@2b773ba8), id=505818397), member=4) tmb://10.53.200.125:9300.45391 accepted connection migration with tmb://10.53.200.126:9300.54452 on MultiplexedSocketChannel(MultiplexedSocket{Socket[addr=/10.53.200.126,port=9300,localport=52530]}): peer=tmb://10.53.200.126:9300.54452, state=ACTIVE, socket=MultiplexedSocket{Socket[addr=/10.53.200.126,port=9300,localport=52530]}, migrations=7, bytes(in=13766783, out=23336481), flushlock false, bufferedOut=14.4KB, unflushed=0B, delivered(in=54389, out=58422), timeout(ack=2.13s), interestOps=1, unflushed receipt=0, receiptReturn 0, isReceiptFlushRequired false, bufferedIn(), msgs(in=29054, out=29524/29537)

      java.io.IOException: Connection reset by peer

              at sun.nio.ch.FileDispatcherImpl.readv0(Native Method)

              at sun.nio.ch.SocketDispatcher.readv(SocketDispatcher.java:43)

              at sun.nio.ch.IOUtil.read(IOUtil.java:278)

              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:435)

              at com.oracle.common.internal.net.WrapperSocketChannel.read(WrapperSocketChannel.java:130)

              at com.oracle.common.internal.net.MultiplexedSocketProvider$MultiplexedSocketChannel.read(MultiplexedSocketProvider.java:1547)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$Connection.read(AbstractSocketBus.java:1956)

              at com.oracle.common.internal.net.socketbus.BufferedSocketBus$BufferedConnection.read(BufferedSocketBus.java:93)

              at com.oracle.common.internal.net.socketbus.SocketMessageBus$MessageConnection$ReadBatch.read(SocketMessageBus.java:615)

              at com.oracle.common.internal.net.socketbus.SocketMessageBus$MessageConnection.processReads(SocketMessageBus.java:206)

              at com.oracle.common.internal.net.socketbus.BufferedSocketBus$BufferedConnection.onReadySafe(BufferedSocketBus.java:700)

              at com.oracle.common.internal.net.socketbus.AbstractSocketBus$Connection.onReady(AbstractSocketBus.java:2135)

              at com.oracle.common.internal.net.RunnableSelectionService.process(RunnableSelectionService.java:401)

              at com.oracle.common.internal.net.RunnableSelectionService.run(RunnableSelectionService.java:274)

              at com.oracle.common.internal.net.ResumableSelectionService.run(ResumableSelectionService.java:133)

              at java.lang.Thread.run(Thread.java:745)