I've been trying to reproduce this for a few hours now agains 3.6.0, 3.6.1 and internal builds fo 3.7. As I've had no luck, perhaps you can outline some of the start up order. Alternatively we could have a call so that we can isolate it.
Full post is here: deadlock on cluster startup - push replication + coherence 184.108.40.206 (assuming you have seen that).
Coherence v220.127.116.11, PRP v18.104.22.16819
Roughly speaking start up is as follows:
Start up 2 storage nodes with a small delay between each node starting (say 10-20 seconds). Each will call DefaultCacheServer.start().
Then on the first node to startup, and immediately after call CacheFactory.getCache() for around 80+ caches in a tight loop. (This forces the CoherencePushReplicationProvider to establishPublishingInfrastructureFor all these caches, before we load any data into cluster). This seemed to block after around 3 caches into the 80+ caches.
On the node doing the tight loop I get the first set of thread dumps (see post on Coherence forum), and on the other node i get the second set of thread dumps.