0 Replies Latest reply: Nov 30, 2012 6:53 PM by 899771 RSS

Oracle SOA Cluster soa-infra failing, startup coherence issues

899771 Newbie
Hi,

We have a 2-node cluster of SOA 11.1.1.5. Three instances of this 2-node cluster (DEV, TST, and UAT) run on the same machines. Cron jobs shut down all the cluster instances every night and then bring them back up. Recently our UAT SOA cluster started having issues: both nodes come up fine, but soa-infra on one node does not get deployed and stays down. I tried to start that node manually, but no luck. Below are the errors I am getting on the node where soa-infra is down. It had been working fine until recently. Any idea what might be wrong and where to look? Any help is appreciated.

In the console all the servers are up and running, but Enterprise Manager shows soa-infra as down and I only see one node.

####<Nov 28, 2012 11:10:23 PM CST> <Notice> <Stdout> <soadev01> <WLS_SOA1> <Logger@2067016530 3.6.0.4> <<WLS Kernel>> <> <> <1354165823364> <BEA-000000> <bpel.fatal.conection.max.retry is set to 3<Nov 28, 2012 11:10:23 PM CST> <Warning> <Coherence> <BEA-000000> <2012-11-28 23:10:23.360/533.658 Oracle Coherence GE 3.6.0.4 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2012-11-28 23:09:52.784, Address=9.9.10.95:8083, MachineId=3944, Location=site:xxorg.com,machine:web1,process:19968, Role=WeblogicServer) has been attempting to join the cluster using WKA list [web2.xxorg.com/8.8.18.85:8083, web1.xxorg.com/9.9.10.95:8083] for 30 seconds without success; this could indicate a mis-configured WKA, or it may simply be the result of a busy cluster or active failover.>> 
####<Nov 28, 2012 11:10:23 PM CST> <Notice> <Stdout> <soadev01> <WLS_SOA1> <Logger@2067016530 3.6.0.4> <<WLS Kernel>> <> <> <1354165823365> <BEA-000000> <<Nov 28, 2012 11:10:23 PM CST> <Warning> <Coherence> <BEA-000000> <2012-11-28 23:10:23.363/533.661 Oracle Coherence GE 3.6.0.4 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster that does not respond to join requests; this is usually caused by a network layer failure:
Message "SeniorMemberHeartbeat"
  {
  FromMember=Member(Id=1, Timestamp=2012-11-28 21:28:09.017, Address=8.8.18.85:8083, MachineId=3968, Location=site:xxorg.com,machine:web2,process:3396, Role=WeblogicServer)
  FromMessageId=0
  Internal=false
  MessagePartCount=1
  PendingCount=0
  MessageType=17
  ToPollId=0
  Poll=null
  Packets
    {
    [000]=Broadcast{PacketType=0x0DDF00D2, ToId=0, FromId=1, Direction=Incoming, ReceivedMillis=23:10:23.353, MessageType=17, ServiceId=0, MessagePartCount=1, MessagePartIndex=0, Body=0}
    }
  Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.6}
  ToMemberSet=null
  NotifySent=false
  
  LastRecvTimestamp=none
  MemberSet=MemberSet(Size=1, BitSetCount=1, ids=[1])
  }>> 


.............................
.....................................
........................................
<Nov 28, 2012 11:14:53 PM CST> <Error> <Deployer> <BEA-149231> <Unable to set the activation state to true for the application 'soa-infra'.
weblogic.application.ModuleException: 
     at weblogic.servlet.internal.WebAppModule.startContexts(WebAppModule.java:1510)
     at weblogic.servlet.internal.WebAppModule.start(WebAppModule.java:482)
     at weblogic.application.internal.flow.ModuleStateDriver$3.next(ModuleStateDriver.java:425)
     at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
     at weblogic.application.internal.flow.ModuleStateDriver.start(ModuleStateDriver.java:119)
     Truncated. see log file for complete stacktrace
Caused By: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
  MemberSet=ServiceMemberSet(
    OldestMember=n/a
    ActualMemberSet=MemberSet(Size=0, BitSetCount=0
      )
    MemberId/ServiceVersion/ServiceJoined/MemberState
    )
)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:6)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:28)
     at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
     at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:637)
     at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
     Truncated. see log file for complete stacktrace
>
.....
...................
...........................

<anonymous>] [ecid: 0000JhFIXrrEWN0pnwh8iZ1Gi2DE000009,0] [APP: soa-infra] Cluster communication initialization failed.  If you are using multicast, please make sure multicast is enabled on your network and that there is no interference on the address in use.  Please see the documentation for more details.[[
com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
  MemberSet=ServiceMemberSet(
    OldestMember=n/a
    ActualMemberSet=MemberSet(Size=0, BitSetCount=0
      )
    MemberId/ServiceVersion/ServiceJoined/MemberState
    )
)

<Nov 29, 2012 9:20:13 PM CST> <Error> <Deployer> <BEA-149231> <Unable to set the activation state to true for the application 'soa-infra'.
weblogic.application.ModuleException: 
     at weblogic.servlet.internal.WebAppModule.startContexts(WebAppModule.java:1510)
     at weblogic.servlet.internal.WebAppModule.start(WebAppModule.java:482)
     at weblogic.application.internal.flow.ModuleStateDriver$3.next(ModuleStateDriver.java:425)
     at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:52)
     at weblogic.application.internal.flow.ModuleStateDriver.start(ModuleStateDriver.java:119)
     Truncated. see log file for complete stacktrace
Caused By: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(Id=0, Name=Cluster, Type=Cluster
  MemberSet=ServiceMemberSet(
    OldestMember=n/a
    ActualMemberSet=MemberSet(Size=0, BitSetCount=0
      )
    MemberId/ServiceVersion/ServiceJoined/MemberState
    )
)
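
When reading the join warning above, it can help to pull out the member's own address and the WKA list and compare them side by side. The following is a minimal sketch, not an Oracle tool: the function names are hypothetical, the regular expressions are assumptions based only on the message format shown in this log, and the warning text is copied from the excerpt above.

```python
import re

# Excerpt of the join warning, copied from the log above.
WARNING = (
    "This Member(Id=0, Timestamp=2012-11-28 23:09:52.784, "
    "Address=9.9.10.95:8083, MachineId=3944, "
    "Location=site:xxorg.com,machine:web1,process:19968, Role=WeblogicServer) "
    "has been attempting to join the cluster using WKA list "
    "[web2.xxorg.com/8.8.18.85:8083, web1.xxorg.com/9.9.10.95:8083] "
    "for 30 seconds without success"
)


def parse_join_warning(text):
    """Return (local_address, wka_entries) from a Coherence join warning.

    local_address is an "ip:port" string; wka_entries is a list of
    (hostname, "ip:port") tuples in the order they appear in the log.
    """
    local = re.search(r"Address=([\d.]+:\d+)", text)
    wka = re.search(r"WKA list \[([^\]]*)\]", text)
    if not local or not wka:
        raise ValueError("not a recognizable join warning")
    entries = []
    for item in wka.group(1).split(","):
        # Each entry looks like "hostname/ip:port".
        host, _, addr = item.strip().partition("/")
        entries.append((host, addr))
    return local.group(1), entries


def local_in_wka(text):
    """True if the member's own ip:port appears in its WKA list."""
    local, entries = parse_join_warning(text)
    return any(addr == local for _, addr in entries)


if __name__ == "__main__":
    local, entries = parse_join_warning(WARNING)
    print("local member:", local)
    for host, addr in entries:
        print("WKA entry:", host, addr)
    print("local address is in WKA list:", local_in_wka(WARNING))
```

This only confirms what the log already says about addresses and ports; it does not test whether the two hosts can actually reach each other on those ports.
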
We have 2 physical machines, with DEV, TST, and UAT (each a 2-node SOA cluster) all running on the same machines; they go down and come back up every night for daily DB maintenance. As this is the UAT environment, any help pointing to where to debug is appreciated.

We also use unicast with Well-Known Addressing (WKA), configured in the console under the server(s) -> Start tab. But it looks like something in Coherence is not working, and the cluster fails to come up. Here are the unicast Coherence settings we have for both servers in the console under the Server Start tab, as per the Oracle EDG:
WLS_SOA1 

-Dtangosol.coherence.wka1=web1.xxorg.com
-Dtangosol.coherence.wka2=web2.xxorg.com
-Dtangosol.coherence.localhost=web1.xxorg.com
-Dtangosol.coherence.localport=8083
-Dtangosol.coherence.wka1.port=8083
-Dtangosol.coherence.wka2.port=8083

WLS_SOA2 

-Dtangosol.coherence.wka1=web1.xxorg.com
-Dtangosol.coherence.wka2=web2.xxorg.com
-Dtangosol.coherence.localhost=web2.xxorg.com
-Dtangosol.coherence.localport=8083
-Dtangosol.coherence.wka1.port=8083
-Dtangosol.coherence.wka2.port=8083
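
The two flag sets above can be sanity-checked mechanically. The sketch below parses the `-D` arguments exactly as posted and checks two things: that every server lists the same WKA entries, and that each server's own `localhost`/`localport` pair appears in that list. The second check is an assumption that fits this two-node setup, not a general Coherence requirement, and the helper names are hypothetical.

```python
import re

PREFIX = "tangosol.coherence."

# Flags copied from the Server Start settings above.
WLS_SOA1 = """
-Dtangosol.coherence.wka1=web1.xxorg.com
-Dtangosol.coherence.wka2=web2.xxorg.com
-Dtangosol.coherence.localhost=web1.xxorg.com
-Dtangosol.coherence.localport=8083
-Dtangosol.coherence.wka1.port=8083
-Dtangosol.coherence.wka2.port=8083
"""
WLS_SOA2 = WLS_SOA1.replace("localhost=web1", "localhost=web2")


def parse_flags(text):
    """Turn '-Dkey=value' lines into a dict keyed without the '-D' prefix."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-D") and "=" in line:
            key, _, value = line[2:].partition("=")
            props[key] = value
    return props


def wka_list(props):
    """Return [(host, port)] for every wkaN / wkaN.port pair, ordered by N."""
    found = []
    for key, value in props.items():
        m = re.fullmatch(re.escape(PREFIX) + r"wka(\d+)", key)
        if m:
            found.append((int(m.group(1)), value, props.get(key + ".port")))
    return [(host, port) for _, host, port in sorted(found)]


def check_cluster(servers):
    """servers maps a server name to its flag text; returns a list of problems."""
    problems = []
    parsed = {name: parse_flags(text) for name, text in servers.items()}
    lists = {name: tuple(wka_list(p)) for name, p in parsed.items()}
    if len(set(lists.values())) > 1:
        problems.append("servers do not agree on the WKA list")
    for name, props in parsed.items():
        local = (props.get(PREFIX + "localhost"), props.get(PREFIX + "localport"))
        if local not in lists[name]:
            problems.append(f"{name}: local address {local} is not in its WKA list")
    return problems


if __name__ == "__main__":
    issues = check_cluster({"WLS_SOA1": WLS_SOA1, "WLS_SOA2": WLS_SOA2})
    print(issues or "flag sets are internally consistent")
```

A clean result only means the flags agree with each other. It says nothing about whether the DEV and TST clusters on the same hosts use the same ports, which is not stated in the post.
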
Thanks