4 Replies Latest reply on Jun 14, 2008 2:21 AM by 666705

    Cluster not load-balancing, ideas?

    666705
      I've been struggling to identify why my JMS producers are not load-balancing against a remote cluster.
                
                I've ruled out the producer as the problem: I see the same non-load-balancing behavior regardless of what I use to create messages (Hermes, ALSB, a simple Java producer...). I also don't think the JMS connection factory configuration is the problem, judging by the help I've received from folks over on the JMS forum.
                
                I believe something is wrong with our cluster setup because, in addition to the problem I just mentioned, we are also not seeing JNDI entries propagate to all managed servers. For example, if I create a JMS queue on m1, that queue does not appear in the JNDI tree on m2.
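A quick way to confirm the JNDI symptom is to look up the same name directly on each managed server and compare. Below is a minimal sketch, assuming the WebLogic client classes are on the classpath at run time; the t3 URLs (`m1-host`, `m2-host`, port 7001) and the JNDI name `jms/TestQueue` are hypothetical placeholders, not values from this thread.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

final class JndiVisibilityCheck {

    // Builds the environment for a WebLogic initial context pointed at one server.
    static Hashtable<String, String> envFor(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    // Returns true if jndiName resolves in the JNDI tree of the server at providerUrl.
    static boolean isBound(String providerUrl, String jndiName) throws NamingException {
        Context ctx = new InitialContext(envFor(providerUrl));
        try {
            ctx.lookup(jndiName);
            return true;
        } catch (NameNotFoundException e) {
            return false;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Hypothetical URLs for managed servers m1 and m2; replace with the real ones.
        String[] servers = {"t3://m1-host:7001", "t3://m2-host:7001"};
        String jndiName = "jms/TestQueue"; // hypothetical queue JNDI name
        for (String url : servers) {
            System.out.println(url + " -> " + (isBound(url, jndiName) ? "bound" : "NOT bound"));
        }
    }
}
```

If the name is bound on m1 but not on m2, the cluster-wide JNDI tree is not being shared, which matches the symptom described above.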
                
                I've been trying to find documentation on which settings I should look at to verify the cluster configuration. If I go through the WLS console and look at the Cluster settings, I see both managed servers there. Is there some other place where the configuration could be messed up?
                
                Added 6/11, 9:30 am:
                We're now focusing on multicast as the most likely problem. Can anyone tell me whether clusters on the same multicast address but different ports will interfere with each other? It looks like the infrastructure team has set up 5 clusters that way (the same multicast address for each cluster, but different ports).
                
                We've had a ticket open with BEA for two weeks now, with nothing but requests for more information.
                
                Any ideas/help are much appreciated!
                
                Meghan
                
                --
                Edited by pietila at 06/11/2008 7:38 AM
        • 1. Re: Cluster not load-balancing, ideas?
          3004
          Meghan Pietila wrote:
                    > Can anyone tell me whether clusters on the same multicast address but different ports will interfere with each other?

                    You could be right. I think we have had problems where the same IP but
                    different ports were used for multicast. This is on 8.1 though.
                    
                    I think as a rule, it's best to have a different IP and port for each
                    cluster.
                    
                    Also, can you be sure that nothing else on the network is using the
                    same multicast addresses? We had someone bring up a test cluster using
                    our addresses, which caused a few issues and took a while to find! We
                    also have security cameras that use multicast, and those can cause
                    issues if they use the same address/port.
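To see whether anything else is talking on a cluster's multicast address, you can join the group passively for a minute and tally which hosts send to it. This is a minimal sketch using plain `java.net.MulticastSocket`, not a WebLogic tool; the group and port are placeholders for the cluster being checked, and any sender that is not one of your own managed servers is a candidate for the kind of interference described above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.TreeMap;

final class MulticastListener {

    // Adds one packet to the per-sender tally, keyed by "ip:port".
    static void record(Map<String, Integer> counts, String senderIp, int senderPort) {
        counts.merge(senderIp + ":" + senderPort, 1, Integer::sum);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder group and port: use the multicast address/port of the cluster you are checking.
        InetAddress group = InetAddress.getByName("239.192.1.4");
        int port = 8001;
        long listenMillis = 60_000; // listen for one minute

        Map<String, Integer> counts = new TreeMap<>(); // sorted for a readable report
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(group);    // passive: this program never sends
            socket.setSoTimeout(1_000); // wake up at least once a second to check the deadline
            byte[] buf = new byte[65_535];
            long deadline = System.currentTimeMillis() + listenMillis;
            while (System.currentTimeMillis() < deadline) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                    record(counts, packet.getAddress().getHostAddress(), packet.getPort());
                } catch (SocketTimeoutException e) {
                    // nothing arrived this second; keep listening
                }
            }
            socket.leaveGroup(group);
        }
        // Any sender that is not one of your own managed servers deserves a closer look.
        counts.forEach((sender, n) -> System.out.println(sender + " sent " + n + " packet(s)"));
    }
}
```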
                    
                    We're using 239.192.1.4:8001 for one cluster and 239.192.1.3:7001 for
                    the other - I think it's best to keep those as different as you can.
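The rule of thumb of a distinct address per cluster is easy to check mechanically once every cluster's settings are collected in one place. A minimal sketch: the cluster names are hypothetical, the first two addresses are the ones quoted above, and `clusterC` is an invented third cluster that reuses an address on a different port, to show what gets flagged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MulticastConfigCheck {

    // Groups cluster names by multicast address and keeps only addresses used by
    // more than one cluster, regardless of port.
    static Map<String, List<String>> sharedAddresses(Map<String, String> addressByCluster) {
        Map<String, List<String>> byAddress = new TreeMap<>();
        addressByCluster.forEach((cluster, address) ->
                byAddress.computeIfAbsent(address, a -> new ArrayList<>()).add(cluster));
        byAddress.values().removeIf(clusters -> clusters.size() < 2);
        return byAddress;
    }

    public static void main(String[] args) {
        // Hypothetical cluster names; the TreeMap keeps them in sorted order.
        Map<String, String> addressByCluster = new TreeMap<>();
        addressByCluster.put("clusterA", "239.192.1.4"); // port 8001
        addressByCluster.put("clusterB", "239.192.1.3"); // port 7001
        addressByCluster.put("clusterC", "239.192.1.4"); // hypothetical: same address, other port
        sharedAddresses(addressByCluster).forEach((address, clusters) ->
                System.out.println(address + " is shared by " + clusters));
        // prints: 239.192.1.4 is shared by [clusterA, clusterC]
    }
}
```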
                    
                    In 8.1, there is also the multicast monitor utility, and there's a support
                    pattern on e-support describing how to diagnose multicast problems. I've
                    found this useful in the past when I've suspected a cluster issue.
                    
                    https://support.bea.com/application_content/product_portlets/support_patterns/wls/MulticastErrorsPattern.html
                    
                    Check also that you're using a valid range for the address - we weren't
                    for a while and had odd problems from time to time.
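A first sanity check on the address itself can use the standard `java.net.InetAddress` predicates: `isMulticastAddress()` (224.0.0.0 to 239.255.255.255), `isMCLinkLocal()` (the reserved 224.0.0.x block) and `isMCOrgLocal()` (the organization-local scope, 239.192.0.0/14). This is only a sketch: which part of the multicast range your routers actually forward is something to confirm with the network team, since a syntactically valid address can still be blocked.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class MulticastRangeCheck {

    // Classifies an IPv4 literal. Intended for IP literals, which getByName
    // parses without a DNS lookup; a host name would trigger a lookup.
    static String classify(String literal) {
        final InetAddress addr;
        try {
            addr = InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            return "not a valid address";
        }
        if (!addr.isMulticastAddress()) {
            return "not multicast (must be in 224.0.0.0-239.255.255.255)";
        }
        if (addr.isMCLinkLocal()) {
            return "reserved link-local block 224.0.0.x; avoid";
        }
        if (addr.isMCOrgLocal()) {
            return "organization-local scope (239.192.0.0/14)";
        }
        return "multicast, outside the organization-local scope";
    }

    public static void main(String[] args) {
        for (String a : new String[] {"239.192.1.4", "224.0.0.5", "10.1.2.3", "230.1.1.1"}) {
            System.out.println(a + ": " + classify(a));
        }
    }
}
```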
                    
                    There are also cluster debug flags available which you'll see listed in
                    the support document.
                    
                    Are you seeing dropped multicast packets?
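If you run your own test sender that numbers each datagram, dropped packets show up as gaps in the sequence a receiver sees. A minimal sketch of that counting logic; the per-sender sequence numbers are an assumption about your own test traffic, not something WebLogic's cluster messages expose.

```java
import java.util.HashMap;
import java.util.Map;

final class GapCounter {

    private final Map<String, Long> lastSeq = new HashMap<>(); // highest number seen per sender
    private long missed;

    // Records sequence number seq from sender and returns how many numbers were skipped.
    long onPacket(String sender, long seq) {
        Long prev = lastSeq.get(sender);
        long gap = 0;
        if (prev != null && seq > prev + 1) {
            gap = seq - prev - 1;
        }
        // Duplicates and late (reordered) packets don't move the high-water mark;
        // a late packet that was already counted as missed stays counted.
        if (prev == null || seq > prev) {
            lastSeq.put(sender, seq);
        }
        missed += gap;
        return gap;
    }

    long totalMissed() {
        return missed;
    }

    public static void main(String[] args) {
        GapCounter counter = new GapCounter();
        long[] seen = {1, 2, 3, 6, 7, 9}; // hypothetical sequence from one sender
        for (long seq : seen) {
            counter.onPacket("m1", seq);
        }
        System.out.println("missed packets: " + counter.totalMissed()); // missed packets: 3
    }
}
```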
                    
                    Hope that helps.
                    
                    Pete
          • 2. Re: Cluster not load-balancing, ideas?
            666705
            Thanks very much, Pete!
                      
                      "I *think* we have had problems where the same IP but
                      different ports were used for multicast. This is on 8.1 though."
                      
                      That's good to hear. We're pushing BEA to assign a different engineer to the ticket we have open and I'll ask the question again if we get someone knowledgeable this time.
                      
                      "In 8.1, there is also the multicast monitor utility - there's a support
                      pattern on e-support on how to diagnose it. I've found this useful in
                      the past when I've suspected a cluster issue."
                      
                      Yesterday we got console access to our Solaris clusters and ran a multicast test utility. I'm not sure it's exactly the same thing, but it gave us some info:
                      http://edocs.bea.com/wls/docs92/admin_ref/utils.html#wp1199798
                      
                      We shut down one of the clusters and tested using its address. We learned that even though the console is not reporting any dropped messages, MulticastTest reported that neither of the managed servers is successfully communicating via multicast.
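For anyone without the WebLogic utility at hand, the idea behind a multicast test can be sketched with plain `java.net.MulticastSocket`: run the same program on each host with a different name, each instance sends a numbered message once a second, and each prints what it hears. A host that never hears its peers (or even itself) is not getting multicast through on that address/port. This is an illustrative stand-in, not WebLogic's `MulticastTest`; the default group, port and the `"<name> <seq>"` message format are assumptions.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

final class MiniMulticastTest {

    // Message format (an assumption for this sketch): "<name> <sequence>".
    static String format(String name, long seq) {
        return name + " " + seq;
    }

    // Returns the sender name, or null if the payload is not in the expected format.
    static String senderOf(String payload) {
        String[] parts = payload.trim().split(" ");
        return parts.length == 2 ? parts[0] : null;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "host1"; // must be unique per host
        InetAddress group = InetAddress.getByName(args.length > 1 ? args[1] : "239.192.1.4");
        int port = args.length > 2 ? Integer.parseInt(args[2]) : 8001;

        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.setTimeToLive(1);    // Java's default; raise it if members sit on different subnets
            socket.joinGroup(group);
            socket.setSoTimeout(1_000); // receive() gives up after one second
            byte[] buf = new byte[1_024];
            for (long seq = 1; seq <= 30; seq++) {
                byte[] out = format(name, seq).getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(out, out.length, group, port));
                long until = System.currentTimeMillis() + 1_000;
                // Print whatever arrives during the next second, then send again.
                while (System.currentTimeMillis() < until) {
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    try {
                        socket.receive(in);
                    } catch (SocketTimeoutException e) {
                        break;
                    }
                    String payload = new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
                    String from = senderOf(payload);
                    System.out.println((name.equals(from) ? "heard self: " : "heard: ") + payload
                            + " from " + in.getAddress().getHostAddress());
                }
            }
            socket.leaveGroup(group);
        }
    }
}
```

Running it with the address of a cluster that has been shut down, as described above, avoids mixing test traffic with live cluster messages.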
                      
                      That's a very good tip about the video cameras; I hadn't even thought about devices like that. We've also wondered whether the routers are configured with a smaller multicast range than the default. On a different address, we found that one of our physical servers could successfully multicast but the other could not.
                      
                      
                      
                      Meghan
            • 3. Re: Cluster not load-balancing, ideas?
              666705
                        "Check also that you're using a valid range for the address - we weren't for a while and had odd problems from time to time."
                        
                        That's good information! We have checked and rechecked this, but I didn't know things might still partly work with an address outside the valid range; I assumed it would just fail completely.
                        
                        I'll read up on those cluster debug flags and see if they can help.
                        
                        I'll keep posting if we make any progress...
                        
                        
                        Meghan
              • 4. Re: Cluster not load-balancing, ideas?
                666705
                          It looks like we've finally pinpointed the problem: the multicast addresses on 6 of our 7 clusters were not working on the network. We found a multicast address that works for one of those 6 clusters, but haven't found working ones for the rest yet. The heartening news is that once we configured that one cluster with a working multicast IP, the distributed queues started working beautifully!
                          
                          Now to get someone who knows the ins and outs of the company network well enough to help us track down why so many multicast addresses are failing, and help us find some more valid ones...
                          
                          Meghan