5 Replies. Latest reply: Oct 11, 2012 5:31 AM by user123799

    Dynamic and intelligent re-balancing of coherence partitions

    965779
      Can Coherence intelligently and dynamically route more requests to the better-performing nodes in the same cluster? That is, can it rebalance its partitions dynamically so that more partitions, or the more frequently accessed partitions, reside on the more powerful nodes of the cluster?

      Edited by: 962776 on Oct 3, 2012 10:03 AM
        • 1. Re: Dynamic and intelligent re-balancing of coherence partitions
          robvarga
          962776 wrote:
          Can Coherence intelligently and dynamically route more requests to the better-performing nodes in the same cluster? That is, can it rebalance its partitions dynamically so that more partitions, or the more frequently accessed partitions, reside on the more powerful nodes of the cluster?
          Hi,

          this is something of a loaded question, and not as simple as it seems, so let me step back a bit and put it into context.

          Coherence maintains a single primary copy of each partition, and that copy serves both reads and writes.

          It does not maintain multiple active copies of a partition, not even for reads (the redundant copies, i.e. the backups, exist only for high availability). This means there is a single node the client needs to communicate with, so the client does not need to maintain load information to decide where to route each request. You could implement lagging reads served by the backup nodes yourself on top of Coherence, but that does not come out of the box, and you would need to transmit load information to every other node, which comes with its own overhead. "Lagging read" here means that you may read a stale (not up-to-date) copy from a backup node.
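To make the routing point concrete, here is a toy Java model. It is a sketch only: the class OwnerRouting, its ownership tables and the hash-modulo key-to-partition function are all hypothetical and are not Coherence's API or its actual hashing. It shows that once a key is mapped to a partition there is exactly one node to send reads and writes to, so no load information is involved; the backup lookup corresponds to the do-it-yourself lagging read, which may return stale data.

```java
/**
 * Toy model of owner-based routing (hypothetical, not the Coherence API).
 * Every key maps to exactly one partition and every partition has exactly
 * one primary owner, so a client never has to choose between replicas.
 */
final class OwnerRouting {
    private final int[] primaryOwner; // partition id -> node id of the primary copy
    private final int[] backupOwner;  // partition id -> node id of the backup copy (HA only)

    OwnerRouting(int[] primaryOwner, int[] backupOwner) {
        if (primaryOwner.length == 0 || primaryOwner.length != backupOwner.length) {
            throw new IllegalArgumentException("ownership tables must be non-empty and equal in length");
        }
        this.primaryOwner = primaryOwner.clone();
        this.backupOwner = backupOwner.clone();
    }

    /** Hypothetical key-to-partition function: hash code modulo partition count, never negative. */
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), primaryOwner.length);
    }

    /** Reads and writes both go to the single primary owner; no load information is consulted. */
    int nodeFor(Object key) {
        return primaryOwner[partitionOf(key)];
    }

    /** Do-it-yourself "lagging read" target: the backup may return a stale copy. */
    int laggingReadNodeFor(Object key) {
        return backupOwner[partitionOf(key)];
    }
}
```

For example, with seven partitions the key 10 always resolves to partition 3 and therefore always to that partition's primary owner, regardless of which node happens to be busiest at the moment.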

          As for writes, Coherence depends on there being at most one active copy of a partition for write purposes, so you should not expect this to change any time soon, if ever.

          As for your original question: can ownership be rebalanced based on the load?

          It is not possible out of the box. The functionality would also come with its own gotchas:

          - Load information would need to be transmitted to the senior node (this may already happen behind the scenes).

          - It may not be simple to reconcile decisions based on load information with decisions based on balancing and dispersing partitions, so that you arrive at a balanced and safe distribution while also optimizing the load.

          - Would you want to move partitions based purely on load, not only for availability/safety reasons? The current behaviour theoretically does something similar, if you consider the load cost to be a constant one per partition. The operative word is constant.
          Once a balanced and safe distribution is reached, Coherence does not want to move partitions around.
          Why is this important? Because operations on a partition cannot be processed while it is being moved.
          If the load were actually variable instead of a constant 1, it would likely lead to much more frequent partition movements. These would add load to the network, increase message latencies in general, and in particular hinder the responsiveness of the partitions being moved, which, given what you are asking for, would likely be the ones already under the heaviest load. That could well be counterproductive.
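The constant-versus-variable cost argument can be illustrated with a small hypothetical Java model. Both strategies are stand-ins, not Coherence's real assignment algorithm (which, among other things, also has to place backups safely): balanceByCount treats every partition as costing 1, so its result does not change when the load changes, while balanceByLoad greedily puts the heaviest partitions on the least-loaded node, so any shift in where the load sits can reassign partitions. The moves method counts how many partitions would have to be transferred between two assignments.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/**
 * Hypothetical illustration (not Coherence code): a constant per-partition
 * cost keeps ownership stable, while a load-driven assignment can reshuffle
 * partitions whenever the measured load changes.
 */
final class LoadVsCountBalancing {

    /** Constant cost of 1 per partition: round-robin, independent of load. */
    static int[] balanceByCount(int partitions, int nodes) {
        int[] owner = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            owner[p] = p % nodes;
        }
        return owner;
    }

    /** Variable cost: greedy "heaviest partition to least-loaded node" heuristic. */
    static int[] balanceByLoad(double[] load, int nodes) {
        Integer[] order = IntStream.range(0, load.length).boxed().toArray(Integer[]::new);
        // Heaviest partitions first; ties broken by partition id so the result is deterministic.
        Arrays.sort(order, Comparator.<Integer>comparingDouble(p -> -load[p]).thenComparingInt(p -> p));
        double[] nodeLoad = new double[nodes];
        int[] owner = new int[load.length];
        for (int p : order) {
            int target = 0;
            for (int n = 1; n < nodes; n++) {
                if (nodeLoad[n] < nodeLoad[target]) {
                    target = n;
                }
            }
            owner[p] = target;
            nodeLoad[target] += load[p];
        }
        return owner;
    }

    /** Number of partitions whose owner differs between two assignments, i.e. that must move. */
    static int moves(int[] before, int[] after) {
        int count = 0;
        for (int p = 0; p < before.length; p++) {
            if (before[p] != after[p]) {
                count++;
            }
        }
        return count;
    }
}
```

With two nodes and per-partition loads of {5, 1, 1, 1} followed by {1, 1, 1, 5}, the load-driven version moves two of the four partitions (one of them being the partition that just became hot), while the count-based version moves none.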

          So yes, it is theoretically possible: with 3.7.1 or later you could implement it by writing your own implementation of PartitionAssignmentStrategy. But is it practical? That really depends on your exact system.

          An alternative approach may be to change your key partitioning algorithm so that it spreads data across the partitions more evenly; that may also translate into a more even distribution of the load. You would have to try it and measure. You may also want to look at BroadKeyPartitioningStrategy. Of course, if you have a few very hot associated keys, they will make whichever single partition holds them hot, and that cannot really be helped.
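Whether a different key partitioning actually evens things out is something to measure, and a hypothetical Java sketch of such a measurement follows. It is not Coherence's KeyPartitioningStrategy interface; the "order:id:line:n" key format and its association rule are assumptions made for illustration. It also shows the limitation just mentioned: all keys sharing an association key land in the same partition, however many partitions there are.

```java
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch (not Coherence's KeyPartitioningStrategy API): measure how
 * evenly a key-to-partition function spreads a sample of keys, and show that
 * associated keys always share one partition.
 */
final class PartitionSkew {

    /** Counts sample keys per partition for a given key-to-partition function. */
    static int[] histogram(Iterable<String> keys, ToIntFunction<String> partitionOf, int partitions) {
        int[] counts = new int[partitions];
        for (String key : keys) {
            counts[partitionOf.applyAsInt(key)]++;
        }
        return counts;
    }

    /** Skew = busiest partition divided by the average partition; 1.0 means perfectly even. */
    static double skew(int[] counts) {
        long total = 0;
        int max = 0;
        for (int c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        return total == 0 ? 1.0 : max / ((double) total / counts.length);
    }

    /** Hypothetical association rule: "order:42:line:3" is associated with "order:42". */
    static String associationOf(String key) {
        String[] parts = key.split(":");
        return parts.length >= 2 ? parts[0] + ":" + parts[1] : key;
    }

    /** Partitions by the association key, so all lines of one order share a partition. */
    static ToIntFunction<String> associatedPartitioning(int partitions) {
        return key -> Math.floorMod(associationOf(key).hashCode(), partitions);
    }
}
```

Feeding it a sample of real keys (or, better, a sample weighted by access frequency) gives a skew figure to compare before and after changing the partitioning.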


          Best regards,

          Robert
          • 3. Re: Dynamic and intelligent re-balancing of coherence partitions
            user123799
            Hi user,

            I notice that your question mentions moving partitions onto 'more powerful nodes'. I take this to mean nodes running on more powerful servers; if that is the case, I'd suggest your problem may be more easily solved by only having servers of a very similar specification. I run all the production servers in one grid at an identical spec, so there are no issues with partitions ending up on 'slow servers'.

            If this is not an option for you then, on top of what Rob suggests, it should theoretically be possible to implement your own partition assignment strategy that stores more partitions on boxes with more resources (assuming Coherence doesn't enforce an even distribution, which I can't say for sure as I haven't tried it). This strategy would not require partitions to move but, given a roughly even load per partition, would send fewer requests to the slower boxes and more to the quicker ones. Obviously, as Rob also mentions, you'd still want to ensure you're not gaining performance at the cost of HA.
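The arithmetic behind this suggestion can be sketched in Java. This is not an implementation of Coherence's PartitionAssignmentStrategy: the capacity weights are assumed inputs, and the code only works out how many primary partitions each box would own, using the largest-remainder method so that the quotas add up to the partition count exactly. It deliberately ignores backup placement, which a real strategy would have to handle to keep HA intact.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch (not a Coherence PartitionAssignmentStrategy): split the
 * primary partitions across boxes in proportion to an assumed capacity weight.
 * Backup placement, and therefore HA, is NOT considered here.
 */
final class WeightedPartitionQuota {

    static int[] quotas(int partitionCount, double[] capacity) {
        double totalCapacity = Arrays.stream(capacity).sum();
        if (totalCapacity <= 0) {
            throw new IllegalArgumentException("total capacity must be positive");
        }
        int[] quota = new int[capacity.length];
        double[] remainder = new double[capacity.length];
        int assigned = 0;
        for (int i = 0; i < capacity.length; i++) {
            double exact = partitionCount * capacity[i] / totalCapacity;
            quota[i] = (int) Math.floor(exact);
            remainder[i] = exact - quota[i];
            assigned += quota[i];
        }
        // Hand the leftover partitions to the boxes with the largest fractional remainders.
        Integer[] byRemainder = IntStream.range(0, capacity.length).boxed().toArray(Integer[]::new);
        Arrays.sort(byRemainder, Comparator.<Integer>comparingDouble(i -> -remainder[i]).thenComparingInt(i -> i));
        for (int k = 0; k < partitionCount - assigned; k++) {
            quota[byRemainder[k]]++;
        }
        return quota;
    }
}
```

For example, 257 partitions over three boxes weighted 2:1:1 gives quotas of 129, 64 and 64.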

            Andy.
            • 4. Re: Dynamic and intelligent re-balancing of coherence partitions
              robvarga
              BigAndy wrote:
              Hi user,

              I notice that your question mentions moving partitions onto 'more powerful nodes'. I take this to mean nodes running on more powerful servers; if that is the case, I'd suggest your problem may be more easily solved by only having servers of a very similar specification. I run all the production servers in one grid at an identical spec, so there are no issues with partitions ending up on 'slow servers'.
              I think he was more worried about the load not being evenly distributed among partitions: if the hotter partitions cluster on the same node, or the same box, that will inevitably cause issues.
              If this is not an option for you then, on top of what Rob suggests, it should theoretically be possible to implement your own partition assignment strategy that stores more partitions on boxes with more resources (assuming Coherence doesn't enforce an even distribution, which I can't say for sure as I haven't tried it). This strategy would not require partitions to move but, given a roughly even load per partition, would send fewer requests to the slower boxes and more to the quicker ones. Obviously, as Rob also mentions, you'd still want to ensure you're not gaining performance at the cost of HA.

              Andy.
              A simpler solution is to run proportionally more nodes per box on the stronger boxes, to mirror the difference in power. Of course, power is not simply clock speed times number of cores; it should be evaluated carefully, because memory size and architecture, network bandwidth, operating system and other factors all affect performance.
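A sketch of the node-count arithmetic in Java: the relative power scores are assumed to come from your own measurements (for the reasons just given, not from clock speed and core count alone), and the class and method names are hypothetical. If every node ends up owning roughly the same number of partitions, each box's share of the partitions is simply its share of the nodes.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: choose the number of storage nodes per box in proportion
 * to a measured relative power score, so that an even per-node partition
 * distribution yields a per-box share that mirrors box power.
 */
final class NodesPerBox {

    /** The weakest box runs baseNodes nodes; every other box is scaled relative to it. */
    static int[] nodeCounts(double[] relativePower, int baseNodes) {
        double weakest = Arrays.stream(relativePower).min().getAsDouble();
        if (weakest <= 0) {
            throw new IllegalArgumentException("power scores must be positive");
        }
        int[] nodes = new int[relativePower.length];
        for (int i = 0; i < relativePower.length; i++) {
            nodes[i] = Math.max(1, (int) Math.round(baseNodes * relativePower[i] / weakest));
        }
        return nodes;
    }

    /** Fraction of all partitions each box owns if every node owns about the same number. */
    static double[] boxShare(int[] nodes) {
        int total = Arrays.stream(nodes).sum();
        double[] share = new double[nodes.length];
        for (int i = 0; i < nodes.length; i++) {
            share[i] = (double) nodes[i] / total;
        }
        return share;
    }
}
```

Boxes scored 1.0, 2.0 and 1.5 with a base of two nodes give 2, 4 and 3 nodes, i.e. shares of 2/9, 4/9 and 3/9 of the partitions.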

              Best regards,

              Robert
              • 5. Re: Dynamic and intelligent re-balancing of coherence partitions
                user123799
                Good point about running more or fewer nodes. That's what you get from posting last thing at night: in the cold light of day it just doesn't add up!