That's an interesting question/thought. Currently the Command Pattern simply allows running code to complete and does not attempt to kill or interrupt it. The reasoning is this: say your command takes 2 hrs to run, and at 1 hr 59 mins the command has to move, say because a new node is starting. Should the currently running task be killed and restarted? Most people say no, but perhaps that should be an option? How would this work though? Would the thread simply be interrupted? What if the Command ignores "interrupted" exceptions?
One thing you can do is check, when a Command is re-balanced to a new node, whether it is potentially being recovered. If so, you could avoid re-executing it.
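To illustrate why interruption only helps if the command cooperates, here is a minimal sketch in plain Java. `InterruptibleWork` is a hypothetical stand-in for a long-running command body and is not part of the Command Pattern API. The loop stops early only because it polls the thread's interrupt flag; a body that never checks the flag would simply run to completion.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical long-running unit of work; NOT part of the Command Pattern API.
final class InterruptibleWork implements Runnable {
    private final AtomicLong steps = new AtomicLong();
    private final long maxSteps;

    InterruptibleWork(long maxSteps) {
        this.maxSteps = maxSteps;
    }

    @Override
    public void run() {
        // Cooperative cancellation: the loop ends early only because it
        // checks the interrupt flag on every iteration.
        while (steps.get() < maxSteps && !Thread.currentThread().isInterrupted()) {
            steps.incrementAndGet();
        }
    }

    long stepsCompleted() {
        return steps.get();
    }
}
```

If the interrupt flag is already set when `run()` is called, no steps are performed; otherwise the work runs until `maxSteps` is reached.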
Brian Oliver | Architect | Oracle Coherence
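The recovery check suggested above could look roughly like the following sketch. `RecoveryAwareEnvironment` is a simplified stand-in for the environment handed to a command; only the `isRecovering()` name comes from this thread, and everything else (including `SkipOnRecoveryCommand`) is assumed for illustration.

```java
// Simplified stand-in for the environment passed to a command.
// Only the isRecovering() name is taken from this thread; the rest is assumed.
interface RecoveryAwareEnvironment {
    boolean isRecovering();
}

// Hypothetical command that declines to redo its work during recovery.
final class SkipOnRecoveryCommand {
    private int executions;

    void execute(RecoveryAwareEnvironment environment) {
        if (environment.isRecovering()) {
            // The command may already have run on the previous owner,
            // so fall out instead of executing it a second time.
            return;
        }
        executions++;
    }

    int executions() {
        return executions;
    }
}
```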
Another of the several use cases I'm trying to leverage the Command Pattern for: I want to make sure a particular Command is ALWAYS running somewhere on the cache cluster, and I want only ONE instance of it running at any given time.
If it gets migrated, I want the old one to die and not interfere and allow the new instance to do the work.
Everything in the Command Pattern works perfectly, except that there is no way for me to signal the old copy of the job to stop processing.
Being interrupted would be one way. Another would be to add a method to the Command interface, or to create a subinterface, e.g. "StoppableCommand" or "KillableCommand", with a method that is invoked if the command is migrated off to another JVM.
I may be abusing your Command Pattern, but it's so close to what I need, and it does everything else so well. I have it working to the point where, if I kill the node where it's currently running, it kicks off a new one on another node.
For this use case, your suggestion to fall out of the execute method if environment.isRecovering() is true wouldn't help me, because as far as the Command Pattern is concerned it would be ready to pick up and run the following job for this context, even though the original is really still running on the original node.
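A rough sketch of the subinterface idea, in plain Java. No such interface exists in the Command Pattern as discussed here: `SimpleCommand` is a simplified stand-in for the real Command interface (which takes an execution environment), and `StoppableCommand`, `stop()`, and `EventWatcherCommand` are hypothetical names.

```java
// Simplified stand-in for the Command interface; the real one is passed
// an execution environment. Assumed for illustration only.
interface SimpleCommand {
    void execute();
}

// Hypothetical subinterface: the framework would call stop() on the old
// copy when the command is migrated to another JVM.
interface StoppableCommand extends SimpleCommand {
    void stop();
}

final class EventWatcherCommand implements StoppableCommand {
    // volatile so a stop() from another thread is visible to the loop.
    private volatile boolean stopped;
    private final long maxPolls;
    private long polls;

    EventWatcherCommand(long maxPolls) {
        this.maxPolls = maxPolls;
    }

    @Override
    public void execute() {
        // Keep working until told to stop (or until the demo limit is hit).
        while (!stopped && polls < maxPolls) {
            polls++;
        }
    }

    @Override
    public void stop() {
        stopped = true;
    }

    long polls() {
        return polls;
    }
}
```

The design choice here is a flag the command checks itself, rather than thread interruption, so the command decides where it is safe to exit.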
How strictly ALWAYS, and how strictly one instance, do you want it to run?
Can you allow overlapping a brief period of time (while the old node is notified that the new node received the partition)?
If not, then is it fine if it does not run while the partition is being transported to the new owner node?
If not, then can you tolerate if it does not run while lock ownership is replaced with the new owner (one entry-processor sending time)?
Also, would your command attempt to access the same service?
theJC wrote:And how many such "commands" and how many cluster nodes would you have, how CPU intensive would they be?
For my use case, I can tolerate and actually prefer that they overlap.
The command accesses other caches served by the same cluster, but they are different services.
Also, I understand that you prefer overlaps, but can you actually tolerate having small periods while such a "command" does not run on any node?
Actually, for my main case, I cannot tolerate the job not running. I've got it watching non-durable events, so if I miss them while it is not running, I will never be able to pick them back up.
I prefer overlap, with a guarantee that the job is always running on at least one node, plus some mechanism to terminate it on old nodes when it gets migrated due to cluster resizing. I've hacked the code and have it working, but I would prefer the solution to support this out of the box. That way I could upgrade in the future without having to port my changes forward, which could well need rework as the underlying implementation of the Command Pattern changes.
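The ordering described above (start the new copy first, then stop the old one) can be sketched in a single-JVM model. This is only an illustration of the ordering, not how the Command Pattern migrates work; `ClusterSingleton` and `OverlappingHandoff` are hypothetical names, and the shared counter stands in for "number of nodes currently running the job".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of one copy of the always-running job.
final class ClusterSingleton {
    private final AtomicInteger running; // shared count of live copies
    private boolean active;

    ClusterSingleton(AtomicInteger running) {
        this.running = running;
    }

    void start() {
        if (!active) {
            active = true;
            running.incrementAndGet();
        }
    }

    void stop() {
        if (active) {
            active = false;
            running.decrementAndGet();
        }
    }
}

final class OverlappingHandoff {
    // Starts the new owner before stopping the old one, so the number of
    // live copies never drops to zero. Returns the count during the overlap.
    static int migrate(ClusterSingleton oldOwner, ClusterSingleton newOwner,
                       AtomicInteger running) {
        newOwner.start();
        int duringOverlap = running.get();
        oldOwner.stop();
        return duringOverlap;
    }
}
```

With one copy already started, `migrate` briefly has two live copies and ends with exactly one, which matches the "prefer overlap, never zero" requirement.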