MagnusE wrote:
Has anybody managed to implement a decently scalable, high-performance read/write lock based on the existing distributed locks in Coherence?
I have found some research papers on distributed read/write locks, but they all seem to build the mechanism from scratch using messages rather than assume the existence of a simpler distributed lock mechanism...

Hi Magnus,
I started to do this, not based on Coherence locks, but on Coherence cache events (to notify about ownership changes), entry processors (to make the ownership changes), map event transformers (to prevent unnecessary events going to uninterested nodes), and member listeners (to ensure locks held by departing members are dropped).
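To make the shape of that design concrete, here is a minimal plain-Java model of the push-based approach; the Coherence pieces (MapListener, MapEventTransformer, EntryProcessor) are modeled with simple interfaces, so all names here are illustrative, not Coherence API:

```java
import java.util.*;
import java.util.function.*;

class LockCache {
    private final Map<String, String> owners = new HashMap<>();            // lockId -> owner
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    // models registering a map listener with a per-key filter: only nodes
    // interested in this lockId receive ownership-change events (the role
    // the map event transformer plays in the real design)
    void listen(String lockId, Consumer<String> onOwnerChange) {
        listeners.computeIfAbsent(lockId, k -> new ArrayList<>()).add(onOwnerChange);
    }

    // models an entry processor: the ownership change happens where the
    // entry lives, and the resulting event is pushed to interested listeners
    void process(String lockId, UnaryOperator<String> change) {
        String newOwner = change.apply(owners.get(lockId));
        owners.put(lockId, newOwner);
        listeners.getOrDefault(lockId, List.of())
                 .forEach(l -> l.accept(newOwner));                        // push, not poll
    }
}
```

The point of the sketch is the push: interested nodes learn about ownership changes from events rather than by polling the lock entry.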
We can discuss that in email.
Can you share your implementation with me too? Or at least check my first attempt and let me know if you see some drawbacks.
I have a cache where the key is the 'lock id' and the value is the list of members that hold the lock, together with each member's role (reader or writer). NOTE: operations on this list (adding/removing members) are protected with Coherence locks.
A reader can get the lock if:
- The list is empty.
- All the members in the list are readers.
- Special case: the list has a writer, but that member is no longer alive in the cluster.
A writer can get the lock if:
- The list is empty.
- Special case: the list has a writer, but that member is no longer alive in the cluster (same as above).
- Special case: the list has readers, but none of those members are alive in the cluster.
- I'm using member.getUID() as the id of the member. Can I use member.getId() instead?
- To check whether a member is alive, I'm checking cluster.getMemberSet(). Is there a better way to do this?
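The grant rules above can be written down as a pure function over the holder list. This is a plain-Java model with no Coherence dependency; the names (LockEntry, Holder, Mode) are illustrative, and the aliveMembers set stands in for the UIDs returned by cluster.getMemberSet():

```java
import java.util.*;

enum Mode { READER, WRITER }

class Holder {
    final String memberUid;
    final Mode mode;
    Holder(String memberUid, Mode mode) { this.memberUid = memberUid; this.mode = mode; }
}

class LockEntry {
    final List<Holder> holders = new ArrayList<>();

    // reader rules: empty list, all readers, or every holder has departed
    boolean canRead(Set<String> aliveMembers) {
        if (holders.isEmpty()) return true;
        if (holders.stream().allMatch(h -> h.mode == Mode.READER)) return true;
        // special case: a writer holds the lock but is no longer in the cluster
        return holders.stream().noneMatch(h -> aliveMembers.contains(h.memberUid));
    }

    // writer rules: empty list, or every holder (writer or readers) has departed
    boolean canWrite(Set<String> aliveMembers) {
        if (holders.isEmpty()) return true;
        return holders.stream().noneMatch(h -> aliveMembers.contains(h.memberUid));
    }
}
```

In the real thing this check would run inside the entry processor (or under the Coherence lock) so that the decision and the list update are atomic.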
IMHO, this approach is very naive because it does not use any distributed notification to propagate changes (we pull the changes instead of pushing them).
Thanks in advance,
If you are continuously polling (I assume with an entry-processor) then:
- the rate of changes you can achieve on the same lock is going to be lower than with notifications, because you need multiple state-changing (backed-up) operations to carry out the same state transition;
- you are going to have significant latency in detecting changes.
If you don't store a queue of requestors, you won't be able to implement fair locks: a new requestor can barge in and acquire the lock ahead of already-waiting requestors if its request arrives in the window between the last owner unlocking and the next fair owner's periodic poll.
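A minimal sketch of what storing that queue buys you, in plain Java (FairLockState is an illustrative name, not Coherence API): a release hands the lock to the head of the queue, so there is no unlocked window for a newcomer to barge into.

```java
import java.util.*;

class FairLockState {
    String owner;                                   // null when unlocked
    final Deque<String> waiters = new ArrayDeque<>();

    // returns true if the requestor now owns the lock, false if it was queued
    boolean acquire(String requestor) {
        if (owner == null && waiters.isEmpty()) { owner = requestor; return true; }
        if (!waiters.contains(requestor)) waiters.addLast(requestor);
        return false;
    }

    // hands ownership directly to the longest-waiting requestor, if any
    void release() {
        owner = waiters.pollFirst();
    }
}
```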
A member can be assigned the same id as an earlier departed member once a certain amount of time has passed since that member's departure (I believe this time window is 5 minutes). So if you want to use the member id instead of the UID, you must ensure that you poll or clean up frequently enough to safely clear dead owners within that window, or you can end up with unsuspecting lock owners. Alternatively, you can use the combination of member id and member timestamp (the cluster time when the node started), which is guaranteed to distinguish different members that share the same id.
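The id+timestamp combination could look like the following; MemberKey is an illustrative class, not a Coherence one, and the fields stand in for the values you would read off the Member object:

```java
import java.util.Objects;

final class MemberKey {
    final int id;            // would come from Member.getId(); reusable after departure
    final long timestamp;    // would come from the member's cluster join time; not reused

    MemberKey(int id, long timestamp) { this.id = id; this.timestamp = timestamp; }

    // two members that happen to reuse the same id still compare as different,
    // because their join timestamps differ
    @Override public boolean equals(Object o) {
        if (!(o instanceof MemberKey)) return false;
        MemberKey m = (MemberKey) o;
        return m.id == id && m.timestamp == timestamp;
    }
    @Override public int hashCode() { return Objects.hash(id, timestamp); }
}
```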
Then finally the most important part:
If you lose a partition, you lose the current owner(s) of all locks falling into that partition, so you have to make detecting the loss of a partition part of the protocol.
To be honest, I do not remember if I ever finished that experiment, and I changed machines twice since then, so I can't guarantee I still have that code and whether it is finished at all... as far as I remember, it is not. :-)
Nowadays you can put together a fairly quick implementation leveraging atomic commits across multiple entries in the same partition, with a canary entry in each partition as a prerequisite for successfully acquiring locks (the canary entry being put there by some process which enforces, by some other means, that there are no current owners). We can discuss that via email, but expect a few days' response time :-)
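To illustrate the canary idea, here is a plain-Java model (Partition and its methods are illustrative names, not Coherence API): the acquisition is committed together with a check that the canary is present, so losing the partition — which takes the canary with it — makes subsequent acquisitions fail until recovery reinstalls the canary.

```java
import java.util.*;

class Partition {
    static final String CANARY = "__canary__";
    final Map<String, String> entries = new HashMap<>();   // lockId -> owner

    // done by a recovery process that has verified there are no current owners
    void installCanary() { entries.put(CANARY, "ok"); }

    // models an atomic multi-entry commit within one partition:
    // the canary check and the lock write succeed or fail together
    boolean tryAcquire(String lockId, String member) {
        if (!entries.containsKey(CANARY)) return false;    // partition loss detected
        if (entries.containsKey(lockId)) return false;     // already owned
        entries.put(lockId, member);
        return true;
    }

    // a lost partition loses all of its entries, canary included
    void losePartition() { entries.clear(); }
}
```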