Behavior with core Coherence cachestores:
- We have a cache backed by a Coherence database cachestore.
- Database write fails, as data is bad.
- We fix the data in the cache; because the write-behind queue coalesces per key, the corrected value replaces the bad one in the cachestore queue (writes are effectively idempotent).
- Data is written to database, happy days!
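The core cachestore behavior above can be modeled with a tiny sketch (class and method names here are illustrative, not actual Coherence APIs): the write-behind queue holds at most one pending write per key, so a corrected put replaces the queued bad value rather than lining up behind it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a coalescing write-behind queue (illustrative names only, not
// Coherence APIs): one pending write per key, so a later put for the same key
// replaces the earlier (bad) value instead of queuing behind it.
public class CoalescingWriteQueue<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();

    public synchronized void enqueue(K key, V value) {
        pending.put(key, value); // later write for same key overwrites the earlier one
    }

    public synchronized Map.Entry<K, V> poll() {
        var it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        var e = it.next();
        Map.Entry<K, V> result = Map.entry(e.getKey(), e.getValue());
        it.remove();
        return result;
    }

    public synchronized int size() {
        return pending.size();
    }
}
```

With this shape of queue, enqueuing a bad value and then a corrected value for the same key leaves only the corrected value to be written to the database.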
Behavior with push replication:
- We have a cache backed by a Push Rep Publishing cachestore which writes to coherence messaging.
- We have a database publisher attached to messaging using standard push rep pattern.
- Database write fails as data is bad, messaging holds onto bad data item.
Q: How do we fix this data item?
Putting a new value into the cache just adds it to messaging, queued behind the bad data, so we cannot correct the bad entry.
Using the push replication drain is less than ideal, as it drains all messages for that database cache publisher, and we would lose good data.
Q: Is there a way to make messaging idempotent, like core Coherence cachestores?
One simple way is to "drain" the publisher for your Database Publisher. This will remove the messages from the internal "queue" for the Database Publisher.
Alternatively you could change your Database Publisher so that it is tolerant to bad data. The data would still be in the cache, where you could fix it and then have it replicated as required.
Hope this helps.
PS: You'll soon be able to "disable" a publisher when an error occurs (instead of just suspending). This will enable you to perform any number of changes to caches without queuing up data, and only afterwards "enable" the publisher(s) again.
- As I mentioned in my post, drain is not an option, as it drains all values, not just the failed entries, and we don't want to lose data. As I understand it, all entries will queue behind this bad entry, and this will eventually result in an OOM.
- Making the Database Publisher tolerant to bad data is not really what I am after here; I don't want to write bad data to the database, I want it corrected in the Coherence cache and then written down to the database. This was a great feature in core Coherence cache stores!
Ideally we want to replace the bad entry with corrected data.
Are you saying that this will never be supported by Push Rep/Messaging?
2. How does your CacheStore implementation prevent what you're suggesting from happening?
Remember, Push Replication is driven by a Cache Store. If you know that an Entry is bad, then you can catch that in many places. In the Push Replication Cache Store, in your Database Publisher, or even, in a Publishing Transformer. ie: you could filter out the bad data before it even gets to your Database Publisher.
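For the Publishing Transformer idea, a minimal sketch might drop entries that fail a validation predicate before they ever reach the Database Publisher. The class name and `transform` signature below are assumptions for illustration, not the real Push Replication transformer interface.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical transformer: filter out entries flagged as bad by a validation
// predicate, so they never enter the Database Publisher's queue.
public class FilteringTransformer<T> {
    private final Predicate<T> isValid;

    public FilteringTransformer(Predicate<T> isValid) {
        this.isValid = isValid;
    }

    /** Returns only the entries that pass validation. */
    public List<T> transform(List<T> batch) {
        return batch.stream().filter(isValid).collect(Collectors.toList());
    }
}
```

Note that filtering an entry out means it is never published; the copy in the cache is unaffected, so it can still be fixed there and replicated on a later update.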
Brian Oliver | Architect | Oracle Coherence
We know an entry is bad if there is an exception generated on the write and the exception points to an issue with the data being wrong. In this scenario we need to correct the data in Coherence.
Of course there is also the scenario where there is a problem with the database itself, in this scenario we want to retry until the database itself is fixed (this scenario is covered well in both core Coherence cachestores and Push Rep).
Right. This is because Coherence coalesces updates to the Cache Store. In the case of Push Replication, the writes are essentially queued, so the "bad" object ends up sitting at the front of the queue.
If your Database Publisher could determine the difference between a database failure and a bad entry, then you could simply ignore the entry (thus it would be removed from the queue). I'm guessing that this isn't really an option.
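One way a publisher might attempt that distinction is by inspecting the `SQLException` subtype. This is a hedged sketch: the classifier class is hypothetical, though the `java.sql` exception types are real.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLTransientException;

// Hypothetical classifier: decide whether a failed database write should be
// retried (infrastructure problem) or dropped (the entry itself is bad).
public class FailureClassifier {
    public enum Action { RETRY, DROP_ENTRY }

    public static Action classify(SQLException e) {
        // constraint violations usually mean the data itself is bad
        if (e instanceof SQLIntegrityConstraintViolationException) {
            return Action.DROP_ENTRY;
        }
        // transient failures (connection loss, timeout) are worth retrying
        if (e instanceof SQLTransientException) {
            return Action.RETRY;
        }
        // default: assume infrastructure trouble and retry rather than lose data
        return Action.RETRY;
    }
}
```

The weakness, as noted above, is that not every driver raises a cleanly distinguishable subtype, so in practice this may not be an option.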
It sounds like we need a new mode of Push Replication. Thoughts?
Brian Oliver | Architect | Oracle Coherence
Given the current framework, something like this may work (it's a little rough, but hopefully workable):
Right now each publisher implements: "public void publishBatch(String cacheName, String publisherName, Iterator<EntryOperation> entryOperations)"
If one entry in the entryOperations batch fails to be published, the whole batch is re-queued and messaging backs up (CoherencePublishingService). Perhaps we could add some configurable customization so that only the failed entries are re-queued.
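As a sketch of that customization, the batch loop could catch per-entry failures and hand back only the failed entries for re-queuing. Here `publishOne` and the list-based batch are stand-ins for the real `EntryOperation` iterator, not the actual Push Replication API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical per-entry publishing loop: instead of re-queuing the whole
// batch when one entry fails, publish each entry individually and collect
// only the failures for re-queue.
public class PerEntryRequeue {
    /** Returns the entries that failed and should be re-queued. */
    public static <E> List<E> publishBatch(List<E> batch, Consumer<E> publishOne) {
        List<E> failed = new ArrayList<>();
        for (E op : batch) {
            try {
                publishOne.accept(op);
            } catch (RuntimeException e) {
                failed.add(op); // re-queue only this entry; the rest go through
            }
        }
        return failed;
    }
}
```

Good entries ahead of and behind a bad one are published normally; only the bad one comes back for another attempt.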
If we combine that change with a reasonable batch size and a CoalescingPublishingTransformer, then eventually failed entries would be overwritten with the corrected ones.
The only snag here is that if all entries in one batch fail, the CoalescingPublishingTransformer will not help, as the same batch would be attempted on the next run. Something to think about perhaps.
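The coalescing step itself could look something like this sketch (class and method names are illustrative, not the real CoalescingPublishingTransformer): before the next publish attempt, the re-queued failed entries are merged with any newer writes for the same keys, so a corrected value discards the bad one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the coalescing idea: merge re-queued failed entries
// with newer writes, keeping only the latest value per key, so a corrected
// entry replaces the failed one before the retry.
public class Coalescer {
    public static <K, V> List<Map.Entry<K, V>> coalesce(
            List<Map.Entry<K, V>> requeued, List<Map.Entry<K, V>> newer) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : requeued) {
            merged.put(e.getKey(), e.getValue());
        }
        for (Map.Entry<K, V> e : newer) {
            merged.put(e.getKey(), e.getValue()); // newer write wins
        }
        return List.copyOf(merged.entrySet());
    }
}
```

This also makes the snag concrete: if no corrected write arrives (the "newer" side is empty), the merged batch is identical to the failed one and the retry repeats unchanged.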