This discussion is archived
7 Replies Latest reply: Apr 19, 2011 9:54 AM by 701681

Push replication - messaging idempotent

701681 Explorer
Behavior prior to push replication:

- We have a cache backed by a Coherence database cachestore.
- Database write fails, as data is bad.
- We fix the data in the cache, and the pending entry is replaced in the cachestore queue (as writes are idempotent)
- Data is written to database, happy days!
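The coalescing behavior described above can be sketched as a toy simulation. Plain Java collections stand in for Coherence's write-behind queue here; the class and method names are illustrative, not Coherence APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative simulation (not Coherence internals): a write-behind queue
// keyed by cache key, so a corrected value REPLACES the pending bad one.
public class CoalescingQueueDemo {
    // Pending writes, keyed by cache key; put() with the same key coalesces.
    private final Map<String, String> pending = new LinkedHashMap<>();

    public void enqueue(String key, String value) {
        pending.put(key, value); // same key => old pending value is replaced
    }

    public String nextValueFor(String key) {
        return pending.get(key);
    }

    public static void main(String[] args) {
        CoalescingQueueDemo q = new CoalescingQueueDemo();
        q.enqueue("order-1", "BAD DATA");   // database write fails, entry stays queued
        q.enqueue("order-1", "FIXED DATA"); // fixing the cache replaces the pending entry
        System.out.println(q.nextValueFor("order-1")); // prints FIXED DATA
    }
}
```

Because the queue is keyed by cache key, the retry picks up the corrected value and the bad write never reaches the database.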

Behavior with push replication:
- We have a cache backed by a Push Rep Publishing cachestore which writes to coherence messaging.
- We have a database publisher attached to messaging using standard push rep pattern.
- Database write fails as data is bad, messaging holds onto bad data item.

Q: How do we fix this data item?

Putting a new value into the cache just adds it onto messaging, where it is queued behind the bad data, so we cannot correct the bad data.

Using the push replication drain is less than ideal, as it will drain all messages for that database cache publisher, and we will lose good data.
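For contrast, here is the same scenario as a toy FIFO simulation of the messaging queue (again plain Java stand-ins, not the actual Push Replication internals):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative simulation of the push replication problem: messaging is a
// FIFO queue of entry operations, so a corrected value queues BEHIND the
// bad one instead of replacing it.
public class FifoBacklogDemo {
    private final Deque<String[]> queue = new ArrayDeque<>();

    public void enqueue(String key, String value) {
        queue.addLast(new String[] { key, value }); // FIFO: always appended
    }

    public String headValue() {
        return queue.peekFirst()[1]; // the entry the publisher retries next
    }

    public static void main(String[] args) {
        FifoBacklogDemo q = new FifoBacklogDemo();
        q.enqueue("order-1", "BAD DATA");   // publish fails, stays at the head
        q.enqueue("order-1", "FIXED DATA"); // correction queues behind it
        System.out.println(q.headValue()); // prints BAD DATA
    }
}
```

The publisher keeps retrying the head of the queue, so the correction never gets a chance to be applied and the backlog grows.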

Q: Is there a way to make messaging idempotent, like core Coherence cachestores?

Cheers,
Neville.
  • 1. Re: Push replication - messaging idempotent
    Brian Oliver Explorer
    Hi Neville,

    One simple way is to "drain" the publisher for your Database Publisher. This will remove the messages from the internal "queue" for the Database Publisher.

    Alternatively you could change your Database Publisher so that it is tolerant to bad data. The data would still be in the cache, where you could fix it and then have it replicated as required.

    Hope this helps.

    -- Brian

    PS: You'll soon be able to "disable" a publisher when an error occurs (instead of just suspending it). This will enable you to make any number of changes to caches without queuing up data, and only afterwards "enable" the publisher(s) again.

    -----
    Brian Oliver | Architect | Oracle Coherence 

    Edited by: Brian Oliver on Apr 14, 2011 3:38 PM
  • 2. Re: Push replication - messaging idempotent
    701681 Explorer
    Cheers Brian,

    - As I mentioned in my post, drain is not an option: it drains all values, not just the failed entries, and we don't want to lose data. As I understand it, all entries will queue behind this bad entry, and this will eventually result in an OOM.

    - Making the Database Publisher tolerant of bad data is not really what I am after here. I don't want to write bad data to the database; I want it to be corrected in the Coherence cache and then written down to the database. This was a great feature in core Coherence cache stores!

    Ideally we want to replace the bad entry with corrected data.

    Are you saying that this will never be supported by Push Rep/Messaging?
  • 3. Re: Push replication - messaging idempotent
    Brian Oliver Explorer
    Hi Neville,

    A few quick questions.

    1. How do you know that one entry is bad?

    2. How does your CacheStore implementation prevent what you're suggesting from happening?

    Remember, Push Replication is driven by a Cache Store. If you know that an Entry is bad, then you can catch that in many places: in the Push Replication Cache Store, in your Database Publisher, or even in a Publishing Transformer, i.e. you could filter out the bad data before it even gets to your Database Publisher.
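    As a rough illustration of the "filter before the publisher" idea, the transformer could look something like this (EntryOp and the filter method are minimal stand-ins for the framework's EntryOperation and transformer types, not the real Push Replication API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: drop operations a validity check rejects, so the
// Database Publisher never sees bad data. EntryOp is a stand-in type.
public class FilteringTransformerDemo {
    public static final class EntryOp {
        public final String key;
        public final String value;
        public EntryOp(String key, String value) { this.key = key; this.value = value; }
    }

    // Keep only operations the predicate accepts.
    public static List<EntryOp> filter(Iterator<EntryOp> ops, Predicate<EntryOp> isValid) {
        List<EntryOp> accepted = new ArrayList<>();
        while (ops.hasNext()) {
            EntryOp op = ops.next();
            if (isValid.test(op)) accepted.add(op); // bad entries are silently dropped
        }
        return accepted;
    }
}
```

    Note that this only works if bad data can be recognized up front; it does not help when badness is only discovered at database-write time.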

    -- Brian
    -----
    Brian Oliver | Architect | Oracle Coherence 
  • 4. Re: Push replication - messaging idempotent
    701681 Explorer
    We know an entry is bad if there is an exception generated on write.

    In core Coherence, the cachestore would just periodically retry the write, and if we fixed the data in the cache, the write would successfully write through to the database.

    With push rep, messaging starts to backup, and we have no way to correct the bad data.

    Cheers,
    Neville.
  • 5. Re: Push replication - messaging idempotent
    701681 Explorer
    To be more specific:

    We know an entry is bad if there is an exception generated on the write and the exception points to an issue with the data being wrong. In this scenario we need to correct the data in Coherence.

    Of course there is also the scenario where there is a problem with the database itself, in this scenario we want to retry until the database itself is fixed (this scenario is covered well in both core Coherence cachestores and Push Rep).
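    One way a Database Publisher might tell these two scenarios apart is by classifying the SQLException it gets back. A sketch only, since drivers vary in which subclasses they actually throw:

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLTransientException;

// Sketch: distinguish "bad data" (fix the cache entry, drop the write) from
// "database problem" (keep retrying until the database is fixed).
public class FailureClassifier {
    public enum Failure { BAD_DATA, RETRYABLE }

    public static Failure classify(SQLException e) {
        // Constraint violations usually mean the data itself is wrong.
        if (e instanceof SQLIntegrityConstraintViolationException) {
            return Failure.BAD_DATA;
        }
        // Transient errors (connection loss, timeouts) are worth retrying.
        if (e instanceof SQLTransientException) {
            return Failure.RETRYABLE;
        }
        // Default conservatively to retrying so no data is dropped.
        return Failure.RETRYABLE;
    }
}
```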
  • 6. Re: Push replication - messaging idempotent
    Brian Oliver Explorer
    Hi Neville,

    Right. This is because Coherence coalesces updates to the Cache Store. In the case of Push Replication, the writes are essentially queued, so the "bad" object ends up sitting at the front of the queue.

    If your Database Publisher could determine the difference between a database failure and a bad entry, then you could simply ignore the entry (and thus it would be removed from the queue). I'm guessing that this isn't really an option.

    It sounds like we need a new mode of Push Replication. Thoughts?

    -- Brian
    -----
    Brian Oliver | Architect | Oracle Coherence 

  • 7. Re: Push replication - messaging idempotent
    701681 Explorer
    Given the current framework, something like this may work (it's a little rough, but hopefully workable):

    Step 1:

    Right now each publisher implements: "public void publishBatch(String cacheName, String publisherName, Iterator<EntryOperation> entryOperations)"

    If one entry in the entryOperations batch fails to be published, the whole batch is re-queued and messaging backs up (CoherencePublishingService). Perhaps we can add some configurable customization so that only the failed entries are re-queued.

    Step 2:

    If we combine Step 1: with a reasonable batch size and a CoalescingPublishingTransformer, then eventually failed entries would be overwritten with the corrected ones.

    The only snag here is that if *all* entries in one batch fail, then the CoalescingPublishingTransformer will not help, as the same batch would be attempted on the next run. Something to think about, perhaps.
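    A rough sketch of Step 1 might look like this. PublishFn and the plain String "operations" are stand-ins for the framework's EntryOperation and publisher types, not the actual Push Replication API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: re-queue only the entries that fail to publish, instead of the
// whole batch, so good entries drain and failures can later be coalesced
// with corrected values.
public class PartialRequeueDemo {
    public interface PublishFn { void publish(String op) throws Exception; }

    // Returns only the operations that must be re-queued (the failures).
    public static List<String> publishBatch(Iterator<String> ops, PublishFn fn) {
        List<String> failed = new ArrayList<>();
        while (ops.hasNext()) {
            String op = ops.next();
            try {
                fn.publish(op);
            } catch (Exception e) {
                failed.add(op); // keep only the failed entry for retry
            }
        }
        return failed;
    }
}
```

    Combined with a coalescing transformer, the re-queued failures would eventually be replaced by corrected values, as Step 2 describes.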

    Any thoughts or other ideas?
