4 Replies. Latest reply: Feb 8, 2013 11:27 AM by 572824

Elastic Data and Eviction

572824 Newbie
I understand that Eviction is not supported by ramjournal-scheme or flashjournal-scheme.

We have a cache that is currently ~300GB in total heap across the cluster and we add several hundred thousand new entries every day averaging ~16KB per entry. We currently use local-scheme's eviction policy to control the size of the cache.

We would like to use Elastic Data to grow the cache to a couple of terabytes but since eviction is not supported we would need another way to keep the size of the cache constrained. I have been pondering running a weekly job to purge old entries but I'm not sure how to go about it.

Our entry keys encode the data we need in order to decide which entries should be evicted. Would it be reasonable to call keySet with a filter that would select those keys which should be evicted? The total number of entries would be on the order of 100 million and the matching key set, if run weekly, would be ~2 million. The javadoc says the resulting set may not be backed by the Map so calling clear() won't do the trick. Also, the NamedCache doesn't seem to have a removeAll() method so that's out.

Any advice would be greatly appreciated.
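
For scale, the figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it counts entry payload at the stated ~16KB average and ignores keys, backup copies, and journal overhead, all of which are assumptions of the sketch rather than anything stated in the post. The class and method names are made up for illustration.

```java
// Back-of-the-envelope sizing from the figures quoted in the question.
// Payload only: keys, backups and journal overhead are deliberately ignored.
final class PurgeSizing {
    static final long ENTRY_BYTES = 16L * 1024;      // ~16KB average entry
    static final long TOTAL_ENTRIES = 100_000_000L;  // ~100 million entries
    static final long WEEKLY_PURGE = 2_000_000L;     // ~2 million keys per weekly run

    // Total payload held by the cache at the target size.
    static long totalBytes() {
        return TOTAL_ENTRIES * ENTRY_BYTES;
    }

    // Payload released by one weekly purge.
    static long weeklyPurgeBytes() {
        return WEEKLY_PURGE * ENTRY_BYTES;
    }

    // Average number of entries per day implied by the weekly purge size.
    static long purgedPerDay() {
        return WEEKLY_PURGE / 7;
    }

    public static void main(String[] args) {
        System.out.printf("total payload : %.2f TiB%n", totalBytes() / Math.pow(1024, 4));
        System.out.printf("weekly purge  : %.2f GiB%n", weeklyPurgeBytes() / Math.pow(1024, 3));
        System.out.printf("purged per day: %d entries%n", purgedPerDay());
    }
}
```

The weekly purge size divided by seven lands in the "several hundred thousand entries per day" range, so a weekly job removing ~2 million keys roughly balances the stated insert rate.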
  • 1. Re: Elastic Data and Eviction
    robvarga Oracle ACE
    Hi Larkin,

    keySet().removeAll() is what you are looking for if you want the equivalent of map.removeAll(). It has been stated several times in the past that keySet() is lazily evaluated, so that keySet().remove() and keySet().removeAll() can serve as cheaper, blind versions of remove/removeAll.
    The unfiltered keySet() is lazily backed by the named cache; keySet(Filter) is not backed by the cache.

    On the other hand, if you already know which entries you want to remove, why not just invoke an entry-processor with a filter that selects the keys you want removed?

    Removing millions of entries all at once may be problematic because of the volume of events and backup traffic it generates. It is easy to remove in smaller chunks instead: write a filter that selects at most a certain number of entries by retaining no more than that number of keys in the keys parameter of its applyIndex method (applyIndex() is usable for key-only filtering even if you don't have indexes).
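
The chunking idea can be sketched without any Coherence classes. The example below is a plain-JDK stand-in, not the Coherence API: a ConcurrentHashMap plays the role of the cache, a Predicate plays the role of the key-only filter, and each pass removes at most chunkSize matching keys. The key layout ("yyyyMMdd:id") and all names (ChunkedPurge, purgeInChunks, olderThan) are hypothetical. In a real cluster the collect-and-remove step would presumably be an entry-processor invocation with a key-limiting filter as described above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-JDK illustration of purging in bounded chunks; not Coherence code.
final class ChunkedPurge {

    // Removes every key accepted by shouldEvict, at most chunkSize keys per pass.
    // Returns the total number of keys selected for removal.
    static <K, V> int purgeInChunks(Map<K, V> cache,
                                    Predicate<? super K> shouldEvict,
                                    int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int removed = 0;
        while (true) {
            // Key-only selection: values are never read while choosing victims.
            List<K> chunk = cache.keySet().stream()
                    .filter(shouldEvict)
                    .limit(chunkSize)
                    .collect(Collectors.toList());
            if (chunk.isEmpty()) {
                return removed;
            }
            // Blind removal through the key set, one bounded batch at a time.
            cache.keySet().removeAll(chunk);
            removed += chunk.size();
        }
    }

    // Hypothetical key layout "yyyyMMdd:id": evict keys dated before the cutoff.
    static Predicate<String> olderThan(String cutoffYyyyMmDd) {
        return key -> key.substring(0, 8).compareTo(cutoffYyyyMmDd) < 0;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) cache.put("20130101:" + i, new byte[16]);
        for (int i = 0; i < 5; i++) cache.put("20130201:" + i, new byte[16]);

        int removed = purgeInChunks(cache, olderThan("20130115"), 3);
        System.out.println("removed=" + removed + " remaining=" + cache.size());
    }
}
```

A pause between passes (not shown) would be one way to spread the event and backup traffic over time; whether that is needed depends on the cluster.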

    Best regards,

    Robert
  • 2. Re: Elastic Data and Eviction
    572824 Newbie
    Thanks, Robert.

    An entry-processor definitely seems like the way to go.

    It appears that all caches using ramjournal-scheme or flashjournal-scheme will use the same pool of RAM and flash since ramjournal-manager and flashjournal-manager appear to be global. Is that correct? If so, then the total space available to all caches backed by the node is RAM+flash and that space needs to be large enough to contain all bounded caches and all unbounded caches must be actively scrubbed to fit within the remaining space. Is that accurate?

    Finally, when the RAM area fills, which entries are demoted to flash? Is it the newly added/updated entry or an older entry? I suppose the question I'm really asking is if the RAM favors "hot" data and if not then are there any mechanisms we can use to force hot entries into RAM.
  • 3. Re: Elastic Data and Eviction
    HarveyRaja Explorer
    >
    It appears that all caches using ramjournal-scheme or flashjournal-scheme will use the same pool of RAM and flash since ramjournal-manager and flashjournal-manager appear to be global. Is that correct? If so, then the total space available to all caches backed by the node is RAM+flash and that space needs to be large enough to contain all bounded caches and all unbounded caches must be actively scrubbed to fit within the remaining space. Is that accurate?
    >

    Yup.

    >
    Finally, when the RAM area fills, which entries are demoted to flash? Is it the newly added/updated entry or an older entry? I suppose the question I'm really asking is if the RAM favors "hot" data and if not then are there any mechanisms we can use to force hot entries into RAM.
    >

    Once the RAM journal has been filled, ED overflows to the Flash journal, so all future writes are stored in the Flash journal. No demotion or promotion takes place based on statistics such as MFU. It is an interesting idea, though one with some complications; we shall consider it a little more.
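
As a toy illustration of the placement rule just described (and nothing more), the model below sends writes to RAM until the RAM budget is exhausted, then sends every later write to flash, and never moves an entry on read. It is not Elastic Data code: the class name, the byte-budget accounting, and the absence of any space reclamation are simplifying assumptions of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "fill RAM first, then overflow to flash, no promotion".
// Not Elastic Data internals; space is never reclaimed in this model.
final class JournalPlacementModel {
    enum Tier { RAM, FLASH }

    private final long ramCapacityBytes;
    private long ramUsedBytes;
    private boolean overflowed;  // once set, every later write goes to flash
    private final Map<String, Tier> placement = new HashMap<>();

    JournalPlacementModel(long ramCapacityBytes) {
        this.ramCapacityBytes = ramCapacityBytes;
    }

    // Records where a write of the given size lands.
    Tier write(String key, long sizeBytes) {
        if (!overflowed && ramUsedBytes + sizeBytes <= ramCapacityBytes) {
            ramUsedBytes += sizeBytes;
            placement.put(key, Tier.RAM);
            return Tier.RAM;
        }
        overflowed = true;
        placement.put(key, Tier.FLASH);
        return Tier.FLASH;
    }

    // Reads report the tier but never change it: hot entries are not promoted.
    Tier read(String key) {
        return placement.get(key);
    }

    public static void main(String[] args) {
        JournalPlacementModel model = new JournalPlacementModel(32 * 1024);
        System.out.println("a -> " + model.write("a", 16 * 1024));
        System.out.println("b -> " + model.write("b", 16 * 1024));
        System.out.println("c -> " + model.write("c", 16 * 1024));
        for (int i = 0; i < 1000; i++) model.read("c");  // heavy reads change nothing
        System.out.println("c after many reads -> " + model.read("c"));
    }
}
```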

    Thanks,
    Harvey
  • 4. Re: Elastic Data and Eviction
    572824 Newbie
    Thanks for the clarification. It would seem that it might be sufficient to run a flash journal and leave enough system memory for the OS to cache hot disk blocks.

    IIRC, flashjournal-scheme still stores all keys in RAM, so I assume that if I run an entry-processor with a KeyFilter as Robert described above, the flash journal disk blocks won't be touched. If walking all the keys to do eviction does cause journal reads, the OS will end up polluting its disk block cache with a lot of cold data.

    Could a near cache be used? We already use near-scheme for the client side but could the storage nodes also use a near-scheme where the front is a limited local-scheme and the back is a flashjournal-scheme?
