5 Replies Latest reply: Jun 13, 2012 10:36 PM by 943066

Cache & object retrieval performance

943066 Newbie
Hi all,

I have a question regarding query performance, caching and deserialization.

I have a DB of about 1.8GB, 11M records - I run with transactions off, deferred writes enabled, Xmx5300m and cache percent of 60. Writes are very fast, which is excellent.
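
For reference, the environment and database are opened roughly like this (a simplified sketch, not my exact code; paths and names are illustrative):

    // com.sleepycat.je.*
    EnvironmentConfig envConfig = new EnvironmentConfig();
    envConfig.setAllowCreate(true);
    envConfig.setTransactional(false);   // transactions off
    envConfig.setCachePercent(60);       // ~60% of the 5300m heap

    Environment env = new Environment(new File("/path/to/env"), envConfig);

    DatabaseConfig dbConfig = new DatabaseConfig();
    dbConfig.setAllowCreate(true);
    dbConfig.setDeferredWrite(true);     // deferred writes enabled
    Database db = env.openDatabase(null, "records", dbConfig);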

I use string-based keys and have a fairly simple object as the value (a few ints and longs inside it).

I need to execute queries that are a series of range scans returning quite large numbers of results (~250,000 or so).

With a fairly naive implementation, returning 250K results was generally taking about 2200ms, even when the same query was run repeatedly.
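
The naive version is essentially a cursor walk over each range, along these lines (simplified; compare() and deserialize() are placeholders for my key comparison and value deserialization):

    DatabaseEntry key = new DatabaseEntry(lowKeyBytes);
    DatabaseEntry data = new DatabaseEntry();
    List<MyValue> results = new ArrayList<MyValue>();

    Cursor cursor = db.openCursor(null, null);
    try {
        // position at the first key >= lowKeyBytes, then walk forward
        OperationStatus status = cursor.getSearchKeyRange(key, data, LockMode.DEFAULT);
        while (status == OperationStatus.SUCCESS
                && compare(key.getData(), highKeyBytes) <= 0) {
            results.add(deserialize(data.getData()));   // deserialization cost is here
            status = cursor.getNext(key, data, LockMode.DEFAULT);
        }
    } finally {
        cursor.close();
    }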

I ran some profiling, and it appeared that a lot of the time was spent in deserialization, so I implemented a Protostuff-based (de)serializer (http://code.google.com/p/protostuff/wiki/ProtostuffRuntime), which I have used before and which is wonderfully fast.
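
The (de)serializer boils down to roughly this (a sketch of the Protostuff runtime usage; MyValue stands in for my actual value class):

    Schema<MyValue> schema = RuntimeSchema.getSchema(MyValue.class);

    // serialize
    LinkedBuffer buffer = LinkedBuffer.allocate(256);
    byte[] bytes = ProtostuffIOUtil.toByteArray(value, schema, buffer);
    buffer.clear();

    // deserialize
    MyValue copy = schema.newMessage();
    ProtostuffIOUtil.mergeFrom(bytes, copy, schema);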

This sped things up quite a bit, but queries are still around the 1400ms mark. I realise that the observed deserialization bottleneck was possibly a bit of a red herring, as any time spent waiting on disk ends up being attributed to the byte-deserialization methods.

I was really hoping that cached results would be a lot faster than this, on the assumption that BDB JE's caching layer keeps deserialized objects in memory, so that no deserialization would occur when cached results are requested.

Is this the case (or is the cache more a cache of disk blocks)? And if so, am I doing something wrong here that is causing things to be slower than they should be?

I ran with stats spitting out periodically, shown here: https://gist.github.com/869edc4c00b0d9664e13 (two dumps, several minutes apart with benchmarking running). The operations running during this dump are a series of identical queries executed continuously in a loop.

From what I can gather, the cache misses only increase by ~100,000 over several minutes, which is very low considering every loop of queries returns ~250,000 objects (i.e. I would have retrieved >20,000,000 objects during the time between stats dumps).

Any insights as to how BDB JE's cache behaves (and whether it appears to be working) would be most appreciated.

Cheers/thanks!

fb.
  • 1. Re: Cache & object retrieval performance
    Charles Lamb Pro
    943066 wrote:
    I ran with stats spitting out periodically, shown here: https://gist.github.com/869edc4c00b0d9664e13 (two dumps, several minutes apart with benchmarking running). The operations running during this dump are a series of identical queries executed continuously in a loop.
    Have you run DbCacheSize? If so, what does it say about how much memory you need for fitting the leaf nodes into memory? It appears that most of your cache misses are Leaf Nodes. It also looks like most of the IO is sequential reads (as opposed to random reads).
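
    If you haven't run it yet, it ships in je.jar and the invocation is roughly as follows (flags may vary slightly by JE version; plug in your own average key and data sizes):

        java -cp je.jar com.sleepycat.je.util.DbCacheSize -records 11000000 -key 32 -data 64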

    Do a few random stack traces indicate IO is happening when the slowness appears?

    Charles Lamb
  • 2. Re: Cache & object retrieval performance
    greybird Expert
    >
    I was really hoping that cached results would be a lot faster than this, on the assumption that BDB JE's caching layer keeps deserialized objects in memory, so that no deserialization would occur when cached results are requested.
    >

    No, JE does not cache deserialized objects; it caches internal metadata, and for the user's records it caches the bytes read from disk.

    Have you looked at Java GC to see if it is contributing to performance? When iterating over records sequentially, the JE cache can be counterproductive if it is large enough to hold the leaf nodes (LNs or record data). Each LN is accessed only once during a sequential scan, but it is kept in the JE cache until it is aged out (using an LRU algorithm). These objects can become 'old generation' prior to being evicted, and costly to GC after they are evicted. This may be a case where it is best not to cache LNs, which can be done using CacheMode.EVICT_LN.
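
    A rough sketch of requesting this for the scan, e.g. per cursor (it can also be set on the DatabaseConfig):

        Cursor cursor = db.openCursor(null, null);
        cursor.setCacheMode(CacheMode.EVICT_LN);
        // ... run the range scan; record data (LNs) is not retained in the JE cache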

    For some related info see:
    http://www.oracle.com/technetwork/database/berkeleydb/je-faq-096044.html#WhyshouldtheJEcachebelargeenoughtoholdtheBtreeinternalnodes

    --mark
  • 3. Re: Cache & object retrieval performance
    943066 Newbie
    Thanks for the fantastic answers -

    GC doesn't appear to be a major problem (at least in my tests). The core challenge looks like it's going to be deserialization speeds, regardless of whether the bytes are read from disk or cache.

    Would it be possible (and sane) to consider an object cache using something like a WeakHashMap to cache keys against their deserialized object values?

    To do this I would need to be able to traverse keys independently of the DatabaseEntry values (to avoid deserialization if the object is in cache).

    I'd also hit the issue of the JE byte cache fighting with the object cache (and GC), which could be challenging.

    Is this something that sounds potentially doable or am I venturing into untested waters?

    Cheers!

    fb.
  • 4. Re: Cache & object retrieval performance
    greybird Expert
    First off, I suggest that you do enough testing and profiling, with a system and data set comparable to what you intend to use in production, to be sure that deserialization is truly as large a factor as you believe. This is rarely the case. With data sets of significant size, IO is almost always the main factor.
    >
    To do this I would need to be able to traverse keys independently of the DatabaseEntry values (to avoid deserialization if the object is in cache).
    That's not a problem, as you can call DatabaseEntry.setPartial(0, 0, true) and pass that object as the 'data' parameter. This tells JE not to fetch or return the data portion of the record. That way, you can fetch the data from JE only when you need it, i.e., when you don't find it in your own cache.
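
    Roughly like this (an untested sketch; appCache, decodeKey and deserialize are placeholders for your own cache and codec):

        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry noData = new DatabaseEntry();
        noData.setPartial(0, 0, true);   // don't fetch or return the data

        Cursor cursor = db.openCursor(null, null);
        try {
            while (cursor.getNext(key, noData, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                String k = decodeKey(key.getData());
                MyValue v = appCache.get(k);
                if (v == null) {
                    // miss in the application cache: fetch the data for this record only
                    DatabaseEntry data = new DatabaseEntry();
                    if (cursor.getCurrent(key, data, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                        v = deserialize(data.getData());
                        appCache.put(k, v);
                    }
                }
                // use v ...
            }
        } finally {
            cursor.close();
        }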

    Also, with JE 5 you can perform transactions without having to fetch the data into cache. (Pre-JE 5, the data had to be fetched into cache in order to lock the record.)

    You'll have to work out an algorithm for ensuring that access to the cache is transactionally correct. For example, you'll probably want to have a read or write lock on the JE record while you access or update your cache.
    >
    I'd also hit the issue of the JE byte cache fighting with the object cache (and GC), which could be challenging.
    By using CacheMode.EVICT_LN, there won't be as much overlap between the caches -- JE will not cache the record data, so only the keys will be in common. However, if you don't have enough memory to keep all keys in the JE cache, performance will suffer, as described in the FAQ link I sent you earlier.

    Therefore, you should size the JE cache large enough to hold all BINs (as described in the FAQ), and use remaining JVM memory for your own cache. Normally it is wise to leave much of the machine's memory to the file system cache, but in this case you'll have to decide whether it's more effective to allocate this as JVM memory and use it for your own cache.
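
    For example, if DbCacheSize reports that the internal nodes need roughly 1.5GB, you could size the cache explicitly instead of using a percentage (the number here is only illustrative):

        envConfig.setCacheSize(1500L * 1024 * 1024);   // bytes
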
    >
    Is this something that sounds potentially doable or am I venturing into untested waters?
    Both. It is venturing into untested waters, but I think it may be doable.

    --mark
  • 5. Re: Cache & object retrieval performance
    943066 Newbie
    This is likely the best/fastest support I've ever gotten for something like this :) Thank you so much.
    greybird wrote:
    First off, I suggest that you do enough testing and profiling, with a system and data set comparable to what you intend to use in production, to be sure that deserialization is truly as large a factor as you believe. This is rarely the case. With data sets of significant size, IO is almost always the main factor.
    In my case, the total data set size is not that big (~2GB); it's more a question of how much of it I need to "touch" regularly. I considered using something like an in-memory TreeMap for the whole thing, but you still incur the problem of persistence (and the fact that it won't always fit entirely in memory).

    I'm fairly certain in my case that deserialization is the bottleneck, but I will do further tests to confirm.

    >
    You can call DatabaseEntry.setPartial(0, 0, true) and pass that object as the 'data' parameter. This tells JE not to fetch or return the data portion of the record. That way, you can fetch the data from JE only when you need it, i.e., when you don't find it in your own cache.
    Oh, that sounds like precisely what I'm after, thank you.

    >
    Also, with JE 5 you can perform transactions without having to fetch the data into cache. (Pre-JE 5, the data had to be fetched into cache in order to lock the record.)

    You'll have to work out an algorithm for ensuring that access to the cache is transactionally correct. For example, you'll probably want to have a read or write lock on the JE record while you access or update your cache.
    I'm currently running with transactions off, which neatly side-steps the need to worry too much about coherence and isolation within transactions... I think...
    >
    By using CacheMode.EVICT_LN, there won't be as much overlap between the caches -- JE will not cache the record data, so only the keys will be in common. However, if you don't have enough memory to keep all keys in the JE cache, performance will suffer, as described in the FAQ link I sent you earlier.
    Perfect, I was hoping EVICT_LN did something like that - I'm expecting my OS cache to provide suitable caching for the data; with the size of my dataset I believe that should be feasible.

    >
    Therefore, you should size the JE cache large enough to hold all BINs (as described in the FAQ), and use remaining JVM memory for your own cache. Normally it is wise to leave much of the machine's memory to the file system cache, but in this case you'll have to decide whether it's more effective to allocate this as JVM memory and use it for your own cache.
    Bingo. I suspect I need to do some work in reducing my key sizes; currently they're fairly naive and I can do much better in terms of packing them into smaller byte[]s, whilst still maintaining meaningful ordering.
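
    For example (a sketch of what I have in mind, assuming the keys can be reduced to fixed-width numeric fields), packing an (int, long) pair big-endian with the sign bits flipped keeps JE's default unsigned lexicographic byte comparison consistent with numeric order:

        byte[] packKey(int a, long b) {
            ByteBuffer buf = ByteBuffer.allocate(12);    // big-endian by default
            buf.putInt(a ^ 0x80000000);                  // flip the sign bit
            buf.putLong(b ^ 0x8000000000000000L);
            return buf.array();
        }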

    >
    Is this something that sounds potentially doable or am I venturing into untested waters?
    Both. It is venturing into untested waters, but I think it may be doable.
    Thanks so much for your help, Mark, I really appreciate it.

    I'll let you know how I go, or, more likely, when I hit my first question :)

    Cheers,

    fb.
