8 Replies Latest reply: Apr 16, 2012 6:45 AM by robvarga RSS

    Detail questions about indexes?

    MagnusE
      We are trying to reduce coherence index heap use in our application using partitioned cache (we use near but it is the partitioned back tier that holds the indexes) and therefore needs to better understandand some details (we are right now using version 3.6 but will eventually change to 3.7 so if any of the answer are different for the two please indivcate that):

      1. Are any indexes (or some part of them) maintained per partition (would be my guess since it is possible to efficiently apply PartitionFilter) or are they all per node?
      2. Are any part of the indexes in binary form or are both the keys and values in the form of "java objects"?

      Best Regards
      Magnus
        • 1. Re: Detail questions about indexes?
          alexey.ragozin
          Hi.
          1. Are any indexes (or some part of them) maintained per partition (would be my guess since it is possible to efficiently apply PartitionFilter) or are they all per node?
          Secondary indexes are one per node, but Coherence internally maintains "key index" which in particular used for partition based filtering AFAIK. (I did presentation on one of Coherence SIGs, here is slide deck http://blog.ragozin.info/2011/11/coherence-sig-advanced-usage-of-indexes.html).
          2. Are any part of the indexes in binary form or are both the keys and values in the form of "java objects"?
          Indexed attributes are stored as java objects in heap, but keys are binary and shared with backing map (e.i. indexes does not create extra copy of key).

          Few year ago I did an analysis of index heap footprint you can find it here http://blog.griddynamics.com/2009/10/coherence-memory-usage-indexes.html but they relevant to 3.5. AFAIK there were some improvements in 3.6. Index overhead may also depend on selectivity.

          Regards,
          Alexey
          • 2. Re: Detail questions about indexes?
            MagnusE
            Thanks for the info.

            I got a bit confused about your description of the forward and reverse indexes - I would have assumed that what is referred to as "forward index" would map keys to values provided by the extractor and the "reverse index" map values returned by extractors to possibly multiple keys that have extractors returning this value but it seems like it is the other way around or?!

            In our case the indexes seem to occupy a lot more heap than the data itself(!!!). It is in particular an index containing information about what cache objects that reference what other cache objects (we use this to answer "where used" questions i.e. what objects contains references to object X) that seem to occupy LOTS of space...

            Do you think that your analysis can be easily extended to extractors returning a collection (as is the case for our "gigantic" index) or do they require more investigation?

            With 3.6.1.4 we have done some preliminary tests with different number of partitions and it seems (not fully verified) that it makes a significant difference (fewer partitions = lower index size) so maybe they something has changed from 3.5 (with your analysis I do not see any indication that number of partitions should affect index size or have I missed something to that effect)...

            It is a pity that Oracle don't provide this kind of technical information in the documentation - for a this kind of product it is often essential for developers / architects to have as detailed info as possible...

            Best Regards
            Magnus
            • 3. Re: Detail questions about indexes?
              robvarga
              MagnusE wrote:
              Thanks for the info.

              I got a bit confused about your description of the forward and reverse indexes - I would have assumed that what is referred to as "forward index" would map keys to values provided by the extractor and the "reverse index" map values returned by extractors to possibly multiple keys that have extractors returning this value but it seems like it is the other way around or?!
              Hi Magnus,

              what you wrote first is correct.

              The forward index is a map of internal (binary) keys to extracted values. This has the could be stored in the backing map entry, but I don't think it is done as that would make plugging in index implementation with IndexAwareExtractor much more problematic. Too bad, as it would have been a nice optimization.

              The reverse index is a map of extracted values as keys to values of a collection or array (logically a set) of internal (binary) keys of entries the extracted value was extracted from.

              So if you have a NamedCache<K,V> and extracted value type is X, then forward index is Map<Binary<K>,X> and reverse index is a Map<X,Set<Binary<K>>> for unsorted indexes and SortedMap<X,Set<Binary<K>>> in case of sorted indexes.

              The map and set implementations may differ from implementation version to implementation version.


              I don't know what you refer to which would describe it the other way round.

              In our case the indexes seem to occupy a lot more heap than the data itself(!!!).
              That is quite conceivable if you have attributes which can take lots of different values, as in that case the reverse index will also have many entries, each entry value being a set which is not exactly a lean thing either.
              It is in particular an index containing information about what cache objects that reference what other cache objects (we use this to answer "where used" questions i.e. what objects contains references to object X) that seem to occupy LOTS of space...
              Again, that is lots of different values thus lots of entries in reverse index.
              Do you think that your analysis can be easily extended to extractors returning a collection (as is the case for our "gigantic" index) or do they require more investigation?
              A collection is just a value. If two collections do not equal by the equals() method, then they will have different reverse index entries. It is quite feasible to implement a forward-only index in case you don't need to filter for equality, but you just want to avoid deserialization.


              >
              With 3.6.1.4 we have done some preliminary tests with different number of partitions and it seems (not fully verified) that it makes a significant difference (fewer partitions = lower index size) so maybe they something has changed from 3.5 (with your analysis I do not see any indication that number of partitions should affect index size or have I missed something to that effect)...
              Number of partitions do not affect index size at the moment, indexes are not partitioned in 3.6. (Probably not even in 3.7). The only case when they would influence index size would be if you had partitioned reverse index (and that would make reverse index footprint possibly worse, but definitely not better) analogously to having more nodes causes higher total index footprint.


              It is a pity that Oracle don't provide this kind of technical information in the documentation - for a this kind of product it is often essential for developers / architects to have as detailed info as possible...
              I don't see what documentation you are really missing. Coherence indexes are instances of SimpleMapIndex which class has a Javadoc. Alexey and I have several times described the structure of indexes on the forum and in the blog. The Coherence book quite aptly describes it. All those descriptions are still valid as index strutures have not significantly changed since 3.0 except that starting with 3.6 you can plug in your own implementation of MapIndex instead of SimpleMapIndex. There was an optimization around 3.5-3.6 time (as far as I remember in one of the 3.5.x patch releases and 3.6) which ensured that you don't have multiple references to the equaling extracted objects (after a request Alexey submitted after one of his blog entries describing index usage), but really that is all, and even that was not a structural change. Semantically, having sparse indexes and forward-only indexes became possible with 3.6 (by plugging in your own implementation), and I don't remember which version introduced not indexing null values which cuts down on size a bit if nulls are frequent and not interesting.

              Best regards,

              Robert
              • 4. Re: Detail questions about indexes?
                MagnusE
                I read the following in the linked page (index size calculation):

                "Internally the index is constructed from 2 maps:
                Forward map: maps indexed attribute value to a set of cache entries (I suspect that reference to the key is actually stored, but I am not sure);
                Reverse map: maps cache entry (or its key) to the value of the indexed attribute for this entry."

                that at least for me sounds like the other way around but then English is not my native language (I am from Sweden so we talk like the chef in the muppets - no that is actually Danish if you ask me - well you get the picture - I may have read it wrong) :-)

                As for documentation I was thinking of some formulas for index footprint (with some examples perhaps) or at least some UML (or less formal) pictures and architecture level documentation?! Not even important details (for instance what objects that are shared in the indexes) is easy (or even possible?) to read from JavaDoc, at least not unless you are VERY familiar with it...

                If documented source code for Coherence was available the JavaDoc would be of more value but since it is closed source this is not the case. As it is now the lack of typed collections and the fact that it is not clear (to mortals or at least me!) what you can cast some data structures to often makes it hard to figure things out using only JavaDoc ...

                At least in my projects I try to write a bit more "why" and "how" on an architectural & design level than what the Coherence Doc contain today...

                /Magnus

                Edited by: MagnusE on Apr 16, 2012 12:11 PM
                • 5. Re: Detail questions about indexes?
                  alexey.ragozin
                  Hi,

                  First, I agree with forward/reverse map definitions on this thread (I have have swapped names somewhere, let me know I will fix it).

                  Normally collections are not treated as single values in reverse map (except for MultiExtractor). Instead, Coherence will create a reverse map entry for each element of collection. This way Contains* filter could also use indexes. This is handy but not intuitive (you can find funny example of consequences by http://blog.griddynamics.com/2010/10/coherence-magic-trick-with-cache-index.html).

                  I haven't account collection indexes in my memory footprint analysis. Variety of possible collection sizes, selectivity and other factors would make "general case" analysis to vague.
                  But it should fairy easy to correlate heap histogram with you cache content (I you could isolate caches from each other).

                  BTW you can disable forward map using ConditionalIndexExtractor class, it may free some memory (though it is hard to say how much).

                  Regards,
                  Alexey
                  • 6. Re: Detail questions about indexes?
                    MagnusE
                    I would have thought (then I am not sure I have read the JavaDoc in detail...) that you could block an entry from contributing to BOTH maps in the index using ConditionalIndexExtractor not only the "forward" map (which ever of them you are refering to now :-))...

                    /Magnus
                    • 7. Re: Detail questions about indexes?
                      alexey.ragozin
                      Hi Magnus,
                      ValueExtractor indexExtractor = new ConditionalExtractor(AlwaysFilter.INSTANCE, extractor, false);
                      Index created by such extractor will have only reverse map (index all your cache objects). Forward map will not be created in such index. Forward map can be used by aggregators but is not used for filters.

                      PS I was wrong, it was 2.5 years ago, can we pass on :)

                      --
                      Alexey
                      • 8. Re: Detail questions about indexes?
                        robvarga
                        alexey.ragozin wrote:
                        Hi Magnus,
                        ValueExtractor indexExtractor = new ConditionalExtractor(AlwaysFilter.INSTANCE, extractor, false);
                        Index created by such extractor will have only reverse map (index all your cache objects). Forward map will not be created in such index. Forward map can be used by aggregators but is not used for filters.

                        PS I was wrong, it was 2.5 years ago, can we pass on :)

                        --
                        Alexey
                        One thing to note, though is that a reverse map without a forward map is more expensive to maintain as you have to extract not only from the new value, but you also need to extract from the old value to know from which reverse map values you need to remove the internal key. If you had a forward map, you could avoid the extraction from the old value as it is in the forward map.

                        Best regards,

                        Robert