1 2 Previous Next 29 Replies Latest reply: Jan 3, 2011 2:07 PM by 828156 RSS

    Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.

    843790
      When using a concurrentHashMap as a cache, I have come across the problem of wanting to use the putIfAbsent method, but only instantiate an expensive object if the value is not already in the cache (imagine the value to be cached coming from a complicated database query for example).

      I had previously hoped for the putIfAbsent method to take a callback interface which would create the object only if it was necessary, but seeing as this is not the case I thought of an alternate solution that I would appreciate any feedback on.

      What if I created an ICreateValue interface and ValueWrapper object that looked like this:

      public interface ICreateValue {
      public Object create();
      }

      public class ValueWrapper {
      ICreateValue createValue;
      Object object;

      public synchronized Object getValue() {
      if(object == null)
      object = createValue.create();

      return object;
      }
      }

      Then, I could call putIfAbsent with a key, and a ValueWrapper to atomically add the value to the cache. Before returning the value to the client, I would call getValue() on the wrapper, which with either safely create one object if it has not been created before, or return a previously created value.

      This should in theory shift the synchronization away from the map itself, and since putIfAbsent returns the value already found in cache, if the object was created already it could reuse that instance. It should also prevent multiple instantiations of any expensive objects, the tradeoff being threads requesting this specific value in cache need to block at first.

      Does this sound reasonable? Are there any obvious things I may be missing?
        • 1. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
          782681
          Fatefree wrote:
          When using a concurrentHashMap as a cache, I have come across the problem of wanting to use the putIfAbsent method, but only instantiate an expensive object if the value is not already in the cache...
          for stuff like above I like the implementation provided by Peter in [an older thread at this forum|http://forums.sun.com/thread.jspa?messageID=10994827#10994827]

          The trick is to keep a dedicated map of keys:
          public class CacheResults<K, R> {
              // depending on your key type, 'keys' Map may not be required.
              //  e.g. if you had an Enum of possible results, the Enums themselves could also be the keys.
              private final ConcurrentMap<K, K> keys = new ConcurrentHashMap<K, K>();
              private final ConcurrentMap<K, R> results = new ConcurrentHashMap<K, R>();
              private final ResultService<K,R> resultService;
           
              public R get(K key) {
                    K key2 = getKey(key);
                    // each lock is unique to the key
                    synchronized(key2) {
                        R result = results.get(key2);
                        if (result == null) {
                             // cache miss - instantiate an expensive object
                             results.put(key2, result = resultService.get(key));
                        }
                        return result;
                    }
               }
           
               private K getKey(K key) {
                    keys.putIfAbsent(key, key);
                    return keys.get(key);
               }
          }
          • 2. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
            843790
            Hmm are you familiar with how this is working? I am trying to understand, the map of keys exists only for a set of objects that can be individually locked on correct? I suppose he is mimicking the operation of putIfAbsent

            I notice his method is synchronizing across three main operations, the get, creating the object, and putting the object in the map, whereas my initial idea is only synchronizing on the creation. However my idea requires an additional wrapper object to be instantiated and stored in cache whereas his does not, so I am trying to determine which has a bigger cost in the end.
            • 3. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
              jtahlborn
              Not a big fan of that CacheResults class. it requires _3_ hash lookups in the best case scenario (4 in the worst) and a synchronization. i would stick w/ your ValueWrapper class, especially if you do an eager lookup before doing the putIfAbsent, e.g.:
              public Object getValue(Object key) {
                ValueWrapper wrapper = cache.get(key);
                if(wrapper == null) {
                  // ... do putIfAbsent logic here
                }
                return wrapper.getValue();
              }
              this requires 1 hash lookup in the best case scenario and 2 in the worst (and a synchronization). definitely a better operating scenario than CacheResults.
              • 4. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                796085
                Or use the memoizer pattern (essentially keep a Future as the value in the map):
                public interface Computable<K, V> {
                    V compute(K argument) throws InterruptedException, ExecutionException;
                }
                
                public class Memoizer<K, V> implements Computable<K, V> {
                    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<K, Future<V>>();
                    private final Computable<K, V> computable;
                
                    public Memoizer(final Computable<K, V> computable) {
                        this.computable = computable;
                    }
                
                    public V compute(final K argument) throws ExecutionException, InterruptedException {
                        while (true) {
                            Future<V> future = cache.get(argument);
                            if (future == null) {
                                final Callable<V> callable = new Callable<V>() {
                                    public V call() throws ExecutionException, InterruptedException {
                                        return computable.compute(argument);
                                    }
                                };
                                final FutureTask<V> futureTask = new FutureTask<V>(callable);
                                future = cache.putIfAbsent(argument, futureTask);
                                if (future == null) {
                                    future = futureTask;
                                    futureTask.run();
                                }
                            }
                            try {
                                return future.get();
                            } catch (final CancellationException e) {
                                cache.remove(argument, future);
                            }
                        }
                    }
                }
                • 5. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                  796085
                  Also, Google Collections (now Guava) has a "computing map"
                  • 6. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                    843790
                    This is interesting and I was just about to post that as I stumbled upon it. In some ways it looks very similar to my initial idea, however one thing I can't see is where (if any) the synchronization is on creating the object. Is it unnecessary with a FutureTask?

                    Another benefit seems to be that object creation can begin immediately after the cache put so there will be less wait when its actually requested, however this comes at the cost of a new thread being spawned for every cache put.

                    Is anyone more familiar with the internals of the future task that might know if there is a synchronization somewhere on the object being created?
                    • 7. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                      796085
                      You could always look at the source code! :-)

                      There is no synchronization in this implementation, but you do risk creating unnecessary FutureTask and Callable objects which will get thrown away. However, that's only going to happen in the rare race condition of multiple threads trying to retrieve the same key, finding it not present (and not in-progress) and then creating the FutureTask for putting in the map. Once the map putIfAbsent() call has completed any future callers looking for the same key will simply get the result from the FutureTask (waiting if necessary). So I don't really see this as being a problem.

                      Note that even in this case, the memoizer is still threadsafe due to the use of putIfAbsent(), and efficient (i.e. only one outstanding computation for a given key) due to using the FutureTask that was already in the map, if one was.
                      • 8. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                        jtahlborn
                        there are no extra threads in the Memoizer version. FutureTask does not do threading, ThreadPools just happen to use FutureTasks when they do threading. the line which has "futureTask.run()" is executing the FutureTask, aka the first thread to put the task in the map will be doing the heavy lifting. also, there is "synchronization" it's just that it is hidden from you inside of the FutureTask (and no, depending on the implementation of FutureTask, it is not necessarily true synchronization, instead it's some funky stuff based on AbstractQueuedSynchronizer). that aside, the Memoizer implemation is very similar to your version, with 2 tradeoffs:

                        - Memoizer uses funky synchronization which may be faster than normal "synchronized"
                        - Memoizer depends on the first thread to generate the real value. this means that if the first thread gets stalled for some reason, all other threads reading the value will block. in your version another thread could potentially do the work if the original version stalls for some reason (assuming it stalls outside of the synchronized block).

                        that said, the Memoizer version does do some nice things like persist the failure (so all the calls to get() will get the same exception if the run() method failed for some reason).
                        • 9. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                          782681
                          Fatefree wrote:
                          Hmm are you familiar with how this is working?
                          yeah that's why I like it. I see how it avoids duplicate reads by using atomic putIfAbsent and how it minimizes lock contention by syncing on specific key objects. Last but not the least I have seen it work in one of my past projects. :)
                          I am trying to understand... I am trying to determine which has a bigger cost in the end.
                          Hm aren't you falling into the trap of premature optimization? I mean, if you are confident about your solution, why getting into the risk trying to squeeze insignificant performance bits with more complicated algorithm
                          • 10. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                            843790
                            I thought it was pretty obvious that I wasn't confident in my solution haha.
                            • 11. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                              jtahlborn
                              gnat wrote:
                              Fatefree wrote:
                              Hmm are you familiar with how this is working?
                              yeah that's why I like it. I see how it avoids duplicate reads by using atomic putIfAbsent and how it minimizes lock contention by syncing on specific key objects. Last but not the least I have seen it work in one of my past projects. :)
                              wait, what? as i pointed out above, the CacheResults class does not avoid duplicate reads. it reads and reads and reads...
                              I am trying to understand... I am trying to determine which has a bigger cost in the end.
                              Hm aren't you falling into the trap of premature optimization? I mean, if you are confident about your solution, why getting into the risk trying to squeeze insignificant performance bits with more complicated algorithm
                              again, the version the OP presented seems to me less complicated that the CacheResults implementation.
                              • 12. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                                782681
                                jtahlborn wrote:
                                gnat wrote:
                                Fatefree wrote:
                                Hmm are you familiar with how this is working?
                                yeah that's why I like it. I see how it avoids duplicate reads by using atomic putIfAbsent and how it minimizes lock contention by syncing on specific key objects. Last but not the least I have seen it work in one of my past projects. :)
                                wait, what? as i pointed out above, the CacheResults class does not avoid duplicate reads. it reads and reads and reads...
                                you mean [hash lookups|http://forums.sun.com/thread.jspa?messageID=11016004#11016004|message], right? These are exactly what I call insignificant bits of performance.

                                Purpose of CacheResults is to minimize invocations and related contention for resultService.get(key) and it's the only thing I worry about. As for hash lookups, I would consider them only if these are proven to be bottleneck.

                                >
                                I am trying to understand... I am trying to determine which has a bigger cost in the end.
                                Hm aren't you falling into the trap of premature optimization? I mean, if you are confident about your solution, why getting into the risk trying to squeeze insignificant performance bits with more complicated algorithm
                                again, the version the OP presented seems to me less complicated that the CacheResults implementation.
                                Simplicity is in the eyes of beholder. My point is it's better to stick with stuff that's better understood.

                                If value gateway looks simpler to OP and if it does the job, there's no need to change to more complicated stuff.

                                Same applies to me of course - I stick with what I understand, not trying to squeeze insignificant performance bits

                                Edited by: gnat on Jul 8, 2010 7:08 AM
                                • 13. Re: Using ConcurrentHashMap's putIfAbsent to prevent multiple instantiations.
                                  796085
                                  I'll say it again since it got lost in the noise before:
                                  dannyyates wrote:
                                  Also, Google Collections (now Guava) has a "computing map"
                                  • 14. Google Guava's "computing map" - MapMaker? [was: ...putIfAbsent...]
                                    782681
                                    dannyyates wrote:
                                    I'll say it again since it got lost in the noise before:
                                    dannyyates wrote:
                                    Also, Google Collections (now Guava) has a *"computing map"*
                                    do you by chance mean MapMaker? or maybe ForwardingConcurrentMap?

                                    I ask because I was unable to find public API like "computing map" in their [API docs|http://guava-libraries.googlecode.com/svn/trunk/javadoc/index.html|javadoc].

                                    -----

                                    // there's non-public class ComputingConcurrentHashMap in repository. It is also somehow referred to from their [issue 361|http://code.google.com/p/guava-libraries/issues/detail?id=361|issue]... +"race condition in MapMaker computing map Values view"+
                                    1 2 Previous Next