This discussion is archived
1 2 Previous Next 16 Replies Latest reply: Dec 8, 2010 9:06 PM by 796440 RSS

hash-based collections vs. mutable hashcodes

DMF Newbie
Currently Being Moderated
I've been getting ConcurrentModificationExceptions on HashTables (derivatives) in a single-threaded app that does no overt mods to the collection being iterated. The algorithm is a classic Observer, iterating all listeners on an event. These listeners are rather heavyweight classes with hashcode() (along with equals()) essentially calculating the current object internal state, which is mutable.

What I think is going on is that during an event (iteration over the set) a listener state may change, resulting in a changed hashcode(). While this doesn't overtly modify the collection, it may modify something like the internal ordering. i.e. the "old" object would appear to have disappeared since it can no longer be resolved with its current hashcode. Can anyone confirm that this is indeed a possible phenomenon? If so, it raises an interesting general issue.

Should hashcode() and equals() be built on an immutable subset of state? Assuming logical identity is desirable (i.e. state matters to equality), and hashcode should reflect the result of equals (re: Josh Bloch in EJ (2nd) Item# 8), this seems to dictate that such objects are not viable elements of a hash-based collection. Discussion?
  • 1. Re: hash-based collections vs. mutable hashcodes
    EJP Guru
    Currently Being Moderated
    Correct.
  • 2. Re: hash-based collections vs. mutable hashcodes
    796440 Guru
    Currently Being Moderated
    796912 wrote:
    Should hashcode() and equals() be built on an immutable subset of state?
    hashCode() should, yes. Or, if it's mutable, then when using a hash-based collection you have to remove and re-add after a change, or possibly remove, change, re-add. Additionally, there's the iteration gotcha that you found that prompted this thread.

    equals() shouldn't matter. Changing an object's hash can change which bucket it's in, and that's why things get screwed up. The equals() comparison comes into play when doing a linear search of that bucket, so changing state that contributes to equals won't hurt that. However, in a SortedSet or SortedMap, changing state that contributes to equals() could lead to problems, so the same remove/re-add rules would apply.

    Typically hashCode() and equals() use the same state values in their calculations, but it is legal for hashCode() to use a subset of those used by equals().
  • 3. Re: hash-based collections vs. mutable hashcodes
    jtahlborn Expert
    Currently Being Moderated
    jverd wrote:
    796912 wrote:
    Should hashcode() and equals() be built on an immutable subset of state?
    hashCode() should, yes. Or, if it's mutable, then when using a hash-based collection you have to remove and re-add after a change, or possibly remove, change, re-add. Additionally, there's the iteration gotcha that you found that prompted this thread.
    you definitely have to remove before you change, otherwise remove won't work.
    equals() shouldn't matter. Changing an object's hash can change which bucket it's in, and that's why things get screwed up. The equals() comparison comes into play when doing a linear search of that bucket, so changing state that contributes to equals won't hurt that.
    not entirely true. changing equals could change the validity of the uniqueness of the set. if you change the equals criteria of an object already in the set so that it is now equal to another object in the set, the set is no longer valid. while this may not break quite as spectacularly as changing the hashcode, it can still do interesting things. for instance, if you add another element such that the hashmap decides to re-hash, one of your now duplicate elements will be silently dropped on the floor. long answer short, don't change hashcode or equals.
    However, in a SortedSet or SortedMap, changing state that contributes to equals() could lead to problems, so the same remove/re-add rules would apply.
    nope, sorted collections don't use hashcode or equals, they use compareTo/compare. but, the same rules certainly apply to the comparison criteria for the given sorted collection.
  • 4. Re: hash-based collections vs. mutable hashcodes
    DMF Newbie
    Currently Being Moderated
    long answer short, don't change hashcode or equals.
    That's a lot easier said than done. In fact, if you can't do logical equality, why not just use the object id? Or arrays instead of sets etc..?


    Still, the situation is what it is. What I find amazing, though, is that of all the books, tutorials, specifications, posts, etc etc, that I've read about Java Collections and STL Containers, I've never seen mention of this this particular little restriction.


    Thanks for the comments, all.
  • 5. Re: hash-based collections vs. mutable hashcodes
    jtahlborn Expert
    Currently Being Moderated
    796912 wrote:
    long answer short, don't change hashcode or equals.
    That's a lot easier said than done. In fact, if you can't do logical equality, why not just use the object id? Or arrays instead of sets etc..?
    i think you took that statement out of context. i wasn't saying, "don't ever override equals or hashcode". i was saying, "don't ever make the result of equals/hashcode change for a given instance of an object during its lifetime".
  • 6. Re: hash-based collections vs. mutable hashcodes
    796440 Guru
    Currently Being Moderated
    jtahlborn wrote:
    796912 wrote:
    long answer short, don't change hashcode or equals.
    That's a lot easier said than done. In fact, if you can't do logical equality, why not just use the object id? Or arrays instead of sets etc..?
    i think you took that statement out of context. i wasn't saying, "don't ever override equals or hashcode". i was saying, "don't ever make the result of equals/hashcode change for a given instance of an object during its lifetime".
    Even that's rather extreme, IMHO. Just don't change it while it's in a collection. Of course, sometimes you can't know how an object will be used, so you have to guard by making copies, or just by clear documentation.
  • 7. Re: hash-based collections vs. mutable hashcodes
    EJP Guru
    Currently Being Moderated
    of all the books, tutorials, specifications, posts, etc etc, that I've read about Java Collections and STL Containers, I've never seen mention of this this particular little restriction.
    'The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.' Javadoc for java.util.Map.
  • 8. Re: hash-based collections vs. mutable hashcodes
    802316 Pro
    Currently Being Moderated
    DMF wrote:
    I've been getting ConcurrentModificationExceptions on HashTables (derivatives) in a single-threaded app that does no overt mods to the collection being iterated. The algorithm is a classic Observer, iterating all listeners on an event. These listeners are rather heavyweight classes with hashcode() (along with equals()) essentially calculating the current object internal state, which is mutable.
    A couple of things come to mind.
    - Hashtable was replaced by HashMap in Java 1.2 (1998).
    - Hashtable is only useful in mutli-threaded contexts where you have a library which requires it.
    - For a collection of observers I suggest you use an array to reduce the cost of iteration. You can update the array every time the source collection changes (in your case the Hashtable) This assumes you iterator over the list of observers more often than you change the list. If this is not the case, I suggest you reconsider your design. ;)
    - Using an array copy to iterate over can be used to avoid a CME, even if your source collection would have thrown this.
    - the hashCode(), equals() and compareTo() (if you define them) must only be based on fields which do not change once they are added to a collection. Ideally, they should be final and immutable. Note: they don't have to include all fields.
  • 9. Re: hash-based collections vs. mutable hashcodes
    802316 Pro
    Currently Being Moderated
    EJP wrote:
    of all the books, tutorials, specifications, posts, etc etc, that I've read about Java Collections and STL Containers, I've never seen mention of this this particular little restriction.
    'The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.' Javadoc for java.util.Map.
    It also states before that "Note: great care must be exercised if mutable objects are used as map keys."

    BTW: The same applies to Sets for the same reasons.
  • 10. Re: hash-based collections vs. mutable hashcodes
    DMF Newbie
    Currently Being Moderated
    Peter Lawrey wrote:
    A couple of things come to mind.
    - Hashtable was replaced by HashMap in Java 1.2 (1998).
    HashSet, not HashTable (despite what I said). But I do use Vector in several places...
    - For a collection of observers I suggest you use an array to reduce the cost of iteration. You can update the array every time the source collection changes (in your case the Hashtable) This assumes you iterator over the list of observers more often than you change the list. If this is not the case, I suggest you reconsider your design. ;)
    Thanks for the hint. Yes, the collection of listeners is established early, with few changes. Fortunately, performance isn't an issue. Even with several thousand listeners, it's more than fast enough. For the future, will you please quantify for me what is the relative performance advantage of iterating an array vs. iterating a HashSet?
    - Using an array copy to iterate over can be used to avoid a CME, even if your source collection would have thrown this.
    Or an array-backed Collection. Frankly, I use hash-based collections so rarely, I'm using it here just to get this type of experience. In that respect, a good design choice, I think. ;)

    Thanks again for your help, Peter.

    -- Dennis
  • 11. Re: hash-based collections vs. mutable hashcodes
    796440 Guru
    Currently Being Moderated
    DMF wrote:
    For the future, will you please quantify for me what is the relative performance advantage of iterating an array vs. iterating a HashSet?
    I highly doubt that it's significant, but you can easily create some benchmarks for yourself.

    >
    - Using an array copy to iterate over can be used to avoid a CME, even if your source collection would have thrown this.
    Or an array-backed Collection.
    Eh? Any collection can give a CME, other that the ones in java.util.concurrent.
  • 12. Re: hash-based collections vs. mutable hashcodes
    822375 Newbie
    Currently Being Moderated
    I started a blog on Java. First entry is about HashMaps...still need to write part 2 and 3, but this thread is touches on a lot of similar ideas.

    http://www.javaprofession.com/2010/12/part-1-hashmap-keys_128.html
  • 13. Re: hash-based collections vs. mutable hashcodes
    822375 Newbie
    Currently Being Moderated
    Hashtable isn't ideal for using in multi-threaded environments. Sure, it's synchronized, but ConcurrentHashMap is better, since it doesn't lock the entire map when there's access to it. With Hashtable the whole thing gets locked. I think it's an obsolete class.
  • 14. Re: hash-based collections vs. mutable hashcodes
    796440 Guru
    Currently Being Moderated
    user13441673 wrote:
    Hashtable isn't ideal for using in multi-threaded environments. Sure, it's synchronized, but ConcurrentHashMap is better, since it doesn't lock the entire map when there's access to it. With Hashtable the whole thing gets locked. I think it's an obsolete class.
    Would you care to explain what you mean by "the whole thing gets locked"? Since there's no concept in Java of locking an object--either in part or in whole--your comment doesn't make much sense.
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points