12 Replies Latest reply: Feb 8, 2007 2:15 PM by 807606 RSS

    String equals implementation

    807606
      It seems like atleast as of JDK1.5 implementation of java.lang.String class's equals method (public boolean equals(Object anObject)), in case the two strings are of same length, entire content is compared.

      Won't it be worthwhile to also ensure that - if their hash is available, the hash matches before comparing the entire string character arrays? Seems like that would offer a significant performance boost.

      -Viral
        • 1. Re: String equals implementation
          807606
          Won't it be worthwhile to also ensure that - if their
          hash is available, the hash matches before comparing
          the entire string character arrays? Seems like that
          would offer a significant performance boost.
          Definitely not. Heres the thing: A strings hashcode is calculated using some mathematical formula that depends on all of the letters. Doing a string comparison, if the first 2 characters don't match, you can return false right there. So calculating hashcodes first could definitely make performance a lot worse and wouldn't make it a lot better.
          • 2. Re: String equals implementation
            807606
            It is true that you may not compute hash if it is not known.

            But my argument is if hash is already available (computed earlier eg an earlier invocation of
            hashCode()
            ), then it must be used as there is no point in comparing all characters when hash is known to be different.
            • 3. Re: String equals implementation
              DrClap
              On the other hand, since String is immutable, it is possible that its hashCode() method just calculates the hash code once, the first time it is requested, and then caches the result.

              I don't know whether this is the case or not. But the answer to "Is X faster than Y" is often best determined by trying it, rather than speculating how things might be implemented.
              • 4. Re: String equals implementation
                807606
                On the other hand, since String is immutable, it is
                possible that its hashCode() method just calculates
                the hash code once, the first time it is requested,
                and then caches the result.
                Yes hashcode is is calculated just once and stored in the hash member. Refer hashCode() implementation.
                • 5. Re: String equals implementation
                  807606
                  The trouble with deciding what's best is that a common class like String is used in countless contexts.

                  For example, you could be comparing similar URL strings, where the first 88 characters are often the same, or you could be comparing dictionary words where the first characters are usually different. It's hard to generalize -- what's the benchmark?
                  • 6. Re: String equals implementation
                    796447
                    Oh good. Another "save me every last nanosecond out of my silly app" discussion. Is this Martin Hilpert in disguise?
                    Well guess what, if you're so concerned about it, Java is not for you. You'd better write your all-important speedy app in machine language so you can control every tidbit of execution.
                    • 7. Re: String equals implementation
                      807606
                      Yes hashcode is is calculated just once and stored in
                      the hash member. Refer hashCode() implementation.
                      That's c%l, but notice that the API doesn't mention caching,
                      so there's no compunction on other implementation to cache, too.

                      I remember reading somewhere that originally the hashCode wasn't computed based on all the characters in the string -- it only sampled a few for efficiency sake. Here's the API for version 1.0.2 -- there's no mention of what algorithm is used. Hmmm:
                      http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.lang.String.html#8782

                      So implementations, high-level algorithms, every API specs change over time. YMMV.
                      • 8. Re: String equals implementation
                        807606
                        > For example, you could be comparing similar URL
                        strings, where the first 88 characters are often the
                        same...

                        Like the keys on a piano, baby.

                        Yawmark (<-- Ain't that grand?)

                        ~
                        • 9. Re: String equals implementation
                          807606
                          I remember reading somewhere that originally the
                          hashCode wasn't computed based on all the characters
                          in the string -- it only sampled a few for efficiency
                          sake. Here's the API for version 1.0.2 -- there's no
                          mention of what algorithm is used. Hmmm:
                          http://www.aquaphoenix.com/ref/jdk1.0.2_api/java.lang.
                          String.html#8782

                          So implementations, high-level algorithms, every API
                          specs change over time. YMMV.
                          However, remember the hashCode contract: equal objects must have equal hash codes. Refer: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()

                          If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

                          Therefore, it is reasonable to infer that the hashcodes, if present and not equal means that the two strings are not equal. Content is checked only when:

                          a) Length is equal and
                          b) Hash codes are absent/equal
                          • 10. Re: String equals implementation
                            807606
                            Therefore, it is reasonable to infer that the hashcodes, if present and not equal means that the
                            two strings are not equal. Content is checked only when:

                            a) Length is equal and
                            b) Hash codes are absent/equal
                            I agree that different hash codes imply nonequal objects. The "content is checked only when" part is your addition. Feel free say this is a reasonable thing to infer, but reasonable is not a legal contract.
                            • 11. Re: String equals implementation
                              807606
                              > I agree that different hash codes imply nonequal objects.

                              It should do more than merely imply. According to the contract of hashCode(), equal objects must have the same hashCode. Unequal objects may also have the same hashCode, but different hash codes means inequality.

                              You're probably saying the same thing. I'm just trying to add weight to the whole different hash codes == unequal objects thing... :o)

                              ~
                              • 12. Re: String equals implementation
                                807606
                                >
                                You're probably saying the same thing. I'm just
                                trying to add weight to the whole different hash
                                codes == unequal objects thing... :o)

                                ~
                                Yes, we are trying to same the same thing here. If hash codes are equal, then no doubt content must be checked and verified to ensure that two strings are indeed same - which covers the above usecase.

                                The point is that String comparision is a vital part of many applications and any effort to ensure its performance is vital. It is not easy to punt away String class in favor of a "more suitable" class for your application as String is pervasive throughout the apis.