14 Replies Latest reply: Jun 24, 2009 10:16 AM by 807588 RSS

    String's hashCode()

    800344
      Dear all,
      I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
      How can I convert the hashCode() I got to an unique positive number?
      Regards.
      Pengyou
        • 1. Re: String's hashCode()
          807588
          pengyou wrote:
          Dear all,
          I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
          How can I convert the hashCode() I got to an unique positive number?
          Regards.
          Pengyou
          You can't. Hash codes are not guaranteed unique.
          • 2. Re: String's hashCode()
            800344
            sabre150 wrote:
            pengyou wrote:
            Dear all,
            I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
            How can I convert the hashCode() I got to an unique positive number?
            Regards.
            Pengyou
            You can't. Hash codes are not guaranteed unique.
            I would verify the following statement: two equal strings have the same hashcode and two different strings always have the different hashcodes.
            If it is true, then the hashcode in my context is unique.
            • 3. Re: String's hashCode()
              807588
              pengyou wrote:
              sabre150 wrote:
              pengyou wrote:
              Dear all,
              I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
              How can I convert the hashCode() I got to an unique positive number?
              Regards.
              Pengyou
              You can't. Hash codes are not guaranteed unique.
              I would verify the following statement: two equal strings have the same hashcode and two different strings always have the different hashcodes.
              False. Run
              public class TwoStringWithTheSameHashCode
              {
                  public static void main(String[] args)
                  {
                      int hash1 = "ABCDEa123abc".hashCode();
                      int hash2 = "ABCDFB123abc".hashCode();
                      System.out.println(hash1 + " " + hash2);
                  }  
              }
              If it is true, then the hashcode in my context is unique.
              Since there are an infinite number of possible strings and only 2^32 possible hash codes how many strings per hash code do you think there are?
              • 4. Re: String's hashCode()
                800323
                pengyou wrote:
                sabre150 wrote:
                pengyou wrote:
                Dear all,
                I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
                How can I convert the hashCode() I got to an unique positive number?
                Regards.
                Pengyou
                You can't. Hash codes are not guaranteed unique.
                I would verify the following statement: two equal strings have the same hashcode and two different strings always have the different hashcodes.
                If it is true, then the hashcode in my context is unique.
                Two equal strings always have the same hashcode, true.
                Two different strings always have different hashcodes, false.
                Think about the number of different ints compared to the number of different Strings. There are a heck of a lot more possible Strings than ints.
                • 5. Re: String's hashCode()
                  800344
                  jbish wrote:
                  pengyou wrote:
                  sabre150 wrote:
                  pengyou wrote:
                  Dear all,
                  I have an unique id made of a string. I used hashCode() to covert it to int. However, I don't like an id have the negative value.
                  How can I convert the hashCode() I got to an unique positive number?
                  Regards.
                  Pengyou
                  You can't. Hash codes are not guaranteed unique.
                  I would verify the following statement: two equal strings have the same hashcode and two different strings always have the different hashcodes.
                  If it is true, then the hashcode in my context is unique.
                  Two equal strings always have the same hashcode, true.
                  Two different strings always have different hashcodes, false.
                  Think about the number of different ints compared to the number of different Strings. There are a heck of a lot more possible Strings than ints.
                  Yes, I realized that too. My second statement is false.
                  However, I still want to ask another question: when two different strings gives the same hashcode. This may be too hard to answer asmathematically. Then in my cases, strings has a length not too big, e.g. balbla12345, do I still have the risk to get the same hashcode for different strings. My feeling is that I still have the troubles but your opinion is welcome.
                  • 6. Re: String's hashCode()
                    807588
                    pengyou wrote:
                    However, I still want to ask another question: when two different strings gives the same hashcode. This may be too hard to answer asmathematically. Then in my cases, strings has a length not too big, e.g. balbla12345, do I still have the risk to get the same hashcode for different strings. My feeling is that I still have the troubles but your opinion is welcome.
                    You mean like
                    public class TwoMoreStringWithTheSameHashCode
                    {
                        public static void main(String[] args)
                        {
                            int hash1 = "CDEa".hashCode();
                            int hash2 = "CDFB".hashCode();
                            System.out.println(hash1 + " " + hash2);
                        }  
                    }
                    or for your example
                    public class TwoStringWithTheSameHashCode
                    {
                        public static void main(String[] args)
                        {
                            int hash1 = "balbla12345".hashCode();
                            int hash2 = "bamCla12345".hashCode();
                            System.out.println(hash1 + " " + hash2);
                        }  
                    }
                    Edited by: sabre150 on Jun 24, 2009 2:20 PM
                    • 7. Re: String's hashCode()
                      800344
                      sabre150 wrote:
                      pengyou wrote:
                      However, I still want to ask another question: when two different strings gives the same hashcode. This may be too hard to answer asmathematically. Then in my cases, strings has a length not too big, e.g. balbla12345, do I still have the risk to get the same hashcode for different strings. My feeling is that I still have the troubles but your opinion is welcome.
                      You mean like
                      public class TwoMoreStringWithTheSameHashCode
                      {
                      public static void main(String[] args)
                      {
                      int hash1 = "CDEa".hashCode();
                      int hash2 = "CDFB".hashCode();
                      System.out.println(hash1 + " " + hash2);
                      }  
                      }
                      Thanks. You are right.
                      But I don't see how you can quickly construt an counter example. For the following?

                      Returns a hash code for this string. The hash code for a String object is computed as

                      s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]


                      using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
                      • 8. Re: String's hashCode()
                        807588
                        pengyou wrote:
                        Thanks. You are right.
                        But I don't see how you can quickly construt an counter example.
                        Easy. The assumption that hash codes are unique is a very common misunderstanding voiced in these forums. Based on the hash equation, it takes just a little thought see how to create the counter example.
                        • 9. Re: String's hashCode()
                          800344
                          sabre150 wrote:
                          pengyou wrote:
                          Thanks. You are right.
                          But I don't see how you can quickly construt an counter example.
                          Easy. The assumption that hash codes are unique is a very common misunderstanding voiced in these forums. Based on the hash equation, it takes just a little thought see how to create the counter example.
                          I see.
                          Now I have really a trouble in the context of Ehcache.
                          The net.sf.ehcache.Ehcache interface defines a method      
                          "get(java.lang.Object key) Gets an element from the cache".
                          As it will use the hashcode of Object which I overwrote by string's hashcode, then I might get the wrong element for different objects.
                          Does it mean Ehcache is wrong the the use of it is wrong?
                          • 10. Re: String's hashCode()
                            807588
                            I have no idea what net.sf.ehcache.Ehcache is/does but it should not matter that the hash code is not unique. If it does then net.sf.ehcache.Ehcache is rubbish.

                            P.S. java.util.HashMap and java.util.HashSet do not require the hashcode to be unique. In fact they will work when all the hash methods return the same value. It just makes them slower.
                            • 11. Re: String's hashCode()
                              JoachimSauer
                              pengyou wrote:
                              I see.
                              Now I have really a trouble in the context of Ehcache.
                              The net.sf.ehcache.Ehcache interface defines a method      
                              "get(java.lang.Object key) Gets an element from the cache".
                              As it will use the hashcode of Object which I overwrote by string's hashcode,
                              What? Could you rephrase that?
                              then I might get the wrong element for different objects.
                              Does it mean Ehcache is wrong the the use of it is wrong?
                              I'm not sure I get your question, and I don't know Ehcache, so it's possible that I get you entirely wrong.

                              But, there's a common way to use hashCodes and it is used in caching as well, so it might be worth explaining:

                              As you know the hashCode() of two equal() objects (i.e. two Object a and b for which a.equals(b) and b.equals(a) return true) must always be the same.

                              As you learned in this thread, the opposite need not be true: two Objects for which a.equals(b) and b.equals(a) return false, might have the same hashCode(). In most (good) hashCode() implementation that occasion is relatively rare, but it can happen.

                              Now if you use the hashCode() to distribute a Set of objects into separate buckets, then that's fine, as long as the hashCode is not the only criterium that's used for retrieving the object.

                              That means that if you have found an object with the same hashCode(), then you need to check if it is indeed the correct one by calling equals(). The trick is that you only have to do that equals() call for very few objects, since most objects will have different hashCode()s.
                              • 12. Re: String's hashCode()
                                800344
                                Now if you use the hashCode() to distribute a Set of objects into separate buckets, then that's fine, as long as the hashCode is not the only criterium that's used for retrieving the object.
                                Now I have to check the implementation of Ehcache to see if they only use hashCode of the key for retrieving the object. I am afraid that is true.
                                • 13. Re: String's hashCode()
                                  800344
                                  I checked Ehcache. In fact, they not only used hashCode but also equals method. In my key object, I overwrite the equals method. So if the two objects are different, even if they give the same hashCode, the equals checking will make them different.

                                  Now the issue can be closed by: If the unique id is a string, converting it to hashcode will lose its uniqueness.

                                  Thanks for all your replies.
                                  • 14. Re: String's hashCode()
                                    807588
                                    The documentation for String gives the formula it uses for hashCode :
                                     s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
                                    So a moment's thought will be enough to know how to construct example strings with the same hashCode.

                                    Also, think about the pidgeon-hole principle: hashCode returns an int, a 32 bit value. A char is a 16 bit value, so just two chars cover the whole range of 32 bit patterns. Even if you restrict yourself to 7 bit ASCII codes, 5x7bits = 35bits, so it's impossible to give all strings of ASCII chars of length 5 a unique int hashCode.