9 Replies Latest reply: Sep 18, 2008 8:24 AM by 807589

    Comparing 2 files contents

    807589
      Hello,

      I am trying to find out whether the contents of two given files are the same. To do so, I compute a hash of each file and compare the hashes.
      The problem is that even though the files have the same content, the hashes differ. Does anyone have an idea why?
      try
      {
          InputStream fisSource = new FileInputStream(new File(sourceFile));
          byte[] bufferSource = new byte[1024];
          MessageDigest sourceDigest = MessageDigest.getInstance("MD5");
          int numReadSource;
          do
          {
              numReadSource = fisSource.read(bufferSource);
              if (numReadSource > 0)
              {
                  sourceDigest.update(bufferSource, 0, numReadSource);
              }
          }
          while (numReadSource != -1);
          fisSource.close();
          sourceHash = sourceDigest.digest().hashCode();
          System.out.println("source digest len: " + sourceDigest.digest().length);
          System.out.println("source digest: " + sourceDigest.digest());

          //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

          InputStream fisDest = new FileInputStream(new File(destinationFile));
          byte[] bufferDest = new byte[1024];
          MessageDigest destDigest = MessageDigest.getInstance("MD5");
          int numReadDest;
          do
          {
              numReadDest = fisDest.read(bufferDest);
              if (numReadDest > 0)
              {
                  destDigest.update(bufferDest, 0, numReadDest);
              }
          }
          while (numReadDest != -1);
          fisDest.close();
          destHash = destDigest.digest().hashCode();
          System.out.println("dest digest len: " + destDigest.digest().length);
          System.out.println("source digest: " + destDigest.digest());

          //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

          System.out.println("dest equals source= " + destDigest.digest().equals(sourceDigest.digest()));
          System.out.println("sourceHash: " + sourceHash + "  destHash: " + destHash);
          if (sourceHash != destHash)
          {
              distinctHashes = true;
          }
      Results:
      source digest len: 16
      source digest: [B@19d6ae
      dest digest len: 16
      source digest: [B@42f35f
      dest equals source= false
      sourceHash: 9391010  destHash: 6121358
      thanks,
      johnny
        • 1. Re: Comparing 2 files contents
          807589
          public byte[] digest()
          returns a byte[].

          You are comparing the hash codes of two byte arrays. Those hash codes are not necessarily equal for arrays with the same content. But you could use MessageDigest's
           public static boolean isEqual(byte digesta[], byte digestb[])
          to compare the two byte[] values.
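
To illustrate the point above: arrays in Java inherit equals() and hashCode() from Object, so both are based on object identity rather than on the bytes stored in the array. The following is only a minimal sketch; the class name ArrayHashDemo and the helper sameContent are made up for this example and do not appear in the original code.

```java
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the thread.
class ArrayHashDemo {

    /** Content-based comparison of two byte arrays. */
    static boolean sameContent(byte[] x, byte[] y) {
        return MessageDigest.isEqual(x, y);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Identity-based: a and b are two distinct array objects.
        System.out.println(a.equals(b));                               // false
        System.out.println(a == b);                                    // false

        // Content-based: these look at the bytes inside the arrays.
        System.out.println(sameContent(a, b));                         // true
        System.out.println(Arrays.equals(a, b));                       // true
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));  // true
    }
}
```

If an int hash of the content is really needed, Arrays.hashCode(byte[]) is the content-based counterpart of calling hashCode() on the array itself.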
          • 2. Re: Comparing 2 files contents
            807589
            Domi27 wrote:
            public byte[] digest()
            returns a byte[].

            You are comparing the hash codes of two byte arrays. Those hash codes are not necessarily equal for arrays with the same content. But you could use MessageDigest's
             public static boolean isEqual(byte digesta[], byte digestb[])
            to compare the two byte[] values.
            An equivalent comparison is available in the java.util.Arrays class as the equals() method.
            • 3. Re: Comparing 2 files contents
              807589
              Did you edit your post? I think I saw Array instead of Arrays.

              But you are right. The Arrays method works as well.
              • 4. Re: Comparing 2 files contents
                807589
                Domi27 wrote:
                Did you edit your post? I think I saw Array instead of Arrays.
                Yep! I corrected the typo.
                • 5. Re: Comparing 2 files contents
                  807589
                  Thanks for your responses. It makes sense.
                  However, the digest byte[] values are different:

                  source digest: [B@19d6ae
                  dest digest: [B@42f35f
                  • 6. Re: Comparing 2 files contents
                    793982
                    I ran his code, and even when I use:
                    Arrays.equals(sourceDigest.digest(),destDigest.digest());
                    //or
                    MessageDigest.isEqual(sourceDigest.digest(),destDigest.digest());
                    I get true no matter what. There's something wrong with the digest.
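
A likely explanation for "true no matter what": MessageDigest.digest() resets the object after it returns. The posted code has already called digest() on both objects before the comparison, so any later digest() call returns the digest of empty input, and that is the same for both files. A minimal sketch of the effect follows; the class name DigestResetDemo and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of the original post.
class DigestResetDemo {

    /** Calls digest() twice on the same object and returns the second result. */
    static byte[] secondDigest(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(data);
        md.digest();          // first call: real digest of data, then md is reset
        return md.digest();   // second call: nothing was fed in since the reset
    }

    /** MD5 of zero bytes of input. */
    static byte[] emptyDigest() throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] a = secondDigest("first file".getBytes(StandardCharsets.UTF_8));
        byte[] b = secondDigest("a completely different file".getBytes(StandardCharsets.UTF_8));

        System.out.println(MessageDigest.isEqual(a, b));             // true
        System.out.println(MessageDigest.isEqual(a, emptyDigest())); // true
    }
}
```

Calling digest() once per file and keeping the returned byte[] in a variable avoids the problem.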
                    The only good solution I found was to compare the raw bytes read from the files.
                    Let's say each file is read all at once and fits in the buffer; then:
                    Arrays.equals(bufferSource,bufferDest);
                    returns correct results.
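
Comparing the two buffers directly only works while each file fits into a single read. Arrays.equals also compares the whole 1024-byte arrays, including any bytes beyond what was actually read. The raw-byte idea can be extended to files of any size by comparing the two streams as they are read. This is a rough sketch under that assumption; the class name RawCompare and its method names are invented for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; names are illustrative, not from the thread.
class RawCompare {

    /** Compares two streams byte by byte until one differs or both end. */
    static boolean sameContent(InputStream a, InputStream b) throws IOException {
        InputStream x = new BufferedInputStream(a);
        InputStream y = new BufferedInputStream(b);
        int fromX;
        do {
            fromX = x.read();
            if (fromX != y.read()) {
                return false;   // different byte, or one stream ended early
            }
        } while (fromX != -1);
        return true;            // both streams ended at the same position
    }

    /** Compares two files; different sizes mean different content. */
    static boolean sameContent(Path p, Path q) throws IOException {
        if (Files.size(p) != Files.size(q)) {
            return false;
        }
        try (InputStream a = Files.newInputStream(p);
             InputStream b = Files.newInputStream(q)) {
            return sameContent(a, b);
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects two file paths on the command line.
        System.out.println(sameContent(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

The size check is only a shortcut: it avoids reading either file when the lengths already differ.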
                    • 7. Re: Comparing 2 files contents
                      807589
                      Sorry, my mistake. The digests are equal. I was printing out the byte[] addresses, not their values.

                      Edited by: smeag0l on Sep 18, 2008 4:11 PM
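
For anyone landing on this thread later, here is one way the original approach could be written so that the printed values and the comparison both reflect the file contents. It is only a sketch: the class FileDigestCompare and the helpers md5Of and toHex are hypothetical names. It calls digest() once per file, prints the digest as hex instead of the default byte[] toString(), and compares with MessageDigest.isEqual.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class; a sketch of the digest approach, not the poster's final code.
class FileDigestCompare {

    /** Reads the file in chunks and returns its MD5 digest. */
    static byte[] md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return md.digest();   // called exactly once; the result is kept by the caller
    }

    /** Formats bytes as lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Expects the source and destination paths on the command line.
        byte[] source = md5Of(Paths.get(args[0]));
        byte[] dest = md5Of(Paths.get(args[1]));

        System.out.println("source digest: " + toHex(source));
        System.out.println("dest digest:   " + toHex(dest));
        System.out.println("dest equals source= " + MessageDigest.isEqual(source, dest));
    }
}
```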
                      • 8. Re: Comparing 2 files contents
                        807589
                        I have always used FileUtils.contentEquals(file1, file2) from

                        [http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html]

                        for things like that.
                        • 9. Re: Comparing 2 files contents
                          807589
                          Hmmm, that's good to know. Thanks for sharing :-)
                          Next time I will keep this easier method in mind.