57 Replies. Latest reply: Mar 21, 2013 9:30 AM by 998363
      • 15. Re: Removing duplicate file entries from LinkedHashSet
        DrClap
        dannyyates wrote:
        So, you could easily have two paths that point to the same file and don't equals() each other. You might have to write a custom comparator which compares based on the canonical name.
        On the other hand, the API documentation for File's equals() method says:
        Returns true if and only if the argument is not null and is an abstract pathname that denotes the same file or directory as this abstract pathname.
        So you're alleging there are obvious bugs in this method. Do post the results of your tests when you run them.
        • 16. Re: Removing duplicate file entries from LinkedHashSet
          796085
          DrClap wrote:
          So you're alleging there are obvious bugs in this method.
          Not at all.
          Do post the results of your tests when you run them.
          Here ya go:
          import java.io.File;

          public class Test {
              public static void main(final String[] args) {
                  final File curDir = new File(System.getProperty("user.dir"));
                  System.out.println(curDir);
                  listFiles(curDir);
          
                  final File srcDir = new File("/java/eclipse/workspace/scratch/");
                  System.out.println(srcDir);
                  listFiles(srcDir);
          
                  System.out.println(srcDir.equals(curDir));
                  System.out.println(srcDir.getAbsolutePath().equals(curDir.getAbsolutePath()));
              }
          
              private static void listFiles(final File dir) {
                  System.out.println("Listing of folder: " + dir);
                  for (final File file: dir.listFiles()) {
                      System.out.println("    " + file);
                  }
                  System.out.println();
              }
          }
          • 17. Re: Removing duplicate file entries from LinkedHashSet
            796085
            In fact, here's an even better example:
            package thread5430780;
            
            import java.io.File;
            import java.io.IOException;
            
            public class Test {
                public static void main(final String[] args) {
                    try {
                        final File curDir = new File(System.getProperty("user.dir"));
                        System.out.println(curDir);
                        listFiles(curDir);
            
                        // This is the same as the directory above, but with the drive letter removed (D: in my case)
                        final File srcDir = new File("/java/eclipse/workspace/scratch/");  
                        System.out.println(srcDir);  // Note here that Java has replaced all the / with \, and removed the trailing / - it has "normalised" the path
                        listFiles(srcDir);
            
                        // This is the same directory again, but referred to using a relative path
                        final File srcDir2 = new File("../scratch");
                        System.out.println(srcDir2);
                        listFiles(srcDir2);
            
                        System.out.println(srcDir.equals(curDir));  // false - the normalised forms differ (one has D:, one does not)
                        System.out.println(srcDir.equals(srcDir2));  // false
                        System.out.println(srcDir.getAbsolutePath().equals(curDir.getAbsolutePath()));  // true
                        System.out.println(srcDir.getAbsolutePath().equals(srcDir2.getAbsolutePath()));  // false - which I'm a bit surprised about
                        System.out.println(srcDir.getCanonicalPath().equals(srcDir2.getCanonicalPath()));  // true
                    } catch (final IOException e) {
                        e.printStackTrace();
                    }
                }
            
                private static void listFiles(final File dir) {
                    System.out.println("Listing of folder: " + dir);
                    for (final File file: dir.listFiles()) {
                        System.out.println("    " + file);
                    }
                    System.out.println();
                }
            }
            Note that all three File objects refer to the same folder (as witnessed by the directory listings; obviously you'll need to substitute your own paths for the second and third), but the only reliable equality test is via getCanonicalPath().
            • 18. Re: Removing duplicate file entries from LinkedHashSet
              3004
              DrClap wrote:
              So you're alleging there are obvious bugs in this method
              Unfortunately, there are.
              Do post the results of your tests when you run them.
              package scratch;
              
              import java.io.*;
              
              public class FileEqualsIsBroken {
                public static void main(String... args) throws Exception {
                  System.out.println(System.getProperty("user.dir"));  // C:\cygwin\tmp
                  File f1 = new File(".");
                  File f2 = new File("C:/cygwin/tmp");
                  File f3 = new File("C:\\cygwin\\tmp");
                  File f4 = new File(System.getProperty("user.dir"));
              
                  String f1cp = f1.getCanonicalPath();
                  String f2cp = f2.getCanonicalPath();
                  String f3cp = f3.getCanonicalPath();
                  String f4cp = f4.getCanonicalPath();
              
                  System.out.println("Are all canonical paths equal? " +
                    (f1cp.equals(f2cp) && f1cp.equals(f3cp) && f1cp.equals(f4cp)));
              
                  System.out.println ("f1.equals(f2): " + f1.equals (f2));
                  System.out.println ("f1.equals(f3): " + f1.equals (f3));
                  System.out.println ("f1.equals(f4): " + f1.equals (f4));
                  System.out.println ();
                  System.out.println ("f2.equals(f3): " + f2.equals (f3));
                  System.out.println ("f2.equals(f4): " + f2.equals (f4));
                  System.out.println ();
                  System.out.println ("f3.equals(f4): " + f3.equals (f4));
                }
              }
              :; java -cp . FileEqualsIsBroken
              C:\cygwin\tmp
              Are all canonical paths equal? true
              f1.equals(f2): false
              f1.equals(f3): false
              f1.equals(f4): false

              f2.equals(f3): true
              f2.equals(f4): true

              f3.equals(f4): true
              • 19. Re: Removing duplicate file entries from LinkedHashSet
                843793
                Yeah, it might be (and even should be) considered a bug, unfortunately. One more question. Consider this situation: create a text file (test.txt) with some content in it, then copy it to the /home/myname and /home/myname/test folders. What happens in this case? To a human these two files are identical, but a file path comparison will say otherwise. For that situation the file comparison would have to be extended, but maybe this is a special case.
                • 20. Re: Removing duplicate file entries from LinkedHashSet
                  EJP
                  Now that case is perfectly clear. File.equals() specifically isn't concerned with the contents of the file, or even with whether the file actually exists. It is about the name.
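A minimal sketch of that point, assuming nothing beyond java.io.File. The class name, the helper sameName and the sample paths are made up for illustration, and none of the paths needs to exist:

```java
import java.io.File;

public class FileEqualsByName {
    // Returns true when File.equals() treats the two path strings as equal.
    public static boolean sameName(String a, String b) {
        return new File(a).equals(new File(b));
    }

    public static void main(String[] args) {
        // Neither path has to exist: equals() only compares the names.
        System.out.println(sameName("no/such/dir/test.txt", "no/such/dir/test.txt")); // true
        // A redundant "./" prefix is enough to make the names differ.
        System.out.println(sameName("test.txt", "./test.txt")); // false
    }
}
```

Comparing file contents is a separate job that File.equals() never attempts.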
                  • 21. Re: Removing duplicate file entries from LinkedHashSet
                    796085
                    The File class isn't well named. It should probably be called Path. It only represents a path to a file (note: not the path - some OSes can have multiple paths to the same file).
                    • 22. Re: Removing duplicate file entries from LinkedHashSet
                      843793
                      That's a clever clue ;] Maybe the Java 7 NIO features will solve some of these problems.
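For what it's worth, Java 7 did add java.nio.file.Files.isSameFile(), which asks the filesystem whether two paths locate the same file. Here is a rough sketch of how it might be used; the class name, the demo helper and the scratch file names are assumptions for illustration only:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameFileCheck {
    // Asks the filesystem whether two paths locate the same file (Java 7+).
    public static boolean sameFile(Path a, Path b) {
        try {
            return Files.isSameFile(a, b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a scratch file and returns {File.equals result, isSameFile result}
    // for two different spellings of its path.
    public static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("samefile-demo");
            Path direct = Files.createFile(dir.resolve("test.txt"));
            Path roundabout = dir.resolve(".").resolve("test.txt"); // <dir>/./test.txt
            boolean byName = direct.toFile().equals(roundabout.toFile());
            boolean byFile = sameFile(direct, roundabout);
            Files.delete(direct);
            Files.delete(dir);
            return new boolean[] { byName, byFile };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("File.equals:      " + result[0]);
        System.out.println("Files.isSameFile: " + result[1]);
    }
}
```

Unlike File.equals(), isSameFile() may touch the filesystem and can throw IOException, so it does not drop into a HashSet or LinkedHashSet as an equals() replacement.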
                      • 23. Re: Removing duplicate file entries from LinkedHashSet
                        843793
                        DrClap wrote:
                        Did you consider my suggestion to do some debugging? Print out paths which you think should be the same but which the code says are not the same?
                        I musta missed the part where you suggested debugging.

                        Ok, so I substituted getName() with getPath(), and it turns out that it doesn't view them as the same.
                        $ java MyClass MyClass.java ./
                        MyClass.java
                        ./MyClass.java
                        ./MyClass.class
                        ..........

                        If I pass it in as just a file in the current directory, then it's stripping the path completely.

                        If I use getCanonicalPath, then it also gives me the file in the list twice, but the paths are identical. So internally, equals must be using getPath().

                        [Updated]
                        Using getAbsolutePath also shows the paths as being different. In that case I get output like
                        /home/myuser/workspace/myclass/MyClass.java
                        /home/myuser/workspace/myclass/./MyClass.class
                        /home/myuser/workspace/myclass/./MyClass.java

                        I would say this is not exactly intuitive behavior.
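The three forms can be printed side by side with something like the sketch below. The class name and the forms helper are hypothetical, and the file does not have to exist:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PathForms {
    // Returns {getPath(), getAbsolutePath(), getCanonicalPath()} for one path string.
    public static String[] forms(String path) {
        File file = new File(path);
        try {
            return new String[] {
                file.getPath(),          // the name as given (separators normalised)
                file.getAbsolutePath(),  // working directory prepended; "." is kept
                file.getCanonicalPath()  // "." and ".." resolved
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String path : new String[] { "MyClass.java", "./MyClass.java" }) {
            String[] f = forms(path);
            System.out.println("getPath():          " + f[0]);
            System.out.println("getAbsolutePath():  " + f[1]);
            System.out.println("getCanonicalPath(): " + f[2]);
            System.out.println();
        }
    }
}
```

Only the canonical form should come out the same for both spellings.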

                        Edited by: mreeves on Mar 8, 2010 7:04 AM
                        • 24. Re: Removing duplicate file entries from LinkedHashSet
                          843793
                          jverd wrote:
                          DrClap wrote:
                          So you're alleging there are obvious bugs in this method
                          Unfortunately, there are.
                          So, as someone who's somewhat green, this seems to be a bug. As I said, the way this functions is not intuitively obvious...at least not to me.

                          Is that the consensus here? Does this need to be posted as a bug? Would someone like to do that? 'Cause I wouldn't even know where to start, or how to describe it accurately.
                          • 25. Re: Removing duplicate file entries from LinkedHashSet
                            796085
                            As I've said (repeatedly), use getCanonicalPath() or getCanonicalFile(). Note, however, that a file can still have multiple canonical names. Consider, for example, the case of hard links (not symbolic links) under Unix.

                            I agree that the behaviour of equals() does seem counterintuitive, in that one would expect it to compare the canonical form of the path name. However, getCanonicalPath()/getCanonicalFile() both throw IOException because they may need to make queries of the filesystem, so integrating them into equals() would be tricky.
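One way around the IOException problem is to resolve the canonical form once, up front, and compare only the stored result afterwards. The wrapper below is a hypothetical sketch, not anything from the JDK; the names CanonicalFileKey and of are made up here:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper (not a JDK class): the canonical form is resolved once,
// in the factory method, so equals() and hashCode() never query the filesystem.
public final class CanonicalFileKey {
    private final File canonical;

    private CanonicalFileKey(File canonical) {
        this.canonical = canonical;
    }

    public static CanonicalFileKey of(String path) {
        try {
            return new CanonicalFileKey(new File(path).getCanonicalFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public File file() {
        return canonical;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CanonicalFileKey
            && canonical.equals(((CanonicalFileKey) o).canonical);
    }

    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(of("test.txt").equals(of("./test.txt"))); // true
    }
}
```

The hard-link caveat still applies: two hard links to one file have different canonical names, so they would remain distinct keys.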
                            • 26. Re: Removing duplicate file entries from LinkedHashSet
                              843793
                              dannyyates wrote:
                              I agree that the behaviour of equals() does seem counterintuitive, in that one would expect it to compare the canonical form of the path name. However, getCanonicalPath()/getCanonicalFile() both throw IOException because they may need to make queries of the filesystem, so integrating them into equals() would be tricky.
                              That sounds like an implementation detail that those at Sun/Oracle should deal with. I'm not interested in knowing what the paths are, per se. I just want the LinkedHashSet to work as expected. I shouldn't have to implement my own subclass for this; it should just work. java.io.File is a core piece of the Java library, for goodness' sake.

                              Tell me if that sounds unreasonable, but still.....I'm just saying.
                              • 27. Re: Removing duplicate file entries from LinkedHashSet
                                DrClap
                                mreeves wrote:
                                jverd wrote:
                                DrClap wrote:
                                So you're alleging there are obvious bugs in this method
                                Unfortunately, there are.
                                So, as someone who's somewhat green, this seems to be a bug. As I said, the way this functions is not intuitively obvious...at least not to me.

                                Is that the consensus here? Does this need to be posted as a bug? Would someone like to do that? 'Cause I wouldn't even know where to start, or how to describe it accurately.
                                It's a bug in my opinion. But as dannyyates says, the behaviour of the method can't be changed to match the documentation, so if Oracle did anything at all it would be to fix the documentation to match the behaviour.

                                You surely can't be the first person to bring this to light in the dozen years it's been around, so I would expect there is already a bug report for it. But then I expected the API documentation to be correct, so I could be wrong about that too.
                                • 28. Re: Removing duplicate file entries from LinkedHashSet
                                  DrClap
                                  mreeves wrote:
                                  dannyyates wrote:
                                  I agree that the behaviour of equals() does seem counterintuitive, in that one would expect it to compare the canonical form of the path name. However, getCanonicalPath()/getCanonicalFile() both throw IOException because they may need to make queries of the filesystem, so integrating them into equals() would be tricky.
                                  That sounds like an implementation detail that those at Sun/Oracle should deal with. I'm not interested in knowing what the paths are, per se. I just want the LinkedHashSet to work as expected. I shouldn't have to implement my own subclass for this; it should just work. java.io.File is a core piece of the Java library, for goodness' sake.
                                  Yes, writing a subclass is overkill. The workaround is still to use the canonical path of your directories, but with something as simple as this:
                                  filename = new File( args[i] ).getCanonicalFile();
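Put into the context of the original problem, the workaround might look roughly like this. It is a sketch only: the class name DedupeFiles and the dedupe helper are made up here, and the original poster's actual code is not shown in this part of the thread:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class DedupeFiles {
    // Canonicalises every argument before adding it, so different spellings
    // of one path collapse into a single entry; insertion order is preserved.
    public static Set<File> dedupe(String... paths) {
        Set<File> files = new LinkedHashSet<>();
        for (String path : paths) {
            try {
                files.add(new File(path).getCanonicalFile());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        for (File file : dedupe(args)) {
            System.out.println(file);
        }
    }
}
```

Run as java DedupeFiles MyClass.java ./MyClass.java, this should list the file once.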
                                  • 29. Re: Removing duplicate file entries from LinkedHashSet
                                    3004
                                    elOpalo wrote:
                                    Yeah, it might (and even should be) considered as a bug.
                                    It's definitely a bug. It's behaving differently from the documentation.