57 Replies. Latest reply: Mar 21, 2013 9:30 AM by PhHein
      • 30. Re: Removing duplicate file entries from LinkedHashSet
        3004
        ejp wrote:
        Now that case is perfectly clear. File.equals() specifically isn't concerned with the contents of the file, or even with whether the file actually exists. It is about the name.
        It's also about the underlying file system entity's identity. The documentation for equals() reflects that, but the behavior does not. Hence, bug.
        • 31. Re: Removing duplicate file entries from LinkedHashSet
          3004
          dannyyates wrote:
          I agree that the behaviour of equals() does seem counterintuitive,
          It's more than counterintuitive. It goes against its own documentation.
          • 32. Re: Removing duplicate file entries from LinkedHashSet
            3004
            mreeves wrote:
            Does this need to be posted as a bug?
            I think it should be. I'd be surprised if something that glaring hasn't already been posted.
            Would someone like to do that?
            I might get around to it eventually, but if it's important to you, go ahead and take a stab at it. Consider it a learning experience. :-)
            • 33. Re: Removing duplicate file entries from LinkedHashSet
              843793
              jverd wrote:
              mreeves wrote:
              Does this need to be posted as a bug?
              I think it should be. I'd be surprised if something that glaring hasn't already been posted.
              I've seen a similar problem on SO some time ago (edit: [here it is|http://stackoverflow.com/questions/2275362/java-file-exists-inconsistencies-when-setting-user-dir]).

              It turned out that the problem there was that the developer set the system property "user.dir". That's a big no-no and it leads to very strange behaviour. I didn't read the whole thread (too lazy), but could that be the problem here?
              • 34. Re: Removing duplicate file entries from LinkedHashSet
                jtahlborn
                jverd wrote:
                elOpalo wrote:
                Yeah, it might (and even should be) considered as a bug.
                It's definitely a bug. It's behaving differently from the documentation.
                Actually, it's not. The javadoc states that it tests the equality of the "abstract pathname". If you read through the javadoc for File, you should come to understand that an "abstract pathname" is not the same as a "canonical pathname" (which is why they provided the getCanonical methods in the first place). I agree it is confusing and could possibly be stated more explicitly in more places, but the javadoc as it stands is correct.
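For readers following the "abstract pathname" versus "canonical pathname" distinction, here is a minimal sketch of the behaviour being argued about. The class name, method names, and the sample paths "data" and "./data" are made up for illustration; the paths do not need to exist. It only assumes the documented File.equals() and File.getCanonicalFile() calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class AbstractVsCanonical {

    /** Compares the two abstract pathnames as given, which is what File.equals() does. */
    public static boolean sameAbstractPath(String a, String b) {
        return new File(a).equals(new File(b));
    }

    /** Compares the canonical forms, which resolve "." and ".." before comparing. */
    public static boolean sameCanonicalPath(String a, String b) {
        try {
            return new File(a).getCanonicalFile().equals(new File(b).getCanonicalFile());
        } catch (IOException e) {
            // getCanonicalFile() may have to query the file system, hence the checked exception
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical paths that name the same directory entry in two different ways
        System.out.println("abstract equal:  " + sameAbstractPath("data", "./data"));
        System.out.println("canonical equal: " + sameCanonicalPath("data", "./data"));
    }
}
```

On a typical JVM the first comparison should report false and the second true, which is the mismatch the thread is debating.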
                • 35. Re: Removing duplicate file entries from LinkedHashSet
                  796085
                  jverd wrote:
                  ejp wrote:
                  Now that case is perfectly clear. File.equals() specifically isn't concerned with the contents of the file, or even with whether the file actually exists. It is about the name.
                  It's also about the underlying file system entity's identity. The documentation for equals() reflects that, but the behavior does not. Hence, bug.
                  Except, as I already said, on some filesystems there is no "the [...] entity's identity". Consider hard links on a Unix FS.
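The hard-link case can be tried out with the NIO API that arrived in Java 7, after this thread started. The sketch below is only an illustration under that assumption: the class name, method name, and file names are invented, and Files.createLink may be unsupported on some file systems, so the demo bails out in that case. Files.isSameFile asks the file system whether two paths locate the same file, which is closer to "denotes the same file" than either equals() or a canonical path comparison.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HardLinkIdentity {

    /** Asks the file system whether both File objects locate the same file (Java 7+). */
    public static boolean sameFile(File a, File b) {
        try {
            return Files.isSameFile(a.toPath(), b.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch files, created only for this demonstration
        Path dir = Files.createTempDirectory("hardlink-demo");
        Path original = Files.createFile(dir.resolve("original.txt"));
        Path link = dir.resolve("link.txt");
        try {
            Files.createLink(link, original); // second directory entry for the same file
        } catch (UnsupportedOperationException | IOException e) {
            System.out.println("Hard links are not available here: " + e);
            return;
        }
        File a = original.toFile();
        File b = link.toFile();
        System.out.println("equals():         " + a.equals(b));
        System.out.println("canonical equals: " + a.getCanonicalFile().equals(b.getCanonicalFile()));
        System.out.println("isSameFile():     " + sameFile(a, b));
    }
}
```

Two hard links have different canonical paths, so canonicalizing alone would not treat them as duplicates; whether that matters depends on what "the same file" should mean for the application.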
                  • 36. Re: Removing duplicate file entries from LinkedHashSet
                    796085
                    DrClap wrote:
                    mreeves wrote:
                    dannyyates wrote:
                    I agree that the behaviour of equals() does seem counterintuitive, in that one would expect it to compare the canonical form of the path name. However, getCanonicalPath()/getCanonicalFile() both throw IOException because they may need to make queries of the filesystem, so integrating them into equals() would be tricky.
                    That sounds like an implementation detail that those at Sun/Oracle should deal with. I'm not interested in knowing what the paths are, per se. I just want the LinkedHashSet to work as expected. I shouldn't have to implement my own subclass for this, it should just work. java.io.File is a core piece of the Java library for goodness sakes.
                    Yes, writing a subclass is overkill. Your workaround is still to use the canonical path of your directories, but simply something like this:
                    filename = new File( args[i] ).getCanonicalFile();
                    If you were using something other than LinkedHashSet (i.e. a collection that was properly sorted and took a Comparator), you could write a Comparator. (I think I already mentioned that, but failed to notice that LinkedHashSet isn't actually sorted, it's merely ordered - hence why it doesn't implement SortedSet!)
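The one-line workaround above can be expanded into a small self-contained sketch. The class and method names are hypothetical, and wrapping the IOException in an UncheckedIOException is just one way to deal with it; the original poster's surrounding code is not shown in the thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

class CanonicalDedup {

    /** Returns the canonical form of each name, in first-seen order, without duplicates. */
    public static Set<File> uniqueFiles(String... names) {
        Set<File> unique = new LinkedHashSet<>();
        for (String name : names) {
            try {
                // Canonicalize before adding so "dir" and "./dir" collapse to one entry
                unique.add(new File(name).getCanonicalFile());
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot canonicalize " + name, e);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        for (File f : uniqueFiles(args)) {
            System.out.println(f);
        }
    }
}
```

Because the set stores canonical files, the ordinary File.equals() and hashCode() are enough and no subclass is needed.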
                    • 37. Re: Removing duplicate file entries from LinkedHashSet
                      843793
                      DrClap wrote:

                      Yes, writing a subclass is overkill. Your workaround is still to use the canonical path of your directories, but simply something like this:
                      filename = new File( args[i] ).getCanonicalFile();
                      Thanks DrClap. That workaround is what's important to me for right now, and it solved the problem I was having.
                      • 38. Re: Removing duplicate file entries from LinkedHashSet
                        843793
                        dannyyates wrote:
                        DrClap wrote:
                        mreeves wrote:
                        dannyyates wrote:
                        I agree that the behaviour of equals() does seem counterintuitive, in that one would expect it to compare the canonical form of the path name. However, getCanonicalPath()/getCanonicalFile() both throw IOException because they may need to make queries of the filesystem, so integrating them into equals() would be tricky.
                        That sounds like an implementation detail that those at Sun/Oracle should deal with. I'm not interested in knowing what the paths are, per se. I just want the LinkedHashSet to work as expected. I shouldn't have to implement my own subclass for this, it should just work. java.io.File is a core piece of the Java library for goodness sakes.
                        Yes, writing a subclass is overkill. Your workaround is still to use the canonical path of your directories, but simply something like this:
                        filename = new File( args[i] ).getCanonicalFile();
                        If you were using something other than LinkedHashSet (i.e. a collection that was properly sorted and took a Comparator), you could write a Comparator. (I think I already mentioned that, but failed to notice that LinkedHashSet isn't actually sorted, it's merely ordered - hence why it doesn't implement SortedSet!)
                        Unfortunately, it looks like I may have to get into this, because I need that list sorted as well. My ultimate goal here is to pipe this into a function that runs a checksum on all the files.

                        So it seems that I need to sort this list before I do that, or I can't count on the results being the same each time.
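If a stable order is needed before checksumming, one option is to collect the canonical files into a TreeSet instead of a LinkedHashSet. This is a sketch under assumptions, not the poster's actual code: the class and method names are invented, and it relies on File implementing Comparable<File>, which orders abstract pathnames lexicographically (with platform-dependent case handling).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedCanonicalFiles {

    /** Returns the canonical form of each name, de-duplicated and sorted by pathname. */
    public static SortedSet<File> sortedUniqueFiles(String... names) {
        SortedSet<File> sorted = new TreeSet<>(); // uses File.compareTo(File)
        for (String name : names) {
            try {
                sorted.add(new File(name).getCanonicalFile());
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot canonicalize " + name, e);
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Same arguments in any order should yield the same sequence of files
        for (File f : sortedUniqueFiles(args)) {
            System.out.println(f);
        }
    }
}
```

With this, the order no longer depends on the order of the command-line arguments, so a checksum computed over the sequence should be repeatable as long as the files themselves do not change.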
                        • 39. Re: Removing duplicate file entries from LinkedHashSet
                          DrClap
                          jtahlborn wrote:
                          jverd wrote:
                          elOpalo wrote:
                          Yeah, it might (and even should be) considered as a bug.
                          It's definitely a bug. It's behaving differently from the documentation.
                          Actually, it's not. The javadoc states that it tests the equality of the "abstract pathname".
                          No, it doesn't say that. It says
                          "Returns true if and only if the argument is not null and is an abstract pathname that *denotes the same file or directory* as this abstract pathname."
                          It's clearly talking about the underlying file system here, not the abstract pathname. And it's clear that the comparison is not meant to be just a string comparison of the two abstract pathnames.
                          • 40. Re: Removing duplicate file entries from LinkedHashSet
                            3004
                            jtahlborn wrote:
                            jverd wrote:
                            elOpalo wrote:
                            Yeah, it might (and even should be) considered as a bug.
                            It's definitely a bug. It's behaving differently from the documentation.
                            Actually, it's not. The javadoc states that it tests the equality of the "abstract pathname". If you read through the javadoc for File, you should come to understand that an "abstract pathname" is not the same as a "canonical pathname" (which is why they provided the getCanonical methods in the first place).
                            Javadoc says:
                            "Returns true if and only if the argument is not null and is an abstract pathname that denotes the same file or directory as this abstract pathname."
                            Doesn't matter if "abstract pathname" is or is not the same as "canonical pathname." It's not about whether the pathnames are equal. It's whether they refer to the same file or directory. In my test, they do refer to the same directory, but yet equals() gives false.

                            Ergo, bug, QED.
                            I agree it is confusing and could possibly be stated more explicitly in more places, but the javadoc as it stands is correct.
                            Nope.

                            Edited by: jverd on Mar 8, 2010 10:39 AM
                            • 41. Re: Removing duplicate file entries from LinkedHashSet
                              3004
                              dannyyates wrote:
                              jverd wrote:
                              ejp wrote:
                              Now that case is perfectly clear. File.equals() specifically isn't concerned with the contents of the file, or even with whether the file actually exists. It is about the name.
                              It's also about the underlying file system entity's identity. The documentation for equals() reflects that, but the behavior does not. Hence, bug.
                              Except, as I already said, on some filesystems there is no "the [...] entity's identity". Consider hard links on a Unix FS.
                              Maybe "identity" was a bad term then. The point I was trying to make--which still stands--is that it returns false in a situation when it is quite unambiguously documented to return true.
                              • 42. Re: Removing duplicate file entries from LinkedHashSet
                                jtahlborn
                                DrClap wrote:
                                jtahlborn wrote:
                                jverd wrote:
                                elOpalo wrote:
                                Yeah, it might (and even should be) considered as a bug.
                                It's definitely a bug. It's behaving differently from the documentation.
                                Actually, it's not. The javadoc states that it tests the equality of the "abstract pathname".
                                No, it doesn't say that. It says
                                "Returns true if and only if the argument is not null and is an abstract pathname that *denotes the same file or directory* as this abstract pathname."
                                It's clearly talking about the underlying file system here, not the abstract pathname. And it's clear that the comparison is not meant to be just a string comparison of the two abstract pathnames.
                                I can see your point of view on this. I guess the main problem is that the javadoc includes "if and only if". If the javadoc were changed to just "if" it would be correct. In other words, returning "false" does not guarantee that the abstract pathnames do not refer to the same file/directory.
                                • 43. Re: Removing duplicate file entries from LinkedHashSet
                                  jtahlborn
                                  FYI [http://bugs.sun.com/view_bug.do?bug_id=4787260|http://bugs.sun.com/view_bug.do?bug_id=4787260]
                                  • 44. Re: Removing duplicate file entries from LinkedHashSet
                                    3004
                                    jtahlborn wrote:
                                    DrClap wrote:
                                    jtahlborn wrote:
                                    jverd wrote:
                                    elOpalo wrote:
                                    Yeah, it might (and even should be) considered as a bug.
                                    It's definitely a bug. It's behaving differently from the documentation.
                                    Actually, it's not. The javadoc states that it tests the equality of the "abstract pathname".
                                    No, it doesn't say that. It says
                                    "Returns true if and only if the argument is not null and is an abstract pathname that *denotes the same file or directory* as this abstract pathname."
                                    It's clearly talking about the underlying file system here, not the abstract pathname. And it's clear that the comparison is not meant to be just a string comparison of the two abstract pathnames.
                                    I can see your point of view on this. I guess the main problem is that the javadoc includes "if and only if". If the javadoc were changed to just "if" it would be correct.
                                    No, the "only if" is not the problem. Let's try it with just "if":
                                    Returns true if the argument is not null and is an abstract pathname that denotes the same file or directory as this abstract pathname.
                                    Do they refer to the same file or directory? Yes. Therefore, by the above, must it return true? Yes. "Returns true if they refer to the same file" means exactly that.

                                    Removing "only if" means that it could return true even if they did not refer to the same file or directory, so it would address the opposite failure mode from what I'm seeing.
                                    In other words, returning "false" does not guarantee that the abstract pathnames do not refer to the same file/directory.
                                    Yes it does, regardless of whether the "only if" is present. Or rather, is supposed to according to the docs.