The Open Road: java.nio.file Blog

Version 2


    Editor's Note: Along with exotic and ambitious proposed features like the Java Module System, closures, and language-level XML support, do you suppose Java 7 will provide us a reliable file-copy method? It could happen, as a JSR for "More New I/O APIs for the Java Platform" appears to be a likely candidate for inclusion in Java 7. In this installment of The Open Road, Elliotte Rusty Harold takes a detailed look at the current state of the NIO2 spec and how it will, and sometimes won't, help you work with files.

    Before we begin, here's a brief update on the status of the OpenJDK 7 project. The most recent JDK 7 drop is Build b29, posted June 20. A look at b29's summary of changes shows this to mainly be a bug-fix beta, with defects cleared in the compiler, build scripts, AWT, and a few other areas. Releases from the project have been coming out every two weeks or so since April -- taking an unsurprising break in early May for JavaOne -- with b28, b27, b26, and b25 continuing to fix defects and add minor features, such as JMX support for platform MXBeans of any type (bug 6610094), and an IO redirection API for sub-processes (bug 4960438).

    And speaking of bugs, take a look at bug 4032604, "Copy method in class". The first two comments on the bug were posted by the author and the editor of the article you're about to read -- 11 and nine years ago, respectively. Will we finally get our wish? Read on for Elliotte's answer.

    Java is a cross-platform language and environment. However, the Java VM itself needs to communicate with the native processor, operating system, and file system. If native code is to be avoided, everything you'd rely on the OS for in a classic program has to be provided by Java instead. Reimplementing a complete virtual OS and API takes a while. In Java's case, and specifically in the case of the file system, the job has taken over a decade; and it still isn't done. Nonetheless, Java 7 may finally finish some of the last abstractions needed to create filesystem-aware programs with all the power of their native counterparts.

    A Little History

    Sometimes it's the little things that are most annoying, like the mosquito that won't stop buzzing around your bed at night. Sometimes these irritants grow over time. In Java 1.0 the language was so new we barely noticed that it had no reliable way to copy or move a file. In Java 1.1 we were so happy about internationalization and readers and writers that we figured moving and copying files would surely come in the next release. In Java 1.2 we got distracted by Swing, and didn't think much about I/O. By Java 1.3, however, we were starting to get a little antsy. Surely Sun could have offered us file copies by now? We were definitely getting a little tired of running long streams just to move a file from point A to point B while losing all the metadata in the process. We were very tired of shelling out to native code to move files because renameTo() only worked on an unpredictable subset of the systems we needed to run on. But Sun promised that they'd get around to a decent file system interface in Java 1.4.

    Java 1.4 arrived, and it was full of buffers and channels and charsets and more in a spanking new java.nio package. Unfortunately, the new filesystem interface we'd been promised was nowhere to be found. Seems the developers working on java.nio had gotten so excited about non-blocking I/O and memory-mapped files that they ran out of time or just plumb forgot about their promise to finally let us move and copy files.

    Java 5, and then Java 6, came and went with nary a file copy operation in sight, though Sun did manage to find time to invent the most complicated and simultaneously least powerful generics implementation I've ever seen. It was starting to feel like the priorities were more than a little skewed over at the JCP. Complicated, sexy proposals like generics, closures, and asynchronous I/O got a lot more attention than they deserved while basic, fundamental, and easy but unsexy functionality, such as copying and moving files, was starved for resources.

    However, finally in Java 7, it looks like there's at least a 50-50 chance we'll get a filesystem API that's more powerful than the clunky old that was thrown together twelve years ago to push Java 1.0 out the door. Sun, IBM, Oracle, Intel, HP, Google, NTT, and Doug Lea are working on JSR 203 to create "More New I/O APIs for the Java Platform ('NIO.2')". Don't hold your breath yet, but do keep your fingers crossed. Maybe, just maybe, we'll finally be able to copy files in Java 7.

    The Basics

    In Java 7 won't be deprecated, but it probably should be. Instead, all files should be referenced through the java.nio.file package. What used to be will now become a java.nio.file.Path. This is a more accurate name, since there was never any guarantee that a File object mapped to a real file. Also, paths can refer to both files and directories.

    The Path class is abstract and has no constructors. Instead, you'll ask a FileSystem object to create a path for you. This way, it can create a path that's specific to the type of file system: Windows, Unix, Mac OS X, network, zip archive, or something else. For example, this is how you create a Path object for the file in /home/elharo/articles/ on the local default file system:

    FileSystem local = FileSystems.getDefault();
    Path p = local.getPath("/home/elharo/articles/");

    You can create relative paths, too. These are relative to the current working directory:

    Path p = fileSystem.getPath("articles/");

    Most of the time, you'll just want to use the operating system's native file system, which is available via the FileSystems.getDefault() method. If this is all you want, and usually it is, the static Path.get() method saves you a few columns of horizontal space:

    Path p = Path.get("/home/elharo/articles/");

    However, you can install other file systems that point somewhere other than the local file system. For instance, you could have a file system that accesses an HTTP server, reads a zip archive, mounts an ISO disk image, or views a Mercurial repository. Each such file system would have its own path and attribute classes. However, basic operations could still be performed with the abstract superclasses I discuss here.

    The possibility of alternate file systems gives us a second way to create path objects. Given a file system that supports a URI scheme, you can create a Path object from the URI. For example, imagine you've installed a RESTful file system provider that uses HTTP GET for reading, HTTP PUT for writing, HTTP OPTIONS and HEAD for attributes, and HTTP DELETE for removal. Then you can point to a file on the server like so:

    Path p = Path.get(new URI("")); 

    There's also a toUri() method that converts an absolute Path to a filesystem-specific URI:

    URI uri = path.toAbsolutePath().toUri();

    Finally, if you're passed a object by some old code, you can convert it to the new hotness with the getFileRef() method:

    File f = new File("/foo/bar.txt");
    Path p = f.getFileRef();

    Unfortunately, the more obvious name getPath() was already taken.
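
    The snippets above follow the draft API and won't compile as-is on a released JDK. As a minimal sketch, here are the same four ideas against the java.nio.file API as it eventually shipped in Java 7, where the static factory became Paths.get() and the File bridge became File.toPath(). The class name PathCreation and the file name draft.txt are hypothetical, chosen only for illustration.

```java
import java.io.File;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCreation {

    // Ask the default file system for a path explicitly.
    static Path viaFileSystem(String name) {
        FileSystem local = FileSystems.getDefault();
        return local.getPath(name);
    }

    // Shortcut for the default file system (Paths.get in the shipped API).
    static Path viaShortcut(String name) {
        return Paths.get(name);
    }

    // Bridge from legacy java.io.File code (File.toPath in the shipped API).
    static Path fromLegacyFile(File f) {
        return f.toPath();
    }

    // Round-trip an absolute path through its file: URI.
    static Path roundTrip(Path p) {
        URI uri = p.toAbsolutePath().toUri();
        return Paths.get(uri);
    }

    public static void main(String[] args) {
        Path a = viaFileSystem("articles/draft.txt"); // hypothetical file name
        Path b = viaShortcut("articles/draft.txt");
        Path c = fromLegacyFile(new File("articles/draft.txt"));
        System.out.println(a.equals(b) && b.equals(c));
        System.out.println(roundTrip(a));
    }
}
```

    None of these calls touch the disk, so the file doesn't need to exist for the paths to compare equal.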

    So What Is a Path, Really?

    A Path stores a hierarchical list of names indexed from zero to one less than the length of the list. The last name in the list is the name of the individual file or directory the path refers to. The first name will be the root of the file system if the path is absolute. Other parts of the path are parent directories. These methods inspect this list:

    Path getName(int n)
    Returns the nth component in this path. The root of the path is 0. The file/directory itself is one less than the number of components of the path.
    int getNameCount()
    Returns the number of components in the path.
    Path getParent()
    Returns the path to the parent of this file or directory, or null if this path does not have a parent (that is, if this path is itself a root).
    Path getRoot()
    Returns the root component of this path. For an absolute path on a Unix-like file system, this will be /. For an absolute path on a DOS-like file system, this will be something like C:\ or D:\. For a relative path, this will be null.

    Of course, paths aren't always quite perfect trees. In relative paths, the root is missing. Sometimes symbolic links cause the path to jump to a different subtree. The toAbsolutePath() method converts a path to an absolute path starting from a root of the file system, wherever that might be. Invoking toRealPath(false) on a path removes path segments like /./ and /../ from the path before computing an absolute path. Invoking toRealPath(true) on a path removes path segments like /./ and /../ and also resolves all symbolic links before returning the absolute path.

    You can use several variants of the resolve method to create new paths from an existing path. For example, suppose temp points to /usr/tmp:
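
    To make the name list concrete, here is a small sketch that walks a path's components. It uses the API as it shipped in Java 7, which is an assumption relative to the draft described here: in the shipped version the root is reported only by getRoot() and is not counted among the names, so getName(0) is the element closest to the root. The class name PathParts and the sample path are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PathParts {

    // Collect each name element of a path, in order, as a string.
    static List<String> names(Path p) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < p.getNameCount(); i++) {
            result.add(p.getName(i).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Path p = Paths.get("articles", "java7", "nio.html"); // hypothetical path
        System.out.println(names(p));
        System.out.println(p.getParent());
        System.out.println(p.getRoot());                  // a relative path has no root
        System.out.println(p.toAbsolutePath().getRoot()); // the absolute form does
    }
}
```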

    Path temp = fileSystem.getPath("/usr/tmp");

    We can resolve other paths with this as the root. For example, resolving articles/ against temp creates a path pointing to /usr/tmp/articles/:

    Path p = fileSystem.getPath("articles/");
    Path resolved = temp.resolve(p);

    The inverse operation of resolution is relativization. Given an absolute path such as /usr/tmp/articles/, you can convert it to a path relative to some other path such as /usr/tmp:

    Path absolute = fileSystem.getPath("/usr/tmp/articles/");
    Path temp = fileSystem.getPath("/usr/tmp");
    Path relative = temp.relativize(absolute);

    If necessary, relativization can add ./ and ../ to the path to properly relativize. For example, here I calculate a relative link from an article in one directory to an article in another directory:

    Path article3 = fileSystem.getPath("/usr/tmp/articles/");
    Path article7 = fileSystem.getPath("/usr/tmp/articles/developerWorks/article7.html");
    Path link = article7.relativize(article3);

    link now points to ../../

    These methods could be useful when setting up a templating system for a blog engine, or a content management system, and converting file paths to URLs, for example. If you use them that way, please do be careful that you don't accidentally let crackers go wandering all over the file system outside your content root, though.
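
    Here is one way that caution might look in practice: a sketch that resolves a requested name against a content root and rejects anything that escapes it. It assumes the shipped Java 7 API (Paths.get(), normalize(), startsWith()); the class name PathArithmetic, the method resolveInside, and the sample paths are all hypothetical. The check is purely lexical, so it does not protect against symbolic links inside the root; for that you would compare real paths instead.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathArithmetic {

    // Resolve a requested name against a content root and refuse
    // any result that lands outside that root.
    static Path resolveInside(Path root, String requested) {
        Path base = root.normalize();
        Path candidate = base.resolve(requested).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes content root: " + requested);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path temp = Paths.get("tmp");
        Path article = temp.resolve("articles/a.html");  // resolution
        System.out.println(article);
        System.out.println(temp.relativize(article));    // relativization
        System.out.println(resolveInside(temp, "articles/../index.html"));
        try {
            resolveInside(temp, "../../etc/passwd");
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

    An absolute request is rejected too, because resolving an absolute path against anything simply returns the absolute path, which does not start with the root.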

    Reading and Writing

    To write to a path, you call newOutputStream(), and then use the returned object as normal. Example 1 shows a simple method to write the ASCII letters A through Z into a file in the current working directory named alphabet.txt:

    Example 1: Write the uppercase alphabet into an ASCII text file
    public void makeAlphabetFile() throws IOException {
        Path p = Path.get("alphabet.txt");
        OutputStream out = p.newOutputStream();
        // one letter per line, A through Z
        for (int c = 'A'; c <= 'Z'; c++) {
            out.write(c);
            out.write('\n');
        }
        out.close();
    }

    This program will create the file if it doesn't exist, and overwrite it if it does. However, you can adjust this by passing StandardOpenOption.APPEND or StandardOpenOption.CREATE_NEW to the newOutputStream() method:

    OutputStream out = p.newOutputStream(EnumSet.of(StandardOpenOption.CREATE_NEW));

    Now alphabet.txt will be created if and only if it doesn't already exist. Otherwise the attempt will throw an exception.

    There are several options you can use when opening a file:

    StandardOpenOption.CREATE (default behavior for writes)
    Create a file if it does not already exist.
    Expect that the file does not already exist, and create it. Throw an exception if the file does already exist.
    Write new data to the end of the existing file.
    StandardOpenOption.TRUNCATE_EXISTING (default for writes)
    Remove all data from an existing file when opening it.
    Throw an exception if it is necessary to resolve a symbolic link to open a file.
    Suggest that the file will be sparse so the underlying operating system can optimize for that use case. File systems that don't support sparse files will ignore this hint.
    Write data to the underlying disk immediately. Do not use native buffering. This does not affect buffering internal to Java, such as that performed by BufferedInputStream and BufferedWriter.
    Write data and metadata (attributes) to the underlying disk immediately.
    Open for reading.
    Open for writing.

    These options apply not just in this method, but for all methods in the API that open files. Not all of these are mutually exclusive. You can use several when opening a file.

    You can buffer or otherwise filter these streams as normal. Example 2 shows a better makeAlphabetFile() method that uses UTF-8 encoding, and buffers the data:

    Example 2: Write the upper case alphabet with buffering into a UTF-8 file
    public void makeAlphabetFile() throws IOException {
        Path p = Path.get("alphabet.txt");
        OutputStream out = p.newOutputStream();
        out = new BufferedOutputStream(out);
        // encode characters as UTF-8 on their way to the stream
        Writer w = new OutputStreamWriter(out, "UTF-8");
        w = new BufferedWriter(w);
        for (int c = 'A'; c <= 'Z'; c++) {
            w.write(c);
            w.write('\n');
        }
        w.flush();
        w.close();
    }

    For reading, just use newInputStream() instead.
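
    For comparison, here is a sketch of the same write-then-read round trip against the API as it shipped in Java 7, where the stream factories moved to static methods on java.nio.file.Files and the open options are passed as varargs rather than a set. That is an assumption relative to the draft shown in this article. The class name AlphabetFile and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AlphabetFile {

    // Write A through Z, one letter per line; fail if the file already exists.
    static void writeNew(Path p) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(p, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            for (char c = 'A'; c <= 'Z'; c++) {
                w.write(c);
                w.write('\n');
            }
        }
    }

    // Add one more line to the end of an existing file.
    static void appendLine(Path p, String line) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(p, StandardCharsets.UTF_8,
                StandardOpenOption.APPEND)) {
            w.write(line);
            w.write('\n');
        }
    }

    // Read the file back and count its lines.
    static int countLines(Path p) throws IOException {
        int n = 0;
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempDirectory("alphabet").resolve("alphabet.txt");
        writeNew(p);
        appendLine(p, "done");
        System.out.println(countLines(p) + " lines in " + p);
        try {
            writeNew(p); // second attempt hits CREATE_NEW
        } catch (FileAlreadyExistsException ex) {
            System.out.println("already exists: " + ex.getFile());
        }
    }
}
```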

    You can also specify attributes for newly created files when opening a path for writing. I'll discuss those below.

    There are also methods that create channels instead, though on modern VMs, I'm skeptical whether that's really helpful or just more complex. Threading has improved so much in Java 6 that it's no longer a problem to run thousands or even tens of thousands of streams in separate threads, thereby removing much of the impetus for using channels and non-blocking I/O in the first place. Perhaps the true asynchronous I/O also introduced with JSR-203 will make channels relevant again, but this remains to be seen.

    However, there is one case that definitely calls for channels: random access files. There's no specific new RandomAccessFile class. Instead you ask the path to give you a SeekableByteChannel:

    Path p = Path.get("fits.dat");
    SeekableByteChannel raf = p.newSeekableByteChannel(
        StandardOpenOption.READ,
        StandardOpenOption.WRITE,
        StandardOpenOption.SYNC,
        StandardOpenOption.DSYNC
    );

    SeekableByteChannel is a new subinterface of ByteChannel that extends it with methods for moving the file pointer around in the file before reading or writing:

    public interface SeekableByteChannel extends ByteChannel {
        public int read(ByteBuffer dest) throws IOException;
        public int write(ByteBuffer source) throws IOException;
        public long position() throws IOException;
        public SeekableByteChannel position(long newPosition) throws IOException;
        public long size() throws IOException;
        public SeekableByteChannel truncate(long size) throws IOException;
    }
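
    As a rough illustration of what the position methods buy you, here is a sketch that patches a few bytes in the middle of a file and reads them back. It assumes the shipped Java 7 API, in which the channel comes from Files.newByteChannel() rather than a method on the path. The class name RandomAccessDemo and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class RandomAccessDemo {

    // Overwrite bytes starting at the given offset, leaving the rest untouched.
    static void patch(Path p, long offset, byte[] data) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.position(offset);
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // Read up to length bytes starting at the given offset.
    static byte[] readAt(Path p, long offset, int length) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ)) {
            ch.position(offset);
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("fits", ".dat");
        Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
        patch(p, 3, "XYZ".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readAt(p, 0, 10), StandardCharsets.US_ASCII));
    }
}
```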

    Navigating the Filesystem

    To list the files in a directory you'll use a DirectoryStream, which is not really a stream at all. Instead, it's an Iterable that returns DirectoryEntry objects from which you can get more Paths. These Path objects are all relative to their parent directories. The process starts with a call to the newDirectoryStream() method of the path representing a directory.

    Example 3 is a program that lists all the .txt files in the roots of the filesystem:

    Example 3: List all .txt files in the root directories
    import java.io.IOException;
    import java.nio.file.*;

    public class TextLister {
        public static void main(String[] args) throws IOException {
            for (Path root : FileSystems.getDefault().getRootDirectories()) {
                DirectoryStream txtFiles = root.newDirectoryStream("*.txt");
                try {
                    for (Path textFile : txtFiles) {
                        System.out.println(textFile.getName());
                    }
                } finally {
                    // always release the directory handle
                    txtFiles.close();
                }
            }
        }
    }

    For filters beyond simple name filters -- for instance, filtering by size or MIME type -- you have to implement your own instance of the DirectoryStream.Filter interface to specify which files to accept and reject. For example, here's a simple filter that accepts files that are less than 100K in size:

    Example 4: A filter for "small" files
    public class SmallFilesOnly implements DirectoryStream.Filter {
        public boolean accept(DirectoryEntry entry) {
            try {
                // 102400 bytes = 100K
                if (entry.newSeekableByteChannel().size() < 102400) {
                    return true;
                }
                return false;
            } catch (IOException ex) {
                // treat unreadable entries as rejected
                return false;
            }
        }
    }

    Unfortunately, you can't just pass an instance of this filter to newDirectoryStream() as you might expect. Instead, you have to use a far less direct and more opaque means of listing the directory using the Files.withDirectory method:

    Example 5: List all "small" files in the root directories
    import java.io.IOException;
    import java.nio.file.*;

    public class TextLister {
        public static void main(String[] args) throws IOException {
            for (Path root : FileSystems.getDefault().getRootDirectories()) {
                Files.withDirectory(root, new SmallFilesOnly(), new DirectoryAction() {
                    // called once for each entry the filter accepts
                    public void invoke(DirectoryEntry entry) {
                        System.out.println(entry.getName());
                    }
                });
            }
        }
    }

    I'm not sure what the working group has against simple, straightforward iteration, but instead we have to use this confusing closure-lite syntax. However, Java is not a language that was designed around closures, and closure-based methods like this just don't fit. There are just too many layers of indirection, and it's too hard to see what actually happens. For instance, in Example 5, can you tell me how to print the names of the first 10 entries, and then break? Doable, yes; but not trivial. Functional languages have their place, but they don't mix well with iterative-based languages like Java. Usable Java APIs should emphasize imperative design patterns, not functional ones.
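
    For contrast, here is a sketch of the plain-iteration style argued for above, including the "first N entries, then break" case. It relies on the API as it shipped in Java 7, where Files.newDirectoryStream() accepts a DirectoryStream.Filter directly; that is an assumption relative to the draft discussed in this article. The class name SmallFileLister and the method firstSmallFiles are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SmallFileLister {

    // Return the names of up to limit regular files in dir smaller than maxBytes.
    static List<Path> firstSmallFiles(Path dir, final long maxBytes, int limit)
            throws IOException {
        DirectoryStream.Filter<Path> small = new DirectoryStream.Filter<Path>() {
            @Override
            public boolean accept(Path entry) throws IOException {
                return Files.isRegularFile(entry) && Files.size(entry) < maxBytes;
            }
        };
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, small)) {
            for (Path entry : entries) {
                if (result.size() >= limit) {
                    break; // ordinary loop control, no callback gymnastics
                }
                result.add(entry.getFileName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // List the first 10 files under 100K in the current working directory.
        for (Path name : firstSmallFiles(Paths.get("."), 102400, 10)) {
            System.out.println(name);
        }
    }
}
```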

    Copying Files

    Suppose you want to copy the file charm.txt in the directory cats to the file charm_harold.xml in the directory pets. Before Java 7, you had to open the source file and the destination file, read the entire contents from the source, and then write them to the destination. For a large file this could take a while, and usually you'd lose metadata such as permissions, owners, MIME types, archive flags, and such in the process. Example 6 shows how to accomplish this basic task in Java 7:

    Example 6: Copy the file charm.txt in the directory cats to the file charm_harold.xml in the directory pets
    // "default" is a reserved word in Java, so the variable needs another name
    FileSystem fs = FileSystems.getDefault();
    Path charm = fs.getPath("cats/charm.txt");
    Path pets = fs.getPath("pets/charm_harold.xml");
    charm.copyTo(pets);

    On many operating systems this will happen a lot faster than streaming data from one file to another. Furthermore, it should preserve all metadata that should be preserved. Security restrictions may prevent certain metadata from being copied, and other features such as the file creation time may be changed.

    Now suppose instead of copying a file you want to move a file. In Java 6 and earlier, all you could do was rename the file, which worked on some operating systems but not on others, and usually didn't work for network volumes even if it worked for local disks. Or you could copy the file byte by byte, and then delete the original. Now however, it's this simple:

    Example 7: Move the file charm.txt in the directory cats to the file charm_harold.xml in the directory pets
    // "default" is a reserved word in Java, so the variable needs another name
    FileSystem fs = FileSystems.getDefault();
    Path charm = fs.getPath("cats/charm.txt");
    Path pets = fs.getPath("pets/charm_harold.xml");
    charm.moveTo(pets);

    This can be much faster even for very large files because most of the time no bits need to be moved at all. The local native file system simply needs to rewrite a few entries in a virtual table. Moves between physical disks or across the network do need to move bytes and will take finite time.

    These methods are synchronous and blocking. If that bothers you, just wrap the transfer in a FutureTask and pass it to an Executor.
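
    A minimal sketch of that wrapping might look like the following. It uses Files.copy() from the API as it shipped in Java 7 in place of the draft's copyTo(), which is an assumption on my part; the FutureTask and Executor pieces are plain java.util.concurrent. The class name BackgroundCopy and the method copyInBackground are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class BackgroundCopy {

    // Hand the blocking copy to an executor and return a handle to its result.
    static FutureTask<Path> copyInBackground(final Path source, final Path target,
                                             Executor executor) {
        FutureTask<Path> task = new FutureTask<>(new Callable<Path>() {
            @Override
            public Path call() throws Exception {
                return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            }
        });
        executor.execute(task);
        return task;
    }

    public static void main(String[] args) throws Exception {
        Path source = Files.createTempFile("charm", ".txt");
        Files.write(source, "purr".getBytes(StandardCharsets.UTF_8));
        Path target = source.resolveSibling(source.getFileName() + ".copy");

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            FutureTask<Path> task = copyInBackground(source, target, executor);
            // ... do other work here, then wait for the copy to finish
            System.out.println("copied to " + task.get());
        } finally {
            executor.shutdown();
        }
    }
}
```

    Any IOException thrown by the copy surfaces later, wrapped in an ExecutionException, when you call get() on the task.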

    Of course, I/O is still an unsafe operation. These methods can throw IOExceptions if the source file doesn't exist, if the target directory is read-only, if a floppy is ejected while a copy is being written to it, if a network goes down while a file is being read, or any other such problems. As always, you'll need to wrap these operations in a try-catch block or declare that your method throws the relevant exception. You may also want to implement your own recovery logic. File copies and moves over the network or between disks take real time; and if an operation is interrupted in medias res, the target file may be half-written and in an inconsistent, corrupt state.

    By default, when a file is copied or moved:

    • The copy fails if the target file already exists.
    • File attributes may or may not be copied to the target file, in whole or in part.
    • If you're copying a symbolic link, the target of the link is copied, not the link itself.
    • If you're moving a symbolic link, the link is moved, not the target of the link. This is an asymmetry between copies and moves.
    • Directories are moved only if they're either empty or being moved to the same file system.

    Sometimes this is what you want, and sometimes it isn't. You can adjust the behavior of the copy/move by passing one or more copy options to the copyTo() or moveTo() methods:

    • StandardCopyOption.REPLACE_EXISTING: Overwrite a preexisting target file.
    • StandardCopyOption.COPY_ATTRIBUTES: Preserve all the original's attributes in the copy.
    • StandardCopyOption.NOFOLLOW_LINKS: Do not follow symbolic links from the target when copying. Copy the links themselves instead.
    • StandardCopyOption.ATOMIC_MOVE: Copy/move the entire file or nothing.

    For example, if you want to overwrite an existing target when copying, pass StandardCopyOption.REPLACE_EXISTING to copyTo like so:

    source.copyTo(target, StandardCopyOption.REPLACE_EXISTING);

    If you want to overwrite an existing target and preserve the original file attributes, pass StandardCopyOption.REPLACE_EXISTING and StandardCopyOption.COPY_ATTRIBUTES:

    source.copyTo(target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);

    Yes, the syntax does not look like the options for creating a stream. Those use an EnumSet while these use varargs.

    Particular file systems may support additional non-standard options, but these four are required.
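
    Here is a sketch of the option handling against the API as it shipped in Java 7, where the operations are static Files.copy() and Files.move() calls rather than methods on the path. That is an assumption relative to the draft. The class name CopyMoveDemo and its method names are hypothetical, and falling back to an ordinary move when an atomic one isn't available is my own design choice, not something the JSR prescribes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyMoveDemo {

    // Copy, replacing any existing target and carrying attributes along
    // to the extent the file system supports it.
    static Path copyReplacing(Path source, Path target) throws IOException {
        return Files.copy(source, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    // Try an atomic move first; fall back to an ordinary move if the
    // file system cannot do it atomically.
    static Path moveAtomicallyIfPossible(Path source, Path target) throws IOException {
        try {
            return Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException ex) {
            return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pets");
        Path charm = dir.resolve("charm.txt");
        Files.write(charm, "meow".getBytes(StandardCharsets.UTF_8));

        Path copy = copyReplacing(charm, dir.resolve("charm_copy.txt"));
        Path moved = moveAtomicallyIfPossible(charm, dir.resolve("charm_moved.txt"));
        System.out.println(copy + " and " + moved);
    }
}
```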

    File Attributes

    Metadata about a file such as owners, permission, readability, and so forth has now been separated from the file class itself. You request attributes from a path using the new java.nio.file.Attributes class like so:

    BasicFileAttributes attrs = Attributes.readBasicFileAttributes(path, false); 

    This only gives you the basic attributes that are common to most file systems, most of which have been available since Java 1.0. Example 8 is a simple program to list all the attributes for files named on the command line:

    Example 8: List the basic attributes of a file
    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.*;
    import java.util.Date;
    import java.util.concurrent.TimeUnit;

    public class AttributePrinter {
        public static void main(String[] args) throws IOException {
            for (String name : args) {
                Path p = Path.get(name);
                BasicFileAttributes attrs = Attributes.readBasicFileAttributes(p, false);
                TimeUnit scale = attrs.resolution();
                // all dates are since the epoch but we do need to adjust for
                // different time units used in different file systems
                System.out.println(name + " was created at "
                    + new Date(scale.toMillis(attrs.creationTime())));
                System.out.println(name + " was last accessed at "
                    + new Date(scale.toMillis(attrs.lastAccessTime())));
                System.out.println(name + " was last modified at "
                    + new Date(scale.toMillis(attrs.lastModifiedTime())));
                if (attrs.isDirectory()) {
                    System.out.println(name + " is a directory.");
                }
                if (attrs.isFile()) {
                    System.out.println(name + " is a normal file.");
                }
                if (attrs.isSymbolicLink()) {
                    System.out.println(name + " is a symbolic link.");
                }
                if (attrs.isOther()) {
                    System.out.println(name + " is something strange.");
                }
                System.out.println(name + " is " + attrs.size() + " bytes long.");
                System.out.println("There are " + attrs.linkCount() + " links to this file.");
            }
        }
    }

    These attributes are assumed to be more or less the same on different file systems, though this isn't always true. Not all file systems track the last access time, for example.
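
    A runnable approximation of the same idea, written against the API as it shipped in Java 7, is sketched below. In that version the attributes come from Files.readAttributes(), the times are FileTime objects rather than raw numbers with a separate resolution, and the regular-file test is isRegularFile(). Those details are assumptions relative to the draft in Example 8, as is my reading of the draft's false argument as "do not follow links". The class name BasicAttributePrinter is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class BasicAttributePrinter {

    // Classify a path using only the basic attribute set.
    static String kind(Path p) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(
                p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (attrs.isDirectory()) {
            return "directory";
        }
        if (attrs.isRegularFile()) {
            return "regular file";
        }
        if (attrs.isSymbolicLink()) {
            return "symbolic link";
        }
        return "other";
    }

    // Build a one-line summary of the basic attributes.
    static String describe(Path p) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(
                p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        return p + ": " + kind(p) + ", " + attrs.size() + " bytes, created "
                + attrs.creationTime() + ", modified " + attrs.lastModifiedTime();
    }

    public static void main(String[] args) throws IOException {
        // Describe each path named on the command line.
        for (String name : args) {
            System.out.println(describe(Paths.get(name)));
        }
    }
}
```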

    You can ask for more platform-specific attributes with the readDosFileAttributes() and readPosixFileAttributes() methods. For example, here's a simple program to list all the attributes for a Windows file named at the DOS prompt:

    Example 9: List the DOS attributes
    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.*;

    public class WindowsAttributePrinter {
        public static void main(String[] args) throws IOException {
            for (String name : args) {
                Path p = Path.get(name);
                DosFileAttributes attrs = Attributes.readDosFileAttributes(p, false);
                if (attrs.isArchive()) {
                    System.out.println(name + " is backed up.");
                }
                if (attrs.isReadOnly()) {
                    System.out.println(name + " is read-only.");
                }
                if (attrs.isHidden()) {
                    System.out.println(name + " is hidden.");
                }
                if (attrs.isSystem()) {
                    System.out.println(name + " is a system file.");
                }
            }
        }
    }

    POSIX file attributes are group, owner, and permissions. You'll get an UnsupportedOperationException if you try to read DOS attributes from a POSIX file system or vice versa.

    Other providers can offer their own subclasses of BasicFileAttributes. For instance, Apple might offer MacFileAttributes, and Microsoft (or third parties) NTFSFileAttributes. However, these additional attributes can't be so easily plugged into the system.

    I must say this is the piece of JSR-203 that strikes me as most questionable. File systems and file metadata are still evolving. The current system doesn't even support what's available today in Vista (Indexes, Archived, etc.) or Mac OS X Leopard (file type, creator type, etc.), much less what may be available in five years. I think we need a more flexible approach that does not presume it knows the names, types, or meaning of all possible file system metadata in advance. A generic key-value system would be a lot more palatable.
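
    To show what such a key-value style could look like, here is a sketch that reads attributes as a map from names to values. It leans on the API as it shipped in Java 7, which did end up offering string-keyed access through Files.readAttributes() and Files.getAttribute(); that is outside the draft discussed here, so treat it as an assumption. The class name AttributeMap and the method all are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class AttributeMap {

    // Read every attribute in the named view ("basic", "posix", "dos", ...)
    // as a map from attribute name to value.
    static Map<String, Object> all(Path p, String view) throws IOException {
        return Files.readAttributes(p, view + ":*");
    }

    public static void main(String[] args) throws IOException {
        Path here = Paths.get(".");
        // No compile-time knowledge of the attribute names is needed.
        for (Map.Entry<String, Object> e : all(here, "basic").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        // A single attribute can also be fetched by name.
        System.out.println("size = " + Files.getAttribute(here, "basic:size"));
    }
}
```

    Asking for a view the file system doesn't provide throws UnsupportedOperationException, so code that wants "posix" or "dos" attributes still has to check what is available first.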

    Summing Up

    Copying files and checking permissions aren't the sexiest parts of a programmer's job. Indeed, they're among the most prosaic. Nonetheless, they are extremely important. The lack of a good way to do this has been a really critical omission in Java for years. Finally, Java 7 fills these basic holes.

    Add on top of that sexier new I/O features, such as watch lists, true asynchronous I/O, and virtual file systems, and Java 7 may finally have a modern foundation for input and output on which the next generation of clients, servers, and desktop apps can be built.