Sweeping the File System with NIO-2




    Introduction

    JSR 203 (NIO-2), being implemented in the OpenJDK project, is shaping the future of I/O in the upcoming JDK 7. File I/O has been lingering around since JDK 1.0, but lacked many capabilities and is being overhauled in NIO-2 arriving in the next JDK release. This will have a groundbreaking impact on the way Java applications interact with the file system. NIO-2 provides enhancements to the file system API, asynchronous I/O API, and socket channel API.

    A good number of Java applications work closely with the file system. The historic file system management capabilities in the JDK are limited, and therefore even commonly performed file interactions can require a lot of custom coding on top of the provided API. For example, if you need to watch files for changes, you have to write the polling code yourself. Even some of the provided features have deficiencies: the rename and move operations are not guaranteed to be atomic. In the event of failure, the original file and the target file may both exist, or the target file may be incompletely written to the disk. Applications that want to handle these scenarios are forced to resort to native code and thus lose the platform-independence benefits of Java. The new(er) NIO API (NIO-2) lets applications exploit native file-system capabilities in a clean way. This brings Java on par with other programming languages when dealing with the file system.

    In this article, I will focus on how the new API brings a fresh perspective for accessing and manipulating the file system. The term NIO-2 will be used in this article to refer to the file-system enhancements provided by NIO-2, though the scope of NIO-2 is much wider than that. I have tried to provide code snippets frequently to give an idea of what it feels like to use the new API.

    Why another API for file handling?

    The current java.io.File API is a child left behind, while other parts of the Java API have grown robust. It provides minimal file management capabilities. Some of the areas where the current file API falls short are:
    1. Lack of information: The current API is terse when passing information back to the application. For example, operations like rename() and delete() return true or false to indicate success or failure. There is no information on what a false means, why a file could not be deleted, etc. The application is not effectively told what went wrong.
    2. Performance: The current file API uses fine-grained method calls for seeking information about a file. Calls for file metadata such as "are you a file?", or "are you a directory, are you hidden, what was your last modified time?" are all made against the file system in a granular way. There is no way to go and get all the attributes in a single shot. This is inefficient. When asked for the entries under a directory, the current API returns a list or an array (the file.list() operation). This doesn't scale when accessing large directories, specifically when accessing a remote file system.
    3. Limited capability: The current file API has no support in multiple areas that are so frequently needed in applications.
      • There is no way to perform rename(), move(), and other file operations atomically. These could fail with source and target files co-existing. There is no way to create a file along with initial file permissions set up in an atomic operation. This allows for an attack window, between the period when a file is being created in one operation and the file permissions are modified in a subsequent operation.
      • When copying a file, the best option available until JDK 6 is fileChannel.transferFrom(srcChannel, 0, srcChannel.size()). This uses a byte channel as an intermediate for copying data. This copies the file data, but does not copy the attributes. Currently, there is no simple option for copying a file, along with its set of file attributes.
      • There is no support for fetching the metadata of a file or a file system, other than the basic attributes. For example, there is no direct way to know the file owner, and the group or user permissions.
      • There is no API level support for handling symbolic links, without writing the native code. If the application demands resolving and traversing through symbolic links, you have to write the algorithms that handle the copy, move, delete and rename operations involving symbolic links. The complexity of handling circular references with symbolic links is another task to deal with. There is a good chance that the application that needs to handle symbolic links will resort to native code that is not portable.
      • There is no support to probe the type of file content. If you want to guess the type of file content, you have to write your own algorithm to read the bytes and make a good guess on the content type.
      • There is no way to extend the file API to support other custom file systems like memory files, encrypted file-system files, and so on. In essence, if you want to write an application like TrueCrypt, you have no good support from existing file API.
      • There is no support for file notifications. The only way to know about a change in the file system is to poll it, or write a native call based on your operating system.
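
    To make the first shortcoming concrete, here is a minimal sketch that tries to delete a file that does not exist, first through java.io.File and then through java.nio.file.Files. It assumes the API as it eventually shipped in JDK 7, where delete is a static method on Files; the class name and the file name are purely illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

class DeleteDiagnostics {

    // Old API: the only signal is a boolean, with no hint about the cause.
    static boolean deleteWithOldApi(String name) {
        return new File(name).delete();
    }

    // New API: the exception type and its accessors describe what went wrong.
    static String deleteWithNewApi(String name) {
        Path path = Paths.get(name);
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "no such file: " + e.getFile();
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        String missing = "hypothetical-missing-file.tmp"; // assumed not to exist
        System.out.println(deleteWithOldApi(missing));
        System.out.println(deleteWithNewApi(missing));
    }
}
```

    With the old API the caller learns only that the delete did not happen; with the new one it can branch on the specific failure.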

    The NIO-2 Prodigy

    The NIO-2 API includes features to correct the shortcomings of the current file-system API. There are certain elements of design used in the NIO-2 APIs that are worth mentioning.

    1. Informative exceptions: The methods that access the file system throw checked IOExceptions. There are specific subclasses of IOException that encapsulate detailed information about the files in context, the reason for failure, and the detailed message. This can assist the application to recover from specific errors.

    2. Method Chaining is used to simplify repeated object interactions on the same object. This produces compact code and alleviates the need for declaring unnecessary variables.

      DirectoryStream<Path> dirStream = Paths.get("conf_link")
          .createSymbolicLink(..., ...)
          .resolve()
          .newDirectoryStream();

    3. Varargs are used in operations that accept options or flags as parameters. This allows passing the options as arrays, or as comma-separated values.

      Path path = Paths.get("/app/config");
      SeekableByteChannel channel = path.newByteChannel(
          StandardOpenOption.CREATE,
          StandardOpenOption.WRITE,
          StandardOpenOption.APPEND,
          StandardOpenOption.SYNC);
      channel.write(...);
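
      As a runnable counterpart to the varargs idea, the sketch below opens a channel twice, once with comma-separated options and once with an array. It assumes the API as released in JDK 7, where the channel is opened through the static Files.newByteChannel method; the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.OpenOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OpenOptionsSketch {

    // Options passed as comma-separated varargs.
    static void appendLine(Path file, String line) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            writeFully(channel, line);
        }
    }

    // The same options passed as an array.
    static void appendLineWithArray(Path file, String line) throws IOException {
        OpenOption[] options = {
            StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND
        };
        try (SeekableByteChannel channel = Files.newByteChannel(file, options)) {
            writeFully(channel, line);
        }
    }

    // A single write() may be partial, so loop until the buffer is drained.
    private static void writeFully(SeekableByteChannel channel, String line) throws IOException {
        ByteBuffer data = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
        while (data.hasRemaining()) {
            channel.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio2-options", ".log");
        appendLine(file, "first");
        appendLineWithArray(file, "second");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```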

    Class orchestration


    Packages:
    The NIO-2 APIs for file management reside under a new package java.nio.file, with two sub-packages:
    • java.nio.file.attribute: This is the container for classes that supports bulk access to file and file store attributes.
    • java.nio.file.spi: This is the service provider interface subpackage. It provides a contract for pluggable file system implementations and is a facility for creating your own file system provider implementations.

    The key players of the NIO-2 file management API are exhibited in the class diagram of Figure 1.

    NIO 2 File Management API Class Diagram
    Figure 1. Class diagram for NIO-2 file management API

    The primary classes from java.nio.file package:

    • FileStore: A file store is the underlying storage for files in a particular file system. It could be a storage pool, device, partition, volume, concrete file system, etc. The FileStore class represents the physical characteristics of the device, and lets you know about the type of volume, how much disk space is left, etc. It allows access to the metadata of the file store.

      Path path = Paths.get("/app");
      FileStore store = path.getFileStore();
      FileStoreSpaceAttributeView fileStoreAttribute = store.getFileStoreAttributeView(attributeName);
      long unallocated = fileStoreAttribute.readAttributes().unallocatedSpace();
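
      For comparison, here is a small sketch of the same query against the API as it shipped in JDK 7, where the file store is obtained through Files.getFileStore and the space figures are plain getters on FileStore. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FileStoreSketch {

    // Summarizes the store that holds the given path (sizes are in bytes).
    static String describeStore(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return String.format("%s (%s): total=%d usable=%d unallocated=%d",
                store.name(), store.type(),
                store.getTotalSpace(), store.getUsableSpace(), store.getUnallocatedSpace());
    }

    public static void main(String[] args) throws IOException {
        // "." is just a convenient existing path; any existing file works.
        System.out.println(describeStore(Paths.get(".")));
    }
}
```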

    • FileSystems: This is a factory for file systems. It has operations to get a file system, given a URI. There are operations to construct new file systems.

    • FileSystem: A file system is usually a single hierarchy of files with one top-level root directory. In some cases it may have several different file hierarchies, each with its own top-level root directory. Further, a file system can span multiple file stores that vary in features. Each file system is identified by a URI in the new API. The default file system is identified by the URI file:///. The default file system creates objects that provide access to the file systems accessible to the JVM. The FileSystem class provides access to the associated FileStore instances, and to a Path instance, given a path string on the file system.

      FileSystem fileSystem = FileSystems.getDefault();
      Path path = fileSystem.getPath("/Users/Guest/Public");
      for (Path rootDirs : fileSystem.getRootDirectories()) {
          ...
      }
      for (FileStore fileStore : fileSystem.getFileStores()) {
          ...
      }


    • FileRef: A FileRef is the basic notion of a reference to a file. A file is mostly located using the Path, but could be implemented using other means, like a file identifier, to locate a file. It has operations to open a file for reading or writing. These operations are symbolic link aware. The symbolic link related methods are constructed in a manner that allows you to specify the behavior when a symbolic link is encountered. A FileRef acts as a gateway to the associated metadata or file attributes, allowing bulk access to the file attributes. For example, you can access the traditional Unix-style file permission attributes.

      FileRef fileRef = Paths.get("/Users/Guest/run.sh");
      try {
          OutputStream os = fileRef.newOutputStream(StandardOpenOption.WRITE,
              StandardOpenOption.APPEND, StandardOpenOption.DELETE_ON_CLOSE);
          ...
      } catch (IOException e) {
          ...
      }

      PosixFileAttributeView attributeView = fileRef.getFileAttributeView(
          PosixFileAttributeView.class, LinkOption.NOFOLLOW_LINKS);
      PosixFileAttributes attributes = attributeView.readAttributes();
      GroupPrincipal groupName = attributes.group();
      Set<PosixFilePermission> permissionSet = attributes.permissions();

    • Path: This is the central class that an application developer will encounter most. Path is the implementation of FileRef that uses a system path to locate and access a file. It's the NIO-2 equivalent of java.io.File. A path is hierarchical and knows about its name. Path has two kinds of operations: those that access path components and combine paths, and those that perform file operations. All operations support symbolic link semantics. The operations that access the file system throw meaningful exceptions that have details of what went wrong. The code snippets below provide a peek into the file management capabilities of NIO-2.

      Path is an Iterable over the name elements of the entire path.


      Path path = Paths.get("/application/apache-tomcat/conf/server.xml");
      for (Path pathElement : path) {
          // gets the path elements application, apache-tomcat, conf, server.xml
          // in this order.
      }

      The normalize operation removes redundancies from the path.


      Path path = Paths.get("/app/dir/../../tmp/vm");
      Path normalized = path.normalize(); // normalized to /tmp/vm

      The resolve operation resolves the given relative path against the current path.


      Path path = Paths.get("/app/dir");
      Path resolved = path.resolve("../dirA/dirB"); // /app/dir/../dirA/dirB, which normalizes to /app/dirA/dirB

      The relativize operation constructs a path that originates from the original path and ends at a location path. It returns the relative path between two given paths.


      Path path = Paths.get("/a/b/c");
      Path absolutePath = Paths.get("/a/x");
      Path relativized = path.relativize(absolutePath); // returns path ../../x

      There are a bunch of path comparison operations like startsWith(), endsWith(), and isSameFile().
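
      The path operations described so far are purely syntactic and can be tried without touching the disk. The sketch below collects them in one place; the wrapper class and its method names are hypothetical, and only Paths.get and the Path methods themselves come from the API as released in JDK 7.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathOperationsSketch {

    // normalize() removes "." and ".." elements without consulting the file system.
    static Path normalize(String path) {
        return Paths.get(path).normalize();
    }

    // resolve() joins the two paths; it does not normalize the result.
    static Path resolve(String base, String other) {
        return Paths.get(base).resolve(other);
    }

    // relativize() builds the relative path that leads from one path to another.
    static Path relativize(String from, String to) {
        return Paths.get(from).relativize(Paths.get(to));
    }

    public static void main(String[] args) {
        System.out.println(normalize("/app/dir/../../tmp/vm"));
        System.out.println(resolve("/app/dir", "../dirA/dirB"));
        System.out.println(resolve("/app/dir", "../dirA/dirB").normalize());
        System.out.println(relativize("/a/b/c", "/a/x"));
        System.out.println(Paths.get("/a/b/c").startsWith("/a/b"));
        System.out.println(Paths.get("/a/b/c").endsWith("b/c"));
    }
}
```

      Comparing results with Path.equals rather than with strings keeps such checks independent of the platform's separator character.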

      Path has operations for copying, moving and deleting a file. The new API passes the onus of implementing the operating system's native calls on to the service provider. For example, a typical service provider would provide implementations for atomically moving/copying a file on Windows-based file systems with the MoveFileEx system call, and likewise with other file systems. The application gains platform independence by programming against the API. The copy and link options in the code snippet below provide a sneak peek into the additional armor.


      /*
       * Copy a file with the attributes.
       */
      path.copyTo(targetPath, StandardCopyOption.REPLACE_EXISTING,
          StandardCopyOption.COPY_ATTRIBUTES);

      /*
       * Move a file atomically.
       */
      path.moveTo(targetPath, StandardCopyOption.ATOMIC_MOVE);

      /*
       * The exceptions thrown are specific to the cause of failure.
       */
      Path link = Paths.get("/app/tool/lib"); // a symbolic link
      try {
          link.delete(); // the link is deleted, and not the target of the link.
      } catch (DirectoryNotEmptyException e) {
          ...
      } catch (NoSuchFileException e) {
          ...
      } catch (IOException e) {
          ...
      }
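
      A runnable variant of the copy and move calls is sketched below. It assumes the API as released in JDK 7, where these operations are static methods on Files rather than methods on Path; the class name, method names and file names are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyMoveSketch {

    // Copies the file data and asks the provider to copy the attributes as well.
    static void copyWithAttributes(Path source, Path target) throws IOException {
        Files.copy(source, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    // Throws AtomicMoveNotSupportedException if the move cannot be done atomically,
    // for example across two different file stores.
    static void moveAtomically(Path source, Path target) throws IOException {
        Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio2-copy");
        Path original = Files.write(dir.resolve("original.txt"),
                "data".getBytes(StandardCharsets.UTF_8));
        Path copy = dir.resolve("copy.txt");
        Path moved = dir.resolve("moved.txt");

        copyWithAttributes(original, copy);
        moveAtomically(copy, moved);
        System.out.println(Files.exists(copy) + " " + Files.exists(moved));
    }
}
```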

      The checkAccess operation lets you check the existence of a file and find out whether the JVM has appropriate access privileges to it.

      FileRef fileRef = Paths.get("/Users/Guest/run.sh");
      try {
          fileRef.checkAccess(AccessMode.READ, AccessMode.EXECUTE);
      } catch (NoSuchFileException e) {
          ...
      } catch (AccessDeniedException e) {
          ...
      } catch (IOException e) {
          ...
      }
    • SeekableByteChannel: This is the NIO-2 equivalent of RandomAccessFile. It allows reading and writing bytes from a channel of variable length. The set of OpenOption flags that can be provided when creating the SeekableByteChannel provides a glimpse into the capabilities of the new API (and actually the service provider): READ, WRITE, APPEND, TRUNCATE_EXISTING, CREATE, CREATE_NEW, DELETE_ON_CLOSE, SPARSE, SYNC, DSYNC, and NOFOLLOW_LINKS. The SeekableByteChannel can be cast to FileChannel for advanced operations like file locking, memory mapped I/O, etc.

      Path path = ... ;
      SeekableByteChannel channel = path.newByteChannel(StandardOpenOption.WRITE,
          StandardOpenOption.APPEND);
      channel.write(...);
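
      The random-access side of SeekableByteChannel can be sketched as follows: position the channel at an offset, then read a fixed number of bytes. This assumes the released JDK 7 API (Files.newByteChannel); the readAt helper and its class are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SeekableReadSketch {

    // Reads up to 'length' bytes starting at 'offset' and decodes them as UTF-8.
    static String readAt(Path file, long offset, int length) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            channel.position(offset); // seek, as RandomAccessFile.seek() would
            ByteBuffer buffer = ByteBuffer.allocate(length);
            // Keep reading until the buffer is full or end-of-file is reached.
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // nothing else to do; read() advances the buffer
            }
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio2-seek", ".txt");
        Files.write(file, "hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAt(file, 6, 5));
    }
}
```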

    • DirectoryStream: The provision for accessing directories is slightly different from that in java.io.File (file.list()). A DirectoryStream is an Iterable over entries in a directory. The iterator is weakly consistent, and may or may not reflect updates to the directory while iterating. The iterator scales to larger directories, is less demanding on resources, and improves response time when accessing remote and network mounted file systems. It supports filtering by globs, regex, or a custom filter (DirectoryStream.Filter), as you iterate over the directory.

      Path dir = Paths.get("/dev/nfs1");
      DirectoryStream<Path> stream = dir.newDirectoryStream();
      try {
          for (Path entry : stream) {
              ...
          }
      } finally {
          stream.close();
      }
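
      The glob filtering mentioned above might look like the sketch below. It assumes the API as shipped in JDK 7, where the stream is opened with Files.newDirectoryStream and closed by try-with-resources; the helper name listMatching is invented here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirectoryStreamSketch {

    // Returns the names of the entries in 'dir' that match the glob, sorted.
    static List<String> listMatching(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        // The stream holds an open directory handle, so it must be closed.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names); // iteration order is not specified
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio2-dir");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.log"));
        System.out.println(listMatching(dir, "*.txt"));
    }
}
```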

    • FileSystemException: This extends IOException for backward compatibility, and has useful methods for inspecting the cause of the exception. FileSystemException is thrown by the operations that access the file system. It has specific child exceptions that are thrown by the API with information pertaining to the exception condition. All other exceptions in the NIO-2 API are unchecked.

    • Utility classes: A couple of utility classes in NIO-2 contain static operations that are handy to use. Files has utility operations for files and directories. Paths has operations to get a Path from a path string or a URI.


    The primary interfaces from the java.nio.file.attribute package:

    Class diagram for NIO-2 Attributes API
    Figure 2. Class diagram for NIO-2 attributes API

    Each of the AttributeView types provides a read-only or updatable view of attribute values associated with an object in the file system. The FileAttributeView types are associated with file metadata, and the FileStoreAttributeView types are associated with the file store metadata.

    • BasicFileAttributeView: This provides bulk access to the basic set of file attributes through a single readAttributes() operation. The basic attributes comprise created/modified/access times, and information like whether the file is a symbolic link, a directory, or a regular file. This view allows updating the value of time-based attributes.

    • DosFileAttributeView: This lets you access legacy DOS attributes to know if the file is hidden, read only, archive, or a system file.

    • PosixFileAttributeView: This provides a view into the file attributes that are common to POSIX-compliant file systems. It allows you to inspect the group and the user that have permissions on the file. The file permissions and the file owner can be inspected and updated.

    • FileStoreAttributeView: This provides a view into the file store attributes like the total, usable, and unallocated space in a file system.

    • Utility class: Attributes contains convenience operations that operate on or return file and file store attributes.

      FileRef file = Paths.get("/app/scripts/exec.sh");
      UserPrincipal principal = Attributes.getOwner(file);

      BasicFileAttributes basic = Attributes.readBasicFileAttributes(file,
          LinkOption.NOFOLLOW_LINKS);
      System.out.format("isFile:%s size:%s", basic.isRegularFile(), basic.size());

      FileStoreSpaceAttributes attrs = Attributes.readFileStoreSpaceAttributes(store);
      System.out.format("Total:%s Usable:%s", attrs.totalSpace(), attrs.usableSpace());
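
      Bulk attribute access can also be sketched against the API as released in JDK 7, where the Attributes utility class is gone and the same operations live on Files. The class and method names below are hypothetical; note that a view can be null when the file system does not support it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Collections;
import java.util.Set;

class AttributesSketch {

    // One readAttributes() call fetches all basic attributes in bulk.
    static String summarize(Path file) throws IOException {
        BasicFileAttributes basic = Files.readAttributes(
                file, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        return String.format("regularFile=%b directory=%b size=%d modified=%s",
                basic.isRegularFile(), basic.isDirectory(),
                basic.size(), basic.lastModifiedTime());
    }

    // Returns the POSIX permissions, or an empty set where the view is unsupported.
    static Set<PosixFilePermission> permissions(Path file) throws IOException {
        PosixFileAttributeView view =
                Files.getFileAttributeView(file, PosixFileAttributeView.class);
        if (view == null) {
            return Collections.emptySet(); // e.g. on a non-POSIX file system
        }
        return view.readAttributes().permissions();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio2-attrs", ".txt");
        System.out.println(summarize(file));
        System.out.println(permissions(file));
    }
}
```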


    File Notification and Watch Service API

    File notification refers to the system's ability to detect and signal changes to files and folders, such as when Windows Explorer or Mac Finder magically detects changes in the currently open folder. Windows uses the ReadDirectoryChangesW system call for this, and most Linux flavors use the inotify facility.

    The Watch Service API in NIO-2 allows applications to watch a file system for changes using native notifications from the underlying file system. The service provider implementation transparently uses the native file event notification wherever available on a given file system. When not available, the implementation falls back to polling. WatchKey and WatchService are thread safe, allowing a thread pool to work in tandem. Thus multiple threads can use the same WatchService instance for registering different Watchable instances without worrying about concurrency issues. The notifications can be consumed in another thread. The starring cast of the Watch Service API:

    1. WatchService: The handle to the watch service from the file system. The actual implementation is loaded using a service-provider loading facility.
    2. Watchable: The types that can be registered with the watch service. Path is a Watchable.
    3. WatchKey: The token of registration. It's an association class on the Watchable-WatchService association.

    Watch Service API
    Figure 3. Class diagram of Watch Service API

    Modus Operandi:

    1. The application gets a handle to watch service from the file system.
    2. The application registers a Watchable with this WatchService, mentioning the events of interest. During registration, a WatchKey is created and returned to the application as a token of registration. The initial state of this WatchKey is non-signaled, meaning that the event of interest has not yet occurred.
    3. When an event is detected, the state of the WatchKey changes from non-signaled to signaled. The watch service maintains a queue of signaled watch keys. The watcher thread does the following:
      • Polls the watch service queue to get the signaled watch keys.
      • Examines the signaled watch key for event type and attached objects.
      • Consumes the event appropriately.
      • Resets the watch key, which effectively moves the watch key back to non-signaled.
      • While the watch key is in signaled state, the watch service continues to accumulate events for the watchable. The detection of events, preservation of event order, and timeliness are specific to the service provider implementation, and the system calls available for a given file system.


        /* Step One */
        WatchService watchService = FileSystems.getDefault().newWatchService();

        /* Step Two */
        Path path = FileSystems.getDefault().getPath("/app/config");
        WatchKey watchKey = path.register(watchService,
            StandardWatchEventKind.ENTRY_CREATE,
            StandardWatchEventKind.ENTRY_MODIFY);

        /*
         * Step Three: In a separate watcher thread.
         */
        WatchKey key = watchService.take(); // poll or timed-poll can be used.

        List<WatchEvent<?>> events = key.pollEvents();
        for (WatchEvent<?> event : events) {
            if (event.kind() == StandardWatchEventKind.ENTRY_MODIFY) {
                // consume the event
            }
        }
        key.reset(); // reset the key that was taken so it can be signaled again
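
    The three steps above can also be written as a small self-contained program. The sketch assumes the API as released in JDK 7, where the event constants live in StandardWatchEventKinds; the class name and the register and drain helpers are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;

class WatchServiceSketch {

    // Step two: register the directory and keep the returned key as the token.
    static WatchKey register(WatchService watcher, Path dir) throws IOException {
        return dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);
    }

    // Step three: collect the pending events of a key, then reset it so that
    // it can be signaled and queued again.
    static List<String> drain(WatchKey key) {
        List<String> seen = new ArrayList<>();
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events may have been lost; nothing to report here
            }
            seen.add(event.kind().name() + " " + event.context());
        }
        key.reset();
        return seen;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Step one: obtain the watch service from the file system.
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            register(watcher, dir);
            WatchKey key = watcher.take(); // blocks until a key is signaled
            System.out.println(drain(key));
        }
    }
}
```

    Running main against a directory and then creating or editing a file in it from another terminal should print the events collected for that key.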

    The WatchService API finds applicability in many scenarios. Some of them are: 

    1. Sensing the changes in configuration files and reloading the new configurations. 
    2. Hot deployment of jars and class files in application servers.
    3. Applications like a text editor that have loaded file content in memory. When another application changes the file, the editor must be notified of the change to the underlying file so that it can pop up an alert box for the user. The same applies to any application that works closely with the file system.
    4. Integration of two applications through the file system, where the changes made by one application to a file must be detected by the other.

    Provider interface

    The java.nio.file.spi.FileSystemProvider interface loads and deploys the file system provider implementations. It uses the service loader mechanism, and allows replacing the default file system provider. It also allows interposing on the default file system provider for injecting your own caching, access control, logging, etc.

    The provider interface is the point of extension for the applications that want a better control on the file system, or those that want to write their own special purpose file system like a distributed, fault-tolerant file system. It's a facility to build your own concrete implementation of the provider interface, like the default service provider's implementation, and deploy the same.

    For example, if you want to write a memory-based file system, where you allocate a chunk of memory for storing files as in a regular file system, you can extend the provider interface to do that.
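
    Writing a provider is beyond a short example, but using a non-default one is not. As a stand-in for a custom file system, the sketch below uses the zip file system provider that is bundled with the JDK since version 7 and addressed through the "jar" URI scheme; the same Files calls then operate on entries inside an archive. The class and method names are hypothetical, and the zip file must not already exist as an empty file.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileSystemProvider;
import java.util.Collections;

class ZipFileSystemSketch {

    // Reports whether a provider for the given URI scheme is installed.
    static boolean hasProvider(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    // Writes an entry into a (new) zip archive, then reopens it and reads it back.
    static String roundTrip(Path zipFile, String entryName, String text) throws IOException {
        URI uri = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(
                uri, Collections.singletonMap("create", "true"))) {
            Files.write(zipFs.getPath(entryName), text.getBytes(StandardCharsets.UTF_8));
        } // closing the file system flushes the archive to disk
        try (FileSystem zipFs = FileSystems.newFileSystem(
                uri, Collections.<String, Object>emptyMap())) {
            return new String(Files.readAllBytes(zipFs.getPath(entryName)),
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempDirectory("nio2-zip").resolve("demo.zip");
        System.out.println(hasProvider("jar"));
        System.out.println(roundTrip(zip, "/notes.txt", "stored inside the archive"));
    }
}
```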

    Interoperability

    The Java file management API has been around for more than a decade. The NIO-2 API works with existing code, without the need for extensive rewriting. At the same time, the application has an emergency escape to NIO-2 wherever needed.

    • The Path object is bilingual, and works with InputStream/OutputStream as well as the ByteChannel.

      Path path = Paths.get("/app/doc/readme.txt");

      InputStream inStream = path.newInputStream();
      OutputStream outStream = path.newOutputStream(StandardOpenOption.WRITE);

      SeekableByteChannel channel = path.newByteChannel(StandardOpenOption.WRITE);

    • The java.io.File class has been retrofitted with a toPath() method that returns the Path object for the file. This acts as a gateway to NIO-2. It's a touch point for migrating from the old file I/O to the NIO-2 file I/O.

      File sourceFile = ... ;
      File targetFile = ... ;

      Path source = sourceFile.toPath();
      Path target = targetFile.toPath();

      source.moveTo(target, StandardCopyOption.ATOMIC_MOVE);
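
      The two directions of this bridge can be sketched as follows, assuming the API as released in JDK 7 (File.toPath(), Path.toFile() and Files.newInputStream). The class and the firstLine helper are hypothetical and stand for legacy code that still hands around java.io.File objects.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class InteropSketch {

    // Legacy callers pass a File; the body works with Path and classic streams.
    static String firstLine(File legacy) throws IOException {
        Path path = legacy.toPath(); // gateway from java.io to NIO-2
        try (InputStream in = Files.newInputStream(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("nio2-interop", ".txt");
        Files.write(path, "one\ntwo\n".getBytes(StandardCharsets.UTF_8));
        File legacy = path.toFile(); // and back again for APIs that need a File
        System.out.println(firstLine(legacy));
    }
}
```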

    • The Scanner class has been updated with constructors that accept a FileRef.

    Walk the Tree-Walk

    The class java.nio.file.Files has convenience methods for files and directories. We will discuss two of the methods that are available in this class.

    1. The tree walk: The walkFileTree() utility method implements the Visitor pattern, and provides an internal iterator. It allows you to perform an operation on each node within a file tree, rooted at a given starting file. It provides a traversal of the file system, and you can perform operations on the files as you traverse. For instance, this method can be used to copy or move a directory with all of its entries and metadata to another location.
      • The method walks the file tree in a depth-first manner. The tree traversal completes when all accessible files in the tree have been visited, a visitor returns a result of FileVisitResult.TERMINATE, or the visitor terminates due to an uncaught exception. When the file is a directory, the utility method opens it using the DirectoryStream, and continues.
      • The operations that you desire to perform on the visited files and directories must be implemented as a concrete FileVisitor. The FileVisitor interface has methods like visitFile() to perform an operation on the file being visited, and preVisitDirectory() and postVisitDirectory() to perform operations on a directory before and after its entries are visited. There are additional operations for exception scenarios. These must be overridden to provide the desired behavior. Alternatively, just extend SimpleFileVisitor and override only the operations you need. SimpleFileVisitor has default implementations for the FileVisitor operations.
      • A FileVisitor also has a say in the traversal by returning a FileVisitResult (CONTINUE, TERMINATE, SKIP_SUBTREE, SKIP_SIBLINGS) that is used in traversal.
      • By default symbolic links are not followed during traversal. The FileVisitOption parameter can be optionally provided to indicate if you want to follow links. When following links you could end up in infinite loops (cyclic graphs) and stack overflows. But never mind, the API detects cycles in case you are following symbolic links.
      • You can optionally specify the maximum levels of the tree that you want to traverse.

        Path rootedAt = Paths.get("/private/var/tmp");
        EnumSet<FileVisitOption> options = EnumSet.of(FileVisitOption.DETECT_CYCLES);
        int maxDepth = 10;

        Files.walkFileTree(rootedAt, options, maxDepth, new SimpleDeletingVisitor());

        static class SimpleDeletingVisitor extends SimpleFileVisitor<Path> {

            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                try {
                    if (<<check on attributes>>) {
                        file.delete();
                    }
                } catch (IOException e) {
                    ...
                }
                return FileVisitResult.CONTINUE;
            }

            public FileVisitResult preVisitDirectory(Path dir) {
                System.out.format("Visiting directory: %s%n", dir.getName());
                return FileVisitResult.CONTINUE;
            }
        }
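
        As a complete, runnable variant of the visitor idea, the sketch below deletes a whole directory tree. It assumes the API as it shipped in JDK 7, where the directory callbacks take an extra argument, FileVisitOption offers only FOLLOW_LINKS, and cycle detection is automatic when links are followed. The class name and the deleteTree helper are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class TreeWalkSketch {

    // Deletes 'root' and everything below it; returns the number of deleted entries.
    static int deleteTree(Path root) throws IOException {
        final int[] deleted = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                deleted[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // iterating the directory failed; stop the walk
                }
                Files.delete(dir); // the directory is empty by now (depth-first)
                deleted[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return deleted[0];
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nio2-walk");
        Files.createFile(Files.createDirectory(root.resolve("sub")).resolve("a.txt"));
        System.out.println(deleteTree(root));
    }
}
```

        Deleting in postVisitDirectory rather than preVisitDirectory matters: a directory can only be removed once its entries are gone.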

    2. Know what you are dealing with: The probeContentType() utility method is used to probe the content type of a file. It uses the FileTypeDetector implementations to detect the content type. The service provider provides the FileTypeDetector implementations using the provider interface.

      FileRef file = Paths.get("/app/doc/license.pdf");
      String type = Files.probeContentType(file);
      System.out.format("%s\t%s%n", file, type);
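
      A runnable version of the probe, assuming the API as released in JDK 7, is sketched below. The result depends on the installed detectors and the platform, and it is null when no detector recognizes the file, so the hypothetical describe helper handles that case explicitly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentTypeSketch {

    // Returns "<file name>\t<content type>", or "unknown" if no detector matched.
    static String describe(Path file) throws IOException {
        String type = Files.probeContentType(file); // may be null
        return file.getFileName() + "\t" + (type != null ? type : "unknown");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("nio2-probe").resolve("readme.txt");
        Files.createFile(file);
        System.out.println(describe(file));
    }
}
```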

    Summary

    All in all, the NIO-2 file API works more consistently across platforms, with operations that weren't supported in the earlier file management API. It supports bulk access to file attributes. Larger sets of advanced file attributes are accessible through the API. The new file notification API can replace the less efficient, manually implemented mechanism of polling a file system for changes. The new exceptions allow the application to handle and recover from exception scenarios gracefully, on a case-by-case basis. The service provider interface allows for cleanly developing and deploying custom file systems. What's more, interoperability with existing code has been taken care of.
    The NIO-2 API is still a work in progress, and will be released as a part of JDK 7. As the API gets into shape to handle the next generation file management needs, some of the interfaces and APIs may change from what we discussed above.

    Acknowledgements

    Thanks to Alan Bateman, the specification lead for JSR 203 and the implementation lead for NIO-2 at OpenJDK, for providing a thorough technical review of the article. Alan ensured that I wasn't out of sync with the developments on NIO-2.

    Thanks also to James Gould, Shahid Khan, and David Smith for providing a timely technical review and for helping me to modulate the pitch of the article.
