1 Reply Latest reply: Feb 17, 2012 3:29 PM by Charles Koester-Oracle

    Fix for making BDB work on NFS storage

    918230
      I've found that when the __db.00* files are on NFS storage, they become corrupted after querying and updating the database. This happens even when they are accessed from a single process running on a single computer, so it is not related to parallel access from one or from multiple computers. Even a simple read-only query of the database corrupts the cache files.

      It is also not our intention to access the database in parallel. We just use NFS as our backend storage.

      Debugging taught me the following. BDB accesses its cache files by mmap()ing them into its address space, and when it is done it munmap()s them. As far as I understand, not calling msync() before munmap() means there is no guarantee that the contents are properly flushed, neither on NFS nor on local storage. Most local filesystems flush the data anyway; NFS on Linux apparently does not (or no longer does). Calling munmap() on a region only makes the mapped pages unavailable to the calling process; it does not guarantee that the changed data is written back to storage.

      My fix was to simply add a call to msync() just before munmap() is called on the cache files.
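
      A minimal sketch of that kind of change, for illustration only. The helper name `sync_and_unmap` is hypothetical and is not taken from the BDB source; it assumes the region was mapped with MAP_SHARED and that `addr` is the page-aligned address returned by mmap().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a BDB function): ask the kernel to write back
 * a shared file mapping, then unmap it.
 * Returns 0 on success, -1 on failure with errno set.
 */
int sync_and_unmap(void *addr, size_t len)
{
    int ret = 0;
    int saved_errno = 0;

    /* MS_ASYNC schedules the write-back without blocking the caller. */
    if (msync(addr, len, MS_ASYNC) != 0) {
        ret = -1;
        saved_errno = errno;
    }

    /* Unmap even if msync() failed, so the mapping is not leaked. */
    if (munmap(addr, len) != 0)
        return -1;

    if (ret != 0)
        errno = saved_errno;
    return ret;
}
```

      Replacing MS_ASYNC with MS_SYNC would make the call block until the write-back has completed, at the cost of a slower close.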

      I know the BDB requirement that the underlying storage must have POSIX filesystem semantics (see [this BDB FAQ|http://www.oracle.com/technetwork/database/berkeleydb/db-faq-095848.html#CanBerkeleyDBuseNFSSANorotherremotesharednetworkfilesystemsfordatabasesandtheirenvironments]), and NFS is not fully compliant. However, the current BDB code is also at fault for not calling msync() before munmap(). With the fix I propose, BDB will satisfy the munmap() specification and work on NFS storage as well. Calling msync() asynchronously, with the MS_ASYNC flag, should not have a negative performance impact even when used on a local filesystem.

      Question: what is the procedure for filing a bug report and making sure it will be fixed in upcoming releases of BDB?
        • 1. Re: Fix for making BDB work on NFS storage
          Charles Koester-Oracle
          Hi,

          Yes, failing to flush out the changes to the mapped NFS files during close is the problem here. Adding msync() is at least a reasonable fix. A bug report has been filed with the number 21176; that SR # will appear in the next BDB Release Notes.

          If you have further questions please include [#21176] in the subject line, and cc: support@sleepycat.com

          Charles Koester
          BDB Core Engineering