6 Replies. Latest reply: Oct 7, 2010 3:45 AM by 791266

    Advice on how to approach IO for large file

    843790
      Hi guys,

      I was looking for some general advice on which class to use for randomly accessing a large file in Java.

      My program reads from an ASCII file which is approximately 25-30 MB in size. This file contains 'records' of variable length. I need to do an initial sequential pass over the data for one aspect of my program, but later in the program I will need random access to records in the file.

      I have read some of the online help files for NIO, but I don't see much about random access and large files. What I currently do is map the file to a MappedByteBuffer; I then iterate through the records, storing the offsets where records begin. When I need random access, I just grab the relevant record out of the buffer.
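A minimal sketch of the approach described above: map the file once, record where each record starts during a sequential pass, then copy records out by index. It assumes, hypothetically, that records are newline-terminated ASCII and that the file fits in a single mapping; the class name `MappedRecordFile` is made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Maps a file once, indexes record start offsets, then serves records by index. */
final class MappedRecordFile {
    private final MappedByteBuffer buffer;
    private final List<Integer> starts = new ArrayList<>(); // offset of each record's first byte

    MappedRecordFile(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // map() throws IllegalArgumentException if the file exceeds Integer.MAX_VALUE bytes.
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } // the mapping stays valid after the channel is closed

        // Sequential pass. Assumption (hypothetical format): every record ends with '\n'.
        int limit = buffer.limit();
        if (limit > 0) starts.add(0);
        for (int i = 0; i < limit; i++) {
            if (buffer.get(i) == '\n' && i + 1 < limit) starts.add(i + 1);
        }
    }

    int recordCount() { return starts.size(); }

    /** Random access: copies record i, without its newline, out of the mapping. */
    String record(int i) {
        int start = starts.get(i);
        int end;
        if (i + 1 < starts.size()) {
            end = starts.get(i + 1) - 1; // the byte before the next start is the '\n'
        } else {
            end = buffer.limit();
            if (end > start && buffer.get(end - 1) == '\n') end--; // trailing '\n' at EOF
        }
        byte[] bytes = new byte[end - start];
        ByteBuffer view = buffer.duplicate(); // own position, so the shared buffer is untouched
        view.position(start);
        view.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

For hundreds of millions of records, a growable `int[]` (or `long[]`) index would avoid the boxing overhead of `List<Integer>`.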

      I programmed this a while back, and I do recall trying RandomAccessFile (without NIO) and finding it very slow. So memory-mapping was a huge improvement. I am just wondering whether there is any other way to make it faster; clearly I have not explored all the options, but I was hoping for some leads from the Java development community :)

      Thanks!
        • 1. Re: Advice on how to approach IO for large file
          EJP
          Don't do this. The I/O will kill you. Read it into a temporary database table and do your 'random access' from there.
          • 2. Re: Advice on how to approach IO for large file
            800381
            30 MB is not a big file.

            Does the file change while you're processing it? If not, just read it into memory the first time.
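For a file of around 30 MB, "read it into memory the first time" could be as simple as the following sketch. `InMemoryRecords` is a hypothetical name, and the code assumes one record per line, which the thread does not actually specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Reads every record onto the heap once; later lookups need no disk I/O. */
final class InMemoryRecords {
    private final List<String> records;

    InMemoryRecords(Path path) throws IOException {
        // Assumption: one ASCII record per line ('\n', "\r\n" or '\r' terminated).
        // Copied into an ArrayList so get(i) is guaranteed to be an indexed lookup.
        records = new ArrayList<>(Files.readAllLines(path, StandardCharsets.US_ASCII));
    }

    int size() { return records.size(); }

    /** Random access is now a plain list lookup. */
    String get(int i) { return records.get(i); }
}
```

This trades memory for simplicity: every record stays on the heap for the life of the object.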
            • 3. Re: Advice on how to approach IO for large file
              800729
              Hi, this is the thread OP. Sorry, the file size is actually in the vicinity of 200-300 MB (and it could foreseeably grow to 600 MB). The reason I am against loading it into memory is that I build a rather large data structure in memory after reading the file, and the file records aren't used while that structure is being created and used. The records are needed after the structure is disposed of, though. The file is not modified while the program runs (it is always the same length at run time).

              Since I only use the records twice, is it worthwhile storing them in a database? I wanted to limit the number of external programs required to run my software, as it's targeted at non-technical users, so getting them to install an SQL database may be a stretch.

              Should I stick with the memory-mapping approach? I have read that memory-mapping doesn't work with files larger than some limit (Integer.MAX_VALUE?), but I'm not really sure about this.
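On the size limit: `FileChannel.map` throws `IllegalArgumentException` for any single region larger than `Integer.MAX_VALUE` bytes (about 2 GB), so a 600 MB file fits in one mapping. For larger files, one option is to map the file as several consecutive regions, as in this sketch. `ChunkedMapping` and its `chunkSize` parameter are hypothetical names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Maps a file of any length as consecutive read-only regions of at most chunkSize bytes. */
final class ChunkedMapping {
    private final long chunkSize;
    private final long size;
    private final MappedByteBuffer[] chunks;

    ChunkedMapping(Path path, long chunkSize) throws IOException {
        // Each individual map() call is limited to Integer.MAX_VALUE bytes.
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1.." + Integer.MAX_VALUE);
        }
        this.chunkSize = chunkSize;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            size = channel.size();
            int n = (int) ((size + chunkSize - 1) / chunkSize); // number of regions, rounded up
            chunks = new MappedByteBuffer[n];
            for (int k = 0; k < n; k++) {
                long pos = k * chunkSize; // long arithmetic, so offsets beyond 2 GB are fine
                chunks[k] = channel.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(chunkSize, size - pos));
            }
        } // mappings remain valid after the channel is closed
    }

    long size() { return size; }

    /** Reads the byte at an absolute (long) file offset. */
    byte get(long offset) {
        if (offset < 0 || offset >= size) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return chunks[(int) (offset / chunkSize)].get((int) (offset % chunkSize));
    }
}
```

A record that straddles a chunk boundary has to be assembled from two buffers; overlapping the regions by at least the maximum record length is one way to avoid that.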

              Thanks again for your helpful suggestions!

              Acacia
              • 4. Re: Advice on how to approach IO for large file
                791266
                What you are doing sounds good. I wouldn't use a database if I only had to access the data twice and then throw it away.

                It does, however, sound odd that you found RandomAccessFile to be slow. Did you use multiple threads, or was there other disk activity while you were performing the random access? Random access should be fairly fast if you don't use multiple threads and don't have much other disk activity. It does get slower if your disk arm has to move a lot (not applicable if you are using solid-state disks).
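The thread never establishes why RandomAccessFile was slow. One plausible cause, offered only as a guess, is reading a byte or a line at a time: RandomAccessFile does no buffering of its own, so each `read()` can become a separate system call. The sketch below instead does one `seek` plus one `readFully` per record, using an offset index like the one built during the sequential pass. `RafRecordReader` and its constructor arguments are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Random access with RandomAccessFile: one seek plus one bulk read per record. */
final class RafRecordReader implements AutoCloseable {
    private final RandomAccessFile raf;
    private final long[] starts; // record start offsets from a prior sequential pass (hypothetical index)

    RafRecordReader(String path, long[] starts) throws IOException {
        this.raf = new RandomAccessFile(path, "r");
        this.starts = starts.clone();
    }

    /** Reads record i: the bytes from its start up to the next record's start (or EOF). */
    String read(int i) throws IOException {
        long start = starts[i];
        long end = (i + 1 < starts.length) ? starts[i + 1] : raf.length();
        byte[] buf = new byte[(int) (end - start)];
        raf.seek(start);
        raf.readFully(buf); // one bulk read instead of many single-byte read() calls
        int len = buf.length;
        if (len > 0 && buf[len - 1] == '\n') len--; // drop the record delimiter, if any
        return new String(buf, 0, len, StandardCharsets.US_ASCII);
    }

    @Override
    public void close() throws IOException { raf.close(); }
}
```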
                • 5. Re: Advice on how to approach IO for large file
                  800729
                  Thanks for the feedback, Kaj :)

                  I can't recall why RandomAccessFile was so slow, but I was not accessing the file from multiple threads at the time. I could always go back and try RandomAccessFile again, but if the memory-mapped version I have implemented is acceptable (and scalable), then I might just stick with it.

                  Thanks again all!
                  • 6. Re: Advice on how to approach IO for large file
                    791266
                    Sounds good