3 Replies Latest reply: May 22, 2008 7:53 AM by 807591

Getting unusual behavior using a BufferedInputStream on a 5 GB file

807591 Newbie
I have a utility that reads the first 1000 records of a file, the middle 1000 records of a file, and the last 1000 records of a file, to check for data integrity. It has worked like a champ for years on thousands of fairly large files, but has recently been behaving badly on files around 5 GB.

I use a BufferedInputStream with an 8192-byte buffer to read the data.

I calculate the line number at which the middle 1000 records start. After processing the first 1000 records, I count LF characters until I reach the beginning of the middle 1000 records. I have to process around 2.7 gigabytes before I find the byte I want.

If the buffer has 8192 bytes in it, and I find the byte I want at position 6251, a subsequent read(_bigByteArr, 0, 8192) returns only 1941 bytes, which is exactly what was left in the buffer (8192 - 6251 = 1941).

This differs from the behavior on files smaller than 4 GB. On smaller files, the subsequent read(_bigByteArr, 0, 8192) returns the full 8192 bytes.

I'm thinking I'm hitting some internal limit (2147483647?). Has anyone else encountered this behavior, or have any ideas for work-arounds?
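
For illustration, here is a minimal sketch of one documented way a BufferedInputStream can hand back exactly the 1941 leftover bytes. The Javadoc for BufferedInputStream.read(byte[], int, int) says it stops refilling once the underlying stream's available() returns zero. The class names ShortReadDemo and NoAvailableStream are hypothetical, and the in-memory source merely stands in for the file; this is not a claim about what the JVM actually did on the 5 GB file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ShortReadDemo {

    /** Hypothetical source that always reports 0 bytes available, although data remains. */
    static final class NoAvailableStream extends ByteArrayInputStream {
        NoAvailableStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int available() {
            return 0;
        }
    }

    /** Requests 6251 bytes, then 8192 bytes, and returns what each call delivered. */
    static int[] consecutiveReads() {
        byte[] data = new byte[3 * 8192];   // plenty of data left after both reads
        byte[] chunk = new byte[8192];
        try (InputStream in = new BufferedInputStream(new NoAvailableStream(data), 8192)) {
            int first = in.read(chunk, 0, 6251);   // served from a freshly filled 8192-byte buffer
            int second = in.read(chunk, 0, 8192);  // drains the buffer, then sees available() == 0
            return new int[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = consecutiveReads();
        System.out.println(r[0] + " then " + r[1]);
    }
}
```

On a standard JDK this should print "6251 then 1941": the second call returns only the remainder of the internal buffer even though the source still holds more data, so the caller has to check the return value either way.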
  • 1. Re: Getting unusual behavior using a BufferedInputStream on a 5 GB file
    796365 Newbie
    It appears that somewhere the code is trying to create an array with more elements than Integer.MAX_VALUE.
    Post the minimal code that exhibits the problem.
  • 2. Re: Getting unusual behavior using a BufferedInputStream on a 5 GB file
    807591 Newbie
    This is probably the result of a fundamental misunderstanding on your part. Nothing in the Javadoc for InputStream.read(byte[] buffer, int offset, int length) guarantees that it will read the full 'length' bytes. It returns the number of bytes actually read. Your best bet is to wrap the BufferedInputStream in a DataInputStream and use the readFully() method, which is guaranteed to read all the bytes (as long as EOF is not reached).

    P.S. This is a very common misunderstanding that often goes unpunished because the read() method usually does read all the bytes asked for.
  • 3. Re: Getting unusual behavior using a BufferedInputStream on a 5 GB file
    807591 Newbie
    Thanks! I looked into using DataInputStream.readFully, but it would have been a little clumsy to implement, so I created a new read method that works:

         //*************************************************************************
         /**
         *     Wrapper around low-level read.
         *     This is needed because _bi.read(_bigByteArr, 0, _maxBytes) does not
         *     guarantee that _maxBytes bytes will be returned.
         */
         private int read() throws IOException
         {
              int bytesRead=0;
              int curByte=0;
              curByte = (int) _bi.read();
              if (curByte==-1) {
                   bytesRead=-1;
              }//if
              else {
                   bytesRead++;
                   _bigByteArr[0]= (byte) curByte;
                   for (int i = 1;i<_maxBytes;i++) {
                        curByte = (int) _bi.read();
                        if (curByte==-1) {
                             _bufOnLastChunk=true;
                             break;
                        }//if
                        _bigByteArr[i] = (byte) curByte;
                        bytesRead++;
                   }//for
              }//else
              
              //this was probably already flagged, but if equal to _maxBytes, need to flag
              if (bytesRead==_numBytes) {
                   _bufOnLastChunk=true;
              }//if

              return bytesRead;
         }//read

    Thanks again for your response and advice!
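
As a rough sketch of the two approaches discussed in this thread, the example below compares a wrapper that loops over bulk reads (instead of reading one byte at a time) with DataInputStream.readFully. The names FullReadSketch, readUpTo and TrickleStream are hypothetical. TrickleStream is only a test source that never hands out more than 3 bytes per call, to force short reads. The wrapper follows the convention of the read method above: it returns the number of bytes stored, or -1 if EOF is hit before any byte is read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class FullReadSketch {

    /** Keeps calling read() until len bytes are stored or EOF; returns the count, or -1 at EOF. */
    static int readUpTo(InputStream in, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n == -1) {
                break;              // EOF: hand back whatever was collected so far
            }
            total += n;
        }
        return (total == 0 && len > 0) ? -1 : total;
    }

    /** Hypothetical test source that delivers at most 3 bytes per bulk read. */
    static final class TrickleStream extends ByteArrayInputStream {
        TrickleStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    /** Returns { one plain read, then three readUpTo calls } over a 10-byte source. */
    static int[] demo() {
        byte[] buf = new byte[8];
        try {
            int single = new TrickleStream(new byte[10]).read(buf, 0, 8);
            InputStream in = new TrickleStream(new byte[10]);
            int first = readUpTo(in, buf, 8);
            int second = readUpTo(in, buf, 8);
            int third = readUpTo(in, buf, 8);
            return new int[] { single, first, second, third };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readFully fills the first 8-byte chunk but throws on the 2-byte tail. */
    static boolean readFullyThrowsOnShortTail() {
        DataInputStream in = new DataInputStream(new TrickleStream(new byte[10]));
        byte[] buf = new byte[8];
        try {
            in.readFully(buf, 0, 8);
            in.readFully(buf, 0, 8);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
        System.out.println(readFullyThrowsOnShortTail());
    }
}
```

The looping wrapper reports a partial final chunk as a count, while readFully signals it with an EOFException, which may be part of why readFully felt clumsy for a reader that has to process the last chunk of a file. On Java 9 or later, InputStream.readNBytes(byte[], int, int) performs the same kind of loop, but it returns 0 rather than -1 at end of stream.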