9 Replies. Latest reply: Oct 15, 2007 8:18 AM by 807592

    Reading a file in reverse....

    807587
      Hi,

      Does anyone know if there is a free "ReverseFileInputStream" or "ReverseFileReader" available anywhere?

      There was a previous thread (about 3 years ago) where someone had a similar issue.

      Basically I would like to read a file from the "bottom up", a line at a time, so having the same contract as BufferedReader would be good, but of course it reads the last line first and then works backwards a line at a time.

      It would just save me a lot of time if someone knows of one already available (then I don't have to mess about with RandomAccessFile)...

      Many thanks in advance,

      Gary

      P.S. Please bear in mind that the file I want to read is 130MB+ (probably towards 500MB) so I can't just buffer it and read it in reverse... I probably want the last 30MB or so.
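For the "last 30MB or so" case, one option that avoids reading backwards at all is to seek close to the end and read forwards from there. The following is only a sketch under stated assumptions: the class name `LastBytesReader` and the method `linesInLastBytes` are made up for illustration, the encoding is assumed to be ASCII-compatible (so that a `\n` byte always means a line feed), and the selected tail is assumed to fit in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of any library mentioned in this thread.
final class LastBytesReader {

    // Returns the complete lines found in the last maxBytes bytes of the file,
    // last line first.
    static List<String> linesInLastBytes(String path, long maxBytes, Charset charset)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long start = Math.max(0, raf.length() - maxBytes);
            if (start > 0) {
                // We probably landed in the middle of a line. Start one byte
                // earlier and skip up to and including the next '\n', so that
                // reading begins on a line boundary.
                raf.seek(start - 1);
                int b;
                do {
                    b = raf.read();
                } while (b != -1 && b != '\n');
            }
            // The channel shares the file pointer with raf, so the reader
            // continues from the position reached above.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(Channels.newInputStream(raf.getChannel()), charset));
            List<String> lines = new ArrayList<>();
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                lines.add(line);
            }
            Collections.reverse(lines); // bottom-up order
            return lines;
        }
    }
}
```

A call such as `LastBytesReader.linesInLastBytes("big.log", 30L * 1024 * 1024, StandardCharsets.ISO_8859_1)` would then return the lines of roughly the last 30MB, newest first; the file name here is a placeholder.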
        • 1. Re: Reading a file in reverse....
          807587
          What's a "line"?
          • 2. Re: Reading a file in reverse....
            807587
            http://www.emerle.net/comments/view.cfm/p/185
            • 3. Re: Reading a file in reverse....
              800382
              You could use RandomAccessFile: seek to the end and walk backwards, but only one byte at a time. Or you could write a class around RandomAccessFile that seeks to (end - some chars), reads that chunk, reverses the array, then backs up another chunk.
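The chunked variant described above might look roughly like the sketch below. It is an illustration only: the class name `ChunkedReverseLineReader` is invented here, lines are split on the `\n` byte (a trailing `\r` is dropped), and the encoding is assumed to be ASCII-compatible, which holds for ISO-8859-1 and UTF-8 but not for UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

// Hypothetical class: reads a file line by line from the end, one chunk at a time.
final class ChunkedReverseLineReader implements Closeable {
    private final RandomAccessFile file;
    private final Charset charset;
    private final byte[] chunk;
    private long chunkStart;       // file offset of chunk[0]
    private int unread;            // bytes of the chunk not yet consumed
    private boolean first = true;  // true until the first readLine() call
    private boolean atStart;       // true once the beginning of the file was reached

    ChunkedReverseLineReader(String path, Charset charset, int chunkSize) throws IOException {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        this.file = new RandomAccessFile(path, "r");
        this.charset = charset;
        this.chunk = new byte[chunkSize];
        this.chunkStart = file.length();
    }

    // Returns the byte before the current position (0-255), or -1 at the start of the file.
    private int previousByte() throws IOException {
        if (unread == 0) {
            if (chunkStart == 0) return -1;
            int len = (int) Math.min(chunk.length, chunkStart);
            chunkStart -= len;              // back up one chunk
            file.seek(chunkStart);
            file.readFully(chunk, 0, len);
            unread = len;
        }
        return chunk[--unread] & 0xFF;
    }

    // Returns the next line towards the beginning of the file, or null when done.
    String readLine() throws IOException {
        if (atStart) return null;
        if (first) {
            first = false;
            int last = previousByte();
            if (last == -1) { atStart = true; return null; }  // empty file
            if (last != '\n') unread++;  // no final newline: put the byte back
        }
        ByteArrayOutputStream reversed = new ByteArrayOutputStream();
        for (;;) {
            int b = previousByte();
            if (b == -1) { atStart = true; break; }
            if (b == '\n') break;         // terminator of the preceding line
            reversed.write(b);
        }
        byte[] bytes = reversed.toByteArray();
        for (int i = 0, j = bytes.length - 1; i < j; i++, j--) {  // undo the reversal
            byte t = bytes[i]; bytes[i] = bytes[j]; bytes[j] = t;
        }
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--;             // drop CR of a CRLF pair
        return new String(bytes, 0, len, charset);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Because whole lines are decoded only after their bytes have been collected, multi-byte characters that straddle a chunk boundary are not split. Usage would be a loop calling `readLine()` until it returns null, then `close()`.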
              • 4. Re: Reading a file in reverse....
                DrClap
                http://www.emerle.net/comments/view.cfm/p/185
                A little warning: the code posted there encodes bytes to chars by simply casting them. This should work correctly if the encoding of the file is ISO-8859-1. If it's something else, the non-ASCII chars will be mangled. As the author says, works fine for him.
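To make the warning concrete, here is a small sketch comparing a byte-to-char cast with real decoding. The class and method names are invented for the example. Note also that a Java `byte` is signed, so a bare cast of a byte above 0x7F sign-extends; the sketch masks with `0xFF` first, which is what makes the cast equivalent to ISO-8859-1 decoding.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not taken from the linked code.
final class ByteCastDemo {

    // Turns each byte into one char by casting. Masking with 0xFF keeps the
    // value in 0-255, so this matches ISO-8859-1 decoding and nothing else.
    static String castBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the bytes with an explicit charset, which is what a
    // multi-byte encoding such as UTF-8 needs.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

For an "é" stored as ISO-8859-1 (one byte, 0xE9) both methods give the same result. Stored as UTF-8 (two bytes, 0xC3 0xA9), `castBytes` yields the two characters "Ã©", while `decode` with UTF-8 yields "é".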
                • 5. Re: Reading a file in reverse....
                  807587
                  Thanks guys... that's exactly what I'm looking for, sorry for the loonnnggg delay in responding, been busy...

                  G.
                  • 6. Re: Reading a file in reverse....
                    807587
                    For everybody's benefit, here is the reverse file reader code:
                    /*******************************************************************
                     * Author:          Ryan D. Emerle
                     * Date:               10.12.2004
                     * Desc:               Reverse file reader.  Reads a file from the end to the
                     *                              beginning
                     *
                     * Known Issues:
                     *                              Does not support unicode!
                     *******************************************************************/
                     
                    package org.emerle.fileIO;
                    import java.io.*;
                    import java.util.*;
                    
                    public class ReverseFileReader {     
                              private String filename;     
                              private RandomAccessFile randomfile;     
                              private long position;
                              
                              public ReverseFileReader (String filename) throws Exception {          
                                   // Open up a random access file
                                   this.randomfile=new RandomAccessFile(filename,"r");
                                   // Start at the last byte of the file. For an empty file this is -1,
                                   // so readLine() returns null straight away instead of seeking to a
                                   // negative offset.
                                   this.position=this.randomfile.length()-1;
                              }     
                              
                              // Read one line from the current position towards the beginning
                              public String readLine() throws Exception {          
                                   int thisCode;
                                   char thisChar;
                                   String finalLine="";
                                   
                                   // If our position is less than zero already, we are at the beginning
                                   // with nothing to return.
                                   if ( this.position < 0 ) {
                                             return null;
                                   }
                                   
                                   for(;;) {
                                        // we've reached the beginning of the file
                                        if ( this.position < 0 ) {
                                             break;
                                        }
                                        // Seek to the current position
                                        this.randomfile.seek(this.position);
                                        
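                                        // Read the data at this position. read() returns the byte as 0-255
                                        // (readByte() is signed), so the cast maps it to the matching
                                        // ISO-8859-1 character
                                        thisCode=this.randomfile.read();
                                        thisChar=(char)thisCode;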
                                        
                                        // If this is a line break or carriage return, stop looking
                                        if (thisCode == 13 || thisCode == 10 ) {
                                             // See if the previous character is also a line break character.
                                             // this accounts for crlf combinations. Skip the check at the first
                                             // byte of the file, where there is no previous character.
                                             if ( this.position > 0 ) {
                                                  this.randomfile.seek(this.position-1);
                                                  int nextCode=this.randomfile.readByte();
                                                  if ( (thisCode == 10 && nextCode == 13) || (thisCode == 13 && nextCode == 10) ) {
                                                       // If we found another linebreak character, ignore it
                                                       this.position=this.position-1;
                                                  }
                                             }
                                             // Move the pointer for the next readline
                                             this.position--;
                                             break;
                                        } else {
                                             // This is a valid character append to the string
                                             finalLine=thisChar + finalLine;
                                        }
                                        // Move to the next char
                                        this.position--;
                                   }
                                   // return the line
                                   return finalLine;
                              }     
                    }
                    • 7. Re: Reading a file in reverse....
                      807587
                      What's a "line"?
                      A line is a group of consecutive characters that ends with an "end of line" character...

                      By extension, I suppose one could consider a file with no "end of line" at all as one line, so we could rewrite our definition:

                      A line is a group of consecutive characters that ends with an "end of line" character or with the end of the file...
                      • 8. Re: Reading a file in reverse....
                        807592
                        Hi

                        Problem:
                        Given a file containing thousands of lines, we want to read only the last N lines, similar to what the "tail" command does.

                        Constraints:
                        1 Minimize the number of file reads.
                        Avoid reading the complete file to get the last few lines.
                        2 Minimize JVM memory usage.
                        Avoid holding the complete file in memory.

                        Approach:
                        Read a chunk of characters from the end of the file. One chunk should contain multiple lines. Reverse this chunk and extract the lines. Repeat until the required N lines have been collected. This way we read and store only the required part of the file.

                        Below is a utility program:
                        -------------------------------------------------------------------------------
                         
                        import java.util.*;
                        import java.io.RandomAccessFile;
                        
                        public class FileHelper {
                        
                             /**
                              * @param args
                              */
                             public static void main(String[] args) 
                             {
                                  String dir = "C:\\";
                                  String fileName = args[0];
                                  int lineCount = Integer.parseInt(args[1]);
                                  FileHelper.tail(dir + fileName, lineCount);          
                             }
                        
                             public static Vector tail(String fileName, int lineCount)
                             {
                                  return tail(fileName, lineCount, 2000);
                             }
                        
                             /**
                              * Given a byte array this method:
                              * a. creates a String out of it
                              * b. reverses the string
                              * c. extracts the lines
                              * d. characters in extracted line will be in reverse order, 
                              *    so it reverses the line just before storing in Vector.
                              *     
                              *  On extracting required number of lines, this method returns TRUE, 
                              *  Else it returns FALSE.
                              *   
                              * @param bytearray
                              * @param lineCount
                              * @param lastNlines
                              * @return
                              */
                             private static boolean parseLinesFromLast(byte[] bytearray, int lineCount, Vector lastNlines)
                             {
                                  String lastNChars = new String (bytearray);
                                  StringBuffer sb = new StringBuffer(lastNChars);
                                  lastNChars = sb.reverse().toString();
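                                  // Split on both '\r' and '\n' so that lines from CRLF files do not keep
                                  // a stray '\r'. Note that StringTokenizer skips empty lines.
                                  StringTokenizer tokens= new StringTokenizer(lastNChars,"\r\n");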
                                  while(tokens.hasMoreTokens())
                                  {
                                       StringBuffer sbLine = new StringBuffer((String)tokens.nextToken());               
                                       lastNlines.add(sbLine.reverse().toString());
                                       if(lastNlines.size() == lineCount)
                                       {
                                            return true;//indicates we got 'lineCount' lines
                                       }
                                  }
                                  return false; //indicates didn't read 'lineCount' lines
                             }
                             
                             /**
                              * Reads last N lines from the given file. File reading is done in chunks.
                              * 
                              * Constraints:
                              * 1 Minimize the number of file reads -- Avoid reading the complete file
                              * to get last few lines.
                              * 2 Minimize the JVM in-memory usage -- Avoid storing the complete file 
                              * info in in-memory.
                              *
                              * Approach: Read a chunk of characters from end of file. One chunk should
                              * contain multiple lines. Reverse this chunk and extract the lines. 
                              * Repeat this until you get required number of last N lines. In this way 
                              * we read and store only the required part of the file.
                              * 
                              * 1 Create a RandomAccessFile.
                              * 2 Let curPos be the file length, i.e. the end of the part not yet parsed.
                              * 3 Let fromPos = max(0, curPos - chunkSize). Use seek() to move there.
                              * 4 Read the bytes between fromPos and curPos.
                              * 5 Unless fromPos is ZERO, the first line of the chunk may be incomplete,
                              *   so skip everything up to and including the first '\n'. If nothing is
                              *   left, the line is longer than the chunk: double chunkSize, go to step-3.
                              * 6 Extract the lines. On reading required N lines go to step-8.
                              * 7 Set curPos to the end of the skipped part and repeat from step-3 until
                              *               a. N lines are read.
                              *          OR
                              *               b. The beginning of the file is reached (fewer than N lines).
                              * 8 Exit. Got N lines or less than that.
                              *
                              * @param fileName  path of the file to read
                              * @param lineCount number of lines wanted (N)
                              * @param chunkSize initial number of bytes read per step
                              * @return the last lines, newest first, or null if reading failed
                              */
                             public static Vector tail(String fileName, int lineCount, int chunkSize)
                             {
                                  RandomAccessFile raf = null;
                                  try
                                  {
                                       raf = new RandomAccessFile(fileName,"r");
                                       Vector lastNlines = new Vector();
                                       chunkSize = Math.max(chunkSize, 1);
                                       // curPos is the exclusive end of the region not parsed yet
                                       long curPos = raf.length();
                                       while(curPos > 0 && lineCount > 0)
                                       {
                                            long fromPos = Math.max(0, curPos - chunkSize);
                                            byte[] bytearray = new byte[(int)(curPos - fromPos)];
                                            raf.seek(fromPos);
                                            raf.readFully(bytearray);
                                            int start = 0;
                                            if(fromPos > 0)
                                            {
                                                 // skip the (possibly incomplete) first line of the chunk
                                                 while(start < bytearray.length && bytearray[start] != '\n')
                                                 {
                                                      start++;
                                                 }
                                                 start++; // first byte after the newline
                                                 if(start >= bytearray.length)
                                                 {
                                                      // no complete line in this chunk, retry with a larger one
                                                      chunkSize *= 2;
                                                      continue;
                                                 }
                                            }
                                            byte[] complete = Arrays.copyOfRange(bytearray, start, bytearray.length);
                                            if(parseLinesFromLast(complete, lineCount, lastNlines) || fromPos == 0)
                                            {
                                                 break;
                                            }
                                            // the skipped part is parsed together with the next chunk
                                            curPos = fromPos + start;
                                       }
                                       Enumeration e = lastNlines.elements();
                                       while(e.hasMoreElements())
                                       {
                                            System.out.println(e.nextElement());
                                       }
                                       return lastNlines;
                                  }
                                  catch(Exception e)
                                  {
                                       e.printStackTrace();
                                       return null;
                                  }
                                  finally
                                  {
                                       if(raf != null)
                                       {
                                            try { raf.close(); } catch(Exception ignored) { }
                                       }
                                  }
                        }
                        -------------------------------------------------------------------------------

                        Best Regards,
                        Chary
                        • 9. Re: Reading a file in reverse....
                          807592
                          I have concocted the following:

                          //herman vierendeels belgium
                          //read file backward
                          //java be/dekamer/dbtools/reverseFileReader /tmp/xx
                          //
                          // reverseFileReader /tmp/x1 > /tmp/x2
                          // reverseFileReader /tmp/x2 > /tmp/x3
                          // diff x1 x3
                          // try and compare also with tac
                          package be.dekamer.dbtools;

                          import java.io.*;
                          import java.util.logging.*;

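                          //H4Object is the poster's own helper class (hex dumps for the error message below); it is not included in this post
                          import hvr4.source.java.allerlei.H4Object;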

                          public class reverseFileReader
                          {
                          private static Logger logger=Logger.getLogger(reverseFileReader.class.getName());

                          private RandomAccessFile randomfile;
                          private long position;
                          String charset="UTF-8";

                          public reverseFileReader(String filename)
                          throws Exception
                          {          
                          this.randomfile=new RandomAccessFile(filename,"r");
                          this.position=this.randomfile.length()-1;
                          //logger.info("position="+position);

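                          if(position<0) return;//empty file: readLine() will return null

                          char c=readValidChar();
                          if(c=='\n' || c=='\r')
                          {
                          if(position>=0)
                          {
                          long oldpos=position;
                          char prev=readValidChar();
                          //only a CR LF pair (13 10) counts as one line break
                          if(!((c=='\n' && prev=='\r') || (c=='\r' && prev=='\n'))) position=oldpos;
                          }
                          }
                          else
                          {
                          throw(new RuntimeException("invalid char: file must end with a line break"));
                          }     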
                          //logger.info("position="+position);
                          }//constructor
                          private char readValidChar()
                          throws java.io.IOException      
                          {
                          char rv;     
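                          byte[] vs;
                          String vss=null;

                          //Read the byte at 'position'. In UTF-8 a byte of the form 10xxxxxx is a
                          //continuation byte, so step back until the lead byte of the char is found.
                          long end=position;
                          randomfile.seek(position);
                          int b=randomfile.read();
                          while((b & 0xC0)==0x80 && position>0)
                          {
                          position--;
                          randomfile.seek(position);
                          b=randomfile.read();
                          }
                          //Decode the complete byte sequence of this one char.
                          vs=new byte[(int)(end-position+1)];
                          randomfile.seek(position);
                          randomfile.readFully(vs);
                          position--;
                          vss=new String(vs,charset);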
                          if(vss.length()!=1)
                          {
                          throw(new RuntimeException(position+" vss.length()="+vss.length()+" vss="+H4Object.string2hexstring(vss)+" vs="+H4Object.bytes2hexstring(vs)));
                          }
                          //byte[] utf8=vss.getBytes(charset);
                          rv=vss.charAt(0);
                          //logger.info("rv="+rv);
                          return(rv);
                          }//private char readValidChar()
                          public String readLine()
                          throws Exception
                          {          
                          char c;
                          String finalLine="";

                          if(position<0){return null;}

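                          while(position>=0)
                          {
                          c=readValidChar();
                          if(c=='\n' || c=='\r')
                          {
                          //a CR LF pair counts as one line break; remember the position so that
                          //any other preceding char (of whatever byte length) can be restored
                          if(position>=0)
                          {
                          long oldpos=position;
                          char prev=readValidChar();
                          if(!((c=='\n' && prev=='\r') || (c=='\r' && prev=='\n'))) position=oldpos;
                          }
                          break;
                          }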
                          else
                          {
                          finalLine=c+finalLine;
                          }
                          }//while
                          return finalLine;
                          }//readLine
                          public void close()
                          throws java.io.IOException     
                          {
                          randomfile.close();
                          }     
                               
                                    
                          public static void main(String args[])
                          throws Exception
                          {
                          String line=null;

                          reverseFileReader rfr=new reverseFileReader(args[0]);

                          line=rfr.readLine();
                          while(line!=null)
                          {      
                          System.out.println(line);
                          line=rfr.readLine();
                          }
                          //
                          rfr.close();
                          }//main
                          }//public class reverseFileReader