15 Replies. Latest reply: Jun 14, 2008 1:30 PM by 807591

How to open a mixed data format file?

807591 Newbie
How can I open a file like the one below and read its content properly?
For example, for an HTTP response the file would contain data like this:

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: close
...
a
9
578
�WmS"9�~U��s
...
0

The first several lines are ASCII text, followed by the compressed, chunked data. The first "a" (hex) means the first chunk of gzip-compressed data is 10 bytes long. The number 578 (hex, 1400 decimal) is the size of the next chunk. The 0 marks the end of the data. How can I read this kind of file in Java? If I use FileInputStream, it reads properly only up to the end of the ASCII text (the rest comes out as garbage). Can I switch from the FileInputStream to a GZIPInputStream partway through the file? I have tried it but it didn't work. Any ideas? Thanks!
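One piece of this can be sketched right away: the header lines have to be read straight from the byte stream, one byte at a time, because a BufferedReader (or anything else that reads ahead) would swallow part of the binary data that follows. The class and method names below (`HeaderLineReader`, `readLine`, `readHeaders`) are hypothetical, and the sketch assumes lines end with "\r\n" and the header ends at the first blank line, as in HTTP.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads CRLF-terminated ASCII lines directly from an
// InputStream, one byte at a time, so no bytes past the current line are consumed.
class HeaderLineReader {
    // Returns the line without its trailing "\r\n", or null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                byte[] bytes = line.toByteArray();
                int len = bytes.length;
                if (len > 0 && bytes[len - 1] == '\r') len--; // drop the '\r' before '\n'
                return new String(bytes, 0, len, StandardCharsets.US_ASCII);
            }
            line.write(b);
        }
        // End of stream: return a partial last line, or null if nothing was read.
        return line.size() == 0 ? null : new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Reads header lines until the blank line that ends the header block.
    static List<String> readHeaders(InputStream in) throws IOException {
        List<String> headers = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            headers.add(line);
        }
        return headers;
    }
}
```

After `readHeaders` returns, the same stream is positioned at the first chunk-size line ("a"). Wrapping it in a GZIPInputStream at that point still fails, because the hex size lines are mixed in with the compressed bytes and have to be filtered out first.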
  • 1. Re: How to open a mixed data format file?
    807591 Newbie
    Adelina_Clarks wrote:
    If I use FileInputStream, it reads only till the end of ASCII text (the rest will be garbage output).
    No, the rest will be compressed data. It sounds like you need to write a FilterInputStream to handle this format, but your description is not clear enough to give you more info at this point.

    You say there is compressed data and chunked data? What is the difference? Do you mean the compressed data is chunked? Are there multiple compressed streams, or one compressed stream broken into multiple chunks?

    You say the first letter "a" means the first chunk of compressed data has 10 bytes. Is it the first letter that counts, or the first line? Are lines separated by CR, LF, or CR/LF?

    If you can clearly define the problem, a solution will come to you more easily.
  • 2. Re: How to open a mixed data format file?
    807591 Newbie
    Yes, the compressed data is chunked. There are multiple compressed streams. Each hex number (a, 578) indicates how many bytes of compressed data follow, starting on the next line. For example, the first chunk is 10 bytes, and the second chunk is 1400 bytes.
    You say the first letter "a" means the first chunk of compressed data has 10 bytes. Is it the first letter that counts, or the first line?
    Here is part of the file:

    Content-Encoding: gzip
    Vary: Accept-Encoding
    Transfer-Encoding: chunked

    a

    578
    �WmS"9�~U��s'� ........
    0

    So, the first letter "a" doesn't not count for that 10 bytes (10 bytes for "� ").
    Are lines separated by CR, LF, or CR/LF?
    Lines are separated by "\r\n".
    Please let me know if anything needs to be clarified. Thanks!
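The layout described here matches HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1): each chunk is its size in hex, CRLF, the data, CRLF, and the body ends with a "0" line followed by a final CRLF. A hypothetical encoder like the one below can produce test input in that format; it is only a sketch and writes no chunk extensions or trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder: frames a payload with HTTP/1.1 chunked transfer coding.
class ChunkEncoder {
    // chunkSize must be > 0.
    static byte[] encode(byte[] payload, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int pos = 0; pos < payload.length; pos += chunkSize) {
            int n = Math.min(chunkSize, payload.length - pos);
            // Size line in hexadecimal, e.g. 10 -> "a", 1400 -> "578".
            writeAscii(out, Integer.toHexString(n) + "\r\n");
            out.write(payload, pos, n);
            writeAscii(out, "\r\n"); // CRLF after the chunk data
        }
        writeAscii(out, "0\r\n\r\n"); // last chunk, no trailers, final CRLF
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

For example, `ChunkEncoder.encode("hello world".getBytes(StandardCharsets.US_ASCII), 10)` produces `a\r\nhello worl\r\n1\r\nd\r\n0\r\n\r\n`.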
  • 3. Re: How to open a mixed data format file?
    807591 Newbie
    Adelina_Clarks wrote:
    Yes, the compressed data is chunked. There are multiple compressed streams.
    So there are multiple compressed gzip streams, and each of these compressed streams is chunked? If so, how do you differentiate chunks belonging to one stream from chunks belonging to another stream?
    So, the first letter "a" doesn't not count for that 10 bytes (10 bytes for "‹ ").
    Doesn't not what?
  • 4. Re: How to open a mixed data format file?
    807591 Newbie
    Yes, the compressed data is chunked. There are multiple compressed streams.
    So there are multiple compressed gzip streams, and each of these compressed streams is chunked? If so, how do you differentiate chunks belonging to one stream from chunks belonging to another stream?
    If you look at the file, you can see the hex "a" and "578". These two hex numbers didn't get compressed by gzip.
    The hex number is a separator.
    So, the first letter "a" doesn't not count for that 10 bytes (10 bytes for "� ").
    Doesn't not what?
    That was a typo! The line that follows the letter "a" holds the 10 bytes.
  • 5. Re: How to open a mixed data format file?
    807591 Newbie
    Adelina_Clarks wrote:
    Yes, the compressed data is chunked. There are multiple compressed streams.
    So there are multiple compressed gzip streams, and each of these compressed streams is chunked? If so, how do you differentiate chunks belonging to one stream from chunks belonging to another stream?
    If you look at the file, you can see the hex "a" and "578". These two hex numbers didn't get compressed by gzip.
    The hex number is a separator.
    You're not understanding my question. Do you have multiple streams that are chunked, or one stream that is chunked? This is the difference between taking one file and splitting it into 3 pieces and taking 10 files and splitting each of them into three pieces.
  • 6. Re: How to open a mixed data format file?
    807591 Newbie
    You're not understanding my question. Do you have multiple streams that are chunked, or one stream that is chunked?
    This is the difference between taking one file and splitting it into 3 pieces and taking 10 files and splitting each of them into three pieces.
    Oh, I see. It is one file split into several pieces (one stream that is chunked).
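To illustrate the distinction: when one gzip stream is cut into pieces, concatenating the pieces in order restores a stream that GZIPInputStream can decompress, while a single piece on its own cannot be decompressed. The sketch below (class name hypothetical, piece size of 7 bytes chosen arbitrarily in the usage) demonstrates both.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: one gzip stream, split into fixed-size pieces and re-joined.
class OneStreamManyChunks {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the stream writes the gzip trailer
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        }
    }

    // Cuts one compressed stream into pieces of at most `size` bytes.
    static List<byte[]> split(byte[] data, int size) {
        List<byte[]> pieces = new ArrayList<>();
        for (int pos = 0; pos < data.length; pos += size) {
            pieces.add(Arrays.copyOfRange(data, pos, Math.min(pos + size, data.length)));
        }
        return pieces;
    }

    // Concatenates the pieces, in order, back into one contiguous stream.
    static byte[] join(List<byte[]> pieces) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : pieces) out.write(p, 0, p.length);
        return out.toByteArray();
    }
}
```

`gunzip(join(split(gzip(data), 7)))` returns the original bytes, whereas `gunzip(piece)` on any piece but the whole stream throws an IOException (the second piece, for instance, does not start with the gzip magic number).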
  • 7. Re: How to open a mixed data format file?
    807591 Newbie
    It sounds like you need a stateful FilterInputStream. Roughly, this is what I think will be needed:

    1. A field indicating whether it is ready to read the chunk size or compressed data.
    2. Fields holding the current chunk size in bytes and the number of bytes read from the current chunk, plus an EOF flag.
    3. Your custom class will wrap the input stream you're reading from. The filtered output will be the compressed data. Your class can then be wrapped in a GZIPInputStream, which will in turn give you the decompressed bytes.

    A read should go something like this in pseudocode:
    while ((requestedBytesRemaining > 0) && !EOF)
    {
      switch (currentState)
      {
      case READ_CHUNK_SIZE:
        readChunkSizeOrEOF();   // parse the hex line; "0" or end of input sets EOF
        break;

      case READ_CHUNK:
        readBytes(lesserOf(requestedBytesRemaining, bytesRemainingInCurrentChunk));
        break;
      }
    }
    if (bytesRead == 0)
      return -1;  // EOF
    return bytesRead;
    Reading and parsing the chunk size should be done manually (read byte by byte, looking for "\r\n", then call Integer.parseInt(hexString, 16) on the result). A result of zero should set an EOF flag so the filter knows to stop. An EOF on the underlying stream should also set the EOF flag.

    EDIT: Reading the chunk size should change state to READ_CHUNK, and reaching the end of a chunk should change the state to READ_CHUNK_SIZE.

    EDIT: If it's not clear, the goal is for your filter class to read all the data, and use the hexadecimal chunk size lines to maintain its own state. These bytes are filtered out, and not passed back to whatever is calling read(). Only the compressed chunk data is passed back in the read(). You take care of reading the chunk sizes and assembling the chunks into a contiguous stream, which the wrapping GZIPInputStream can then decompress.
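Putting these points together, a minimal sketch of such a filter might look like the following. The class name `DechunkingInputStream` is hypothetical; the sketch assumes the header lines have already been consumed, that size lines end in "\r\n", and that each chunk's data is followed by "\r\n" (as in HTTP chunked encoding). It ignores any `;extension` after the size and does not parse trailers. `skip()` and `available()` are overridden because FilterInputStream's defaults delegate straight to the wrapped stream and would bypass the filtering.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical filter: strips the hex chunk-size lines and returns only the
// chunk data, so a GZIPInputStream wrapped around it sees one contiguous stream.
class DechunkingInputStream extends FilterInputStream {
    private enum State { READ_CHUNK_SIZE, READ_CHUNK }

    private State state = State.READ_CHUNK_SIZE;
    private int chunkRemaining = 0; // bytes left in the current chunk
    private boolean eof = false;    // set by the "0" chunk or by underlying EOF

    DechunkingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        int total = 0;
        while (total < len && !eof) {
            if (state == State.READ_CHUNK_SIZE) {
                readChunkSize(); // switches to READ_CHUNK or sets eof
                continue;
            }
            int n = in.read(b, off + total, Math.min(len - total, chunkRemaining));
            if (n == -1) throw new EOFException("stream ended inside a chunk");
            total += n;
            chunkRemaining -= n;
            if (chunkRemaining == 0) state = State.READ_CHUNK_SIZE; // end of chunk
        }
        return total == 0 ? -1 : total;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip through read() so the size lines are still filtered out.
        byte[] tmp = new byte[512];
        long skipped = 0;
        while (skipped < n) {
            int r = read(tmp, 0, (int) Math.min(tmp.length, n - skipped));
            if (r == -1) break;
            skipped += r;
        }
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (eof || state != State.READ_CHUNK) ? 0 : Math.min(chunkRemaining, in.available());
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    // Reads the next size line, skipping the blank CRLF that ends the previous chunk.
    private void readChunkSize() throws IOException {
        String line;
        do {
            line = readLine();
            if (line == null) { eof = true; return; } // underlying stream ended
        } while (line.isEmpty());
        int semi = line.indexOf(';'); // ignore any chunk extensions
        if (semi >= 0) line = line.substring(0, semi);
        chunkRemaining = Integer.parseInt(line.trim(), 16);
        if (chunkRemaining == 0) eof = true; // "0" marks the last chunk
        else state = State.READ_CHUNK;
    }

    // Reads one line byte by byte, so nothing past it is consumed; drops CR/LF.
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') buf.write(c);
        }
        if (c == -1 && buf.size() == 0) return null;
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

Usage would then be along the lines of `new GZIPInputStream(new DechunkingInputStream(fin))`, with `fin` already positioned after the header lines. Like the pseudocode, `read(byte[], int, int)` keeps reading across chunk boundaries until the request is filled; for a network stream one might prefer to return as soon as some bytes are available.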

    Edited by: paul.miner on Jun 12, 2008 9:34 PM
  • 8. Re: How to open a mixed data format file?
    807591 Newbie
    One other thing I thought of: a GZIPInputStream will not always read all the bytes written by a GZIPOutputStream; a couple of bytes may be left unread at the end of the stream.

    Consequently, you should add a method to your filter that will read and discard the remainder of the chunks until the end of the stream (the "0") for situations where reading the entire stream is important.
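A minimal sketch of that clean-up step, as a hypothetical helper that reads and discards everything left in a stream. Called on the dechunking filter (not on the GZIPInputStream) after decompression, it would consume any leftover chunk data, the remaining size lines, and the terminating "0".

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads and discards whatever is left in a stream.
class StreamDrainer {
    // Returns the number of bytes discarded.
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long discarded = 0;
        int n;
        while ((n = in.read(buf)) != -1) discarded += n;
        return discarded;
    }
}
```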
  • 9. Re: How to open a mixed data format file?
    807591 Newbie
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    public class Example {
        public static void main(String[] args) throws IOException {
            File mixedFile = new File("C:\\mixedFile.log");
            FileInputStream fin = new FileInputStream(mixedFile);
            String line = "";
            boolean CR = false;
            boolean LF = false;
            char charIs = '\0';
            // Read and print all the ASCII header lines
            while (CR != true || LF != true) {
                charIs = (char) fin.read();
                if (charIs == '\r') CR = true;
                if (charIs == '\n') LF = true;
                line = line + charIs;
                if (CR == true && LF == true) {
                    System.out.print(line);
                    CR = false;
                    LF = false;
                    if (line.charAt(0) == '\r') break; // a blank line ends the header
                    line = "";
                }
            }
            // Finished reading the header

            String chunkedNumber = "";
            int bytesOfChunk = 0;
            int startAt = 0;
            int readByte = 0;
            boolean done = false;
            byte[] buffer = new byte[2048]; // assumes the compressed data is at most 2048 bytes
            // Filter out the hex chunk sizes and join the chunk data
            while (!done) {
                // Read one line: a hex size, or the blank CRLF after a chunk
                while (CR != true || LF != true) {
                    charIs = (char) fin.read();
                    if (charIs == '\r') CR = true;
                    if (charIs == '\n') LF = true;
                    chunkedNumber = chunkedNumber + charIs;
                }
                CR = false;
                LF = false;
                chunkedNumber = chunkedNumber.substring(0, chunkedNumber.length() - 2); // strip "\r\n"
                if (!chunkedNumber.isEmpty()) {
                    bytesOfChunk = Integer.parseInt(chunkedNumber, 16);
                    fin.read(buffer, startAt, bytesOfChunk); // read chunk data
                    startAt = startAt + bytesOfChunk;
                    if (bytesOfChunk == 0)
                        done = true;
                    else
                        chunkedNumber = "";
                }
            }
            // --> I'm stuck here. How can I use GZIPOutputStream to read the byte[] buffer?
            fin.close();
        }
    }
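For the narrower question in that last comment: decompression is done by GZIPInputStream, not GZIPOutputStream, wrapped around a ByteArrayInputStream over the filled part of the buffer. A minimal sketch (class name hypothetical), where `length` would be `startAt` in the code above. This still relies on the 2048-byte limit and on each `fin.read(...)` returning the whole chunk, which FileInputStream does not guarantee; the FilterInputStream approach avoids both assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses the first `length` bytes of `buffer`.
class BufferGunzip {
    static byte[] gunzip(byte[] buffer, int length) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(buffer, 0, length))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            int n;
            while ((n = gz.read(tmp)) != -1) out.write(tmp, 0, n);
            return out.toByteArray();
        }
    }
}
```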
  • 10. Re: How to open a mixed data format file?
    807591 Newbie
    You've completely missed the point. You need to write a FilterInputStream.

    And next time, use [code]...[/code] tags.
  • 11. Re: How to open a mixed data format file?
    807591 Newbie
    Exception in thread "main" java.lang.NullPointerException
         at java.util.zip.InflaterInputStream.<init>(Unknown Source)
         at java.util.zip.GZIPInputStream.<init>(Unknown Source)
         at java.util.zip.GZIPInputStream.<init>(Unknown Source)
         at ReadCompressedDataStream.readConmpressedData(ReadCompressedDataStream.java:20)
         at Example.main(Example.java:47)

    ReadCompressedDataStream class
    import java.io.FileInputStream;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    
    
    public class ReadCompressedDataStream extends FilterInputStream{
         private byte [] stream;
         private int block;
         private FileInputStream fin;
         
         ReadCompressedDataStream(FileInputStream in){
              super(in);
         }
         public void setBlockSize(int size){
              stream = new byte[size];
              block = size;
         }
         public void readConmpressedData() throws IOException{
              GZIPInputStream gis = new GZIPInputStream(fin);
              int count = 0;
              gis.read(stream, 0, block);
              while(count < block){
                   System.out.print(stream[count++]);
                   
              }
         }
    }
    Main program
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    
    public class Example {
         public static void main(String[] args) throws IOException{
              
               File mixedFile = new File("C:\\mixedFile.log");
               FileInputStream fin = new FileInputStream(mixedFile);
               String line = "";
               boolean CR = false;
               boolean LF = false;
               char charIs = '\0';
              while(CR != true || LF != true){
                   charIs = (char)fin.read();
                   if(charIs == '\r')CR = true;
                   if(charIs == '\n')LF = true;
                   line = line + charIs;
                   if(CR==true && LF==true){
                        System.out.print(line);
                        CR = false;
                        LF = false;
                        if(line.charAt(0) == '\r')break;
                        line = "";
                        }
              }
              String chunkedNumber = "";
              int bytesOfChunk = 0;
              int startAt = 0;
              int readByte = 0;
              boolean done = false;
              byte [] buffer = new byte[2048];
              ReadCompressedDataStream rcd = new ReadCompressedDataStream(fin);
              while(!done){
                   while(CR != true || LF != true){
                        charIs = (char)fin.read();
                        if(charIs == '\r')CR = true;
                        if(charIs == '\n')LF = true;
                        chunkedNumber = chunkedNumber + charIs;
                   }
                   CR = false;
                   LF = false;
                   chunkedNumber = chunkedNumber.substring(0, chunkedNumber.length()-2);
                   if(!chunkedNumber.isEmpty()){
                        bytesOfChunk = Integer.parseInt(chunkedNumber,16);
                        rcd.setBlockSize(bytesOfChunk);
                        rcd.readConmpressedData();
                        if (bytesOfChunk == 0)
                             done = true;
                        else
                             chunkedNumber = "";
                   }
              }
    
              fin.close();
         }
    }
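The NullPointerException comes from `new GZIPInputStream(fin)`: the `fin` field in ReadCompressedDataStream is never assigned, so it is still null, and InflaterInputStream's constructor rejects a null stream. FilterInputStream already keeps the stream passed to `super(in)` in its protected field `in`, so a subclass should read from that. A minimal sketch with a hypothetical pass-through filter that only counts bytes:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical pass-through filter: reads via the inherited protected field `in`.
class CountingFilter extends FilterInputStream {
    private long count = 0; // bytes passed through so far

    CountingFilter(InputStream in) {
        super(in); // stores the wrapped stream in this.in
    }

    @Override
    public int read() throws IOException {
        int b = in.read(); // the inherited field, not a separate never-assigned one
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) count += n;
        return n;
    }

    long getCount() {
        return count;
    }
}
```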
  • 12. Re: How to open a mixed data format file?
    807591 Newbie
    I have not read your code in detail. However, it is structured wrong. All the stateful logic determining chunk sizes, reading chunks, etc, needs to be in your ReadCompressedDataStream class. In particular, you should be overriding the various read() methods.

    I don't think you're "getting it" yet. Your main program should simply consist of you wrapping a GZIPInputStream around a ReadCompressedDataStream around a FileInputStream, and then reading from the GZIPInputStream and writing to an output file.

    When the GZIPInputStream calls a read method on the underlying stream (ReadCompressedDataStream), that's when your code kicks in, filtering out the chunk size text and returning only the chunk data.
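A minimal sketch of that main-program side, assuming a filter that already yields only the compressed bytes (the parameter `filtered`); the class and method names are hypothetical. In the real program `filtered` would be the chunk-stripping filter wrapped around the FileInputStream, and `out` a FileOutputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical main-program helper: all chunk handling lives in the filter;
// this only wraps GZIPInputStream around it and copies the decompressed bytes.
class LayeredDecompress {
    static void copyDecompressed(InputStream filtered, OutputStream out) throws IOException {
        try (InputStream gz = new GZIPInputStream(filtered)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        }
    }
}
```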
  • 13. Re: How to open a mixed data format file?
    807591 Newbie
    To give you an idea of how a FilterInputStream should work, here's a sample class that ROT13s bytes between tildes (and removes the tildes):
    import java.io.*;
    
    public class TestFilter extends FilterInputStream
    {
      private boolean isEOF = false;
      private boolean isROT13 = false;
    
      // This filter ROT13s the letters between tildes and removes the tildes.
      public TestFilter(InputStream in)
      {
        super(in);
      }
      
      public int read() throws IOException
      {
        while (!isEOF)
        {
          int n = in.read();
          if (n < 0)
          {
            isEOF = true;
            break;
          }
          if (n == '~')
          {
            isROT13 = !isROT13;
            // Discard character
            continue;
          }
          if (!isROT13)
            return n;
          if ((n >= 'A') && (n <= 'Z'))
            return "NOPQRSTUVWXYZABCDEFGHIJKLM".charAt(n - 'A');
          if ((n >= 'a') && (n <= 'z'))
            return "nopqrstuvwxyzabcdefghijklm".charAt(n - 'a');
          return n;
        }
        return -1;
      }
      
      public int read(byte[] b, int offset, int len) throws IOException
      {
        if (len == 0)
          return 0;
        if (isEOF)
          return -1;
        int bytesRead = 0;
        while (bytesRead < len)
        {
          int n = read();
          if (n < 0)
            break;
          b[offset + bytesRead++] = (byte)n;
        }
        if (bytesRead == 0)
          return -1;
        return bytesRead;
      }
      
      public static void main(String[] argv) throws IOException
      {
        // Really shouldn't be using a byte stream for character data, but this is just a test.
        InputStream inpStr = new TestFilter(new ByteArrayInputStream("Rotate this: ~abcd~ Stopped rotating".getBytes()));
        ByteArrayOutputStream output = new ByteArrayOutputStream();
    
        int bytesRead = 0;
        byte[] buffer = new byte[4];
        while ((bytesRead = inpStr.read(buffer)) > 0)
          output.write(buffer, 0, bytesRead);
        System.out.println(new String(output.toByteArray()));
      }
      
    }
  • 14. Re: How to open a mixed data format file?
    807591 Newbie
    Thanks for the example! However, I still don't get it.
    The program runs but doesn't decompress the buffer. Would you mind telling me how to fix it, please?
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;
    
    
    public class ReadMixedDataFormatFile {
         public static void main(String[] args) throws IOException {
    
              File mixedFile = new File("C:\\Message.log");
              FileInputStream fin = new FileInputStream(mixedFile);
              String line = "";
              boolean CR = false;
              boolean LF = false;
              char charIs = '\0';
              while(CR != true || LF != true){
                   charIs = (char)fin.read();
                   if(charIs == '\r')CR = true;
                   if(charIs == '\n')LF = true;
                   line = line + charIs;
                   if(CR==true && LF==true){
                        System.out.print(line);
                        CR = false;
                        LF = false;
                        if(line.charAt(0) == '\r')break;
                        line = "";
                   }
              }
              String chunkedNumber = "";
              int bytesOfChunk = 0;
              int startAt = 0;
              boolean done = false;
              byte [] buffer = new byte[2048];
    
              while(!done){
                   while(CR != true || LF != true){
                        charIs = (char)fin.read();
                        if(charIs == '\r')CR = true;
                        if(charIs == '\n')LF = true;
                        chunkedNumber = chunkedNumber + charIs;
                   }
                   CR = false;
                   LF = false;
                   chunkedNumber = chunkedNumber.substring(0, chunkedNumber.length()-2);
                   if(!chunkedNumber.isEmpty()){
                        bytesOfChunk = Integer.parseInt(chunkedNumber,16);
                        fin.read(buffer, startAt, bytesOfChunk);
                        startAt = startAt + bytesOfChunk;
                        if (bytesOfChunk == 0)
                             done = true;
                        else
                             chunkedNumber = "";
                   }
              }
              InputStream inputStream = new ReadCompressedDataStream(new GZIPInputStream(new ByteArrayInputStream(buffer)));
              inputStream.read(buffer);
              fin.close();
         }
    }
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;
    
    
    public class ReadCompressedDataStream extends FilterInputStream{
    
         ReadCompressedDataStream(GZIPInputStream gsin){
              super(gsin);
         }
         public int read(byte [] buf) throws IOException{
              GZIPOutputStream gos = new GZIPOutputStream(new FileOutputStream(new File("C:\\Message.txt")));
              gos.write(buf);
              gos.close();
              return 0;
         }
    }
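Two things stand out in this attempt: the wrapping order is reversed (the filter should sit between the FileInputStream and the GZIPInputStream, not outside the GZIPInputStream), and writing through a GZIPOutputStream compresses the data again instead of saving it decompressed. A minimal sketch of the output side, assuming `compressed` is a stream that already yields only the compressed bytes (headers and chunk-size lines removed); the names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses to a plain file (no GZIPOutputStream).
class DecompressToFile {
    // Returns the number of decompressed bytes written to `target`.
    static long decompressTo(InputStream compressed, Path target) throws IOException {
        try (InputStream gz = new GZIPInputStream(compressed)) {
            return Files.copy(gz, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For example, `DecompressToFile.decompressTo(filter, Paths.get("C:\\Message.txt"))`, where `filter` is a header-skipping, dechunking FilterInputStream around `new FileInputStream("C:\\Message.log")`.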