2 Replies Latest reply: May 25, 2011 4:04 PM by jschellSomeoneStoleMyAlias

    Help decoding ByteArray from file

    857833
      Hi guys,

      I was wondering if anybody could help me.

      I have a file that I am trying to parse. It is produced by an external piece of software that I have no control over, and the company that makes the software hasn't been very helpful about giving me the structure of the file.

      What I know about the file is that it contains multiple TIFF images and some accompanying data for each TIFF (this is the bit I am having trouble with; it could be XML, but I am not sure what encoding, etc.). The TIFF images are easy enough to parse, as the TIFFs are just byte arrays appended one after another at the end of the file; I can extract each one by finding the TIFF headers.


      Does anybody have any ideas about how to parse the header of the file, or know of some tools that may help me analyse the file?
      I have tried to see if the file was encoded using EBCDIC, but that didn't help.

      Any other suggestions welcome,

      Thanks,
      Mac
        • 1. Re: Help decoding ByteArray from file
          DrClap
          The simplest reverse-engineering tool is a hex editor. Look at the bytes in the file and figure out what they are for.

          Of course, figuring out what they are for does require some knowledge of the application which uses them, so you can't expect to find a tool which does that. It's pretty much up to the person doing the reverse-engineering to do the sort of thing you already did with the TIFF part of the file.
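
          If no hex editor is at hand, a rough equivalent can be sketched in Java. This is only an illustration: the class name HexDump and its dump method are made up for this example, and the layout (offset, 16 hex bytes, printable ASCII) simply imitates what most hex editors show.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library: prints bytes the way a hex editor would.
final class HexDump {

    /** Formats data[offset, offset+length) as rows of: offset, up to 16 hex bytes, printable ASCII. */
    static String dump(byte[] data, int offset, int length) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(data.length, offset + length);
        for (int row = offset; row < end; row += 16) {
            sb.append(String.format("%08x  ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < end) {
                    int b = data[i] & 0xff; // treat the byte as unsigned
                    sb.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row so the ASCII column lines up
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Only the first 512 bytes: enough to eyeball the start of the unknown header.
        System.out.print(dump(data, 0, 512));
    }
}
```

          The ASCII column is often the quickest way to spot embedded text such as XML tags sitting among binary fields.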
          • 2. Re: Help decoding ByteArray from file
            jschellSomeoneStoleMyAlias
            user9941149 wrote:
            ...could be xml,
            ...
            I have tried to see if the file was encoding using EBCDIC, but that didn't help.
            Err...no that isn't going to work.

            EBCDIC is a family of character encodings, with many different code pages.
            XML is a structured data format. The data itself could be in different character sets.

            The two are not equivalent nor comparable.

            In terms of your file...
            1. Determine the type of data: binary, text or mixed.
            2. If text of any sort, determine the character set and encoding.
            3. Determine what the values mean. This is actually tied up with step 1, especially for binary data.
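
            As a first pass at step 1, a crude heuristic is to measure how much of a region is printable ASCII. The sketch below is an assumption-laden illustration: the class ContentSniffer and the 95% / 30% thresholds are invented for this example and would need tuning against real samples.

```java
// Hypothetical helper: guesses whether a byte region looks like text, binary, or a mix.
final class ContentSniffer {

    enum Kind { TEXT, BINARY, MIXED }

    /** Classifies data[offset, offset+length) by its share of printable ASCII bytes. */
    static Kind classify(byte[] data, int offset, int length) {
        int end = Math.min(data.length, offset + length);
        int total = end - offset;
        if (total <= 0) {
            throw new IllegalArgumentException("empty region");
        }
        int printable = 0;
        for (int i = offset; i < end; i++) {
            int b = data[i] & 0xff;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            if (whitespace || (b >= 0x20 && b < 0x7f)) {
                printable++;
            }
        }
        double ratio = (double) printable / total;
        // Thresholds are arbitrary assumptions, not established values.
        if (ratio > 0.95) return Kind.TEXT;
        if (ratio < 0.30) return Kind.BINARY;
        return Kind.MIXED;
    }
}
```

            Note that text in a multi-byte encoding defeats this check: plain English stored as UTF-16 is roughly half zero bytes and would come back as MIXED, which is one reason steps 1 and 2 have to be worked on together.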

            As noted in the previous reply:
            1. Get a hex editor.
            2. Learn all you can about what data might be or should be in the file.
            3. Get as many different examples of the file as you can.
            4. Use the hex editor from step 1 to open each file and, using what you learned in step 2, attempt to map the bytes to likely values.

            There are no shortcuts for step 4. Experience is the only way to learn how to do it more effectively.
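
            One thing that can support the comparison in steps 3 and 4 is marking which header offsets hold the same byte in every sample. Offsets that never change are candidates for magic numbers or fixed fields; offsets that vary are candidates for lengths, counts, or per-file data. The class SampleCompare below is a hypothetical sketch, not a tool from this thread.

```java
import java.util.List;

// Hypothetical helper: finds offsets whose byte value is identical across all sample files.
final class SampleCompare {

    /**
     * Returns a mask over the first 'length' bytes (or the shortest sample, if shorter).
     * mask[i] is true when every sample has the same byte at offset i.
     */
    static boolean[] constantOffsets(List<byte[]> samples, int length) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("need at least one sample");
        }
        int n = length;
        for (byte[] s : samples) {
            n = Math.min(n, s.length);
        }
        boolean[] same = new boolean[n];
        byte[] first = samples.get(0);
        for (int i = 0; i < n; i++) {
            same[i] = true;
            for (byte[] s : samples) {
                if (s[i] != first[i]) {
                    same[i] = false;
                    break;
                }
            }
        }
        return same;
    }
}
```

            This only narrows down where to look. Deciding what a varying field actually means still comes from knowing what data the application should be storing.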

            Note as well that unless the source is IBM equipment, the file is unlikely to use an EBCDIC character set.
            However, that doesn't rule out that you will need to learn more about character sets and their encodings.
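
            A quick way to test encoding guesses is to decode the same bytes with several candidate charsets and see which result looks sensible. The sketch below is illustrative only: the class CharsetProbe and its candidate list are assumptions made for this example. IBM037 is one EBCDIC code page; whether it is available depends on the Java runtime, so the code checks before using it.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes one byte region with several candidate charsets.
final class CharsetProbe {

    // Candidate list is an assumption; extend it with whatever the file's origin suggests.
    static final String[] CANDIDATES = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16LE", "UTF-16BE", "IBM037"
    };

    /** Maps each supported candidate charset name to the text it produces for the given bytes. */
    static Map<String, String> decodeAll(byte[] data) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this runtime does not provide
            }
            // Malformed input is replaced rather than rejected, so garbage output is expected for wrong guesses.
            results.put(name, new String(data, Charset.forName(name)));
        }
        return results;
    }
}
```

            Printing each entry of the returned map for the first few hundred bytes of the header usually makes a correct guess obvious, for example when "<?xml" appears under exactly one of the charsets.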