4 Replies Latest reply: Mar 23, 2010 4:58 AM by 807580

    Detecting File Type

    807580
      I need to detect a file's type or format, such as whether it is a PDF, JPEG, DOC, or something else. Is it possible to get this meta information from a file? Please help me in this regard. Thanks, everybody.
        • 1. Re: Detecting File Type
          796440
          The only way to determine for sure that a given file is, for example, a valid PDF file, is to try to read the entire file into a PDF parser.

          On Windows, the convention is that the file type is indicated by the file extension, but there are no guarantees. I can rename a Word doc to .txt, and it's still a Word doc, and I can rename a text file to .doc and it's still just a text file.

          On Linux, file types are typically determined by reading the first N bytes of a file. If those bytes match a known pattern, the file is assumed to be of that type; if they're all printable, it's assumed to be a text file, or something like that.
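
          As a rough illustration of that idea, here is a minimal Java sketch. The class name FileTypeSniffer, the method guess, and the 512-byte read size are made up for this example, and the "text" rule is a crude printable-ASCII check, so treat it as a starting point rather than a reliable detector. The byte signatures for PDF, JPEG, and the OLE2 container (used by legacy Word .doc files, but also by old Excel and PowerPoint files) are the commonly documented ones.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class FileTypeSniffer {

    // Leading byte signatures ("magic numbers") for a few formats.
    private static final byte[] PDF  = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    // OLE2 compound file: the container of legacy .doc files, but also of
    // old .xls and .ppt files, so a match is only a hint.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Guess a type from the first bytes of a file's content. */
    public static String guess(byte[] head) {
        if (startsWith(head, PDF))  return "PDF";
        if (startsWith(head, JPEG)) return "JPEG";
        if (startsWith(head, OLE2)) return "OLE2 (possibly DOC)";
        if (head.length > 0 && looksLikeText(head)) return "text";
        return "unknown";
    }

    /** Read up to 512 bytes (an arbitrary choice) and guess from those. */
    public static String guess(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[512];
            int n = in.readNBytes(buf, 0, buf.length); // requires Java 9+
            return guess(Arrays.copyOf(buf, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Assumption: "text" means printable ASCII plus tab, LF and CR.
    // UTF-8 or other encodings with bytes >= 0x80 are not recognised.
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            boolean printable = b >= 0x20 && b < 0x7F;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            if (!printable && !whitespace) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + guess(Paths.get(name)));
        }
    }
}
```

          Running it as java FileTypeSniffer somefile prints one guess per argument. A matching signature only says the file starts like that type; it does not prove the rest of the file is valid, and an image format stored as C source would simply come out as "text" here.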

          You could google for an existing third party file type detection library, or, if you want to write your own, you can define your own rules, since there really is no standard, hard and fast way to do it.
          • 2. Re: Detecting File Type
            807580
            Thanks, jverd. Ya, I need to know from inside, not from the extension. OK, I am googling. Thanks.
            • 3. Re: Detecting File Type
              807580
              Shazzad wrote:
              Ya, I need to know from inside, not from the extension.
              Yup - as an example; wearing a dress doesn't make me a girl, no?
              • 4. Re: Detecting File Type
                807580
                There are tricky file types.

                There is one (XPM, for example) which is a piece of C code, so it is as textual as can be, but it still represents data: an image.