11 Replies Latest reply on Dec 31, 2008 9:57 AM by EJP

    Reading text file in ASCII or UTF-8 or UTF-16 or UTF-32?

    807589
      The following code will include the UTF-8 byte-order-mark (EF BB BF) in the first line from the source file:

      BufferedReader reader = new BufferedReader(new FileReader(sourceFile));
      String firstLine = reader.readLine();

      This isn't desirable. I don't want to get the UTF-8 BOM in the text contents that I get from the IO API.

      Now, I can do something like this:
      InputStreamReader reader = new InputStreamReader(new FileInputStream(sourceFile), "UTF8");

      However, that assumes the program knows the encoding of the input file at design time. Unfortunately, my app takes files from users, who may supply them in UTF-8, UTF-16, ASCII, or some other text encoding. Doesn't Java have a simple file-reading API that auto-detects the text encoding, strips any BOM, and returns a plain Java String containing just the actual file contents?
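
      For illustration, here is a minimal sketch of the kind of helper being asked for. It is not a JDK API: the class name BomSniffer and its methods open and decode are hypothetical. It peeks at up to four leading bytes, picks a charset from a recognized BOM, pushes back everything that is not part of the BOM, and otherwise uses a caller-supplied fallback. It assumes the JVM provides the UTF-32BE and UTF-32LE charsets (common JDKs do, but they are not among the guaranteed standard charsets), and it cannot detect the encoding of a file that has no BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
final class BomSniffer {

    // Returns a Reader over 'in' whose charset is chosen from a leading BOM.
    // If no BOM is recognized, 'fallback' is used and no bytes are skipped.
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] b = new byte[4];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 4 or EOF.
        while (n < 4) {
            int r = pin.read(b, n, 4 - n);
            if (r < 0) break;
            n += r;
        }

        Charset cs = fallback;
        int skip = 0; // number of BOM bytes to drop
        // UTF-32 is tested first: FF FE 00 00 would otherwise match UTF-16LE.
        if (n >= 4 && b[0] == 0 && b[1] == 0
                && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            cs = Charset.forName("UTF-32BE"); // assumption: charset is available
            skip = 4;
        } else if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE
                && b[2] == 0 && b[3] == 0) {
            cs = Charset.forName("UTF-32LE"); // assumption: charset is available
            skip = 4;
        } else if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB
                && b[2] == (byte) 0xBF) {
            cs = StandardCharsets.UTF_8;
            skip = 3;
        } else if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            skip = 2;
        }

        // Push back every byte that was read but is not part of the BOM.
        if (n > skip) {
            pin.unread(b, skip, n - skip);
        }
        return new InputStreamReader(pin, cs);
    }

    // Convenience wrapper: decodes a whole byte array into a String.
    static String decode(byte[] data, Charset fallback) {
        try (Reader r = open(new ByteArrayInputStream(data), fallback)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int len;
            while ((len = r.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With a helper like this, the first snippet could become new BufferedReader(BomSniffer.open(new FileInputStream(sourceFile), Charset.defaultCharset())). The fallback still matters: a file with no BOM gives no reliable signal, so telling ASCII, UTF-8 without a BOM, and legacy 8-bit encodings apart would need a heuristic or a hint from the user.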