vkat wrote: If your file has characters outside ISO-8859-1's range (0-255), then it isn't ISO-8859-1 encoded. You need to know what encoding was used to store the file. It sounds like it may actually be Unicode text, in which case you need to know which encoding (UTF-8, UTF-16, etc.) was used.
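A small sketch of the reply's point, assuming a modern JDK (`StandardCharsets` needs Java 7+). The class and method names are hypothetical. It uses `CharsetEncoder.canEncode` to show that code point 255 fits in ISO-8859-1 while code point 256 does not. It also shows that the same character produces a different byte sequence depending on which Unicode encoding is chosen.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: not from the original thread.
public class Latin1Range {
    // Returns true if every character in s can be encoded in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        return enc.canEncode(s);
    }

    public static void main(String[] args) {
        String yDiaeresis = "\u00FF"; // y with diaeresis, code point 255: last char in ISO-8859-1
        String aMacron = "\u0100";    // A with macron, code point 256: first char of Latin Extended-A
        System.out.println(fitsLatin1(yDiaeresis)); // true
        System.out.println(fitsLatin1(aMacron));    // false
        // Same character, different byte counts per Unicode encoding.
        System.out.println(aMacron.getBytes(StandardCharsets.UTF_8).length);  // 2
        // Java's "UTF-16" encoder writes a big-endian BOM (FE FF) first: 2 + 2 bytes.
        System.out.println(aMacron.getBytes(StandardCharsets.UTF_16).length); // 4
    }
}
```

In other words, "Unicode" alone does not say how the bytes are laid out. The reader has to be told which encoding wrote the file.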
Since I am reading the file as ISO-8859-1, this works up to Unicode code point 255. For the remaining characters, it seems I need the Latin Extended-A and Latin Extended-B character sets. How can I get those installed on my Windows machine? I am using JDK 1.4.1 on Windows XP. Any help is appreciated.
I figured it out. I saved the input file as a Unicode-encoded file (using WordPad) and read it in with UTF-16. Now I get the correct Unicode values and can parse them properly!
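A minimal sketch of the fix described above: wrap the byte stream in an `InputStreamReader` with an explicit UTF-16 charset instead of ISO-8859-1. This sketch assumes the file begins with a byte-order mark (BOM), as Windows "Unicode" text files typically do. Java's `UTF-16` decoder uses the BOM to choose the byte order and assumes big-endian when no BOM is present. The class name and sample bytes are hypothetical. The code also uses Java 5/7 features (generics, try-with-resources, `StandardCharsets`). On JDK 1.4.1 the equivalent would be `new InputStreamReader(in, "UTF-16")` with the reader closed in a `finally` block.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: not from the original thread.
public class ReadUtf16 {
    // Decodes the stream as UTF-16 and returns its lines. A leading BOM
    // selects the byte order and is not included in the returned text.
    static List<String> readUtf16Lines(InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_16))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream source;
        if (args.length > 0) {
            source = new FileInputStream(args[0]); // path to a UTF-16 text file
        } else {
            // Sample bytes: UTF-16LE BOM, then U+0100 (Latin Extended-A),
            // U+0180 (Latin Extended-B) and a newline, each stored low byte first.
            source = new ByteArrayInputStream(new byte[] {
                (byte) 0xFF, (byte) 0xFE,
                0x00, 0x01,
                (byte) 0x80, 0x01,
                0x0A, 0x00
            });
        }
        for (String line : readUtf16Lines(source)) {
            for (int i = 0; i < line.length(); i++) {
                // Print each char's code value, e.g. "U+0100".
                System.out.printf("U+%04X%n", (int) line.charAt(i));
            }
        }
    }
}
```

Run without arguments, the sample prints `U+0100` and `U+0180`. No extra "character set" needs to be installed. The decoder only has to match the encoding the file was saved in.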