First of all, Java strings are encoded in a modified version of UTF-8 that uses 2 bytes per character. That is, the char datatype is equivalent to an unsigned short.
Second, what exactly do you mean by double byte? Whether a character ends up encoded in two bytes or not depends on the encoding used (UTF-8 and UTF-16 (both Unicode), BIG5 and GB2312 (both Chinese), ISO-8859-1 (Latin-1), ASCII, etc.). This means that there are no "double byte characters"; there are only "double byte characters when encoded in <your encoding>".
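To make that concrete, here is a minimal sketch that encodes the same one-character string with several charsets and prints how many bytes each produces. The class name EncodingSizes and the helper byteLength are made up for illustration, and the sample character (HIRAGANA LETTER A) is my own choice, not something from the original question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that "bytes per character" is a property
// of the encoding, not of the character itself.
class EncodingSizes {

    // Number of bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u3042"; // HIRAGANA LETTER A, chosen as a sample

        System.out.println("UTF-8:    " + byteLength(text, StandardCharsets.UTF_8));    // 3
        System.out.println("UTF-16BE: " + byteLength(text, StandardCharsets.UTF_16BE)); // 2

        // Shift_JIS is not one of the charsets every JVM must provide,
        // so check for it before using it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS: " + byteLength(text, Charset.forName("Shift_JIS")));
        }
    }
}
```

The same character is three bytes in UTF-8 and two in UTF-16BE, so calling it a "double byte character" only makes sense once the encoding is named.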
Where does your string come from? What encoding are you using to read the string in the first place? Are you sure you are creating the string using the right encoding?
You may also want to google for the file.encoding system property. It seems that the command line arguments are not passed correctly to your program; setting the file.encoding system property to the encoding your characters are in might help. Examples of non-Unicode encodings that contain Japanese characters would be Shift JIS and ISO-2022.
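A rough sketch of why the decoding charset matters: it prints the file.encoding property and the JVM default charset, then decodes the same bytes once with the charset they were written in and once with a wrong one. The class name DecodeDemo, the helper decode, and the sample text are assumptions for illustration only; your real input will come from somewhere else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes give different strings depending
// on which charset is used to decode them.
class DecodeDemo {

    // Decode raw bytes with an explicitly named charset instead of
    // relying on the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // What the JVM currently uses when no charset is given.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Three Japanese characters, encoded as UTF-8 (9 bytes).
        String original = "\u65E5\u672C\u8A9E";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);      // same as original
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // one char per byte

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 9
    }
}
```

The property can be set at launch with something like java -Dfile.encoding=Shift_JIS YourMainClass (YourMainClass is a placeholder), but how much it affects varies between JDK versions, so naming the charset explicitly wherever bytes become a String is usually the safer route.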
npiguet wrote: "First of all Java strings are encoded in a modified version of UTF-8 that uses 2 bytes per character."
Actually, they're encoded in UTF-16, but that's irrelevant. As you said, this looks like a problem with the encodings of the input and output.
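For anyone who wants to check the UTF-16 point, here is a small sketch comparing char count with code point count. The class name Utf16Units and both helpers are invented for this example; the sample code point U+1F600 is just one that lies outside the Basic Multilingual Plane.

```java
// Hypothetical demo class: a Java char is one UTF-16 code unit, so a
// character outside the Basic Multilingual Plane takes two chars.
class Utf16Units {

    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u3042";                                       // inside the BMP
        String supplementary = new String(Character.toChars(0x1F600)); // outside the BMP

        System.out.println(codeUnits(bmp) + " / " + codePoints(bmp));                     // 1 / 1
        System.out.println(codeUnits(supplementary) + " / " + codePoints(supplementary)); // 2 / 1
    }
}
```

So "2 bytes per character" holds only for characters in the BMP; a supplementary character is stored as a surrogate pair of two chars.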