#1. I'm not sure Java will read UTF-8 Strings correctly unless you explicitly tell it to
byte[] stringData = ...;
String str1 = new String(stringData); // decodes with the platform default charset, so it only works if that happens to be UTF-8
String str2 = new String(stringData, "UTF-8"); // decodes explicitly as UTF-8
#2. How can Java "know" that extra bytes have been added to your string? When decoding a UTF-8 string, Java will follow exactly the specification that defines UTF-8.
So to your questions:
#3. Read in the string as byte data. This is the safest way to go. Then send those bytes back to the sender and see if they are the same as what was sent.
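A minimal sketch of the "read it as bytes" suggestion in #3, assuming the data arrives on some InputStream. The class name RawBytes and the helper readFully are hypothetical names chosen for illustration. Nothing here decodes the data, so no charset can alter it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class RawBytes {
    // Hypothetical helper: copies the stream into a byte array without
    // interpreting the contents as text in any charset.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The sender can then compare what comes back with what it sent using java.util.Arrays.equals(sent, echoed). A mismatch there points at the transport rather than at Java's decoding.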
Edited by: tjacobs01 on Dec 10, 2009 8:28 PM
Good question. I didn't know the answer so I had a look at the API documentation for the String(byte[], String charsetName) constructor. And it turns out to say:
"The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required."
So there you go. The answer to (1) is "It depends". The answer to (2) is "No". The answer to (3) is "Use the java.nio.charset.CharsetDecoder class".
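For what it's worth, here is a rough sketch of what using CharsetDecoder for (3) could look like. The class name StrictUtf8 and the method decodeStrict are made up for this example, and it assumes Java 7 or later for StandardCharsets (on older versions, Charset.forName("UTF-8") would stand in). With CodingErrorAction.REPORT the decoder throws on invalid input instead of quietly substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Hypothetical helper: decodes the bytes as UTF-8 and throws a
    // CharacterCodingException if they are not valid UTF-8.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = {(byte) 0x68, (byte) 0xC3, (byte) 0x28};
        try {
            System.out.println(decodeStrict(bad));
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

A fresh decoder is created on each call because CharsetDecoder instances keep state and are not safe to share between threads.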