RyanAllaby wrote: I suspect you are right. So don't do anything which uses the default encoding. That includes the creation of Readers and Writers (use InputStreamReader and OutputStreamWriter as intermediaries) and the use of the getBytes() method for a start.
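To illustrate the advice about Readers and Writers, here is a minimal sketch that names the charset on both sides instead of leaving it to the platform. The class and method names (ExplicitCharsetIo, writeUtf8, readUtf8) are made up for this example, and it assumes Java 7 or later for StandardCharsets; on older versions Charset.forName( "UTF-8" ) would do the same job.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original poster's code.
final class ExplicitCharsetIo {

    // Writes text as UTF-8, whatever the platform default happens to be.
    static void writeUtf8(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }

    // Reads the whole stream as UTF-8 text.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the sample independent of the source file encoding.
        String original = "Gr\u00fc\u00dfe aus K\u00f6ln";
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeUtf8(sink, original);
        String roundTripped = readUtf8(new ByteArrayInputStream(sink.toByteArray()));
        System.out.println(original.equals(roundTripped)); // prints true
    }
}
```

The same pattern applies to files and sockets: wrap the raw stream and pass the charset explicitly, so the umlauts survive no matter how the customer's machine is configured.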
I suspect it is the default encoding on the computer the software is running on. If this is true, then how do I force the application to honor German umlauts?
// input is a String
Charset utf8charset = Charset.forName( "UTF-8" );
Charset iso88591charset = Charset.forName( "ISO-8859-1" );
ByteBuffer inputBuffer = ByteBuffer.wrap( input.getBytes() );
// decode UTF-8
CharBuffer data = utf8charset.decode( inputBuffer );
// encode ISO-8859-1
ByteBuffer outputBuffer = iso88591charset.encode( data );
byte[] outputData = outputBuffer.array();
return new String( outputData );
Here you convert the string to a byte array using the default encoding. You say you've set the default to UTF-8, but how do you know it worked on the customer's machine? When we advise you not to rely on the default encoding, we don't mean you should override that system property, we mean you should always specify the encoding in your code. There's a getBytes() method that lets you do that.
ByteBuffer inputBuffer = ByteBuffer.wrap( input.getBytes() );
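For reference, a small sketch of the getBytes() overload that takes the encoding. The class name ExplicitGetBytes and its two helpers are invented for the example; getBytes( Charset ) itself exists since Java 6, and the older getBytes( String charsetName ) variant does the same thing but throws a checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only String.getBytes(Charset) is real API.
final class ExplicitGetBytes {

    // Always UTF-8, regardless of the machine's default encoding.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Always ISO-8859-1, regardless of the machine's default encoding.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // u with diaeresis
        System.out.println(toUtf8(umlaut).length);   // prints 2 (0xC3 0xBC)
        System.out.println(toLatin1(umlaut).length); // prints 1 (0xFC)
    }
}
```

The result no longer depends on any system property, which is the whole point of the advice.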
Now you decode the byte[] that you think is UTF-8, as UTF-8. If getBytes() did in fact encode the string as UTF-8, this is a wash; you just wasted a lot of time and ended up with the exact same string you started with. On the other hand, if getBytes() used something other than UTF-8, you've just created a load of garbage.
CharBuffer data = utf8charset.decode( inputBuffer );
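Both outcomes are easy to reproduce. In this sketch the encoding that getBytes() happened to use is passed in as a parameter, so the "default" can be simulated without touching system properties; RoundTripDemo and encodeThenDecodeAsUtf8 are names made up for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; encodeWith stands in for the platform default.
final class RoundTripDemo {

    // Encodes with the given charset, then decodes the bytes as UTF-8.
    static String encodeThenDecodeAsUtf8(String s, Charset encodeWith) {
        ByteBuffer bytes = ByteBuffer.wrap(s.getBytes(encodeWith));
        return StandardCharsets.UTF_8.decode(bytes).toString();
    }

    public static void main(String[] args) {
        String original = "f\u00fcr";

        // Default really was UTF-8: a no-op, same string comes back.
        String same = encodeThenDecodeAsUtf8(original, StandardCharsets.UTF_8);
        System.out.println(original.equals(same)); // prints true

        // Default was something else: the umlaut byte is not valid UTF-8
        // and the decoder substitutes the replacement character U+FFFD.
        String garbage = encodeThenDecodeAsUtf8(original, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(garbage));     // prints false
        System.out.println(garbage.contains("\uFFFD"));   // prints true
    }
}
```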
Next you create yet another byte array, this time using the ISO-8859-1 encoding. If the string was valid to begin with, and the previous steps didn't corrupt it, there could be characters in it that can't be encoded in ISO-8859-1. Those characters will be lost: Charset.encode() silently replaces each one with a question mark.
ByteBuffer outputBuffer = iso88591charset.encode( data );
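If you want to know in advance whether a string survives the trip to ISO-8859-1, a CharsetEncoder can tell you. A rough sketch, with invented names (Latin1Check, fitsLatin1, lossyRoundTrip):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing what happens to unmappable characters.
final class Latin1Check {

    // True if every character of s has an ISO-8859-1 representation.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Encodes to ISO-8859-1 and back; unmappable characters come back as '?'.
    static String lossyRoundTrip(String s) {
        ByteBuffer encoded = StandardCharsets.ISO_8859_1.encode(s);
        return StandardCharsets.ISO_8859_1.decode(encoded).toString();
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("M\u00fcller"));      // prints true
        System.out.println(fitsLatin1("5 \u20ac"));         // prints false (euro sign)
        System.out.println(lossyRoundTrip("5 \u20ac"));     // prints 5 ?
    }
}
```

Umlauts are fine in ISO-8859-1; the euro sign, typographic quotes and anything outside Latin-1 are not.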
Finally, you decode the byte[] once more, this time using the default encoding. As with getBytes(), there's a String constructor that lets you specify the encoding, but it doesn't really matter. For the previous steps to have worked, the default had to be UTF-8. That means you have a byte[] that's encoded as ISO-8859-1 and you're decoding it as UTF-8. What's wrong with this picture?
byte[] outputData = outputBuffer.array();
return new String( outputData );
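To see the whole thing end to end, here is the posted method again with the default encoding turned into an explicit parameter, so the UTF-8 case can be simulated on any machine. This is a reconstruction for illustration only: FinalDecodeStep, convert and assumedDefault are invented names, and the byte[] is copied out via remaining() because array() returns the whole backing array, which is not guaranteed to match the buffer's limit.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction; assumedDefault stands in for the platform default.
final class FinalDecodeStep {

    static String convert(String input, Charset assumedDefault) {
        // Step 1: getBytes() with the "default" encoding.
        ByteBuffer inputBuffer = ByteBuffer.wrap(input.getBytes(assumedDefault));
        // Step 2: decode as UTF-8.
        CharBuffer data = StandardCharsets.UTF_8.decode(inputBuffer);
        // Step 3: encode as ISO-8859-1.
        ByteBuffer outputBuffer = StandardCharsets.ISO_8859_1.encode(data);
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);
        // Step 4: new String( outputData ) decodes with the "default" again.
        return new String(outputData, assumedDefault);
    }

    public static void main(String[] args) {
        String result = convert("f\u00fcr", StandardCharsets.UTF_8);
        // The ISO-8859-1 byte for the umlaut is not valid UTF-8.
        System.out.println(result.contains("\uFFFD"));                 // prints true
        // Plain ASCII passes through untouched, which hides the bug in testing.
        System.out.println(convert("fur", StandardCharsets.UTF_8));    // prints fur
    }
}
```

With a UTF-8 default the umlaut is destroyed in the last step, while ASCII-only input looks fine. A String has no encoding of its own, so there is nothing to "convert"; the encoding only matters at the point where bytes are read or written.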
RyanAllaby wrote: Does this web service involve XML? If so then you shouldn't have to worry about "default" encoding -- whatever that might mean. The XML should declare its encoding, or if it doesn't, it should use UTF-8 or UTF-16. At any rate your XML parser should take care of determining the encoding.
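A quick sketch of what "let the parser determine the encoding" can look like with the JDK's built-in DOM parser. The important part is handing the parser the raw bytes (an InputStream), not a Reader or a String you decoded yourself. XmlEncodingDemo, rootText and the sample document are made up for this example; the real service's payload will look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; the XML content is invented for illustration.
final class XmlEncodingDemo {

    // Parses raw bytes; the parser reads the encoding from the XML declaration.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Document that declares ISO-8859-1 and really is encoded that way.
        String latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>M\u00fcller</name>";
        byte[] latin1Bytes = latin1Xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1Bytes).equals("M\u00fcller")); // prints true

        // Document without a declaration, encoded as UTF-8.
        byte[] utf8Bytes = "<name>M\u00fcller</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootText(utf8Bytes).equals("M\u00fcller"));   // prints true
    }
}
```

In both cases the umlaut arrives intact without the calling code ever naming an encoding when reading.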
I guess the real issue is determining the default encoding used by a web service I am consuming.