10 Replies. Latest reply: Oct 2, 2012 4:39 PM by EJP

ByteMessage readUTF get truncated message

qjvictor Newbie
When I call BytesMessage.readUTF() to convert a BytesMessage to String text, it works perfectly in most cases.
But we found that if the message is bigger (for example, 500K), BytesMessage.readUTF() truncates it and the result is malformed.

For example, in my test case I expect a BytesMessage with XML content, but it gives me malformed XML:
the first line, <?xml version="1.0" encoding="utf-8"?>,
becomes xml version="1.0" encoding="utf-8"?>, without the leading '<?',
and the XML is truncated; the output seems to be limited to about 16K.

This problem happens with both MQ and WebLogic JMS, so I believe it is not specific to a third-party library or provider; it seems to come from the JDK.

BTW, I am using JDK 1.6+.

Any idea why this happens? Any solution?

Edited by: qjvictor on Oct 1, 2012 1:12 PM

  • 1. Re: ByteMessage readUTF get truncated message
    jtahlborn Expert
    If you read the javadoc for that method, you will see that it is not a general-purpose "read bytes as a String" method; it is for reading a Java-specific string format (something written using writeUTF).

    If you want to read a bytes message as a String, you will need to read all the bytes into a byte[] and convert that to a String using the appropriate charset. However, since your messages appear to contain XML, you should not be converting them into Strings at all; read them into a byte[] and parse them as XML using something like a ByteArrayInputStream.
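A minimal sketch of the byte-oriented approach, using only the JDK. In real JMS 1.1 code the byte[] would typically be filled by sizing a buffer with BytesMessage.getBodyLength() and calling readBytes(byte[]); here the body is a hard-coded stand-in so the example runs without a JMS provider. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseXmlFromBytes {

    // Parses an XML document straight from its raw bytes. The parser reads the
    // XML declaration itself, so no intermediate String (and no guessed charset) is needed.
    static Document parse(byte[] body) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(body));
    }

    public static void main(String[] args) throws Exception {
        // In JMS code the buffer would typically be filled like this (JMS 1.1 API):
        //   byte[] body = new byte[(int) message.getBodyLength()];
        //   message.readBytes(body);
        // Here a hard-coded stand-in replaces the received message body.
        byte[] body = "<?xml version=\"1.0\" encoding=\"utf-8\"?><order id=\"42\"/>"
                .getBytes(StandardCharsets.UTF_8);

        Document doc = parse(body);
        System.out.println(doc.getDocumentElement().getAttribute("id")); // prints 42
    }
}
```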
  • 2. Re: ByteMessage readUTF get truncated message
    qjvictor Newbie
    Thanks for your reply.

    I fully understand that we could read all the bytes into a byte[] and convert them to a String using the appropriate charset; I am just wondering whether that might not be as efficient as readUTF, and in 99% of my cases readUTF works perfectly since the message is small.
    BTW, the XML is just an example; the content could be XML, plain text, or a JSON string.

    Is there any documentation of this limitation, i.e. that readUTF can't read the full text from a BytesMessage? Or what exactly is the size it can or can't handle?
  • 3. Re: ByteMessage readUTF get truncated message
    EJP Guru
    I think jtahlborn may be confusing BytesMessage.readUTF() with DataInputStream.readUTF(), which reads a very specific format that is only written by DataOutputStream.writeUTF().

    However, unless the BytesMessage was written with BytesMessage.writeUTF(), it is unlikely to be read correctly by BytesMessage.readUTF(), and I agree with him that you should be using the byte-oriented methods to transport XML, which has its own encoding scheme.
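To illustrate the format in question, here is a sketch using the JDK's java.io classes rather than a JMS provider; whether a given provider's BytesMessage.writeUTF()/readUTF() behaves identically is an assumption. It shows the 2-byte big-endian length prefix that writeUTF puts in front of the text, and that this prefix caps a single writeUTF string at 65535 encoded bytes. The helper names are hypothetical.

```java
import java.io.*;

public class WriteUtfRoundTrip {

    // Encodes a string the way DataOutputStream.writeUTF does: a 2-byte
    // big-endian length followed by that many bytes of modified UTF-8.
    static byte[] writeUtf(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Reads back one writeUTF-encoded string.
    static String readUtf(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = writeUtf("hello");
        // The first two bytes are the length prefix: 0x00 0x05.
        int prefix = ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
        System.out.println(prefix + " " + encoded.length + " " + readUtf(encoded)); // 5 7 hello

        // The prefix is an unsigned 16-bit value, so a single writeUTF call
        // cannot hold more than 65535 encoded bytes.
        String tooBig = new String(new char[70_000]).replace('\0', 'x');
        try {
            writeUtf(tooBig);
        } catch (UTFDataFormatException e) {
            System.out.println("too long for writeUTF");
        }
    }
}
```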
  • 4. Re: ByteMessage readUTF get truncated message
    jtahlborn Expert
    EJP wrote:
    I think jtahlborn may be confusing BytesMessage.readUTF() with DataInputStream.readUTF(), which reads a very specific format that is only written by DataOutputStream.writeUTF().
    I admit this is a little nebulous, but if you look at the javadoc for readUTF, it says "modified UTF-8" (as in DataInputStream). I randomly grabbed the source for JBoss Messaging, and their implementation of this method uses DataInputStream.readUTF. Lastly, the first 2 chars of an XML doc (which are missing in the OP's example) are "<?", which is 15423 when read as a big-endian short (~16K).
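The arithmetic above can be checked with a short reproduction, assuming the provider's readUTF behaves like DataInputStream.readUTF: feed raw XML bytes (never written by writeUTF) to readUTF. The first two bytes, '<' (0x3C) and '?' (0x3F), are taken as a length of 0x3C3F = 15423, so readUTF returns the next 15423 bytes, starting at "xml version". The bigXml helper is a hypothetical test fixture.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ReadUtfOnRawXml {

    // Calls DataInputStream.readUTF on bytes that were NOT produced by writeUTF.
    static String misreadAsUtf(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readUTF();
        }
    }

    // Hypothetical fixture: builds an ASCII-only XML document of roughly the requested size.
    static byte[] bigXml(int approxBytes) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"utf-8\"?><root>");
        while (sb.length() < approxBytes) {
            sb.append("<item>data</item>");
        }
        sb.append("</root>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = bigXml(500_000);
        // '<' is 0x3C and '?' is 0x3F, so the "length prefix" readUTF sees is 0x3C3F.
        System.out.println(((raw[0] & 0xFF) << 8) | (raw[1] & 0xFF)); // 15423

        String s = misreadAsUtf(raw);
        System.out.println(s.length());         // 15423
        System.out.println(s.substring(0, 11)); // xml version
    }
}
```

Under that assumption, the amount readUTF returns is dictated by whatever the first two bytes of the payload happen to be, not by the actual size of the message.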
  • 5. Re: ByteMessage readUTF get truncated message
    qjvictor Newbie
    Could anybody tell me the size limit for readUTF? In my case the majority of XMLs are fine; only the big ones have the problem.
    I am considering a solution like 'below that size, use readUTF; otherwise, use readBytes'.

    Also, if I use readBytes, is there any performance loss?

    Thanks.
  • 6. Re: ByteMessage readUTF get truncated message
    jtahlborn Expert
    qjvictor wrote:
    Could anybody tell me the size limit for readUTF? In my case the majority of XMLs are fine; only the big ones have the problem.
    I am considering a solution like 'below that size, use readUTF; otherwise, use readBytes'.

    Also, if I use readBytes, is there any performance loss?
    First of all, I already indicated that you shouldn't be converting your XML data to a String! Also, I'm not sure why you think readUTF would be faster than readBytes, considering the underlying data is bytes. Just read the bytes, parse them as XML, and skip the whole String step; parsing the XML from bytes will be faster than converting the bytes to a String and then parsing that.
  • 7. Re: ByteMessage readUTF get truncated message
    qjvictor Newbie
    As I said, the XML is just an example; in my case the content could be XML, plain text, JSON, or other formats. I need the content of the BytesMessage to be stored in a database as VARCHAR or CLOB.

    And if there is no performance difference between readUTF and readBytes, I will definitely switch to readBytes instead of readUTF.
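Since the goal here is to store the text in a VARCHAR or CLOB column, the bytes do have to be decoded at some point. A minimal sketch, assuming the producer encoded the text as UTF-8 (the consumer cannot infer this; the charset has to be agreed with the sender): decode with a CharsetDecoder set to REPORT, so that a wrong charset fails loudly instead of silently storing U+FFFD replacement characters. For XML, parsing the bytes directly, as suggested earlier in the thread, is still preferable. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.*;

public class BodyToText {

    // Decodes a message body to text for storage in a VARCHAR/CLOB column.
    // The charset must be the one the producer used; UTF-8 is an assumption here.
    // REPORT makes malformed input throw instead of being replaced with U+FFFD.
    static String decode(byte[] body, Charset charset) throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(body)).toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a JSON payload read from the message with readBytes.
        byte[] json = "{\"name\":\"Zo\u00eb\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(json, StandardCharsets.UTF_8));

        byte[] notUtf8 = {(byte) 0xC3, (byte) 0x28}; // invalid UTF-8 sequence
        try {
            decode(notUtf8, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8");
        }
    }
}
```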
  • 8. Re: ByteMessage readUTF get truncated message
    DrClap Expert
    Performance?

    The readBytes method simply returns the bytes from the message. The readUTF method converts the bytes from the message into a String and returns that. So "obviously" the readBytes method is "better".

    On the other hand if you are going to subsequently convert those bytes to a String then you just threw away all of those performance gains.

    On the other other hand, as already pointed out, it's risky to convert the bytes containing an XML document to a String, because you should use the encoding declared by the document instead of hard-coding an encoding.

    And on the yet another hand, if your data is always text, then why did you choose to convert it to bytes before sending it? That just gets you into potential problems with encodings.
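The point about the declared encoding can be seen with a small hypothetical document declared as ISO-8859-1: parsing the raw bytes lets the XML parser honor the declaration, while hard-coding UTF-8 corrupts the non-ASCII character (the JDK's String constructor substitutes U+FFFD for the malformed byte). The document and helper names are illustrative assumptions, not from the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class DeclaredEncoding {

    // Lets the XML parser pick the charset from the document's own declaration.
    static String textViaParser(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getDocumentElement().getTextContent();
    }

    // Hard-codes UTF-8, ignoring whatever the declaration says.
    static String textViaHardCodedUtf8(byte[] xml) {
        String s = new String(xml, StandardCharsets.UTF_8);
        return s.substring(s.indexOf("<a>") + 3, s.indexOf("</a>"));
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>";
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1); // e-acute is the single byte 0xE9

        System.out.println(textViaParser(latin1).equals("caf\u00e9"));        // true
        System.out.println(textViaHardCodedUtf8(latin1).equals("caf\u00e9")); // false: 0xE9 is malformed UTF-8 here
    }
}
```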
  • 9. Re: ByteMessage readUTF get truncated message
    qjvictor Newbie
    Cool, I will use readBytes instead of readUTF.

    Thanks
  • 10. Re: ByteMessage readUTF get truncated message
    EJP Guru
    And if there is no performance difference between readUTF and readBytes, I will definitely switch to readBytes instead of readUTF.
    It is meaningless to compare the performance of a working solution with the performance of a non-working solution.

    You can get the wrong answer in zero time.

    As jtahlborn has shown, the readUTF() format is the same as DataInputStream's, so you've been doing it wrong all this time. You don't need any kind of performance justification to fix it. You must fix it.
