7 Replies Latest reply: Nov 20, 2012 7:14 PM by jtahlborn

MimeUtility.javaCharset usage

973216 Newbie
Hi,
I'm reading the charset from the "Content-Type:" header of an e-mail, e.g.
Content-Type: text/plain; charset="US-ASCII"
Once I've isolated the "US-ASCII", I use the MimeUtility.javaCharset method to get the Java charset name corresponding to the MIME charset:
String javaPartCharsetName = MimeUtility.javaCharset(partCharsetName);
and finally I get the Charset:
Charset partCharset = Charset.forName(javaPartCharsetName);
It seems to work for most charsets, but when I do it with "US-ASCII", MimeUtility.javaCharset returns "ISO-8859-1"?!

"US-ASCII" is not ISO-8859-1, so why does this method return it?

thanks,
Tex
  • 1. Re: MimeUtility.javaCharset usage
    sabre150 Expert
    Tex-Twil wrote:
    "US-ASCII" is not ISO-8859-1, so why does this method return it?
    Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
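The subset argument above can be checked with a small sketch. It is only an illustration using the standard java.nio.charset API; the class and method names (AsciiSubsetDemo, decodesIdentically) are hypothetical and not part of JavaMail.

```java
import java.nio.charset.StandardCharsets;

final class AsciiSubsetDemo {
    /** Decodes the same bytes with both charsets and reports whether the results match. */
    static boolean decodesIdentically(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return asAscii.equals(asLatin1);
    }

    public static void main(String[] args) {
        byte[] sevenBit = {0x48, 0x65, 0x6c, 0x6c, 0x6f}; // "Hello", all bytes <= 0x7f
        byte[] eightBit = {(byte) 0xFC};                   // u-umlaut in ISO-8859-1, invalid in US-ASCII
        System.out.println(decodesIdentically(sevenBit));
        System.out.println(decodesIdentically(eightBit));
    }
}
```

For bytes in the range 0x00 to 0x7f both decoders give the same string. A byte above 0x7f is malformed for US-ASCII (the String constructor substitutes a replacement character), which is why a genuinely US-ASCII part decodes safely either way.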
  • 2. Re: MimeUtility.javaCharset usage
    973216 Newbie
    sabre150 wrote:
    Tex-Twil wrote:
    "US-ASCII" is not ISO-8859-1, so why does this method return it?
    Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
    It works this way but not the other way round. A text with a "US-ASCII" charset cannot encode all of the "ISO-8859-1" characters.

    So:
    "US-ASCII" contains "ISO-8859-1" is FALSE,
    but
    MimeUtility.javaCharset("US-ASCII") contains "ISO-8859-1" is TRUE.
    The second statement does not make sense to me.
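The two statements can be reproduced with Charset.contains from the standard library. This is a minimal sketch; the class and helper names (ContainsDemo, contains) are made up for illustration.

```java
import java.nio.charset.Charset;

final class ContainsDemo {
    /** Returns whether the charset named 'outer' contains the charset named 'inner'. */
    static boolean contains(String outer, String inner) {
        return Charset.forName(outer).contains(Charset.forName(inner));
    }

    public static void main(String[] args) {
        // ISO-8859-1 can represent every US-ASCII character.
        System.out.println(contains("ISO-8859-1", "US-ASCII"));
        // US-ASCII cannot represent every ISO-8859-1 character.
        System.out.println(contains("US-ASCII", "ISO-8859-1"));
        // Every charset contains itself, which is what happens once
        // "US-ASCII" has been mapped to "ISO-8859-1" before the check.
        System.out.println(contains("ISO-8859-1", "ISO-8859-1"));
    }
}
```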
  • 3. Re: MimeUtility.javaCharset usage
    jtahlborn Expert
    Tex-Twil wrote:
    sabre150 wrote:
    Tex-Twil wrote:
    "US-ASCII" is not ISO-8859-1, so why does this method return it?
    Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
    It works this way but not the other way round. A text with a "US-ASCII" charset cannot encode all of the "ISO-8859-1" characters.

    so:
    "US-ASCII" contains "ISO-8859-1"  is FALSE. 
    but
    MimeUtility.javaCharset("US-ASCII") contains "ISO-8859-1"  will be TRUE
    The second statement does not make sense for me.
    I'm not sure exactly what you are asking. sabre150's basic point was that if you get a MIME part encoded using the "US-ASCII" charset, then you can successfully decode it using the "ISO-8859-1" charset, so everything should "work" just fine.
  • 4. Re: MimeUtility.javaCharset usage
    973216 Newbie
    jtahlborn wrote:

    I'm not sure what exactly you are asking? sabre150's basic point was that if you get a mime part encoded using the "US-ASCII" charset, then you can successfully decode it using the "ISO-8859-1" charset. so, everything should "work" just fine.
    The bottom line is the following. I have an e-mail body part and I need to determine dynamically whether I can insert the text "inline" into that body part. For this, I check whether the charset of the body part "contains" the charset of the text. If it does, I just add the text to the e-mail body part. If not, I add the text to the e-mail as another body part.

    So the pseudo-algorithm is:
    String partCharsetName = ... // parse the charset of the text body part (here "US-ASCII")
    
    // convert the MIME charset name to a Java charset name; here "US-ASCII" becomes "ISO-8859-1"
    String javaPartCharsetName = MimeUtility.javaCharset(partCharsetName);
    
    Charset textToAppendCharset = ... // the charset of the text I'm appending to the e-mail (here "ISO-8859-1")
    Charset partCharset = Charset.forName(javaPartCharsetName);
    
    if (partCharset.contains(textToAppendCharset)) {
        // OK, the text can be added "inline" to the e-mail
    } else {
        // create a new text attachment
    }
    In this situation, the IF condition is true, indicating that I can append "ISO-8859-1" text to a "US-ASCII" mail body part. But ISO-8859-1 characters such as ü cannot be encoded using US-ASCII, right?

    Where is my mistake? Should I skip the MimeUtility.javaCharset call and use the charset name directly from the MIME body part?
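One possible alternative to comparing charsets, sketched here as an assumption rather than anything proposed in the thread, is to ask the body part's charset whether it can encode the actual text to be appended. It uses the standard CharsetEncoder.canEncode; the names InlineCheck and canAppendInline are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class InlineCheck {
    /** Returns true if every character of 'text' can be encoded in 'partCharset'. */
    static boolean canAppendInline(Charset partCharset, CharSequence text) {
        // Some charsets are decode-only; treat those as "cannot append inline".
        if (!partCharset.canEncode()) {
            return false;
        }
        return partCharset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String plain = "Gruss";
        String umlaut = "Gr\u00fc\u00dfe"; // contains u-umlaut and sharp s
        System.out.println(canAppendInline(StandardCharsets.US_ASCII, plain));
        System.out.println(canAppendInline(StandardCharsets.US_ASCII, umlaut));
        System.out.println(canAppendInline(StandardCharsets.ISO_8859_1, umlaut));
    }
}
```

This checks the text itself rather than its declared charset, so pure 7-bit text labelled "ISO-8859-1" would still be accepted for a "US-ASCII" part. The charset passed in should be the one actually named in the header, not the remapped one.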
  • 5. Re: MimeUtility.javaCharset usage
    jtahlborn Expert
    Ah, now I understand your problem. In this situation, it is definitely a problem. Maybe you should start by trying to load the Charset directly first, and then fall back to MimeUtility.
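A minimal sketch of that workaround follows. To keep it self-contained, the fallback is passed in as a name-mapping function; with JavaMail on the classpath one would presumably pass MimeUtility::javaCharset. The class and method names (CharsetResolver, resolve) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.function.UnaryOperator;

final class CharsetResolver {
    /**
     * Tries the MIME charset name as-is first; only if the JVM does not
     * recognise it, maps the name through the fallback and tries again.
     */
    static Charset resolve(String mimeName, UnaryOperator<String> fallbackNameMapper) {
        try {
            return Charset.forName(mimeName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // May still throw if the mapped name is unknown as well.
            return Charset.forName(fallbackNameMapper.apply(mimeName));
        }
    }

    public static void main(String[] args) {
        // Stand-in for MimeUtility::javaCharset, reproducing only the mapping seen in this thread.
        UnaryOperator<String> mapper = name -> "ISO-8859-1";
        System.out.println(resolve("US-ASCII", mapper).name());
    }
}
```

Because "US-ASCII" is loaded directly, the mapper is never consulted for it and the resulting Charset stays US-ASCII.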
  • 6. Re: MimeUtility.javaCharset usage
    973216 Newbie
    jtahlborn wrote:
    ah, now i understand your problem. in this situation, it is definitely a problem. maybe you should start by trying to load the Charset directly first, and then fallback to MimeUtility.
    OK, that would be a workaround.

    I had just assumed that MimeUtility.javaCharset falls back to a default JVM character encoding when it cannot find the Java equivalent.

    By the way, what exactly is the difference between a MIME charset and a Java charset?
  • 7. Re: MimeUtility.javaCharset usage
    jtahlborn Expert
    Tex-Twil wrote:
    I just suppose that MimeUtility.javaCharset falls back to a default JVM character encoding when it cannot find the java equivalent.
    the "us-ascii" -> "ISO-8859-1" switch seems to be explicitly encoded into that class (no idea why).
    btw, what is exactly the difference between a mime charset and a java charset?
    From what I can understand of that method, it just translates MIME-specific charset names into Java-compatible charset names. So the difference isn't in the Charset, just in the name.
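The point that only the name differs can be illustrated with the JDK alone: several names resolve to one Charset with a single canonical name. This is a small sketch using Charset.forName and Charset.name; the class and helper names (CharsetNameDemo, canonicalName) are made up for illustration.

```java
import java.nio.charset.Charset;

final class CharsetNameDemo {
    /** Resolves any recognised name or alias to the charset's canonical Java name. */
    static String canonicalName(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        // Lookup is case-insensitive and accepts registered aliases.
        System.out.println(canonicalName("us-ascii"));
        System.out.println(canonicalName("ASCII"));
        System.out.println(canonicalName("ISO8859_1"));
        // The full alias set of a charset can be listed as well.
        System.out.println(Charset.forName("US-ASCII").aliases());
    }
}
```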
