7 Replies Latest reply: Nov 20, 2012 9:14 PM by jtahlborn

    MimeUtility.javaCharset usage

    973216
      Hi,
      I'm reading the charset from the "Content-Type:" header of an e-mail, e.g.
      Content-Type: text/plain; charset="US-ASCII"
      Once I've isolated the "US-ASCII", I use the MimeUtility.javaCharset method to get the Java charset name corresponding to the MIME charset:
      String javaPartCharsetName = MimeUtility.javaCharset(partCharsetName);
      And finally I get the Charset:
      Charset partCharset = Charset.forName(javaPartCharsetName);
      It seems to work for most charsets, but when I do it with "US-ASCII", MimeUtility.javaCharset returns "ISO-8859-1"?!

      "US-ASCII" is not ISO-8859-1, so why does this method return it?

      thanks,
      Tex
        • 1. Re: MimeUtility.javaCharset usage
          sabre150
          Tex-Twil wrote:
          "US-ASCII" is not ISO-8859-1, so why does this method return it?
          Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
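The subset argument above can be checked with the standard library alone. The following is a minimal sketch, not part of the original thread; the class name AsciiSubsetDemo and the sample bytes are made up for illustration. It decodes the same bytes with both charsets and compares the results.

```java
import java.nio.charset.StandardCharsets;

class AsciiSubsetDemo {
    /** Decodes the same bytes with US-ASCII and ISO-8859-1 and reports whether the strings match. */
    static boolean decodesIdentically(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return asAscii.equals(asLatin1);
    }

    public static void main(String[] args) {
        // Every byte is in the range 0x00-0x7f, so both decoders agree.
        byte[] pureAscii = "Hello, mail!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decodesIdentically(pureAscii)); // true

        // 0xFC is not valid US-ASCII (decoded as U+FFFD) but is 'ü' in ISO-8859-1.
        byte[] withHighByte = { 'a', (byte) 0xFC };
        System.out.println(decodesIdentically(withHighByte)); // false
    }
}
```

The second case shows where the two charsets diverge: a part that claims US-ASCII but contains high bytes is decoded leniently by ISO-8859-1.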
          • 2. Re: MimeUtility.javaCharset usage
            973216
            sabre150 wrote:
            Tex-Twil wrote:
            "US-ASCII" is not ISO-8859-1, so why does this method return it?
            Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
            It works this way but not the other way round. A text with a "US-ASCII" charset cannot encode all of the "ISO-8859-1" characters.

            so:
            "US-ASCII" contains "ISO-8859-1"  is FALSE. 
            but
            MimeUtility.javaCharset("US-ASCII") contains "ISO-8859-1"  will be TRUE
            The second statement does not make sense to me.
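The asymmetry described above can be seen directly with java.nio.charset.Charset, without going through MimeUtility. This is a minimal sketch; the class and method names are invented for the example. Note that Charset.contains is documented as an approximation, so only a true result is a firm guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContainsDemo {
    /** True if 'target' is known to contain every character of 'source'. */
    static boolean holds(Charset target, Charset source) {
        return target.contains(source);
    }

    /** True if the single character c can be encoded in the given charset. */
    static boolean canEncodeChar(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        Charset ascii = StandardCharsets.US_ASCII;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        System.out.println(holds(ascii, latin1)); // false
        System.out.println(holds(latin1, ascii)); // true on OpenJDK

        // U+00FC is 'ü': outside US-ASCII, inside ISO-8859-1.
        System.out.println(canEncodeChar(ascii, '\u00FC'));  // false
        System.out.println(canEncodeChar(latin1, '\u00FC')); // true
    }
}
```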
            • 3. Re: MimeUtility.javaCharset usage
              jtahlborn
              Tex-Twil wrote:
              sabre150 wrote:
              Tex-Twil wrote:
              "US-ASCII" is not ISO-8859-1, so why does this method return it?
              Since ASCII is a subset of ISO-8859-1, all bytes in the range 0x00 to 0x7f will convert correctly and there should be none outside of this range to convert wrongly. I'm not saying I like this flaw but it does not hurt and I could live with it. If you feel strongly about it then raise a bug report.
              It works this way but not the other way round. A text with a "US-ASCII" charset cannot encode all of the "ISO-8859-1" characters.

              so:
              "US-ASCII" contains "ISO-8859-1"  is FALSE. 
              but
              MimeUtility.javaCharset("US-ASCII") contains "ISO-8859-1"  will be TRUE
              The second statement does not make sense to me.
              I'm not sure what exactly you are asking? sabre150's basic point was that if you get a mime part encoded using the "US-ASCII" charset, then you can successfully decode it using the "ISO-8859-1" charset. so, everything should "work" just fine.
              • 4. Re: MimeUtility.javaCharset usage
                973216
                jtahlborn wrote:

                I'm not sure what exactly you are asking? sabre150's basic point was that if you get a mime part encoded using the "US-ASCII" charset, then you can successfully decode it using the "ISO-8859-1" charset. so, everything should "work" just fine.
                The bottom line is the following. I have an e-mail body part and I need to determine dynamically if I can insert the text "inline" to the e-mail body part. For this, I check if the charset of the body part "contains" the charset of the text. If yes, I just add the text to the e-mail body part. If no, I add the text as another e-mail body part to the e-mail.

                So the pseudo algorithm is:
                String partCharsetName = ... // parse the charset of the text body part (here "US-ASCII")

                // convert the MIME charset name to a Java charset name. Here "US-ASCII" becomes "ISO-8859-1"
                String javaPartCharsetName = MimeUtility.javaCharset(partCharsetName);

                Charset textToAppendCharset = ... // the charset of the text I'm appending to the e-mail. Here "ISO-8859-1"
                Charset partCharset = Charset.forName(javaPartCharsetName);

                if (partCharset.contains(textToAppendCharset)) {
                    // ok, the text can be added "inline" to the e-mail
                } else {
                    // create a new text attachment
                }
                In this situation, the IF condition is true, indicating that I can append an "ISO-8859-1" text to a "US-ASCII" mail body part. But ISO-8859-1 characters such as ü cannot be encoded using US-ASCII, right?

                Where is my mistake? Should I skip the MimeUtility.javaCharset call and use the charset name directly from the MIME body part?
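An alternative that nobody in the thread proposed, offered here only as a hedged sketch: instead of comparing two charsets with contains, ask the body part's declared charset whether it can encode the actual text being appended. The class InlineCheck and the method canInline are hypothetical names, and the part charset is assumed to come from the declared MIME name (here "US-ASCII") rather than from MimeUtility.javaCharset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class InlineCheck {
    /**
     * True if every character of 'text' can be encoded in the charset
     * declared for the body part, i.e. the text could be added inline.
     */
    static boolean canInline(String text, Charset partCharset) {
        // Some charsets are decode-only; Charset.canEncode() guards against that.
        return partCharset.canEncode() && partCharset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset partCharset = StandardCharsets.US_ASCII; // as declared in the Content-Type header

        System.out.println(canInline("plain text", partCharset));        // true
        System.out.println(canInline("M\u00FCnchen", partCharset));      // false: 'ü' is not ASCII
        System.out.println(canInline("M\u00FCnchen", StandardCharsets.ISO_8859_1)); // true
    }
}
```

This check depends on the text itself, so an ISO-8859-1 string that happens to contain only ASCII characters would still be allowed inline.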
                • 5. Re: MimeUtility.javaCharset usage
                  jtahlborn
                  ah, now i understand your problem. in this situation, it is definitely a problem. maybe you should start by trying to load the Charset directly first, and then fall back to MimeUtility.
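The workaround suggested above could look roughly like the sketch below. CharsetResolver and resolve are hypothetical names. To keep the example runnable without JavaMail on the classpath, the fallback is passed in as a function; in real code it would presumably be MimeUtility::javaCharset. The stand-in mapper and the name "x-made-up-mime-name" exist only for the demo.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.function.UnaryOperator;

class CharsetResolver {
    /**
     * Tries the MIME charset name as-is first; only if Java does not
     * recognise it is the name translated by the fallback mapper.
     */
    static Charset resolve(String mimeName, UnaryOperator<String> mimeToJavaName) {
        try {
            return Charset.forName(mimeName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Charset.forName(mimeToJavaName.apply(mimeName));
        }
    }

    public static void main(String[] args) {
        // Stand-in for MimeUtility::javaCharset (an assumption for this demo).
        UnaryOperator<String> mapper =
                name -> "x-made-up-mime-name".equals(name) ? "UTF-8" : name;

        // Java knows "US-ASCII" directly, so the mapper is never consulted.
        System.out.println(resolve("US-ASCII", mapper).name());            // US-ASCII
        // Unknown to Java, so the mapper's translation is used.
        System.out.println(resolve("x-made-up-mime-name", mapper).name()); // UTF-8
    }
}
```

With this ordering, "US-ASCII" stays US-ASCII, and the translation only applies to names that Charset.forName rejects.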
                  • 6. Re: MimeUtility.javaCharset usage
                    973216
                    jtahlborn wrote:
                    ah, now i understand your problem. in this situation, it is definitely a problem. maybe you should start by trying to load the Charset directly first, and then fall back to MimeUtility.
                    OK, that would be a workaround.

                    I suppose MimeUtility.javaCharset simply falls back to the default JVM character encoding when it cannot find a Java equivalent.

                    By the way, what exactly is the difference between a MIME charset and a Java charset?
                    • 7. Re: MimeUtility.javaCharset usage
                      jtahlborn
                      Tex-Twil wrote:
                      I suppose MimeUtility.javaCharset simply falls back to the default JVM character encoding when it cannot find a Java equivalent.
                      the "us-ascii" -> "ISO-8859-1" switch seems to be explicitly encoded into that class (no idea why).
                      By the way, what exactly is the difference between a MIME charset and a Java charset?
                      from what i can understand of that method, it just translates mime specific charset names into java compatible charset names. so the difference isn't in the Charset, just in the name.
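To illustrate that the difference is in the name rather than in the Charset, here is a small sketch using only java.nio.charset. It does not reproduce MimeUtility's translation table; it only shows that Charset.forName already resolves several spellings and aliases to one canonical charset. AliasDemo and canonicalName are made-up names.

```java
import java.nio.charset.Charset;

class AliasDemo {
    /** Returns the canonical Java name for any name or alias that Charset.forName accepts. */
    static String canonicalName(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        // Lookup is case-insensitive and alias-aware.
        System.out.println(canonicalName("us-ascii")); // US-ASCII
        System.out.println(canonicalName("latin1"));   // ISO-8859-1

        // The full alias set depends on the JVM, so it is printed without an expected value.
        System.out.println(Charset.forName("US-ASCII").aliases());
    }
}
```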