8 Replies Latest reply: Nov 9, 2011 11:27 AM by Bill Shannon-Oracle

    Problem with Charset

    889641
      Hi All,

      We have built an application that sends mail in multiple languages.

      I am using IE 6.0. By default the browser encoding is Western European (ISO) [View -> Encoding -> Western European (ISO)].

      Now I send a mail in Portuguese. In the subject I enter, for example, informação, and I receive the same word in the subject.

      But it changes to, for example, informaÃ§Ã£o when I choose Unicode (UTF-8) as the browser encoding.

      In the code I am using this:

      MimeMessage message = new MimeMessage(session);
      message.setSubject(mh.getSubject(), "UTF-8");

      What could be the problem?

      Please guide me

      Thanks in advance
        • 1. Re: Problem with Charset
          Bill Shannon-Oracle
          Possibly the string returned by mh.getSubject() doesn't contain the correct Unicode characters.

          It's not clear how the subject in the mail message is getting to your browser to be displayed,
          but possibly there's some error somewhere in that path.

          Are you using some web mail application to display the message in the browser?
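One rough way to check what the subject string really contains is to dump its code points before handing it to JavaMail. This is only a sketch: `SubjectDump` and `codePoints` are made-up names, and the literal in `main` stands in for whatever `mh.getSubject()` returns in the real application.

```java
public class SubjectDump {

    // Render each Unicode code point of the string as U+XXXX, separated by spaces.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for mh.getSubject(): "ação" written with escapes
        // so the result does not depend on the source file encoding.
        String subject = "a\u00e7\u00e3o";
        System.out.println(codePoints(subject)); // U+0061 U+00E7 U+00E3 U+006F
    }
}
```

If the dump shows U+00C3 U+00A7 where a single U+00E7 (ç) is expected, the string was already decoded with the wrong charset before it reached the mail code.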
          • 2. Re: Problem with Charset
            802889
            886638 wrote:
            Hi All,

            We have built an application that sends mail in multiple languages.

            I am using IE 6.0. By default the browser encoding is Western European (ISO) [View -> Encoding -> Western European (ISO)].

            Now I send a mail in Portuguese. In the subject I enter, for example, informação, and I receive the same word in the subject.

            But it changes to, for example, informaÃ§Ã£o when I choose Unicode (UTF-8) as the browser encoding.
            Are you sure you don't have your character sets mixed up? Text in UTF-8 that displays as your first example (informação) will display as your second example (informaÃ§Ã£o) when rendered as ISO-8859-1:
            ç = c3 a7 in UTF-8 => c3 = Ã and a7 = § in ISO-8859-1
            ã = c3 a3 in UTF-8 => c3 = Ã and a3 = £ in ISO-8859-1
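The byte arithmetic above can be reproduced in a few lines of Java. This is a minimal sketch, not code from the application in question: `MojibakeDemo`, `utf8ReadAsLatin1`, and `utf8Hex` are illustrative names, and the sample word is written with Unicode escapes so that the source file encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    public static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Show the UTF-8 bytes of the text as lowercase hex pairs.
    public static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00e7")); // c3 a7  (ç)
        System.out.println(utf8Hex("\u00e3")); // c3 a3  (ã)

        String word = "informa\u00e7\u00e3o";   // "informação"
        String garbled = utf8ReadAsLatin1(word);
        // Each two-byte UTF-8 sequence has become two separate Latin-1 characters.
        System.out.println(garbled.length() - word.length()); // 2
    }
}
```

The garbled string is two characters longer than the original because ç and ã each turn into a pair (Ã§ and Ã£).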
            • 3. Re: Problem with Charset
              889641
              Hi,

              I have found another thing.

              With the Western European (ISO) encoding, the words set into the bean are exactly what I entered in the subject text box.

              But with the Unicode (UTF-8) encoding, the words are encoded as UTF-8 before they are set into the bean class.

              For example, I typed informação in the subject text box, and the UTF-8 encoded form of this word is informaÃ§Ã£o.

              Why does this happen? I don't know how to proceed further.

              Please guide me.
              • 4. Re: Problem with Charset
                DrClap
                I see that sort of thing all the time in the browser, when the browser makes an incorrect assumption about the encoding of a page. Or when it's told the wrong encoding. It's quite possible that your webmail client (whose name you will not tell us) is doing something wrong -- it's extremely difficult to write web applications which work correctly with international scripts, especially if you didn't do it right when you first wrote the application ten years ago.

                So there's a good chance that there is nothing you can do to control the behaviour of the webmail client. What you can do is to find out if mail messages sent from sources other than JavaMail are treated any better by the webmail client. If they are, you could possibly follow up by sending messages which you consider to be successful to somewhere where you can examine their structure and try to imitate that structure.
                • 5. Re: Problem with Charset
                  889641
                  Hi,

                  Then, when I typed Japanese words,

                  for example 治疗动机, I received them on the Java side as #27835;#30103;#21160;#26426; (I have removed the '&' here).

                  This happens whichever encoding I choose in the browser, Western European or Unicode.

                  What could be the reason?

                  Edited by: 886638 on Sep 26, 2011 6:18 AM
                  • 6. Re: Problem with Charset
                    802889
                    886638 wrote:
                    Hi,

                    I have found another thing.

                    With the Western European (ISO) encoding, the words set into the bean are exactly what I entered in the subject text box.

                    But with the Unicode (UTF-8) encoding, the words are encoded as UTF-8 before they are set into the bean class.

                    For example, I typed informação in the subject text box, and the UTF-8 encoded form of this word is informaÃ§Ã£o.
                    No, informaÃ§Ã£o is the UTF-8 encoded form displayed as ISO-8859-1. Your web browser, web application, or some intermediate processing is seriously messing with the character encoding. BTW: as far as I can see, this has nothing to do with JavaMail.
                    • 7. Re: Problem with Charset
                      889641
                      1)     All the Java files related to "refer a colleague" should be saved in UTF-8 file format.
                      2)     The server-level configuration files, such as web.xml and the dispatcher servlet configuration for Struts, should be UTF-8 documents.
                      3)     I added a piece of code to convert ISO-8859-1 to UTF-8, shown below:

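                      // Re-decodes a string that was wrongly decoded as ISO-8859-1 as UTF-8.
                      public static String iso88591ToUtf8(String isoString) {
                          if (isoString != null) {
                              try {
                                  // Recover the original bytes, then decode them as UTF-8.
                                  isoString = new String(isoString.getBytes("ISO-8859-1"), "UTF-8");
                              } catch (java.io.UnsupportedEncodingException e) {
                                  System.err.println(e);
                              }
                          }
                          return isoString;
                      }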

                      Thanks to all.
                      • 8. Re: Problem with Charset
                        Bill Shannon-Oracle
                        A Java String object contains Unicode characters. If your code actually makes a difference,
                        it means someone created the String incorrectly to begin with, e.g., by reading it from a
                        file without specifying the correct charset.
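To illustrate the point, here is a small sketch of decoding bytes at the place where they enter the program. The names `DecodeAtSource` and `readAll` are invented for this example, and an in-memory byte array stands in for the file or request body; the real application may receive its text quite differently.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class DecodeAtSource {

    // Read the whole stream as text, decoding the bytes with the given charset.
    public static String readAll(InputStream in, Charset charset) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String original = "informa\u00e7\u00e3o"; // "informação"
        // Stand-in for a UTF-8 encoded file or form submission.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

When the charset is stated correctly at the point of reading, the String is right from the start and no later conversion step is needed. A getBytes("ISO-8859-1") round trip only helps for strings built the `wrong` way.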