5 Replies. Latest reply: Jul 4, 2011 4:59 AM by BIJ001

    Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N

    869367
      Hi,
      While reading Czech text, some Latin characters with a caron (e.g. Č, č, ď, ě, ň, ř, ť, ů) are displayed as '?'. We tried to fix this by generating Unicode escapes for these characters with the JDK's native2ascii tool, but it does not generate the escapes correctly. Please give us a quick solution, references, or suggestions.



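      String czech = "Č, č, ď, ě, ň, ř, ť, ů";
      // asciiCzech is not defined in the post; assumed here to be the same text as Unicode escapes
      String asciiCzech = "\u010c, \u010d, \u010f, \u011b, \u0148, \u0159, \u0165, \u016f";
      try {
          System.out.println("UTF-8 czech: " + new String(czech.getBytes("UTF-8"), "UTF-8"));
          // ISO-8859-1 ("8859_1") has no mapping for these letters, so getBytes() turns each into '?'
          System.out.println("UTF-8 ascii: " + new String(asciiCzech.getBytes("8859_1"), "8859_1"));
      } catch (Exception e) {
          e.printStackTrace();
      }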


      The generated Unicode escapes were: \u02d9\u0163,

      Note: We tried native2ascii with the Windows-1252, Cp1252, ISO-8859-1 and ISO-8859-2 encodings.

      Please suggest a way to read these caron characters correctly in Java.
        • 1. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
          857597
          Two things I'd check first are:
          1) Are you using the correct input encoding with native2ascii? If your .java file was saved using a different encoding than native2ascii uses to read it, then it might not work properly.
          2) Are you sure that whatever you're using to view the output of your program is able to display those characters, regardless of encoding?
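One way to check point 2 without trusting the console is to print each character's code point instead of the character itself. The sketch below is only an illustration (the class name CodePointDump and method dump are hypothetical); it writes the Czech letters as Unicode escapes so the source file stays pure ASCII and its own file encoding cannot interfere. If the dump shows the expected code points, the string itself is fine and any '?' is introduced on the output side.

```java
public class CodePointDump {
    // Lists each character as U+XXXX so the string can be checked without
    // relying on the console's encoding or font.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Czech letters with a caron, written as Unicode escapes so this
        // source file stays pure ASCII.
        String czech = "\u010c\u010d\u010f\u011b\u0148\u0159\u0165\u016f";
        System.out.println(dump(czech));
    }
}
```

Run as is, it prints U+010C U+010D U+010F U+011B U+0148 U+0159 U+0165 U+016F.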
          • 2. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
            869367
            Hi,

            First, thanks for your reply.

            The characters I mentioned belong to the Latin script, so I assumed the right encoding was ISO-8859-1. I used that encoding to generate the ASCII file with native2ascii.exe and also set the .java file encoding to ISO-8859-1, but the characters are still displayed as '?' whatever encoding conversion I use.

            When I run the same code on Java 1.6 it works fine even without specifying an encoding, but it fails on Java 1.5 (which is what we are using).
            • 3. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
              802889
              The characters mentioned are not in ISO-8859-1/Windows-1252 (Latin-1); they are in ISO-8859-2/Windows-1250 (Latin-2).
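This can be confirmed with CharsetEncoder.canEncode. The sketch below is a minimal illustration (the class name CaronCharsetCheck and its helpers are hypothetical). It also shows where the '?' comes from: String.getBytes silently replaces any character the target charset cannot map with the charset's replacement byte, which is '?' for ISO-8859-1. Note that the getBytes(Charset) and String(byte[], Charset) overloads used here need Java 6; on Java 1.5 the charset-name overloads would be needed instead.

```java
import java.nio.charset.Charset;

public class CaronCharsetCheck {
    // Czech letters with a caron, written as Unicode escapes.
    static final String CZECH = "\u010c\u010d\u010f\u011b\u0148\u0159\u0165\u016f";

    // True if every character of s has a mapping in the named charset.
    static boolean canEncode(String charsetName, String s) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Encodes and decodes s with the same charset. getBytes() replaces
    // unmappable characters with the charset's replacement byte ('?').
    static String roundTrip(String charsetName, String s) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String[] names = {"ISO-8859-1", "windows-1252", "ISO-8859-2", "windows-1250"};
        for (String name : names) {
            System.out.println(name + ": canEncode=" + canEncode(name, CZECH)
                    + ", survives round trip=" + roundTrip(name, CZECH).equals(CZECH));
        }
    }
}
```

It should report false for both Latin-1 charsets and true for both Latin-2 charsets.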

              Make sure that
              1) your input character set is correct,
              2) your output character set is correct, and
              3) Java is actually capable of displaying the desired characters on the console.
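Points 1 and 2 amount to naming the charset explicitly on both the reading and the writing side instead of relying on the platform default. A sketch under that assumption, using in-memory streams so it is self-contained (in practice the input would be a FileInputStream and the output a file or System.out; the class name ExplicitCharsets and method transcode are hypothetical):

```java
import java.io.*;
import java.nio.charset.Charset;

public class ExplicitCharsets {
    // Decodes input with an explicitly chosen charset and encodes the text
    // with another explicitly chosen charset, so neither side depends on
    // the platform default encoding.
    static byte[] transcode(byte[] input, Charset inCharset, Charset outCharset) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(input), inCharset);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, outCharset);
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.close(); // flushes any buffered characters into 'bytes'
        in.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // C-caron, comma, space, c-caron as UTF-8 bytes, e.g. read from a UTF-8 file
        byte[] utf8 = "\u010c, \u010d".getBytes("UTF-8");
        byte[] latin2 = transcode(utf8, Charset.forName("UTF-8"), Charset.forName("ISO-8859-2"));
        for (byte b : latin2) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

The main method converts the UTF-8 bytes of "Č, č" into their ISO-8859-2 bytes and prints C8 2C 20 E8.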
              • 4. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
                869367
                Hi TheAvalanche,

                I encoded the property file with the ISO-8859-2 converter.

                The original is 'Č, č, ď, ě, ň, ř, ť, ů'

                The generated Unicode is '\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, \u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b'


                Looking at the converted file, native2ascii generated two Unicode escapes for each single character (for example, \u00c4\u008c for Č).

                On the console it prints: ÄŒ, č, ď, Ä›, Ĺˆ, Ĺ™, ĹĽ, ĹŻ.

                As mentioned in the previous reply, I set the .java file encoding to ISO-8859-2 in Eclipse (select the .java file -> Properties -> Text file encoding).


                I used the following command to generate the escaped file:
                native2ascii.exe -encoding ISO-8859-2 czech.properties cs_CZ.properties

                Please let me know if I made a mistake with native2ascii.exe.
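The escapes shown above match the UTF-8 bytes of the original letters (Č is C4 8C in UTF-8) read one byte at a time as ISO-8859-2, which suggests czech.properties was actually saved as UTF-8, so the -encoding value did not match the file. The rough sketch below imitates that step of native2ascii; it is not the tool's source, and the class name Native2AsciiSketch and method toAscii are hypothetical. It decodes the bytes with the given input encoding, then escapes every non-ASCII character.

```java
import java.nio.charset.Charset;

public class Native2AsciiSketch {
    // Decodes the file's bytes with the given input encoding (the role of
    // native2ascii's -encoding option), then writes every non-ASCII char as
    // a backslash-u escape with four lower-case hex digits.
    static String toAscii(byte[] fileBytes, Charset inputEncoding) {
        String text = new String(fileBytes, inputEncoding);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes of a one-entry file saved as UTF-8 (C-caron is C4 8C in UTF-8).
        byte[] file = "key=\u010c".getBytes(Charset.forName("UTF-8"));
        System.out.println(toAscii(file, Charset.forName("UTF-8")));
        System.out.println(toAscii(file, Charset.forName("ISO-8859-2")));
    }
}
```

With UTF-8 as the input encoding the entry becomes key=\u010c; with ISO-8859-2 it becomes key=\u00c4\u008c, the same pattern as in the converted file. If that reading is right, re-running the command with -encoding UTF-8 should produce single escapes such as \u010c.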


                I printed the encoded string as follows:
                final String ascii =
                "\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, \u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b";


                System.out.println(new String(ascii.getBytes("ISO-8859-2"),
                "ISO-8859-2"));

                final OutputStreamWriter outStreamWriter =
                new OutputStreamWriter(System.out, "ISO-8859-2");

                final PrintWriter writer = new PrintWriter(outStreamWriter, true);


                writer.println(ascii);

                writer.flush();
                // close() is omitted here: it would also close System.out

                Thanks in advance.
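The round trip new String(ascii.getBytes("ISO-8859-2"), "ISO-8859-2") encodes and decodes with the same charset, so it returns the same garbled string. Because each character in ascii came from one misread byte, re-encoding it with ISO-8859-2 gives back the original bytes, and decoding those as UTF-8 recovers the intended text. This is only a sketch for undoing this specific mix-up (the class name UndoMisDecoding is hypothetical), not a substitute for converting the file with the encoding it was actually saved in.

```java
import java.nio.charset.Charset;

public class UndoMisDecoding {
    // Undoes one specific mistake: UTF-8 bytes that were decoded as ISO-8859-2.
    // Re-encoding with ISO-8859-2 restores the original bytes, which are then
    // decoded with the charset they were really written in.
    static String undo(String misread) {
        byte[] originalBytes = misread.getBytes(Charset.forName("ISO-8859-2"));
        return new String(originalBytes, Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        // The garbled string from the post
        String ascii = "\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, "
                + "\u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b";
        String expected = "\u010c, \u010d, \u010f, \u011b, \u0148, \u0159, \u0165, \u016f";
        System.out.println(expected.equals(undo(ascii)));
    }
}
```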
                • 5. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
                  BIJ001
                  "I encoded the property file with the ISO-8859-2 converter."
                  Properties files have their own encoding: ISO-8859-1 (Latin-1), with \uXXXX escape sequences for all other characters.
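To illustrate: Properties.load(InputStream) reads the bytes as ISO-8859-1 and converts backslash-u escapes, so an escaped file loads correctly while the same entry saved raw as UTF-8 comes back garbled. A minimal sketch with in-memory data (the class name PropertiesEncodingDemo and method load are hypothetical). Properties.load(Reader), which lets the caller choose the encoding, was only added in Java 6, so it is not an option on Java 1.5.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

public class PropertiesEncodingDemo {
    // Loads one key from properties-file content. load(InputStream) decodes
    // the bytes as ISO-8859-1 and then resolves backslash-u escapes.
    static String load(byte[] fileContent, String key) throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(fileContent));
        return p.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // File content as native2ascii would write it: pure ASCII escapes.
        byte[] escaped = "caron=\\u010c\\u010d".getBytes("ISO-8859-1");
        // The same entry saved raw as UTF-8 (no escapes).
        byte[] rawUtf8 = "caron=\u010c\u010d".getBytes("UTF-8");

        String expected = "\u010c\u010d";
        System.out.println("escaped file OK: " + expected.equals(load(escaped, "caron")));
        System.out.println("raw UTF-8 file OK: " + expected.equals(load(rawUtf8, "caron")));
    }
}
```

It prints true for the escaped file and false for the raw UTF-8 file.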