This discussion is archived
5 Replies Latest reply: Jul 4, 2011 2:59 AM by BIJ001

Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N

869367 Newbie
Hi,
While reading Czech text, some Latin characters with a caron (e.g. Č, č, ď, ě, ň, ř, ť, ů) are displayed as '?'. We tried to fix this by generating Unicode escapes with the native2ascii tool from the JDK, but the escapes for these characters are not generated properly. Please give us a quick solution, references, or suggestions.



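String czech = "Č, č, ď, ě, ň, ř, ť, ů";
try {
    // Round trip through UTF-8: the result equals the original string.
    System.out.println("UTF-8 czech: " + new String(czech.getBytes("UTF-8"), "UTF-8"));
    // asciiCzech is not declared in this snippet; presumably it holds the native2ascii output.
    System.out.println("UTF-8 ascii: " + new String(asciiCzech.getBytes("8859_1"), "8859_1"));
} catch (Exception e) {
    e.printStackTrace();
}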


The generated Unicode escapes are: \u02d9\u0163,

Note: We tried native2ascii with the Windows-1252, Cp1252, ISO-8859-1 and ISO-8859-2 encodings.

Please suggest a way to read these caron characters in Java.
  • 1. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
    857597 Newbie
    Two things I'd check first are:
    1) Are you using the correct input encoding with native2ascii? If your .java file was saved using a different encoding than native2ascii uses to read it, then it might not work properly.
    2) Are you sure that whatever you're using to view the output of your program is able to display those characters, regardless of encoding?
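To check point 1, it can help to know which escapes a correct conversion should produce. The following is a minimal sketch, not the native2ascii implementation: the class and method names are hypothetical, and it simply writes every non-ASCII character as a lowercase \uXXXX escape. The input string is itself written with escapes so the result does not depend on how the .java file was saved.

```java
class UnicodeEscapeSketch {
    // Hypothetical helper: keeps ASCII as-is and writes every other char as a 4-digit hex escape.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The eight characters from the question, written as escapes.
        String czech = "\u010C, \u010D, \u010F, \u011B, \u0148, \u0159, \u0165, \u016F";
        System.out.println(escapeNonAscii(czech));
    }
}
```

If the escapes produced by native2ascii differ from these one-escape-per-character values, the input encoding passed to the tool probably does not match the encoding the file was saved in.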
  • 2. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
    869367 Newbie
    Hi,

    First, thanks for your reply.

    The characters mentioned belong to the Latin script, for which I took the encoding to be ISO-8859-1. I used this encoding with native2ascii.exe to generate the ASCII file and also saved the .java file as ISO-8859-1, but these characters are still displayed as '?' whichever encoding conversion I use.

    Running the same code on Java 1.6 works fine even without specifying an encoding, but the problem occurs on Java 1.5 (which we are using).
  • 3. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
    802889 Explorer
    The characters mentioned are not in ISO-8859-1/Windows-1252 (Latin-1); they are in ISO-8859-2/Windows-1250 (Latin-2).

    Make sure that
    1) your input character set is correct,
    2) your output character set is correct,
    and
    3) Java is actually capable of displaying your desired characters on the console.
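The Latin-1 versus Latin-2 point can be checked from Java itself. This is a small sketch with hypothetical class and method names; it only uses `Charset.forName` and `CharsetEncoder.canEncode` from `java.nio.charset`, and it assumes the runtime ships the ISO-8859-2 charset.

```java
import java.nio.charset.Charset;

class CaronCharsetCheck {
    // Returns true if every character of text has a representation in the named charset.
    static boolean encodable(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The eight characters from the question, written as escapes so the
        // source file encoding cannot affect the check.
        String czech = "\u010C\u010D\u010F\u011B\u0148\u0159\u0165\u016F";
        System.out.println("ISO-8859-1: " + encodable(czech, "ISO-8859-1"));
        System.out.println("ISO-8859-2: " + encodable(czech, "ISO-8859-2"));
        System.out.println("UTF-8:      " + encodable(czech, "UTF-8"));
    }
}
```

A character that cannot be encoded in the target charset is typically replaced with '?' on output, which matches the symptom described in the question.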
  • 4. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
    869367 Newbie
    Hi TheAvalanche,

    I encoded the property file with ISO-8859-2 converter

    The original is 'Č, č, ď, ě, ň, ř, ť, ů'

    The generated Unicode is '\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, \u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b'


    Looking at the escaped file, native2ascii generated two Unicode escapes for each single character (for example, \u00c4\u008c for Č).

    In the console it prints like this ÄŒ, č, ď, Ä›, Ĺˆ, Ĺ™, ĹĽ, ĹŻ.

    As mentioned in the previous reply, I set the Java file's encoding to ISO-8859-2 in Eclipse -> selected Java file's properties -> Text file encoding.


    I used the following command to generate the Unicode file:
    native2ascii.exe -encoding ISO-8859-2 czech.properties cs_CZ.properties

    Let me know if I made any mistake in the native2ascii.exe command.


    I printed the encoded string as follows:

    final String ascii =
        "\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, \u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b";

    System.out.println(new String(ascii.getBytes("ISO-8859-2"), "ISO-8859-2"));

    final OutputStreamWriter outStreamWriter =
        new OutputStreamWriter(System.out, "ISO-8859-2");
    final PrintWriter writer = new PrintWriter(outStreamWriter, true);

    writer.println(ascii);
    writer.flush();
    writer.close();

    Thanks in advance.
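The doubled escapes above are consistent with czech.properties having been saved as UTF-8 while native2ascii was told to read it as ISO-8859-2: each pair is what the two UTF-8 bytes of one character look like when decoded as ISO-8859-2. This is an inference from the escape values, not something confirmed in the thread. The sketch below (hypothetical class and method names) reverses that assumed mistake by turning the garbled text back into bytes with ISO-8859-2 and decoding those bytes as UTF-8. It uses `String.getBytes(Charset)`, which needs Java 6 or later.

```java
import java.nio.charset.Charset;

class MojibakeDiagnosis {
    // Re-encodes text with the charset it was wrongly decoded with,
    // then decodes the resulting bytes with the charset assumed to be the real one.
    static String redecode(String garbled, String wrongCharset, String actualCharset) {
        byte[] rawBytes = garbled.getBytes(Charset.forName(wrongCharset));
        return new String(rawBytes, Charset.forName(actualCharset));
    }

    public static void main(String[] args) {
        // The escapes exactly as generated in the post above.
        String garbled = "\u00c4\u008c, \u00c4\u008d, \u00c4\u008f, \u00c4\u009b, "
                + "\u0139\u0088, \u0139\u0099, \u0139\u013d, \u0139\u017b";
        // The intended characters, written as single escapes.
        String expected = "\u010C, \u010D, \u010F, \u011B, \u0148, \u0159, \u0165, \u016F";

        String repaired = redecode(garbled, "ISO-8859-2", "UTF-8");
        System.out.println("Matches the original characters: " + repaired.equals(expected));
    }
}
```

If that assumption holds for the actual file, running the same command with `-encoding UTF-8` instead of `-encoding ISO-8859-2` should yield one escape per character.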
  • 5. Re: Error in reading Latin caron character(Č, č, ď, ě, ň, ř, ť, ů) in Java I18N
    BIJ001 Explorer
    "I encoded the property file with ISO-8859-2 converter"
    Property files have their own encoding: Latin-1, with escape sequences for other characters.
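A short sketch of what that means in practice: `Properties.load(InputStream)` reads the bytes as ISO-8859-1 and turns \uXXXX escapes into the corresponding characters. The class name, method name, and the `letters` key are made up for illustration, and an in-memory stream stands in for the real cs_CZ.properties file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapeSketch {
    // Loads properties from text exactly as it would sit in an escaped .properties file.
    static Properties loadEscaped(String fileContent) {
        Properties props = new Properties();
        try {
            // load(InputStream) decodes the bytes as ISO-8859-1 and resolves the escapes.
            props.load(new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the in-memory properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "\\u010c" in Java source is the six characters backslash, u, 0, 1, 0, c.
        String fileContent = "letters=\\u010c, \\u010d\n";
        String value = loadEscaped(fileContent).getProperty("letters");
        System.out.println(value.equals("\u010C, \u010D"));
    }
}
```

Whether the loaded characters then display correctly is a separate question that depends on the console encoding.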
