Hi,
I'm building applications that use both Latin and Cyrillic characters. One application creates the XML file; another reads the file and presents it to the user.
First I created the XML files by hand, using character entities to insert Cyrillic characters and accented Latin characters. When I loaded such a file into my application, everything worked fine. But this takes a lot of time, so I wrote an editor to create the files (I can insert Cyrillic characters easily with a keyboard mapping).
So I used my editor application to create the files encoded as UTF-8. When I loaded such a file into the other application, all Cyrillic characters were replaced by question marks (I used the same font). The same happens when I view the file in a text editor.
Then I used the same editor application to create the files encoded as ISO-8859-1. Now the Cyrillic characters show up fine, but the accented Latin characters do not: they show up as squares. When I view that file in a text editor, the Cyrillic characters have been replaced by character entities, but the accented Latin characters have not been replaced by entities (or by question marks).
I also tried the UTF-16 encoding, but then I get an exception:
org.xml.sax.SAXParseException: The encoding "UTF-16" is not supported.
How can I solve this problem?
Ideally, all accented Latin characters should also be replaced by character entities, as the Cyrillic characters are when using ISO-8859-1.
Or should I change the SAX parser I use to load the file? Should I set the encoding for the SAX parser? If so, how? When I save the file with my editor application, I set the encoding with
OutputFormat format = new OutputFormat(document,"ISO-8859-1",true);
Does a similar method exist to change the encoding when parsing the file?
I use the Xerces parser.
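To make the parsing question concrete, here is a minimal sketch of what I imagine the reading side could look like. It assumes that InputSource.setEncoding from the standard SAX API is the right hook and that the parser honours it when the source is a byte stream; I have not confirmed that this fixes my problem. The class name EncodingSketch and the method parseText are made up for illustration, and the sketch uses the JAXP SAXParserFactory rather than instantiating Xerces directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingSketch {

    // Hypothetical helper: parses XML bytes with an explicitly stated
    // encoding and returns all character data found in the document.
    static String parseText(byte[] xmlBytes, String encoding) throws Exception {
        final StringBuilder text = new StringBuilder();

        // Hand the parser raw bytes (not a Reader) so that it, not the
        // platform default charset, decides how to decode them.
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        source.setEncoding(encoding); // assumption: this is the encoding hook I need

        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so append each chunk.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes keep this source file independent of its own encoding:
        // a Cyrillic word followed by a Latin word with an accent.
        String content = "\u043F\u0440\u0438\u0432\u0435\u0442 caf\u00E9";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><word>" + content + "</word>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        String parsed = parseText(bytes, "UTF-8");
        System.out.println(parsed.equals(content) ? "round trip ok" : "round trip FAILED");
    }
}
```

If something like this round-trips correctly, my guess would be that the problem is on the writing side (for example, bytes being written through a Writer that uses the platform default encoding), but that is only a guess.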
Thank you,
Don