This discussion is archived
13 Replies Latest reply: Aug 15, 2005 1:20 PM by 800387

Jdom no escaping characters!!!!

807597 Newbie
Hello all. This may seem really simple, but this is a "new to Java technology" forum.

OK, I have a program that uses the JDOM API. Put simply, I take in an XML document that contains numeric character references, for example &#8217; in place of a '. The problem I am having with JDOM is that it changes the &#8217; to the character it represents (’). I would like the &#8217; to be treated as a plain string and not converted. Any suggestions?
  • 1. Re: Jdom no escaping characters!!!!
    807597 Newbie
    Hello all. This may seem really simple, but this is a "new to Java technology" forum.

    OK, I have a program that uses the JDOM API. Put simply, I take in an XML document that contains numeric character references, for example "&8217" in place of a '. The problem I am having with JDOM is that it changes the "&8217" to the character it represents. Any suggestions? I have looked at the EscapeStrategy interface, but I am confused about how to use it.
  • 2. Re: Jdom no escaping characters!!!!
    807597 Newbie
    OK, I am getting desperate here. I know it's hard to understand, but any suggestions would help. Is there something I can do with EscapeStrategy?
  • 3. Re: Jdom no escaping characters!!!!
    DrClap Expert
    > The problem I am having with Jdom is it changes the "&8217" to the character representation of '
    It's the responsibility of JDOM, like any other XML parser, to change Unicode escapes to the actual characters they represent before passing them to your code. So if you're having a problem with that, you're having a problem with the definition of XML. You should consider not using XML in that case.

    As for this EscapeStrategy thing, is that part of the JDOM code? If so, you might get better answers from the JDOM mailing list.
  • 4. Re: Jdom no escaping characters!!!!
    800387 Newbie
    Surround the relevant data with a CDATA section and see if that fixes it for you.

    - Saish
  • 5. Re: Jdom no escaping characters!!!!
    807597 Newbie
    > It's the responsibility of JDOM, like any other XML parser, to change Unicode escapes to the actual characters they represent before passing them to your code.
    I know that, but there should be a way to turn off the escaping of characters. In my case, I am doing some editing of an XML file using JDOM: simple things like changing element names and attributes and editing PCDATA. After being output with the XMLOutputter, the file will be edited further by others on different platforms using different text editors or programs. This is where the problem occurs. For example, my program will change the entity number "&#233;" into its actual character, "é". When people then work with an XML file that contains the actual characters instead of the entity numbers, this can cause problems. One problem occurred when someone using PageSpinner on the Mac added some things to the XML file, and the é and other characters didn't render correctly.
    Another consideration when working with other people is that you want to have some sort of standard that people can follow. If my program inserts the characters and people are using entity numbers, there will be confusion somewhere along the line.

    That is why I asked about turning this (feature) off.
  • 6. Re: Jdom no escaping characters!!!!
    800387 Newbie
    I repeat. Surround the relevant text with a CDATA element.

    - Saish
  • 7. Re: Jdom no escaping characters!!!!
    807597 Newbie
    > I repeat. Surround the relevant text with a CDATA element.
    Thanks for the input, but that would take a long time, because I am editing hundreds of XML files, one of which is 50 MB. So I just wrote a program that goes over the characters and replaces them with the corresponding entity references.
  • 8. Re: Jdom no escaping characters!!!!
    DrClap Expert
    > I know that, but there should be a way to turn off the escaping of characters. In my case, I am doing some editing of an XML file using JDOM: simple things like changing element names and attributes and editing PCDATA. After being output with the XMLOutputter, the file will be edited further by others on different platforms using different text editors or programs. This is where the problem occurs.
    Ah. You didn't mention that your problem was with JDOM's output. When you said "In simple i take in a xml document" it wasn't at all clear that "take in" also included writing out the document.
    > For example, my program will change the entity number "&#233;" into its actual character, "é". When people then work with an XML file that contains the actual characters instead of the entity numbers, this can cause problems. One problem occurred when someone using PageSpinner on the Mac added some things to the XML file, and the é and other characters didn't render correctly.
    > Another consideration when working with other people is that you want to have some sort of standard that people can follow. If my program inserts the characters and people are using entity numbers, there will be confusion somewhere along the line.
    >
    > That is why I asked about turning this (feature) off.
    Which "feature" are you asking about, then? Every single character can be represented as a Unicode escape; for example "A" can be represented as "&#65;" if you want. But presumably you don't want that. You just have a list of characters you want escaped.

    I could be wrong, but my guess is that JDOM will output the Unicode escape form of a character if that character can't be represented in the encoding you chose for your output. My other guess is that you want all characters that aren't in US-ASCII to be Unicode escaped. If these two guesses are both correct then encoding your output in US-ASCII should do what you want.

    And let me remind you that no matter what standards you provide, manual editing of XML files is going to lead to some malformed documents.
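
The US-ASCII suggestion above can be tried without JDOM. What follows is a minimal sketch, assuming the JDK's built-in DOM and Transformer classes as a stand-in for JDOM's XMLOutputter; the class name AsciiSerializationDemo and the element name "a" are made up for illustration. With the output encoding set to US-ASCII, the serializer is expected to write any character it cannot encode as a numeric character reference. In JDOM the corresponding setting would be Format.setEncoding("US-ASCII") on the Format passed to the XMLOutputter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: serializes <a>text</a> with a US-ASCII output encoding.
final class AsciiSerializationDemo {
    private AsciiSerializationDemo() {}

    static String serializeAsAscii(String text) {
        try {
            // Build a one-element document holding the literal characters.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("a");
            root.setTextContent(text);
            doc.appendChild(root);

            // Ask the serializer for US-ASCII output; characters outside
            // that encoding cannot be written as bytes, so they have to be
            // emitted as character references instead.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }
}
```

Calling AsciiSerializationDemo.serializeAsAscii("caf\u00e9") should give output in which the é appears as a reference rather than as the character. Note that this escapes every non-ASCII character, not only the ones that were references in the input.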
  • 9. Re: Jdom no escaping characters!!!!
    807597 Newbie
    > Which "feature" are you asking about, then? Every single character can be represented as a Unicode escape; for example "A" can be represented as "&#65;" if you want.
    I am asking about the feature in the XMLOutputter that translates the entity reference (e.g. "&#233;") to its actual character ("é"). I actually want the entity reference in the output XML, not the character itself.
    > I could be wrong, but my guess is that JDOM will output the Unicode escape form of a character if that character can't be represented in the encoding you chose for your output.
    Yes, this is true, but I would like JDOM not to touch the entity references that I place in the XML.
    > My other guess is that you want all characters that aren't in US-ASCII to be Unicode escaped. If these two guesses are both correct then encoding your output in US-ASCII should do what you want.
    No, I want all ISO 8859-1 character entities and ISO 8859-1 symbol entities (e.g. "&#233;") to remain as entity references, and not become the actual character (e.g. "é").
  • 10. Re: Jdom no escaping characters!!!!
    DrClap Expert
    > I am asking about the feature in the XMLOutputter that translates the entity reference (e.g. "&#233;") to its actual character ("é"). I actually want the entity reference in the output XML, not the character itself.
    There is no such feature.

    Here's how it works:

    1. The parser reads the XML and translates it into an internal form (a "DOM"). This internal form contains the actual characters -- no entity references (as you call them) or Unicode escapes (as they are called). It does not keep track of whether a character came from a character or a Unicode escape or a DTD entity or anything else. Because according to the XML spec, that doesn't matter. They are all equivalent.

    2. The XMLOutputter serializes the DOM back to text. It has certain rules it follows, but since it doesn't know what the original form of a character was, it can't have a rule that says "use the original form". Besides, a character in the DOM could have been inserted by your program, it might not have come from the original document. There might not even be an original document.
    > No, I want all ISO 8859-1 character entities and ISO 8859-1 symbol entities (e.g. "&#233;") to remain as entity references, and not become the actual character (e.g. "é").
    I couldn't find a definition of "ISO 8859-1 character entity" anywhere, except in documents talking about HTML, and they said that &eacute; was the entity for é. From your original post I don't think that is what you want. Did you try encoding your output in US-ASCII? Or if you have a specific feature request that you want JDOM to support, again I suggest the JDOM mailing list. This isn't the place for that.
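
Step 1 of the explanation above can be checked with the JDK's own parser. This is a minimal sketch, assuming javax.xml.parsers in place of JDOM's SAXBuilder; ParsedTextDemo and textOf are hypothetical names. A document that spells é as a numeric reference and one that contains the literal character yield the same text after parsing, so nothing downstream can tell which form the input used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a small XML string and returns the root
// element's text as the program sees it after parsing.
final class ParsedTextDemo {
    private ParsedTextDemo() {}

    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The parser has already resolved character references here;
            // the in-memory tree holds only the resulting characters.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }
}
```

For example, textOf("<a>&#233;</a>") and textOf("<a>\u00e9</a>") return equal strings, which is why a serializer has no "original form" to fall back on.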
  • 11. Re: Jdom no escaping characters!!!!
    807597 Newbie
    I wasn't looking for a feature so much as a hack, some way to turn that part of the parser off.

    I just gave up anyway and wrote a simple program that translates everything back to its entity references. Here are some of the ISO entity references that I was referring to:

    http://www.w3schools.com/tags/ref_entities.asp

    - Thanks for your input, though.
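
The post does not show the program, so the following is only a guess at the general idea, not the poster's code. It is a minimal sketch that assumes the serialized file has been read into a String and that non-ASCII characters occur only in text and attribute values; NumericReferenceEncoder and encodeNonAscii are hypothetical names. It replaces each character outside US-ASCII with a decimal numeric reference and leaves everything else untouched.

```java
// Hypothetical post-processing helper: rewrites non-ASCII characters in
// already-serialized XML as decimal numeric character references.
final class NumericReferenceEncoder {
    private NumericReferenceEncoder() {}

    static String encodeNonAscii(String xmlText) {
        StringBuilder out = new StringBuilder(xmlText.length());
        // Iterate by code point so characters outside the Basic Multilingual
        // Plane become one reference rather than two surrogate halves.
        xmlText.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

The assumption about where non-ASCII characters occur matters: a reference is not recognized inside a CDATA section or a comment, and is not allowed in an element name, so this blanket replacement is only safe for files that avoid those cases.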
  • 12. Re: Jdom no escaping characters!!!!
    843789 Newbie
    I found a way to do this if anyone is still interested.

    I did it like this:
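    import java.io.FileWriter;
    import java.io.IOException;
    import org.jdom.Document;
    import org.jdom.output.Format;
    import org.jdom.output.XMLOutputter;

    // elementArr is a field of the enclosing class; its first entry is used as the root element.
    public void outputXML(String path) {
        Document doc = new Document(elementArr.get(0));
        Format format = Format.getPrettyFormat();
        format.setIndent("    ");
        // Return element text unchanged, so a reference such as "&#233;" that
        // was placed in the text is written as-is. This also turns off
        // escaping of "<" and "&", so the text must already be well-formed.
        XMLOutputter serializer = new XMLOutputter(format) {
            @Override
            public String escapeElementEntities(String text) {
                return text;
            }
        };
        // try-with-resources closes the writer even if output() throws.
        try (FileWriter writer = new FileWriter(path)) {
            serializer.output(doc, writer);
        } catch (IOException ex) {
            System.out.println("Failed to write XML File from JDOMRecurserParser");
        }
    }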
    This wrote out perfect ASCII references for me, once I had converted the characters to references in the text of the elements I passed into this function. Hope this helps.
  • 13. Re: Jdom no escaping characters!!!!
    PhHein Guru Moderator
    Please don't post to long-dead threads. Locking.