13 Replies. Latest reply: Aug 15, 2005 3:20 PM by 800387

    Jdom no escaping characters!!!!

    807597
      Hello all. This may seem really simple, but this is a "New to Java Technology" forum.

      OK, I have a program that uses the JDOM API. In simple terms, I take in an XML document that contains some Unicode character references, for example &#8217; in place of a '. The problem I am having with JDOM is that it changes the &#8217; to the character representation of '. I would like the &#8217; to be seen as a string and not converted. Any suggestions?
        • 1. Re: Jdom no escaping characters!!!!
          807597
          Hello all. This may seem really simple, but this is a "New to Java Technology" forum.

          OK, I have a program that uses the JDOM API. In simple terms, I take in an XML document that contains some Unicode character references, for example "&#8217;" in place of a '. The problem I am having with JDOM is that it changes the "&#8217;" to the character representation of '. Any suggestions? I have looked at the EscapeStrategy interface, but I am confused about how to use it.
          • 2. Re: Jdom no escaping characters!!!!
            807597
            OK, I am getting desperate here... I know it's hard to understand, but any suggestions would help. Is there something I can do with EscapeStrategy?
            • 3. Re: Jdom no escaping characters!!!!
              DrClap
              The problem I am having with JDOM is that it changes the "&#8217;" to the character representation of '
              It's the responsibility of JDOM, like any other XML parser, to change Unicode escapes to the actual characters they represent before passing them to your code. So if you're having a problem with that, you're having a problem with the definition of XML. You should consider not using XML in that case.

              As for this EscapeStrategy thing, is that part of the JDOM code? If so, you might get better answers from the JDOM mailing list.
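A minimal sketch of the point made in this reply. It uses the XML parser bundled with the JDK rather than JDOM, so that it runs without extra libraries; the class and method names are made up for the illustration. It shows that a character reference has already been replaced by the character it stands for by the time application code sees the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CharRefDemo {
    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<q>it&#8217;s</q>");
        // The seven characters of "&#8217;" arrive as the single character U+2019.
        System.out.println(text.length());            // 4
        System.out.println(text.equals("it\u2019s")); // true
    }
}
```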
              • 4. Re: Jdom no escaping characters!!!!
                800387
                Surround the relevant data with a CDATA section and see if that fixes it for you.

                - Saish
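A small sketch of why a CDATA section behaves differently, again using the JDK's built-in parser as a stand-in for JDOM (names are hypothetical). Inside CDATA the text "&#8217;" is not a character reference at all, just seven literal characters, so the parser hands it over unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class CdataDemo {
    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Plain text: the reference is decoded to U+2019.
        System.out.println(rootText("<q>&#8217;</q>").length());
        // CDATA: the same seven characters are kept as written.
        System.out.println(rootText("<q><![CDATA[&#8217;]]></q>"));
    }
}
```

Note that this only helps if the text stays in a CDATA section when the document is written back out; written as ordinary text, the ampersand would be escaped.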
                • 5. Re: Jdom no escaping characters!!!!
                  807597
                  > It's the responsibility of JDOM, like any other XML parser, to change Unicode escapes to the actual characters they represent before passing them to your code.
                  I know that, but there should be a way to turn off the escaping of characters. For example, in my case I am doing some editing of an XML file using JDOM: simple things like changing element names and attributes, and editing PCDATA. After being output using the XMLOutputter, the file will be further edited by others on different platforms using different text editors or programs. This is where the problem occurs. For example, my program will change the entity number "&#233;" into its correct character -> "é". With people using this XML file with the actual characters instead of the entity numbers, this can cause problems. One problem occurred with someone using PageSpinner on the Mac: they added some things to the XML file, and the é and other characters didn't render correctly.
                  Another thing when working with other people is that you want to have some sort of standard that people can follow. If my program inserts the characters and people are using entity numbers along the line, there will be some sort of confusion...

                  That is why I asked about turning this (feature) off.
                  • 6. Re: Jdom no escaping characters!!!!
                    800387
                    I repeat. Surround the relevant text with a CDATA element.

                    - Saish
                    • 7. Re: Jdom no escaping characters!!!!
                      807597
                      > I repeat. Surround the relevant text with a CDATA element.
                      Thanks for the input, but that would take a long time, because I am editing hundreds of XML files, one of them 50 MB. So I just wrote a program to go over the characters and replace them with the corresponding entity references...
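The poster's program is not shown in the thread. As a rough sketch of what such a post-processing step could look like, the hypothetical helper below replaces every character outside US-ASCII with a decimal character reference. It assumes it is run over the already serialized XML text, not over strings stored in the document tree (where the ampersand would itself be escaped on output).

```java
final class NumericRefs {
    /** Replaces every code point above U+007F with a decimal reference such as &#233;. */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // handles surrogate pairs as one code point
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("<q>caf\u00e9, it\u2019s</q>"));
        // <q>caf&#233;, it&#8217;s</q>
    }
}
```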
                      • 8. Re: Jdom no escaping characters!!!!
                        DrClap
                        I know that, but there should be a way to turn off
                        the escaping of characters. For example, in my case I
                        am doing some editing of an XML file using JDOM:
                        simple things like changing element names and
                        attributes, and editing PCDATA. After being output
                        using the XMLOutputter, the file will be further
                        edited by others on different platforms using
                        different text editors or programs. This is where the
                        problem occurs.
                        Ah. You didn't mention that your problem was with JDOM's output. When you said "In simple terms, I take in an XML document" it wasn't at all clear that "take in" also included writing out the document.
                        For example, my program will change
                        the entity number "&#233;" into its correct character
                        -> "é". With people using this XML file with the
                        actual characters instead of the entity numbers, this
                        can cause problems. One problem occurred with someone
                        using PageSpinner on the Mac: they added some
                        things to the XML file, and the é and other
                        characters didn't render correctly.
                        Another thing when working with other people is that
                        you want to have some sort of standard that people can
                        follow. If my program inserts the characters and
                        people are using entity numbers along the line, there
                        will be some sort of confusion...

                        That is why I asked about turning this (feature) off.
                        Which "feature" are you asking about, then? Every single character can be represented as a Unicode escape; for example "A" can be represented as "&#65;" if you want. But presumably you don't want that. You just have a list of characters you want escaped.

                        I could be wrong, but my guess is that JDOM will output the Unicode escape form of a character if that character can't be represented in the encoding you chose for your output. My other guess is that you want all characters that aren't in US-ASCII to be Unicode escaped. If these two guesses are both correct then encoding your output in US-ASCII should do what you want.

                        And let me remind you that no matter what standards you provide, manual editing of XML files is going to lead to some malformed documents.
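A sketch of the US-ASCII idea, using only the JDK's parser and identity transformer instead of JDOM, so it is an analogy and not a test of JDOM itself. The class and method names are hypothetical. In JDOM the corresponding setting would be Format.setEncoding on the Format passed to the XMLOutputter; that path is not exercised here, and the exact form of the references it writes may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AsciiOutputDemo {
    /** Parses xml, then serializes it again using the given output encoding. */
    static String roundTrip(String xml, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), encoding);
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The input holds a literal e-acute, which US-ASCII cannot encode,
        // so the serializer has to fall back to a character reference.
        System.out.println(roundTrip("<q>caf\u00e9</q>", "US-ASCII"));
    }
}
```

This escapes every non-ASCII character, including ones that were literal characters in the input, which matches the second guess in the reply above.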
                        • 9. Re: Jdom no escaping characters!!!!
                          807597
                          > Which "feature" are you asking about, then? Every single character can be represented as a Unicode escape; for example "A" can be represented as "&#65;" if you want.
                          I am asking about the feature in the XMLOutputter that translates the entity reference (e.g. "&#233;") to its actual character ("é"). I actually want the entity reference in the output XML, not the character representation.
                          > I could be wrong, but my guess is that JDOM will output the Unicode escape form of a character if that character can't be represented in the encoding you chose for your output.
                          Yes, this is true, but I would like JDOM not to touch the entity references that I place in the XML.
                          > My other guess is that you want all characters that aren't in US-ASCII to be Unicode escaped. If these two guesses are both correct then encoding your output in US-ASCII should do what you want.
                          No, I want all ISO 8859-1 character entities and ISO 8859-1 symbol entities (e.g. "&#233;") to remain as entity references, and not the actual characters (e.g. "é").
                          • 10. Re: Jdom no escaping characters!!!!
                            DrClap
                            I am asking about the feature in the XMLOutputter
                            that translates the entity reference (e.g. "&#233;")
                            to its actual character ("é"). I actually want
                            the entity reference in the output XML, not the
                            character representation.
                            There is no such feature.

                            Here's how it works:

                            1. The parser reads the XML and translates it into an internal form (a "DOM"). This internal form contains the actual characters -- no entity references (as you call them) or Unicode escapes (as they are called). It does not keep track of whether a character came from a character or a Unicode escape or a DTD entity or anything else. Because according to the XML spec, that doesn't matter. They are all equivalent.

                            2. The XMLOutputter serializes the DOM back to text. It has certain rules it follows, but since it doesn't know what the original form of a character was, it can't have a rule that says "use the original form". Besides, a character in the DOM could have been inserted by your program, it might not have come from the original document. There might not even be an original document.
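The two steps above can be illustrated with a short sketch. It uses the JDK's built-in parser as a stand-in for JDOM, and the names are made up for the example. Three different spellings of the same character in the source produce identical strings after parsing, so nothing is left for a serializer to "restore".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class SameAfterParsing {
    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String literal = rootText("<q>caf\u00e9</q>");  // the character itself
        String decimal = rootText("<q>caf&#233;</q>");  // decimal reference
        String hex     = rootText("<q>caf&#xE9;</q>");  // hexadecimal reference
        // All three are the same four-character string once parsed.
        System.out.println(literal.equals(decimal) && decimal.equals(hex)); // true
    }
}
```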
                            No, I want all ISO 8859-1 character entities and ISO
                            8859-1 symbol entities (e.g. "&#233;") to remain as
                            entity references, and not the actual characters
                            (e.g. "é").
                            I couldn't find a definition of "ISO 8859-1 character entity" anywhere, except in documents talking about HTML, and they said that &eacute; was the entity for é. From your original post I don't think that is what you want. Did you try encoding your output in US-ASCII? Or if you have a specific feature request that you want JDOM to support, again I suggest the JDOM mailing list. This isn't the place for that.
                            • 11. Re: Jdom no escaping characters!!!!
                              807597
                              I wasn't looking for a feature, more for a hack I could perform to turn that part of the parser off.

                              I just gave up anyway and wrote a simple program that would translate everything back to its entity references... Here are some of the ISO entity references that I was referring to:

                              http://www.w3schools.com/tags/ref_entities.asp

                              - Thanks for your input, though.
                              • 12. Re: Jdom no escaping characters!!!!
                                843789
                                I found a way to do this, if anyone is still interested.

                                I did it like this:
                                import java.io.FileWriter;
                                import java.io.IOException;
                                import org.jdom.Document;
                                import org.jdom.output.Format;
                                import org.jdom.output.XMLOutputter;

                                // elementArr is declared elsewhere in the class (not shown here).
                                public void outputXML(String path) {
                                    Document doc = new Document(elementArr.get(0));
                                    Format format = Format.getPrettyFormat();
                                    format.setIndent("    ");
                                    // Override the escaping hook so element text is written out unchanged.
                                    XMLOutputter serializer = new XMLOutputter(format) {
                                        @Override
                                        public String escapeElementEntities(String str) {
                                            return str;
                                        }
                                    };
                                    try {
                                        FileWriter writer = new FileWriter(path);
                                        try {
                                            serializer.output(doc, writer);
                                        } finally {
                                            writer.close(); // close the file even if output fails
                                        }
                                    } catch (IOException ex) {
                                        System.out.println("Failed to write XML File from JDOMRecurserParser");
                                    }
                                }
                                This put out perfect ASCII references for me, after I converted them in the elements I made to pass into this function. Hope this helps.
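One caveat with returning the text unchanged: a literal & or < in element text is then also written as-is, and the output is no longer well-formed XML. A possible middle ground, sketched below with hypothetical names, is to keep escaping markup characters but leave anything that already looks like a numeric character reference alone. The idea would be to return the result of this helper from the escapeElementEntities override instead of returning str directly; that wiring is an assumption and is not shown or tested here.

```java
import java.util.regex.Pattern;

final class RefPreservingEscaper {
    // An ampersand that does NOT start a numeric reference such as &#233; or &#xE9;.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!#(?:[0-9]+|x[0-9A-Fa-f]+);)");

    /** Escapes &, < and > but keeps existing numeric character references intact. */
    static String escape(String text) {
        // Ampersands first, so the entities added for < and > are not escaped again.
        String out = BARE_AMP.matcher(text).replaceAll("&amp;");
        return out.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escape("caf&#233; & <b>"));
        // caf&#233; &amp; &lt;b&gt;
    }
}
```

The trade-off is that text which is meant to contain the literal characters "&#233;" can no longer be represented, since it will always be read back as é.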
                                • 13. Re: Jdom no escaping characters!!!!
                                  PhHein
                                  Please don't post to long-dead threads. Locking.