7 Replies Latest reply: Dec 10, 2010 9:59 AM by 822176

Parsing Special Characters in XML

822176 Newbie
Hi,

I have an incoming XML snippet that has the special characters already escaped.

If the title is something like kkkkk ?& [] ]]>, the XML comes back with & and > escaped:

<title>kkkkk ? &amp; [] ]]&gt;</title>

Now I would like to parse this ESCAPED title correctly, which should result in

<![CDATA[kkkkk ?& [] ]]>]]>

BUT INSTEAD I just get

<![CDATA[>]]>

What is it that I am missing?

Thank you so much.
Meena
  • 1. Re: Parsing Special Characters in XML
    forumKid2 Explorer
    BUT INSTEAD I just get

    <![CDATA[>]]>
    If you remove the special characters, do you get the expected value here, or do you still get nothing? You might want to post a code snippet showing the forum exactly what is happening.
  • 2. Re: Parsing Special Characters in XML
    Kayaman Guru
    http://en.wikipedia.org/wiki/CDATA

    Read the part titled "Uses of CDATA sections" about having the end escape sequence inside a CDATA section.
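
    As a hedged illustration of what that section describes: a CDATA section cannot contain the sequence ]]>, so the usual workaround is to close the section between ]] and > and immediately open a new one. A minimal Java sketch follows. The class and method names (CDataSketch, wrap) are made up for this example and are not part of any XML library.

    ```java
    // Hypothetical helper, for illustration only.
    final class CDataSketch {

        // Wraps raw text in CDATA, splitting every "]]>" across two sections
        // so that the terminator never appears inside a single section.
        static String wrap(String raw) {
            return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>";
        }

        public static void main(String[] args) {
            System.out.println(wrap("kkkkk ?& [] ]]>"));
        }
    }
    ```

    A parser reading the result reports the two adjacent sections as one run of text, so the original string comes back unchanged.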
  • 3. Re: Parsing Special Characters in XML
    822176 Newbie
    When there are no special characters, the title gets parsed correctly.

    <title> Hello </title> gets parsed correctly to Hello.
    <title> &gt; &amp; ]]&gt; </title> comes in escaped, and the parser returns just >

    So I tried enclosing the content of the title tag in a CDATA section before the parser starts parsing it, but that does not help either.

    I am using the Xerces parser. I am wondering if I have to set some flags on the parser or override some function that handles special characters.

    Thank you for the help.
  • 4. Re: Parsing Special Characters in XML
    DrClap Expert
    There aren't any "special characters" there. And there aren't any switches on XML parsers which allow you to parse something which isn't well-formed XML.

    First of all, if you want to have the ampersand character (&) or the greater-than character (>) in a text node in an XML document then you have to escape them properly. It's possible that you have, but that the forum software has unescaped them and you haven't bothered to point out that fact. Or it's possible that you haven't. So your first task is to get those characters escaped properly. This should be covered near the beginning of Chapter 1 of your XML book.

    Once you have taken care of that, let us know if your problem went away. And if you are going to post XML with escaped ampersands and so on, please make sure they stay escaped when you post them by escaping them again so when the forum software unescapes them, they still look escaped. (If you don't understand that then reread the section about escaping until you do.)
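
    To make the escaping step concrete, here is a minimal Java sketch, assuming the raw title is available as a plain string before it is placed in the XML. The class and method names (XmlTextSketch, escape) are hypothetical. Strictly speaking, > only has to be escaped when it follows ]], but escaping it everywhere is harmless.

    ```java
    // Hypothetical helper, for illustration only.
    final class XmlTextSketch {

        // Escapes the characters that are not allowed as-is in XML text content.
        static String escape(String raw) {
            StringBuilder out = new StringBuilder(raw.length() + 16);
            for (int i = 0; i < raw.length(); i++) {
                char c = raw.charAt(i);
                switch (c) {
                    case '&': out.append("&amp;"); break;
                    case '<': out.append("&lt;"); break;
                    case '>': out.append("&gt;"); break;
                    default:  out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String once = escape("kkkkk ?& [] ]]>");
            System.out.println(once);          // what belongs inside <title>
            System.out.println(escape(once));  // what to type into a forum that unescapes once
        }
    }
    ```

    Calling escape a second time on its own output gives the doubly escaped form that survives one round of unescaping by the forum software.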
  • 5. Re: Parsing Special Characters in XML
    804650 Journeyer
    Yes, I like Pina Coladas
    And getting caught in the rain
  • 6. Re: Parsing Special Characters in XML
    822176 Newbie
    OK. let me restate the problem....

    I have an incoming XML snippet that has the special characters already escaped

    If the title is something like kkkkk ?& [] ]]>, the REST XML comes back with & and > escaped, which looks like

    <title>kkkkk ? &amp; [] ]] &gt; </title>

    Now I would like to parse this ESCAPED Title correctly that should result in

    kkkkk ?& [] ]]>

    instead I just get

    > as the SAX parser reads the <title>

    Can you now point me to the next chapter in the book that will address this ;)

    Edited by: user8018511 on Dec 10, 2010 9:55 AM

    Edited by: user8018511 on Dec 10, 2010 9:56 AM

    Edited by: user8018511 on Dec 10, 2010 9:59 AM
  • 7. Re: Parsing Special Characters in XML
    DrClap Expert
    user8018511 wrote:
    instead I just get
    I missed where you explained how you "get" that. Without seeing any code let me throw out the wild guess that you are failing to account for the possibility that the SAX parser can break a text node into multiple parts and call your characters() method once for each of the parts.
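
    A minimal sketch of that guess, assuming a handler that extends org.xml.sax.helpers.DefaultHandler and a parser that is not namespace-aware (so the element name arrives in qName). The class name TitleHandlerSketch and its parseTitle method are hypothetical, since no code from the original poster was shown. The point is that characters() only appends to a buffer, and the complete text is read once endElement() fires.

    ```java
    import java.io.IOException;
    import java.io.StringReader;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical handler, for illustration only.
    final class TitleHandlerSketch extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private boolean inTitle;
        private String title;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(qName)) {
                inTitle = true;
                buffer.setLength(0); // start a fresh buffer for each <title>
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so append, never assign.
            if (inTitle) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                inTitle = false;
                title = buffer.toString(); // the text node is complete only here
            }
        }

        // Parses the given XML string and returns the text of the last <title> seen.
        static String parseTitle(String xml) {
            TitleHandlerSketch handler = new TitleHandlerSketch();
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), handler);
            } catch (ParserConfigurationException | SAXException | IOException e) {
                throw new IllegalStateException("could not parse XML", e);
            }
            return handler.title;
        }

        public static void main(String[] args) {
            System.out.println(parseTitle("<title>kkkkk ?&amp; [] ]]&gt;</title>"));
        }
    }
    ```

    A handler that instead overwrites a field on every characters() call keeps only the last chunk, which would match the symptom of seeing just the trailing >.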
