Forum Stats

  • 3,824,755 Users
  • 2,260,414 Discussions
  • 7,896,306 Comments

Discussions

Extract escaped XML from SOAP response

partlycloudy
partlycloudy Member Posts: 8,160 Silver Trophy
edited Sep 3, 2020 2:50PM in XML DB

Oracle 12.1

I have a SOAP web service that returns a escaped XML document in a SOAP response envelope as follows

<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">  <soapenv:Body>    <ns:Response xmlns:ns="http://domain.com">      <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&lt;Results &gt;&lt;/Results&gt;</ns:return>    </ns:Response>  </soapenv:Body></soapenv:Envelope>

How can I get the inner XML payload un-escaped and in an XMLTYPE column/variable for further processing?

Thanks

cormaco

Best Answer

  • cormaco
    cormaco Member Posts: 1,939 Silver Crown
    edited Sep 3, 2020 2:03PM Answer ✓

    This is a mess, agreed.

    Here are several cases where < is encoded as &amp;lt; instead of &lt; and > as &amp;gt; instead of &gt;

    When all encodings are fixed the xml can be parsed analogous to my first example:

    with example(soap) as (  select xmltype(  '<?xml version="1.0" encoding="UTF-8"?>      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">        <soapenv:Body>          <ns:filterDocResponse xmlns:ns="http://domain.com">            <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;      &lt;Results ClientID=&quot;1234&quot; Id=&quot;&quot; Total=&quot;33&quot; &gt;      &lt;Match MatchType=&quot;XYZ&quot; &gt;      &lt;Entity Id=&quot;13527&quot; &gt;&lt;entity id=&quot;13527&quot; &gt;         &lt;name&gt;JOHN DOE&lt;/name&gt;      &lt;listId&gt;0&lt;/listId&gt;         &lt;/entity&gt;&lt;/Entity&gt;      &lt;/Match&gt;      &lt;/Results&gt;      </ns:return>          </ns:filterDocResponse>        </soapenv:Body>      </soapenv:Envelope>  ') from dual  )  select xmltype(response)   from example, xmltable (      xmlnamespaces(          'http://schemas.xmlsoap.org/soap/envelope/' as "soapenv",          'http://domain.com' as "ns"      ),      '/soapenv:Envelope/soapenv:Body/ns:filterDocResponse'      passing soap      columns          response clob path 'ns:return'  )

    Output:

    <?xml version="1.0" encoding="UTF-8"?>      <Results ClientID="1234" Id="" Total="33" >      <Match MatchType="XYZ" >      <Entity Id="13527" ><entity id="13527" >         <name>JOHN DOE</name>      <listId>0</listId>         </entity></Entity>      </Match>      </Results>      

Answers

  • cormaco
    cormaco Member Posts: 1,939 Silver Crown
    edited Sep 3, 2020 3:51AM

    Here is one way:

    set define #with example(soap) as (select xmltype('<?xml version="1.0" encoding="UTF-8"?>      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">        <soapenv:Body>          <ns:Response xmlns:ns="http://domain.com">            <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;      &lt;Results &gt;      &lt;/Results&gt;      </ns:return>          </ns:Response>        </soapenv:Body>      </soapenv:Envelope>') from dual)select xmltype(response) from example, xmltable (    xmlnamespaces(        'http://schemas.xmlsoap.org/soap/envelope/' as "soapenv",        'http://domain.com' as "ns"    ),    '/soapenv:Envelope/soapenv:Body/ns:Response'    passing soap    columns        response varchar2(100) path 'ns:return')

    Output:

    <?xml version="1.0" encoding="UTF-8"?>      <Results >      </Results>      

    I had to redefine the & character to something else, so that &lt etc. is not seen as a variable reference by SQL Developer.

  • partlycloudy
    partlycloudy Member Posts: 8,160 Silver Trophy
    edited Sep 3, 2020 9:37AM

    Very nice, thanks. Couple of questions

    1. The Results node in the embedded XML document may contain large amounts of text, greater than 32K. I assume I can use response clob path 'ns:return' in the XMLTABLE call to handle this?
    2. I don't see any code to unescape the data in the ns:return node, so how does this happen? The reason I ask this is that the Results node in the response (example below) contains another node with escaped value (not my choice, this is an external vendor response).  How can the entire Results node be converted to a well formed XML document ?

    <?xml version="1.0" encoding="UTF-8"?>      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">        <soapenv:Body>          <ns:Response xmlns:ns="http://domain.com">            <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;      &lt;Results &gt;          <Match>          <Entity>                &lt;entity&gt;                    &lt;name&gt;John Doe&lt;/name&gt;                &lt;/entity&gt;           </Entity>        </Match>    &lt;/Results&gt;      </ns:return>          </ns:Response>        </soapenv:Body>      </soapenv:Envelope>

    Thanks

  • cormaco
    cormaco Member Posts: 1,939 Silver Crown
    edited Sep 3, 2020 9:33AM
    partlycloudy schrieb:Very nice, thanks. Couple of questionsThe Results node in the embedded XML document may contain large amounts of text, greater than 32K. I assume I can use response clob path 'ns:return' in the XMLTABLE call to handle this?I don't see any code to unescape the data in the ns:return node, so how does this happen?Thanks

    You're welcome.

    1.) Yes, you can savely use clob instead of varchar2

    2.) The unescape happens transparently then the value of the textnode of ns:return is converted to varchar2 resp. clob.

  • partlycloudy
    partlycloudy Member Posts: 8,160 Silver Trophy
    edited Sep 3, 2020 9:38AM

    Yikes, apologies, I edited my question, could you help with the new example in #2? Thanks

  • cormaco
    cormaco Member Posts: 1,939 Silver Crown
    edited Sep 3, 2020 1:23PM

    Your new example contains encoded and unencoded tags in ns:return.

    Although this is a wellformed XML it is almost impossible to extract the inner XML from ns:return like this.

    I have not yet found a way to do it.

    You should complain to the vendor, the content of ns:return should be consistently encoded.

    Alternatively they can put the unencoded xml in ns:return using a CDATA element:

    https://en.wikipedia.org/wiki/CDATA

  • partlycloudy
    partlycloudy Member Posts: 8,160 Silver Trophy
    edited Sep 3, 2020 1:37PM

    Sorry, my misake, here is the full response. ns:return contains all encoded tags. Agreed, I will ask them to use CDATA to avoid this mess

    Any ideas?

    <?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">  <soapenv:Body>    <ns:filterDocResponse xmlns:ns="http://domain.com">      <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&lt;Results ClientID=&quot;1234&quot; Id=&quot;&quot; Total=&quot;33&quot; &gt;&lt;Match MatchType=&quot;XYZ&quot; &gt;&lt;Entity Id=&quot;13527&quot; &gt;&amp;lt;entity id=&quot;13527&quot; &amp;gt;   &amp;lt;name&amp;gt;JOHN DOE&amp;lt;/name&amp;gt;&amp;lt;listId&amp;gt;0&amp;lt;/listId&amp;gt;   &amp;lt;/entity&amp;gt;&lt;/Entity&gt;&lt;/Match&gt;&lt;/Results&gt;</ns:return>    </ns:filterDocResponse>  </soapenv:Body></soapenv:Envelope>
  • cormaco
    cormaco Member Posts: 1,939 Silver Crown
    edited Sep 3, 2020 2:03PM Answer ✓

    This is a mess, agreed.

    Here are several cases where < is encoded as &amp;lt; instead of &lt; and > as &amp;gt; instead of &gt;

    When all encodings are fixed the xml can be parsed analogous to my first example:

    with example(soap) as (  select xmltype(  '<?xml version="1.0" encoding="UTF-8"?>      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">        <soapenv:Body>          <ns:filterDocResponse xmlns:ns="http://domain.com">            <ns:return>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;      &lt;Results ClientID=&quot;1234&quot; Id=&quot;&quot; Total=&quot;33&quot; &gt;      &lt;Match MatchType=&quot;XYZ&quot; &gt;      &lt;Entity Id=&quot;13527&quot; &gt;&lt;entity id=&quot;13527&quot; &gt;         &lt;name&gt;JOHN DOE&lt;/name&gt;      &lt;listId&gt;0&lt;/listId&gt;         &lt;/entity&gt;&lt;/Entity&gt;      &lt;/Match&gt;      &lt;/Results&gt;      </ns:return>          </ns:filterDocResponse>        </soapenv:Body>      </soapenv:Envelope>  ') from dual  )  select xmltype(response)   from example, xmltable (      xmlnamespaces(          'http://schemas.xmlsoap.org/soap/envelope/' as "soapenv",          'http://domain.com' as "ns"      ),      '/soapenv:Envelope/soapenv:Body/ns:filterDocResponse'      passing soap      columns          response clob path 'ns:return'  )

    Output:

    <?xml version="1.0" encoding="UTF-8"?>      <Results ClientID="1234" Id="" Total="33" >      <Match MatchType="XYZ" >      <Entity Id="13527" ><entity id="13527" >         <name>JOHN DOE</name>      <listId>0</listId>         </entity></Entity>      </Match>      </Results>      
  • partlycloudy
    partlycloudy Member Posts: 8,160 Silver Trophy
    edited Sep 3, 2020 2:21PM

    From what I can see in the vendor documentation, the inner (lowercase) entity node (repeated) is nested inside the outer Entity (upper case) and hence it goes through another level of encoding i.e. even & gets converted to &amp;

    Taking a step back, how can I learn more about XMLTABLE? I know it can be used to extract repeating nodes from an XML document using XPath queries but it is not always clear to me what to start with (PASSING ...) and what to extract using XPath. It gets even more complicated with a deeply nested XML document like this with some scalar/singleton nodes and some repeating nodes. Is there a good tutorial?

    Thanks for all your help

  • Jason_(A_Non)
    Jason_(A_Non) Member Posts: 2,094 Silver Trophy
    edited Sep 3, 2020 2:50PM

    To deal with the double encoding they are doing, you need to perform a second decode yourself.  Simply change

    select xmltype(response)

    to

    select xmltype(dbms_xmlgen.convert(response, 1))  -- 1 = dbms_xmlgen.ENTITY_DECODE

    The call to dbms_xmlgen.convert will unescape the remaining escaped characters.

    In regards to your XMLTable question and chaining them, here is a good starter post

    https://odieweblog.wordpress.com/2016/05/20/how-to-using-outer-join-with-xmltable-or-xquery/

    You can search his blog and find many XMLTable examples.

    partlycloudycormaco