This discussion is archived
4 Replies Latest reply: Jul 3, 2013 8:31 AM by paul zip RSS

XML to CSV routine using XSLT.  Oracle bugs?

paul zip Newbie
Currently Being Moderated

I'm trying to write some XSLT so that I can apply it to any given XML using  XMLTransform to generate a flattened CSV structure.  My input can be any XML document, with any number of nested levels so the structure isn't known to the XSLT.  I'm fairly new to XSLT but am having problem with Oracle's version of it.  I've tested on both 10gR2 and 11gR2.


CSV Rules :

Adjacent fields must be separated by a comma

Embedded commas in fields are escaped e.g. 15, Maple Street => "15, Maple Street"

Embedded quotes in fields are escaped.  e.g. O'Brien => "O'Brien" or John "Jonner" McNabb => "John ""Jonner"" McNabb"

Embedded CR / LF or CRLF in fields are escaped with quotes.

Records are separated with CRLF pairs


Flattening Rules :

The text of leaf elements (elements without children) are output even if they are empty

Any element with text is output

A new line occurs whenever the nesting level changes. i.e. If an element has children.


When I run the following XML and my XSLT through Oracle I get results inconsistent with results I get when I run the XML and XSLT through online XSLT utils (including W3C's test pages).

 

with qryXML as (
  select xmltype(q'[
  <email>
    <to>Steve O'Brien</to>
    <cc/>
    <from>Jane "The Smiler" Griff</from>
    <heading>Reminder</heading>
    <body>Hey, don't forget me this weekend!  I've enclosed some pics from last weekend
  Love Jane
  </body>
    <attachments>
      <attachment>
        <filename>Dance.jpg</filename>
        <encoding>MIME</encoding>
      </attachment>
      <attachment>
        <filename>Signature.txt</filename>
        <encoding/>
      </attachment>
    </attachments>
  </email>
  ]') as XMLDATA from dual)
select
   XMLTransform(q.XMLDATA,
      xmltype(q'[
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="*[text() or not(*)]">
        <xsl:if test="position() &gt; 1">
            <xsl:text>,</xsl:text>
        </xsl:if>
    <xsl:call-template name="display_csv_field">
      <xsl:with-param name="field" select="."/>
    </xsl:call-template>
        <xsl:if test="not(following-sibling::*[text() or not(*)])">
            <xsl:text>
</xsl:text>          
        </xsl:if>
    </xsl:template>
  <xsl:template name="display_csv_field">
    <xsl:param name="field"/>
    <xsl:variable name="CRLF">
      <xsl:text>&#13;&#10;</xsl:text>
    </xsl:variable>
    <xsl:variable name="CR">
      <xsl:text>&#13;</xsl:text>
    </xsl:variable>
    <xsl:variable name="LF">
      <xsl:text>&#10;</xsl:text>
    </xsl:variable>
    <xsl:variable name="apos">'</xsl:variable>    
    <xsl:choose>
      <xsl:when test="contains( $field, '&quot;' )">
        <!-- Field contains a quote. So escape  -->
        <xsl:text>"</xsl:text>
        <xsl:call-template name="escape_quotes">
          <xsl:with-param name="string" select="$field" />
        </xsl:call-template>
        <xsl:text>"</xsl:text>
      </xsl:when>
      <xsl:when test="contains( $field, ',' ) or 
                      contains( $field, $apos ) or
                      contains( $field, $CRLF ) or
                      contains( $field, $CR ) or
                      contains( $field, $LF )" >
        <!-- Field contains a comma, apostrophe and/or a linefeed, so quote -->
        <xsl:text>"</xsl:text>
        <xsl:value-of select="$field" />
        <xsl:text>"</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <!-- No need to enclose this field in quotes. -->
        <xsl:value-of select="$field" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template name="escape_quotes">
    <xsl:param name="string" />
    <xsl:value-of select="substring-before( $string, '&quot;' )" />
    <xsl:text>""</xsl:text>
    <xsl:variable name="substring_after_first_quote"
                  select="substring-after( $string, '&quot;' )" />
    <xsl:choose>
      <xsl:when test="not( contains( $substring_after_first_quote, '&quot;' ) )">
        <xsl:value-of select="$substring_after_first_quote" />
      </xsl:when>
      <xsl:otherwise>
        <!-- The substring after the first quote contains a quote.
             So, we call ourself recursively to escape the quotes
             in the substring after the first quote. -->
        <xsl:call-template name="escape_quotes">
          <xsl:with-param name="string" select="$substring_after_first_quote" />
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>  
</xsl:stylesheet>
]'
)) from qryXML q   
/

 

Output from Oracle (which appears incorrect)

&quot;Steve O&apos;Brien&quot;&quot;Jane &quot;&quot;The Smiler&quot;&quot; Griff&quot;Reminder&quot;Hey, don&apos;t forget me this weekend!  I&apos;ve enclosed some pics from last weekend
  Love Jane
  &quot;
Dance.jpgMIME
Signature.txt


Output from most other XSLT test environments (which appears correct):

"Steve O'Brien",,"Jane ""The Smiler"" Griff",Reminder,"Hey, don't forget me this weekend!  I've enclosed some pics from last weekend
Love Jane
"
Dance.jpg,MIME
Signature.txt,


 

Oracle's XSLT handling appears incorrect in several ways:

1. There is no comma delimiter separating fields and on investigation, position() is always returning 1, which is incorrect.  From XSLT reference "The position function returns a number equal to the context position from the expression evaluation context." ...."the position, or index number, of the node, relative to all the selected nodes in the node list."  Therefore any node in a node list should be indexable with position.   I am selecting a node list *[text() or not(*)] => Any nodes with text or with no children, yet position of each node in this list is always 1.

 

2. My XSLT says the output is text, yet Oracle is still using HTML character escaping which is not text!  &quot; instead of " and &apos; instead of '.  I could search and replace, but I shouldn't have to.

 

Can anyone explain why Oracle's XSLT differs from other XSLT processors and offer suggestions to solve these issues.  Are these bugs?

  • 1. Re: XML to CSV routine using XSLT.  Oracle bugs?
    odie_63 Guru
    Currently Being Moderated

    Thanks for the detailed test case.

     

    1. There is no comma delimiter separating fields and on investigation, position() is always returning 1, which is incorrect.

    Yes, it's a bug. You should report it to Oracle Support.

    There's probably a simple way to workaround it using an xsl:for-each loop instead.

     

    Have you thought of using XQuery too?

    2. My XSLT says the output is text, yet Oracle is still using HTML character escaping which is not text!  &quot; instead of " and &apos; instead of '.  I could search and replace, but I shouldn't have to.

    I don't reproduce that on 11.2 but one thing for sure, XMLTransform is meant to return an XMLType instance, not a character-based datatype.

    You have to serialize the output using getClobVal() method or XMLSerialize function.

  • 2. Re: XML to CSV routine using XSLT.  Oracle bugs?
    paul zip Newbie
    Currently Being Moderated

    @odie_63, many thanks for taking time to reply.

    Have you thought of using XQuery too?

    No, I didn't think of using XQuery at all, as I'm not very familiar with it, but that could be a good suggestion.  Could XQuery output variant record CSV (differing number of fields per line) and be used easily to apply the CSV and flattening rules?  It looks like XQuery supports local functions, which could be written to do the correct escaping, but I've not idea how to even approach this method.  Do you have any sample code which demonstrate how I'd approach this?

  • 3. Re: XML to CSV routine using XSLT.  Oracle bugs?
    odie_63 Guru
    Currently Being Moderated

    Here's an example tested on 11.2.0.2 :

     

    SQL> set scan off

    SQL> set long 5000

    SQL> set lines 200

    SQL> var doc clob

    SQL> begin

      2   :doc := '<email>

      3      <to>Steve O''Brien</to>

      4      <cc/>

      5      <from>Jane "The Smiler" Griff</from>

      6      <heading>Reminder</heading>

      7      <body>Hey, don''t forget me this weekend!  I''ve enclosed some pics from last weekend

      8    Love Jane

      9    </body>

    10      <attachments>

    11        <attachment>

    12          <filename>Dance.jpg</filename>

    13          <encoding>MIME</encoding>

    14        </attachment>

    15        <attachment>

    16          <filename>Signature.txt</filename>

    17          <encoding/>

    18        </attachment>

    19      </attachments>

    20    </email>';

    21  end;

    22  /

     

    PL/SQL procedure successfully completed.

     

    SQL> select xmlcast(

      2           xmlquery(

      3           'declare function local:formatField ($e as element()) as xs:string

      4            {

      5              let $t := ora:replace($e, "&quot;", "&quot;&quot;")

      6              return if (    contains($t, "&quot;")

      7                          or contains($t, ",")

      8                          or contains($t, "&apos;")

      9                          or contains($t, "&#13;")

    10                          or contains($t, "&#10;") )

    11                     then concat("&quot;", $t, "&quot;")

    12                     else $e

    13            }; (::)

    14            string-join(

    15              for $i in //*[*[text() or not(node())]]

    16              return

    17                string-join(

    18                  for $j in $i/*[text() or not(node())]

    19                  return local:formatField($j)

    20                , ","

    21                )

    22            , "&#13;&#10;"

    23            )'

    24             passing xmltype(:doc)

    25             returning content

    26           )

    27           as clob

    28         ) as result

    29  from dual ;

     

    RESULT

    --------------------------------------------------------------------------------

    "Steve O'Brien",,"Jane ""The Smiler"" Griff",Reminder,"Hey, don't forget me this

    weekend!  I've enclosed some pics from last weekend

      Love Jane

      "

    Dance.jpg,MIME

    Signature.txt,

     

     

     

    We could also use a single ora:matches function in place of the OR'ed contains but it doesn't appear to work on 11.2.0.3, got an internal error :

    ORA-00600: code d'erreur interne, arguments : [qmxqrs_xvm_popLastFuncArgAsStr:1], [], [], [], [], [], [], [], [], [], [], []

  • 4. Re: XML to CSV routine using XSLT.  Oracle bugs?
    paul zip Newbie
    Currently Being Moderated

    odie_63, once again your skills are invaluable.

     

    I was fiddling around with XQuery after you mentioned it in your previous post, but most I wrote generated syntax errors and some of the errors are particularly cryptic and hard (for me at least) to know how to fix.

     

    I tried your solution on 11R2 and it works brilliantly; in many ways seems more elegant than my XSLT solution.  I tried it on 10R2 (because my solution has to work on both) and had several of those cryptic errors...  One error seemed to be fixed by inserting whitespace between the (::) comment markers.  But now I get "ORA-19112: error raised during evaluation: oracle.xquery.XQException: FOTY0021".  Google suggests FOTY0021 is invalid node in XQuery, but I've no idea what that equates to?

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points