Forum Stats

  • 3,852,474 Users
  • 2,264,108 Discussions
  • 7,905,077 Comments

Discussions

How to generate XPath string from XML file

2

Answers

  • BluShadow
    BluShadow Member, Moderator Posts: 42,318 Red Diamond
    edited Aug 19, 2016 5:48AM

    Odie's code works fine for me on 11.2.0.3..  (I don't have 11.2.0.1 to test)

    SQL> ed
    Wrote file afiedt.buf  1  select x.node_id
      2        , x.node_name
      3        , x.node_type
      4        , x.node_xpath
      5   from tmp_xml t
      6      , xmltable(
      7        'declare function local:getChildren($e as node(), $pid as xs:string?, $pxp as xs:string?) as element()*
      8         {
      9           for $i at $p in $e/(node()|@*)
    10           let $id := if ($pid) then concat($pid,".",$p) else "1"
    11           let $xp := concat($pxp, "/",
    12                        typeswitch($i)
    13                          case text()      return "text()"
    14                          case attribute() return concat("@", local-name($i))
    15                          case element()   return concat(local-name($i), "[", count($i/preceding-sibling::*[name(.)=name($i)])+1, "]")
    16                          default          return ""
    17                      )
    18           return element r
    19           {
    20             element node_id {$id}
    21           , element parent_node_id {$pid}
    22           , element node_name {local-name($i)}
    23           , element node_value { if (not($i instance of element())) then data($i) else () }
    24           , element node_type {
    25               typeswitch($i)
    26                 case text()                   return "text"
    27                 case attribute()              return "attribute"
    28                 case element()                return "element"
    29                 case processing-instruction() return "processing-instruction"
    30                 case comment()                return "comment"
    31                 default                       return "other"
    32             }
    33           , element namespace_uri {namespace-uri($i)}
    34           , element node_xpath {$xp}
    35           }
    36           | local:getChildren($i, $id, $xp)
    37         }; (: :)
    38         local:getChildren($d,(),())'
    39         passing t.object_value as "d"
    40         columns node_id         varchar2(100)   path 'node_id'
    41               , node_name       varchar2(2000)  path 'node_name'
    42               , node_value      varchar2(4000)  path 'node_value'
    43               , parent_node_id  varchar2(100)   path 'parent_node_id'
    44               , node_type       varchar2(30)    path 'node_type'
    45               , namespace_uri   varchar2(2000)  path 'namespace_uri'
    46               , node_xpath      varchar2(4000)  path 'node_xpath'
    47*        ) x
    48  /NODE_ID         NODE_NAME                      NODE_TYPE                      NODE_XPATH
    --------------- ------------------------------ ------------------------------ ---------------------------------------------------------------------------
    1.1             projectNumber                  element                        /project[1]/projectNumber[1]
    1.3.1                                          text                           /project[1]/lineOfBusiness[1]/text()
    1.1.1                                          text                           /project[1]/projectNumber[1]/text()
    1               project                        element                        /project[1]
    1.3             lineOfBusiness                 element                        /project[1]/lineOfBusiness[1]
    1.2.1                                          text                           /project[1]/projectType[1]/text()
    1.5.9.2         Mail                           element                        /project[1]/summary[1]/appliesTo[1]/Mail[1]
    1.5.9.1.1                                      text                           /project[1]/summary[1]/appliesTo[1]/Retail[1]/text()
    1.5.9.1         Retail                         element                        /project[1]/summary[1]/appliesTo[1]/Retail[1]
    1.5.9           appliesTo                      element                        /project[1]/summary[1]/appliesTo[1]
    1.5.8.1                                        text                           /project[1]/summary[1]/description[1]/text()
    1.5.8           description                    element                        /project[1]/summary[1]/description[1]
    1.5.7.1                                        text                           /project[1]/summary[1]/clientType[1]/text()
    1.5.7           clientType                     element                        /project[1]/summary[1]/clientType[1]
    1.5.6.1                                        text                           /project[1]/summary[1]/internalProject[1]/text()
    1.5.6           internalProject                element                        /project[1]/summary[1]/internalProject[1]
    1.5.5.1                                        text                           /project[1]/summary[1]/mandatoryReview[1]/text()
    1.2             projectType                    element                        /project[1]/projectType[1]
    1.5.5           mandatoryReview                element                        /project[1]/summary[1]/mandatoryReview[1]
    1.5.4.1                                        text                           /project[1]/summary[1]/clientRequested[1]/text()
    1.5.4           clientRequested                element                        /project[1]/summary[1]/clientRequested[1]
    1.5.3.1                                        text                           /project[1]/summary[1]/effectiveDate[1]/text()
    1.5.3           effectiveDate                  element                        /project[1]/summary[1]/effectiveDate[1]
    1.5.2           workflowStateDate              element                        /project[1]/summary[1]/workflowStateDate[1]
    1.5.1.1                                        text                           /project[1]/summary[1]/creationDate[1]/text()
    1.5.1           creationDate                   element                        /project[1]/summary[1]/creationDate[1]
    1.5             summary                        element                        /project[1]/summary[1]
    1.4.1                                          text                           /project[1]/projectStatus[1]/text()
    1.4             projectStatus                  element                        /project[1]/projectStatus[1]
    1.5.9.2.1                                      text                           /project[1]/summary[1]/appliesTo[1]/Mail[1]/text()30 rows selected.SQL> select * from v$version;BANNER
    --------------------------------------------------------------------------------
    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
    PL/SQL Release 11.2.0.3.0 - Production
    CORE    11.2.0.3.0      Production
    TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
    NLSRTL Version 11.2.0.3.0 - Production
  • padders
    padders Member Posts: 1,081 Silver Trophy
    edited Aug 19, 2016 5:57AM

    I think this is pretty useful when analysing XML if you don't have XMLSpy or equivalent to hand where you can just hover over elements and get their xpath.

    I added an attribute to the projectNumber node to test and I notice your first query returns the attribute before its element, was that intentional or is it non-deterministic if you use node()|@*?

  • odie_63
    odie_63 Member Posts: 8,493 Silver Trophy
    edited Aug 19, 2016 6:44AM
    I think this is pretty useful when analysing XML if you don't have XMLSpy or equivalent to hand where you can just hover over elements and get their xpath.

    Yes, why not.

    But for this usage, it lacks a proper namespace support. Not difficult to add though.

    The design of this query (and of other flattening techniques of the same kind) primarily came after a discussion about the Edge Model, a schema-less storage model for XML.

    This model is the basis for building simple XML indexes.

    Unstructured XML indexes in Oracle are implemented differently though, with a PATH table and an internal token table, but the idea is similar.

    I added an attribute to the projectNumber node to test and I notice your first query returns the attribute before its element, was that intentional or is it non-deterministic if you use node()|@*?

    Do you mean this :

    NODE_ID  NODE_NAME      NODE_TYPE  NODE_XPATH-------- -------------- ---------- ------------------------------------1        project        element    /project[1]1.1.2                   text       /project[1]/projectNumber[1]/text()1.1.1    id             attribute  /project[1]/projectNumber[1]/@id1.1      projectNumber  element    /project[1]/projectNumber[1]

    ?

    On my db (12.1.0.2), subsequent runs of the query give a different row order every single time.

    NODE_IDs show that nodes are processed in a deterministic way, though the XQuery VM is apparently free to return rows in any order.

  • odie_63
    odie_63 Member Posts: 8,493 Silver Trophy
    edited Aug 19, 2016 8:18AM
    When I execute your first method below error message is raised. Please help me to resolve.

    Are you testing using the sample XML document from your first post?

    Since you're on 11.2.0.1, you're probably not using Binary XML storage?

    In that version, the CBO will probably try to rewrite the XQuery expression using internal functions, which sometimes doesn't work that well.

    Try to force functional evaluation via the NO_XML_QUERY_REWRITE hint :

    SELECT /*+ no_xml_query_rewrite */       ...FROM ...
  • padders
    padders Member Posts: 1,081 Silver Trophy
    edited Aug 19, 2016 6:51AM

    > Do you mean this

    Yes, I was observing that attribute 1.1.1 was numbered correctly as a child of element 1.1 (to my thinking) but appeared before element 1.1.

  • New Roots
    New Roots Member Posts: 188 Blue Ribbon
    edited Aug 22, 2016 3:31AM

    Hi Blushadow,

    Excuse for  delayed response. when I execute the same in Oracle live sql window ,It is working fine. But I execute the same query in my database version I got the error message which i mentioned in earlier post. How can I run this with DB  version 11.2.0.1

    1.png

  • New Roots
    New Roots Member Posts: 188 Blue Ribbon
    edited Aug 22, 2016 4:17AM

    Hi Odie_63,

    Are you testing using the sample XML document from your first post?

    Yes. I used same XML document.

    Try to force functional evaluation via the NO_XML_QUERY_REWRITE hint :

    SELECT /*+ no_xml_query_rewrite */ 
           ... 
    FROM ... 

    I tried to use the above hint in my select statement. But no luck.

  • odie_63
    odie_63 Member Posts: 8,493 Silver Trophy
    edited Aug 22, 2016 5:16AM

    Try the 2nd query then. It'll be more efficient anyway.

  • New Roots
    New Roots Member Posts: 188 Blue Ribbon
    edited Aug 23, 2016 2:09AM

    Hi Odie_63,

      I tried to implement your second approach. My original xml file have lot of nodes. some nodes have same field names.  I want to insert field values into different tables. For instance the project node fields should be stored in the project table in addition to primary key column.

    Summary and appliesTo node tag fields should be stored in the summary table.

    project table  structure :

    ------------------------------------------------------------------

    PROJECT_IDNOT NULL NUMBER
    FILE_SEQ_ID     NUMBER
    LINEOFBUSINESS VARCHAR2(200)
    PROJECTNUMBER   VARCHAR2(200)
    PROJECTSTATUS   VARCHAR2(200)
    PROJECTTYPE     VARCHAR2(200)

    desc summaryName              Null    Type         ----------------- -------- -------------SUMMARY_ID        NOT NULL NUMBER       PROJECT_ID                NUMBER       FILE_SEQ_ID                NUMBER       CLIENTREQUESTED            VARCHAR2(200)CLIENTTYPE                VARCHAR2(200)CREATIONDATE              VARCHAR2(200)DESCRIPTION                VARCHAR2(200)EFFECTIVEDATE              VARCHAR2(200)INTERNALPROJECT            VARCHAR2(200)MANDATORYREVIEW            VARCHAR2(200)WORKFLOWSTATEDATE          VARCHAR2(200)APPLIESTO_MAIL            VARCHAR2(10) APPLIESTO_RETAIL          VARCHAR2(10) 

    My approach :

    Inserting  fieldname, parent tag name, field order ( column order in the actual tables) ,xpath informations in a temporary configuration table.

    Later I will use this table to prepare dynamic insert script with primary key and foreign key's  to load all the tables.

    Please suggest .

    -- Added

    I'm using below query to get the tagname and parent tag name and  xpath from your approach.

    WITH edge_data AS  (SELECT x.* ,    row_number() over(partition BY parent_node_id, node_name order by node_id) AS position  FROM tmp_xml t ,    TABLE(xmlflattendom(t.object_value)) x  )SELECT x.* ,  regexp_substr ( xpath, '/([^[]+)', 1, regexp_count(xpath,'/')-1, null, 1) node_name,regexp_substr ( xpath, '/([^[]+)', 1, regexp_count(xpath,'/')-2, null, 1) parent_nodeFROM  (SELECT t.node_id ,    t.node_name ,    t.node_type,    sys_connect_by_path(    CASE node_type      WHEN 'element'      THEN node_name        || '['        || position        || ']'      WHEN 'attribute'      THEN '@'        || node_name      ELSE 'text()'    END , '/' ) AS xpath  FROM edge_data t  WHERE node_type             IN ('element', 'attribute', 'text')    CONNECT BY prior node_id   = parent_node_id    START WITH parent_node_id IS NULL  )xWHERE node_type='text';
  • odie_63
    odie_63 Member Posts: 8,493 Silver Trophy
    edited Aug 23, 2016 4:51AM
    My approach :Inserting  fieldname, parent tag name, field order ( column order in the actual tables) ,xpath informations in a temporary configuration table.Later I will use this table to prepare dynamic insert script with primary key and foreign key's  to load all the tables.Please suggest .

    What I suggest is to drop this approach altogether.

    You may think it's rather smart to implement generic stuff like that but at the end of the day, it'll be harder to maintain and less efficient.

    My piece of advice : use static XMLTABLE/XMLQUERY statements with static paths.

    My original xml file have lot of nodes. some nodes have same field names.

    So the XML contains collection(s) of repeating nodes. All very classic so far.

    Given the fact that you requested XPath expressions with positional predicates (e.g. [1], [2] etc.), I'm inclined to think that you want to extract each value one by one, and that would be the worst thing to do with respect to what Oracle offers to do the job properly and efficiently.

    Again, XMLTABLE is the right tool to use here.

    BluShadow
This discussion has been closed.