Forum Stats

  • 3,826,405 Users
  • 2,260,641 Discussions
  • 7,896,932 Comments

Discussions

Select xml problem

Frank Lehmann
Frank Lehmann Member Posts: 126 Blue Ribbon

Hi helpers,

i have attached xml file, which has additional html tags (why ever someone does this).

My idea is to accept this html tags (<p>, <span>) as simple xml tags.

But how to read this xml, so that all <items> will be selected.

Any ideas ?

Tagged:

Best Answer

  • Paulzip
    Paulzip Member Posts: 8,693 Blue Diamond
    edited Jan 11, 2022 6:32PM Answer ✓

    Don't use rownum, it's not correct for node position in the XML. Use "for ordinality" in the XMLTable columns clause

    e.g.
     [snip..]               
     columns       
       idx for ordinality
     , textzeile varchar2(1000) path '.'
     )
    


«1

Answers

  • BluShadow
    BluShadow Member, Moderator Posts: 41,978 Red Diamond

    The XML is flawed in terms of being XML, and you'll struggle when it come to self closed tags e.g. "<p />" if you wanted to treat those as XML.

    Wherever the XML is generated needs to be changed so that the "data" that is HTML is stored as data and doesn't include XML like characters. One way is to ensure that the data is encoded i.e. "<" becomes "&lt;" and ">" becomes "&gt;" etc. and then this ensures that when you encounter "<" and ">" they are genuinely referring to XML tags and not some of the html data.

  • Frank Lehmann
    Frank Lehmann Member Posts: 126 Blue Ribbon

    Thank you for your response.

    I could remove those self closed tags before.

    It is a solution ?

  • BluShadow
    BluShadow Member, Moderator Posts: 41,978 Red Diamond

    Not really, HTML can contain all sorts of things, including embedded javascript and all sorts... It is better if it is properly treated like data that is embedded into XML, i.e. encoded data whether that is just encoding the things that would confuse it with XML, or a full encoding, for example Base64 encoded data.

    If you persist with trying to treat it like XML, then you may solve one thing, only to find other things, and then who knows what else could be included in the future.

    Just think of the scripts that can be included in the HTML, you could get a "<" and ">" character for good reasons, such as genuinely comparing less than or greater than values... so those wouldn't be tags at all.

    Proper data transfer relies on properly defined boundaries to ensure data is never corrupted (and often that forms part of an ISO or even legislative standard/requirement). Better to solve issues at source rather than try and work around them.

  • Paulzip
    Paulzip Member Posts: 8,693 Blue Diamond
    edited Jan 10, 2022 7:03PM

    The file you provided is valid XML, so can be parsed and your data should be able to be extracted. It just has embedded XHTML content (XHTML is HTML but with strict XML compatibility, so unlike HTML, all tags must be closed etc) in it.

    It's just not entirely clear what data you want?

    Do you want to extract these <p> and <span> items with XHTML content ?

    e.g. /GAEB/Award/AddText//

    or do you want to extact <item> nodes ?

    e.g. /GAEB/Award/BoQ/BoQBody/BoQCtgy[1]/BoQBody/BoQCtgy[1]/BoQBody/Itemlist/Item

    If you specify XPath of what you mean it should be doable

  • Frank Lehmann
    Frank Lehmann Member Posts: 126 Blue Ribbon
    edited Jan 11, 2022 7:19AM

    @Paulzip

    Thanks for your reply.

    Yes, i need all text under the <item> nodes for every item.

    <BoQBody><Itemlist><Item ID="GW_786144" RNoPart="0010"><Qty>76.000</Qty><QU>St</QU><Description ID="dd1c8243-9a67-4042-9f4c-7033888826c8"><CompleteText><DetailTxt><Text><p><span style="background:#FFFFFF;">PV-Modul 325 KWp</span></p> ....

    I have found a way to remove the closed tags like <p/> ! So i think, now it is valid xml.

    But how looks like the sql statement ?

  • Paulzip
    Paulzip Member Posts: 8,693 Blue Diamond
    edited Jan 11, 2022 8:59AM

    <p/> is valid XML. It's an empty node, so you don't have to do anything.

    If you create an Oracle dir and save your file renamed as neu4.xml, then something like this should get your Item nodes :

    select *
    from  XMLTable(
            XMLNamespaces(
              default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'
            )
          , '/GAEB/Award/BoQ/BoQBody//BoQCtgy/BoQBody//BoQCtgy//BoQBody/Itemlist/Item'
            passing XMLType(BFilename('PAULZIP_DIR', 'neu4.xml'), nls_charset_id('AL32UTF8'))
            columns
              Item XMLTYPE path '.'
          )
    ;
    
    [snip...]
    177 rows returned
    

    It uses recursive decent on some of the nodes, but only where necessary (as otherwise the whole DOM gets loaded into memory). It does look like you can have multiple levels of BoQBody/BoQCtgy, so your example may be just one case. In which case, your XPath may need to be :

    /GAEB/Award/BoQ//BoQBody//BoQCtgy//Itemlist/Item

  • Frank Lehmann
    Frank Lehmann Member Posts: 126 Blue Ribbon

    @Paulzip

    Hi, thank you for your response.

    I have modified your example to get all ITEMS and their textlines. But it will not work.

    Any hints ?

    declare

    ls_clob clob;

    begin

     lob_writer_plsql.read_clob(p_directory => 'UTL_KUNDEN',p_filename => 'testxml.xml',p_csid => 873, p_data => ls_clob);

     for x in (select item

           from XMLTable(

           XMLNamespaces(

              default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'

           )

         , '/GAEB/Award/BoQ/BoQBody//BoQCtgy/BoQBody//BoQCtgy//BoQBody/Itemlist/Item'

           passing xmltype(ls_clob)

           columns

             item XMLTYPE path '.'

         )) loop


         for y in

         (select qty,qu,textblock from XMLTable(

           XMLNamespaces(

              default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'

           )

         , '.'

           passing x.item

           columns

              qty varchar2(100) path 'Qty',

              qu varchar2(100) path 'QU',

              textblock xmltype path '.'

           )) loop


           for z in (select textzeile from XMLTable(

           XMLNamespaces(

              default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'

           )

         , '.'

           passing y.textblock

           columns

              textzeile varchar2(100) path '/Description/CompleteText/DetailTxt/Text/p/span'

           )) loop

            dbms_output.put_line(y.qty || ' # ' || y.qu || ' # ' || z.textzeile );

          end loop; 

         end loop;

     end loop;

    end; 

  • Paulzip
    Paulzip Member Posts: 8,693 Blue Diamond

    "But it will not work." isn't useful. Please explain why?

  • Frank Lehmann
    Frank Lehmann Member Posts: 126 Blue Ribbon

    @Paulzip

    In this case that is a Tendering of services for craftsmen. These services are as follows.
    
    first group : BoQCtgy ID="GW_786143" RNoPart="01">
    description : <span style="background:#FFFFFF;">KG 442.1 PV-ANLAGE</span>
    all items to this group : <Item ID="GW_786144" RNoPart="0010"> .. <Item ID="GW_786144" RNoPart="0160">
    
    second group : <BoQCtgy ID="GW_786160" RNoPart="02">
    description : <span style="background:#FFFFFF;">KG 443.1 ZÄHLERANLAGE</span>
    all items to this group : <Item ID="GW_786163" RNoPart="0010"> .. <Item ID="GW_786163" RNoPart="0040">
    
    third group : <BoQCtgy ID="GW_786168" RNoPart="03">
    description : <span style="background:#FFFFFF;">KG 444.1 UNTERVERTEILUNGEN</span>
    all items to this group : <Item ID="GW_786170" RNoPart="0010"> .. <Item ID="GW_786170" RNoPart="0200">
    
    fourth group : <BoQCtgy ID="GW_786191" RNoPart="04">
    description : <span style="background:#FFFFFF;">KG 444.2 KABEL UND LEITUNGEN</span>
    all items to this group : <Item ID="GW_786192" RNoPart="0010"> .. <Item ID="GW_786192" RNoPart="0250">
    
    etc.
    
    For each item i am in need of these values:
    <Qty>xxxxxx</Qty>
    <QU>yyyy</QU>
    <Text><p><span>
    
    After correct select of this xml file i have this for the first group and first item:
    
    "KG 442.1 PV-ANLAGE","0010","76.000","St","PV-Modul 325 KWp"
    "KG 442.1 PV-ANLAGE","0010","76.000","St","Rahmen             38mm Alu silber eloxiert"
    "KG 442.1 PV-ANLAGE","0010","76.000","St","Technologie:         Monokristalin"
    etc.
    


  • Paulzip
    Paulzip Member Posts: 8,693 Blue Diamond
    edited Jan 11, 2022 12:53PM

    Your nested level XPath isn't correct. This works for me...

    declare
      vXML XMLTYPe;
    begin
       vXML := XMLType(BFilename('PAULZIP_DIR', 'neu4.xml'), nls_charset_id('AL32UTF8'));
    -- lob_writer_plsql.read_clob(p_directory => 'UTL_KUNDEN',p_filename => 'testxml.xml',p_csid => 873, p_data => ls_clob);
       for x in (
         select item
         from XMLTable(
                XMLNamespaces(
                   default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'
                )
              , '/GAEB/Award/BoQ/BoQBody//BoQCtgy/BoQBody//BoQCtgy//BoQBody/Itemlist/Item'
                passing vXML
                columns
                  item XMLTYPE path '.'
              )
         ) 
       loop
         for y in (
           select qty,qu,Description
           from XMLTable(
                  XMLNamespaces(
                    default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'
                  )
                , '/Item'
                  passing x.item
                  columns
                    qty         varchar2(100) path 'Qty',
                    qu          varchar2(100) path 'QU',
                    Description xmltype       path 'Description'
                )
         ) 
         loop
           for z in (
             select textzeile 
             from XMLTable(
                    XMLNamespaces(
                      default 'http://www.gaeb.de/GAEB_DA_XML/DA83/3.3'
                    )
                  , '/Description/CompleteText/DetailTxt/Text/p/span'
                    passing y.Description
                    columns       
                      textzeile varchar2(1000) path '.'
                   )
          ) 
          loop
            dbms_output.put_line(y.qty || ' # ' || y.qu || ' # ' || z.textzeile );
          end loop; 
       end loop;
     end loop;
    end;
    
    76.000 # St # PV-Modul 325 KWp
    76.000 # St # Rahmen 			38mm Alu silber eloxiert
    76.000 # St # Technologie: 		Monokristalin
    76.000 # St # Nennleistung PMPP :		325Wp
    
    [snip..]