5 Replies Latest reply: Dec 23, 2011 6:26 AM by tsuji RSS

    XML/XSD question (using ODI)

    604709
      Hi all.
      I have posted this in the ODI forum as well, but this may be a more proper place to ask for help.
      I'm working on a new dwh solution where the main source of data will come from XML-files. The problem is that we are having difficulties with the use of abstract types in the xsd-file. We currently use ODI to read the XML-files and to store the data in our database, but all fields from the XML-files are not visible in the target tables.

      The problem can be simplified like this example:

      We have a main element, testElement, which can contain one or more publications.
      Publication is of type PublicationType, and PublicationType is an abstract that contains title, author and date.
      We have four other types which extends PublicationType; BookType, MagazineType, NewspaperType and AdsType. They all contain additional fields.

      XSD-file
      <?xml version="1.0" encoding="UTF-8"?>
      <schema xmlns="http://www.w3.org/2001/XMLSchema"
      targetNamespace="urn:testing:kontroll:example:oppgave:v1"
      xmlns:tns="urn:testing:kontroll:example:oppgave:v1"
      xmlns:xdb="http://xmlns.oracle.com/xdb"
      elementFormDefault="qualified">
      <element name="testElement" type="tns:TestElementType" xdb:defaultTable="TEST_TAB" />
      <complexType name="TestElementType">
      <sequence>
      <element name="publication" type="tns:PublicationType" minOccurs="1"
      maxOccurs="unbounded" />
      </sequence>
      </complexType>
      <complexType name="PublicationType" abstract="true">
      <sequence>
      <element name="title" type="string"/>
      <element name="author" type="string" minOccurs="0"
      maxOccurs="unbounded" />
      <element name="date" type="string"/>
      </sequence>
      </complexType>
      <complexType name="BookType">
      <complexContent>
      <extension base="tns:PublicationType">
      <sequence>
      <element name="ISBN" type="string"/>
      <element name="publisher" type="string"/>
      </sequence>
      </extension>
      </complexContent>
      </complexType>
      <complexType name="MagazineType">
      <complexContent>
      <extension base="tns:PublicationType">
      <sequence>
      <element name="editor" type="string"/>
      <element name="period" type="string" minOccurs="0"
      maxOccurs="1"/>
      </sequence>
      </extension>
      </complexContent>
      </complexType>
      <complexType name="NewspaperType">
      <complexContent>
      <extension base="tns:PublicationType">
      <sequence>
      <element name="daily" type="boolean"/>
      <element name="owner" type="string" minOccurs="0"
      maxOccurs="1"/>
      </sequence>
      </extension>
      </complexContent>
      </complexType>
      <complexType name="AdsType">
      <complexContent>
      <extension base="tns:PublicationType">
      <sequence>
      <element name="company" type="string"/>
      <element name="article" type="string" />
      </sequence>
      </extension>
      </complexContent>
      </complexType>
      </schema>

      XML-file
      <?xml version="1.0" encoding="UTF-8"?>
      <tns:testElement xmlns:tns="urn:testing:kontroll:example:oppgave:v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:testing:kontroll:example:oppgave:v1 ExampleXMLSchema.xsd ">
      <tns:publication xsi:type="tns:BookType">
      <tns:title>Boken</tns:title>
      <tns:author>Arne Svendsen</tns:author>
      <tns:date>2001</tns:date>
      <tns:ISBN>78979797</tns:ISBN>
      <tns:publisher>The Company Ltd</tns:publisher>
      </tns:publication>
      <tns:publication xsi:type="tns:MagazineType">
      <tns:title>Fancy Magazine</tns:title>
      <tns:author>Mads Madsen</tns:author>
      <tns:date>2011</tns:date>
      <tns:editor>Svante Svantesen</tns:editor>
      <tns:period>weekly</tns:period>
      </tns:publication>
      </tns:testElement>

      When tables are generated in the database through ODI I'm not getting all the attributes present in the xml-file.
      Can anybody tell me if this should work (and, if yes, why it doesn`t)? Or if the XSD/XML looks wrong in some way?
      Is this a known limitation in Oracle or ODI etc.?
      Any pointers to documentation describing similar problems would also be helpful.

      Thanks,
      Bjørn
        • 1. Re: XML/XSD question (using ODI)
          tsuji
          Came across this:
          http://www.business-intelligence-quotient.com/?tag=odiinvokewebservice
          in particular
          There is one issue with this, however. The ODI XML parser doesn’t like xsi attributes such as xsi:type.
          Maybe it would help to determine what to do with the issue.
          • 2. Re: XML/XSD question (using ODI)
            odie_63
            Hi, Bjørn,
            When tables are generated in the database through ODI I'm not getting all the attributes present in the xml-file.
            I don't know ODI, so I'm just curious here : what tables are generated? Is there one table per extented type (+ child tables for repeating elements), or just one Publication table with missing columns?

            I've looked at the other thread you mentioned on the ODI forum, and saw that you'd also want to store XML files in an XMLType table.
            I tested the schema registration in the database to see how Oracle reacts to the structure, and it looks OK, type extensions are supported.
            SQL> begin
              2   dbms_xmlschema.registerSchema(
              3     schemaURL => 'ExampleXMLSchema.xsd'
              4   , schemaDoc => bfilename('TEST_DIR', 'ExampleXMLSchema.xsd')
              5   , local => true
              6   , genTypes => true
              7   , genTables => true
              8   , enableHierarchy => dbms_xmlschema.ENABLE_HIERARCHY_NONE
              9   );
             10  end;
             11  /
             
            PL/SQL procedure successfully completed
             
            SQL> insert into test_tab
              2  values (xmltype('<?xml version="1.0" encoding="UTF-8"?>
              3  <tns:testElement xmlns:tns="urn:testing:kontroll:example:oppgave:v1"
              4                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              5                   xsi:schemaLocation="urn:testing:kontroll:example:oppgave:v1 ExampleXMLSchema.xsd ">
              6    <tns:publication xsi:type="tns:BookType">
              7      <tns:title>Boken</tns:title>
              8      <tns:author>Arne Svendsen</tns:author>
              9      <tns:date>2001</tns:date>
             10      <tns:ISBN>78979797</tns:ISBN>
             11      <tns:publisher>The Company Ltd</tns:publisher>
             12    </tns:publication>
             13    <tns:publication xsi:type="tns:MagazineType">
             14      <tns:title>Fancy Magazine</tns:title>
             15      <tns:author>Mads Madsen</tns:author>
             16      <tns:author>Arne Svendsen</tns:author>
             17      <tns:date>2011</tns:date>
             18      <tns:editor>Svante Svantesen</tns:editor>
             19      <tns:period>weekly</tns:period>
             20    </tns:publication>
             21  </tns:testElement>'))
             22  ;
             
            1 row inserted
             
            Querying as relational data :
            SQL> select x.*
              2  from test_tab t
              3     , xmltable(xmlnamespaces(default 'urn:testing:kontroll:example:oppgave:v1'),
              4       'for $i in /testElement/publication
              5        return element r {
              6          $i/child::*
              7        , element pubtype {
              8            typeswitch($i)
              9              case element(publication, BookType)      return "Book"
             10              case element(publication, MagazineType)  return "Magazine"
             11              case element(publication, AdsType)       return "Ads"
             12              case element(publication, NewspaperType) return "Newspaper"
             13              default return "Publication"
             14          }
             15        }'
             16       passing t.object_value
             17       columns title   varchar2(500) path 'title'
             18             , authors varchar2(500) path 'string-join(author,",")'
             19             , pubdate number(4)     path 'date'
             20             , isbn    number(13)    path 'ISBN'
             21             , editor  varchar2(500) path 'editor'
             22             , period  varchar2(500) path 'period'
             23             , pubtype varchar2(30)  path 'pubtype'
             24       ) x
             25  ;
             
            TITLE                 AUTHORS                        PUBDATE           ISBN EDITOR                PERIOD      PUBTYPE
            --------------------- ------------------------------ ------- -------------- --------------------- ----------- ------------------------------
            Boken                 Arne Svendsen                     2001       78979797                                   Book
            Fancy Magazine        Mads Madsen,Arne Svendsen         2011                Svante Svantesen      weekly      Magazine
             
            • 3. Re: XML/XSD question (using ODI)
              604709
              Hi odie and thanks for your reply.
              I´m getting three tables here with the following data (two rows in the two first tables and one in the last).

              Table AUTHOR DATA
              AUTHORORDER 1 1
              AUTHOR_DATA Arne Svendsen Mads MAdsen
              PUBLICATIONFK 0 1

              Table PUBLICATION DATA
              DATE_ 2001 2011
              DATE_ORDER 2 2
              PUBLICATIONORDER 0 1
              PUBLICATIONPK 0 1
              TESTELEMENTFK 0 0
              TITLE Boken Fancy Magazine
              TITLEORDER 0 0

              Table TESTELEMENT DATA
              SNPSFILENAME ExampleXMLSchema.xml
              SNPSFILEPATH /u01/oracle/product/11.1.1.5/odi/xmldir
              SNPSLOADDATE 12/22/11 7:42 AM
              TESTELEMENTPK 0

              As you can see ODI automatically inserts foreign key relationship between the different levels in the XML structure. Which is fine, and what I was hoping for. But I´m not getting any data about e.g. the publisher, the editor, period etc. that you can see in the xml-file.

              Regards
              Bjørn
              • 4. Re: XML/XSD question (using ODI)
                604709
                Interesting article tsuji.
                I tried to remove the xsi: bits in the xml, but still not getting all the attributes from the file.

                Thanks,
                Bjørn
                • 5. Re: XML/XSD question (using ODI)
                  tsuji
                  Hi Bjørn. The sense of the line I quoted for highlighting means only that when you specify the type of the element at runtime within the xml document instance, the ODI parser implementation seems incomplete to handle it in all conceiveable situations. It goes without saying that you cannot simply remove xsi:type as a workaround because it is then an invalid document as per the schema for it.

                  If you have full control of the xml construction, you can keep ProductType as abstract and you have to make BookType or MagazineType referred to elements of different name from that of production. Those elements be of different names can then be declared as a substitutionGroup with the header element "production".

                  But since now you have two different element names that can appear in the xml instance (whereas production cannot appear in it as ProductType is declared "abstract"), this line of reasoning may be material enough to provoke some associate change in the mapping or whatever to map them to the right tables...

                  In other word, at least, I did not mean to suggest you simply remove xsi:type as a possible solution, no. It definitely is not one as it would be an invalid document instance. I would suggest you look into ODI's documentation and the reference tutorial therein the article quoted to check what possible actions you can take to avoid this weak-point in its implementation as of the present (because the schema is fine as such and the xml instance is fine as such, only the implementation is not up to the job to handle that dark corner).