1 2 3 4 Previous Next 86 Replies Latest reply: Nov 19, 2010 3:57 PM by Marco Gralike Go to original post RSS
      • 15. Re: XML file processing into oracle
        783602
        need further information if poss
        • 16. Re: XML file processing into oracle
          BluShadow
          There's probably a way that Mark knows of, but alternatively you could load your XML as you are doing and then Insert it into the schema generated table.
          • 17. Re: XML file processing into oracle
            mdrake-Oracle
            The DefaultTable has to be an XMLType Table for various reasons. The concepts behind the default table iare related to the repository, so in theory you should not need to use the default unless you are using the repository (eg want a file/folder organziation or access paradigm for the XML you are managing). However, for a complex XML schema the DDL to create an XML table or XML Column that uses the same storage model as the defaultTable generated by the Schema Reigstration can get very complex.

            In order to understand this we need to look at how collections are managed, and how recursive structures and very large subtrees are managed.

            The first thing to note is that each repeating element defined in the XML Schema will result in a a VARRAY in the SQL object model.

            There are 2 ways VARRAYS can be stored 2 ways in the database.

            The first is to serialize the contents of the VARRAY and then store the serialzied object as a LOB. Since the VARRAY is serialized any nested VARRAYS are stored in the LOB as the parent.

            The second is store the VARRAY in a seperate table . These tables are referred to as Nested Tables, and under normal circumstances are not visable via SQL, and should never be accessed directly. Each member of the VARRAY becomes a row in the table. If the VARRAYS are nested (VARRAYS contain VARRAYS) a heirarchiy of nested tables is created, one for each nested VARRAY. The structure created here will look very similar to a convential Master-Detail heirarchy. Primary key - Foriegn key relationships and ordering columns, along with the appropriate indexes are automatically created to ensure the relationships between all the rows are maintained.

            The advantage of the LOB model is that all of the data for the VARRAY is stored in a single structure, which minimizes the amount of I/O required to store or retrieve the entire collection, thus improving insert and retrieval times for the collection. Also the DDL requried to create the storage is very simple. The disadvantages of the LOB model are that the entire set of data must be read from disc and bought into memory before any operations can be performed on any of the data in the collection. Also it is not possilbe to effectiively index any of the data contained in the collection.

            The advantage of the NESTED TABLE model is that the database can access rows and columns in the Nested Tables in exactly the same manner as it accesses rows and columns in normal relational tables. Operations which access data in the VARRAY are automatically converted to operations on the underlying storage structure. The data for a simple element (text node) or attriubte can be accessed directly, without having to access any of it parents or siblings. Nested Table storage allows indexes to be created on the values of the simple elements or attributes values. Updates of simple elements or attributes can be done without updating any other content. Operations that involving adding or removing members from a collection can also be optimized. Deleting members from a collecton translates into deleting a row from a nested table. Adding a new member to a collection translates into inserting a row into a nested tables. An XMLTable operation that projects a subset of valuies from a collection is mapped directly into a SQL statements that selects the required columns from the nested tables.

            The disadvantage of the NESTED Table model are 2 fold. First each member of the collection is a seperate row in a nested table. If there are nested collections, each member of the sub colllection generates a row in the approriate child table. This can add overhead to the the process of storing an XMLType, since the model will insert multiple rows in multiple tables compared to storing a single LOB. Also when the entire document is to be retrieved as XML then all of the rows in all of the tables must be read before the XML can be output, again this can be more expensive than the overhead associated with reading a single LOB. The second disadvantage of the NESTED TABLE storage model is the DLL can get complex when there is a deep hierarchy of nested collections.

            For the default table generated by schema registration the decision on whether to use NESTD TABLE or LOB based storage for collections is driven by the annotation storeVarrayAsTable. If set to 'true' the default table will use NESTED TABLE storage model, if set to false it will use LOB based storage model. In 10.2.x and earlier the default for storeVarrayAsTable was false. Eg we used LOB storage for the default table if storeVarrayAsTable was not explicitly supplied in the XML Schema. In 11.1.x and later we switched the default, so NESTED TABLE storage is used by default.

            In both 10.2.x and 11.1.x if a table is created manually, then we default to LOB storage for the collections, regardless of what the XML Schema specifies. The storage model for the XMLType can be overridden by providign explicit STORE VARRAY AS clauses for each of the collections. However as noted before the syntax for these can get really complex if the XML Schema contains lots of nested collections.

            In 11.2.x this changes. The create table statement now follows the directive specified in the XML Schema. Eg if the XML Schema contains an explicit varrayStoreAsTable annotation then the XMLType table or column will honour the annotation. If the XML Schema does not contain an explicit storeVarrayAstable clause if will default to using NESTED TABLES.

            The nice thing about this is that it provides us with an easy way of getting the correct syntax for earlier databases as long as we have access to an 11.2.x instance. We can register the XML schema in 11.2.x, issue the DDL to create the tables that we want to generate and then use the DBMS_METADATA package to get a first cut of the required DDL statements.
            • 18. Re: XML file processing into oracle
              mdrake-Oracle
              Eg
              SQL> create table FOO
                2  (
                3    PK NUMBER(2) PRIMARY KEY,
                4    XML XMLTYPE
                5  )
                6  XMLTYPE COLUMN XML store as OBJECT RELATIONAL
                7  XMLSCHEMA "http://xmlns.example.org/xsd/testcase.xsd" Element "csg"
                8  /
              
              Table created.
              
              Elapsed: 00:00:01.43
              SQL> set pages 0 lines 256 trimspool on
              SQL> column DDL format A256
              SQL> select DBMS_METADATA.get_DDL('TABLE','FOO') DDL from dual;
              
              
                CREATE TABLE "OTN"."FOO"
                 (    "PK" NUMBER(2,0),
                      "XML" "SYS"."XMLTYPE" ,
                       PRIMARY KEY ("PK")
                USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS"  ENABLE
                 ) SEGMENT CREATION IMMEDIATE
                PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS"
               VARRAY "XML"."XMLDATA"."ecrminput" STORE AS TABLE "SYS_NTivd+l9nd5WjgQAB/AQAg3w=="
               (( PRIMARY KEY ("NESTED_TABLE_ID", "SYS_NC_ARRAY_INDEX$")
                USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS"  ENABLE)
              PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS"
               VARRAY "xml" STORE AS TABLE "SYS_NTivd+l9ne5WjgQAB/AQAg3w=="
               (( PRIMARY KEY ("NESTED_TABLE_ID", "SYS_NC_ARRAY_INDEX$")
                USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS"  ENABLE)
              PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 LOGGING
                STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
                PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
                TABLESPACE "USERS" ) RETURN AS LOCATOR) RETURN AS LOCATOR
                XMLTYPE COLUMN "XML" XMLSCHEMA "http://xmlns.example.org/xsd/testcase.xsd" ELEMENT "csg" ID 5566
              
              
              Elapsed: 00:00:04.58
              Which can be trimmed down to
              SQL> CREATE TABLE "OTN"."FOO1"
                2  (
                3    "PK" NUMBER(2,0),
                4    "XML" "SYS"."XMLTYPE" ,
                5    PRIMARY KEY ("PK")
                6  )
                7  VARRAY "XML"."XMLDATA"."ecrminput" STORE AS TABLE "ECRMINPUT_TABLE"
                8  (
                9    (
               10      PRIMARY KEY ("NESTED_TABLE_ID", "SYS_NC_ARRAY_INDEX$")
               11    )
               12    VARRAY "xml" STORE AS TABLE "XML_TABLE"
               13    (
               14      (
               15        PRIMARY KEY ("NESTED_TABLE_ID", "SYS_NC_ARRAY_INDEX$"))
               16      ) RETURN AS LOCATOR
               17  ) RETURN AS LOCATOR
               18  XMLTYPE COLUMN "XML" XMLSCHEMA "http://xmlns.example.org/xsd/testcase.xsd" ELEMENT "csg"
               19  /
              
              Table created.
              
              Elapsed: 00:00:00.43
              SQL>
              Which should run in 10.2.x and later...

              This is examples was not particularly complex, however if there are large numbers of collections the number and complexity of the STORE VARRAY AS TABLE clauses will increase correspondanly
              • 19. Re: XML file processing into oracle
                mdrake-Oracle
                BTW the psuedo column XMLDATA can be used to load data in an XMLType table in SQLLOADER

                eg
                C:\xdb\examples\sqlLoader>cat loadObjectTable.ctl
                load data
                infile 'filelist.dat'
                append
                into table PURCHASEORDER
                xmltype(XMLDATA)
                (
                 filename filler char(120),
                 XMLDATA  lobfile(filename) terminated by eof
                )
                • 20. Re: XML file processing into oracle
                  mdrake-Oracle
                  BTW I will be presenting a paper on this topic and how to manage really large complex XML Schemas at OOW this year...
                  • 21. Re: XML file processing into oracle
                    783602
                    Hi Mark,

                    Thanks for all your help on this.

                    Am still having issues loading the xml file into the generated table (from the shema registration). I can create a table:

                    create table xml_holding_table (xml_data xmltype) and use:

                    OPTIONS(DIRECT=TRUE, ERRORS=50, PARALLEL=FALSE)
                    LOAD DATA
                    INFILE 'infile.dat'
                    BADFILE 'xml_load.bad'
                    DISCARDFILE 'xml_load.discard'
                    TRUNCATE
                    INTO TABLE xml_talent_customer
                    xmltype(XML_DATA)
                    (
                    filename FILLER TERMINATED by '\n',
                    XML_DATA LOBFILE(filename) TERMINATED BY EOF
                    )

                    which loads the xml document in fine. But if I try to load the data in to the generated table:

                    SQL> desc xml_talent_customer
                    Name Type Nullable Default Comments
                    --------------- ----------- -------- ------- --------
                    SYS_NC_ROWINFO$ SYS.XMLTYPE Y

                    using ....

                    OPTIONS(DIRECT=TRUE, ERRORS=50, PARALLEL=FALSE)
                    LOAD DATA
                    INFILE 'infile.dat'
                    BADFILE 'xml_load.bad'
                    DISCARDFILE 'xml_load.discard'
                    TRUNCATE
                    INTO TABLE xml_talent_customer
                    xmltype(XMLDATA)
                    (
                    filename FILLER TERMINATED by '\n',
                    XMLDATA LOBFILE(filename) TERMINATED BY EOF
                    )

                    i get al load of these errors in the log file:

                    Conversion Error on row 0 column 3 in table "SYS_NT9zzSRr7XQYG5rSTpcvN+Gw=="
                    ORA-26007: invalid value for SETID or OID column

                    Conversion Error on row 1 column 3 in table "SYS_NT9zzSRr7XQYG5rSTpcvN+Gw=="
                    ORA-26007: invalid value for SETID or OID column

                    Conversion Error on row 2 column 3 in table "SYS_NT9zzSRr7XQYG5rSTpcvN+Gw=="
                    ORA-26007: invalid value for SETID or OID column.

                    Again, any ideas??

                    BTW whats the OOW and where is it??

                    Cheers,

                    Dan.
                    • 22. Re: XML file processing into oracle
                      783602
                      Would like to say one more thing but am sure there will be more :)

                      How will sql loader handle fairly large xml files? as part of the initial refresh i need to load about 6 million records all of which will have about 40 <xml> elements. Will this be handled ok by sql loader?

                      Thanks Again.
                      • 23. Re: XML file processing into oracle
                        mdrake-Oracle
                        Are we talking one large document or 6 million seperate docs
                        • 24. Re: XML file processing into oracle
                          783602
                          Yeah sorry it will be one large document with about 6 million records (parent nodes) in. Will this be ok to load into one xmltype field and will there be any major deterioration in performance when trying the select out the individual node information??
                          • 25. Re: XML file processing into oracle
                            Marco Gralike
                            BTW whats the OOW and where is it??
                            From time to time one forgets that there are normal people out there as well ;-)

                            @Dan

                            "OOW" stands for "Oracle Open World" and is the biggest IT conference held each year in San Francisco / USA (http://www.oracle.com/us/openworld/index.htm) world wide AFAIK.
                            I guess this year, combining JavaOne, Oracle Develop and Oracle Open World, approx. attending 40/50.000 attendees........

                            Edited by: Marco Gralike on Jul 14, 2010 9:42 PM
                            • 26. Re: XML file processing into oracle
                              NSK2KSN
                              Can you please let me know how schema can be changed for different requirement,
                              • 27. Re: XML file processing into oracle
                                mdrake-Oracle
                                Dan

                                One big file will probably NOT go well through SQL Loader... How big is the file, which DB Version are we targetting here..

                                Do you care about the outer most Tag.. It appears that csg is simply a wrapper around a collection of ecrminput documents. Is this correct. If so do you care about which of the ecrminput tags came from which csg file...

                                If you are not concerned about the csg tag I would consider using the Sax Loader approach. This basically uses a Java program to extract and insert each ecrminpuit document. It can be run in parallel and will allow you to load the document very quickly. However it will not track which ercminput nodes came from which csg document...

                                If you need to keep the csg document intact then we will need to look at FTP. FTP (in 10.2.0.4.0 and later) contains a fast path optimization for inserts that can load documents of this kind very rapidly, and with significant less memory usage.To enable the fast path you add the attribute blockDefault="#all" to the schema tag of your XML schema and re-register it. You then enable the FTP server using DBMS_XDB.setFTPPOrt() and off you go.

                                -Mark

                                BTW Warner Music Group this approach to load documents up to 4.3 GB, with a much more complex structure than the one disucssed here, and they will be presenting a paper on how this is working out for them, and how 11.2.x and improved performance over 10..2.0.4.0 at OOW.

                                Edited by: mdrake on Jul 15, 2010 7:26 AM

                                Edited by: mdrake on Jul 15, 2010 7:28 AM

                                Edited by: mdrake on Jul 15, 2010 9:01 AM
                                • 28. Re: XML file processing into oracle
                                  mdrake-Oracle
                                  Can you explain what you mean. Are you asking how to create an XML Schema for a differennt kind of instance document ?. If so have a look at XMLSPy from Altova
                                  • 29. Re: XML file processing into oracle
                                    Marco Gralike
                                    A SAX Loader example "howto", based on Marks code somewhere here on this forum, but also for historical reasons on my blog (getting old re-inventing stuff), see here:

                                    http://www.liberidu.com/blog/?p=473

                                    HTH

                                    M
                                    1 2 3 4 Previous Next