      • 60. Re: XML file processing into oracle
        Marco Gralike
        No worries.

        Indeed, they are referencing index organized tables, and the "NT" objects are nested table objects. IOTs are the fastest XMLType OR storage based solution (regarding SELECT). In Oracle 11g they are replaced by the default BTree (balanced tree) index based solution, mainly because it is more flexible regarding, among other things, partitioning. IOTs, as Mark has assured me and I checked of course :-) , are the most efficient index solution regarding the minimum of I/Os. If you want to use them in Oracle 11g, you will have to explicitly set this via (from memory) the OPTION parameter of DBMS_XMLSCHEMA.

        But fewer physical objects is, most of the time, faster. So if you can avoid (IOT) object creation, Oracle doesn't have to hit segment headers (= I/Os) in indexes, heap tables or IOTs and such... but missing an index where one should have been created will cost you performance. That's one of the reasons you should annotate your XML schema via xdb:annotations if using XMLType Object Relational storage. Because this can be error-prone and/or very tedious work, the Oracle XMLDB Dev team supplied an xdbutilities PL/SQL package set to make life easier.
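
        For example, registering a schema with its nested tables as IOTs would look roughly like this (a minimal sketch: the schema URL and repository path are made up, and verify the exact constant name against the DBMS_XMLSCHEMA spec of your version):

        -- Hypothetical sketch: register an XML schema in Oracle 11g and force
        -- IOT-based storage for the generated nested tables via OPTIONS.
        -- The URL and repository path are placeholders.
        BEGIN
          DBMS_XMLSCHEMA.registerSchema(
            SCHEMAURL => 'http://example.com/demo.xsd',
            SCHEMADOC => XDBURIType('/home/DEMO/demo.xsd').getCLOB(),
            LOCAL     => TRUE,
            GENTYPES  => TRUE,
            GENTABLES => TRUE,
            OPTIONS   => DBMS_XMLSCHEMA.REGISTER_NT_AS_IOT
          );
        END;
        /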

        M.
        • 61. Re: XML file processing into oracle
          730119
          Hi,

          I'm facing the same challenge of loading 10 million records (9 GB) on a weekly basis into some XML structure. I have already read the XMLType storage use case PDF,
          and most probably we need object relational storage.

          I also checked out the SAX parser loader and would like to ask the audience if this tool is still a valid choice.
          I'm just asking because it is from 2003 and maybe was written because of XML DB or SQL*Loader restrictions of that time.
          Maybe, seven years later, SQL*Loader is the better option?
          Is it nowadays really a valid option, or should I only use it if I have to parse the input file for specific elements?
          (We get one big file as input.)

          Regarding the distinction between "one big document" and "many elements in the file", I also have a question:
          From the data deliverer's point of view we get one big document (one top level with 10 million elements inside), but from our system's point of view
          we are only interested in the 10 million elements.
          <Rootelement>
            <importantElement>
            ...
            </importantElement>

            ... 10 million more ...

            <importantElement>
            ...
            </importantElement>
          </Rootelement>

          If I take the XSD as delivered and register that schema with gentab=true, Oracle creates several types and tables.
          If I load the file, I get one row in that top-level table (one root element) with 10 million importantElements in that row - is this
          a good way of doing it, or should I try to use the SAX parser and only load the importantElements (which would give me 10 million rows in the target table)?
          Or is there a way to tell SQL*Loader to skip the top-level element?
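
          To make that concrete, this is roughly the query I have in mind for getting one row per importantElement out of the single root row (just a sketch: the table and element names are placeholders, since I don't know yet what gentab=true will call them):

          -- Sketch: one row per importantElement from the single Rootelement row.
          -- ROOT_TABLE stands for whatever default table the schema registration
          -- created for Rootelement; someChild is a made-up child element.
          SELECT x.some_child
          FROM   root_table t,
                 XMLTABLE('/Rootelement/importantElement'
                          PASSING t.OBJECT_VALUE
                          COLUMNS some_child VARCHAR2(100) PATH 'someChild') x;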

          MDrake mentioned the Warner Music Group load approach - can I get this paper anywhere?

          Thanks a lot for your advice
          Chris
          • 62. Re: XML file processing into oracle
            MichaelR64
            Hi,

            I have some questions about this too, regarding sizes and the way to do it.
            I know about XML, DOM and SAX parsers. Just not in relation to the Oracle db.
            I am experienced in OWB and must now get an XML file in the db into 4 separate tables.
            I used Stylus Studio to create an XQuery and also to test if all would fit.
            (Of course that works.)

            My file is 7 GB big, needs to be split up, and must end up in 4 relational tables.
            The main element occurs 1.2 million times. Each one has about 10 child nodes, so another table will hold 10 million rows.
            The rest is small stuff.
            After processing we don't need the XML anymore; it's just a carrier for the data.
            I am working on 11gR1 but we are going to 11gR2 very shortly.

            Now what I've managed to do so far is this:
            - use an external table to load my XML file into a CLOB;
            - create an INSERT SELECT to split up the XML file, like so:
            Insert into Table1
              (filename, aanvraagnummer)
            select t.clob_filename, x.aanvraagnummer
            from EXPORT_XT t
               , xmltable(
                         XMLNAMESPACES (DEFAULT 'urn:'),
                         '/Export/Content/aanvraag'
                         passing XMLTYPE(t.XML_DATA)
                         -- column paths are relative to the row pattern above
                         columns aanvraagnummer varchar2(100) path 'aanvraagnummer'
                         ) x;
            This uses XMLTYPE to convert the CLOB in the t.XML_DATA field to an XMLTYPE.
            This works for small files, but errors on big files (it complains that an end-element tag doesn't match the start-element).
            I checked the file and it is OK. In fact I used Stylus Studio to push the same file into the db.

            Here are the questions:
            - Am I right in assuming that the XMLTYPE function creates a DOM document tree in memory? The XML dev guide is very vague about the differentiation between DOM functions and SAX functions.
            - If so, what are its limitations?
            - Would it make a difference if I uploaded the XML file as an XMLType? I read somewhere that it is stored as a CLOB anyhow..
            That would mean that the PASSING clause doesn't need an XMLTYPE conversion in it.

            - If this DOM stuff is the reason my SQL script errors, what options do I have then for handling such a large file?
            I read earlier in the thread about creating a Java package with a SAX parser in it as an answer for handling large files. Does this mean that Oracle does not have any SAX tooling or functions in the XML part of the database?
            That would seem a serious omission. The problem is not getting the file into the db, just finding the functions to easily manage the data, it seems. I thought that there would be SQL functions available, just like XMLTable/XMLType, that could do this.

            Since my project is OWB-centric I need something as simple as possible, and one that makes use of standard database stuff as much as possible (if not, then I would still like a solution :-) ).
            If I read it correctly, the SAX parser stuff is a separate program, just like a standalone utility.
            If that's the case I would rather have something inside of the database.

            I would be very pleased if someone could shed some light here.

            Edited by: MichaelR64 on 19-nov-2010 22:10
            • 63. Re: XML file processing into oracle
              mdrake-Oracle
               Your file is being reported as not well formed since it is over 4GB. The only way to handle files of this size without resorting to external Java processes is using Object Relational storage.

               You will need to obtain / create an XML Schema that reflects your XML document.

               Then you will need to register it with the database.

               Finally you will need to use FTP to upload a file of this size...

               If you post the XML Schema I can assist with some minor mods that will be required to make this work.
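
               In outline, the load side could look like this (a sketch only: the port, user and file names are examples, and on newer releases the port is set via DBMS_XDB_CONFIG instead):

               -- Sketch: make sure the XML DB FTP listener is enabled
               -- (example port; newer releases use DBMS_XDB_CONFIG.setFTPPort).
               BEGIN
                 DBMS_XDB.setFTPPort(2100);
               END;
               /
               -- Then, from a client shell (placeholders throughout):
               --   ftp> open myhost 2100
               --   ftp> user SCOTT
               --   ftp> put big_export.xml /public/big_export.xml
               -- Because the document references the registered schema, XML DB
               -- shreds it into the schema-generated tables during the upload.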
              • 64. Re: XML file processing into oracle
                mdrake-Oracle
                SQLLDR has not been enhanced to process XML files of this type...

                 If you want to use SQLLDR you would need to preprocess the XML file you are provided, creating 10 million smaller files and a SQLLDR file list that would process all of the files generated by the preprocessor..

                The SAX loader is still probably the best approach for this problem...

                 If you go down the registered schema route, for documents bigger than 1GB you will probably need to use FTP to load the documents.

                -Mark
                • 65. Re: XML file processing into oracle
                  Marco Gralike
                  >
                  Here are the questions:
                   - Am I right in assuming that the XMLTYPE function creates a DOM document tree in memory? The XML dev guide is very vague about the differentiation between DOM functions and SAX functions.
                   It depends on which XMLType you use and where (in memory or on disk, that is, table/column). You have "five" options:

                   - XMLType in memory
                   - XMLType using a CLOB storage model
                   - XMLType using an Object Relational storage model
                   (- XMLType using an Object Relational storage model that also contains parts based on the CLOB storage model - the so-called "hybrid" OR/CLOB combined storage option)
                   - XMLType using a Binary XML storage model

                   XMLType in memory, or used in combination with those storage models, will be optimized as much as possible regarding performance and handling, making optimal use of the information provided about the XML content, for example via an XML Schema, or via the way it is stored (for example schema-less Binary XML storage). Where such optimization is not possible, the request is handled the standard way, that is, via DOM or infoset based methods, e.g. the standard XML parser way.
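
                   In DDL terms, the disk-based options look roughly like this (a sketch; the table names and schema URL are made up):

                   -- Sketches of the storage clauses (names and URL are made up):
                   CREATE TABLE xml_clob_tab OF XMLTYPE
                     XMLTYPE STORE AS CLOB;                    -- CLOB storage model

                   CREATE TABLE xml_binary_tab OF XMLTYPE
                     XMLTYPE STORE AS SECUREFILE BINARY XML;   -- Binary XML storage model

                   -- Object Relational storage model, requires a registered XML Schema:
                   CREATE TABLE xml_or_tab OF XMLTYPE
                     XMLSCHEMA "http://example.com/demo.xsd" ELEMENT "Rootelement";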
                   - If so, what are its limitations?
                   So this depends on the options you choose regarding the XMLType use case.
                   - Would it make a difference if I uploaded the XML file as an XMLType? I read somewhere that it is stored as a CLOB anyhow..
                   No, it is not CLOB anyhow... It is CLOB, by default, if you don't supply Oracle with extra information. By the way, from 11.2.0.2.0 onwards the default is BINARY XML, not CLOB anymore.
                   That would mean that the PASSING clause doesn't need an XMLTYPE conversion in it.
                   There are big differences between CLOB and XMLTYPE. The easiest one is that XMLTYPE will always check whether its content is XML, that is, the XML must follow the minimal rules of being "well formed". For example, it must have only 1 root element, and elements must be defined with the proper opening and closing tags. Just like NUMBER data is checked for compliance with being a NUMBER (i.e. not a string, not a date, not a...). The CLOB datatype is a character large object... nothing more, nothing less.
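
                   A quick way to see that check in action (a trivial sketch):

                   -- Sketch: the well-formedness check in action.
                   SELECT XMLTYPE('<root><a>1</a></root>') FROM dual;  -- fine: well formed
                   SELECT XMLTYPE('<root><a>1</root>') FROM dual;      -- fails: tag mismatch
                   -- A plain CLOB column would happily accept both strings.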
                   - If this DOM stuff is the reason my SQL script errors, what options do I have then for handling such a large file?
                  See Mark's remarks...
                   I read earlier in the thread about creating a Java package with a SAX parser in it as an answer for handling large files. Does this mean that Oracle does not have any SAX tooling or functions in the XML part of the database?
                   That would seem a serious omission. The problem is not getting the file into the db, just finding the functions to easily manage the data, it seems. I thought that there would be SQL functions available, just like XMLTable/XMLType, that could do this.
                   SQL deals with relational content.
                   XQuery, XPath and XSLT deal with XML content.


                   Hope to be of help - I hope this gets you a bit further with this

                  :-)

                  Marco

                  BTW see also:

                  *"Oracle XML DB : Choosing the Best XMLType Storage Option for Your Use Case (PDF) Jan 2010"*
                  (http://www.oracle.com/technetwork/database/features/xmldb/index.html)

                  or

                  "Oracle - XMLType Storage Options"
                  (http://www.liberidu.com/blog/?p=203)

                  for a quick introduction into the material/options described.

                   More information can be found in the Oracle XMLDB Developer's Guide; for example, search within this guide for "XOB" to get an idea of the methods available to optimize your use case:
                  http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16659/toc.htm

                  Edited by: Marco Gralike on Nov 19, 2010 6:31 PM
                  • 66. Re: XML file processing into oracle
                    MichaelR64
                    Thx for the speedy response!
                    >
                    Your file is being reported as not well formed since it is over 4GB. The only way to handle files of this size without resorting to external Java processes is using Object Relational storage.
                    You will need to obtain / create an XML Schema that reflects your XML document.
                    Then you will need to register it with the database.
                    Finally you will need to use FTP to upload a file of this size...
                    If you post the XML Schema I can assist with some minor mods that will be required to make this work.
                    >

                    So to put it in a different way:
                    - over 1 GB: use FTP to upload it
                    - over 4 GB: use Object Relational storage and an XML Schema

                    But what about the processing? What functions can I use to manipulate the data: XMLTable etc., or something else?
                    • 67. Re: XML file processing into oracle
                      MichaelR64
                      >
                      SQLLDR has not been enhanced to process XML files of this type...
                      If you want to use SQLLDR you would need to preprocess the XML file you are provided, creating 10 million smaller files and a SQLLDR file list that would process all of the files generated by the preprocessor..
                      The SAX loader is still probably the best approach for this problem...
                      If you go down the registered schema route, for documents bigger than 1GB you will probably need to use FTP to load the documents.
                      >

                      This puzzles me: external tables and SQL*Loader are used all the time to load files several GBs big without any problem.
                      You're telling me to create files no longer than 1 KB and use SQL*Loader to upload those into the db?
                      Creating 10 million files on a disk......?
                      I can understand there is a limit to what SQL*Loader can process, but 1 KB cannot be it....
                      In fact I had a 6 MB file for testing purposes and that worked already.

                      Imo the problem arises when the XMLType function, used in the xmltable subquery mentioned above, addresses the CLOB and then tries to create a DOM document for the entire file, since DOM documents are generally used when exposing XPath traversal.
                      DOM documents are usually created in memory and are therefore prone to crashing when faced with very large documents.
                      Since XPath traversal is "offered" as a means to traverse the data model provided, the data provider, in this case the XMLType function used here in the subquery, has to be able to provide the entire dataset for querying. It can only do that when it loads the entire dataset into memory.
                      The entire dataset here is the CLOB file.

                      Edited by: MichaelR64 on 19-nov-2010 20:09
                      • 68. Re: XML file processing into oracle
                        Marco Gralike
                        Ehhh, yes|no

                        :-)

                        What Mark meant was:

                        because your file is so big, it can't fit "as is", as one big block, into memory, and will be chopped off at the end, after 4 GB, hence the "it's not well formed" error.

                        By making use of the different storage options, in this case XMLType Object Relational storage, for which it is mandatory to work on the basis of an XML Schema, you can, if you have one:

                        - Register the XML Schema and, out of the box, with a bit of optimization via xdb:annotations, create the table and all its other dependent objects
                        - Then use FTP to get your XML in: it will be chopped/shredded into sections, so the overall memory consumption is not that big, and the needed data is inserted in one go into the tables/objects created in step one.

                        The 4 GB size limitation applies only to a single file/document; if you use a workaround, or in this case a different approach, you can avoid this current limitation.
                        But what about the processing? What functions can I use to manipulate the data: XMLTable etc., or something else?
                        Can you describe what you want to achieve...? Thx

                        Edited by: Marco Gralike on Nov 19, 2010 8:14 PM
                        • 69. Re: XML file processing into oracle
                          Marco Gralike
                          Imo the problem arises when the XMLType function, used in the xmltable subquery mentioned above, addresses the CLOB and then tries to create a DOM document for the entire file, since DOM documents are generally used when exposing XPath traversal.
                          DOM documents are usually created in memory and are therefore prone to crashing when faced with very large documents.
                          You're correct: if you use XMLTABLE or any other XML operator on a structure that is CLOB or XMLType CLOB based, then Oracle has almost no way to optimize this more efficiently. That is the reason the XMLType CLOB storage model is no longer the default in database version 11.2.0.2.0 and onwards; in general, both options (CLOB / XMLType CLOB) should be avoided at all costs. There are only a limited number of use cases for which CLOB or XMLType CLOB would be appropriate anyway; most of the time this is not the case. If you use CLOB or XMLType CLOB, your request will almost always be handled in memory, via the traditional XML way of solving this, so via DOM.

                          In general, using one of these two, CLOB or XMLType CLOB, is asking for problems while dealing with XML in the database. You could even say that, if you use CLOB, it isn't an XML(DB) issue. If you want to handle XML as XML, use XMLType (preferably not the XMLType CLOB storage model). If you want to handle XML as a string (CLOB / character large object), you will have to write your own string parser or handle the CLOB the traditional way such data is used.

                          Edited by: Marco Gralike on Nov 19, 2010 8:25 PM
                          • 70. Re: XML file processing into oracle
                            MichaelR64
                            >

                            What Mark meant was:

                            because your file is so big, it can't fit "as is", as one big block, into memory, and will be chopped off at the end, after 4 GB, hence the "it's not well formed" error.
                            >
                            Yeah, sort of the same as what I said: it tries to load it into memory because it wants to generate a DOM document, but fails because of a shortage of memory.

                            >
                            - Register the XML Schema and, out of the box, with a bit of optimization via xdb:annotations, create the table and all its other dependent objects
                            - Then use FTP to get your XML in: it will be chopped/shredded into sections, so the overall memory consumption is not that big, and the needed data is inserted in one go into the tables/objects created in step one.
                            >
                            So when you create an xmltable and use a schema in conjunction with the XML file, the db will chop the data into sections?
                            What do you mean by this? Is it still in one table? Or multiple partitions? At what level does it chop it into sections?
                            I know my way around XML, but how Oracle handles XML has me puzzled sometimes.

                            >
                            Can you describe what you want to achieve...? Thx
                            >

                            Well, it is starting to get a bit clearer..
                            We are getting a large XML file, 7 GB, and it will increase later on.
                            We need to split it into 4 relational tables; the XML file itself can then be disposed of. It is just a carrier of relational data to us.
                            The most efficient way to process this is a SAX parser.
                            Since we want to control it all from OWB, the next question is whether or not the db has a built-in SAX parser that we can use.
                            That way we don't need JDBC drivers, passwords etc.

                            So the solution you wrote about on your blog; does that operate independently from the db or...?
                            Can you explain a bit more about it?
                            (Just a few more minutes to TVOH... :-) )
                            • 71. Re: XML file processing into oracle
                              Marco Gralike
                               Well, it is starting to get a bit clearer..
                               We are getting a large XML file, 7 GB, and it will increase later on.
                               We need to split it into 4 relational tables; the XML file itself can then be disposed of. It is just a carrier of relational data to us.
                               The most efficient way to process this is a SAX parser.
                               Since we want to control it all from OWB, the next question is whether or not the db has a built-in SAX parser that we can use.
                               That way we don't need JDBC drivers, passwords etc.

                               So the solution you wrote about on your blog; does that operate independently from the db or...?
                               Can you explain a bit more about it?
                               (Just a few more minutes to TVOH... :-) )
                               In principle, what you have to remember while handling XML via XMLType is that it shields the way storage is handled from the user / process, just like using a view can shield the actual table and column content from the business logic and methods. If you have seen the pictures mentioned in the blog (most are from the XMLDB Development Guide, btw), then your "entry point" is "just a table" regarding the use of XMLType Object Relational storage. How Oracle shreds/chops the whole into smaller pieces is actually not that important, besides the fact that that one big file has been divided into smaller ones: so small that the Oracle Optimizer can do smart, traditional relational stuff with it, without going into all the complete tree traversal stuff that is needed in the traditional XML way of handling things, and it is therefore more efficient. If properly optimized, it is as fast as the standard way of doing things in a relational database...

                               I can't tell if this is the most efficient way in your architecture. I once used this XMLType OR structure to chop stuff into smaller parts so it could be checked and analyzed more efficiently before the values of the XML document were shipped into the customer's proprietary Oracle Workflow database environment. The stored and analyzed XML in this document was also put into a CLOB for historical reasons, and after the data was in Oracle Workflow and the CLOB data was safeguarded, the initially used XMLType Object Relational table was truncated to prevent performance issues. So maybe a bit like your use case. But I can't say. It could be more efficient to use a SAX parser and insert the needed data directly into your relational environment/tables.
                              • 72. Re: XML file processing into oracle
                                Marco Gralike
                                 (Just a few more minutes to TVOH... :-) )
                                That's a good one ;-)

                                 I did my share for today; it's time for the latest "Big Bang Theory" episode and some coffee.
                                • 73. Re: XML file processing into oracle
                                  mdrake-Oracle
                                  Michael

                                   We are not going to be able to process a single 7GB+ file inside the database without using OR storage. I agree that splitting the file into millions of little files is probably not practical. However, if you WANT to use SQLLDR, that is the ONLY way to do it today, given the current state of XML support in SQLLDR. So we will assume SQLLDR is NOT an option.

                                   So your choices are:

                                   1. Use the SAX Loader Java program to process the file and load the items of interest into 4 separate XMLType tables... (This may require some modification to the SAX Loader code; I think the current version may be restricted to processing one type of element, but this would be a simple modification.)

                                   2. See if the problem can be solved using the OBJECT-RELATIONAL (OR) storage model. OR has been proven to work for this kind of problem on documents up to 13GB. OR is predicated on having an XML Schema that describes the data you want to process. OR works by storing the XML in one or more tables (each member of a collection becomes a row in a nested table when the object is stored). When XML is ingested, a PULL parser is used to read the XML, and the collection elements are flushed to disk as they are completed, avoiding the use of DOM or other structures that can require large amounts of memory. For certain arcane technical reasons this process is more efficient when the documents are loaded using FTP or HTTP than when they are loaded via a SQL operation, so you will probably have to use FTP in order to be successful.

                                   When we query XML that has been stored this way, we are able to rewrite certain types of XQuery operation onto the underlying storage, so we do not need to reconstruct the XML document in order to produce the results of the XQuery operation. Since you are populating existing relational tables with scalar values extracted from your XML, it's likely that the final approach to this problem will be to use XQuery and XMLTable to create relational views over the XML content, and then use insert as select or create table as select to migrate the data from the tables underlying the XML into the relational tables, as sketched below.
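
                                   Roughly like this (a sketch only; the names are placeholders based on the earlier XMLTable example in this thread, not your real schema):

                                   -- Sketch: relational view over the OR-stored XML
                                   CREATE OR REPLACE VIEW aanvraag_v AS
                                   SELECT x.aanvraagnummer
                                   FROM   export_or_tab t,
                                          XMLTABLE(XMLNAMESPACES(DEFAULT 'urn:'),
                                                   '/Export/Content/aanvraag'
                                                   PASSING t.OBJECT_VALUE
                                                   COLUMNS aanvraagnummer VARCHAR2(100)
                                                           PATH 'aanvraagnummer') x;

                                   -- With OR storage the XQuery above is rewritten onto the
                                   -- underlying tables, so the migration is a plain insert as select:
                                   INSERT INTO table1 (aanvraagnummer)
                                   SELECT aanvraagnummer FROM aanvraag_v;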

                                  Let's see if we can work your example with OR...

                                   Do you have / can you create an XML Schema that accurately describes the data you are looking to process...?

                                  -M
                                  • 74. Re: XML file processing into oracle
                                    mdrake-Oracle
                                     TVOH: acronym translation please (I'm assuming it's a variant of POETS day or TGIF) :)

                                    Edited by: mdrake on Nov 19, 2010 11:59 AM