This discussion is archived
14 Replies Latest reply: Mar 13, 2013 7:23 AM by Jason_(A_Non) RSS

Parsing an XML file to a table

703519 Newbie
Currently Being Moderated
Hi,

I am looking at a scenario where right now we have a daily data feed that is an xml file that is approx between 1 to 2 GB in size. Using Java code, this file is parsed and one by one, the records are inserted into the DB (Oracle 10.2). But as the data feed is growing larger, so is the file size -- which is resulting in more time taken for this process to run.

I do realize that the ideal scenario would be to stop using the file as a source of feed. But this is beyond our control.

I looked at using sqlLoader but as I understand it requires a csv based file. In my case, I would have to convert the xml file to csv, which again might be a huge operation to be carried out daily. (Edit: Am I wrong? can we use Sql Loader here?)

Thus I am exploring options if there could be a way to write a PL/SQL procedure that would parse after taking this xml file as an input and load the data into a table. Any suggestions/directions will be hugely appreciated.

Thanks,
Ak

Edit:
I also browsed through a few questions asked previously here and I came across this: Import Large XML File to Table
My situation is similar and as per the suggestion here by A_Non, SECUREFILE BINARY XML storage can be used. But the problem I have is that I am on Oracle 10.2 while this option is only available in a later version, if I am right.

Edited by: Aditya Kumar on Nov 21, 2012 10:49 PM

Edited by: Aditya Kumar on Nov 22, 2012 12:20 AM
  • 1. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    Hi,

    Indeed you can't use Binary XML in your version.
    Which version is it BTW? Give the full number please :
    SELECT * FROM v$version;
    On 10.2, I think the best option is to use Object-Relational storage.
    For that, you need an XML schema describing your XML files, do you have one?

    If you don't, you can build one, it may be relatively easy depending on the complexity of the document.
    Could you post a sample XML as well?

    Once you have the schema, register it in the database and create a schema-based XMLType table (that step can be done automatically).
    Load the file in the table, preferably via the XML DB repository (FTP protocol).
    Then you can read and insert the data from the XMLType table to your target table with a simple INSERT ... SELECT.

    I suggest you invest some time in reading those :
    XML DB Dev Guide : http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb03usg.htm#sthref230
    XML DB FAQ : XML DB FAQ

    If you need further help, please give additional details (exact db version, sample XML, target table structure, mapping between XML elements/attributes and target columns etc.).
  • 2. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Thanks a lot for your reply Odie.

    Exact Oracle version is Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

    A couple of doubts here:

    1. The file size would range 1-2 GB (and can go beyond) Is SQLLoader also an option feasible in this scenario?
    2. The link you mentioned has this line: Storing the XML document in Oracle Database using Binary File (BFILE) -- How is this different from Binary XML in 11g? I am confused.
    3. With respect to storage space usage, should we be concerned about the storage space/ memory that the operation of loading the file in this (temporary) table will take? Because right now, the xml feed file is residing in the webserver where java code iterates through it and inserts each record in the table. Now this will change and technically this file will have to go in the DB box.


    We do not have the xsd for it as right now we retrieve a few selected tags from this xml, parse and store it in a table (columns mapped to tags). Below is the sample xml. This is one such record and there are thousands below it. I had to change things because of data confidentiality but it mimics the original.

    <?xml version="1.0" encoding="windows-1250"?>
    <Involvements xmlns="com/xyz/us/abc/v0/ijkdatatypes" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <ijk:Involvement xmlns:ijk="com/xyz/us/abc/v0/ijkdatatypes">
    <ijk:PCenterData>
    <ijk:HPCenterData>
    <ijk:HPCenter/>
    <ijk:HPCName/>
    </ijk:HPCenterData>
    </ijk:PCenterData>
    <ijk:BillableWBSCode>00000000</ijk:BillableWBSCode>
    <ijk:EDescription/>
    <ijk:InvolvementType>0</ijk:InvolvementType>
    <ijk:ProductData>
    <ijk:ICode/>
    <ijk:Product/>
    <ijk:ProductName/>
    </ijk:ProductData>
    <ijk:PartnerData>
    <ijk:BPartnerData>
    <ijk:BPartnerNumber/>
    </ijk:BPartnerData>
    <ijk:PPartnerData>
    <ijk:PPartnerNumber/>
    </ijk:PPartnerData>
    <ijk:LeadPartnerData>
    <ijk:LeadCSPNumber/>
    </ijk:LeadPartnerData>
    </ijk:PartnerData>
    <ijk:ManagerData>
    <ijk:BManagerInformation>
    <ijk:BManagerNumber/>
    </ijk:BManagerInformation>
    <ijk:PerformanceManagerInformation>
    <ijk:PerformanceManagerNumber/>
    </ijk:PerformanceManagerInformation>
    </ijk:ManagerData>
    <ijk:InvolvementDateData>
    <ijk:InvolvementstartDate/>
    <ijk:InvolvementTerminationDate/>
    </ijk:InvolvementDateData>
    <ijk:HostRegion/>
    <ijk:WBSD>
    <ijk:WBSC>00000000</ijk:WBSC>
    <ijk:WBSDescription/>
    <ijk:ContractCode/>
    <ijk:ContractLine/>
    <ijk:ContractLineDescription/>
    </ijk:WBSD>
    <ijk:PDIndicator>false</ijk:PDIndicator>
    <ijk:CompanyCode/>
    <ijk:ClientCode/>
    <ijk:ContractCode/>
    </ijk:Involvement>
         .
         .
         .
    </Involvements>
  • 3. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    1. The file size would range 1-2 GB (and can go beyond) Is SQLLoader also an option feasible in this scenario?
    Yes, you can use SQL*Loader too, but it's likely you'll achieve a better throughput with the FTP approach.
    2. The link you mentioned has this line: Storing the XML document in Oracle Database using Binary File (BFILE) -- How is this different from Binary XML in 11g? I am confused.
    Those are two different, unrelated things.
    BFILE is an Oracle datatype, basically it's a pointer to an external file in the OS filesystem.
    3. With respect to storage space usage, should we be concerned about the storage space/ memory that the operation of loading the file in this (temporary) table will take? Because right now, the xml feed file is residing in the webserver where java code iterates through it and inserts each record in the table. Now this will change and technically this file will have to go in the DB box.
    What method are you currently using BTW? SAX parsing?
    How long does it take to load a file in the DB? How much improvement are you expecting?

    With the (temp) table, the memory footprint should be reduced, both for the loading process and the storage (only the data is stored).
    We do not have the xsd for it as right now
    OK.
    The structure is simple, it should be easy to build one then.
  • 4. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Odie,

    Thanks for your response. Right now we are not in a position to implement this solution rightaway because of some constraints but as and when this happens I will update it here.

    Excellent to have this conversation with you here.

    À votre bon cœur !

    ak

    Edited by: adityeah on Nov 23, 2012 1:09 PM
  • 5. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Hi Odie,

    Today I tried to do things as per your suggestion. Here's a brief:

    1. Generated the schema for my xml data (via a free online tool).
    2. Loaded the schema using createResource and registerSchema.
    3. Loaded the xml data via createResource.

    While doing these steps I did not encounter any error. However when I go to check if the data is present in the default table I see that it is not there. Looking at this link: The Schema Registration and Schema Location Hint Thread It is mentioned that, "There are two reasons why the content of a Schema Based XML document is not accessable via deafult table. The first is that the document is protected with an ACL that forbids the user searching for the document from seeing it. The second is that the document was not recognized as being a member of the class defined by the XML Schema and so was stored as non-schema-based in the XML DB repository rather than as schema based XML in the default table."

    The way I loaded the xml is:

    DBMS_XDB.createResource('/public/demo/xml/myData.xml',
    BFILENAME('XMLDIR','myData.xml'),
    nls_charset_id('AL32UTF8'));

    I am also able to see the xml when I go to http://localhost:8080/public/demo/xml/

    But when I query the default table, I get no results. Also, just a FYI, select * from V$VERSION returns:

    Oracle Database 10g Express Edition Release 10.2.0.1.0 - Product
    PL/SQL Release 10.2.0.1.0 - Production
    CORE 10.2.0.1.0 Production
    TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
    NLSRTL Version 10.2.0.1.0 - Production


    Can you please suggest something. I have spent the whole day today and it is out of despair that I am asking you again. Would really appreciate a response soon.

    Thanks!

    Aditya
  • 6. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    Hi,
    "There are two reasons why the content of a Schema Based XML document is not accessable via deafult table. The first is that the document is protected with an ACL that forbids the user searching for the document from seeing it. The second is that the document was not recognized as being a member of the class defined by the XML Schema and so was stored as non-schema-based in the XML DB repository rather than as schema based XML in the default table."
    You're probably in the 2nd scenario here, i.e. the XML document is not schema-based.

    There are different ways to declare a document schema-based, but first I'd like to see what you're doing exactly, so could you post both the schema and the sample XML you're using?
  • 7. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Odie,

    So could it be that the schema is not right then?

    Is there a way I can probably email you those files? Or any way I don't have to post them here?

    [edit: I'm a little concerned about the confidentiality hence would really appreciate if I could email them to you]

    Edited by: adityeah on Nov 24, 2012 11:51 PM
  • 8. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    So could it be that the schema is not right then?
    Or the XML file.
    Is there a way I can probably email you those files? Or any way I don't have to post them here?
    mb[dot]perso[at]wanadoo[dot]fr
  • 9. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Thanks! sent you email!
  • 10. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    OK, it's a minor issue.
    As foreseen, the XML document is not recognized as a schema-based instance when the resource is created in the repository.

    There are three ways to handle this situation :

    1) Add an xsi:schemaLocation attribute in the root element (Engagements) in the form xsi:schemaLocation="<target namespace> <url of the registered schema>" :
    <Engagements xmlns="com/.../edcdatatypes"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="com/.../edcdatatypes edcdatatypes.xsd">
    or,

    2) If you want to create the resource programmatically in PL/SQL, the createResource procedure has an overloading you may use to specify the schema :
    declare
      res boolean;
    begin
      res := dbms_xdb.createResource(
        abspath   => '/public/test/engagementData.xml'
      , data      => xmltype(bfilename('TEST_DIR','engagementData.xml'), nls_charset_id('WE8MSWIN1252'))
      , schemaurl => 'edcdatatypes.xsd'
      , elem      => 'Engagements'
      );
    end;
    /
    or,

    3) Declare a default namespace to schema location mapping at the repository level, Oracle will recognize the namespace of the incoming XML resource, know it's associated with a registered schema and store the file in the default table :

    http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/appaman.htm#sthref2407


    So, choose one of those three options depending on your actual constraints. For example, if you don't have any control over incoming XMLs then use either option 2 or 3.

    The XML schema is OK, though I suggest you invest some time in refining the datatypes and their mappings to SQL (string, date, etc.).
    It can be achieved through XDB annotations too :
    http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb05sto.htm#sthref823

    For instance, all elements defined as xs:string will be mapped to VARCHAR2(4000) by default, so it would be interesting to add restrictions in the schema as well.
    You can see the mapping Oracle did during the registration by checking the generated object types and collections.

    Once the file is correctly loaded, you can then use XMLTable to shred the document into relational rows and columns and directly insert into your final table :
    SELECT x.* 
    FROM engagement_objects t
       , XMLTable(
           XMLNamespaces(default 'com/.../edcdatatypes')
         , '/Engagements/Engagement'
           passing t.object_value
           columns 
             HostProfitCenter      varchar2(30) path 'ProfitCenterData/HostProfitCenterData/HostProfitCenter'
           , ...
         ) x
    ;
  • 11. Re: Parsing an XML file to a table
    703519 Newbie
    Currently Being Moderated
    Odie,

    I can't thank you enough for the reply. I sincerely appreciate it.

    I understood the problem the way you have described. However, from where I am, what looks to me is that the option 2 is the best approach. But it seems the schemaUrl is not a valid argument for createResource ( I am at oracle express 10.2). Here's the result of when I tried to execute the procedure:

    ORA-06550: line 4, column 10:
    PLS-00306: wrong number or types of arguments in call to 'CREATERESOURCE'
    ORA-06550: line 4, column 3:
    PL/SQL: Statement ignored




    I tried to run the procedure almost as is from the document that you sent. Here's the command I tried to run:

    declare
    res boolean;
    begin
    res := dbms_xdb.createResource(
    abspath => '/public/demo/xml/engagementData.xml'
    , data => xmltype(bfilename('XMLDIR','engagementData.xml'), nls_charset_id('WE8MSWIN1252'))
    , schemaurl => 'edcdatatypes.xsd'
    , elem => 'Engagements'
    );
    end;
    /

    This to me seems like so near and yet so far! Somewhat frustrating.

    Please advise!
    Thanks!
  • 12. Re: Parsing an XML file to a table
    odie_63 Guru
    Currently Being Moderated
    Yes, my bad...

    This overloading is not available on 10.2.

    So you're gonna have to use option 3.

    Or a plain SQL INSERT, or SQL*Loader :
    http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb25loa.htm#g1026738
  • 13. Re: Parsing an XML file to a table
    996559 Newbie
    Currently Being Moderated
    Hi,

    I have got a similar requirement of loading large xml file data as attributes of a table. However, the xml file will not exceed 500 MB.

    Is there any way I can do this. By using the following approach, I am getting string literal too long error. -

    declare
    X xmltype := xmltype(
    '<?xml version = "1.0"?> <ROWSET> <ROW>
    <DATAQUALITYSUBREQUESTID>255</DATAQUALITYSUBREQUESTID>
    <RESULTCODE>IIR_STATUS_4</RESULTCODE>
    <RESULTMESSAGE>ADDRESS CORRECTED, ALL ELEMENTS WERE CHECKED</RESULTMESSAGE>
    <ADDRESSLINE1>90 YOAKUM ST</ADDRESSLINE1>
    <POSTALCODE>11735</POSTALCODE>
    </ROW>
    <ROW>
    <DATAQUALITYSUBREQUESTID>256</DATAQUALITYSUBREQUESTID>
    <RESULTCODE>IIR_STATUS_4</RESULTCODE>
    <RESULTMESSAGE>ADDRESS CORRECTED, ALL ELEMENTS WERE CHECKED</RESULTMESSAGE>
    <ADDRESSLINE1>205 QUINCY ST</ADDRESSLINE1>
    <POSTALCODE>11216</POSTALCODE>
    </ROW>
    .....
    ...
    ...
    ...</ROWSET>')
    begin
    for R in (
    select EXTRACTVALUE(value(P), '/ROW/DATAQUALITYSUBREQUESTID/text()') as DATAQUALITYSUBREQUESTID,
    EXTRACTVALUE(value(P), '/ROW/RESULTCODE/text()') as RESULTCODE,
    EXTRACTVALUE(value(P), '/ROW/RESULTMESSAGE/text()') as RESULTMESSAGE,
    EXTRACTVALUE(value(P), '/ROW/ADDRESSLINE1/text()') as ADDRESSLINE1,
    EXTRACTVALUE(value(P), '/ROW/POSTALCODE/text()') as POSTALCODE
    from table(XMLSEQUENCE(extract(X, '/ADDRESS/ROW')))P
    )LOOP
    insert into TEMPXML values (r.DATAQUALITYSUBREQUESTID, r.RESULTCODE, r.ADDRESSLINE1, r.POSTALCODE,r.RESULTMESSAGE );
    commit;
    end LOOP;
    END;


    The xml pasted in xmltype is of very huge size.

    Kindly help.

    -Varun
  • 14. Re: Parsing an XML file to a table
    Jason_(A_Non) Expert
    Currently Being Moderated
    Please start a new question and you can reference this thread if it is relevant.

    At first, it sounded like you were working with {message:id=10899913}.

    Anyways, here is a good guide for how to ask questions here on the forums.
    {message:id=9360002}

    Questions for your new thread.
    Why are you trying to put the XML into the PL/SQL code?
    The combination of table(XMLSEQUENCE(extract and EXTRACTVALUE where replaced by XMLTable in 10.2?
    What is your goal, beyond inserting into TEMPXML?

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points