This discussion is archived
3 Replies Latest reply: Sep 13, 2012 6:22 AM by odie_63 RSS

Slow Performance parsing a  1MB document

user13413951 Newbie
Currently Being Moderated
All,
I appreciate any advice you can lend on this subject.

I am retrieving an XML document over the network that I need then split into columns to put into a relational table.

The document is approximately 1 MB large. There are 1363 DETAILS elements under a single parent element DETAILS_COLLECTION.

I get it as a clob and convert to XMLTYPE and place into a table named: p_details_t which is a table of XMLTYPE based on a xsd.

I then try to put it into a table:
      insert into p_exams (app_session,
                       page,
                       region,
                       accession,
                       patient,
                       last_name,
                       first_name,
                       dob,
                       procedure_name,
                       dos,
                       mrn,
                       created_date,
                       app_user,
                       site,
                       status_code,
                       radiologist,
                       signed_date,
                       referrer,
                       referrer_id,
                       order_no,
                       gender,
                       company,
                       report_url,
                       exam_count,
                       affiliation,
                       cc_ids,
                       cc_names
                       )
                select  32234 app_session,
                        300 page_number,
                        1 region,
                        cols.accession,
                        upper(cols.lastname||', '||cols.firstname) c1,
                        upper(cols.lastname) c2,
                        upper(cols.firstname) c23,
                        to_date(substr(cols.dob,1,10),'YYYY-MM-DD') c234234,
                        cols.procedures,
                        to_date(substr(cols.dos,1,length(cols.dos)-7),'MM/DD/YYYY HH:MI:SS AM') c234,
                        cols.mrn,
                        sysdate d1,
                       'APPLEF' app_user,
                        cols.site,
                        cols.status_code,
                        cols.radiologist,
                        to_date(substr(cols.signed,1,length(cols.signed)-7),'MM/DD/YYYY HH:MI:SS AM') c534,
                        cols.referrer,
                        234 x324,
                        cols.orderno,
                        cols.sex,
                        cols.practice,
                        cols.report_url,
                        to_number(cols.examcount) c007,
                        cols.affiliations,
                        cols.ccDoctorIds,
                        cols.ccDoctorNames
                  from  p_details_t pd,
                       xmltable('/Details_Collection/Details' passing pd.object_value 
                       columns 
                                 firstname varchar2(100) path '/Details/@first_name',
                                 lastname varchar2(100) path '/Details/@last_name',
                                 accession varchar2(100) path '/Details/@accession_number',
                                 dob       varchar2(100) path '/Details/@birth_date',
                                 procedures varchar2(100) path '/Details/@procedures',
                                 dos        varchar2(30)  path '/Details/@scheduled_start_date',
                                 mrn   varchar2(64)       path '/Details/@MRN',        
                                 site  varchar2(500)        path '/Details/@site',
                                 status_code varchar2(50) path '/Details/@status_code',--,
                                 radiologist  varchar2(100) path '/Details/@dictated_by',
                                 signed       varchar2(100) path '/Details/@signed_date',
                                 referrer     varchar2(200) path '/Details/@requested_by',
                                 orderno        varchar2(100) path '/Details/@order_number',
                                 sex          varchar2(100) path '/Details/@gender_code',
                                 practice     varchar2(200) path '/Details/@practice',
                                 report_url   varchar2(1000) path '/Details/@reportURL',
                                 examcount    varchar2(100)  path '/Details/@ExamCount',
                                 affiliations varchar2(200) path '/Details/@requesting_affiliations',
                                 ccDoctorIds    varchar2(100)  path '/Details/@ccDoctorIds',
                                 ccDoctorNames  varchar2(200)  path '/Details/@ccDoctorNames'                                 
                                ) cols
While the insert works, it takes close to 30 seconds to process the file. This seems much longer than I would have expected. Do you see something wrong with the code above?

I am using 10.2 Standard Edition.

Thanks in advance for any help you can provide.
  • 1. Re: Slow Performance parsing a  1MB document
    Jason_(A_Non) Expert
    Currently Being Moderated
    Being on 10.2, your options are a limited. If you were able to upgrade to anything in the 11g line, then you could make a dramatic performance improvement simply by changing the underlying storage of the p_details_t table.

    If you were to run an explain plan on the SQL statement you provided, I suspect you would see a
    COLLECTION ITERATOR PICKLER FETCH
    line in there. You can google for that phrase, but basically it means that Oracle is performing in-memory operations on parsing/walking through the XML, repeatedly.

    One option that might help you is to parse the XML as a XMLType in PL/SQL after you receive the CLOB. You then do a FORALL to insert that parsed information into the DB. I previously wrote up an example of this at [url http://anononxml.blogspot.com/2011/05/options-for-slow-performance-parsing.html]Options For Slow Performance Parsing when going through a similar exercise on 11.1.0.6 where I retrieved some information from a web service and merged that into a table in the DB. I had 2,100+ repeating nodes in the XML. As my post shows, that method was faster than using XMLTable in a INSERT statement to parse the XML. It lays out the basic structure I used.

    One additional thought. Since you are on 10.2, instead of using a single l_import_q_tab with many columns in it, you may have to create single collections, one for each column needed.
    ...
    l_import_q_accession     dbms_sql.Varchar2_Table;
    l_import_q_patient     dbms_sql.Varchar2_Table;
    ...
    You would need to ensure you populate all collection entries for each of the DETAILS records you parse, even it is only putting NULL into the variable. This is to prevent a NO_DATA_FOUND when the FORALL is accessing each entry in all the variables. Hope that at least gives your another option to improve performance on 10.2
  • 2. Re: Slow Performance parsing a  1MB document
    AlexAnd Guru
    Currently Being Moderated
    you can try to use index
    Using Indexes to Improve Performance of XPath-Based Functions - http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb03usg.htm#sthref371
    Indexing XMLType Columns - http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb04cre.htm#sthref596

    also search on forum like
    Re: how to create index on xmtype column
    Re: How to create the proper index for my xml..
  • 3. Re: Slow Performance parsing a  1MB document
    odie_63 Guru
    Currently Being Moderated
    While the insert works, it takes close to 30 seconds to process the file. This seems much longer than I would have expected. Do you see something wrong with the code above?
    Please provide more details.
    If the XMLType table really is schema-based then it should be fast, even on 10.2.

    So, can you give :

    - exact db version (SELECT * FROM v$version)
    - the XML schema
    - the command you used to register the schema and create the table

    Thanks.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points