8 Replies Latest reply: Oct 16, 2013 7:24 AM by cellar

    error 13199 with bulk_load_from_staging_table

    cellar

      Hi there,

       

      Well, we have a model with more than a billion rows. Due to an unfiltered operation, this model contains useless data.

      We want to purge it. The problem is that we need to purge nearly 70 million rows, so a classic DELETE will not work.

       

      What we did (locale is en_US.UTF-8):

       

      - extract the data in N-Triple format for the model via the Jena Adapter with a Java program;

      the result is a 15 GB compressed file.

       

      - purge the useless triples with grep -iv on the file generated above:

      gunzip -c < cmr_metadata.nt.gz | grep -iv "xxx" | gzip > purged_cmr_metadata.nt.gz

       

      - split the compressed file into smaller compressed files of 100 million rows each with nawk:

       

      gunzip -c < ../purged_cmr_metadata.nt.gz | nawk '
      BEGIN {
        # open the first gzip pipe
        fn = sprintf("gzip -c > newtriples%d_nt.gz", fileno++)
      }
      { print | fn }
      NR % 100000000 == 0 {
        # close the current pipe before switching to the next output file
        close(fn)
        fn = sprintf("gzip -c > newtriples%d_nt.gz", fileno++)
      }
      '

      the result is 5 compressed files.

       

      - create the external staging table over the generated files:

       

      create table stable_ext (
                             RDF$STC_sub varchar2(4000),
                             RDF$STC_pred varchar2(4000),
                             RDF$STC_obj varchar2(4000))
        organization external (
        type oracle_loader
        default directory TMP_DIR
        access parameters (
        records delimited by NEWLINE
        preprocessor tmp_dir:'decompress.sh'
        fields terminated by ' ')
        location ('newtriples0_nt.gz','newtriples1_nt.gz','newtriples2_nt.gz','newtriples3_nt.gz','newtriples4_nt.gz','newtriples5_nt.gz'))
        reject limit unlimited
        parallel 4;

       

      decompress.sh is only /usr/bin/gunzip -c < $1
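
      For completeness, the whole script might look like this (a minimal sketch; the shebang line is an addition here):

      #!/bin/sh
      # stream-decompress the data file whose name the access driver passes as $1
      /usr/bin/gunzip -c < $1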

       

      So at this point my staging table is populated with data.
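
      A quick sanity check at this point could be (illustrative only):

      select count(*) from stable_ext;
      select * from stable_ext where rownum <= 5;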

       

      - next, create the new table and the new model.

       

      create table cmr_metadata_new
      compress
      as select * from cmr_metadata
      where 1=0 ;

       

      exec sem_apis.CREATE_SEM_MODEL(MODEL_NAME=>'CMR_METADATA_NEW',TABLE_NAME=>'CMR_METADATA_NEW',COLUMN_NAME=>'TRIPLE');

       

      - and during the load we get an error:

       

      exec sem_apis.BULK_LOAD_FROM_STAGING_TABLE(MODEL_NAME=>'CMR_METADATA_NEW',TABLE_OWNER=>'CMROWNER',TABLE_NAME=>'STABLE_EXT',FLAGS=>'PARALLEL=4 MBV_METHOD=SHADOW');
      BEGIN sem_apis.BULK_LOAD_FROM_STAGING_TABLE(MODEL_NAME=>'CMR_METADATA_NEW',TABLE_OWNER=>'CMROWNER',TABLE_NAME=>'STABLE_EXT',FLAGS=>'PARALLEL=4 MBV_METHOD=SHADOW'); END;
      *
      ERROR at line 1:
      ORA-13199: During MBV Swap: [08-OCT-13 09.24.54.580980000 AM +02:00]
      ORA-13199: Element Parse Error: Invalid value [debug-1s info:
      3152157576748681903,,,,] (value: "74/634/EWG:) [
      ORA-06512: at "MDSYS.MD", line 1723
      ORA-06512: at "MDSYS.MDERR", line 17
      ORA-06512: at "MDSYS.SDO_RDF", line 29
      ORA-06512: at line 1
      ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 3903
      ]
      ORA-06512: at "MDSYS.MD", line 1723
      ORA-06512: at "MDSYS.MDERR", line 17
      ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 4311
      ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 6409
      ORA-06512: at "MDSYS.SDO_RDF", line 747
      ORA-06512: at "MDSYS.RDF_APIS", line 893
      ORA-06512: at line 1

       

      What could be wrong?

       

      Thanks

        • 1. Re: error 13199 with bulk_load_from_staging_table
          alwu-Oracle

          Hi,

           

          Can you please search the N-TRIPLE files for the substring "74/634/EWG:"?

          Please cut & paste the matches here.

           

          Also, please do the same thing in your staging table.
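
          For example, something along these lines (the filename and table are taken from the earlier posts; the exact commands are just an illustration):

          $ gunzip -c purged_cmr_metadata.nt.gz | grep '74/634/EWG:'

          select RDF$STC_obj from stable_ext where RDF$STC_obj like '%74/634/EWG:%';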

           

          Thanks,


          Zhe Wu

          • 2. Re: error 13199 with bulk_load_from_staging_table
            cellar

            Hi Zhe,

             

            Well, the error seems to be random and related to the degree of parallelism, as the value differs between executions.

            In the meantime, we checked the encoding of the extract file and it seems that some characters are wrongly encoded.

            I added the PARSE option to the FLAGS, and each time it complains (depending on PARALLEL) about a different value with "Invalid PL@ Object".

            So I will try again to extract the data from my model, forcing the encoding to UTF-8.
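
            One quick way to locate invalid UTF-8 byte sequences in the extract before reloading (an illustrative check, not from this thread):

            $ gunzip -c cmr_metadata.nt.gz | iconv -f UTF-8 -t UTF-8 > /dev/null

            iconv stops with an "illegal input sequence" error at the first invalid byte, which helps pinpoint the bad rows.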

             

            Thanks

            Stephane

            • 3. Re: error 13199 with bulk_load_from_staging_table
              alwu-Oracle

              Hi Stephane,

               

              I suggested scanning the data because I think you may have spaces in your literal values, and you are using a space as the field separator.

               

              Here is my test case.

               

              $ gunzip -c /tmp/test.nt.gz | cat
              <urn:a> <urn:b> <urn:c> .
              <urn:a> <urn:b> "hello world" .

               

              create table stable_ext (
                                     RDF$STC_sub varchar2(4000),
                                     RDF$STC_pred varchar2(4000),
                                     RDF$STC_obj varchar2(4000))
                organization external (
                type oracle_loader
                default directory TMP_DIR
                access parameters (
                records delimited by NEWLINE
                preprocessor tmp_dir:'decompress.sh'
                fields terminated by ' ')
                location ('test.nt.gz')
                )
                reject limit unlimited
                parallel 4;

               

              SQL> select * from stable_ext;

              RDF$STC_SUB
              --------------------------------------------------------------------------------
              RDF$STC_PRED
              --------------------------------------------------------------------------------
              RDF$STC_OBJ
              --------------------------------------------------------------------------------
              <urn:a>
              <urn:b>
              <urn:c>

              <urn:a>
              <urn:b>
              "hello

              RDF$STC_SUB
              --------------------------------------------------------------------------------
              RDF$STC_PRED
              --------------------------------------------------------------------------------
              RDF$STC_OBJ
              --------------------------------------------------------------------------------

              Notice that the literal value is only partially loaded.

               

              Thanks,

               

              Zhe

              • 4. Re: error 13199 with bulk_load_from_staging_table
                cellar

                Hi there,

                Yes, my object values can contain white space.

                I saw that after posting, so the delimiter handling is a little more complicated.

                 

                But I tried sqlldr with a customized version of the bulk load, and I was able to load nearly all of the rows.

                 

                It took 5h45 to load 515 million rows with 5 SQL*Loader sessions in parallel.
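
                For reference, each loader session could use a control file roughly along these lines -- a sketch only: it assumes a regular (non-external) staging table named STABLE with the same three columns, and it keeps embedded spaces in the object field by giving that field a terminator (the newline) that never occurs inside a record:

                LOAD DATA
                INFILE 'newtriples0.nt'
                APPEND
                INTO TABLE stable
                FIELDS TERMINATED BY WHITESPACE
                TRAILING NULLCOLS
                (
                 "RDF$STC_SUB"  CHAR(4000)
                ,"RDF$STC_PRED" CHAR(4000)
                ,"RDF$STC_OBJ"  CHAR(4000) TERMINATED BY X'0A'
                )

                $ sqlldr cmrowner CONTROL=load0.ctl LOG=load0.log DIRECT=true PARALLEL=true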

                 

                I still have some rows with values over 4000 characters.

                For these, I have to find another solution.

                Maybe batch loading with the BatchLoader API, using the bad files as input.

                 

                Thanks

                • 5. Re: error 13199 with bulk_load_from_staging_table
                  sdas

                  Please try the following (I am assuming that the name of the bad file from the previous external table loading was "badfile.nt" (uncompressed) in the TMP_DIR directory -- replace it with the appropriate filename):

                   

                  create table stable_CLOB_ext (
                                         RDF$STC_sub varchar2(4000),
                                         RDF$STC_pred varchar2(4000),
                                         RDF$STC_obj CLOB)
                    organization external (
                    type oracle_loader
                    default directory TMP_DIR
                    access parameters (
                    records delimited by NEWLINE
                    fields
                    (
                     "RDF$STC_SUB"  CHAR(4000) terminated by whitespace
                    ,"RDF$STC_PRED" CHAR(4000) terminated by whitespace
                    ,"RDF$STC_OBJ"  CHAR(32767)
                    )
                    )
                    location ('badfile.nt'))
                    reject limit unlimited
                    parallel 4;

                   

                   

                  insert into CMR_METADATA_NEW select sdo_rdf_triple_s('CMR_METADATA_NEW',rdf$stc_sub,rdf$stc_pred,rdf$stc_obj) from stable_CLOB_ext;

                   

                  Thanks,

                  - Souri.

                  • 6. Re: error 13199 with bulk_load_from_staging_table
                    cellar

                    Hi Souri,

                     

                    thanks

                     

                    that's exactly what I wanted to know: whether we can use a CLOB for the object instead of VARCHAR2 in the staging table.

                    I will give it a try and let you know once my model recreation is complete.

                    • 7. Re: error 13199 with bulk_load_from_staging_table
                      sdas

                      Stephane,

                       

                      Please note that I used "INSERT AS SELECT" for loading the triples with CLOB objects, because

                      bulk-load (sem_apis.Bulk_load_from_staging_table) assumes VARCHAR2 for the rdf$stc_obj column definition.

                       

                      Thanks,

                      - Souri.

                      • 8. Re: error 13199 with bulk_load_from_staging_table
                        cellar

                        Hi there,

                         

                        well, my new model is recreated.

                        I have one more step now.

                        My table CMR_METADATA has a column named CONTEXT.

                        I have to set the value of this column in the new table CMR_METADATA_NEW.

                         

                        SQL> merge into cmr_metadata_new N
                             using (select a.TRIPLE.rdf_s_id s_id, a.TRIPLE.rdf_p_id p_id, a.TRIPLE.rdf_o_id o_id, a.TRIPLE.rdf_c_id c_id, a.context
                                    from cmr_metadata a) O
                             on (N.TRIPLE.rdf_s_id=O.s_id and N.TRIPLE.rdf_p_id=O.p_id and N.TRIPLE.rdf_o_id=O.o_id and N.TRIPLE.rdf_c_id=O.c_id)
                             when matched then update set N.context=O.context;

                         

                        Once it is complete, I plan to use sem_apis.swap_names('CMR_METADATA','CMR_METADATA_NEW'),
                        then rename the CMR_METADATA table to CMR_METADATA_OLD and promote the new one as CMR_METADATA.

                        I then have to recreate the entailments and virtual models.
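
                        For reference, those finishing steps could look roughly like this (a sketch of the plan above; the entailment, virtual model, and rulebase names are placeholders, not from this thread):

                        exec sem_apis.swap_names('CMR_METADATA','CMR_METADATA_NEW');
                        alter table CMR_METADATA rename to CMR_METADATA_OLD;
                        alter table CMR_METADATA_NEW rename to CMR_METADATA;
                        -- placeholder names: entailment CMR_RIDX, virtual model CMR_VM, rulebase OWLPRIME
                        exec sem_apis.create_entailment('CMR_RIDX', sem_models('CMR_METADATA'), sem_rulebases('OWLPRIME'));
                        exec sem_apis.create_virtual_model('CMR_VM', sem_models('CMR_METADATA'), sem_rulebases('OWLPRIME'));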

                         

                        Thanks