3 Replies Latest reply on Feb 10, 2014 1:15 AM by Sdas-Oracle

    SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE does not finish

    28f0f31d-f7f8-4fcd-a737-4d97bf31ab13

      Hi all,

       

      I am trying to bulk load an N-Triples file using a staging table according to instructions given in documentation (RDF Semantic Graph Overview).

      The file contains about 23,000,000 triples, and I have split it in 16 files of 1,437,500 triples, in order to load them in parallel.

       

      First I successfully load these files into the staging table by invoking the following command, which needs about 1 minute to finish:

       

      BEGIN
        SEM_APIS.LOAD_INTO_STAGING_TABLE(
          staging_table => 'STAGE_TABLE'
         ,source_table  => 'STAGE_TABLE_SOURCE'
         ,input_format  => 'N-TRIPLE'
         ,parallel          => '16');
      END;
      /
      

       

      Then, I call SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE to load data into the semantic store:

      BEGIN
          SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE(
              model_name  => 'rdfmodel',
              table_owner => 'rdfuser',
              flags       => 'PARSE PARALLEL_CREATE_INDEX PARALLEL=16 MBV_METHOD=SHADOW',
              table_name  => 'stage_table'
      );
      END;
      /
      

      However, this operation does not finish. I left it running for more than 5 hours and not any triple is inserted into the table mdsys.semm_rdfmodel.

      Also, I have observed that even if PARALLEL parameter is set the most CPUs remain idle. Usually, only one or two CPUs are used to a percentage more than 50% (I check it using htop).

       

      Any hint about how I can find out what goes wrong will be very appreciated!

       

      Best regards,

      George

        • 1. Re: SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE does not finish
          Sdas-Oracle

          1) Please enable parallel DML before invoking the sem_apis.bulk_load_from_staging_table call:

           

          alter session enable parallel dml;

           

          2) Also, to keep track of progress, please create the RDF$ET_TAB table in the invoker's schema and grant SELECT and UPDATE privilege on that table to MDSYS (as suggested in the documentation: http://docs.oracle.com/cd/E16655_01/appdev.121/e17895/sdo_rdf_concepts.htm#RDFRM99904):

           

          CREATE TABLE RDF$ET_TAB (

            proc_sid VARCHAR2(30),

            proc_sig VARCHAR2(200),

            event_name varchar2(200),

            start_time timestamp,

            end_time timestamp,

            start_comment varchar2(1000) DEFAULT NULL,

            end_comment varchar2(1000) DEFAULT NULL

          );

          GRANT INSERT, UPDATE on RDF$ET_TAB to MDSYS;

           

          Thanks

          - Souri.

          • 2. Re: SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE does not finish
            28f0f31d-f7f8-4fcd-a737-4d97bf31ab13

            Hi Souri,

             

            thank you for you advices!

             

            I enabled parallel dml but the results are the same. I, also, created the RDF$ET_TABLE and it seems that the events that do not finish are FIND_AND_RESOLVE_BATCH_DUPS and -find_BATCH_DUP_SELECTIONS.

             

            To find unfinished events I posed the query:

             

            SELECT DISTINCT proc_sid, event_name, start_time, end_time, start_comment, end_comment
            FROM RDF$ET_TAB
            WHERE end_time IS NULL
            ORDER BY start_time ASC;
            

             

            and the results are the following:

             

            PROC_SIDEVENT_NAMESTART_TIMEEND_TIMESTART_COMMENTEND_COMMENT
            M1_16B08_6F40B1F_68*BULK_LOAD_FROM_STAGING_TABLE06-FEB-14 12.02.43.669541 AM
            M1_16B08_6F40B1F_68FIND_AND_RESOLVE_BATCH_DUPS06-FEB-14 12.06.57.878441 AM

            [P1=[PARALLEL(RDF$2LM1_16B08_6F40B1F_68,16)]

            P2=[PARALLEL(RDF$TLM1_16B08_6F40B1F_68,16)]

            P3=[PARALLEL(RDF$BDM1_16B08_6F40B1F_68,16)]

            P4=[PARALLEL(RDF$BSM1_16B08_6F40B1F_68,16)]

            Px=[PARALLEL(x,16)]

            Py=[PARALLEL(y,16)]]

            M1_16B08_6F40B1F_68-find_BATCH_DUP_SELECTIONS06-FEB-14 12.06.59.969255 AM

             

             

            I suppose that these events are about duplicates. So, I thought that adding the flag DEL_BATCH_DUPS=USE_INSERT to the sem_apis.bulk_load_from_staging_table call would help. I added this flag and retried. The storing, again, did not finish after 5 hours but the results of the query for the unfinished events are slightly different:

             

            PROC_SIDEVENT_NAMESTART_TIMEEND_TIMESTART_COMMENTEND_COMMENT

            M1_16B047841AF37_68

            *BULK_LOAD_FROM_STAGING_TABLE05-FEB-14 07.32.05.582389 PM
            M1_16B047841AF37_68LOAD_BATCH_TRIPLES_TABLE05-FEB-14 07.35.24.297935 PM

            [nP=[0]

            P1=[PARALLEL(RDF$TLM1_16B047841AF37_68,16)]

            P2=[PARALLEL(st,16)]]

             

            Best regards,

            George

             

            Message was edited by: 28f0f31d-f7f8-4fcd-a737-4d97bf31ab13

            • 3. Re: SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE does not finish
              Sdas-Oracle

              George,

               

              The output of RDF$ET_TAB query seems odd. There should be a number of other steps before FIND_AND_RESOLVE_BATCH_DUPS or LOAD_BATCH_TRIPLES_TABLE. Let us take this offline until we figure out what is going on.

               

              If you contact me at souripriya dot das at oracle dot com, we can look at it more thoroughly.

               

              Thanks,

              - Souri.