6 Replies Latest reply on Aug 20, 2015 10:04 PM by alwu-Oracle

    Use of staging table name parameter in prepareBulk / completeBulk functions.

    Donald Pellegrino

  I am attempting to load 1.6 billion triples into an RDF Semantic Graph instance. I am using the prepareBulk / completeBulk approach described in "7.12 Bulk Loading Using RDF Semantic Graph Support for Apache Jena." I loaded the triples from .ttl.gz files into a staging table with prepareBulk, as per "Example 7-10 Loading Data into the Staging Table (prepareBulk)."

       

      Following Example 7-10, I used "null" for the "staging table name" parameter to prepareBulk. I then ran a separate program to execute completeBulk as per "Example 7-11 Loading Data from the Staging Table into the Semantic Network (completeBulk)." Example 7-11 also shows the use of "null" as a default value for the "staging table name" parameter. The prepareBulk operations appear to have run successfully with a null staging table name. However, null does not appear to be valid input to the staging table name parameter of completeBulk. Running "completeBulk(null, null);" gives the following error message:

       

      Hit exception ORA-00942: table or view does not exist

       

      What is the relationship between the "staging table name" parameters to prepareBulk and completeBulk? Is null a valid value for this parameter to prepareBulk, and if so what should be the corresponding value passed to completeBulk?

        • 1. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
          alwu-Oracle

          Hi,

           

          This sounds strange; we have a test case covering this. Let's try the following: by default, the staging table is created in the same user schema, and its name is "RDFB_" followed by the model name.

           

          Can you please check whether such a table exists in your schema? It should have 1.6 billion rows. If so, you can pass that table name in directly.
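          For example, a quick check might look like this (MY_MODEL is a placeholder; substitute your own model name):

```sql
-- The staging table defaults to RDFB_<model_name> in your own schema.
SELECT COUNT(*) FROM RDFB_MY_MODEL;
```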

           

          Since you are dealing with a good amount of data, the following should be helpful for performance:

          1) Remove the index on the application table before you run the completeBulk call.

          2) Enable parallel DML before the call: oracle.executeSQL("alter session enable parallel dml");

          3) Use parallel loading options; in the example below, the degree of parallelism is set to 4, which you will need to tune for your own setup:

            "PARSE PARALLEL_CREATE_INDEX PARALLEL=4 mbv_method=shadow"

           

          Thanks,


          Zhe Wu

          1 person found this helpful
          • 2. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
            Donald Pellegrino

            I found an RDFB_<model_name> table with the 1.6 billion triples in it. Passing this in as the staging table name to completeBulk seems to work. Unfortunately, the completeBulk operation is consuming all of the TEMP tablespace and then failing. I will try the performance suggestions you provided and see whether that gets us past the TEMP tablespace issue.

             

            I also found an empty RDFC_<model_name> table. Do I need to do anything with the RDFC_ table?

             

            Will the completeBulk operation delete these two staging tables if it completes successfully or do I need to delete them manually?

             

            Is there any way to estimate how much tablespace or processing time might be required for future bulk loads?

            • 3. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
              alwu-Oracle

              Hi,

               

              I would recommend setting up a bigfile temporary tablespace. An example is as follows. If you are using ASM, the syntax will be slightly different (e.g. tempfile '+DATA').

               

              create bigfile temporary tablespace semtmp
                tempfile '?/dbs/semtmpts.dat'
                size 512M reuse
                autoextend on next 512M maxsize unlimited
                EXTENT MANAGEMENT LOCAL;

               

              ALTER DATABASE DEFAULT TEMPORARY TABLESPACE semtmp;

               

              For 1.6 billion triples, I think it is safe to allocate 500 GB of disk space. As for processing time, it really depends on your hardware (I/O performance, number of CPUs, size of RAM/SGA/PGA, etc.).

               

              Thanks,


              Zhe Wu

              1 person found this helpful
              • 4. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
                alwu-Oracle

                Forgot to mention: the RDFC_ table is for triples/quads with long literals. Since yours is empty, you can ignore it.

                 

                After completeBulk finishes, please truncate the RDFB_ and RDFC_ tables manually.
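                For example (again assuming a model named MY_MODEL; substitute your own model name):

```sql
-- Clear both staging tables once the bulk load has completed.
TRUNCATE TABLE RDFB_MY_MODEL;
TRUNCATE TABLE RDFC_MY_MODEL;
```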

                 

                Thanks,

                 

                Zhe Wu

                1 person found this helpful
                • 5. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
                  Donald Pellegrino

                  Thanks alwu-Oracle. Unfortunately, I seem to have gotten past the TEMP tablespace issue only to run into an UNDO tablespace issue:

                   

                  Hit exception ORA-13199: During MBV: [13-AUG-15 01.35.35.308493000 AM -04:00]
                  ORA-30036: unable to extend segment by 8 in undo tablespace 'UNDOTBS1' [
                  ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 3672
                  ]
                  ORA-06512: at "MDSYS.MD", line 1723
                  ORA-06512: at "MDSYS.MDERR", line 17
                  ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 3760
                  ORA-06512: at "MDSYS.SDO_RDF_INTERNAL", line 7395
                  ORA-06512: at "MDSYS.SDO_RDF", line 1014
                  ORA-06512: at "MDSYS.RDF_APIS", line 926
                  ORA-06512: at line 1

                   

                  Any suggestions for managing undo efficiently or disabling it for completeBulk calls?

                  • 6. Re: Use of staging table name parameter in prepareBulk / completeBulk functions.
                    alwu-Oracle

                    Hi,

                     

                    It is best to use a BIGFILE UNDO tablespace. An example is as follows. If you are using ASM, the syntax will be slightly different.

                     

                    CREATE bigfile UNDO TABLESPACE semundo
                      DATAFILE '?/dbs/semundo.dat' SIZE 512M REUSE
                      AUTOEXTEND ON NEXT 512M MAXSIZE UNLIMITED
                      EXTENT MANAGEMENT LOCAL;

                    ALTER SYSTEM SET UNDO_TABLESPACE=semundo;

                     

                    Thanks,

                     

                    Zhe