2 Replies Latest reply: Dec 6, 2011 10:24 AM by 900576

    Complete Bulk

    900576
      We are currently trying to tune complete bulk call. We are executing complete bulk on the graph by getting the BulkUpdateHandler and calling completeBulk with a few flags. We ran the following execution with the flags "PARALLEL=8 IZC_JOIN_HINT=USE_HASH MBV_JOIN_HINT=USE_HASH". Are there any other settings I can use to speed up the processing of the BulkComplete? It is currently taking about 4h 45m to BulkComplete about 57Million triples into an empty model.

      Here are the RDF$ET_TAB entries for the execution:
      BULK_LOAD_FROM_STAGING_TABLE
      05-DEC-11 09.46.36.102264 AM
      05-DEC-11 02.32.11.217183 PM
      [[values=0,triples=57101108,dup_sel=59365,dups=137657]]

      LOAD_BATCH_VALUES_TABLE
      05-DEC-11 09.46.36.211461 AM
      05-DEC-11 10.12.36.229220 AM

      -ins_LEX_VALS_INTO_BATCH_VAL
      05-DEC-11 09.46.36.483601 AM
      05-DEC-11 10.12.11.238727 AM
      P1=[ PARALLEL(tv,8) ]P2=[ PARALLEL(st,8) ]
      [rows_appended=18598196]

      -ins_CANON_VALS_INTO_BATCH_VAL
      05-DEC-11 10.12.11.462472 AM
      05-DEC-11 10.12.36.224332 AM
      ID=8732397014379457511 CID=6037693442462750226 CV=["1155448"] V=["1155448"^^<http://www.w3.org/2001/XMLSchema#string>]
      [P1=[ PARALLEL(tv,8) ]rows_appended=3704981]

      GATHER_BATCH_VALUES_STATS
      05-DEC-11 10.12.36.229793 AM
      05-DEC-11 10.12.48.174650 AM

      LOCK_VALUES_TABLE
      05-DEC-11 10.12.48.175699 AM
      05-DEC-11 10.12.48.189650 AM

      IS_ZERO_COLLISIONS
      05-DEC-11 10.12.48.190683 AM
      05-DEC-11 10.18.27.238464 AM
      [J=[ USE_HASH (ov,x) ]P1=[ PARALLEL(RDF$TVM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(ov,8) ]P3=[ PARALLEL(x,8) ]]

      MERGE_BATCH_VALUES
      05-DEC-11 10.18.27.240004 AM
      05-DEC-11 10.25.27.217308 AM
      [J=[ USE_HASH (ov,x) ]P1=[ PARALLEL(ov,8) ]P2=[ PARALLEL(x,8) ]][rows_appended=0]

      RELEASE_VALUES_TABLE
      05-DEC-11 10.25.27.217857 AM
      05-DEC-11 10.25.27.218466 AM

      LOAD_BATCH_TRIPLES_TABLE
      05-DEC-11 10.25.27.321306 AM
      05-DEC-11 11.01.23.335066 AM
      [nP=[0]P1=[ PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(st,8) ]][rows_appended=57179400]

      GATHER_BATCH_TRIPLES_STATS
      05-DEC-11 11.01.23.335616 AM
      05-DEC-11 11.01.35.637969 AM

      FIND_AND_RESOLVE_BATCH_DUPS
      05-DEC-11 11.01.35.659022 AM
      05-DEC-11 11.09.24.251057 AM
      [P1=[ PARALLEL(RDF$2LM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P3=[ PARALLEL(RDF$BDM3_12DC282B8B5F6_54,8) ]P4=[ PARALLEL(RDF$BSM3_12DC282B8B5F6_54,8) ]Px=[ PARALLEL(x,8) ]Py=[ PARALLEL(y,8) ]]

      -find_BATCH_DUP_SELECTIONS
      05-DEC-11 11.01.35.701874 AM
      05-DEC-11 11.08.54.584436 AM
      [rows_appended=59365]

      -find_BATCH_DUPS
      05-DEC-11 11.08.54.618094 AM
      05-DEC-11 11.09.08.264061 AM
      Duplicate: g_id= sid=6371922277292682971 pid=1023897553896697138 cid=3567578795737637710 oid=3567578795737637710 cost=2
      [rows_appended=137657]

      -del_BATCH_DUPS
      05-DEC-11 11.09.08.265079 AM
      05-DEC-11 11.09.18.238481 AM
      [rows_affected=-137657]

      -ins_BATCH_DUP_SELECTIONS
      05-DEC-11 11.09.18.239754 AM
      05-DEC-11 11.09.18.498214 AM
      [rows_appended=59365]

      -del_BATCH_DUP_SEL_from_BATCH_DUP
      05-DEC-11 11.09.18.499262 AM
      05-DEC-11 11.09.24.245801 AM
      [rows_deleted=59365]

      LOAD_INTO_NONEMPTY_MODEL_PARTN
      05-DEC-11 11.09.24.468003 AM
      05-DEC-11 02.32.10.193757 PM
      P1= [PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(RDF_LINK$,8) ]P3=[ PARALLEL(x,8) ]P4=[ PARALLEL(t,8) ]]

      -append_to_MODEL
      05-DEC-11 11.09.24.502595 AM
      05-DEC-11 12.11.36.400449 PM
      P3=[ PARALLEL(x,8) ]P4=[ PARALLEL(t,8) ]
      [Dup: content overlap between current batch and RDF model]

      -merge_into_MODEL_PARTN
      05-DEC-11 12.11.36.416599 PM
      05-DEC-11 02.32.10.192640 PM
      [rows_affected=57101108]

      -insert_INTO_APP_TABLE
      05-DEC-11 02.24.15.213073 PM
      05-DEC-11 02.32.03.499360 PM
      P1=[ PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(x,8) ]P3=[ PARALLEL(ap,8) ]Px=[ PARALLEL(RDF$BDM3_12DC282B8B5F6_54,8) ]
      [rows_appended=57179400]
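
      For anyone profiling a similar run: the per-phase durations can be computed directly from the start/end timestamps in the trace above. Below is a minimal sketch in plain Python (timestamps and phase names copied from this run; not part of any Oracle API) that ranks the top-level phases by elapsed time:

      ```python
      from datetime import datetime

      # Timestamp format used in the RDF$ET_TAB trace above,
      # e.g. "05-DEC-11 09.46.36.102264 AM"
      FMT = "%d-%b-%y %I.%M.%S.%f %p"

      # Top-level phases (start, end) copied from the trace above
      phases = {
          "LOAD_BATCH_VALUES_TABLE":        ("05-DEC-11 09.46.36.211461 AM", "05-DEC-11 10.12.36.229220 AM"),
          "IS_ZERO_COLLISIONS":             ("05-DEC-11 10.12.48.190683 AM", "05-DEC-11 10.18.27.238464 AM"),
          "MERGE_BATCH_VALUES":             ("05-DEC-11 10.18.27.240004 AM", "05-DEC-11 10.25.27.217308 AM"),
          "LOAD_BATCH_TRIPLES_TABLE":       ("05-DEC-11 10.25.27.321306 AM", "05-DEC-11 11.01.23.335066 AM"),
          "FIND_AND_RESOLVE_BATCH_DUPS":    ("05-DEC-11 11.01.35.659022 AM", "05-DEC-11 11.09.24.251057 AM"),
          "LOAD_INTO_NONEMPTY_MODEL_PARTN": ("05-DEC-11 11.09.24.468003 AM", "05-DEC-11 02.32.10.193757 PM"),
      }

      # Elapsed seconds per phase
      durations = {
          name: (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
          for name, (start, end) in phases.items()
      }

      total = sum(durations.values())
      for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
          print(f"{name:32s} {secs/3600:5.2f} h  ({100*secs/total:4.1f}% of traced time)")
      ```

      Running this shows LOAD_INTO_NONEMPTY_MODEL_PARTN at roughly 3.4 hours, i.e. the bulk of the 4h 45m run, which is what points at the non-empty-model merge path.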

      Edited by: MichaelB on Dec 6, 2011 10:09 AM
        • 1. Re: Complete Bulk
          sdas
          The following portion of the event trace shows that it was trying to load into a non-empty RDF model:
          ...
          LOAD_INTO_NONEMPTY_MODEL_PARTN
          05-DEC-11 11.09.24.468003 AM
          05-DEC-11 02.32.10.193757 PM
          P1= [PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(RDF_LINK$,8) ]P3=[ PARALLEL(x,8) ]P4=[ PARALLEL(t,8) ]]

          -append_to_MODEL
          05-DEC-11 11.09.24.502595 AM
          05-DEC-11 12.11.36.400449 PM
          P3=[ PARALLEL(x,8) ]P4=[ PARALLEL(t,8) ]
          [Dup: content overlap between current batch and RDF model]

          -merge_into_MODEL_PARTN
          05-DEC-11 12.11.36.416599 PM
          05-DEC-11 02.32.10.192640 PM

          [rows_affected=57101108]

          -insert_INTO_APP_TABLE
          05-DEC-11 02.24.15.213073 PM
          05-DEC-11 02.32.03.499360 PM
          P1=[ PARALLEL(RDF$TLM3_12DC282B8B5F6_54,8) ]P2=[ PARALLEL(x,8) ]P3=[ PARALLEL(ap,8) ]Px=[ PARALLEL(RDF$BDM3_12DC282B8B5F6_54,8) ]
          [rows_appended=57179400]
          • 2. Re: Complete Bulk
            900576
            You are correct. We have about 200,000 triples already in the model. We'll try loading all of our triples into the staging table before performing the completeBulk, so that it runs against an empty model, and see how much that improves performance.

            Thanks!