2 Replies Latest reply on Jun 16, 2010 8:22 AM by 766393

    Inserting several million triples: which solution is the fastest?

    766393
      Hi,

      Here is my context:
      - I need to populate a model regularly (it contains about 20,000,000 triples).
      - The triples are created in about a hundred steps, each step creating between a few thousand and half a million triples.
      - Several steps produce the same triples, but I cannot determine these overlaps in advance.
      - Going through all the steps with an in-memory Jena model and calling addInBulk once at the end is impossible because of memory usage.

      Here are the solutions I tested:
      - going through all the steps and running addInBulk after each one (very slow)
      - going through blocks of steps and running addInBulk a few times (better, but it gets slower and slower towards the end: the last addInBulk runs at ~1000 triples/s while the first one ran at ~4000 triples/s, probably because of uniqueness constraints?)
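
For illustration, the second approach (blocks of steps) could be sketched roughly as follows. This is only a minimal sketch: `BlockedLoader`, its triple budget, and the `bulkInsert` callback are hypothetical names, triples are simplified to plain strings, and the callback merely stands in for one addInBulk call. Duplicates that fall inside the same block collapse in the set; duplicates across blocks are still left to the database.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Buffers the triples of consecutive steps and hands them over in blocks. */
final class BlockedLoader {
    private final int maxBufferedTriples;            // hypothetical memory budget
    private final Consumer<Set<String>> bulkInsert;  // stand-in for one addInBulk call
    private final Set<String> buffer = new LinkedHashSet<>();

    BlockedLoader(int maxBufferedTriples, Consumer<Set<String>> bulkInsert) {
        this.maxBufferedTriples = maxBufferedTriples;
        this.bulkInsert = bulkInsert;
    }

    /** Adds one step's triples; repeats within the current block are dropped by the set. */
    void addStep(List<String> stepTriples) {
        buffer.addAll(stepTriples);
        if (buffer.size() >= maxBufferedTriples) {
            flush();
        }
    }

    /** Sends the buffered block to the bulk insert and empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        bulkInsert.accept(new LinkedHashSet<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        BlockedLoader loader = new BlockedLoader(3,
                block -> System.out.println("bulk insert of " + block.size() + " triples"));
        loader.addStep(Arrays.asList("a", "b"));
        loader.addStep(Arrays.asList("b", "c")); // "b" is a repeat; the block reaches 3 and is flushed
        loader.addStep(Arrays.asList("d"));
        loader.flush();                          // flush whatever is left at the end
    }
}
```

The budget would have to be tuned to the available heap; a larger block means fewer bulk inserts but more memory held at once.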

      Would you know a faster way to insert all of these?
      (A way to disable the uniqueness constraint and enforce it at the end, another way to insert (via files?), anything?)

      Thanks

      Regards

      Julien
        • 1. Re: Inserting several million triples: which solution is the fastest?
          715399
          Hi Julien,


          In order to improve loading speed, you might want to drop the application table index before loading:

          GraphOracleSem.dropApplicationTableIndex(),

          and rebuild it later with GraphOracleSem.rebuildApplicationTableIndex().
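
The drop/load/rebuild sequence could be wrapped so that the index is rebuilt even when the load fails. The sketch below is an assumption-laden illustration, not the adapter's API: `AppTableIndex` is a hypothetical interface that only mirrors the two method names mentioned above (a real GraphOracleSem would have to be adapted to it, and its actual exception types may differ), and `IndexFreeLoad` is likewise a made-up helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of the two GraphOracleSem calls named in this reply. */
interface AppTableIndex {
    void dropApplicationTableIndex() throws Exception;
    void rebuildApplicationTableIndex() throws Exception;
}

final class IndexFreeLoad {
    /** Drops the index, runs the load, and rebuilds the index even if the load throws. */
    static void run(AppTableIndex graph, Runnable load) throws Exception {
        graph.dropApplicationTableIndex();
        try {
            load.run();
        } finally {
            graph.rebuildApplicationTableIndex();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        // Fake graph that only records the order of calls.
        AppTableIndex fake = new AppTableIndex() {
            @Override public void dropApplicationTableIndex() { log.add("drop"); }
            @Override public void rebuildApplicationTableIndex() { log.add("rebuild"); }
        };
        run(fake, () -> log.add("load")); // the Runnable stands in for the bulk inserts
        System.out.println(log);
    }
}
```

Rebuilding once after all blocks are loaded, rather than around each block, keeps the rebuild cost to a single pass.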


          Vlad
          • 2. Re: Inserting several million triples: which solution is the fastest?
            766393
            Thanks.

            Here are the results (the numbers I gave in my first post were apparently wrong, sorry).

            Is this the kind of improvement you were expecting? It seems quite modest.

            When I did not drop the index:
            - 1st insert: 1896 st/s (~4.5M triples)
            - 2nd insert: 1901 st/s (~2.5M triples)
            - ...
            - last insert: 900 st/s (~1M triples)

            Using your advice (dropping the application table index):
            - 1st insert: 2030 st/s (~4M triples)
            - 2nd insert: 2300 st/s (~2.5M triples)
            - ...
            - last insert: 1000 st/s (~1M triples)
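
To put these figures in perspective, the relative gain per insert can be worked out from the rates above. The sketch below is plain arithmetic; `ThroughputGain` and its method names are made up for illustration, and the first pair is only roughly comparable because the two runs inserted different amounts (~4.5M versus ~4M triples).

```java
import java.util.Locale;

final class ThroughputGain {
    /** Relative gain of the rate `after` over the rate `before`, in percent. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Hours needed to load `triples` at a constant rate given per second. */
    static double hoursToLoad(long triples, double ratePerSecond) {
        return triples / ratePerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // {rate with index, rate without index}, taken from the two lists above
        double[][] runs = { {1896, 2030}, {1901, 2300}, {900, 1000} };
        for (double[] run : runs) {
            System.out.println(String.format(Locale.ROOT, "%.0f -> %.0f per second: %+.1f%%",
                    run[0], run[1], gainPercent(run[0], run[1])));
        }
        // Rough bound for the whole model if every insert ran at the slowest observed rate.
        System.out.println(String.format(Locale.ROOT, "20M triples at 1000 per second: %.1f h",
                hoursToLoad(20_000_000L, 1000)));
    }
}
```

That works out to gains of roughly 7%, 21% and 11%, and a full load of about 20M triples at the slowest rate would take on the order of five and a half hours.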

            By the way, are there any statistics or benchmarks for Oracle Semantic Technologies?

            Thanks

            Regards
            Julien