3 Replies Latest reply on Oct 18, 2006 7:49 PM by Mannamal-Oracle

    loading triples becomes increasingly expensive

    437959
      Am I wrong in surmising that, as more triples are loaded into my RDF network, loading further triples will become increasingly expensive? My reasoning is that the incoming subject, predicate, and object values, and the triples to be constructed from them, must be checked against the existing values and triples in the mdsys.rdf_value$ and mdsys.rdf_link$ tables, respectively, in order to prevent duplicates.

      I would appreciate any comments.
      Thank you,

      Peter
        • 1. Re: loading triples becomes increasingly expensive
          Mannamal-Oracle
          It is correct that the load becomes expensive as the number of triples loaded increases, but the load process uses the necessary indexes so the rate of slowdown of the load process is not that sharp and the load is still scalable.
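To illustrate why an indexed duplicate check slows down only gradually, here is a minimal Python sketch. It is not Oracle's implementation: `ValueTable` and `probes_for_new_value` are hypothetical names, the example URIs are made up, and a sorted list with binary search merely stands in for a B-tree-style unique index.

```python
class ValueTable:
    """Toy stand-in for a value table guarded by a sorted unique index."""

    def __init__(self):
        self._index = []   # sorted list playing the role of the index
        self.probes = 0    # number of index entries compared so far

    def insert_if_absent(self, value):
        """Insert value unless it already exists; return True if inserted."""
        lo, hi = 0, len(self._index)
        # Binary search: each pass halves the range still to be checked.
        while lo < hi:
            mid = (lo + hi) // 2
            self.probes += 1
            if self._index[mid] < value:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self._index) and self._index[lo] == value:
            return False   # duplicate found, nothing stored
        self._index.insert(lo, value)
        return True


def probes_for_new_value(existing_rows):
    """Count the comparisons needed to check one new value against
    a table that already holds existing_rows distinct values."""
    table = ValueTable()
    for i in range(existing_rows):
        # Hypothetical URIs, generated in sorted order to keep the demo fast.
        table.insert_if_absent(f"http://example.org/term/{i:09d}")
    table.probes = 0
    table.insert_if_absent("http://example.org/term/new")
    return table.probes


if __name__ == "__main__":
    for rows in (1_000, 100_000):
        print(rows, "existing rows ->", probes_for_new_value(rows), "comparisons")
```

In this sketch the table grows a hundredfold while the number of comparisons per duplicate check grows by less than a factor of two, which is the kind of gentle slowdown an index lookup gives compared with scanning every row.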

          What are the data sizes you are looking to load? If you wish you can write to me directly at melliyal <dot> annamalai <at> oracle <dot> com. We are also interested in understanding your application.

          Melli
          • 2. Re: loading triples becomes increasingly expensive
            437959
            Melli,

            The strings for subject, property, and object are all on the order of 100 bytes.
            The number of triples to be initially loaded will be on the order of 10 million.
            These would come from 20+ datasets, SQL-loaded into staging tables and then processed by a different PL/SQL procedure per dataset.

            Thereafter, the number of triples to be added to the model on a daily basis will be on the order of 1000.
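As a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate of raw string volume. This is only a sketch: it assumes exactly 100 bytes per term and three terms per triple, and it ignores value deduplication, ID columns, indexes, and any other storage overhead. The names used are illustrative.

```python
BYTES_PER_TERM = 100        # "on the order of 100 bytes" per subject/property/object
TERMS_PER_TRIPLE = 3        # subject, property, object
INITIAL_TRIPLES = 10_000_000
DAILY_TRIPLES = 1_000


def raw_text_bytes(triples):
    """Upper bound on raw string bytes if no term were ever shared."""
    return triples * TERMS_PER_TRIPLE * BYTES_PER_TERM


initial_bytes = raw_text_bytes(INITIAL_TRIPLES)   # initial bulk load
daily_bytes = raw_text_bytes(DAILY_TRIPLES)       # each daily increment

if __name__ == "__main__":
    print(f"initial load: {initial_bytes / 1e9:.1f} GB of raw term text")
    print(f"daily load:   {daily_bytes / 1e3:.0f} kB of raw term text")
```

Under these assumptions the initial load is about four orders of magnitude larger than a daily increment, so the two cases are likely to call for different loading strategies.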

            Would rebuilding the indexes on the value$ and link$ (and other?) tables after loading each dataset significantly improve the loading of the next dataset?

            Is there any way to add more than one tablespace to an RDF network, for the purpose of reliably separating tables from indexes?

            Of major concern is the downtime required to load new triples and then recreate the rules index.

            Thank you,

            Peter
            • 3. Re: loading triples becomes increasingly expensive
              Mannamal-Oracle
              There are two Java-based loaders for RDF data: the incremental loader and the batch loader (both are documented on OTN). The batch loader works on an empty model and is intended for an initial load of a large data set, such as the 10 million triples in your example. Subsequent loads should be done using the incremental loader.

              Both Java-based loaders optimize the load by managing the indexes appropriately (rebuilding indexes where necessary, etc.). If you are interested in loading with SQL*Loader (as we are discussing in the thread "SQL*Loader error 350 using SDO_RDF_TRIPLE_S constructor"), we will need to do some investigation before we can recommend how the indexes should be managed. As I described in that post, we will test SQL*Loader for large loads and post recommendations accordingly.

              > Is there any way to add more than one tablespace to
              > an RDF network, for the purpose of reliably
              > separating tables from indexes?

              Not in the current release. We will note this as a potential requirement for future plans.

              > Of major concern is the downtime required to load new
              > triples and then recreate the rules index.

              This need has been expressed by other users as well, and has been noted as a requirement for a future release.

              Melli