When a semantic network is created, the indexes on the underlying storage are created as part of that operation. Afterwards, during bulk-load and other load operations, the indexes are automatically maintained.
To create a model using Oracle SEM_APIS, you first need to create an application table with a column of type SDO_RDF_TRIPLE_S, and then create an RDF model associated with that column of the application table. For example:
create table nci_rdf_data (triple sdo_rdf_triple_s);
execute sem_apis.create_sem_model('nci', 'nci_rdf_data', 'triple');
The Jena adaptor was created later and provided a slightly different interface.
Invoking sqlldr is just for loading the staging table. Once the staging table has been loaded, you need to invoke the SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE routine to load the data from the staging table into the Semantic Store and the application table.
The bulk-load API automatically populates the indexes on the internal storage tables. Users may optionally create custom indexes on the application table if necessary. (Note that the presence of indexes on the application table during a large bulk load slows down the bulk-load operation due to incremental index maintenance.)
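A minimal sketch of that second step, assuming a model named 'nci' owned by a schema RDFUSER with a staging table nci_stage (all three names are hypothetical), after SQL*Loader has populated the staging table:

begin
  -- load from the staging table into the semantic store and the application table
  sem_apis.bulk_load_from_staging_table('nci', 'RDFUSER', 'nci_stage');
end;
/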
Thanks for the information. Next question: what is the fastest method for loading RDF data into Oracle 11g? I've used Jena and the SQL*Loader approach. Is there a faster method? If SQL*Loader is the fastest, can I load data in parallel by using a separate SQL*Loader staging table for each model I am loading?
Using SQL*Loader to load the staging table and then using the bulk-load API is the fastest method when you are loading a large number of triples.
Loading into N different target models, in parallel, from N different staging tables is going to be faster than loading them one after another.
Loading into a single target model, in parallel, from N different staging tables is not always going to be faster, particularly if the target model is empty to start with. This is caused by the performance difference between bottom-up index build and incremental index maintenance.
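As a sketch, loading into two different target models in parallel just means running the bulk-load calls concurrently from separate database sessions (the model, owner, and staging table names below are hypothetical):

-- session 1:
execute sem_apis.bulk_load_from_staging_table('model_a', 'RDFUSER', 'stage_a');
-- session 2, started at the same time:
execute sem_apis.bulk_load_from_staging_table('model_b', 'RDFUSER', 'stage_b');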
Not sure if this answers your question.
Actually that does answer my question. Thanks.
I've been running a process that loads data into a model. I have used the SQL*Loader and bulk-load API. I've loaded 60 million triples in about 20 hours. I don't think that is very fast.
The Oracle database is a "vanilla" installation. I just used the Oracle Database Configuration Assistant. I found your earlier post about speeding up the loading process by changing the sga_target, pga_aggregate_target, etc. I will modify those values and see if that helps. Please let me know if there are any other "standard" settings to use. I'm not a DBA so I just went with the default settings.
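For reference, the changes I'm planning look like this (the sizes are just what I'm going to try on my machine, not recommendations):

alter system set sga_target = 2G scope = spfile;
alter system set pga_aggregate_target = 1G scope = spfile;
-- then restart the instance so the spfile values take effect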
Another wrinkle with this process is that I'm running the Oracle server on a virtual Linux server. My virtual machine has 4 GB of memory; I am going to restart it with 8 GB. We don't have an extra machine to host Linux, but if you think virtualization is leading to poor performance, we could justify buying another machine. What should be the "average" load time for loading triples into Oracle?
To answer your question about the differences between the Oracle Jena Adaptor and the plain SQL approach:
The Oracle Jena Adaptor creates a few additional tables when a new RDF model is created through the provided Java API. It also creates an additional index on the application table to facilitate triple deletes. These are mainly for addressing common application development needs.
For example, the additional index on the application table is necessary for TopBraid Composer to remove a class definition triple from your ontology.
Default names are used for those additional tables and the index if none are specified.
Javadoc has detailed information.
The plain SQL approach does not perform these additional steps; you will have to do them yourself if needed.