Part 1: The backend RDF store does not keep duplicates: when the same triples are re-inserted, the graph size in the backend store does not go up. Perhaps you are referring to the size of the application table (the table with the column of type sdo_rdf_triple_s)? That table will have a row for every triple that you have loaded, including duplicates, but the backend store will not store duplicates as distinct triples. So the size of the triples in the triple store will not double.
Part 2: Yes, inferencing is done in the database for the subset of OWL that we support. For other OWL predicates an external reasoner can be used, and yes, the data will have to be loaded into main memory. The typical approach taken is to separate out the schema data from the instance data. The schema data (T Box) is more likely to have sophisticated OWL constructs and likely to be much smaller than the instance data (A Box), and so you will not require a big dataset to be loaded into memory.
Some of the OWL DL constructs can be supported using the user-defined rules feature in Oracle. See the inferencing best practices paper at http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_infer_bestprac_wp.pdf
and the presentation at: "A scalable RDBMS-based inference engine for RDFS/OWL", Oct. 2007, http://ontolog.cim3.net/file/work/DatabaseAndOntology/2007-10-18_AlanWu/RDBMS-RDFS-OWL-InferenceEngine--AlanWu_20071018.pdf
The presentation also has details on separating out T Box and A Box reasoning.
OWL Full is not tractable; we are looking at OWL DL for the future.
Thanks for the reply. Yes, the size of the application table is increasing, not the back-end triple store. But I want to know if there is any way to check for this redundancy at the application table level. As I said, I am using the getBulkUpdateHandler().add(graph) function to add the graph to the back-end.
Also, I have read about the TBox and ABox approach. But I have one doubt regarding SPARQL queries using sdo_rdf_match() in 10g: is inferencing done on the result set (the small set of results that match the pattern given in the SPARQL), or is it done on the entire model, which is then queried? To put it simply: which is done first, query or inference?
You could add a duplicate check programmatically, but that would be computationally intensive. Are you worried about space?
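A minimal sketch of what such a programmatic check could look like on the client side, assuming the triples can be held as plain values before they are handed to the bulk add. The `Triple` record and the `TripleDedup` helper are hypothetical names for illustration only; they are not part of Jena or of the Oracle adapter. The set of triples already seen has to live in memory, which is where the cost comes from.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical value type standing in for one RDF statement (not a Jena or Oracle class).
record Triple(String subject, String predicate, String object) {}

final class TripleDedup {
    // Returns the triples from 'incoming' that are not yet in 'seen',
    // and records them in 'seen' so that later batches skip them too.
    static List<Triple> newTriplesOnly(Set<Triple> seen, List<Triple> incoming) {
        List<Triple> fresh = new ArrayList<>();
        for (Triple t : incoming) {
            // Set.add returns false when an equal triple is already present.
            if (seen.add(t)) {
                fresh.add(t);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Set<Triple> seen = new HashSet<>();
        List<Triple> batch = List.of(
            new Triple("ex:a", "ex:knows", "ex:b"),
            new Triple("ex:a", "ex:knows", "ex:b"),   // duplicate within the batch
            new Triple("ex:c", "ex:knows", "ex:d"));
        System.out.println(newTriplesOnly(seen, batch).size());  // first load
        System.out.println(newTriplesOnly(seen, batch).size());  // same batch re-loaded
    }
}
```

Only the list returned by `newTriplesOnly` would then be passed on for insertion, so the application table would receive each triple once per process lifetime. Across separate runs the `seen` set would have to be rebuilt or persisted, which is part of why the check is expensive.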
Inferencing in Oracle is done beforehand, not at query time. There are specific APIs to be called for inferencing: create_rules_index for RDF and RDFS in 10g, and create_entailment for RDF, RDFS and OWL in 11g (create_rules_index is still used for user-defined rules in 11g). When these APIs are called, the graph is inferred and the results are stored. A query can then run on the original data plus the inferred triples: the query API (SDO_RDF_MATCH, or SEM_MATCH in 11g) lets you request the inferred triples by specifying the rules as part of the SDO_RDF_Rulebases parameter.
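To make the "inference first, query afterwards" order concrete, here is a toy Java sketch. It is not how Oracle implements this; it only mimics the idea with two RDFS rules (subclass transitivity and type propagation) applied to a fixpoint. `Stmt`, `MaterializedInference`, `entail` and `ask` are hypothetical names. The `useInferred` flag loosely plays the role of naming a rulebase in the query call.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical statement type: subject, predicate, object.
record Stmt(String s, String p, String o) {}

final class MaterializedInference {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // Inference step, run once ahead of time: forward-chains to a fixpoint
    // and returns only the newly derived statements so they can be stored.
    static Set<Stmt> entail(Set<Stmt> asserted) {
        Set<Stmt> all = new HashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Stmt> snapshot = List.copyOf(all);
            for (Stmt a : snapshot) {
                if (!a.p().equals(SUBCLASS) && !a.p().equals(TYPE)) continue;
                for (Stmt b : snapshot) {
                    // (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                    // (x type B),       (B subClassOf C) => (x type C)
                    if (b.p().equals(SUBCLASS) && a.o().equals(b.s())) {
                        changed |= all.add(new Stmt(a.s(), a.p(), b.o()));
                    }
                }
            }
        }
        all.removeAll(asserted);
        return all;
    }

    // Query step: a plain lookup over stored statements; no reasoning happens here.
    static boolean ask(Stmt q, Set<Stmt> asserted, Set<Stmt> inferred, boolean useInferred) {
        return asserted.contains(q) || (useInferred && inferred.contains(q));
    }
}
```

With the asserted statements (ex:Rex rdf:type ex:Dog), (ex:Dog rdfs:subClassOf ex:Mammal) and (ex:Mammal rdfs:subClassOf ex:Animal), `entail` derives three further statements once. Asking whether ex:Rex is an ex:Animal then succeeds only when `useInferred` is true, without any reasoning at query time.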