3 Replies Latest reply on Mar 26, 2008 4:42 PM by Mannamal-Oracle

    OWL inference support in 11g and bulk loading problems

      I am relatively new to Semantic technologies, so please excuse me if I am wrong in my problem statement. My work consists of two parts:

      Part 1. Building (huge) ontologies and persisting them to an Oracle database.
      Part 2. Making inferences on these ontologies within the database (not in-memory).

      I am using Oracle 10gR2 and am hoping to upgrade to 11g based on the replies I get here.

      Problems related to part 1:

      I am loading the ontologies into the database using 'getBulkUpdateHandler().add(graph)', where 'graph' is the ontology that I am currently persisting. But when I try to add the same graph again (into the same back-end RDF model), the triples are duplicated, and the reported size is double that of the original graph. I expected the same graph not to be loaded again and again. Is there any way this redundancy can be prevented? I tried 'add(graph, false/true)' as well, but the result is the same.

      Problems related to part 2:

      1. Is inferencing done within the database itself?

      2. When can we expect OWL DL and OWL Full inference support in Oracle? (I have read that 11g supports only a subset of OWL entailments.) Meanwhile, if I use external reasoners (such as those that come with Jena or Pellet), will inferences be made against the database itself, or must the data be loaded into main memory? (I have tried Jena's in-memory OWL Full reasoner on an ontology of 800,000 triples with 1 GB of JVM heap, and it threw an out-of-memory exception after 5 long hours.)

      So please help me through this.
        • 1. Re: OWL inference support in 11g and bulk loading problems
          Part 1: The back-end RDF store will not keep duplicates - when the same triples are re-inserted, the graph size in the back-end store does not go up. Perhaps you are referring to the size of the application table (the table with the column of type SDO_RDF_TRIPLE_S)? That table has a row for every triple you have loaded, including duplicates, but the back-end store does not store duplicates as distinct triples. So the size of the triples in the triple store will not double.
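          You can see the difference concretely by comparing the two counts. A minimal sketch (the application table name FAMILY_RDF_DATA and model name 'family' are hypothetical; SDO_RDF_MATCH is the 10g query function):

```sql
-- Row count of the application table: grows on every load,
-- duplicates included.
SELECT COUNT(*) FROM family_rdf_data;

-- Distinct triples in the back-end store for the same model:
-- stays flat when the same graph is re-loaded.
SELECT COUNT(*)
FROM TABLE(SDO_RDF_MATCH(
  '(?s ?p ?o)',
  SDO_RDF_MODELS('family'),
  null, null, null));
```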

          Part 2: Yes, inferencing is done in the database for the subset of OWL that we support. For other OWL predicates an external reasoner can be used, and yes, the data will have to be loaded into main memory. The typical approach taken is to separate out the schema data from the instance data. The schema data (T Box) is more likely to have sophisticated OWL constructs and likely to be much smaller than the instance data (A Box), and so you will not require a big dataset to be loaded into memory.

          Some of the OWL DL constructs can be supported using the user-defined rules feature in Oracle. See the inferencing best practices paper at http://www.oracle.com/technology/tech/semantic_technologies/pdf/semantic_infer_bestprac_wp.pdf
          and the presentation "A scalable RDBMS-based inference engine for RDFS/OWL", Oct. 2007, http://ontolog.cim3.net/file/work/DatabaseAndOntology/2007-10-18_AlanWu/RDBMS-RDFS-OWL-InferenceEngine--AlanWu_20071018.pdf The presentation also has details on separating out T Box and A Box reasoning.
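          As a hedged sketch of the user-defined rules feature mentioned above (the rulebase name 'my_rb', the namespace, and the rule itself are made up for illustration), a simple rule could be created like this:

```sql
-- Create a rulebase (the name 'my_rb' is hypothetical).
EXECUTE SEM_APIS.CREATE_RULEBASE('my_rb');

-- Each rulebase is exposed through a view MDSYS.SEMR_<rulebase_name>.
-- Insert a rule: if ?x hasParent ?y, then ?y hasChild ?x.
INSERT INTO mdsys.semr_my_rb VALUES (
  'parent_child_rule',
  '(?x :hasParent ?y)',   -- antecedent
  NULL,                   -- optional filter condition
  '(?y :hasChild ?x)',    -- consequent
  SEM_ALIASES(SEM_ALIAS('', 'http://example.org/family/')));
COMMIT;
```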

          OWL Full is not tractable; we are looking at OWL DL in the future.

          • 2. Re: OWL inference support in 11g and bulk loading problems
            Thanks for the reply. Yes, it is the size of the application table that is increasing, not the back-end triple store. But I want to know whether there is any way to prevent this redundancy at the application-table level. As mentioned, I am using the 'getBulkUpdateHandler().add(graph)' function to add the graph to the back-end.

            Also, I have read about the TBox and ABox approach. But I have one doubt regarding SPARQL queries using 'sdo_rdf_match()' in 10g: is inferencing done on the result set (the small set of results that match the pattern given in the SPARQL query), or is it done on the entire model, which is then queried? To put it simply: which is done first, query or inference?

            Narni Rajesh.
            • 3. Re: OWL inference support in 11g and bulk loading problems
              You could add a duplicate check programmatically, but that would be computationally intensive. Are you worried about space?
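              If the duplicate rows do become a space concern, one such programmatic check (sketched here against a hypothetical application table FAMILY_RDF_DATA with a TRIPLE column of type SDO_RDF_TRIPLE_S, using that type's subject/predicate/object ID attributes) is to delete every row whose triple already appears in an earlier row:

```sql
-- Keep the first row of each duplicate group; remove the rest.
-- RDF_S_ID / RDF_P_ID / RDF_C_ID identify the subject, predicate
-- and (canonical) object of each stored triple.
DELETE FROM family_rdf_data a
WHERE a.rowid > (
  SELECT MIN(b.rowid)
  FROM family_rdf_data b
  WHERE b.triple.rdf_s_id = a.triple.rdf_s_id
    AND b.triple.rdf_p_id = a.triple.rdf_p_id
    AND b.triple.rdf_c_id = a.triple.rdf_c_id);
COMMIT;
```

              Note that this scans the table for every candidate row, which is why it is computationally intensive on large loads.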

              Inferencing in Oracle is done beforehand, not at query time. There are specific APIs to be called for inferencing: create_rules_index for RDF and RDFS in 10g, and create_entailment for RDF, RDFS and OWL in 11g (create_rules_index is still used for user-defined rules in 11g). When these APIs are called, the graph is inferred and the results are stored. A query can then run over both the original data and the inferred triples: the query API (SDO_RDF_MATCH, or SEM_MATCH in 11g) lets you include the inferred triples by specifying the rules in the SDO_RDF_Rulebases parameter.
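              For instance (the model name 'family' and index name are illustrative), the 11g flow would look roughly like:

```sql
-- 1. Infer beforehand: build an entailment over the model using
--    the built-in RDFS rulebase; inferred triples are stored.
EXECUTE SEM_APIS.CREATE_ENTAILMENT(
  'family_rdfs_idx', SEM_MODELS('family'), SEM_RULEBASES('RDFS'));

-- 2. Query original + inferred triples: listing the rulebase in
--    the SEM_MATCH call makes the stored inferences visible.
SELECT p
FROM TABLE(SEM_MATCH(
  '(?p rdf:type :Person)',
  SEM_MODELS('family'),
  SEM_RULEBASES('RDFS'),
  SEM_ALIASES(SEM_ALIAS('', 'http://example.org/family/')),
  null));
```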