5 Replies Latest reply on Jul 17, 2006 4:49 PM by Mannamal-Oracle

    Inconsistent querying for Blank Nodes

    522722
      Using RDF database with Oracle 10.2.0.1, Windows, Java.

      Please see my test code below. I insert a triple with a blank node "_:MyBNode", but when I query it, the blank node suddenly has a different ID (something like _:ORABN555856E2F1D3462388C08120EA98A212). How can I get consistent behavior, so that I can map the Oracle-generated ID back to the original ID? Why doesn't Oracle just use the strings that I provide?

      Thanks for any help!

      Holger Knublauch
      TopQuadrant, Inc.
      http://www.topbraidcomposer.com
      --------------------------------------------------------


      String insert =
           "INSERT INTO TEST_TRIPLES " +
           " VALUES( TEST_TRIPLES_SEQ.nextval, " +
           " SDO_RDF_TRIPLE_S( ?, ?, ?, ?, ? ) )";
      OraclePreparedStatement ps = (OraclePreparedStatement) connection.prepareStatement(insert);
      ps.setString(1, "TEST");
      ps.setString(2, "<http://a.com/Person>");
      ps.setString(3, "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>");
      ps.setString(4, "_:MyBNode");
      int modelID = // ...
      ps.setInt(5, modelID);
      int count = ps.executeUpdate();
      assertEquals(1, count);

      String query =
           "SELECT * FROM TABLE ( " +
           " sdo_rdf_match( '(?s ?p ?o)'," +
           " sdo_rdf_models('TEST'), null, null, null) )";

      Statement stmt = connection.createStatement();
      ResultSet rs = stmt.executeQuery(query);
      assertTrue(rs.next());
      String subject = rs.getString("s");
      String predicate = rs.getString("p");
      String object = rs.getString("o");
        • 1. Re: Inconsistent querying for Blank Nodes
          Mannamal-Oracle
          The logic behind using a system-generated string to represent a blank node is as follows: if two applications happen to use the same blank node string to represent two different entities, they should not be seen as referring to the same entity. (If two URIs are identical, then they should indeed be interpreted as referring to the same entity; blank nodes, however, are used for unnamed entities.)

          Is there a need for you to retrieve the same blank node that you put in? If the blank node is just used to represent an unnamed entity, does it matter?

          If an application would like to re-use blank nodes (that is, use the same blank node string to represent two entities that are unnamed but are the same), then the appropriate constructor can be used; it takes the ID of the model from which to reuse the blank node. This is described in section 1.4.2 of the documentation. When this constructor is used, the original blank node string is stored in an internal table. The query APIs, however, will still return the system-generated blank node string, as seen in your example. Internally, the re-used blank nodes will be maintained as equivalent.

          Melli
          • 2. Re: Inconsistent querying for Blank Nodes
            522722
            Thanks for this explanation, which sheds some light on the logic.

            My use case is that I am developing an ontology editor based on the Jena API. In Jena (and all other APIs that I know of), bnodes are represented with a unique ID, and this ID does not change over time. This means that once I create a bnode (e.g. an owl:Restriction), it gets an internal ID, and this ID is used for comparisons, hash map look-ups etc.

            So assume I have created a new Jena restriction with internal ID "_:MyAnonId". I add it to the database in a triple

            (:Person, rdfs:subClassOf _:MyAnonId)

            Some code in my program will rely on the resource known as _:MyAnonId; for example, it will be shown on the screen and kept in reference maps etc. Now assume my code asks for all superclasses of Person, and gets back something like _:ORABN478724824784. My system doesn't know that this object is the same (in the sense of the equals method) as my original restriction (which still floats around elsewhere). They would be different objects, breaking the fundamental contracts in the Jena API or any other API I know. The behavior in such a system would be unpredictable.
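
To make the problem concrete, here is a minimal sketch of how an ID-based equals contract breaks when the query returns a different ID. The BNode class below is a hypothetical stand-in written for this illustration; it is not Jena's actual class, and it only assumes that blank nodes are compared and hashed by their ID string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BNodeIdentityDemo {

    /** Hypothetical stand-in for an API's blank node class (not Jena's actual class). */
    static final class BNode {
        private final String id;

        BNode(String id) {
            this.id = Objects.requireNonNull(id);
        }

        // Blank nodes are considered equal exactly when their IDs are equal.
        @Override
        public boolean equals(Object other) {
            return other instanceof BNode && ((BNode) other).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /**
     * Stores an entry under a node created with insertedId, then looks it up
     * with a node rebuilt from queryResultId, as a query result would be.
     */
    public static boolean survivesRoundTrip(String insertedId, String queryResultId) {
        Map<BNode, String> referenceMap = new HashMap<>();
        referenceMap.put(new BNode(insertedId), "restriction shown on screen");
        return referenceMap.containsKey(new BNode(queryResultId));
    }
}
```

With the same ID on both sides the lookup succeeds; with "_:MyAnonId" inserted and "_:ORABN478724824784" returned, the map entry can no longer be found.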

            Note that your architecture is no problem as long as you focus on querying, without creating new resources and remembering them (as Java objects).

            But to resolve my problem I guess I need an Oracle method that returns me the internal ID for a given input ID, e.g. a map (_:MyAnonId -> _:ORABN47...). Then I could on-the-fly replace the ids. Unfortunately this would mean zero to two database look ups for each queried triple - a performance nightmare. Does such a method exist at all? Then I could at least continue with my project.
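
A minimal sketch of such an on-the-fly replacement, assuming some lookup from the Oracle-generated ID to the original ID is available. The class name and the lookup function are hypothetical and stand in for whatever database call turns out to exist; the cache only reduces repeated look-ups and does not remove the first one per blank node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: maps Oracle-generated blank node IDs back to the application's IDs. */
public class BlankNodeIdCache {

    private final Map<String, String> oracleToOriginal = new HashMap<>();

    // Assumption: stands in for a database look-up; returns null if the ID is unknown.
    private final Function<String, String> lookup;

    public BlankNodeIdCache(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the original ID for a blank node, or the input unchanged for anything else. */
    public String toOriginal(String node) {
        if (node == null || !node.startsWith("_:")) {
            return node; // URIs and literals need no translation
        }
        String cached = oracleToOriginal.get(node);
        if (cached != null) {
            return cached; // no database round trip for IDs seen before
        }
        String original = lookup.apply(node);
        if (original == null) {
            return node; // unknown blank node: keep the Oracle-generated ID
        }
        oracleToOriginal.put(node, original);
        return original;
    }
}
```

Each subject and object of a query result would be passed through toOriginal before the corresponding Java object is created.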

            However, I really wonder: why can't Oracle just use the same ID that people use to add statements?

            Thanks for your insights
            Holger
            • 3. Re: Inconsistent querying for Blank Nodes
              Mannamal-Oracle
              From what you are saying, it appears to me that you are using the blank node as a named entity: in effect, like a URI, which uniquely identifies an entity. Our implementation of blank nodes was based on the interpretation that the name of a blank node has no real meaning.

              The reason we use system-generated identifiers to represent blank nodes is that one user could insert '_:A' as a blank node, and another user could also insert '_:A' as a blank node. If these are stored as is, then at query time they will be matched, which would be incorrect. The two blank nodes are not the same; they just happen to have used the same string to represent potentially two different unnamed entities.

              If you have access to the mdsys schema, or can ask someone who has access to mdsys (perhaps the DBA) to grant you permission to select from the table rdf_blank_node$, then a workaround is the following:

              When you insert a triple containing '_:MyArnold' use the constructor that takes in an additional parameter <model_id>:
              sdo_rdf_triple_s(<model_name>, <subject_name>, <predicate_name>,
              <object_name>, <model_id>) .
              The last argument <model_id> indicates that blank nodes should be "re-used" and will ensure that the association between the system-generated identifier and the original string is maintained (refer to section 1.4.2 in the documentation). The first time the constructor is used with the string '_:MyArnold', the system-generated identifier will be created and associated with the string '_:MyArnold'. The next time you insert a triple containing '_:MyArnold' using the same constructor (which includes the parameter <model_id>), the same system-generated identifier that was created with the first insert will be used (or, as we say, "re-used").

              Now you will be able to map the system generated identifier to the original string that represented your blank node by querying the table mdsys.rdf_blank_node$. The column node_value contains the system generated identifier and the column orig_name contains the original string that represented the blank node. Any performance impact can be minimized by including a join with the table mdsys.rdf_blank_node$ as part of your SQL query.

              The table rdf_blank_node$ resides under the mdsys schema. Thus to access it you would have to connect as mdsys and grant access to the table rdf_blank_node$ to yourself (or request the DBA to do it).
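
A rough sketch of what such a join could look like from Java. The table and column names (mdsys.rdf_blank_node$, node_value, orig_name) are the ones mentioned above; everything else is an assumption: the class and method names are made up for illustration, the query assumes node_value holds exactly the string that sdo_rdf_match returns, and it has not been run against a real database.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that returns triples with original blank node strings where known. */
public class BlankNodeJoinQuery {

    /** Builds the query text; the model name is validated because it is embedded in the SQL. */
    public static String buildQuery(String modelName) {
        if (!modelName.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected model name: " + modelName);
        }
        // Outer joins keep URIs and literals, which have no row in rdf_blank_node$.
        return "SELECT t.p, "
             + "NVL(bs.orig_name, t.s) AS s_orig, "
             + "NVL(bo.orig_name, t.o) AS o_orig "
             + "FROM TABLE(sdo_rdf_match('(?s ?p ?o)', "
             + "sdo_rdf_models('" + modelName + "'), null, null, null)) t "
             + "LEFT OUTER JOIN mdsys.rdf_blank_node$ bs ON bs.node_value = t.s "
             + "LEFT OUTER JOIN mdsys.rdf_blank_node$ bo ON bo.node_value = t.o";
    }

    /** Runs the query and returns {subject, predicate, object} arrays. */
    public static List<String[]> fetchTriples(Connection connection, String modelName)
            throws SQLException {
        List<String[]> triples = new ArrayList<>();
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(buildQuery(modelName))) {
            while (rs.next()) {
                triples.add(new String[] {
                    rs.getString("s_orig"), rs.getString("p"), rs.getString("o_orig")
                });
            }
        }
        return triples;
    }
}
```

If the same original string has been re-used in more than one model, the join would likely need an additional restriction on the model; that detail is left out here.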

              Another way to address the issue would be to use URIs instead of blank nodes. Is it possible to edit the ontology editor's output to do so?
              • 4. Re: Inconsistent querying for Blank Nodes
                522722
                > From what you are saying, it appears to me that you
                > are using the blank as a named entity - in effect,
                > like a URI, which uniquely identifies an entity. Our
                > implementation of blank nodes was based on the
                > interpretation that the name of a blank node has no
                > real meaning.
                I understand this from a theoretical point of view, but I don't think this interpretation is practical. As I said it is not my invention that bnodes are often used with a unique ID. And here I mean unique in the sense that they remain the same while the application is running. The basic contract in APIs like Jena is that resources (including bnodes) are compared by their IDs. As a result it becomes a major problem if I create a bnode object (as a Java object in memory) and then as a query result some other bnode is created for the same node but with a different ID. No method in Jena would know that these two things mean the same node, and this is important for example to resolve references within a graph.
                > The reason we use system generated identifiers to
                > represent blank nodes is because one user could
                > insert '_:A' as a blank node, and another user could
                > also insert '_:A' as a blank node. If these are
                > stored as is, then during query time these will be
                > matched, which would be incorrect. The two blank
                > nodes are not the same, they just happen to have used
                > to same string to represent potentially two different
                > unnamed entities.
                But this is a worst case scenario. While this could theoretically happen, I believe this logic could equally well be handled further up, on a programming level. Jena for example ensures this invariant inside a model as well. Any application could in principle test if a given ID is already used in the database or not.

                In my humble opinion (without knowing the Oracle internals), it would have been much easier for everyone if Oracle at least allowed users to pass in their own IDs for bnodes. The worst-case behavior you describe should be an option, but not necessarily the default one.

                Another option would be to let Oracle use the same conversion in both directions. There should be an option which also returns bnodes with the IDs from the conversion tables, and not the internal ID.
                > If you have access to the mdsys schema, [...]
                Thanks for describing this in detail. I can have a look, although I am afraid this will significantly impact the usability of this approach (both performance and access right problems).
                > Another way to address the issue would be use URIs
                > instead of blank nodes. Is it possible to edit the
                > ontology editor's output to do so?
                No, and this would not solve the real problem, which is to create a correct implementation of an Oracle bridge for any Java API like Jena or Sesame. The ontology editor requires bnodes everywhere (OWL restrictions, rdf:Lists etc). Our hope was to enhance our ontology development tool TopBraid Composer (which already has a Jena-based database backend), so that it can be used as a front-end and editor for an Oracle 10g RDF database. However, it now seems that this will not be possible.

                Thanks
                Holger
                • 5. Re: Inconsistent querying for Blank Nodes
                  Mannamal-Oracle
                  Thanks for explaining this in detail, Holger. It seems to me to be a question of how blank nodes are interpreted. If indeed many applications expect blank nodes to be treated the way you have described, then we can look into adding that as an enhancement to our product, maybe in a patch release.

                  Melli
