16 Replies. Latest reply on Oct 9, 2009 6:55 AM by 714977

    SPARQL query on multiple graphs

    652134
      Hello,
      I am using Oracle 11.1.0.6.0 with the Oracle Jena Release 2 drivers. I have imported Pubmed articles into one model and the Uniprot RDF data into another model. I am trying to run a SPARQL query to join Pubmed articles for a certain topic to their proteins. Here is the SPARQL query:

      PREFIX uniprot: <http://purl.uniprot.org/core/>
      PREFIX df: <http://www.ncbi.nlm.nih.gov/pubmed/>
      SELECT ?uniprot
      WHERE { 
      GRAPH <http://sw.brainstage.com/pubmed>
      { ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354>  
      }
      GRAPH <http://purl.uniprot.org>
      {
      ?uniprot uniprot:citation ?citation .
      ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
      ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
      }
      }

      When I execute the query, the Jena output suggests that it performs the following steps:

      1. Execute the SELECT on the first graph (PUBMED).
      2. For each record returned in step 1, run sdo_rdf_match('(?citation <http://www.w3.org/2002/07/owl#sameAs> <URI from step 1>)'

      Both of these graphs are very large. Pubmed has over 17 million articles. Uniprot contains 500K proteins, but the size of its triples file is enormous: 153 GB. I have run sem_apis.analyze_model() on both models. I know the query is correct because it does return data if I add a LIMIT clause to the end of the SPARQL query.

      Is there something I can do to improve the performance of this SPARQL query? I am a little surprised that the Oracle Jena driver loops over the data when querying more than one graph; it seems inefficient.

      Thanks,
      Chuck
        • 1. Re: SPARQL query on multiple graphs
          Sdas-Oracle
          One SQL-based alternative would be to join two SEM_MATCH queries, as shown below. (NOTE: the graph-pattern arguments below use the curly-brace syntax, which is available if patch 7600122 is installed on 11.1.0.7. The two models are assumed to be named uniprot_model and pubmed_model.)

          SELECT uniprot
          FROM
            TABLE(SEM_MATCH('
              {
                ?uniprot uniprot:citation ?citation .
                ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
                ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
              }
              ', sem_models('uniprot_model'), null,
              sem_aliases(sem_alias('uniprot','http://purl.uniprot.org/core/')), null
            )) t1,
            TABLE(SEM_MATCH('
              { ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354> }
              ', sem_models('pubmed_model'), null,
              sem_aliases(sem_alias('df','http://www.ncbi.nlm.nih.gov/pubmed/')), null
            )) t2
          WHERE t1.article$RDFVID = t2.article$RDFVID;
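          The WHERE clause above is an ordinary equijoin of two result sets on the article's value ID, so each SEM_MATCH can be evaluated once and the rows matched in bulk (for example by a hash join, if the optimizer chooses one) instead of running one lookup per article. The Java sketch below illustrates that set-at-a-time idea in memory only; the record types, field names, and sample IDs are hypothetical stand-ins for t1 and t2 rows, not Oracle APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Hypothetical stand-in for a t1 row: (uniprot, article$RDFVID).
    record UniprotRow(String uniprot, long articleVid) {}
    // Hypothetical stand-in for a t2 row: (article$RDFVID).
    record PubmedRow(long articleVid) {}

    /** Equijoin of t1 and t2 on articleVid: build a hash table on t2, probe it with t1. */
    static List<String> hashJoin(List<UniprotRow> t1, List<PubmedRow> t2) {
        // Build phase: count how often each article ID occurs in t2 (bag semantics, like SQL).
        Map<Long, Integer> build = new HashMap<>();
        for (PubmedRow r : t2) {
            build.merge(r.articleVid(), 1, Integer::sum);
        }
        // Probe phase: one in-memory lookup per t1 row, no per-row query to the store.
        List<String> result = new ArrayList<>();
        for (UniprotRow r : t1) {
            int matches = build.getOrDefault(r.articleVid(), 0);
            for (int i = 0; i < matches; i++) {
                result.add(r.uniprot());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<UniprotRow> t1 = List.of(
            new UniprotRow("P1", 10L), new UniprotRow("P2", 20L), new UniprotRow("P3", 30L));
        List<PubmedRow> t2 = List.of(new PubmedRow(10L), new PubmedRow(30L));
        System.out.println(hashJoin(t1, t2)); // prints [P1, P3]
    }
}
```

          Each input is scanned once, so the cost grows with the sizes of t1 and t2 rather than with the number of round trips to the database.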

          Another alternative for this particular example would be to combine the two graphs into the default graph and merge the two graph patterns into one, as shown below. (The Jena Adaptor's Attachment can be used to query multiple graphs; the allow-duplicates option should be turned on.)

          PREFIX uniprot: <http://purl.uniprot.org/core/>
          PREFIX df: <http://www.ncbi.nlm.nih.gov/pubmed/>
          SELECT ?uniprot
          FROM <http://sw.brainstage.com/pubmed>
          FROM <http://purl.uniprot.org>
          {
          ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354> .
          ?uniprot uniprot:citation ?citation .
          ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
          ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
          }
          • 2. Re: SPARQL query on multiple graphs
            652134
            Hi Souri,
            Thanks for the reply. I would like to keep using SPARQL queries if possible. I will be upgrading to 11.1.0.7 in the near future. Could adding semantic indices to these models improve performance? If not, then I will use the SQL-based approach you describe.

            On the subject of semantic indices, is there a way to create a semantic index on a subset of the data within a model? In my example, I am trying to join the articles found in the Pubmed model with the articles listed in the Uniprot model. Indexing all the data from these models would help, but I'm worried the indexes would contain a huge amount of data. Can I index the article information found in the models? Is there some way for me to create an index for this join? Is this a case where I could use a virtual model?

            Thanks,
            Chuck
            • 3. Re: SPARQL query on multiple graphs
              Sdas-Oracle
              Chuck,

              The second alternative, which uses SPARQL, would benefit from a virtual model defined by combining the two RDF models.

              Regarding indexes, I am assuming you are referring to the recently introduced ability to create new (semantic) network indexes. We do not currently provide any SEM_APIS procedure for creating network indexes that will index only a subset of the triples in a model (e.g., a function-based index whose key is non-NULL only for a subset of the triples in a model).

              If you would like more details, please contact me by email.

              Thanks,
              - Souri.
              • 4. Re: SPARQL query on multiple graphs
                alwu-Oracle
                Hi,

                A bit of clarification on the Jena Adaptor's implementation: the Jena Adaptor implements Jena's APIs, including Graph, Model, BulkUpdateHandler, BindingQueryPlan, etc.

                There is no nested-loop logic in the Jena Adaptor's implementation. When a SPARQL query comes in, ARQ parses it and poses requests to the underlying Graph implementation. On our side, we convert each request into a single SQL query. In this case, it is ARQ that loops through the data and asks many small queries against the underlying implementation.
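                A minimal in-memory Java sketch of the access pattern described above, assuming a store where every find() call stands for one SQL query issued by the Graph implementation: the first pattern is evaluated once, then the engine asks one small question per binding. TripleStore, find(), and the shortened predicate names are hypothetical simplifications, not Jena or Oracle APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedLoopSketch {
    record Triple(String s, String p, String o) {}

    /** Hypothetical store: each find() call stands for one SQL round trip. */
    static class TripleStore {
        private final List<Triple> triples;
        int calls = 0; // number of "queries" issued so far

        TripleStore(List<Triple> triples) {
            this.triples = triples;
        }

        // null acts as a wildcard in any position
        List<Triple> find(String s, String p, String o) {
            calls++;
            List<Triple> out = new ArrayList<>();
            for (Triple t : triples) {
                if ((s == null || s.equals(t.s()))
                        && (p == null || p.equals(t.p()))
                        && (o == null || o.equals(t.o()))) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    /** Outer pattern: ?article hasMajorMesh mesh. Inner pattern: ?citation sameAs ?article, once per article. */
    static List<String> citationsFor(TripleStore store, String mesh) {
        List<String> citations = new ArrayList<>();
        for (Triple a : store.find(null, "hasMajorMesh", mesh)) {  // 1 call
            for (Triple c : store.find(null, "sameAs", a.s())) {   // 1 call per article
                citations.add(c.s());
            }
        }
        return citations;
    }

    public static void main(String[] args) {
        TripleStore store = new TripleStore(List.of(
            new Triple("a1", "hasMajorMesh", "D017354"),
            new Triple("a2", "hasMajorMesh", "D017354"),
            new Triple("a3", "hasMajorMesh", "D017354"),
            new Triple("c1", "sameAs", "a1"),
            new Triple("c3", "sameAs", "a3")));
        List<String> result = citationsFor(store, "D017354");
        // prints [c1, c3] after 4 store calls
        System.out.println(result + " after " + store.calls + " store calls");
    }
}
```

                The number of store calls is one plus the number of articles returned by the outer pattern, which is why this shape becomes slow when the first graph matches many articles.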

                Thanks,
                Zhe Wu
                • 5. Re: SPARQL query on multiple graphs
                  652134
                  Hi Alan,
                  Thanks for the clarification about the inner workings of Jena and ARQ. So, will creating SEM_API indices (part of the 11.1.0.7 patches) help in this case? It sounds like I will encounter the ARQ "loop" whenever I execute a multiple graph SPARQL query through the Jena driver. Is Souri's suggestion of using SQL with SEM_MATCH my only option to avoid looping?

                  Regards,
                  Chuck
                  • 6. Re: SPARQL query on multiple graphs
                    alwu-Oracle
                    The nested loop will be there whether or not you create a new index.

                    It seems to me that you are a bit reluctant to use the SQL approach. Sdas's second proposal may work for you; have you tried it? No looping will show up if you combine the two graphs and issue a SPARQL query against the combined graph.

                    Thanks,

                    Zhe Wu
                    • 7. Re: SPARQL query on multiple graphs
                      652134
                      Hi Alan,
                      You are right, I would prefer to use SPARQL if possible. My application has been able to use SPARQL for all my other queries up until this point. In the future, I may want to query publicly available SPARQL endpoints. Therefore, most of my queries will involve SPARQL. If I need to use a SQL query to improve the performance, then I will do so.

                      Let me ask a dumb question. Is the Oracle Semantic technology designed to put all the data into a single graph? The SEM_APIS procedures seem to be designed to handle multiple graphs (for example, you can analyze separate models using sem_apis.analyze_model). I have separated my data into several graphs. I am loading data from external data sources, but their data is not in RDF (except Uniprot). I believe at some point they will provide their data in an RDF format. I put the data into separate graphs in case I need to drop my version of the data and replace it with an "official" RDF version. I have loaded the following data into separate graphs:

                      Datasource    Triples
                      Uniprot       1,135,895,574
                      Pubmed          655,411,818
                      Homologene          172,371

                      I am concerned that if all this data is in one graph, queries for the Homologene data will be slowed down by all the data from Uniprot and Pubmed.
                      • 8. Re: SPARQL query on multiple graphs
                        alwu-Oracle
                        Sorry I did not make this clear enough. When I said "combined graph" I did not mean you have to store all triples in one graph. I meant that you simply write a plain conjunctive query against two graphs.

                        Using the second proposal, you issue the following query against a GraphOracleSem object constructed with an Attachment object.

                        PREFIX uniprot: <http://purl.uniprot.org/core/>
                        PREFIX df: <http://www.ncbi.nlm.nih.gov/pubmed/>
                        SELECT ?uniprot
                        {
                        ?article df:hasMajorMesh <http://www.ncbi.nlm.nih.gov/pubmed/D017354> .
                        ?uniprot uniprot:citation ?citation .
                        ?citation <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> uniprot:Journal_Citation .
                        ?citation <http://www.w3.org/2002/07/owl#sameAs> ?article .
                        }
                        • 9. Re: SPARQL query on multiple graphs
                          652134
                          Hi Alan,
                          Thanks for the clarification. OK, I can add the models to the Attachment. How do I use the GraphOracleSem constructor in this case? All the GraphOracleSem constructors require a modelName (e.g. public GraphOracleSem(Oracle oracle, java.lang.String modelName, Attachment attachment)). What do I put in as the modelName?

                          Thanks,
                          Chuck
                          • 10. Re: SPARQL query on multiple graphs
                            alwu-Oracle
                            Please check "Running a SPARQL query across all models".

                            Also, please set QueryOptions.ALLOW_QUERY_VALID_AND_DUP when you construct your Attachment object to speed up your query.
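                            For background, the SEM_MATCH option ALLOW_DUP=T is documented as merging models with UNION ALL instead of UNION, which skips the cost of removing duplicate triples across models. We assume QueryOptions.ALLOW_QUERY_VALID_AND_DUP enables similar behavior in the Jena Adaptor, but that mapping is an assumption here. The Java sketch below only illustrates the trade-off in memory, with hypothetical sample triples and hypothetical method names:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UnionSketch {
    record Triple(String s, String p, String o) {}

    // UNION ALL-style merge (assumed analogue of allowing duplicates):
    // plain concatenation, duplicates kept, no extra work.
    static List<Triple> unionAll(List<Triple> a, List<Triple> b) {
        List<Triple> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // UNION-style merge: set semantics, requires hashing every triple of both inputs.
    static List<Triple> unionDistinct(List<Triple> a, List<Triple> b) {
        LinkedHashSet<Triple> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Triple shared = new Triple("c1", "sameAs", "a1");
        List<Triple> pubmed = List.of(new Triple("a1", "hasMajorMesh", "D017354"), shared);
        List<Triple> uniprot = List.of(shared, new Triple("p1", "citation", "c1"));
        System.out.println(unionAll(pubmed, uniprot).size());      // prints 4
        System.out.println(unionDistinct(pubmed, uniprot).size()); // prints 3
    }
}
```

                            If the attached models rarely share triples, duplicate elimination mostly adds work without changing the results.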
                            • 11. Re: SPARQL query on multiple graphs
                              652134
                              I am getting the following error:

                              ORA-44004: invalid qualified SQL name
                              ORA-06512: at "SYS.DBMS_ASSERT", line 188

                              when running this code:

                              Query query = null;
                              QueryExecution qexec = null;
                              try {
                                  String[] models = {"PUBMED", "UNIPROT"};
                                  Attachment attachment = Attachment.createInstance(
                                      models, "",
                                      InferenceMaintenanceMode.NO_UPDATE,
                                      QueryOptions.ALLOW_QUERY_VALID_AND_DUP);
                                  GraphOracleSem graph = new GraphOracleSem(oracle, "PUBMED", attachment);
                                  ModelOracleSem model = new ModelOracleSem(graph);

                                  query = QueryFactory.create(queryText);
                                  qexec = QueryExecutionFactory.create(query, model);

                                  ResultSet results = qexec.execSelect();

                              Any suggestions?
                              • 12. Re: SPARQL query on multiple graphs
                                alwu-Oracle
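                                You are very close. Try something like this:

                                // Attach only the additional model; PUBMED is passed to the GraphOracleSem constructor below.
                                String[] models = {"UNIPROT"};
                                // No rulebases: pass an empty String[] rather than an empty string.
                                String[] no_rbs = new String[0];
                                Attachment attachment = Attachment.createInstance(
                                    models, no_rbs,
                                    InferenceMaintenanceMode.NO_UPDATE,
                                    QueryOptions.ALLOW_QUERY_VALID_AND_DUP);

                                GraphOracleSem graph = new GraphOracleSem(oracle, "PUBMED", attachment);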
                                • 13. Re: SPARQL query on multiple graphs
                                  652134
                                  Thanks Alan. That worked!
                                  • 14. Re: SPARQL query on multiple graphs
                                    714977
                                    String[] modelNames = {"cardiovascular", "pediatric", "occupationalhealth"};
                                    String modelName = "DEFAULT_MODEL"; // empty database
                                    String[] list = {"RDFS"};

                                    Attachment attachment = Attachment.createInstance(modelNames, list,
                                        InferenceMaintenanceMode.NO_UPDATE, QueryOptions.ALLOW_QUERY_VALID_AND_DUP);

                                    graph = new GraphOracleSem(oracleConn, modelName, attachment);
                                    graph.performInference();

                                    This is my code snippet. When I try to execute it, it throws the following error:

                                    Exception in thread "main" javax.xml.ws.soap.SOAPFaultException: rethrew: java.sql.SQLException: ORA-20000: We do not have a rules index for this Model-Rulebase combination
                                    ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 691
                                    ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 222
                                    ORA-06512: at line 1

                                    Can you show me an example of a SPARQL query against more than two models with inferencing?

                                    Some other issues:

                                    1) When I queried different combinations of two models, I got different results and errors:
                                    a) Query from cardiovascular and pediatric: works (but only for certain SPARQL queries).
                                    b) Query from cardiovascular and occupationalhealth: error message
                                    Exception in thread "main" javax.xml.ws.soap.SOAPFaultException: rethrew: java.sql.SQLException: ORA-20000: We do not have a rules index for this Model-Rulebase combination
                                    ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 691
                                    ORA-06512: at "MDSYS.RDF_MATCH_IMPL_T", line 222
                                    ORA-06512: at line 1
                                    c) Query from pediatric and occupationalhealth: the same ORA-20000 error message as in b).

                                    FYI, I'm comparing the results with those generated by AllegroGraph. The results hardly match when I query multiple models.

                                    Please help me.


                                    Regards,
                                    Lia

                                    Edited by: user11745027 on Aug 2, 2009 8:57 PM

                                    Edited by: user11745027 on Aug 2, 2009 8:58 PM