10 Replies Latest reply on Nov 2, 2009 2:23 PM by alwu-Oracle

    SEM_MATCH query execution time when a predicate is set as a variable

    712796
      Hello there,

      Our ontology contains 70 million triples (12.5 million asserted triples and 58 million inferred triples). We created a virtual model which includes both asserted and inferred triples).

      Our semantic network index configuration is 'PSCM' (index status is valid).

      The problem we encounter is when we run a SEM_MATCH query which includes a predicate as a variable, the query ran for 22 seconds. When a similar query is run when the variable is subject or an object or both the time for the SEM_MATCH query execution is 0.2 seconds. Is this reasonable? What do we do wrong?

      For example:
      1. Predicate as a variable (execution time is 22 seconds):
      SELECT a FROM TABLE ( SEM_MATCH
      ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
      SEM_MODELS( 'TOM_VIRTUAL_MODEL') ,
      NULL,
      NULL,
      NULL,
      'ALLOW_DUP=T'));

      2. Subject as a variable (execution time is 0.2 seconds):
      SELECT a FROM TABLE ( SEM_MATCH
      ( ' { ?a <http://www.tomproj.com/rep#id> 4  }',
      SEM_MODELS( 'TOM_VIRTUAL_MODEL') ,
      NULL,
      NULL,
      NULL,
      'ALLOW_DUP=T'));

      2. Object as a variable (execution time is 0.2 seconds):
      SELECT a FROM TABLE ( SEM_MATCH
      ( ' { <http://www.tomproj.com/rep/#entity_4> <http://www.tomproj.com/rep#id> a }',
      SEM_MODELS( 'TOM_VIRTUAL_MODEL') ,
      NULL,
      NULL,
      NULL,
      'ALLOW_DUP=T'));

      Cheers,

      Doron
        • 1. Re: SEM_MATCH query execution time when a predicate is set as a variable
          alwu-Oracle
          Hi,

          Have you checked sem_apis.add_sem_index API?

          See http://download.oracle.com/docs/cd/E11882_01/appdev.112/e11828/toc.htm

          Thanks,

          Zhe Wu
          • 2. Re: SEM_MATCH query execution time when a predicate is set as a variable
            712796
            As I mentioned in my previous post, our network is indexed with 'PSCM' option (using the sem_apis.add_sem_index('PSCM')).

            Cheers,

            Doron
            • 3. Re: SEM_MATCH query execution time when a predicate is set as a variable
              alwu-Oracle
              Hi,

              Sorry I did not make it clear. Say your query pattern has subject bound, predicate unbound,
              then you can try adding SEM_APIS.ADD_SEM_INDEX('SCP'), or 'SC', or 'S'.

              Adding/removing index has impact on your overall application. So you have to carefully benchmark
              before and after adding a single index. Also, you need to check the additional storage requirement
              caused by new index.

              Thanks,

              Zhe Wu
              • 4. Re: SEM_MATCH query execution time when a predicate is set as a variable
                712796
                Thanks for the reply,

                First of all some background is required:
                We have a virtual model named 'TOM_VIRTUAL_MODEL' which contains the following models 'TOM' and 'TOM_TF'.
                The virtual model also contains two rule bases: 'OWLPRIME' and a user rule base 'TOM_RB'.

                We are using Oracle 11g rel1 on Windows Server 2008.

                I performed the following steps:
                1. SEM_APIS.ADD_SEM_INDEX('SCP');
                2. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM','SCP','REBUILD');
                3. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM_TF','SCP','REBUILD');
                4. SEM_APIS.ALTER_SEM_INDEX_ON_RULES_INDEX('TOM_RI','SCP','REBUILD'); // this is the rule bases index.

                The execution time of my query (see below) was 22secs.
                SELECT a FROM TABLE ( SEM_MATCH
                ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
                SEM_MODELS( 'TOM_VIRTUAL_MODEL') ,
                NULL,
                NULL,
                NULL,
                'ALLOW_DUP=T'));

                The execution time of this query on 'TOM' model (not the 'TOM_VIRTUAL_MODEL') did improve allot but the problem remains when querying the virtual model.

                Did I do something wrong? Is there another way to enforce the new index on the virtual model?

                P.S

                Does adding a new index (i.e. 'SCM') affects the previous index (i.e. 'PSCM')?
                Can this index adding procedure degrade the performance of queries that use an unbound subject pattern?

                Cheers,

                Doron
                • 5. Re: SEM_MATCH query execution time when a predicate is set as a variable
                  Sdas-Oracle
                  Please add 'M' as the last column of the index keys. This is good in general, and especially improtant when using virtual models.

                  So, please first drop the SCP index
                  1. SEM_APIS.DROP_SEM_INDEX('SCP');

                  and then create an SCPM index instead:

                  1. SEM_APIS.ADD_SEM_INDEX('SCPM');
                  2. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM','SCPM','REBUILD');
                  3. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM_TF','SCPM','REBUILD');
                  4. SEM_APIS.ALTER_SEM_INDEX_ON_RULES_INDEX('TOM_RI','SCPM','REBUILD'); // this is the rule bases index.

                  This will improve performance for queries against the virtual model.

                  Thanks.
                  • 6. Re: SEM_MATCH query execution time when a predicate is set as a variable
                    alwu-Oracle
                    Hi,

                    In addition to Souri's suggestion, you may want to look into the "parallel" parameter of sem_apis.alter_sem_index... APIs. Parallel index building can be much faster depending on your hardware configuration.

                    BTW, have you tried using Jena Adaptor for single pattern queries?

                    Cheers,

                    Zhe Wu
                    • 7. Re: SEM_MATCH query execution time when a predicate is set as a variable
                      712796
                      Hi,

                      So far no lack, None of the solutions are working.

                      Query execution time didn't change after executing the following:
                      1. SEM_APIS.DROP_SEM_INDEX('SCP');
                      2. SEM_APIS.ADD_SEM_INDEX('SCPM');
                      3. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM','SCPM','REBUILD');
                      4. SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('TOM_TF','SCPM','REBUILD');
                      5. SEM_APIS.ALTER_SEM_INDEX_ON_RULES_INDEX('TOM_RI','SCPM','REBUILD');

                      My original query (see below) execution time remains 22 secs.
                      SELECT a FROM TABLE ( SEM_MATCH
                      ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
                      SEM_MODELS( 'TOM_VIRTUAL_MODEL') ,
                      NULL,
                      NULL,
                      NULL,
                      'ALLOW_DUP=T'));

                      When running the SEM_MATCH on each model ('TOM,'TOM_TF') separately execution time was very fast (~0.007 secs):
                      1. SELECT a FROM TABLE ( SEM_MATCH
                      ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
                      SEM_MODELS( 'TOM') ,
                      NULL,
                      NULL,
                      NULL,
                      NULL));

                      2. . SELECT a FROM TABLE ( SEM_MATCH
                      ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
                      SEM_MODELS( 'TOM_TF') ,
                      NULL,
                      NULL,
                      NULL,
                      NULL));

                      When running the query on both models, execution time was 54 secs:
                      SELECT a FROM TABLE ( SEM_MATCH
                      ( ' { <http://www.tomproj.com/rep#entity_4> ?a 4 }',
                      SEM_MODELS( 'TOM','TOM_TF') ,
                      NULL,
                      NULL,
                      NULL,
                      NULL));

                      We also run the original query via the Jena Adaptor using Java language and SPARQL, but the results were similar.

                      As for using parallel parameter in SEM_APIS.ALTER_SEM_INDEX…, since we are using Oracle 11g release 1 the SEM_APIS.ALTER_SEM_INDEX doesn’t support the parallel parameter as far as we know.

                      In Oracle 11g release 2 the SEM_APIS.ALTER_SEM_INDEX was extended to support the use of 'parallel' parameter.

                      Any ideas on what do we do wrong?

                      Cheers,
                      Doron
                      • 8. Re: SEM_MATCH query execution time when a predicate is set as a variable
                        Sdas-Oracle
                        Would it be possible to send the query plans to us: souripriya dot das at oracle dot com? Thanks.
                        • 9. Re: SEM_MATCH query execution time when a predicate is set as a variable
                          712796
                          Once again the guys from Oracle proved to be very helpful.

                          Using the assistance of the guys from Oracle we concluded the following:
                          1. In order to execute queries with a predicate as variable it is recommended to create a new index as 'SCPM' using the add_network_index and apply this index for the relevant index using alter_sem_index_for_model API. The 'M' part of the index is used to index model name - which can be useful when using a virtual model (as wee did).
                          2. It is recommended to use sem_apis.analyze_model in order to collect stats before running any queries. I guess that is to help the query optimizer.
                          3. Since the query contains two patterns: { ?s  <http://www.tomproj.com/rep#db_id> 4 . ?s ?o ?p }, we used the hint part of the sem_match to help the query optimizer use the correct index: 'HINT0={ LEADING(t0 t1) USE_NL(t0 t1) INDEX(t0  RDF_LNK_PCSM_IDX) INDEX(t1 RDF_LNK_SCPM_IDX)} ALLOW_DUP=T'. This hint means that for the first pattern ('?s <http://www.tomproj.com/rep#db_id> 4' ) use index 'PCSM' and for the second pattern ('?s ?o ?p') use 'SCPM'.

                          Finally our query looked like this:
                          SELECT s,o,p FROM TABLE(SEM_MATCH( '{

                          ?s <http://www.tomproj.com/rep#db_id> 4 .

                          ?s ?o ?p

                          }' , SEM_Models ('TOM_VIRTUAL_MODEL'),NULL,NULL,NULL,NULL,'HINT0={ LEADING(t0 t1) USE_NL(t0 t1) INDEX(t0  RDF_LNK_PCSM_IDX) INDEX(t1 RDF_LNK_SCPM_IDX)} ALLOW_DUP=T'));


                          The improvement in query run time after applying the appropriate index and hints was from 22 sec to 0.002 sec.


                          Thanks,

                          Doron
                          • 10. Re: SEM_MATCH query execution time when a predicate is set as a variable
                            alwu-Oracle
                            A irrelevant thing from the performance. You used 4 in the query. You may want to either use plain literal form "4" or typed literal form like "4"^^xsd:integer or something.

                            Zhe