2 Replies Latest reply on Apr 24, 2017 6:36 PM by 3054700

    SEM_MATCH query optimization

    3054700

      Hi all,

       

      I would like to know how to improve the execution time of Query 3, considering that Query 1 and Query 2 are very fast and Query 3 takes forever.

       

      Query 1:

      SELECT * FROM TABLE(SEM_MATCH('
        SELECT ?a  ?b  ?c
        WHERE {
          ?a :prop1 ?b .
          ?a :prop2 ?c
        }
        LIMIT  750',
      SEM_Models('MY_MODEL'), NULL, NULL, NULL, NULL, 'REWRITE=F DISABLE_NO_MERGE DO_UNESCAPE=T'));
      

       

      And:

       

      Query 2:

      SELECT * FROM TABLE(SEM_MATCH('
        SELECT ?a  ?b  ?c
        WHERE {
          ?a :prop1 ?b .
          ?b :prop2 ?c
        }
        LIMIT  750',
      SEM_Models('MY_MODEL'), NULL, NULL, NULL, NULL, 'REWRITE=F DISABLE_NO_MERGE DO_UNESCAPE=T'));
      

       

      For instance the execution of Queries 1 and 2 returns almost immediately (much less than 1 second). Now when I execute the Query 3 where the subject of :prop2 can be either ?a or ?b, it takes forever (not finished after 15 minutes).

       

      Query 3:

      SELECT * FROM TABLE(SEM_MATCH('
        SELECT ?a  ?b  ?c
        WHERE {
          ?a :prop1 ?b .
          ?d :prop2 ?c .
          FILTER (?d IN (?a, ?b))
        }
        LIMIT  750',
      SEM_Models('MY_MODEL'), NULL, NULL, NULL, NULL, 'REWRITE=F DISABLE_NO_MERGE DO_UNESCAPE=T'));
      

       

      Following are the execution plans (had to remove REWRITE=F option from the queries).

       

      Query 1:

      ----------------------------------------------------------------------------
      | Id  | Operation                      | Name              | Pstart| Pstop |
      ----------------------------------------------------------------------------
      |   0 | SELECT STATEMENT               |                   |       |       |
      |   1 |  NESTED LOOPS                  |                   |       |       |
      |   2 |   NESTED LOOPS                 |                   |       |       |
      |   3 |    NESTED LOOPS                |                   |       |       |
      |*  4 |     VIEW                       |                   |       |       |
      |*  5 |      COUNT STOPKEY             |                   |       |       |
      |   6 |       NESTED LOOPS             |                   |       |       |
      |   7 |        PARTITION LIST SINGLE   |                   |     7 |     7 |
      |*  8 |         INDEX RANGE SCAN       | RDF_LNK_PSCGM_IDX |     7 |     7 |
      |   9 |        PARTITION LIST SINGLE   |                   |     7 |     7 |
      |* 10 |         INDEX RANGE SCAN       | RDF_LNK_SCPM_IDX  |     7 |     7 |
      |  11 |     TABLE ACCESS BY INDEX ROWID| RDF_VALUE$        |       |       |
      |* 12 |      INDEX UNIQUE SCAN         | C_PK_VID          |       |       |
      |  13 |    TABLE ACCESS BY INDEX ROWID | RDF_VALUE$        |       |       |
      |* 14 |     INDEX UNIQUE SCAN          | C_PK_VID          |       |       |
      |  15 |   TABLE ACCESS BY INDEX ROWID  | RDF_VALUE$        |       |       |
      |* 16 |    INDEX UNIQUE SCAN           | C_PK_VID          |       |       |
      ----------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         4 - filter("R"."C$RDFVID">0 AND "R"."B$RDFVID">0 AND 
                    "R"."A$RDFVID">0)
         5 - filter(ROWNUM<=750)
         8 - access("P_VALUE_ID"=4836715027433119050)
        10 - access("START_NODE_ID"="START_NODE_ID" AND 
                    "P_VALUE_ID"=1078255659061339009)
             filter("P_VALUE_ID"=1078255659061339009)
        12 - access("R"."A$RDFVID"="V0"."VALUE_ID")
        14 - access("R"."B$RDFVID"="V1"."VALUE_ID")
      

       

      Query 2: (only changes are in Predicate Information)

      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         4 - filter("R"."C$RDFVID">0 AND "R"."B$RDFVID">0 AND 
                    "R"."A$RDFVID">0)
         5 - filter(ROWNUM<=750)
         8 - access("P_VALUE_ID"=4836715027433119050)
        10 - access("CANON_END_NODE_ID"="START_NODE_ID" AND 
                    "P_VALUE_ID"=1078255659061339009)
        12 - access("R"."B$RDFVID"="V1"."VALUE_ID")
        14 - access("R"."A$RDFVID"="V0"."VALUE_ID")
        16 - access("R"."C$RDFVID"="V2"."VALUE_ID")
      

       

      Query 3:

      ----------------------------------------------------------------------------------
      | Id  | Operation                            | Name              | Pstart| Pstop |
      ----------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                     |                   |       |       |
      |   1 |  NESTED LOOPS                        |                   |       |       |
      |*  2 |   VIEW                               |                   |       |       |
      |*  3 |    COUNT STOPKEY                     |                   |       |       |
      |   4 |     CONCATENATION                    |                   |       |       |
      |*  5 |      FILTER                          |                   |       |       |
      |   6 |       NESTED LOOPS                   |                   |       |       |
      |   7 |        NESTED LOOPS                  |                   |       |       |
      |   8 |         NESTED LOOPS                 |                   |       |       |
      |   9 |          NESTED LOOPS                |                   |       |       |
      |  10 |           PARTITION LIST SINGLE      |                   |     7 |     7 |
      |* 11 |            INDEX RANGE SCAN          | RDF_LNK_PSCGM_IDX |     7 |     7 |
      |  12 |           TABLE ACCESS BY INDEX ROWID| RDF_VALUE$        |       |       |
      |* 13 |            INDEX UNIQUE SCAN         | C_PK_VID          |       |       |
      |* 14 |          TABLE ACCESS FULL           | RDF_VALUE$        |       |       |
      |  15 |         PARTITION LIST SINGLE        |                   |     7 |     7 |
      |* 16 |          INDEX RANGE SCAN            | RDF_LNK_CPSM_IDX  |     7 |     7 |
      |  17 |        TABLE ACCESS BY INDEX ROWID   | RDF_VALUE$        |       |       |
      |* 18 |         INDEX UNIQUE SCAN            | C_PK_VID          |       |       |
      |* 19 |      FILTER                          |                   |       |       |
      |  20 |       NESTED LOOPS                   |                   |       |       |
      |* 21 |        HASH JOIN                     |                   |       |       |
      |  22 |         PARTITION LIST SINGLE        |                   |     7 |     7 |
      |* 23 |          INDEX RANGE SCAN            | RDF_LNK_PSCM_IDX  |     7 |     7 |
      |  24 |         NESTED LOOPS                 |                   |       |       |
      |  25 |          NESTED LOOPS                |                   |       |       |
      |  26 |           PARTITION LIST SINGLE      |                   |     7 |     7 |
      |* 27 |            INDEX RANGE SCAN          | RDF_LNK_PSCGM_IDX |     7 |     7 |
      |  28 |           TABLE ACCESS BY INDEX ROWID| RDF_VALUE$        |       |       |
      |* 29 |            INDEX UNIQUE SCAN         | C_PK_VID          |       |       |
      |* 30 |          TABLE ACCESS FULL           | RDF_VALUE$        |       |       |
      |* 31 |        TABLE ACCESS BY INDEX ROWID   | RDF_VALUE$        |       |       |
      |* 32 |         INDEX UNIQUE SCAN            | C_PK_VID          |       |       |
      |  33 |   TABLE ACCESS BY INDEX ROWID        | RDF_VALUE$        |       |       |
      |* 34 |    INDEX UNIQUE SCAN                 | C_PK_VID          |       |       |
      ----------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         2 - filter("R"."C$RDFVID">0)
         3 - filter(ROWNUM<=750)
         5 - filter(ROWNUM<=750)
        11 - access("P_VALUE_ID"=4836715027433119050 AND "START_NODE_ID">0 AND 
                    "START_NODE_ID" IS NOT NULL)
        13 - access("V2"."VALUE_ID"="START_NODE_ID")
        14 - filter("SEM_APIS"."RDFTERMEQUAL"("V2"."VALUE_TYPE","V2"."VNAME_PREF
                    IX","V2"."VNAME_SUFFIX","V2"."LITERAL_TYPE","V2"."LANGUAGE_TYPE","V3"."VAL
                    UE_TYPE","V3"."VNAME_PREFIX","V3"."VNAME_SUFFIX","V3"."LITERAL_TYPE","V3".
                    "LANGUAGE_TYPE")=1)
        16 - access("V3"."VALUE_ID"="CANON_END_NODE_ID" AND 
                    "P_VALUE_ID"=1078255659061339009 AND "START_NODE_ID">0 AND 
                    "START_NODE_ID" IS NOT NULL)
             filter("CANON_END_NODE_ID">0)
        18 - access("V5"."VALUE_ID"="START_NODE_ID")
        19 - filter(ROWNUM<=750)
        21 - access("V5"."VALUE_ID"="START_NODE_ID")
        23 - access("P_VALUE_ID"=1078255659061339009 AND "START_NODE_ID">0 AND 
                    "CANON_END_NODE_ID">0 AND "START_NODE_ID" IS NOT NULL)
             filter("CANON_END_NODE_ID">0)
        27 - access("P_VALUE_ID"=4836715027433119050 AND "START_NODE_ID">0 AND 
                    "START_NODE_ID" IS NOT NULL)
        29 - access("V2"."VALUE_ID"="START_NODE_ID")
        30 - filter("SEM_APIS"."RDFTERMEQUAL"("V2"."VALUE_TYPE","V2"."VNAME_PREF
                    IX","V2"."VNAME_SUFFIX","V2"."LITERAL_TYPE","V2"."LANGUAGE_TYPE","V5"."VAL
                    UE_TYPE","V5"."VNAME_PREFIX","V5"."VNAME_SUFFIX","V5"."LITERAL_TYPE","V5".
                    "LANGUAGE_TYPE")=1)
        31 - filter(LNNVL("SEM_APIS"."RDFTERMEQUAL"("V2"."VALUE_TYPE","V2"."VNAM
                    E_PREFIX","V2"."VNAME_SUFFIX","V2"."LITERAL_TYPE","V2"."LANGUAGE_TYPE","V3
                    "."VALUE_TYPE","V3"."VNAME_PREFIX","V3"."VNAME_SUFFIX","V3"."LITERAL_TYPE"
                    ,"V3"."LANGUAGE_TYPE")=1))
        32 - access("V3"."VALUE_ID"="CANON_END_NODE_ID")
        34 - access("R"."C$RDFVID"="V0"."VALUE_ID")
      

       

      Any idea is very welcome. Thank you in advance.

      Fred

        • 1. Re: SEM_MATCH query optimization
          jerven

          Hi, for any SPARQL engine it's best to avoid using filters for known values. This tends to change simple joins into exhaustive value equality testing.

           

          Query 3:

          SELECT * FROM TABLE(SEM_MATCH('  SELECT ?a ?b ?c  WHERE {  ?a :prop1 ?b .  ?d :prop2 ?c .  FILTER (?d IN (?a, ?b))  }  LIMIT 750', SEM_Models('MY_MODEL'), NULL, NULL, NULL, NULL, 'REWRITE=F DISABLE_NO_MERGE DO_UNESCAPE=T'))

          Suggestion 3.

           

          SELECT * FROM TABLE(SEM_MATCH('

          SELECT ?a ?b ?c

          WHERE {

          ?a :prop1 ?b .

          { ?a :prop2 ?c . }

          UNION

          { ?b :prop2 ?c . }

          }

          LIMIT 750',

          SEM_Models('MY_MODEL'), NULL, NULL, NULL, NULL, 'REWRITE=F DISABLE_NO_MERGE DO_UNESCAPE=T'))

           

          Here we use an UNION to have the same effect as the FILTER IN, which tends to perform much better.

          Hope it works for you.

           

          Regards,

          Jerven

          • 2. Re: SEM_MATCH query optimization
            3054700

            Thank you Jerven, that's what I figured!