9 Replies Latest reply on Aug 17, 2016 2:59 PM by Matperry-Oracle

    SPARQL otimizations

    3274437

      Dear all,

       

      We need to execute a query that looks like the bellow. In this case, the result has only 9 rows.

       

      For the subqueries, we notice that using ALL_LINK_HASH for the object properties (e.g. lines 8 to 11) and putting other properties/filters/binds in a different BGP (e.g. lines 12 to 16) improves the performance over 10x.

       

      However, we are wondering if there are other recommendations for this type of query. We are using semantic indexes PCSGM, PSCGM, PSCM, CPSM and SCPM, and executing the function SEM_APIS.ANALYZE_MODEL.

       

      PREFIX  : <http://example.org/>
      SELECT  ?I_C0 ?C0 ?I_C2 ?C1 ?P0 ?P1
      WHERE
        { ?I_C0 rdfs:label ?C0 .
          ?I_C2 rdfs:label ?C1
          { { { SELECT  ?I_C0 ?I_C2 ?P0 ?P1
                WHERE
                  { { # HINT0={ALL_LINK_HASH}
                      ?I_C1 :C1#C3_CD_C3 ?I_C0 .
                      ?I_C2 :C2#ID_C1 ?I_C1
                    }
                    { ?I_C2 :C2#P4 ?P0 . 
                      ?I_C0  :C3#P5  ?P2 .
                      FILTER orardf:textContains(?P2, "fuzzy({ exploração }, 70, 1)", 1)
                      BIND(concat("Unidade P5: ", str(?P2), "") AS ?P1)
                    }
                  }
                LIMIT   750
              }
              UNION
                { SELECT  ?I_C0 ?I_C2 ?P0 ?P1
                  WHERE
                    { { # HINT0={ALL_LINK_HASH}
                        ?I_C1 :C1#C3_CD_C3 ?I_C0 .
                        ?I_C2 :C2#ID_C1 ?I_C1
                      }
                      { ?I_C2 :C2#P4 ?P0 . 
                        ?I_C0  :C3#P6  ?P3 .
                        FILTER orardf:textContains(?P3, "fuzzy({ exploração }, 70, 1)", 1)
                        BIND(concat("Unidade P6: ", str(?P3), "") AS ?P1)
                      }
                    }
                  LIMIT   750
                }
              UNION
                { SELECT  ?I_C0 ?I_C2 ?P0 ?P1
                  WHERE
                    { { # HINT0={ALL_LINK_HASH}
                        ?I_C1 :C1#C3_CD_C3 ?I_C0 .
                        ?I_C2 :C2#ID_C1 ?I_C1
                      }
                      { ?I_C2 :C2#P4 ?P0 . 
                        ?I_C0  :C3#P7  ?P4 .
                        FILTER orardf:textContains(?P4, "fuzzy({ exploração }, 70, 1)", 1)
                        BIND(concat("Unidade P7: ", str(?P4), "") AS ?P1)
                      }
                    }
                  LIMIT   750
                }
              UNION
                { SELECT  ?I_C0 ?I_C2 ?P0 ?P1
                  WHERE
                    { { # HINT0={ALL_LINK_HASH}
                        ?I_C1 :C1#C3_CD_C3 ?I_C0 .
                        ?I_C2 :C2#ID_C1 ?I_C1
                      }
                      { ?I_C2 :C2#P4 ?P0 . 
                        ?I_C0  :C3#P8  ?P5 .
                        FILTER orardf:textContains(?P5, "fuzzy({ exploração }, 70, 1)", 1)
                        BIND(concat("Unidade P8: ", str(?P5), "") AS ?P1)
                      }
                    }
                  LIMIT   750
                }
            }
          }
        }
      LIMIT   750
      

       

      Thanks in advance.

        • 1. Re: SPARQL otimizations
          Matperry-Oracle

          Hi,

           

          I'm guessing the textContains filters are the selective part of the query. How many results do you get for the following query?

           

          SELECT  ?I_C0 

          WHERE  {

              ?I_C0  :C3#P5  ?P2 . 

               FILTER orardf:textContains(?P2, "fuzzy({ exploração }, 70, 1)", 1) 

           

          Also, what query plan do you get for the query below?

           

          SELECT  ?I_C0 ?I_C2 ?P0

          WHERE  {

            ?I_C1 :C1#C3_CD_C3 ?I_C0 . 

             ?I_C2 :C2#ID_C1 ?I_C1  .

             ?I_C2 :C2#P4 ?P0 . 

              ?I_C0  :C3#P5  ?P2 . 

               FILTER orardf:textContains(?P2, "fuzzy({ exploração }, 70, 1)", 1) 

          }

           

          Thanks,

          Matt

          • 2. Re: SPARQL otimizations
            3274437

            Hi Matt,

             

            I got 1478 for this query!

             

            The problem is actually with property :C2#P4 . I executed the following query and got 3 rows.

             

            SELECT  ?I_C2

            WHERE { ?I_C2 :C2#P4 ?P0 }

             

            I'm sorry, I can't figure out how to get the query plan for the sem_match query. I executed the following:

             

            Explain Plan For

                 SELECT * FROM  TABLE (sem_match(' <sparql> ',

                      Sem_Models('MODEL_MAT'), Null, Null, Null, Null, 'REWRITE=F DISABLE_NO_MERGE'))

             

            Then,

             

                 SELECT plan_table_output from table(dbms_xplan.display('plan_table',null,'basic,parallel,partition,predicate'));

             

            And all I got was this:

            • 3. Re: SPARQL otimizations
              Matperry-Oracle

              If you get rid of ' REWRITE=F ' in the options parameter. The full execution plan should show. In addition, the user performing explain plan will need to have select privileges on MDSYS.RDF_LINK$ for the plan to show properly.

               

              Thanks,

              Matt

              • 4. Re: SPARQL otimizations
                3274437

                Hi Matt,

                 

                Thanks! Here is the query plan!

                 

                Plan hash value: 705851199
                --------------------------------------------------------------------------------
                | Id  | Operation                          | Name              | Pstart| Pstop |
                --------------------------------------------------------------------------------
                |   0 | SELECT STATEMENT                   |                   |       |       |
                |   1 |  NESTED LOOPS                      |                   |       |       |
                |   2 |   NESTED LOOPS                     |                   |       |       |
                |   3 |    NESTED LOOPS                    |                   |       |       |
                |   4 |     NESTED LOOPS                   |                   |       |       |
                |   5 |      NESTED LOOPS                  |                   |       |       |
                |   6 |       NESTED LOOPS                 |                   |       |       |
                |   7 |        NESTED LOOPS                |                   |       |       |
                |   8 |         TABLE ACCESS BY INDEX ROWID| RDF_VALUE$        |       |       |
                |*  9 |          DOMAIN INDEX              | RDF_V$TEXT_IDX    |       |       |
                |  10 |         PARTITION LIST SINGLE      |                   |     2 |     2 |
                |* 11 |          INDEX RANGE SCAN          | RDF_LNK_CPSM_IDX  |     2 |     2 |
                |  12 |        PARTITION LIST SINGLE       |                   |     2 |     2 |
                |* 13 |         INDEX RANGE SCAN           | RDF_LNK_CPSM_IDX  |     2 |     2 |
                |  14 |       PARTITION LIST SINGLE        |                   |     2 |     2 |
                |* 15 |        INDEX RANGE SCAN            | RDF_LNK_CPSM_IDX  |     2 |     2 |
                |  16 |      PARTITION LIST SINGLE         |                   |     2 |     2 |
                |* 17 |       INDEX RANGE SCAN             | RDF_LNK_PSCGM_IDX |     2 |     2 |
                |  18 |     TABLE ACCESS BY INDEX ROWID    | RDF_VALUE$        |       |       |
                |* 19 |      INDEX UNIQUE SCAN             | C_PK_VID          |       |       |
                |  20 |    TABLE ACCESS BY INDEX ROWID     | RDF_VALUE$        |       |       |
                |* 21 |     INDEX UNIQUE SCAN              | C_PK_VID          |       |       |
                |  22 |   TABLE ACCESS BY INDEX ROWID      | RDF_VALUE$        |       |       |
                |* 23 |    INDEX UNIQUE SCAN               | C_PK_VID          |       |       |
                --------------------------------------------------------------------------------
                
                Predicate Information (identified by operation id):
                ---------------------------------------------------
                
                   9 - access("CTXSYS"."CONTAINS"("V0"."VNAME_PREFIX",''fuzzy({ 
                              exploração }, 70, 1)'',1)>0)
                  11 - access("V0"."VALUE_ID"="CANON_END_NODE_ID" AND 
                              "P_VALUE_ID"=7418697099396023919 AND "START_NODE_ID">0 AND 
                              "START_NODE_ID" IS NOT NULL)
                       filter("CANON_END_NODE_ID">0)
                  13 - access("CANON_END_NODE_ID"="START_NODE_ID" AND 
                              "P_VALUE_ID"=1534205713339294672)
                       filter("CANON_END_NODE_ID">0)
                  15 - access("CANON_END_NODE_ID"="START_NODE_ID" AND 
                              "P_VALUE_ID"=227675759942938949 AND "START_NODE_ID">0 AND 
                              "START_NODE_ID" IS NOT NULL)
                  17 - access("P_VALUE_ID"=1517695056292898938 AND 
                              "START_NODE_ID"="START_NODE_ID" AND "CANON_END_NODE_ID">0 AND 
                              "CANON_END_NODE_ID" IS NOT NULL)
                       filter("START_NODE_ID">0)
                  19 - access("CANON_END_NODE_ID"="V1"."VALUE_ID")
                  21 - access("START_NODE_ID"="V0"."VALUE_ID")
                  23 - access("CANON_END_NODE_ID"="V2"."VALUE_ID")
                
                • 5. Re: SPARQL otimizations
                  Matperry-Oracle

                  The plan seems ok. What kind of execution time do you get for this query, and how does it compare with the others you've tried?

                  • 6. Re: SPARQL otimizations
                    3274437

                    This query takes around 15 seconds, but in other cases we could have more sub queries. I guess the time could increase along with que number of unions/sub queries.

                     

                    One thing that made a huge difference was to put the triples/filters/binds in a specific BGP.

                     

                    If we put the sub queries like the bellow, the execution time is around 170 seconds

                    SELECT  ?I_C0 ?I_C2 ?P0 ?P1
                              WHERE
                                { { # HINT0={ALL_LINK_HASH}
                                    ?I_C1 :C1#C3_CD_C3 ?I_C0 .
                                    ?I_C2 :C2#ID_C1 ?I_C1
                                  }
                                  ?I_C2 :C2#P4 ?P0 . 
                                  ?I_C0  :C3#P5  ?P2 .
                                  FILTER orardf:textContains(?P2, "fuzzy({ exploração }, 70, 1)", 1)
                                  BIND(concat("Unidade P5: ", str(?P2), "") AS ?P1)
                                }
                              LIMIT   750
                    

                    *Note that lines (7 to 10) are not surrounded by braces

                     

                    So, we are wondering if there are other improvements similar to the above that we can do.

                     

                    And how about the semantic indexes? Is there another one that we can use to improve this? Or maybe we can exclude unnecessary indexes.

                     

                    Thanks in advance!

                    • 7. Re: SPARQL otimizations
                      Matperry-Oracle

                      I don't think think the extra nesting is necessary, and the existing indexes should be sufficient. How does the query run if you keep the UNIONs and remove the extra nesting like below.

                       

                      PREFIX  : <http://example.org/

                      SELECT  ?I_C0 ?C0 ?I_C2 ?C1 ?P0 ?P1 

                      WHERE 

                        { ?I_C0 rdfs:label ?C0 . 

                          ?I_C2 rdfs:label ?C1 

                          { SELECT  ?I_C0 ?I_C2 ?P0 (concat("Unidade P5: ", str(?P2), "") AS ?P1) 

                             WHERE  {

                                      ?I_C1 :C1#C3_CD_C3 ?I_C0 . 

                                      ?I_C2 :C2#ID_C1 ?I_C1 

                                      ?I_C2 :C2#P4 ?P0 .  

                                      ?I_C0  :C3#P5  ?P2 . 

                                      FILTER orardf:textContains(?P2, "fuzzy({ exploração }, 70, 1)", 1) 

                             }

                         } 

                      UNION 

                         { SELECT  ?I_C0 ?I_C2 ?P0 (concat("Unidade P6: ", str(?P3), "") AS ?P1) 

                           WHERE  {

                                        ?I_C1 :C1#C3_CD_C3 ?I_C0 . 

                                        ?I_C2 :C2#ID_C1 ?I_C1 

                                        ?I_C2 :C2#P4 ?P0 .  

                                        ?I_C0  :C3#P6  ?P3 . 

                                        FILTER orardf:textContains(?P3, "fuzzy({ exploração }, 70, 1)", 1) 

                           } 

                         } 

                      UNION 

                         { SELECT  ?I_C0 ?I_C2 ?P0 (concat("Unidade P7: ", str(?P4), "") AS ?P1) 

                           WHERE  {

                                        ?I_C1 :C1#C3_CD_C3 ?I_C0 . 

                                        ?I_C2 :C2#ID_C1 ?I_C1 

                                        ?I_C2 :C2#P4 ?P0 .  

                                        ?I_C0  :C3#P7  ?P4 . 

                                        FILTER orardf:textContains(?P4, "fuzzy({ exploração }, 70, 1)", 1) 

                             } 

                        } 

                      UNION 

                        { SELECT  ?I_C0 ?I_C2 ?P0 (concat("Unidade P8: ", str(?P5), "") AS ?P1)  

                           WHERE  {

                                        ?I_C1 :C1#C3_CD_C3 ?I_C0 . 

                                        ?I_C2 :C2#ID_C1 ?I_C1 

                                        ?I_C2 :C2#P4 ?P0 .  

                                        ?I_C0  :C3#P8  ?P5 . 

                                        FILTER orardf:textContains(?P5, "fuzzy({ exploração }, 70, 1)", 1) 

                            } 

                        }

                      LIMIT   750

                      • 8. Re: SPARQL otimizations
                        3274437

                        Hi Matt,

                         

                        It still takes around 170 seconds. But I'm not able to get the full query plan anymore.

                        • 9. Re: SPARQL otimizations
                          Matperry-Oracle

                          Let's take this offline. Please email me directly at matthew dot perry at oracle dot com with some more details of your setup (DB version, number of triples, number of models, etc.).

                           

                          Thanks,
                          Matt