5 Replies Latest reply on Sep 24, 2010 5:13 PM by Matperry-Oracle

    How to improve performance of queries?

    799283
      Hi,

      I was wondering, how one can achieve the best performance using 11g & semantic technologies. Before I run my own tests, I'm very interested in your experiences.

      Questions I am especially interested in are:

      1) General access:
      I only need some "Model", we already omit the usage of OntModel since I understood OntModel always resides within memory (bad for scalability). Now, to get a "Model", shall I just use "ModelOracleSem.createOracleSemModel(oracle, dbModelName)" or go via a graph with "new ModelOracleSem(graph)"? Or should I prefer any other way to maximize performance? I guess I should omit inference as far as possible right?
      2) SELECT vs. CONSTRUCT - do they differ regarding their performance?
      3) SELECT vs. model.listStatements - which one is faster?
      4) Complex queries vs. multiple simple queries - Is it faster to use one complex, large query or multiple small queries?


      Thanks for any advice and experiences!

      Best,
      Damian
        • 1. Re: How to improve performance of queries?
          715399
          Hi Damian,

          If you're interested in our experiences w.r.t. performance, then this presentation is a must-read:

          http://download.oracle.com/otndocs/tech/semantic_web/pdf/2010_ora_semtech_wkshp.pdf

          There you will find best practices and a lot of advice on loading, query and inference.

          Regarding your questions,
          1) There is no difference between ModelOracleSem.createOracleSemModel and using a ModelOracleSem(graph) constructor. Usage of inference should depend on your application: if you need it, then enable it. Inference is done using forward chaining; inferred triples are materialized beforehand so that queries can execute fast.

          2) SELECT and CONSTRUCT are translated to essentially the same queries, so I don't think they will differ in performance significantly.

          3) If you don't need to use SPARQL, then using listStatements might be faster (we have specifically optimized it).

          4) You should test this out yourself ;)

          Regards,
          Vladimir
          1 person found this helpful
          • 2. Re: How to improve performance of queries?
            799283
            Hi Vladimir,

            thanks for the helpful reply. I guess some of my questions are quite stupid, but I am pretty new to this so I hope you understand ;)

            Until now, I tried to use CONSTRUCT as often as possible and access the sub-models using listStatements. But if I read your posting correctly, it is best to use get as much out of a single SELECT statement as possible right (if model.listStatements is not enough)?

            Thanks again,
            Damian
            • 3. Re: How to improve performance of queries?
              715399
              Hi Damian,

              No worries; actually the questions are pretty good.

              In your case, it makes more sense to use a single SELECT query; otherwise you're first creating a temporary, in-memory graph using CONSTRUCT, and then traversing it using listStatements. The latter approach will not scale with large RDF graphs.

              Regards,
              Vladimir
              1 person found this helpful
              • 4. Re: How to improve performance of queries?
                799283
                Vladimir,

                thanks again, you really made my day!!! :)

                Have a nice weekend!

                Best,
                Damian
                • 5. Re: How to improve performance of queries?
                  Matperry-Oracle
                  Hi Damian,

                  Here are a few more general tips for query performance:

                  1) Always gather statistics for the query optimizer. In addition, we usually see better performance with column group statistics for PS and PC column groups.

                  exec sem_apis.analyze_model('my_model',METHOD_OPT =>'FOR COLUMNS (P_VALUE_ID, CANON_END_NODE_ID) SIZE AUTO',DEGREE=>4);
                  exec sem_apis.analyze_model('my_model',METHOD_OPT =>'FOR COLUMNS (P_VALUE_ID, START_NODE_ID) SIZE AUTO',DEGREE=>4);
                  exec sem_perf.gather_stats(just_on_values_table=>true,degree=>4);

                  Note: the DEGREE argument is for degree of parallelism

                  Usually, you would load data, then gather statistics, and then periodically re-gather them as updates are done (maybe when 20% of the data is new).

                  2) Create appropriate semantic network indexes. We generally recommend PCSM and PSCM indexes. PCSM is always there, and PSCM is created by default in the latest patch but not in 11.2.0.1.0 release (11.2.0.1.0 has a PSCF index that should be dropped and replaced with PSCM).

                  Both of these items are covered in the documentation.

                  You may also find the following presentation from SemTech 2010 helpful. It covers many best practices for load, query and inference.
                  http://download.oracle.com/otndocs/tech/semantic_web/pdf/2010_ora_semtech_wkshp.pdf

                  Thanks,
                  Matt
                  1 person found this helpful