4 Replies Latest reply on Oct 9, 2009 1:51 PM by alwu-Oracle

    Performance issu with Jena using optional and filter

    725543
      Dear all,

      We do faces performance issues using Jena, Oracle 11.2 on Linux. following sparql query is translated to multiple queries. I do expect 127 results from a RDF store holding 1Million triplets. From Jena, it takes more than 70 sec which is much to slow.

      We have built sem indexes and also gathered statistic on them (see below). Have 1500 GB assigned for memory...

      Question: how can-we from Jena insure a much faster performance using optional and filter...

      Thank-you

      --------------
      ORIGINAL SPARQL Query

      SELECT ?cs, ?literal where { {?cs rdf:type skos:ConceptScheme}
      . OPTIONAL {?cs skos:prefLabel ?literal FILTER(LANG(?literal) = "en")}
      };



      Is translated to

      SELECT cs$RDFVTYP, decode(cs$RDFVTYP, 'BLN', ('_:'||substr(cs,instr(cs,'m',4)+1)), cs) cs FROM table(sdo_rdf_match('(?cs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#ConceptScheme>) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null))
      ; SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100202> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null));
      SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100184> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null));
      ...
      ; SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100236> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null))
      ; SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100286> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null))
      ; SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100177> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null))
      ; SELECT literal$RDFVTYP, literal$RDFLTYP, literal$RDFLANG, literal$RDFCLOB, decode(literal$RDFVTYP, 'BLN', ('_:'||substr(literal,instr(literal,'m',4)+1)), literal) literal FROM table(sdo_rdf_match('(<http://test.europa.eu/100183> <http://www.w3.org/2004/02/skos/core#prefLabel> ?literal) ', sdo_rdf_models('test'), sdo_rdf_rulebases('OWLPRIME','SKOSCORE'), null, null, null));

      Issue based on Jena:
      1/ JENA execute filtering on language afterwards. This means, from Oracle, we extract ALL the literal for each conceptschemes.
      2/ We execute x queries

      Results: 68 seconds…

      It could be translated to 1 unique query as following:
      SELECT cs, literal
      FROM TABLE (
      SEM_MATCH(
      '{ {?cs rdf:type skos:ConceptScheme}
      . OPTIONAL {?cs skos:prefLabel ?literal FILTER(LANG(?literal) = "en")}
      }',
      SDO_RDF_Models('test'),
      SEM_RULEBASES('OWLPRIME', 'skoscore'),
      SDO_RDF_Aliases(SDO_RDF_Alias('skos','http://www.w3.org/2004/02/skos/core#'), SDO_RDF_Alias('xl','http://www.w3.org/2008/05/skos-xl#')),
      null));

      still it is even slower 256 seconds

      The fastest we could get is:
      SELECT cs, literal
      FROM TABLE (
      SEM_MATCH(
      '{ {?cs rdf:type skos:ConceptScheme}
      . OPTIONAL {?cs skos:prefLabel ?literal}
      }',
      SDO_RDF_Models('test'),
      SEM_RULEBASES('OWLPRIME', 'skoscore'),
      SDO_RDF_Aliases(SDO_RDF_Alias('skos','http://www.w3.org/2004/02/skos/core#'), SDO_RDF_Alias('xl','http://www.w3.org/2008/05/skos-xl#')),
      null))

      WHERE literal$rdflang = 'fr' or literal is null;


      Which return a result in less than 5 seconds.

      ---
      Indexes:
      - SEM_APIS.ADD_SEM_INDEX('SPC');
      -- SEM_APIS.ADD_SEM_INDEX('CPS');
      -- SEM_APIS.ADD_SEM_INDEX('PCS');
      -- SEM_APIS.ADD_SEM_INDEX('PSC');
      -- SEM_APIS.ADD_SEM_INDEX('SCP');
      -- SEM_APIS.ADD_SEM_INDEX('CSP');
      -- SEM_APIS.ADD_SEM_INDEX('SPOC');

      -- SEM_APIS.ADD_SEM_INDEX('S');
      -- SEM_APIS.ADD_SEM_INDEX('P');
      -- SEM_APIS.ADD_SEM_INDEX('C');
      -- SEM_APIS.ADD_SEM_INDEX('O');
      -- SEM_APIS.ADD_SEM_INDEX('SP');
      -- SEM_APIS.ADD_SEM_INDEX('SC');
      -- SEM_APIS.ADD_SEM_INDEX('SO');
      -- SEM_APIS.ADD_SEM_INDEX('PS');
      -- SEM_APIS.ADD_SEM_INDEX('PC');
      -- SEM_APIS.ADD_SEM_INDEX('PO');


      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PSCF','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SPC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','CPS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PCS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PSC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SCP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','CSP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','S','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','P','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','C','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','O','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SO','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','PO','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_MODEL('test','SPOC','UNUSABLE');

      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PSCF','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SPC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','CPS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PCS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PSC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SCP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','CSP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','S','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','P','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','C','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','O','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SP','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SO','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PS','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PC','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','PO','REBUILD');
      -- SEM_APIS.ALTER_SEM_INDEX_ON_ENTAILMENT('test_owlpskos_idx','SPOC','UNUSABLE');


      Statistic built:
      execute sem_apis.analyze_model(‘test’);
      execute sem_apis.analyze_rules_index(‘test_owlpskos_idx’);