7 Replies Latest reply: 25.03.2011 17:31, by alwu-Oracle

    Iterator takes a long time to initialize and iterate through.

    817559
      I am using the 11.2 Oracle Jena adapter with ARQ 2.8.5 and Jena 2.6.3 (I also tried Jena 2.6.4).

      I see a performance issue when I use the iterator. A long time passes between starting the iteration and reaching the first element inside the loop, and I am not sure why. Is there a better way of getting at the information? Can you please advise?

      Thanks,

                          ResultSet results = qexec.execSelect();
                          System.out.println("got result");
                          List<Entity> entities = new ArrayList<Entity>();
                          long time = System.currentTimeMillis();
                          System.out.println("Start Iterator");
                          while (results.hasNext()) {
                              // elapsed milliseconds since the loop started
                              System.out.println("In iterator " + (System.currentTimeMillis() - time));
      ....}
        • 1. Re: Iterator takes a long time to initialize and iterate through.
          alwu-Oracle
          Hi,

          First, do you see extensive tracing when your program runs?

          Second, how many solutions are you retrieving through the iterator?

          Thanks,

          Zhe
          • 2. Re: Iterator takes a long time to initialize and iterate through.
            817559
            What do you mean by "extensive tracing"?
            The number of QuerySolutions returned is 1.
            Here is the query:
            PREFIX bp:<http://www.biopax.org/release/biopax-level3.owl#>
            PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
            PREFIX ORACLE_SEM_FS_NS:<http://oracle.com/semtech#dop=24,RESULT_CACHE>
            SELECT ?entityId
            WHERE { ?entityId bp:name ?object . FILTER(regex(?object, "P01375", "i")) }

            It seems that iterating through the iterator is expensive. For example:
            Oracle oracle = createOracle();
            ModelOracleSem modelDest = ModelOracleSem.createOracleSemModel(oracle, model);
            System.out.println("start");
            long time = System.currentTimeMillis();

            StmtIterator stmtsIter = modelDest.listStatements(new SimpleSelector(
                    null,
                    modelDest.getProperty("http://www.biopax.org/release/biopax-level3.owl#name"),
                    (RDFNode) null) {
                public boolean selects(Statement s) {
                    return s.getString().contains(filter);
                }
            });
            System.out.println((System.currentTimeMillis() - time) + " - got iterator");
            System.out.println(" finally " + stmtsIter.toList().size() + " took " + (System.currentTimeMillis() - time));

            __________

            The output is as follows:
            start
            250 - got iterator
            finally 1 took 122624
            done


            ___________________________

            Now if I execute the query, I observe the following:

            After I have the ResultSet, iterating through 1 QuerySolution takes 243740ms. I am not sure what I am doing wrong or missing. Can you help? As in the scenario above, the iterator is returned quickly, but traversing its elements seems costly.

            Here is the code that produces the output below:
                      System.out.println(querStr);
                      long time=System.currentTimeMillis();
                      int i=0;
                      try {
                           int iMatchCount = 0;

                                ResultSet results = qexec.execSelect();
                                Model m = results.getResourceModel();
                           //     System.out.println("got result "+m.size());
                      
                                time=System.currentTimeMillis();
                                while(results.hasNext())
                                {
                                System.out.println((System.currentTimeMillis()-time)+" "+i);
                                     QuerySolution soln=results.next();
                                     //System.out.println(soln.getResource("entityId").getURI());
                                     i++;
                                     
                                }
            _______________

            PREFIX bp:<http://www.biopax.org/release/biopax-level3.owl#> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> PREFIX ORACLE_SEM_FS_NS:<http://oracle.com/semtech#dop=24,RESULT_CACHE>SELECT ?entityId WHERE { ?entityId bp:name ?object . FILTER(regex(?object, "P01375", "i")) }


            243709 0
            close
            1 time took 243740ms
            done
            • 3. Re: Iterator takes a long time to initialize and iterate through.
              alwu-Oracle
              Hi,

              Extensive tracing is something like

              "2882 [main] DEBUG oracle.spatial.rdf.client.jena.GraphOracleSem - retrieveModelId: starts"

              From what you described, I don't think you have such tracing going on.

              I doubt the iterator has much to do with the performance in your case. It is the query execution plan.
              Please take out the FILTER expression and re-run your query. Does that make a difference?

              Also, do you have a really powerful machine? I noticed that you use DOP=24 in your query.

              Thanks,

              Zhe Wu
              • 4. Re: Iterator takes a long time to initialize and iterate through.
                817559
                I don't see any tracing.

                When I remove the filter criteria, I get a huge result set, and I then need to iterate through it and run a SimpleSelector to get all resources with a specific object value.

                Is the query executed when you run execSelect()?

                The core of my problem is that I have over a million triples, and I want to get all resources whose value for a given property contains the filter value. What is your recommendation? I have not been able to do that efficiently.

                Yes, I do have a powerful machine, and that is where I am having difficulties.
                • 5. Re: Iterator takes a long time to initialize and iterate through.
                  alwu-Oracle
                  Hi,

                  I guess it is clear now that the iterator itself does not have much to do with what we are seeing here.

                  You brought up a great question: "Is the query executed when you run execSelect()?"
                  The answer is that the query starts executing when execSelect is called. However, the query does not finish until you exhaust the result set.
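
To illustrate why the time shows up inside the loop rather than at execSelect, here is a minimal sketch in plain Java. It does not use Jena or the Oracle adapter at all: LazyResults is a hypothetical stand-in for a result set whose rows are only fetched when hasNext() is called, and the sleep is a made-up substitute for database work.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyResultDemo {

    /** Hypothetical stand-in for a lazily evaluated result set (not a Jena class). */
    static class LazyResults implements Iterator<String> {
        private final int totalRows;
        private final long workMillis;
        private int fetched = 0;
        private String buffered; // row fetched by hasNext() but not yet handed out

        LazyResults(int totalRows, long workMillis) {
            // Creating the object is cheap: no row has been fetched yet.
            this.totalRows = totalRows;
            this.workMillis = workMillis;
        }

        @Override
        public boolean hasNext() {
            // The expensive part happens here, on demand, one row at a time.
            if (buffered == null && fetched < totalRows) {
                simulateWork();
                fetched++;
                buffered = "row-" + fetched;
            }
            return buffered != null;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String row = buffered;
            buffered = null;
            return row;
        }

        int fetchedSoFar() {
            return fetched;
        }

        private void simulateWork() {
            try {
                Thread.sleep(workMillis); // simulated query work, an assumption of this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        LazyResults results = new LazyResults(3, 200);
        System.out.println("created after " + (System.currentTimeMillis() - start) + " ms");

        while (results.hasNext()) {
            String row = results.next();
            System.out.println(row + " after " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```

With a pattern like this, a timer started just before the loop measures the work done to produce the rows, not the cost of the iterator object itself.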

                  Back to the query: it is slow because you have many matches and there is no index to help speed up the regular expression. You are using an 11.2 database, so you can leverage orardf:textContains for text search.

                  Section 1.6.4 of the following document has details.
                  http://download.oracle.com/docs/cd/E11882_01/appdev.112/e11828/sdo_rdf_concepts.htm
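
As a rough sketch of what the rewritten query might look like, the Java snippet below only builds the SPARQL string, with the regex FILTER replaced by orardf:textContains. The class and method names are hypothetical, the orardf namespace URI is an assumption to be checked against section 1.6.4, and the query depends on the full-text index described in that section having been created first.

```java
public class TextContainsQueryDemo {

    /**
     * Builds a SPARQL query that uses orardf:textContains instead of regex.
     * Hypothetical helper for illustration; the namespace below is an assumption
     * to be verified against the Oracle documentation.
     */
    static String buildQuery(String searchTerm) {
        // Escape backslashes and double quotes so the term stays inside the string literal.
        String escaped = searchTerm.replace("\\", "\\\\").replace("\"", "\\\"");
        return "PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>\n"
             + "PREFIX orardf: <http://xmlns.oracle.com/rdf/>\n"
             + "SELECT ?entityId\n"
             + "WHERE {\n"
             + "  ?entityId bp:name ?object .\n"
             + "  FILTER (orardf:textContains(?object, \"" + escaped + "\"))\n"
             + "}";
    }

    public static void main(String[] args) {
        // The resulting string would be passed to the query execution in place of the regex query.
        System.out.println(buildQuery("P01375"));
    }
}
```

Text search matching is not identical to regex matching with the "i" flag, so the results should be compared against the original query on a small sample.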

                  Thanks,

                  Zhe Wu
                  • 6. Re: Iterator takes a long time to initialize and iterate through.
                    817559
                    Thanks.
                    Queries without a filter also take a long time. For example, my query

                         private static final String SIMPLETYPE = "PREFIX bp:<http://www.biopax.org/release/biopax-level3.owl#> "
                              + "PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
                              + "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
                              + "PREFIX ORACLE_SEM_FS_NS:"
                              + "<http://oracle.com/semtech#dop=24,RESULT_CACHE>"
                              + " SELECT distinct ?entityId "
                              + "WHERE { "
                              + "{ "
                              + "?entityId rdf:type bp:"
                              + TYPEVALUE
                              + "} " + "}";
                    is, I suppose, translated to:
                    start
                    INFO [main] (SimpleLog.java:63) - Final clause = SELECT entityId$RDFVTYP, decode(entityId$RDFVTYP, 'BLN', ('_:'||substr(entityId,instr(entityId,'m',4)+1)), entityId) entityId FROM table(sdo_rdf_match('(?entityId <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.biopax.org/release/biopax-level3.owl#Protein>) ', sdo_rdf_models('BIOLOGICALDATASTORE'), null, null, null, NULL,' '))

                    As you can see, it takes 61793ms:
                    to start 617393
                    • 7. Re: Iterator takes a long time to initialize and iterate through.
                      alwu-Oracle
                      Hi,

                      How many rows did you get back from the query when the filter is absent?

                      BTW, did you get a chance to try the textContains?

                      Thanks,

                      Zhe