7 Replies Latest reply on Oct 13, 2010 8:58 PM by alwu-Oracle

    Performance on the performanceInference() method

    804791
      I am a new developer using Oracle Semantic Technologies with the Jena Adapter for Oracle Database 11g Release 2, and I am having performance issues with the performInference() method. I am working in a collaborative environment where the entailment index needs to be kept VALID so that other users know the current Model is up to date when querying the ontology.

      I have three questions:

      1) My current test takes around 4-9 seconds to perform inference on the Model. Are there any other ways to speed this up?
      2) Is there a better way to keep the entailment index VALID while adding/deleting triples without using the performInference call? (I know you can switch to InferenceMaintenanceMode.UPDATE_WHEN_COMMIT, skip performInference, and commit the transaction on the graph, but this still produces similar performance.)
      3) Is only one Attachment allowed on a given Model? I have noticed that I cannot put multiple Attachments on an existing Model.

      Any help is very much appreciated, thanks.

      The example code is below:

      public void testIncrementalInference() throws Exception
      {
          Oracle oracle = new Oracle(JDBC_URL, USER, PASSWORD);

          String[] modelNames =
          {
              "acorn_instances", "acorn_delta", "AD_NCR_Testbed", "asset_core",
              "assetdescription", "assetdescription_i", "cstl",
              "designofexperiments", "networktopology", "ncr_master_import",
              "treat_combin_1", "treat_combin_2"
          };
          /* Test attaching an RDFS rulebase */
          Attachment attachment = Attachment.createInstance(
              modelNames,
              "RDFS",
              InferenceMaintenanceMode.NO_UPDATE,
              QueryOptions.ALLOW_QUERY_INCOMPLETE);

          GraphOracleSem graphOracleSem = new GraphOracleSem(
              oracle,
              "experiment",
              attachment);
          ModelOracleSem modelOracleSem = new ModelOracleSem(graphOracleSem);
          graphOracleSem.setInferenceOption("INC=T,DOP=4");
          graphOracleSem.analyze();
          graphOracleSem.performInference();
          graphOracleSem.analyzeInferredGraph();

          /* Start time of the FACTROracleOntologyLoader */
          long startDl = System.currentTimeMillis();
          List<Triple> triples = new ArrayList<Triple>();
          for (int i = 0; i < 50; i++)
          {
              Triple rahulNadellaTriple = Triple.create(
                  Node.createURI(UUID.randomUUID().toString()),
                  Node.createURI(UUID.randomUUID().toString()),
                  Node.createURI(UUID.randomUUID().toString()));
              triples.add(rahulNadellaTriple);
          }
          Triple triple1 = Triple.create(
              Node.createURI("urn:testuser"),
              Node.createURI("http://xmlns.com/foaf/0.1/name"),
              Node.createURI("Jeff Ramson"));
          Triple triple2 = Triple.create(
              Node.createURI("urn:testuser"),
              Node.createURI("http://xmlns.com/foaf/0.1/name"),
              Node.createURI("RJ Soward"));
          Triple triple3 = Triple.create(
              Node.createURI("urn:testuser"),
              Node.createURI("http://xmlns.com/foaf/0.1/name"),
              Node.createURI("James Smith"));
          triples.add(triple1);
          triples.add(triple2);
          triples.add(triple3);

          modelOracleSem.getGraph().getBulkUpdateHandler().add(triples);
          // modelOracleSem.commit();
          // graphOracleSem.commitTransaction();
          graphOracleSem.analyze();
          graphOracleSem.performInference();
          graphOracleSem.analyzeInferredGraph();

          long endDl = System.currentTimeMillis();
          double milliseconds = (double) (endDl - startDl);
          System.out.println("Milliseconds: " + milliseconds);

          StringBuilder sparqlQuery = new StringBuilder(
              "select ?f ?k WHERE {?f <http://xmlns.com/foaf/0.1/name> ?k .}");

          Query query = QueryFactory.create(sparqlQuery.toString());
          QueryExecution qexec = QueryExecutionFactory.create(query, modelOracleSem);
          /* Execute the SPARQL Select Query */
          ResultSet results = qexec.execSelect();
          FACTRResultFormatter.out(results, query);
      }
        • 1. Re: Performance on the performanceInference() method
          alwu-Oracle
          Hi,

          Regarding 1), what is your performance target? Could you please tell us a bit about your hardware and database configuration?

          Regarding 2), you have to run the API to update the inference closure. This is by design.

          Regarding 3), in one attachment, you can specify multiple models and/or multiple rulebases. Why do you want to add
          multiple attachments to an existing model?

          Thanks,

          Zhe Wu
          • 2. Re: Performance on the performanceInference() method
            804791
            We are looking to speed up our semantic implementation. Currently, inserts/deletes take 12-15 seconds and updates take 20-25 seconds. I would prefer that these be in the single digits.

            1) The current PFILE is listed below.
            3) Is it possible to get an example of how to use the rulebases?

            Any ideas would be helpful,

            Thanks for the help,
            Rahul

            ##############################################################################
            # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
            ##############################################################################

            ###########################################
            # Archive
            ###########################################
            log_archive_format=%t_%s_%r.dbf

            ###########################################
            # Cache and I/O
            ###########################################
            db_block_size=16384

            ###########################################
            # Cursors and Library Cache
            ###########################################
            open_cursors=2048

            ###########################################
            # Database Identification
            ###########################################
            db_domain=""
            db_name=MY_SID

            ###########################################
            # Diagnostics and Statistics
            ###########################################
            UTL_FILE_DIR=/opt/oracle/app/oracle/oradata/MY_SID/utlfile
            #DEPRECATED - background_dump_dest=/opt/oracle/app/oracle/oradata/MY_SID/bdump
            core_dump_dest=/opt/oracle/app/oracle/oradata/MY_SID/cdump
            timed_statistics=TRUE
            #DEPRECATED user_dump_dest=/opt/oracle/app/oracle/oradata/MY_SID/udump

            ###########################################
            # File Configuration
            ###########################################
            control_files=/opt/oracle/app/oracle/oradata/MY_SID/ctlfile/control01.ctl, /opt/oracle/app/oracle/oradata/MY_SID/ctlfile/control02.ctl, /opt/oracle/app/oracle/oradata/MY_SID/ctlfile/control03.ctl
            db_recovery_file_dest=/opt/oracle/app/oracle/flash_recovery_area
            db_recovery_file_dest_size=2147483648

            ###########################################
            # Job Queues
            ###########################################
            job_queue_processes=3

            ###########################################
            # Miscellaneous
            ###########################################
            compatible=11.2.0.0.0
            diagnostic_dest=/opt/oracle/app/oracle

            ###########################################
            # Network Registration
            ###########################################
            local_listener=LISTENER_MY_SID

            ###########################################
            # Optimizer
            ###########################################
            optimizer_index_cost_adj=10
            query_rewrite_enabled=FALSE

            ###########################################
            # Memory / Pools
            ###########################################

            ###Development Server Settings
            sga_target = 500M
            sga_max_size = 1G
            JAVA_POOL_SIZE = 175M

            ###Production Server Settings
            #sga_target = 1G
            #sga_max_size = 2G
            #JAVA_POOL_SIZE = 256M

            ###########################################
            # Processes and Sessions
            ###########################################
            processes=300
            sessions=115

            ###########################################
            # Redo Log and Recovery
            ###########################################
            log_buffer=2850816
            log_checkpoint_interval=1000000

            ###########################################
            # Security and Auditing
            ###########################################
            audit_file_dest=/opt/oracle/app/oracle/oradata/MY_SID/adump
            audit_trail=NONE
            os_authent_prefix=""
            remote_login_passwordfile=EXCLUSIVE
            #DEPRECATED - remote_os_authent=FALSE
            sec_case_sensitive_logon=FALSE

            ###########################################
            # Shared Server
            ###########################################
            dispatchers="(PROTOCOL=TCP) (SERVICE=MY_SIDXDB)"
            shared_servers=1

            ###########################################
            # Sort, Hash Joins, Bitmap Indexes
            ###########################################

            ###Development Server Settings
            pga_aggregate_target=500M

            ###Production Server Settings
            #pga_aggregate_target=4G

            workarea_size_policy=AUTO

            ###########################################
            # System Managed Undo and Rollback Segments
            ###########################################
            undo_management=AUTO
            undo_retention=3600
            undo_tablespace=UNDOTBS1

            ###########################################
            # Transactions
            ###########################################
            dml_locks=800
            • 3. Re: Performance on the performanceInference() method
              alwu-Oracle
              Could you tell us a bit more about your hardware? Total Memory, # of CPUs, speed of CPUs, storage configuration, disk speed, etc...

              Thanks,

              Zhe
              • 4. Re: Performance on the performanceInference() method
                804791
                The server runs Linux and has 4 CPUs (2.8 GHz clock speed) and 40 GB of RAM.

                Another question: what is the appropriate way to update a triple that is already in the database? Currently we delete the old triple through the BulkUpdateHandler and then insert the new triple. Is there a simpler way to do this?

                Thanks for the help.
                Rahul
                • 5. Re: Performance on the performanceInference() method
                  alwu-Oracle
                  Hi,

                  To update a triple, you can do a delete and an insert through the GraphOracleSem API. I assume your model was created by the Jena Adapter.
                  (The Jena Adapter will create an index on your application table to speed up deletes.)
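The delete-then-insert pattern described above can be sketched as follows. This is a minimal illustration, not the adapter's own code: `TripleStore`, `InMemoryTripleStore`, and `TripleUpdater` are hypothetical names, and the in-memory store only stands in for a graph such as GraphOracleSem so that the example runs without a database.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the graph; a GraphOracleSem would fill this role in practice. */
interface TripleStore {
    void add(String subject, String predicate, String object);
    void delete(String subject, String predicate, String object);
    boolean contains(String subject, String predicate, String object);
}

/** In-memory implementation so the sketch runs without a database (assumption for illustration). */
class InMemoryTripleStore implements TripleStore {
    private final Set<List<String>> triples = new LinkedHashSet<>();

    public void add(String s, String p, String o) {
        triples.add(Arrays.asList(s, p, o));
    }

    public void delete(String s, String p, String o) {
        triples.remove(Arrays.asList(s, p, o));
    }

    public boolean contains(String s, String p, String o) {
        return triples.contains(Arrays.asList(s, p, o));
    }
}

class TripleUpdater {
    /** "Updates" a triple by deleting the old statement and then inserting the new one. */
    static void replaceObject(TripleStore store, String s, String p,
                              String oldObject, String newObject) {
        store.delete(s, p, oldObject);
        store.add(s, p, newObject);
    }
}
```

Against the real graph, the inference closure would still have to be refreshed afterwards (for example with performInference()), since this sketch only changes the asserted triples.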

                  You have 40GB of RAM, and you don't allocate much memory to the database. Are there some other applications running on the same machine?

                  What kind of disk storage do you have? Do you have multiple physical disks? If so, it is a good idea to use ASM. It is important to have a
                  balanced hardware setup so that the parallel inference can be carried out efficiently (not blocked by I/O). SSDs are wonderful choices,
                  if applicable.

                  BTW, how many triples do you have in your graph?

                  Thanks,

                  Zhe Wu
                  • 6. Re: Performance on the performanceInference() method
                    804791
                    We'll probably end up using one of the Dell 710s (dual quad-core) as the final machine on which we deploy the database. This machine will be dedicated to databases and nothing else. I don't remember the RAM, but I'm sure it's over 40 GB.

                    We'll end up using the SAN for disk space. Each server has a single RCA fiber to the SAN. We could request multiple SAN disks, but I don't know if we will still reap the benefit of parallel inference processing when we access multiple SAN logical disks via a single fiber channel.

                    I was wondering if there are any other ways to improve inference performance after an insert/update/delete on the semantic database using the Jena Adapter for Oracle. Currently we are using the performance options (incremental inference, parallel execution, and selective inference) described in the Oracle Database Semantic Technologies Developer's Guide. I am mainly looking to improve the performance of updates, which will be the most-used function in the application. The big issue is that we always need to keep the entailments VALID so other users can see changes made to the ontology (i.e., a collaborative work environment).

                    The current performance I am seeing in a couple of JUnit tests is:
                    INSERT -> 2-5 seconds
                    UPDATE -> 7-11 seconds
                    DELETE -> 2-5 seconds

                    Thanks for all the help,
                    Rahul
                    • 7. Re: Performance on the performanceInference() method
                      alwu-Oracle
                      Hi,

                      The first thing you could do is to bump up your SGA and PGA. If your total memory is over 40GB and there are no other applications running, you can bump up the total memory of SGA + PGA to 32GB or more. Regarding your SAN disks, what is the speed of your fibre channel?
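As a rough illustration of the suggestion above, the memory section of the PFILE posted earlier might be raised along these lines. The 24G/8G split is an assumption, chosen only so that SGA + PGA totals 32GB; the right split depends on the workload and on what else runs on the host.

```text
###Illustrative settings only (assumed split: 24G SGA + 8G PGA = 32G)
sga_max_size = 24G
sga_target = 24G
pga_aggregate_target = 8G
```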

                      Since you are making small changes to the graph, you don't have to run the following every time. Please comment them out and re-run your test.
                      graphOracleSem.analyze();
                      graphOracleSem.analyzeInferredGraph();

                      From your code snippet, it is not clear why UPDATE is taking 5 more seconds than insert/delete.
                      A single triple addition or a single triple delete should only take a few (or a few tens of) milliseconds.
                      Maintaining the inference closure seems to take < 5 seconds on your system. So updating a triple (removing
                      a triple followed by adding a new triple) and then re-running the inference should not take much longer
                      than a delete followed by inference maintenance.
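One way to see where the extra UPDATE time goes is to time each phase (delete, insert, inference) separately instead of timing the whole test. The sketch below is only an illustration: `PhaseTimer` is a hypothetical helper, not part of the Jena Adapter, and the calls shown in the comment are where the real graph operations would go.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that accumulates elapsed time per named phase. */
class PhaseTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    /** Runs the given work and adds its duration to the total for this phase. */
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        work.run();
        elapsedNanos.merge(phase, System.nanoTime() - start, Long::sum);
    }

    /** Read-only view of the accumulated totals, in first-use order. */
    Map<String, Long> elapsedNanos() {
        return Collections.unmodifiableMap(elapsedNanos);
    }

    /** Prints one line per phase with the total in milliseconds. */
    void report() {
        for (Map.Entry<String, Long> entry : elapsedNanos.entrySet()) {
            System.out.printf("%-12s %d ms%n", entry.getKey(), entry.getValue() / 1_000_000);
        }
    }
}

// Possible use inside the JUnit test (graph calls are placeholders for the real ones):
//   PhaseTimer timer = new PhaseTimer();
//   timer.time("delete",    () -> { /* delete the old triple */ });
//   timer.time("insert",    () -> { /* add the new triple */ });
//   timer.time("inference", () -> { /* performInference() */ });
//   timer.report();
```

If the delete and insert phases each come out at a few milliseconds, the remaining time is in inference maintenance, which is the case described above.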

                      This is getting quite involved. If you like, we can take this off the forum discussion and post a summary
                      later. You can email alan dot wu at oracle dot com.

                      Thanks,

                      Zhe Wu