7 Replies Latest reply on Apr 13, 2010 9:08 PM by alwu-Oracle

    Extremely poor load performance into RDF store

      We are seeing unacceptably poor performance loading our RDF models. We have triedloading with Jena adaptor as well as loading using the BatchLoader class. Either way, the absolute best rate we have seen is about 20000/minute. Here are the specs:

      Loading a triples file from disk which contains 5.08 million triples
      Oracle 11gr2 running on Enterprise Linux64 with 16 GB RAM
      Oracle 11gr2 running on Solaris 10 with all the latest patches and packages...64 bit with 8GB RAM

      Sun is a bit slower than linux, but both are quite poor. I can load the identical triples file into Allegro running on my desktop PC and consistently get load rates of 1 million per minute.

      Can anyone suggest what could be causing this performance issue? Oracle is a very recent install on both the sun and linux boxes, and there is nothing whatsoever running on either one other than oracle, and I am the only oracle user. All install and parameter settings are oracle default.
        • 1. Re: Extremely poor load performance into RDF store
          First of all, what is your database configuration? (SGA, PGA, etc.)

          Second, the BatchLoader class is doing batch loading. You said you tried Jena Adaptor.
          Did you try the bulk load function in Jena Adaptor? If you like, you can cut & paste the code snippet you used to do the data loading.


          Zhe Wu
          • 2. Re: Extremely poor load performance into RDF store
            An update on the stats. I changed the way models are created, doing it manually from sqlplus first (vs letting the Jena stuff create the model on the fly), and that made things run about 3 times faster, but I still am not seeing anything better than triple load times of 79K/minute.
            Here is the java that runs the load:

            String model = "GMI";

            oracleGraph = oracleTools.makeGraph(model);
            oracleModel = new ModelOracleSem(oracleGraph);
            InputStream is = FileManager.get().open(props.getProperty(model +".NTRIPLES_IND_FILE"));
            StopWatch timer = new StopWatch();
            // read file contents into oracleModel
            oracleModel.read(is, "", "N-TRIPLE");
            System.out.println("\nLoad time into Oracle: " +timer.getElapsedTimeSecs());

            Here is most of the show parameters output:
            active_instance_count integer
            aq_tm_processes integer 0
            archive_lag_target integer 0
            asm_diskgroups string
            asm_diskstring string
            asm_power_limit integer 1
            asm_preferred_read_failure_groups string
            audit_file_dest string /opt/oracle/
            audit_sys_operations boolean FALSE
            audit_syslog_level string
            audit_trail string DB

            background_core_dump string partial
            background_dump_dest string /opt/oracle/
            backup_tape_io_slaves boolean FALSE
            bitmap_merge_area_size integer 1048576
            blank_trimming boolean FALSE
            buffer_pool_keep string
            buffer_pool_recycle string
            cell_offload_compaction string ADAPTIVE
            cell_offload_decryption boolean TRUE
            cell_offload_parameters string
            cell_offload_plan_display string AUTO
            cell_offload_processing boolean TRUE

            cell_partition_large_extents string DEFAULT
            circuits integer
            client_result_cache_lag big integer 3000
            client_result_cache_size big integer 0
            cluster_database boolean FALSE
            cluster_database_instances integer 1
            cluster_interconnects string
            commit_logging string
            commit_point_strength integer 1
            commit_wait string
            commit_write string
            compatible string
            control_file_record_keep_time integer 7

            control_files string /opt/oracle/
            control_management_pack_access string DIAGNOSTIC+T
            core_dump_dest string /opt/oracle/
            cpu_count integer 4
            create_bitmap_area_size integer 8388608
            create_stored_outlines string
            cursor_sharing string EXACT
            cursor_space_for_time boolean FALSE
            db_16k_cache_size big integer 0
            db_2k_cache_size big integer 0

            db_32k_cache_size big integer 0
            db_4k_cache_size big integer 0
            db_8k_cache_size big integer 0
            db_block_buffers integer 0
            db_block_checking string FALSE
            db_block_checksum string TYPICAL
            db_block_size integer 8192
            db_cache_advice string ON
            db_cache_size big integer 0
            db_create_file_dest string
            db_create_online_log_dest_1 string
            db_create_online_log_dest_2 string
            db_create_online_log_dest_3 string

            db_create_online_log_dest_4 string
            db_create_online_log_dest_5 string
            db_domain string domain
            db_file_multiblock_read_count integer 75
            db_file_name_convert string
            db_files integer 200
            db_flash_cache_file string
            db_flash_cache_size big integer 0
            db_flashback_retention_target integer 1440
            db_keep_cache_size big integer 0
            db_lost_write_protect string NONE
            db_name string jtvOrcl
            db_recovery_file_dest string

            db_recovery_file_dest_size big integer 0
            db_recycle_cache_size big integer 0
            db_securefile string PERMITTED
            db_ultra_safe string OFF
            db_unique_name string jtvOrcl
            db_writer_processes integer 1
            dbwr_io_slaves integer 0
            ddl_lock_timeout integer 0
            deferred_segment_creation boolean TRUE
            dg_broker_config_file1 string /opt/oracle/
            dg_broker_config_file2 string /opt/oracle/

            dg_broker_start boolean FALSE
            diagnostic_dest string /opt/oracle
            disk_asynch_io boolean TRUE
            dispatchers string (PROTOCOL=TC
            distributed_lock_timeout integer 60
            dml_locks integer 1088
            dst_upgrade_insert_conv boolean TRUE
            enable_ddl_logging boolean FALSE
            event string
            fal_client string
            fal_server string
            fast_start_io_target integer 0

            fast_start_mttr_target integer 0
            fast_start_parallel_rollback string LOW
            file_mapping boolean FALSE
            fileio_network_adapters string
            filesystemio_options string none
            fixed_date string
            gcs_server_processes integer 0
            global_context_pool_size string
            global_names boolean FALSE
            global_txn_processes integer 1
            hash_area_size integer 131072
            hi_shared_memory_address integer 0
            hs_autoregister boolean TRUE

            ifile file
            instance_groups string
            instance_name string jtvOrcl
            instance_number integer 0
            instance_type string RDBMS
            java_jit_enabled boolean TRUE
            java_max_sessionspace_size integer 0
            java_pool_size big integer 0
            java_soft_sessionspace_limit integer 0
            job_queue_processes integer 1000
            large_pool_size big integer 0
            ldap_directory_access string NONE
            ldap_directory_sysauth string no

            license_max_sessions integer 0
            license_max_users integer 0
            license_sessions_warning integer 0
            listener_networks string
            local_listener string
            lock_name_space string
            lock_sga boolean FALSE
            log_archive_config string

            max_dispatchers integer
            max_dump_file_size string unlimited
            max_enabled_roles integer 150
            max_shared_servers integer
            memory_max_target big integer 0
            memory_target big integer 0
            nls_calendar string
            nls_comp string BINARY

            nls_currency string
            nls_date_format string
            nls_date_language string
            nls_dual_currency string
            nls_iso_currency string
            nls_language string AMERICAN
            nls_length_semantics string BYTE
            nls_nchar_conv_excp string FALSE
            nls_numeric_characters string
            nls_sort string
            nls_territory string AMERICA
            nls_time_format string
            nls_time_tz_format string

            nls_timestamp_format string
            nls_timestamp_tz_format string
            object_cache_max_size_percent integer 10
            object_cache_optimal_size integer 102400
            olap_page_pool_size big integer 0
            open_cursors integer 300
            open_links integer 4
            open_links_per_instance integer 4
            optimizer_capture_sql_plan_baselines boolean FALSE
            optimizer_dynamic_sampling integer 2
            optimizer_features_enable string
            optimizer_index_caching integer 0
            optimizer_index_cost_adj integer 100

            optimizer_mode string ALL_ROWS
            optimizer_secure_view_merging boolean TRUE
            optimizer_use_invisible_indexes boolean FALSE
            optimizer_use_pending_statistics boolean FALSE
            optimizer_use_sql_plan_baselines boolean TRUE
            os_authent_prefix string ops$
            os_roles boolean FALSE
            parallel_adaptive_multi_user boolean TRUE
            parallel_automatic_tuning boolean FALSE
            parallel_degree_limit string CPU
            parallel_degree_policy string MANUAL
            parallel_execution_message_size integer 16384
            parallel_force_local boolean FALSE

            parallel_instance_group string
            parallel_io_cap_enabled boolean FALSE
            parallel_max_servers integer 80
            parallel_min_percent integer 0
            parallel_min_servers integer 0
            parallel_min_time_threshold string AUTO
            parallel_server boolean FALSE
            parallel_server_instances integer 1
            parallel_servers_target integer 32
            parallel_threads_per_cpu integer 2
            permit_92_wrap_format boolean TRUE
            pga_aggregate_target big integer 5905M
            plscope_settings string IDENTIFIERS:NONE

            plsql_ccflags string
            plsql_code_type string INTERPRETED
            plsql_debug boolean FALSE
            plsql_optimize_level integer 2
            plsql_v2_compatibility boolean FALSE
            plsql_warnings string DISABLE:ALL
            pre_page_sga boolean FALSE
            processes integer 150
            query_rewrite_enabled string TRUE
            query_rewrite_integrity string enforced
            rdbms_server_dn string
            read_only_open_delayed boolean FALSE
            recovery_parallelism integer 0

            recyclebin string on
            redo_transport_user string
            remote_dependencies_mode string TIMESTAMP
            remote_listener string
            remote_login_passwordfile string EXCLUSIVE
            remote_os_authent boolean FALSE
            remote_os_roles boolean FALSE
            replication_dependency_tracking boolean TRUE
            resource_limit boolean FALSE
            resource_manager_cpu_allocation integer 4
            resource_manager_plan string
            result_cache_max_result integer 5
            result_cache_max_size big integer 2624K

            result_cache_mode string MANUAL
            result_cache_remote_expiration integer 0
            resumable_timeout integer 0
            rollback_segments string
            sec_case_sensitive_logon boolean TRUE
            sec_max_failed_login_attempts integer 10
            sec_protocol_error_further_action string CONTINUE
            sec_protocol_error_trace_action string TRACE
            sec_return_server_release_banner boolean FALSE
            serial_reuse string disable
            service_names string jtvOrcl.domain
            session_cached_cursors integer 50
            session_max_open_files integer 10

            sessions integer 248
            sga_max_size big integer 512M
            sga_target big integer 512M
            shadow_core_dump string partial
            shared_memory_address integer 0
            shared_pool_reserved_size big integer 16148070
            shared_pool_size big integer 0
            shared_server_sessions integer
            shared_servers integer 1
            skip_unusable_indexes boolean TRUE
            smtp_out_server string
            sort_area_retained_size integer 0
            sort_area_size integer 65536

            spfile string /opt/oracle/product/11.2.0/db_
            sql92_security boolean FALSE
            sql_trace boolean FALSE
            sqltune_category string DEFAULT
            standby_archive_dest string ?/dbs/arch
            standby_file_management string MANUAL
            star_transformation_enabled string FALSE
            statistics_level string TYPICAL
            streams_pool_size big integer 0
            tape_asynch_io boolean TRUE
            thread integer 0
            timed_os_statistics integer 0

            timed_statistics boolean TRUE
            trace_enabled boolean TRUE
            tracefile_identifier string
            transactions integer 272
            transactions_per_rollback_segment integer 5
            undo_management string AUTO
            undo_retention integer 900
            undo_tablespace string UNDOTBS1
            use_indirect_data_buffers boolean FALSE
            user_dump_dest string /opt/oracle/diag/rdbms/jtvorcl
            utl_file_dir string
            workarea_size_policy string AUTO

            xml_db_events string enable
            • 3. Re: Extremely poor load performance into RDF store

              You may want to change these two things.

              1) your SGA setting is too low. Is there a particular reason for using only 512M?

              sga_max_size big integer 512M
              sga_target big integer 512M

              2) For the data loading, you are actually using incremental API. Can you please try
              the bulk load API?

              Please search for addInBulk on page 6 of the following document (Jena Adaptor v2.0).

              Also, you can look at Test7 in the following document (Jena Adapter for Release 11.2).

              You can also try adjusting the filesystemio_options parameter setting.


              Zhe Wu
              • 4. Re: Extremely poor load performance into RDF store
                Thanks for the advice and quick reply. I implemented the changes you suggested. By using the bulk update methods, it made the load run quite a bit faster. My load rates are now 215K triples/minute. With the incremental load methods, I was getting 70K-80K/minute. However, this is still magnitudes short of the LUBM benchmark numbers. I bumped up both sga_max_size and sga_target to 4G (from 512M), but that actually made the load run slower (down to 177K triples/minute).

                Can you think of any other things I can change to make these loads run faster. I believe I should see load rates at least 4x faster than I am seeing right now.

                • 5. Re: Extremely poor load performance into RDF store

                  215k triples/minute => 12.9 Million triples per hour.

                  What kind of disk storage do you have for this system? Do you have a single disk, or multiple disks with ASM, or multiple disks with RAID (and ASM)?


                  • 6. Re: Extremely poor load performance into RDF store
                    We are running single disk. No RAID or ASM.
                    • 7. Re: Extremely poor load performance into RDF store
                      It is very likely that your system is I/O bound. For obvious reasons, data loading is very I/O intensive (Oracle persists data).
                      Oracle Database semantic technologies is not an in-memory solution. Having a good I/O subsystem is therefore critical to the overall
                      performance including loading, query, and inference.

                      Is it possible to add a few (say two) more physical disks to your compute and start using ASM? Alternatively, a single SSD
                      (solid state disk) can help too.

                      Now you have 4 cores but just a single disk in your computer. Increasing the I/O capacity will lead to a more balanced hardware.


                      Zhe Wu