i am playing around with external table feature and i managed to modify the example to match one of our data model's table.
My modified example works, i can query the data from within sqlplus or sql developer.
I populated my store with about 2 Million records.
The results of a query - without a where clause - are returned relatively fast. But when it comes to a where clause, the
performance - especially when i query records which have one part of the major key in common (there are around 1 mio
"records" which share the first part of the major key, second part is different) performance degrades. In this case the
result of the query is presented after about 45 minutes.
In addition i wrote a small jsp app running on tomcat, which queries the same kvstore - without using the external table
feature. Running the same query as mentioned in external table sample, it takes "only" let's say around 5 minutes to return
the same amount of data.
So i am asking myself some questions:
- is there a way to make the external table feature faster (i already tried to improve performance by using additional location files,
but without performance improvement)?
- can i pass in some parameters into the Formatter class? If so, how? (I'm thinking about setting additional filters for major key)
- is the the external table - in general - feasible for querying large amounts of data?
I know that i cannot expect "miracles" when talking about performance, when having a look at my configuration (i run a
kvstore on 2 storage nodes (physically different machines) with limited amount of RAM (assigned 2GB to each storage node
when running makebootconfig), but i am wondering about the different results when using either external table feature or
direct access to kvstore.
Any hints are very welcome.
The query predicate (i.e. the WHERE clause) does not get passed to NoSQL Database. That's a limitation of Oracle RDBMS external table feature, not Oracle NoSQL Database. You might consider using the subkey facility in the NoSQL Database External Table integration.
In terms of more parallelism, the parameters are finicky so you need to poke at it a bit. Take a look at
Be sure to enable parallel query processing using these three commands:
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;
as well as adding a PARALLEL nn directive in the External Table CREATE TABLE declaration, where nn is the number of LOCATION files you specify. We have observed very good scaling when the parallelism is invoked. You could do something as simple as top to see if there are parallel threads running on the ORA side of the house when you do the SELECT.
In terms of passing in parameters to a formatter class, it's not pretty but you could do it with -Dfoo=bar on the java invocation in the nosql_stream script.
And yes, the external table feature is feasible for querying large amounts of data, but of course it's only taking a read-only snapshot.
I hope this is helpful.
many thanks for your detailed reply. But i think i did everything right when defining multiple location files - in fact i added
the parallel statement in the create table script as well.
I verified if it's working by having a look at the explain plan in toad (which shows if the query is running in parallel) and
it told me that the query was running with 2 slave processes (i attached 2 location files to the table definition): IO Cost CPU Cost Cost Plan
+ 16 SELECT STATEMENT ALL_ROWS +
+4 4 4 4 PX COORDINATOR +
+3 16 3 913.933 3 16 3 PX SEND QC (RANDOM) PARALLEL_TO_SERIAL SYS.:TQ10000 :Q1000Bytes: 498.232 Cardinality: 82 +
+2 16 2 913.933 2 16 2 PX BLOCK ITERATOR PARALLEL_COMBINED_WITH_CHILD :Q1000Bytes: 498.232 Cardinality: 82 +
+1 16 1 913.933 1 16 1 EXTERNAL TABLE ACCESS FULL TABLE PARALLEL_COMBINED_WITH_PARENT SCHEMA.TABLE_NOSQL :Q1000 Filter Predicates: "TABLE_ID"='14' Bytes: 498.232 Cardinality: 82+
Still the query takes quite some time to return...
From your reply i think that the difference - in terms of performance - between the query on the external table and the jsp app comes
from the direct access to the keys which the jsp app does (i use the table_id to build a part of the key and iterate through the results)
whereas the query on the external table does not get passed the predicate and does a kind of "full table scan" on the store, right?
Thanks and regards
You are correct that an external table query will do a full table scan. Does your jsp use storeIterator or is it doing gets? If so, you can pass the same parameters that you are passing to storeIterator in your jsp to the external table query. You would do this at publish-time.
thanks again! Yes, i'm using storeIterator to loop through my results when querying my key value store. And i already defined two
additional external tables which work on the same store, by creating separate config.xml files (whose Property oracle.kv.parentKey
was extended by the part of the major key) and publishing those external table configurations.
It works fine. I think I'll have to find out more by adding more storage nodes, regarding scaling...
Edited by: 907108 on Jun 6, 2013 4:31 PM