Oracle Text (MOSC)

Text index creates with multiple rows per token

Edited Jan 21, 2014 10:00PM in Oracle Text (MOSC), 10 comments, Answered
I'm testing Text indexing and optimization in 11.2.0.3.  I created a small table with the same document in it 10,000 times.
create table T (id number, doc varchar2(512));
insert into T select dbms_random.random, 'word1 word2 word3 word4 word5 word6 word7' from dual connect by rownum <= 10000;
commit;
create index TI on T(doc) indextype is CTXSYS.CONTEXT parameters('memory 2057152000');

I expected to see 7 rows in the $I table, one for each of the 7 words above, each with a doc count of 10,000. Instead I see 9 rows per word (63 rows in total): 8 of them with TOKEN_COUNT = 1167 and the 9th with 664. And when I optimize this index right away with "exec ctxsys.optimize_index('TI','FULL')", or even with REBUILD, each token has 8 rows: 7 with TOKEN_COUNT 1266 and the 8th with 1138. This seems like odd behaviour for a freshly created index.
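For anyone who wants to reproduce the counts, here is a rough sketch of the kind of query that shows them. It assumes the token table follows the usual Oracle Text naming convention DR$<index name>$I, so DR$TI$I for the index above; the column aliases are only illustrative.

```sql
-- Sketch: summarize the $I table per token.
-- Assumes the token table is DR$TI$I (DR$<index name>$I convention).
select token_text,
       count(*)         as rows_per_token,   -- number of $I rows for this token
       min(token_count) as smallest_row,     -- smallest TOKEN_COUNT among those rows
       max(token_count) as largest_row,      -- largest TOKEN_COUNT among those rows
       sum(token_count) as total_docs        -- should equal the number of documents
from   DR$TI$I
group  by token_text
order  by token_text;
```

In both cases the per-row counts do add up to the full 10,000 documents (8 x 1167 + 664 = 10,000 before optimizing, 7 x 1266 + 1138 = 10,000 after), so no documents are missing; each token's postings are simply split across several rows.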
