Text index created with multiple rows per token

Jan 9, 2014 6:30AM

I'm testing Oracle Text indexing and optimization in 11.2.0.3. I created a small table containing the same document 10,000 times.
create table T (id number, doc varchar2(512));
insert into T select dbms_random.random, 'word1 word2 word3 word4 word5 word6 word7' from dual connect by rownum <= 10000;
commit;
create index TI on T(doc) indextype is CTXSYS.CONTEXT parameters('memory 2057152000');

I expected to see 7 rows in the $I table, one for each of the 7 words above, each with a TOKEN_COUNT of 10,000. Instead I see 9 rows per word (63 rows in total): 8 with TOKEN_COUNT = 1167 and the 9th with 664.

When I optimize the index right away with "exec ctx_ddl.optimize_index('TI','FULL')", or with 'REBUILD' instead of 'FULL', each token ends up with 8 rows: 7 with TOKEN_COUNT = 1266 and the 8th with 1138. This seems odd behaviour for a freshly created index.
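As a quick sanity check on the figures above, this short Python sketch confirms that the split rows for each token still add up to all 10,000 documents, and that both splits look like rows filled to a fixed document count with the remainder in a final, smaller row. The `split_evenly` helper is hypothetical. It only reproduces the reported numbers and is not a description of how Oracle Text actually decides where to break a token's $I rows.

```python
# Per-token TOKEN_COUNT values reported for the $I table (DR$TI$I) above.
after_create = [1167] * 8 + [664]
after_optimize = [1266] * 7 + [1138]

DOCS = 10_000   # rows inserted into T
TOKENS = 7      # word1 .. word7


def split_evenly(total_docs: int, per_row: int) -> list[int]:
    """Hypothetical model, not Oracle's algorithm: fill each row with
    per_row documents and put the remainder in a final, smaller row."""
    full, rest = divmod(total_docs, per_row)
    return [per_row] * full + ([rest] if rest else [])


for label, rows in (("after create", after_create),
                    ("after optimize", after_optimize)):
    # Every document is still counted once across the split rows.
    assert sum(rows) == DOCS
    # The observed split matches fixed-size filling at the largest row's count.
    assert split_evenly(DOCS, max(rows)) == rows
    print(f"{label}: {len(rows)} rows per token, "
          f"{len(rows) * TOKENS} rows in total, TOKEN_COUNT sum = {sum(rows)}")
```

Running it reports 9 rows per token (63 in total) after the create and 8 rows per token (56 in total) after the optimize, with every token's counts summing to 10,000. So no documents are missing; the question is only why a single token's entries are spread over several rows.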

Oracle Text (MOSC)
