Text index created with multiple rows per token
I'm testing Text indexing and optimization in 11.2.0.3. I created a small table with the same document in it 10,000 times.
create table T (id number, doc varchar2(512));
insert into T select dbms_random.random, 'word1 word2 word3 word4 word5 word6 word7' from dual connect by rownum <= 10000;
commit;
create index TI on T(doc) indextype is CTXSYS.CONTEXT parameters('memory 2057152000');
I expected to see 7 rows in the $I table, one for each of the 7 words above, each with a doc count of 10,000. Instead I see 9 rows per word (63 rows in total): 8 of them with TOKEN_COUNT = 1167 and the 9th with 664. When I optimize this index right away with "exec ctxsys.optimize_index('TI','FULL')", or even with REBUILD, each token ends up with 8 rows: 7 with TOKEN_COUNT 1266 and the 8th with 1138. This seems like odd behaviour for a freshly created index.
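For anyone who wants to reproduce the counts, a query along these lines should show the per-token breakdown. It is only a sketch: it assumes the index TI is in the current schema, so that its token table has the default name DR$TI$I, and the column aliases are just illustrative.

```sql
-- Assumption: index TI is owned by the current user, so the token
-- table is DR$TI$I (the default DR$<index name>$I naming).
select token_text,
       count(*)         as rows_per_token,  -- how many $I rows this token is split across
       sum(token_count) as total_docs,      -- documents covered by all rows for the token
       min(token_count) as smallest_row,
       max(token_count) as largest_row
from   dr$ti$i
group  by token_text
order  by token_text;
```

With the figures above, the rows for each token do add up to the full document set (8 x 1167 + 664 = 10,000 before the optimize, 7 x 1266 + 1138 = 10,000 after), so no documents are missing; the postings for each token are just split across several rows.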