Forum Stats

  • 3,825,929 Users
  • 2,260,581 Discussions
  • 7,896,738 Comments

Discussions

Oracle Text index information

Laury
Laury Member Posts: 1,665 Silver Badge
edited Aug 31, 2017 3:11PM in Text

Hi,

I have an Oracle Text index that has been created as:

create index my_docs_doc_idx on my_docs(doc)

indextype is ctxsys.context;

So, the table is MY_DOCS, the indexed column is DOC.

But, when I query the views dba_tables I find tables with that names DR$MY_DOCS_DOC_IDX$K and DR$MY_DOCS_DOC_IDX$N.

that are (regarding the naming) related to the index MY_DOCS_DOC_IDX.

Furthetmore, when I query the table dba_indexes, I find for the above tables, indexes of the type SYS_IL, SYS_IOT.

Does someone know what means these tables and index names?

Does someone know I to get Oracle Text information: name, corresponding table, size, free space, allocated (used) space?

Thanks by advance for any tip, if you have some experience with it.

Kind Regards

Best Answer

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Aug 21, 2017 6:44AM Answer ✓

    The various sub-tables are used internally by Oracle Text. They can vary between releases, and depend on the options chosen for the index.

    The basic tables are:

    $I : the "token table".  Lists all the tokens (words) found in the text with a binary postings list telling the system which document id and word position the token is to be found

    $K : maps base table ROWID to internal document id (DOCID)

    $R : maps internal DOCID to ROWID - same information as $K but in a different format. Which we use depends on whether we know DOCID and need ROWID, or vice versa,

    $N : so called "negative row" table. Contains a list of deleted DOCID values which need to be cleaned up next time we run optimise_index.

    Other common tables that may be seen:

    $P : used by the SUBSTRING_INDEX option to contain rotated versions of tokens

    $S : used by SDATA sections

    These tables should never be modified by the user.  The only table which is generally useful for the user is the $I table - doing a query such as

    SELECT token_type, token_info FROM dr$myindex$i

    Can let you see which words are indexed - quite useful on small indexes to see how words are tokenised if you use various index options.

    For example here we can see that when we index "the cat sat on the mat", the words "the" and "on" are not indexed since they are default stop words:

    SQL> create table test (text varchar2(60));Table created.SQL> insert into test values ('the cat sat on the mat');1 row created.SQL> create index testindex on test(text) indextype is ctxsys.context;Index created.SQL> select token_type, token_text from dr$testindex$i;  0 CAT  0 MAT  0 SATSQL>

    Token_type is 0 for normal words. Various other indexed tokens like themes, stems and section information have different token_type values.

Answers

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Aug 21, 2017 6:44AM Answer ✓

    The various sub-tables are used internally by Oracle Text. They can vary between releases, and depend on the options chosen for the index.

    The basic tables are:

    $I : the "token table".  Lists all the tokens (words) found in the text with a binary postings list telling the system which document id and word position the token is to be found

    $K : maps base table ROWID to internal document id (DOCID)

    $R : maps internal DOCID to ROWID - same information as $K but in a different format. Which we use depends on whether we know DOCID and need ROWID, or vice versa,

    $N : so called "negative row" table. Contains a list of deleted DOCID values which need to be cleaned up next time we run optimise_index.

    Other common tables that may be seen:

    $P : used by the SUBSTRING_INDEX option to contain rotated versions of tokens

    $S : used by SDATA sections

    These tables should never be modified by the user.  The only table which is generally useful for the user is the $I table - doing a query such as

    SELECT token_type, token_info FROM dr$myindex$i

    Can let you see which words are indexed - quite useful on small indexes to see how words are tokenised if you use various index options.

    For example here we can see that when we index "the cat sat on the mat", the words "the" and "on" are not indexed since they are default stop words:

    SQL> create table test (text varchar2(60));Table created.SQL> insert into test values ('the cat sat on the mat');1 row created.SQL> create index testindex on test(text) indextype is ctxsys.context;Index created.SQL> select token_type, token_text from dr$testindex$i;  0 CAT  0 MAT  0 SATSQL>

    Token_type is 0 for normal words. Various other indexed tokens like themes, stems and section information have different token_type values.

  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Aug 21, 2017 9:37AM

    Hi Roger,

    Thanks for this interesting feedback.

    I suspected the table $I,K had something to do with the implementation of the Text indexes.

    So, you mean that th query "select token_type, token_text from dr$testindex$i;" only shows the non-default token indexed?

    But, how can I get Oracle Text indexes information rearding: name, corresponding table, size, free space, allocated (used) space?

    Thanks and Regards

This discussion has been closed.