Indexing file starting with <job id="MHS">

pcpaasche Member Posts: 100 Blue Ribbon
edited Jan 29, 2018 5:41PM in Text

select * from v$version:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

PL/SQL Release 12.1.0.2.0 - Production

CORE 12.1.0.2.0 Production

TNS for Linux: Version 12.1.0.2.0 - Production

NLSRTL Version 12.1.0.2.0 - Production

I have a file named TEST_FILE.txt, that has a first line like this:

<job id="MHS">

When Oracle Text indexes this file, the file does not appear to be indexed: it is not returned when I query for any word in the file.

However, if I change the first line in the file TEST_FILE.txt to this:

#<job id="MHS">

then the file is indexed and I get a result when querying for any given word in the file.

Why is that? What can I do to have the file indexed when it starts with <job id="MHS">?

I want to treat the file as an ordinary text file where all words are available in the index, no matter what tags there are in the beginning of the file or in the rest of the file.

---------------------

Some background info of my setup:

EXECUTE CTX_DDL.CREATE_PREFERENCE('URL_PREF', 'URL_DATASTORE');
EXECUTE CTX_DDL.CREATE_PREFERENCE('blackcode_lexer', 'BASIC_LEXER');
EXECUTE CTX_DDL.SET_ATTRIBUTE('blackcode_lexer', 'PRINTJOINS', '_');

CREATE INDEX BLACKCODE.SOURCE_CODE_SEARCH_IDX1
   ON BLACKCODE.SOURCE_CODE_SEARCH(HTTP_URL)
   INDEXTYPE IS CTXSYS.CONTEXT
   PARAMETERS('filter ctxsys.null_filter Datastore URL_PREF LEXER blackcode_lexer')
NOPARALLEL;

My query:

SELECT *

        FROM blackcode.SOURCE_CODE_SEARCH

       WHERE CONTAINS(http_url, 'job id' , 1) > 0;
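
To tell "the document failed to index" apart from "the document was indexed but these words were skipped", one option is to look at the Oracle Text error view for this index. This is only a sketch: it assumes the session is connected as the index owner (BLACKCODE), so that the CTX_USER_INDEX_ERRORS view covers SOURCE_CODE_SEARCH_IDX1.

```sql
-- Assumption: run as the index owner (BLACKCODE).
-- Lists any per-document errors recorded while the index was built or synced.
SELECT err_timestamp,
       err_textkey,   -- rowid of the row whose document failed
       err_text       -- the error message raised for that document
  FROM ctx_user_index_errors
 WHERE err_index_name = 'SOURCE_CODE_SEARCH_IDX1'
 ORDER BY err_timestamp DESC;
```

If this returns no rows for the file, the document was probably fetched and processed, and the missing words were more likely dropped at a later stage such as sectioning or lexing.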


Best Answer

  • Bud Light
    Bud Light Member Posts: 70 Blue Ribbon
    edited Aug 24, 2017 12:39PM Answer ✓

    My guess was that Oracle was treating the file as HTML or XML and not indexing the node or attribute names.

    Even though the docs say it is the default, try explicitly specifying a section group of type NULL_SECTION_GROUP.

    I was able to reproduce your results locally and it works for me after creating the section group.

    exec ctx_ddl.create_section_group('blackcode_section_group', 'NULL_SECTION_GROUP');

    CREATE INDEX BLACKCODE.SOURCE_CODE_SEARCH_IDX1
       ON BLACKCODE.SOURCE_CODE_SEARCH(HTTP_URL)
       INDEXTYPE IS CTXSYS.CONTEXT
       PARAMETERS('filter ctxsys.null_filter Datastore URL_PREF LEXER blackcode_lexer section group blackcode_section_group')
    NOPARALLEL;

    select token_text from dr$SOURCE_CODE_SEARCH_IDX1$i;
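
    To confirm which section group an index is really using, rather than relying on the documented default, the data dictionary can be queried. The following is a sketch under two assumptions: the session is connected as the index owner (BLACKCODE), and the original index still exists and has to be dropped before the CREATE INDEX statement above is run.

    ```sql
    -- Assumption: run as the index owner (BLACKCODE).
    -- Shows the object attached to each class of the index
    -- (datastore, filter, lexer, section group, and so on).
    SELECT ixo_class, ixo_object
      FROM ctx_user_index_objects
     WHERE ixo_index_name = 'SOURCE_CODE_SEARCH_IDX1'
     ORDER BY ixo_class;

    -- Drop the old index so it can be recreated with the
    -- explicit section group in its PARAMETERS string.
    DROP INDEX BLACKCODE.SOURCE_CODE_SEARCH_IDX1;
    ```

    Running the first query before the drop and again after recreating the index should show whether the section group entry changed, which would explain why the tag-like first line is handled differently.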


This discussion has been closed.