This discussion is archived
6 Replies Latest reply: Sep 12, 2013 2:41 PM by chris227 RSS

Replacing ctxcat with context

chris227 Guru
Currently Being Moderated

Hi,

 

in Re: Not like operator not working while matching text from two tables

i proposed to consider the usage of oracle text.

I posted an example with CTXCAT.

But i am a little bit outdated regarding oracle text, but i had in my mind that fabulous Barbara Böhmer wrote on one occasion, that there is little reason to use cxtcat anymore.

 

So due to the fact that the following returns ORA-20000: Oracle Text error: DRG-10849: catsearch does not support functional invocation on 11.2.0.3

 

select

*

from child c, parent p

where

c.id=p.id

and

not catsearch(p.names,c.name,null) > 0

 

i tried to use context-index instead. But i didnt get the desirerd result

 

create table child as

(select 1 id, 'Genentech'  as name from dual union all

select 2 id, 'Altana Pharma AG'  as name from dual union all

select 3 id, 'Yamanouchi'  as name from dual union all

select 4 id, 'Sigma-Tau'  as name from dual union all

select 5 id, 'Schering-Plough'  as name  from dual union all

select 6 id, 'Pharma AG'  as name from dual union all

select 7 id, 'Pfizer'  as name  from dual

)

 

create table  parent as

(select 1 id, 'Genentech number'  as names from dual union all

select 2 id, 'Altana Pharma AG'  as names from dual union all

select 3 id, 'AG site/Yamanouchi'  as names from dual union all

select 4 id, 'sigMa Tau'  as names from dual union all

select 5 id, 'Schering-Plough'  as names  from dual union all

select 6 id, 'AG'  as names from dual union all

select 7 id, 'Inc'  as names  from dual

)

 

CREATE INDEX idx_parent_name_cat ON parent (names)

   INDEXTYPE IS CTXSYS.context

 

select

*

from child c, parent p

where

c.id=p.id

and

not contains(p.names,c.name) > 0

 

IDNAMEIDNAMES
4Sigma-Tau4sigMa Tau
5Schering-Plough5Schering-Plough
6Pharma AG6AG
7Pfizer7Inc

 

I feel that i should define AG as a stopword. But what about 4+5. Should i define then as print/skipjoins?

For example when Sigma-Tau is stored as SigmaTau now, dont this loose the possibility to search for the token sigma only?

 

Regards

  • 1. Re: Replacing ctxcat with context
    Barbara Boehmer Oracle ACE
    Currently Being Moderated

    By default, Oracle Text will convert the values in parent.names to upper case and consider the hyphen and slash as break characters.  So, you need to convert the values of child.name to upper case and replace the hyphen and slash with spaces to compare them.  Please see the demonstration below.  The following assumes that you want id 6 and 7 returned, as in the originally stated desired result set in the other post.  The original poster's requirements were a little fuzzy and seemed to change.

     

    SCOTT@orcl12c> create table child as

      2    (select 1 id, 'Genentech'  as name from dual union all

      3      select 2 id, 'Altana Pharma AG'  as name from dual union all

      4      select 3 id, 'Yamanouchi'  as name from dual union all

      5      select 4 id, 'Sigma-Tau'  as name from dual union all

      6      select 5 id, 'Schering-Plough'    as name  from dual union all

      7      select 6 id, 'Pharma AG'  as name from dual union all

      8      select 7 id, 'Pfizer'  as name    from dual)

      9  /

     

    Table created.

     

    SCOTT@orcl12c> create table  parent as

      2    (select 1 id, 'Genentech number'  as names from dual union all

      3      select 2 id, 'Altana Pharma AG'  as names from dual union all

      4      select 3 id, 'AG site/Yamanouchi'  as names from dual union all

      5      select 4 id, 'sigMa Tau'  as names from dual union all

      6      select 5 id, 'Schering-Plough'    as names  from dual union all

      7      select 6 id, 'AG'  as names from dual union all

      8      select 7 id, 'Inc'  as names  from dual)

      9  /

     

    Table created.

     

    SCOTT@orcl12c> CREATE INDEX idx_parent_name_cat ON parent (names)

      2      INDEXTYPE IS CTXSYS.context

      3  /

     

    Index created.

     

    SCOTT@orcl12c> SELECT token_text FROM dr$idx_parent_name_cat$i

      2  /

     

    TOKEN_TEXT

    ----------------------------------------------------------------

    AG

    ALTANA

    GENENTECH

    INC

    NUMBER

    PHARMA

    PLOUGH

    SCHERING

    SIGMA

    SITE

    TAU

    YAMANOUCHI

     

    12 rows selected.

     

    SCOTT@orcl12c> select *

      2  from   child c, parent p

      3  where  c.id = p.id

      4  and    not contains (p.names, TRANSLATE (UPPER (c.name), '-/', '  ')) > 0

      5  /

     

            ID NAME                     ID NAMES

    ---------- ---------------- ---------- ------------------

             6 Pharma AG                 6 AG

             7 Pfizer                    7 Inc

     

    2 rows selected.

  • 2. Re: Replacing ctxcat with context
    chris227 Guru
    Currently Being Moderated

    Thank you for your answer.

    I dont get it yet. If i have to do upper and translate anyway i dont really need an text index i guess.

     

    So i define the hyphen as skipjoin, add an non-machting uppper/lower-case, and alter the contains by the usage of {}.

    The result is somewhat better, but i guess i still prefer the ctxcat-index for this.

    To be honest one must say, that matching the two words "sigma tau" with the single word 'sigma-tau' might not be the sense of context-index usage.

     

    delete from child where id = 1
    insert into child values(1, 'GENENTECH')

    drop INDEX idx_parent_name_cat

    begin
       ctx_ddl.create_preference('parent_lexer', 'BASIC_LEXER');
       ctx_ddl.set_attribute('parent_lexer', 'skipjoins', '-');
    end;

    CREATE INDEX idx_parent_name_cat ON parent (names)
       INDEXTYPE IS CTXSYS.context
       parameters ('LEXER parent_lexer')

    select
    p.id,p.names,
    c.id,c.name
    from child c, parent p
    where
    c.id=p.id
    and
    not contains(p.names,'{'||c.name||'}') > 0

     

    IDNAMESIDNAME
    4sigMa Tau4Sigma-Tau
    6AG6Pharma AG
    7Inc7Pfizer

     

    regards

  • 3. Re: Replacing ctxcat with context
    Barbara Boehmer Oracle ACE
    Currently Being Moderated

    I guess it all depends on whether you want rows like id 4 included in your result set or not.  In my prior example, the UPPER was unnecessary.  All that was needed was the TRANSLATE.  My method removes the hyphen and slash from the child table to match the parent table, where Oracle Text removes them by default.  Your method preserves the hyphen and slash in the parent table, leaving them in the child table, so some rows match and some do not.

  • 4. Re: Replacing ctxcat with context
    chris227 Guru
    Currently Being Moderated

    Thank you for your answer.

     

    BarbaraBoehmer wrote:

     

    Your method preserves the hyphen and slash in the parent table, leaving them in the child table

     

    As i understand it's more like, when indexing the hyphen is stripped from the word as it is for the searchword when searched for.

    Because it would not be very clever to have the space as skipjoin, the match "sigma tau" vs. "sigma-tau" will never be possible.

    So the conlusion for me is, the context-index isnt a adequate replacement for the ctxcat-index, in this case.

     

    select token_text

    from

    dr$idx_parent_name_cat$i

     

    TOKEN_TEXT
    AG
    ALTANA
    GENENTECH
    NUMBER
    PHARMA
    SCHERINGPLOUGH
    SIGMA
    SITE
    TAU
    YAMANOUCHI
  • 5. Re: Replacing ctxcat with context
    Barbara Boehmer Oracle ACE
    Currently Being Moderated

    I think you are confused.  Both the ctxcat index and the context produce the same tokens by default.  If you use the same lexer with both types of index, then both return the same tokens.  By default, both context and ctxcat consider the hyphen as a break character and tokenize "word1-word2" separately as "word1" and "word2".  If you set the hyphen as a printjoin, then both context and ctxcat will tokenize it as one token "word1-word2" including the hyphen.  If you use skipjoin, it does not replace the hyphen with a space, but just removes it, producing one token of "word1word2".  I think this may be what was confusing you.  Please see the demonstration below of context and ctxcat indexes with and without a lexer setting the hyphen as a printjoin and see that the tokens and results are the same.  The big difference with ctxcat is that I had to use pl/sql and loops to avoid the problem with the optimizer attempting to invoke functional invocation.  Note that it is the parent table, not the child table that is tokenized.  There is no hyphen in "Sigma Tau" of id 4 in the parent table.  So, if you do not remove the hyphen in "Sigma-Tau" of id 4 in the child table, using translate, then the two records do not match.  Using a lexer in the index on the parent table has no effect on the values in the child table.  I hope this is clearer now.

     

     

    SCOTT@orcl12c> -- tables:

    SCOTT@orcl12c> create table child as

      2    (select 1 id, 'Genentech' as name from dual union all

      3      select 2 id, 'Altana Pharma AG'  as name from dual union all

      4      select 3 id, 'Yamanouchi'  as name from dual union all

      5      select 4 id, 'Sigma-Tau'  as name from dual union all

      6      select 5 id, 'Schering-Plough'    as name  from dual union all

      7      select 6 id, 'Pharma AG'  as name from dual union all

      8      select 7 id, 'Pfizer'  as name    from dual)

      9  /

     

    Table created.

     

    SCOTT@orcl12c> create table  parent as

      2    (select 1 id, 'Genentech' as names from dual union all

      3      select 2 id, 'Altana Pharma AG'  as names from dual union all

      4      select 3 id, 'AG site/Yamanouchi'  as names from dual union all

      5      select 4 id, 'sigMa Tau'  as names from dual union all

      6      select 5 id, 'Schering-Plough'    as names  from dual union all

      7      select 6 id, 'AG'  as names from dual union all

      8      select 7 id, 'Inc'  as names  from dual)

      9  /

     

    Table created.

     

    SCOTT@orcl12c> -- indexes on parent table without lexer:

    SCOTT@orcl12c> CREATE INDEX context_index ON parent (names) INDEXTYPE IS CTXSYS.context

      2  /

     

    Index created.

     

    SCOTT@orcl12c> SELECT token_text FROM dr$context_index$i

      2  /

     

    TOKEN_TEXT

    ----------------------------------------------------------------

    AG

    ALTANA

    GENENTECH

    INC

    PHARMA

    PLOUGH

    SCHERING

    SIGMA

    SITE

    TAU

    YAMANOUCHI

     

    11 rows selected.

     

    SCOTT@orcl12c> CREATE INDEX ctxcat_index ON parent (names) INDEXTYPE IS CTXSYS.ctxcat

      2  /

     

    Index created.

     

    SCOTT@orcl12c> SELECT dr$token FROM dr$ctxcat_index$i

      2  /

     

    DR$TOKEN

    ----------------------------------------------------------------

    AG

    AG

    AG

    ALTANA

    GENENTECH

    INC

    PHARMA

    PLOUGH

    SCHERING

    SIGMA

    SITE

    TAU

    YAMANOUCHI

     

    13 rows selected.

     

    SCOTT@orcl12c> -- queries:

    SCOTT@orcl12c> select *

      2  from   parent p, child c

      3  where  not contains (p.names, '{' || translate (c.name, '-/', '  ') || '}') > 0

      4  and    c.id = p.id

      5  /

     

            ID NAMES                      ID NAME

    ---------- ------------------ ---------- ----------------

             6 AG                          6 Pharma AG

             7 Inc                         7 Pfizer

     

    2 rows selected.

     

    SCOTT@orcl12c> set serveroutput on format wrapped

    SCOTT@orcl12c> declare

      2    v_count number;

      3  begin

      4    dbms_output.put_line ('          ID NAMES                ID NAME');

      5    dbms_output.put_line ('---------- ------------------ ---------- ----------------');

      6    for p in

      7       (select id, names from parent)

      8    loop

      9       for c in

    10         (select id, name from child where id = p.id)

    11       loop

    12         select count(*) into v_count from parent

    13         where  catsearch (names, '{' || translate (c.name, '-/', '  ') || '}', null) <= 0

    14         and      id = c.id;

    15         if v_count = 0 then

    16           dbms_output.put_line

    17             (lpad (p.id, 10) || ' ' ||

    18          rpad (p.names, 18) ||

    19          lpad (c.id, 11) || ' ' ||

    20          c.name);

    21         end if;

    22       end loop;

    23    end loop;

    24  end;

    25  /

            ID NAMES                      ID NAME

    ---------- ------------------ ---------- ----------------

             6 AG                          6 Pharma AG

             7 Inc                         7 Pfizer

     

    PL/SQL procedure successfully completed.

     

    SCOTT@orcl12c> -- indexes on parent table with lexer:

    SCOTT@orcl12c> DROP INDEX context_index

      2  /

     

    Index dropped.

     

    SCOTT@orcl12c> DROP INDEX ctxcat_index

      2  /

     

    Index dropped.

     

    SCOTT@orcl12c> begin

      2      ctx_ddl.create_preference('parent_lexer', 'BASIC_LEXER');

      3      ctx_ddl.set_attribute('parent_lexer', 'PRINTJOINS', '-');

      4  end;

      5  /

     

    PL/SQL procedure successfully completed.

     

    SCOTT@orcl12c> CREATE INDEX context_index ON parent (names) INDEXTYPE IS CTXSYS.context

      2  PARAMETERS ('LEXER parent_lexer')

      3  /

     

    Index created.

     

    SCOTT@orcl12c> SELECT token_text FROM dr$context_index$i

      2  /

     

    TOKEN_TEXT

    ----------------------------------------------------------------

    AG

    ALTANA

    GENENTECH

    INC

    PHARMA

    SCHERING-PLOUGH

    SIGMA

    SITE

    TAU

    YAMANOUCHI

     

    10 rows selected.

     

    SCOTT@orcl12c> CREATE INDEX ctxcat_index ON parent (names) INDEXTYPE IS CTXSYS.ctxcat

      2  PARAMETERS ('LEXER parent_lexer')

      3  /

     

    Index created.

     

    SCOTT@orcl12c> SELECT dr$token FROM dr$ctxcat_index$i

      2  /

     

    DR$TOKEN

    ----------------------------------------------------------------

    AG

    AG

    AG

    ALTANA

    GENENTECH

    INC

    PHARMA

    SCHERING-PLOUGH

    SIGMA

    SITE

    TAU

    YAMANOUCHI

     

    12 rows selected.

     

    SCOTT@orcl12c> -- queries:

    SCOTT@orcl12c> select *

      2  from   parent p, child c

      3  where  not contains (p.names, '{' || c.name || '}') > 0

      4  and    c.id = p.id

      5  /

     

            ID NAMES                      ID NAME

    ---------- ------------------ ---------- ----------------

             4 sigMa Tau                   4 Sigma-Tau

             6 AG                          6 Pharma AG

             7 Inc                         7 Pfizer

     

    3 rows selected.

     

    SCOTT@orcl12c> set serveroutput on format wrapped

    SCOTT@orcl12c> declare

      2    v_count number;

      3  begin

      4    dbms_output.put_line ('          ID NAMES                ID NAME');

      5    dbms_output.put_line ('---------- ------------------ ---------- ----------------');

      6    for p in

      7       (select id, names from parent)

      8    loop

      9       for c in

    10         (select id, name from child where id = p.id)

    11       loop

    12         select count(*) into v_count from parent

    13         where  catsearch (names, '{' || c.name || '}', null) <= 0

    14         and      id = c.id;

    15         if v_count = 0 then

    16           dbms_output.put_line

    17             (lpad (p.id, 10) || ' ' ||

    18          rpad (p.names, 18) ||

    19          lpad (c.id, 11) || ' ' ||

    20          c.name);

    21         end if;

    22       end loop;

    23    end loop;

    24  end;

    25  /

            ID NAMES                      ID NAME

    ---------- ------------------ ---------- ----------------

             4 sigMa Tau                   4 Sigma-Tau

             6 AG                          6 Pharma AG

             7 Inc                         7 Pfizer

     

    PL/SQL procedure successfully completed.

  • 6. Re: Replacing ctxcat with context
    chris227 Guru
    Currently Being Moderated

    Thanks Barbara for the effort you spent in this thread. Based on my observations over the last years i have to say that you are one of the most awesome people i know at OTN, sophistcated skills without any vanity which makes it only more remarkable.

     

    On the issue:

    I wasnt confused by the functionality of skipjoins, i think i did understand what this does.

    I totally had forgotten about the usage of {} and that the hyphen is a special (minus) character, think i had not used this since over ten years.

     

    So for both "plain" indexes i now got the same results, without using translate (and i stop here since i surely get confused if i keep thinking on this anymore ...)

     

    select

    c.id cid, c.name cname, p.id pid, p.names pnames

    from   parent p join child c

    on    c.id = p.id

    and

    not contains (p.names, '{' || c.name || '}') > 0

     

    CIDCNAMEPIDPNAMES
    6Pharma AG6AG
    7Pfizer7Inc

     

    select

    c.id cid, c.name cname, p.id pid, p.names pnames

    from   parent p join child c

    on    c.id = p.id

    and

    not exists

    (select 1 from parent p

    where

    c.id=p.id

    and

    catsearch(p.names,'{'||c.name||'}',null) > 0

    )

     

     

    CIDCNAMEPIDPNAMES
    6Pharma AG6AG
    7Pfizer7Inc

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points