6 Replies Latest reply: Sep 12, 2013 4:41 PM by chris227 RSS

    Replacing ctxcat with context

    chris227

      Hi,

       

      in Re: Not like operator not working while matching text from two tables

      i proposed to consider the usage of oracle text.

      I posted an example with CTXCAT.

      But i am a little bit outdated regarding oracle text, but i had in my mind that fabulous Barbara Böhmer wrote on one occasion, that there is little reason to use cxtcat anymore.

       

      So due to the fact that the following returns ORA-20000: Oracle Text error: DRG-10849: catsearch does not support functional invocation on 11.2.0.3

       

      select

      *

      from child c, parent p

      where

      c.id=p.id

      and

      not catsearch(p.names,c.name,null) > 0

       

      i tried to use context-index instead. But i didnt get the desirerd result

       

      create table child as

      (select 1 id, 'Genentech'  as name from dual union all

      select 2 id, 'Altana Pharma AG'  as name from dual union all

      select 3 id, 'Yamanouchi'  as name from dual union all

      select 4 id, 'Sigma-Tau'  as name from dual union all

      select 5 id, 'Schering-Plough'  as name  from dual union all

      select 6 id, 'Pharma AG'  as name from dual union all

      select 7 id, 'Pfizer'  as name  from dual

      )

       

      create table  parent as

      (select 1 id, 'Genentech number'  as names from dual union all

      select 2 id, 'Altana Pharma AG'  as names from dual union all

      select 3 id, 'AG site/Yamanouchi'  as names from dual union all

      select 4 id, 'sigMa Tau'  as names from dual union all

      select 5 id, 'Schering-Plough'  as names  from dual union all

      select 6 id, 'AG'  as names from dual union all

      select 7 id, 'Inc'  as names  from dual

      )

       

      CREATE INDEX idx_parent_name_cat ON parent (names)

         INDEXTYPE IS CTXSYS.context

       

      select

      *

      from child c, parent p

      where

      c.id=p.id

      and

      not contains(p.names,c.name) > 0

       

      IDNAMEIDNAMES
      4Sigma-Tau4sigMa Tau
      5Schering-Plough5Schering-Plough
      6Pharma AG6AG
      7Pfizer7Inc

       

      I feel that i should define AG as a stopword. But what about 4+5. Should i define then as print/skipjoins?

      For example when Sigma-Tau is stored as SigmaTau now, dont this loose the possibility to search for the token sigma only?

       

      Regards

        • 1. Re: Replacing ctxcat with context
          Barbara Boehmer

          By default, Oracle Text will convert the values in parent.names to upper case and consider the hyphen and slash as break characters.  So, you need to convert the values of child.name to upper case and replace the hyphen and slash with spaces to compare them.  Please see the demonstration below.  The following assumes that you want id 6 and 7 returned, as in the originally stated desired result set in the other post.  The original poster's requirements were a little fuzzy and seemed to change.

           

          SCOTT@orcl12c> create table child as

            2    (select 1 id, 'Genentech'  as name from dual union all

            3      select 2 id, 'Altana Pharma AG'  as name from dual union all

            4      select 3 id, 'Yamanouchi'  as name from dual union all

            5      select 4 id, 'Sigma-Tau'  as name from dual union all

            6      select 5 id, 'Schering-Plough'    as name  from dual union all

            7      select 6 id, 'Pharma AG'  as name from dual union all

            8      select 7 id, 'Pfizer'  as name    from dual)

            9  /

           

          Table created.

           

          SCOTT@orcl12c> create table  parent as

            2    (select 1 id, 'Genentech number'  as names from dual union all

            3      select 2 id, 'Altana Pharma AG'  as names from dual union all

            4      select 3 id, 'AG site/Yamanouchi'  as names from dual union all

            5      select 4 id, 'sigMa Tau'  as names from dual union all

            6      select 5 id, 'Schering-Plough'    as names  from dual union all

            7      select 6 id, 'AG'  as names from dual union all

            8      select 7 id, 'Inc'  as names  from dual)

            9  /

           

          Table created.

           

          SCOTT@orcl12c> CREATE INDEX idx_parent_name_cat ON parent (names)

            2      INDEXTYPE IS CTXSYS.context

            3  /

           

          Index created.

           

          SCOTT@orcl12c> SELECT token_text FROM dr$idx_parent_name_cat$i

            2  /

           

          TOKEN_TEXT

          ----------------------------------------------------------------

          AG

          ALTANA

          GENENTECH

          INC

          NUMBER

          PHARMA

          PLOUGH

          SCHERING

          SIGMA

          SITE

          TAU

          YAMANOUCHI

           

          12 rows selected.

           

          SCOTT@orcl12c> select *

            2  from   child c, parent p

            3  where  c.id = p.id

            4  and    not contains (p.names, TRANSLATE (UPPER (c.name), '-/', '  ')) > 0

            5  /

           

                  ID NAME                     ID NAMES

          ---------- ---------------- ---------- ------------------

                   6 Pharma AG                 6 AG

                   7 Pfizer                    7 Inc

           

          2 rows selected.

          • 2. Re: Replacing ctxcat with context
            chris227

            Thank you for your answer.

            I dont get it yet. If i have to do upper and translate anyway i dont really need an text index i guess.

             

            So i define the hyphen as skipjoin, add an non-machting uppper/lower-case, and alter the contains by the usage of {}.

            The result is somewhat better, but i guess i still prefer the ctxcat-index for this.

            To be honest one must say, that matching the two words "sigma tau" with the single word 'sigma-tau' might not be the sense of context-index usage.

             

            delete from child where id = 1
            insert into child values(1, 'GENENTECH')

            drop INDEX idx_parent_name_cat

            begin
               ctx_ddl.create_preference('parent_lexer', 'BASIC_LEXER');
               ctx_ddl.set_attribute('parent_lexer', 'skipjoins', '-');
            end;

            CREATE INDEX idx_parent_name_cat ON parent (names)
               INDEXTYPE IS CTXSYS.context
               parameters ('LEXER parent_lexer')

            select
            p.id,p.names,
            c.id,c.name
            from child c, parent p
            where
            c.id=p.id
            and
            not contains(p.names,'{'||c.name||'}') > 0

             

            IDNAMESIDNAME
            4sigMa Tau4Sigma-Tau
            6AG6Pharma AG
            7Inc7Pfizer

             

            regards

            • 3. Re: Replacing ctxcat with context
              Barbara Boehmer

              I guess it all depends on whether you want rows like id 4 included in your result set or not.  In my prior example, the UPPER was unnecessary.  All that was needed was the TRANSLATE.  My method removes the hyphen and slash from the child table to match the parent table, where Oracle Text removes them by default.  Your method preserves the hyphen and slash in the parent table, leaving them in the child table, so some rows match and some do not.

              • 4. Re: Replacing ctxcat with context
                chris227

                Thank you for your answer.

                 

                BarbaraBoehmer wrote:

                 

                Your method preserves the hyphen and slash in the parent table, leaving them in the child table

                 

                As i understand it's more like, when indexing the hyphen is stripped from the word as it is for the searchword when searched for.

                Because it would not be very clever to have the space as skipjoin, the match "sigma tau" vs. "sigma-tau" will never be possible.

                So the conlusion for me is, the context-index isnt a adequate replacement for the ctxcat-index, in this case.

                 

                select token_text

                from

                dr$idx_parent_name_cat$i

                 

                TOKEN_TEXT
                AG
                ALTANA
                GENENTECH
                NUMBER
                PHARMA
                SCHERINGPLOUGH
                SIGMA
                SITE
                TAU
                YAMANOUCHI
                • 5. Re: Replacing ctxcat with context
                  Barbara Boehmer

                  I think you are confused.  Both the ctxcat index and the context produce the same tokens by default.  If you use the same lexer with both types of index, then both return the same tokens.  By default, both context and ctxcat consider the hyphen as a break character and tokenize "word1-word2" separately as "word1" and "word2".  If you set the hyphen as a printjoin, then both context and ctxcat will tokenize it as one token "word1-word2" including the hyphen.  If you use skipjoin, it does not replace the hyphen with a space, but just removes it, producing one token of "word1word2".  I think this may be what was confusing you.  Please see the demonstration below of context and ctxcat indexes with and without a lexer setting the hyphen as a printjoin and see that the tokens and results are the same.  The big difference with ctxcat is that I had to use pl/sql and loops to avoid the problem with the optimizer attempting to invoke functional invocation.  Note that it is the parent table, not the child table that is tokenized.  There is no hyphen in "Sigma Tau" of id 4 in the parent table.  So, if you do not remove the hyphen in "Sigma-Tau" of id 4 in the child table, using translate, then the two records do not match.  Using a lexer in the index on the parent table has no effect on the values in the child table.  I hope this is clearer now.

                   

                   

                  SCOTT@orcl12c> -- tables:

                  SCOTT@orcl12c> create table child as

                    2    (select 1 id, 'Genentech' as name from dual union all

                    3      select 2 id, 'Altana Pharma AG'  as name from dual union all

                    4      select 3 id, 'Yamanouchi'  as name from dual union all

                    5      select 4 id, 'Sigma-Tau'  as name from dual union all

                    6      select 5 id, 'Schering-Plough'    as name  from dual union all

                    7      select 6 id, 'Pharma AG'  as name from dual union all

                    8      select 7 id, 'Pfizer'  as name    from dual)

                    9  /

                   

                  Table created.

                   

                  SCOTT@orcl12c> create table  parent as

                    2    (select 1 id, 'Genentech' as names from dual union all

                    3      select 2 id, 'Altana Pharma AG'  as names from dual union all

                    4      select 3 id, 'AG site/Yamanouchi'  as names from dual union all

                    5      select 4 id, 'sigMa Tau'  as names from dual union all

                    6      select 5 id, 'Schering-Plough'    as names  from dual union all

                    7      select 6 id, 'AG'  as names from dual union all

                    8      select 7 id, 'Inc'  as names  from dual)

                    9  /

                   

                  Table created.

                   

                  SCOTT@orcl12c> -- indexes on parent table without lexer:

                  SCOTT@orcl12c> CREATE INDEX context_index ON parent (names) INDEXTYPE IS CTXSYS.context

                    2  /

                   

                  Index created.

                   

                  SCOTT@orcl12c> SELECT token_text FROM dr$context_index$i

                    2  /

                   

                  TOKEN_TEXT

                  ----------------------------------------------------------------

                  AG

                  ALTANA

                  GENENTECH

                  INC

                  PHARMA

                  PLOUGH

                  SCHERING

                  SIGMA

                  SITE

                  TAU

                  YAMANOUCHI

                   

                  11 rows selected.

                   

                  SCOTT@orcl12c> CREATE INDEX ctxcat_index ON parent (names) INDEXTYPE IS CTXSYS.ctxcat

                    2  /

                   

                  Index created.

                   

                  SCOTT@orcl12c> SELECT dr$token FROM dr$ctxcat_index$i

                    2  /

                   

                  DR$TOKEN

                  ----------------------------------------------------------------

                  AG

                  AG

                  AG

                  ALTANA

                  GENENTECH

                  INC

                  PHARMA

                  PLOUGH

                  SCHERING

                  SIGMA

                  SITE

                  TAU

                  YAMANOUCHI

                   

                  13 rows selected.

                   

                  SCOTT@orcl12c> -- queries:

                  SCOTT@orcl12c> select *

                    2  from   parent p, child c

                    3  where  not contains (p.names, '{' || translate (c.name, '-/', '  ') || '}') > 0

                    4  and    c.id = p.id

                    5  /

                   

                          ID NAMES                      ID NAME

                  ---------- ------------------ ---------- ----------------

                           6 AG                          6 Pharma AG

                           7 Inc                         7 Pfizer

                   

                  2 rows selected.

                   

                  SCOTT@orcl12c> set serveroutput on format wrapped

                  SCOTT@orcl12c> declare

                    2    v_count number;

                    3  begin

                    4    dbms_output.put_line ('          ID NAMES                ID NAME');

                    5    dbms_output.put_line ('---------- ------------------ ---------- ----------------');

                    6    for p in

                    7       (select id, names from parent)

                    8    loop

                    9       for c in

                  10         (select id, name from child where id = p.id)

                  11       loop

                  12         select count(*) into v_count from parent

                  13         where  catsearch (names, '{' || translate (c.name, '-/', '  ') || '}', null) <= 0

                  14         and      id = c.id;

                  15         if v_count = 0 then

                  16           dbms_output.put_line

                  17             (lpad (p.id, 10) || ' ' ||

                  18          rpad (p.names, 18) ||

                  19          lpad (c.id, 11) || ' ' ||

                  20          c.name);

                  21         end if;

                  22       end loop;

                  23    end loop;

                  24  end;

                  25  /

                          ID NAMES                      ID NAME

                  ---------- ------------------ ---------- ----------------

                           6 AG                          6 Pharma AG

                           7 Inc                         7 Pfizer

                   

                  PL/SQL procedure successfully completed.

                   

                  SCOTT@orcl12c> -- indexes on parent table with lexer:

                  SCOTT@orcl12c> DROP INDEX context_index

                    2  /

                   

                  Index dropped.

                   

                  SCOTT@orcl12c> DROP INDEX ctxcat_index

                    2  /

                   

                  Index dropped.

                   

                  SCOTT@orcl12c> begin

                    2      ctx_ddl.create_preference('parent_lexer', 'BASIC_LEXER');

                    3      ctx_ddl.set_attribute('parent_lexer', 'PRINTJOINS', '-');

                    4  end;

                    5  /

                   

                  PL/SQL procedure successfully completed.

                   

                  SCOTT@orcl12c> CREATE INDEX context_index ON parent (names) INDEXTYPE IS CTXSYS.context

                    2  PARAMETERS ('LEXER parent_lexer')

                    3  /

                   

                  Index created.

                   

                  SCOTT@orcl12c> SELECT token_text FROM dr$context_index$i

                    2  /

                   

                  TOKEN_TEXT

                  ----------------------------------------------------------------

                  AG

                  ALTANA

                  GENENTECH

                  INC

                  PHARMA

                  SCHERING-PLOUGH

                  SIGMA

                  SITE

                  TAU

                  YAMANOUCHI

                   

                  10 rows selected.

                   

                  SCOTT@orcl12c> CREATE INDEX ctxcat_index ON parent (names) INDEXTYPE IS CTXSYS.ctxcat

                    2  PARAMETERS ('LEXER parent_lexer')

                    3  /

                   

                  Index created.

                   

                  SCOTT@orcl12c> SELECT dr$token FROM dr$ctxcat_index$i

                    2  /

                   

                  DR$TOKEN

                  ----------------------------------------------------------------

                  AG

                  AG

                  AG

                  ALTANA

                  GENENTECH

                  INC

                  PHARMA

                  SCHERING-PLOUGH

                  SIGMA

                  SITE

                  TAU

                  YAMANOUCHI

                   

                  12 rows selected.

                   

                  SCOTT@orcl12c> -- queries:

                  SCOTT@orcl12c> select *

                    2  from   parent p, child c

                    3  where  not contains (p.names, '{' || c.name || '}') > 0

                    4  and    c.id = p.id

                    5  /

                   

                          ID NAMES                      ID NAME

                  ---------- ------------------ ---------- ----------------

                           4 sigMa Tau                   4 Sigma-Tau

                           6 AG                          6 Pharma AG

                           7 Inc                         7 Pfizer

                   

                  3 rows selected.

                   

                  SCOTT@orcl12c> set serveroutput on format wrapped

                  SCOTT@orcl12c> declare

                    2    v_count number;

                    3  begin

                    4    dbms_output.put_line ('          ID NAMES                ID NAME');

                    5    dbms_output.put_line ('---------- ------------------ ---------- ----------------');

                    6    for p in

                    7       (select id, names from parent)

                    8    loop

                    9       for c in

                  10         (select id, name from child where id = p.id)

                  11       loop

                  12         select count(*) into v_count from parent

                  13         where  catsearch (names, '{' || c.name || '}', null) <= 0

                  14         and      id = c.id;

                  15         if v_count = 0 then

                  16           dbms_output.put_line

                  17             (lpad (p.id, 10) || ' ' ||

                  18          rpad (p.names, 18) ||

                  19          lpad (c.id, 11) || ' ' ||

                  20          c.name);

                  21         end if;

                  22       end loop;

                  23    end loop;

                  24  end;

                  25  /

                          ID NAMES                      ID NAME

                  ---------- ------------------ ---------- ----------------

                           4 sigMa Tau                   4 Sigma-Tau

                           6 AG                          6 Pharma AG

                           7 Inc                         7 Pfizer

                   

                  PL/SQL procedure successfully completed.

                  • 6. Re: Replacing ctxcat with context
                    chris227

                    Thanks Barbara for the effort you spent in this thread. Based on my observations over the last years i have to say that you are one of the most awesome people i know at OTN, sophistcated skills without any vanity which makes it only more remarkable.

                     

                    On the issue:

                    I wasnt confused by the functionality of skipjoins, i think i did understand what this does.

                    I totally had forgotten about the usage of {} and that the hyphen is a special (minus) character, think i had not used this since over ten years.

                     

                    So for both "plain" indexes i now got the same results, without using translate (and i stop here since i surely get confused if i keep thinking on this anymore ...)

                     

                    select

                    c.id cid, c.name cname, p.id pid, p.names pnames

                    from   parent p join child c

                    on    c.id = p.id

                    and

                    not contains (p.names, '{' || c.name || '}') > 0

                     

                    CIDCNAMEPIDPNAMES
                    6Pharma AG6AG
                    7Pfizer7Inc

                     

                    select

                    c.id cid, c.name cname, p.id pid, p.names pnames

                    from   parent p join child c

                    on    c.id = p.id

                    and

                    not exists

                    (select 1 from parent p

                    where

                    c.id=p.id

                    and

                    catsearch(p.names,'{'||c.name||'}',null) > 0

                    )

                     

                     

                    CIDCNAMEPIDPNAMES
                    6Pharma AG6AG
                    7Pfizer7Inc