1 2 Previous Next 16 Replies Latest reply: Jul 21, 2014 1:02 PM by spur230 RSS

    Extend ctx_thes.syn to return  formatted synonyms

    spur230

      I am using Oracle 11.2.0.3. We have oracle context text index and have implemented thesaurus on selected few words.

       

      Let say:

      select ctx_thes.syn('nathan') from dual;
      

       

      gives me

      {nathan}| {nate}

       

      However, I want   it to return nathan%| nate%  without  curly bracket and adding wildcard character.

       

      Since , oracle text index is extensible indexing framework, is it possible to change ctx_thes.syn  to meet my requirement or should I write my custom code. If  I need to write a custom code, which table should I be looking into.

       

      Thanks.

        • 1. Re: Extend ctx_thes.syn to return  formatted synonyms
          Barbara Boehmer

          SCOTT@orcl12c> begin

            2    ctx_thes.create_thesaurus ('test_thes');

            3    ctx_thes.create_relation ('test_thes', 'nathan', 'SYN', 'nate');

            4  end;

            5  /

           

          PL/SQL procedure successfully completed.

           

          SCOTT@orcl12c> select ctx_thes.syn ('nathan', 'test_thes')

            2  from   dual

            3  /

           

          CTX_THES.SYN('NATHAN','TEST_THES')

          --------------------------------------------------------------------------------

          {NATHAN}|{NATE}

           

          1 row selected.

           

          SCOTT@orcl12c> select replace

            2            (replace

            3          (replace

            4            (ctx_thes.syn ('nathan', 'test_thes'),

            5             '{', ''),

            6           '}', ''),

            7             '|', '%|')

            8          || '%'

            9  from   dual

          10  /

           

          REPLACE(REPLACE(REPLACE(CTX_THES.SYN('NATHAN','TEST_THES'),'{',''),'}',''),'|','

          --------------------------------------------------------------------------------

          NATHAN%|NATE%

           

          1 row selected.

           

          SCOTT@orcl12c>

          • 2. Re: Extend ctx_thes.syn to return  formatted synonyms
            spur230

            For our name search, we have  custom set of  words defined as synonyms.  I created  these synonyms utilizing oracle text thesaurus as you are doing above.  However, in our application  formatting the end-user input text ( like to enable prefix search by adding % or making soundex search  by adding ! ) is done in front end. So, to implement synonym search, it looks like we need to  make a additional database  calls of CTX_THES.SYN('each_searched_term' for each search terms to get list of synonyms and do existing formatting we were doing.

             

            I am thinking if it would be possible to use USER_DATASOURCE  to  and use CTX_THES.SYN in the procedure and create datastore having name as well as its synonyms?

             

            My other option would be to move formatting  from frontend to database but this involves lot of change.  Which one would you recommend ?

             

            Thanks for your time.

            • 3. Re: Extend ctx_thes.syn to return  formatted synonyms
              Barbara Boehmer

              If you want to allow your users to specify whether they want to use soundex and/or wildcards or not, but you always want to search for synonyms, then I think it is more practical to index the synonyms, using a procedure and a user_datastore, as demonstrated below.

               

               

              SCOTT@orcl12c> begin

                2    ctx_thes.create_thesaurus ('test_thes');

                3    ctx_thes.create_relation ('test_thes', 'nathan', 'SYN', 'nate');

                4  end;

                5  /

               

              PL/SQL procedure successfully completed.

               

              SCOTT@orcl12c> create table test_tab

                2    (id      number,

                3      test_col  varchar2(30))

                4  /

               

              Table created.

               

              SCOTT@orcl12c> insert all

                2  into test_tab values (1, 'nate')

                3  into test_tab values (2, 'nathaniel')

                4  into test_tab values (3, 'nathin')

                5  into test_tab values (4, 'nobody')

                6  into test_tab values (5, 'Nathan Myhrvold')

                7  into test_tab values (6, 'Nat King Cole')

                8  select * from dual

                9  /

               

              6 rows created.

               

              SCOTT@orcl12c> create or replace procedure test_proc

                2    (p_rid  in         rowid,

                3      p_clob in out nocopy clob)

                4  as

                5  begin

                6    for r1 in

                7       (select ctx_thes.syn

                8             (trim (regexp_substr (test_col, '[^ ]+', 1, rownum)),

                9              'test_thes') token

              10        from     (select test_col from test_tab where rowid = p_rid)

              11        connect by level <= regexp_count (test_col, ' ') + 1)

              12    loop

              13       dbms_lob.writeappend (p_clob, length (r1.token), r1.token);

              14    end loop;

              15  end test_proc;

              16  /

               

              Procedure created.

               

              SCOTT@orcl12c> show errors

              No errors.

              SCOTT@orcl12c> begin

                2    ctx_ddl.create_preference ('test_ds', 'user_datastore');

                3    ctx_ddl.set_attribute ('test_ds', 'procedure', 'test_proc');

                4  end;

                5  /

               

              PL/SQL procedure successfully completed.

               

              SCOTT@orcl12c> create index test_idx

                2  on test_tab (test_col)

                3  indextype is ctxsys.context

                4  parameters ('datastore  test_ds')

                5  /

               

              Index created.

               

              SCOTT@orcl12c> select token_text from dr$test_idx$i

                2  /

               

              TOKEN_TEXT

              ----------------------------------------------------------------

              COLE

              KING

              MYHRVOLD

              NAT

              NATE

              NATHAN

              NATHANIEL

              NATHIN

              NOBODY

               

              9 rows selected.

               

              SCOTT@orcl12c> variable user_input varchar2(100)

              SCOTT@orcl12c> exec :user_input := '!nate%'

               

              PL/SQL procedure successfully completed.

               

              SCOTT@orcl12c> select * from test_tab

                2  where  contains (test_col, :user_input) > 0

                3  /

               

                      ID TEST_COL

              ---------- ------------------------------

                       1 nate

                       5 Nathan Myhrvold

                       6 Nat King Cole

               

              3 rows selected.

               

              SCOTT@orcl12c> exec :user_input := '!nathan%'

               

              PL/SQL procedure successfully completed.

               

              SCOTT@orcl12c> /

               

                      ID TEST_COL

              ---------- ------------------------------

                       1 nate

                       2 nathaniel

                       3 nathin

                       5 Nathan Myhrvold

               

              4 rows selected.

               

              Message was edited by: Barbara Boehmer (made correction to select in procedure)

              • 4. Re: Extend ctx_thes.syn to return  formatted synonyms
                spur230

                Many thanks for your response. I have created synonym for adviser and advisor. However, it is not getting  picked if  organization name is advisors with a 's'  or advisor, (has addition ,).

                 

                Do I need create synonyms for these terms too?

                 

                Also another confusion I am having is , suppose  I create related term for advisory and adviser along with existing synonyms, then do need to call ctx_thes.rt() for all searched terms along with ctx_thes.syn()  that I am calling?

                 

                Isn't it possible to configure Oracle text to automatically do synonym and related term search when they are defined?

                • 5. Re: Extend ctx_thes.syn to return  formatted synonyms
                  spur230

                  Hi Barbara,

                   

                   

                  Regarding the above  test_proc, I am getting lots of NULL and some values are not preset. Is that expected?

                   

                  WITH org

                       AS (SELECT 1 org_pk, 'ALL american  ,INC' org_nm FROM DUAL

                           UNION ALL

                           SELECT 2 org_pk, 'A.G  COMPANY' org_nm FROM DUAL

                           UNION ALL

                           SELECT 3 org_pk, 'GROWTH FUND SPONSORS and SONS, INC.' org_nm FROM DUAL)           

                  SELECT org_pk,  TRIM (REGEXP_SUBSTR (org_nm,

                                                '[^ ]+',

                                                1,

                                                ROWNUM))

                          token from org

                                   connect by level <= regexp_count (org_nm, ' ') + 1

                   

                  Instead of this , I will be using  following to tokenize the string. let me know if this should be ok

                   

                  with org as (

                               select 1 org_pk,'ALL american  ,INC' org_nm from dual union all

                               select 2 org_pk,'A.G  COMPANY' org_nm from dual union all

                               select 3 org_pk,'GROWTH FUND SPONSORS and SONS, INC.' org_nm from dual

                              )

                  select  org_pk,

                          x.org_nm      

                    from  org,

                          xmltable(

                                   'ora:tokenize(.," ")'

                                   passing ' ' || org_nm

                                   columns

                                     org_nm varchar2(4000) path '.'

                                  ) x

                    where x.org_nm is not null

                  • 6. Re: Extend ctx_thes.syn to return  formatted synonyms
                    Barbara Boehmer
                    ...I have created synonym for adviser and advisor. However, it is not getting  picked if  organization name is advisors with a 's'  or advisor, (has addition ,).

                     

                    Do I need create synonyms for these terms too?...

                     

                     

                    No, you do not need synonyms for that.  You can use either stemming or soundex or fuzzy or a wldcard, as shown below.

                     

                    SCOTT@orcl12c> create table test_tab (test_col varchar2(30))

                      2  /

                     

                    Table created.

                     

                    SCOTT@orcl12c> insert into test_tab values ('advisors')

                      2  /

                     

                    1 row created.

                     

                    SCOTT@orcl12c> create index test_idx on test_tab (test_col)

                      2  indextype is ctxsys.context

                      3  /

                     

                    Index created.

                     

                    SCOTT@orcl12c> select * from test_tab where contains (test_col, 'advisor') > 0

                      2  /

                     

                    no rows selected

                     

                    SCOTT@orcl12c> select * from test_tab where contains (test_col, '$advisor') > 0

                      2  /

                     

                    TEST_COL

                    ------------------------------

                    advisors

                     

                    1 row selected.

                     

                    SCOTT@orcl12c> select * from test_tab where contains (test_col, '!advisor') > 0

                      2  /

                     

                    TEST_COL

                    ------------------------------

                    advisors

                     

                    1 row selected.

                     

                    SCOTT@orcl12c> select * from test_tab where contains (test_col, '?advisor') > 0

                      2  /

                     

                    TEST_COL

                    ------------------------------

                    advisors

                     

                    1 row selected.

                     

                    SCOTT@orcl12c> select * from test_tab where contains (test_col, 'advisor%') > 0

                      2  /

                     

                    TEST_COL

                    ------------------------------

                    advisors

                     

                    1 row selected.

                     

                    SCOTT@orcl12c>

                    • 7. Re: Extend ctx_thes.syn to return  formatted synonyms
                      Barbara Boehmer

                      ...suppose  I create related term for advisory and adviser along with existing synonyms, then do need to call ctx_thes.rt() for all searched terms along with ctx_thes.syn()  that I am calling?

                       

                       

                      You can either create it as a synonym, even though it is a related term or create it as a related term and add a call to ctx_thes.rt.

                      • 8. Re: Extend ctx_thes.syn to return  formatted synonyms
                        Barbara Boehmer

                        ...Isn't it possible to configure Oracle text to automatically do synonym and related term search when they are defined?

                         

                        As far as I know, you either have to use syn or ctx_thes.syn in your query or create a function that uses ctx_thes.syn and use that function either in your query or to convert your input prior to passing it to the query or use ctx_thes.syn in a procedure used by a user_datastore.  As far as I know, there is nothing that you can modify in the database, to cause it to apply the syn or ctx_thes.syn function to all text queries.

                        • 9. Re: Extend ctx_thes.syn to return  formatted synonyms
                          Barbara Boehmer

                          Regarding the above  test_proc, I am getting lots of NULL and some values are not preset. Is that expected?...

                           

                           

                          I think you may have copied a version of the procedure before I corrected it.  I added a sub-query that returns one row, instead of using dual.  That eliminates joining to every row in the table.  The other thing that causes null values is if there are double spaces in your data.  Please see the following example that shows what the corrected procedure returns.  Note that if there are triple spaces and such, then you would have to allow for that.  There are many was to tokenize strings and you can use whichever you like.  All of the special characters in the output below are ignored during indexing, so you end up with just the tokenized individual words.

                           

                          SCOTT@orcl12c> begin

                            2    ctx_thes.create_thesaurus ('test_thes');

                            3    ctx_thes.create_relation ('test_thes', 'nathan', 'SYN', 'nate');

                            4  end;

                            5  /

                           

                          PL/SQL procedure successfully completed.

                           

                          SCOTT@orcl12c> create table org

                            2        AS (SELECT 1 org_pk, 'ALL american  ,INC' org_nm FROM DUAL

                            3            UNION ALL

                            4            SELECT 2 org_pk, 'A.G  COMPANY' org_nm FROM DUAL

                            5            UNION ALL

                            6            SELECT 3 org_pk, 'GROWTH FUND SPONSORS and SONS, INC.' org_nm FROM DUAL)

                            7  /

                           

                          Table created.

                           

                          SCOTT@orcl12c> insert all

                            2  into org values (4, 'nate')

                            3  into org values (5, 'nathaniel')

                            4  into org values (6, 'nathin')

                            5  into org values (7, 'nobody')

                            6  into org values (8, 'Nathan Myhrvold')

                            7  into org values (9, 'Nat King Cole')

                            8  into org values (10, 'adviser')

                            9  into org values (11, 'advisor')

                          10  into org values (12, 'advisers')

                          11  into org values (13, 'advisors')

                          12  into org values (14, 'advisory')

                          13  select * from dual

                          14  /

                           

                          11 rows created.

                           

                          SCOTT@orcl12c> create or replace procedure test_proc

                            2    (p_rid  in         rowid,

                            3      p_clob in out nocopy clob)

                            4  as

                            5  begin

                            6    for r1 in

                            7       (select ctx_thes.syn

                            8             (trim (regexp_substr (org_nm, '[^ ]+', 1, rownum)),

                            9              'test_thes') token

                          10        from     (select org_nm from org where rowid = p_rid)

                          11        connect by level <= regexp_count (replace (org_nm, '    ', ' '), ' ') + 1)

                          12    loop

                          13       dbms_lob.writeappend (p_clob, length (r1.token), r1.token);

                          14    end loop;

                          15  end test_proc;

                          16  /

                           

                          Procedure created.

                           

                          SCOTT@orcl12c> show errors

                          No errors.

                          SCOTT@orcl12c> declare

                            2    v_clob  clob;

                            3  begin

                            4    for r in (select rowid from org) loop

                            5       dbms_lob.createtemporary (v_clob, true);

                            6       test_proc (r.rowid, v_clob);

                            7       dbms_output.put_line (v_clob);

                            8       dbms_lob.freetemporary (v_clob);

                            9    end loop;

                          10  end;

                          11  /

                          {ALL}{AMERICAN}{,INC}

                          {A.G}{COMPANY}

                          {GROWTH}{FUND}{SPONSORS}{AND}{SONS,}{INC.}

                          {NATE}|{NATHAN}

                          {NATHANIEL}

                          {NATHIN}

                          {NOBODY}

                          {NATHAN}|{NATE}{MYHRVOLD}

                          {NAT}{KING}{COLE}

                          {ADVISER}

                          {ADVISOR}

                          {ADVISERS}

                          {ADVISORS}

                          {ADVISORY}

                           

                          PL/SQL procedure successfully completed.

                           

                          SCOTT@orcl12c>

                          • 10. Re: Extend ctx_thes.syn to return  formatted synonyms
                            Barbara Boehmer

                            To use the method you selected, it would be something like this:

                             

                            SCOTT@orcl12c> begin

                              2    ctx_thes.create_thesaurus ('test_thes');

                              3    ctx_thes.create_relation ('test_thes', 'nathan', 'SYN', 'nate');

                              4  end;

                              5  /

                             

                            PL/SQL procedure successfully completed.

                             

                            SCOTT@orcl12c> create table org

                              2        AS (SELECT 1 org_pk, 'ALL american  ,INC' org_nm FROM DUAL

                              3            UNION ALL

                              4            SELECT 2 org_pk, 'A.G  COMPANY' org_nm FROM DUAL

                              5            UNION ALL

                              6            SELECT 3 org_pk, 'GROWTH FUND SPONSORS and SONS, INC.' org_nm FROM DUAL)

                              7  /

                             

                            Table created.

                             

                            SCOTT@orcl12c> insert all

                              2  into org values (4, 'nate')

                              3  into org values (5, 'nathaniel')

                              4  into org values (6, 'nathin')

                              5  into org values (7, 'nobody')

                              6  into org values (8, 'Nathan Myhrvold')

                              7  into org values (9, 'Nat King Cole')

                              8  into org values (10, 'adviser')

                              9  into org values (11, 'advisor')

                            10  into org values (12, 'advisers')

                            11  into org values (13, 'advisors')

                            12  into org values (14, 'advisory')

                            13  select * from dual

                            14  /

                             

                            11 rows created.

                             

                            SCOTT@orcl12c> create or replace procedure test_proc

                              2    (p_rid  in         rowid,

                              3      p_clob in out nocopy clob)

                              4  as

                              5  begin

                              6    for r1 in

                              7       (select ctx_thes.syn (org_nm, 'test_thes') token

                              8        from     (select x.org_nm

                              9            from     org,

                            10               xmltable

                            11                 ('ora:tokenize(.," ")'

                            12                  passing ' ' || org_nm

                            13                  columns org_nm varchar2(4000) path '.') x

                            14            where  org.rowid = p_rid

                            15            and     x.org_nm is not null))

                            16    loop

                            17       dbms_lob.writeappend (p_clob, length (r1.token), r1.token);

                            18    end loop;

                            19  end test_proc;

                            20  /

                             

                            Procedure created.

                             

                            SCOTT@orcl12c> show errors

                            No errors.

                            SCOTT@orcl12c> declare

                              2    v_clob  clob;

                              3  begin

                              4    for r in (select rowid from org) loop

                              5       dbms_lob.createtemporary (v_clob, true);

                              6       test_proc (r.rowid, v_clob);

                              7       dbms_output.put_line (v_clob);

                              8       dbms_lob.freetemporary (v_clob);

                              9    end loop;

                            10  end;

                            11  /

                            {ALL}{AMERICAN}{,INC}

                            {A.G}{COMPANY}

                            {GROWTH}{FUND}{SPONSORS}{AND}{SONS,}{INC.}

                            {NATE}|{NATHAN}

                            {NATHANIEL}

                            {NATHIN}

                            {NOBODY}

                            {NATHAN}|{NATE}{MYHRVOLD}

                            {NAT}{KING}{COLE}

                            {ADVISER}

                            {ADVISOR}

                            {ADVISERS}

                            {ADVISORS}

                            {ADVISORY}

                             

                            PL/SQL procedure successfully completed.

                             

                            SCOTT@orcl12c>

                            • 11. Re: Extend ctx_thes.syn to return  formatted synonyms
                              spur230

                              I have over 4 million names and creating context index using  user_datastore  to include synonym is taking over 30 minutes with parallel option.

                               

                              Apart from parallel option ,   I tried memory preference ( memory  1G) but it had no impact on performance. Am I using the syntax correctly?

                               

                              CREATE INDEX ft_idx_temp_indvl

                                 ON temp_indvl (INDVL_NM_UPD_FLG)

                                 INDEXTYPE IS ctxsys.context

                                    PARAMETERS ('DATASTORE test_user_datastore    SYNC ( ON COMMIT) MEMORY 1G')

                                    PARALLEL;

                               

                              Would  you recommend any other settings/ preference  to speed up creation of context index using user_datasource?

                               

                              Database: 11.2.0.3

                              • 12. Re: Extend ctx_thes.syn to return  formatted synonyms
                                Barbara Boehmer

                                Please see the following section of the online documentation for general guidelines on Oracle Text indexing performance;

                                 

                                Tuning Oracle Text

                                • 13. Re: Extend ctx_thes.syn to return  formatted synonyms
                                  spur230

                                  Thanks for point me to right direction.  Using parallel and  result_cache to get synonyms helped a lot with index creation time.Further more , since  my list of synonyms is less than 100 I am only using ctx_thes.syn funtion only when the term exists in ctx_thes_phrases.

                                   

                                  My function to get synonyms using RESULT_CACHE:

                                   

                                  CREATE or replace  FUNCTION fn_synonyms (
                                                      p_name IN VARCHAR2
                                                                          ) RETURN VARCHAR2 RESULT_CACHE IS
                                      BEGIN
                                              RETURN REGEXP_REPLACE ( ctx_thes.syn ('{' || p_name || '}', 'C_THESAURUS') , '^{|}$|}\|{', ' ');
                                      END fn_synonyms;
                                  
                                  

                                   

                                  My user_datasource proceudure ( I' m joining with ctx_thes_phrases and using fn_synonyms only when requred ):

                                   

                                  CREATE OR REPLACE PROCEDURE sp_name_with_synonym (p_rowid IN ROWID, p_return IN OUT NOCOPY VARCHAR2)
                                  AS
                                     l_string   VARCHAR2 (32767);
                                  BEGIN
                                     SELECT   CASE WHEN c.thp_phrase IS NOT NULL THEN fn_synonyms (first_nm) ELSE first_nm END || ' ' || MID_NM || ' ' || LAST_NM
                                       INTO   p_return
                                       FROM   indvl t,
                                              (SELECT /*+ result_cache*/
                                                     thp_phrase
                                                 FROM   ctx_thes_phrases
                                                WHERE   thp_thesaurus = 'C_THESAURUS') c
                                      WHERE   c.thp_phrase(+) = t.first_nm    and   t.ROWID = p_rowid;
                                  
                                  EXCEPTION
                                     WHEN NO_DATA_FOUND
                                     THEN
                                        RAISE PROGRAM_ERROR;
                                  END;
                                  
                                  • 14. Re: Extend ctx_thes.syn to return  formatted synonyms
                                    spur230

                                    Now that nate and nathan  are created as synonyms using user_datastore shown above.

                                     

                                    An enhancement we are looking is: when searching for nate,  it should have higher priority than its synonyms.

                                    1 2 Previous Next