5 Replies Latest reply: Apr 22, 2013 6:28 PM by Barbara Boehmer RSS

    order of words, fuzzy and utl_match

    949210
      Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
      PL/SQL Release 10.2.0.1.0 - Production
      "CORE     10.2.0.1.0     Production"
      TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
      NLSRTL Version 10.2.0.1.0 - Production
      create table category(cat_id number(20),cat_type varchar2(3000));
      create table category_match(cat_id number(20),cat_type varchar2(3000));
      Insert into category (CAT_ID,CAT_TYPE) values (12790,'AUTO CONSULTANTS');
      INSERT INTO CATEGORY (CAT_ID,CAT_TYPE) VALUES (23803,'AUTO CONSULTANT');
      Insert into category (CAT_ID,CAT_TYPE) values (23804,'CONSULTANT FOR AUTO FINANCE');
      Insert into category_match (CAT_ID,CAT_TYPE) values (12790,'AUTO CONSULTANTS');
      INSERT INTO CATEGORY_match (CAT_ID,CAT_TYPE) VALUES (23803,'AUTO CONSULTANT');
      Insert into category_match (CAT_ID,CAT_TYPE) values (23804,'CONSULTANT FOR AUTO FINANCE');
      
      CREATE INDEX "LOOKING4"."MYINDEX" ON "CATEGORY_MATCH"
        (
          "CAT_TYPE"
        )
        INDEXTYPE IS "CTXSYS"."CONTEXT" ;
      
      
      CREATE INDEX "LOOKING4"."CAT_TYPE_IDX" ON "CATEGORY"
        (
          "CAT_TYPE"
        )
        INDEXTYPE IS "CTXSYS"."CTXCAT" ;
      
      
      select cat_id,CAT_TYPE,UTL_MATCH.edit_distance_similarity(CAT_TYPE,'AUTO CONSULTANT') from
      (
      select * from category where catsearch(cat_type,
      '<query>
            <textquery grammar="context">
            <progression>
            <seq>auto consultant</seq>
            <seq>?(auto) and ?(consultant)</seq>
              </progression>
              </textquery>
       </query>'
      ,NULL)>0
      )where rownum<5
      
      
      23803     AUTO CONSULTANT     100
      12790     AUTO CONSULTANTS     94
      23804     CONSULTANT FOR AUTO FINANCE     26
      
      update category set cat_type='CONSULTANTS AUTO' WHERE CAT_ID=12790
      select cat_id,CAT_TYPE,UTL_MATCH.edit_distance_similarity(CAT_TYPE,'AUTO CONSULTANT') from
      (
      select * from category where catsearch(cat_type,
      '<query>
            <textquery grammar="context">
            <progression>
            <seq>auto consultant</seq>
            <seq>?(auto) and ?(consultant)</seq>
              </progression>
              </textquery>
       </query>'
      ,NULL)>0
      )where rownum<5
      23803     AUTO CONSULTANT     100
      12790     CONSULTANTS AUTO     32
      23804     CONSULTANT FOR AUTO FINANCE     26
      
      select score(1),cat_id,cat_type from CATEGORY_MATCH where cat_id in(
      select cat_id from category where catsearch(cat_type,
      '<query>
            <textquery grammar="context">
            <progression>
            <seq>auto consultant</seq>
            <seq>?(auto) and ?(consultant)</seq>
              </progression>
              </textquery>
       </query>'
      ,NULL)>0) AND
      contains(cat_type,'?(auto) and ?(consultant)',1)>0
      9     23803     AUTO CONSULTANT
      9     12790     AUTO CONSULTANTS
      9     23804     CONSULTANT FOR AUTO FINANCE
      i have been using catsearch to use progressive relaxation
      there are many "cat_types" like "cat_id" =23803,12790 ,the order of words in a sentence changes
      there are upto 10 words in each row of "cat_types" column
      among others i have referred
      Achieving functionality of many preferences using one context index
      and
      Re: Fuzzy search - more accurate score??
      there is very less possibility of repetition of words in a row
      utl match seems to work perfect only when the order of appearance of words is same
      if you can suggest a way to get a very close score for cat_id 23803 and 12790 it would be much appreciated
      thanks and regards
        • 1. Re: order of words, fuzzy and utl_match
          Barbara Boehmer
          There are various ways of adjusting the scoring and ordering and there are more options available in 11g than 10g and the 11g optimizer handles some things better as well. In general, the ordering is more important than the closeness of the score values, especially if you limit results by the number of rows returned rather than the score. However, if you want maximum score manipulation, then you can use a context index with a separate contains clause and score for each condition. If you multiply each score by 10 * 10, then each score will be the maximum 100. You can then multiply or subtract or whatever to adjust those scores and use the greatest of the results for your scoring. You can use Levenshtein distance to do further sub-ordering within matching scores. All ordering should be done within an inner sub-query and limiting by rownum within an outer sub-query. Please see the demonstration below.
          SCOTT@orcl_11gR2> CREATE TABLE category (cat_id NUMBER(20), cat_type VARCHAR2(30));
          
          Table created.
          
          SCOTT@orcl_11gR2> INSERT ALL
            2  INTO category (cat_id, cat_type) VALUES (1, 'AUTO CONSULTANT')
            3  INTO category (cat_id, cat_type) VALUES (2, 'AUTO CONSULTANTS')
            4  INTO category (cat_id, cat_type) VALUES (3, 'CONSULTANTS AUTO')
            5  INTO category (cat_id, cat_type) VALUES (4, 'AUTO CONSULTING')
            6  INTO category (cat_id, cat_type) VALUES (5, 'CONSULTANT FOR AUTO FINANCE')
            7  INTO category (cat_id, cat_type) VALUES (6, 'OTHER STUFF')
            8  SELECT * FROM DUAL;
          
          6 rows created.
          
          SCOTT@orcl_11gR2> CREATE INDEX myindex ON category (cat_type) INDEXTYPE IS CTXSYS.CONTEXT
            2    PARAMETERS ('STOPLIST CTXSYS.EMPTY_STOPLIST');
          
          Index created.
          
          SCOTT@orcl_11gR2> COLUMN eds FORMAT 999
          SCOTT@orcl_11gR2> SELECT *
            2  FROM   (SELECT GREATEST (SCORE(1), SCORE(2) - 1, SCORE(3) - 2, SCORE(4) - 3) score,
            3                UTL_MATCH.EDIT_DISTANCE_SIMILARITY (cat_type,'AUTO CONSULTANT') eds,
            4                cat_id, cat_type
            5            FROM   category
            6            -- exact words in order:
            7            WHERE  CONTAINS (cat_type, 'auto consultant * 10 * 10', 1) > 0
            8            -- similar words next to each other in order:
            9            OR     CONTAINS (cat_type, 'NEAR ((?auto, ?consultant), 0, TRUE) * 10 * 10', 2) > 0
           10            -- similar words next to each other in any order:
           11            OR     CONTAINS (cat_type, 'NEAR ((?auto, ?consultant), 0, FALSE) * 10 * 10', 3) > 0
           12            -- similar words anywhere in any order:
           13            OR     CONTAINS (cat_type, '(?auto AND ?consultant) * 10 * 10', 4) > 0
           14            ORDER  BY score DESC, eds DESC)
           15  WHERE  ROWNUM < 10
           16  /
          
               SCORE  EDS     CAT_ID CAT_TYPE
          ---------- ---- ---------- ------------------------------
                 100  100          1 AUTO CONSULTANT
                  99   94          2 AUTO CONSULTANTS
                  99   87          4 AUTO CONSULTING
                  98   32          3 CONSULTANTS AUTO
                  97   26          5 CONSULTANT FOR AUTO FINANCE
          
          5 rows selected.
          • 2. Re: order of words, fuzzy and utl_match
            949210
            select * 
                FROM   (SELECT score(1),score(2),score(3),score(4),GREATEST (SCORE(1), SCORE(2) - 1, SCORE(3) - 2, SCORE(4) - 3) g_scores, 
                              UTL_MATCH.EDIT_DISTANCE_SIMILARITY (CAT_TYPE,'AUTO CONSULTANT') EDS, 
                              CAT_ID, CAT_TYPE 
                          FROM   category_match 
                          
                          WHERE  CONTAINS (cat_type, 'solar water heater* 10 * 10', 1) > 0 
                          
                          OR     CONTAINS (cat_type, 'NEAR ((?solar, ?water ,?heater), 0, TRUE) * 10 * 10', 2) > 0 
                         
                         OR     CONTAINS (cat_type, 'NEAR ((?solar, ?water ,?heater), 0, FALSE) * 10 * 10', 3) > 0 
                         
                         or     CONTAINS (CAT_TYPE, '(?solar AND ?water AND ?heater) * 10 * 10', 4) > 0 
                         order  by g_scores desc, EDS desc) 
               WHERE  ROWNUM<100 
            
            100     100     100     100     100     23     4                  SOLAR WATER HEATER-ANU
            100     100     100     100     100     22     26901          SOLAR WATER HEATER SUDARSHAN SAUR
            100     100     100     100     100     21     30                  SOLAR WATER HEATER INDUSTRIAL
            100     100     100     100     100     20     17379          SOLAR WATER HEATER DEALERS-TATA
            100     100     100     100     100     20     26906          SOLAR WATER HEATER NUETECH
            100     100     100     100     100     20     11465          SOLAR WATER HEATER DEALERS-ANU
            100     100     100     100     100     20     21                  SOLAR WATER HEATER-ZING TATA BP
            100     100     100     100     100     20     11463          SOLAR WATER HEATER MANUFACTURERS-ANU
            100     100     100     100     100     19     8                  SOLAR WATER HEATER MANUFACTURERS
            100     100     100     100     100     19     23                  SOLAR WATER HEATER EVACUATED TUBE
            100     100     100     100     100     19     49                  SOLAR WATER HEATER-HOTMAX NOVA TATA BP
            100     100     100     100     100     19     13357          SOLAR WATER HEATER INDUSTRIAL DEALERS
            100     100     100     100     100     18     16300          SOLAR WATER HEATER-TECHNOMAX
            100     100     100     100     100     18     9                  SOLAR WATER HEATER DEALERS-TATA BP
            100     100     100     100     100     18     20                  SOLAR WATER HEATER-ZING
            100     100     100     100     100     18     18                  SOLAR WATER HEATER-ORB SOLAR
            100     100     100     100     100     18     22552          SOLAR WATER HEATER-KOTAK URJA
            100     100     100     100     100     18     26908          SOLAR WATER HEATER SUPREME
            100     100     100     100     100     17     26907          SOLAR WATER HEATER TECHNOMAX"
            100     100     100     100     100     17     13322          SOLAR WATER HEATER DISTRIBUTORS
            100     100     100     100     100     17     22                  SOLAR WATER HEATER-ETC TATA BP
            100     100     100     100     100     17     48                  SOLAR WATER HEATER-VAJRA PLUS TATA BP
            100     100     100     100     100     17     27084          SOLAR WATER HEATER SALES
            100     100     100     100     100     16     16236          SOLAR WATER HEATER DEALERS-RACOLD
            100     100     100     100     100     16     15                  SOLAR WATER HEATER-NUTECH
            100     100     100     100     100     16     1                  SOLAR WATER HEATER DEALERS
            100     100     100     100     100     15     2                  SOLAR WATER HEATER DEALERS-TATA BP SOLAR
            100     100     100     100     100     15     31                  SOLAR WATER HEATER DOMESTIC
            100     100     100     100     100     15     13                  SOLAR WATER HEATER DEALERS-V GUARD
            100     100     100     100     100     14     17                  SOLAR WATER HEATER-KAMAL SOLAR
            100     100     100     100     100     13     11467          SOLAR WATER HEATER DEALERS-GILMA
            100     100     100     100     100     13     19                  SOLAR WATER HEATER-GILMA
            100     100     100     100     100     13     10                  SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR
            100     100     100     100     100     12     10578          SOLAR WATER HEATER
            100     100     100     100     100     11     3                  SOLAR WATER HEATER REPAIRS & SERVICES
            0      0     100     100     98     25     10120          WATER HEATER SOLAR INDUSTRIAL
            0      0     100     100     98     20     12953          WATER HEATER SOLAR-RACOLD
            0      0     100     100     98     17     10119          WATER HEATER SOLAR RESIDENCIAL
            {code}
            the query is working accurately technically
            but is there any way to get 10578 on top
            the requirement is 
            ---first
            solar water heater 
            solar water heater dealers
            solar water heater manufacturers
            solar water heater distributors
            solar water heater sales
            solar water heater repairs and servicing
            ---followed by
            SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR
            SOLAR WATER HEATER-KAMAL SOLAR
            SOLAR WATER HEATER DEALERS-TATA BP SOLAR   etc etc
            
            so if the end user types in "solar water" the top row would have a row from the table that has what the end user has entered followed by "dealers" or "manufacturers" or "distributors" or "sales" or "repairs and servicing"
            so if a row contains "solar water dealer" it shows up on top
            ----------------
            or(if "solar water dealer" is not there and "solar water manufacturers" or  "solar water distributors" etc is not present)
            a row from the table that has what the end user has entered PLUS "heater" followed by "dealers" or "manufacturers" or "distributors" or "sales" or "repairs and servicing"
            so "solar water heater dealers" shows up on top
            ----------------
            these words - "dealers" , "manufacturers" , "distributors" , "sales" , "repairs and servicing"  etc remain constant
            --------
            what i am using right now is 
            {code}
            create or replace
            procedure HOME_OLD
            (
            p_cat_type in varchar2,
            P_LOC IN NUMBER,
            P_MAX IN NUMBER,
            P_MIN IN NUMBER,
            P_OUT OUT SYS_REFCURSOR
            )
            as 
            VARIAB varchar2(500);
            VARIAB2 varchar2(500);
            VARIAB3 varchar2(500);
            VARIAB4 varchar2(500);
            begin
            --VARIAB2:='?'||replace(P_CAT_TYPE,' ',', ?');
            --VARIAB3:='?'||replace(P_CAT_TYPE,' ',' ?');
            --DBMS_OUTPUT.PUT_LINE(VARIAB2);
            --DBMS_OUTPUT.PUT_LINE(VARIAB3);
            SELECT stragg(cat_id) into variab
               FROM   (SELECT GREATEST (SCORE(1), SCORE(2) - 1, SCORE(3) - 2, SCORE(4) - 3) score,
                                              CAT_ID, CAT_TYPE
                          FROM   category_match
                          -- exact words in order:
                          WHERE  CONTAINS (cat_type,get_basic(P_CAT_TYPE), 1) > 0
                          -- similar words next to each other in order:
                          OR     CONTAINS (cat_type, get_near_syntax(P_CAT_TYPE), 2) > 0
                         -- similar words next to each other in any order:
                         OR     CONTAINS (cat_type, get_near_syntax_desc(P_CAT_TYPE), 3) > 0
                         -- similar words anywhere in any order:
                         OR     CONTAINS (cat_type, get_anywhere(P_CAT_TYPE), 4) > 0
                         order  by score desc)
               where  rownum < 3;
            DBMS_OUTPUT.PUT_LINE(VARIAB);
             open p_out
               FOR select * from(select rownum r,name,address1,telephone,mobile,CAT_TYP,cat_id,
             
            (case when  address2=p_loc and ACT_STATUS='Y'  then '1' when  address2=p_loc  then '2' when address2 in 
            (select NEARBY_LOC from NEAR_BY where LOCALITY_ID=p_loc) and ACT_STATUS='Y'  
            then '3' when ADDRESS2 in (select NEARBY_LOC from NEAR_BY where LOCALITY_ID=p_loc)
            then '4' when ACT_STATUS='Y' and address2<> p_loc then '5' else '6' end) as marker
            
              FROM TEST_TEST 
              WHERE 
              CAT_ID in(select * from table(STRING_TO_TABLE_NUM(variab))) and rownum<P_MAX order by marker) where r>P_MIN;
              
            
            
            IF VARIAB IS NULL THEN 
             OPEN P_OUT
               FOR SELECT * FROM(SELECT rownum r,name,address1,telephone,mobile,CATS
               FROM   (SELECT GREATEST (SCORE(1), SCORE(2) - 1, SCORE(3) - 2, SCORE(4) - 3) score,
                                              NAME,ADDRESS1,TELEPHONE,MOBILE,CATS
                          FROM   TEST_TEST2
                          -- exact words in order:
                          WHERE  CONTAINS (NAME,get_basic(P_CAT_TYPE), 1) > 0
                          -- similar words next to each other in order:
                          OR     CONTAINS (NAME, get_near_syntax(P_CAT_TYPE), 2) > 0
                         -- similar words next to each other in any order:
                         OR     CONTAINS (NAME, get_near_syntax_desc(P_CAT_TYPE), 3) > 0
                         -- similar words anywhere in any order:
                         OR     CONTAINS (NAME, get_anywhere(P_CAT_TYPE), 4) > 0
                         ORDER  BY SCORE DESC)
               WHERE  ROWNUM < P_MAX)where r>P_MIN;
            END IF;
            
            end home_old;
            {code}
            -----------------
            the flow is to find what the end user has entered in category table ,if a match exists,find all reg_ids from test_test materialized view that have selected the matched cat_id..
            the test_test materialized view lists each company cat_id-selected-by-that-company number of times  
            --------------------
            if no match is found in category table what the end user has entered could be a company so  a search in name column of test_test2 materialized view..
            this materialized view has one entry for each  company
            {code}
            create or replace
            FUNCTION GET_BASIC(P_CAT_TYPE VARCHAR2)
                RETURN VARCHAR2
              is
              VARIAB2 VARCHAR2(3000);
                  begin 
            VARIAB2:='{'||P_CAT_TYPE||'}*10*10';  
            return(VARIAB2);
            END;
            
            create or replace
            FUNCTION GET_NEAR_SYNTAX(P_CAT_TYPE VARCHAR2)
                RETURN VARCHAR2
              is
              VARIAB2 VARCHAR2(3000);
                  begin 
            VARIAB2:='NEAR((?{'||replace(P_CAT_TYPE,' ','}, ?{')||'}),10,TRUE)*10*10';  
            return(VARIAB2);
            END;
            create or replace
            FUNCTION GET_NEAR_SYNTAX_DESC(P_CAT_TYPE VARCHAR2)
                RETURN VARCHAR2
              is
              VARIAB2 VARCHAR2(3000);
                  begin 
            VARIAB2:='NEAR((?{'||replace(P_CAT_TYPE,' ','}, ?{')||'}),10,FALSE)*10*10';  
            return(VARIAB2);
            END;
            {code}
            can anything be done to ameliorate this whole flow 
            can anything be done to eliminate the near_by and act_status and locality checking in ordering by "marker" clause
            below is the materialized view creation ddl
            SELECT IN_V.REG_ID,
                IN_V.NAME,
                IN_V.TELEPHONE,
                IN_V.MOBILE,
                IN_V.ADDRESS1,
                IN_V.ADDRESS2,
                IN_V.ACT_STATUS,
                resec.cat_id,
                UPPER(STRAGG(IN_V.CAT_TYPE)) AS cat_typ
              FROM
                (SELECT RSC.REG_ID,
                  R.NAME,
                  RSC.CAT_ID,
                  C.CAT_TYPE,
                  R.ADDRESS1,
                  R.ADDRESS2,
                  R.ACT_STATUS,
                  R.TELEPHONE,
                  R.MOBILE,
                  ROW_NUMBER() OVER (PARTITION BY RSC.REG_ID ORDER BY rsc.reg_id) AS TT
                FROM REG_SEG_CAT RSC,
                  category C,
                  REGISTRATION R
                WHERE C.CAT_ID=RSC.CAT_ID
                AND R.REG_ID  =RSC.REG_ID
                ) IN_V,
                REG_SEG_CAT RESEC
              WHERE in_v.reg_id=resec.reg_id
              AND IN_V.TT      <6
              GROUP BY IN_V.REG_ID,
                IN_V.NAME,
                IN_V.TELEPHONE,
                IN_V.MOBILE,
                IN_V.ADDRESS2,
                IN_V.ACT_STATUS,
                IN_V.ADDRESS1,
                resec.cat_id;
              and sql>desc test_test
            REG_ID
            NAME
            TELEPHONE
            MOBILE
            ADDRESS1
            ADDRESS2
            ACT_STATUS
            CAT_ID
            CAT_TYP
            please let me know if you need more info
            
            Edited by: 946207 on Apr 19, 2013 6:22 PM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
            • 3. Re: order of words, fuzzy and utl_match
              Barbara Boehmer
              As to the scoring and ordering, the following provides some enhancements. As to the rest, I would need a complete script that is sufficient to reproduce a test case. Utl_match is case sensitive and spaces count, so you need to use upper or lower and trim leading and trailing spaces. This can be done in the select statement as below or in the formatting of data and search string. If you have double spaces other than leading and trailing spaces, then you need to deal with them as well.
              SCOTT@orcl_11gR2> CREATE TABLE category_match (cat_id NUMBER(20), cat_type VARCHAR2(3000));
              
              Table created.
              
              SCOTT@orcl_11gR2> SET DEFINE OFF
              SCOTT@orcl_11gR2> INSERT ALL
                2  INTO category_match (cat_id, cat_type) VALUES (1, 'solar water heater')
                3  INTO category_match (cat_id, cat_type) VALUES (2, 'solar water heater dealers')
                4  INTO category_match (cat_id, cat_type) VALUES (3, 'solar water heater manufacturers')
                5  INTO category_match (cat_id, cat_type) VALUES (4, 'solar water heater distributors')
                6  INTO category_match (cat_id, cat_type) VALUES (5, 'solar water heater sales')
                7  INTO category_match (cat_id, cat_type) VALUES (6, 'solar water heater repairs and servicing')
                8  INTO category_match (cat_id, cat_type) VALUES (7, 'SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR')
                9  INTO category_match (cat_id, cat_type) VALUES (8, 'SOLAR WATER HEATER-KAMAL SOLAR')
               10  INTO category_match (cat_id, cat_type) VALUES (9, 'SOLAR WATER HEATER DEALERS-TATA BP SOLAR etc etc')
               11  INTO category_match (cat_id, cat_type) VALUES (10, 'AUTO CONSULTANT')
               12  INTO category_match (cat_id, cat_type) VALUES (11, 'AUTO CONSULTANTS')
               13  INTO category_match (cat_id, cat_type) VALUES (12, 'AUTO CONSULTING')
               14  INTO category_match (cat_id, cat_type) VALUES (13, 'CONSULTANTS AUTO')
               15  INTO category_match (cat_id, cat_type) VALUES (14, 'CONSULTANT FOR AUTO FINANCE')
               16  INTO category_match (cat_id, cat_type) VALUES (15, 'OTHER STUFF')
               17  SELECT * FROM DUAL;
              
              15 rows created.
              
              SCOTT@orcl_11gR2> CREATE INDEX myindex ON category_match (cat_type) INDEXTYPE IS CTXSYS.CONTEXT;
              
              Index created.
              
              SCOTT@orcl_11gR2> COLUMN cat_type FORMAT A50
              SCOTT@orcl_11gR2> VARIABLE search_string VARCHAR2(100)
              SCOTT@orcl_11gR2> EXEC :search_string := 'solar water heater'
              
              PL/SQL procedure successfully completed.
              
              SCOTT@orcl_11gR2> SELECT GREATEST (
                2              DECODE
                3            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
                4               (LOWER (LTRIM (RTRIM (cat_type))),
                5                LOWER (LTRIM (RTRIM (:search_string)))),
                6             100, 100, 0),
                7              DECODE
                8            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
                9               (LOWER (LTRIM (RTRIM (cat_type))),
               10                LOWER (LTRIM (RTRIM (:search_string || ' dealers')))),
               11             100, 99, 0),
               12              DECODE
               13            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
               14               (LOWER (LTRIM (RTRIM (cat_type))),
               15                LOWER (LTRIM (RTRIM (:search_string || ' manufacturers')))),
               16             100, 98, 0),
               17              DECODE
               18            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
               19               (LOWER (LTRIM (RTRIM (cat_type))),
               20                LOWER (LTRIM (RTRIM (:search_string || ' distributors')))),
               21             100, 97, 0),
               22              DECODE
               23            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
               24               (LOWER (LTRIM (RTRIM (cat_type))),
               25                LOWER (LTRIM (RTRIM (:search_string || ' sales')))),
               26             100, 96, 0),
               27              DECODE
               28            (UTL_MATCH.EDIT_DISTANCE_SIMILARITY
               29               (LOWER (LTRIM (RTRIM (cat_type))),
               30                LOWER (LTRIM (RTRIM (:search_string || ' repairs and servicing')))),
               31             100, 95, 0)
               32             ) eds1,
               33           GREATEST (SCORE(1), SCORE(2) - 1, SCORE(3) - 2, SCORE(4) - 3) score,
               34           UTL_MATCH.EDIT_DISTANCE_SIMILARITY
               35             (LOWER (LTRIM (RTRIM (cat_type))),
               36              LOWER (LTRIM (RTRIM (:search_string)))) eds2,
               37           cat_id, cat_type
               38  FROM   category_match
               39  WHERE  CONTAINS
               40             (cat_type,
               41              :search_string || ' * 10 * 10', 1) > 0
               42  OR     CONTAINS
               43             (cat_type,
               44              'NEAR ((?' || REPLACE (:search_string, ' ', ', ?') || '), 0, TRUE) * 10 * 10', 2) > 0
               45  OR     CONTAINS
               46             (cat_type,
               47              'NEAR ((?' || REPLACE (:search_string, ' ', ', ?') || '), 0, FALSE) * 10 * 10', 3) > 0
               48  OR     CONTAINS
               49             (cat_type,
               50              '(FUZZY ('
               51              || REPLACE (:search_string, ' ', ',1,5000,W) AND FUZZY(')
               52              || ',1,5000,W)) * 10 * 10', 4) > 0
               53  ORDER  BY eds1 DESC, score DESC, eds2 DESC
               54  /
              
                    EDS1      SCORE       EDS2     CAT_ID CAT_TYPE
              ---------- ---------- ---------- ---------- --------------------------------------------------
                     100        100        100          1 solar water heater
                      99        100         70          2 solar water heater dealers
                      98        100         57          3 solar water heater manufacturers
                      97        100         59          4 solar water heater distributors
                      96        100         75          5 solar water heater sales
                      95        100         45          6 solar water heater repairs and servicing
                       0        100         60          8 SOLAR WATER HEATER-KAMAL SOLAR
                       0        100         38          7 SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR
                       0        100         38          9 SOLAR WATER HEATER DEALERS-TATA BP SOLAR etc etc
              
              9 rows selected.
              
              SCOTT@orcl_11gR2> EXEC :search_string := 'auto consultant'
              
              PL/SQL procedure successfully completed.
              
              SCOTT@orcl_11gR2> /
              
                    EDS1      SCORE       EDS2     CAT_ID CAT_TYPE
              ---------- ---------- ---------- ---------- --------------------------------------------------
                     100        100        100         10 AUTO CONSULTANT
                       0         99         94         11 AUTO CONSULTANTS
                       0         99         87         12 AUTO CONSULTING
                       0         98         32         13 CONSULTANTS AUTO
                       0         97         26         14 CONSULTANT FOR AUTO FINANCE
              
              5 rows selected.
              • 4. Re: order of words, fuzzy and utl_match
                949210
                we have decided to add one more column to the table to know whether the particular category is general or specific category
                SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR
                SOLAR WATER HEATER-KAMAL SOLAR
                will have an "S" whereas
                SOLAR WATER HEATER DEALERS
                SOLAR WATER HEATER SALES
                SOLAR WATER HEATER MANUFACTURERS
                will have a "G"
                and
                "solar water heater" will have an "AG"
                back to the search,
                i have come up with
                select * 
                    FROM   (SELECT  
                
                            score(5), 
                            CAT_ID, CAT_TYPE 
                              FROM   category_match 
                              
                WHERE  
                
                      CONTAINS (CAT_TYPE, '(fuzzy($solr,5,5000,weight), fuzzy($woter,5,5000,weight), fuzzy($heter,5,5000,weight), fuzzy($consultants,5,5000,weight))',5)>0 
                   order by score(5) desc 
                            ) 
                   WHERE  ROWNUM<100;
                fuzzy with stemming was required for the query to show a row containing
                "SOLAR WATER HEATING SYSTEM CONSULTANT"
                problem is the following rows
                SOLAR WATER HEATER REPAIRS & SERVICES-TATA SOLAR
                SOLAR WATER HEATER-KAMAL SOLAR
                SOLAR WATER HEATER-ORB SOLAR
                SOLAR WATER HEATER DEALERS-TATA BP SOLAR
                have two "solar"s
                so the accum rates these four rows above
                "SOLAR WATER HEATING SYSTEM CONSULTANT"
                As a late revelation i have learned that we need only one categories in the result set of this query(or the query posted by you in the first reply or any query for that matter) .
                The newly added status column can be used when the end user enters something like "solar water heater" which satisfies many rows and hence displays many rows,the row with "AG" and "G" could be selected
                when the end user enters
                1)"solr woter heter consultant" can we have a single row in the result set? (Fuzzy is required.The stemming here is required because ?heter is not equal to heating).
                2)select *
                FROM (SELECT

                score(5),
                CAT_ID, CAT_TYPE
                     FROM category_match
                WHERE
                CONTAINS (CAT_TYPE, '(fuzzy($solr,5,5000,weight) and fuzzy($woter,5,5000,weight) and fuzzy($heter,5,5000,weight) and fuzzy($consultants,5,5000,weight))',5)>0
                order by score(5) desc
                     )
                WHERE ROWNUM<100
                i tried this and it returns no results
                overall it looks like when the end user enters lesser words there is an easy way to show rows with "AG" and "G" as status
                when more words are used accum can be used but since the requirement is to get less than 2 rows how to deal with this
                (or may be even that time we will end up using status column)
                3) how to handle "auto consultants" and "consultants auto" now
                Please suggest on the questions and streamlining the results.

                Edited by: 946207 on Apr 22, 2013 6:57 PM
                • 5. Re: order of words, fuzzy and utl_match
                  Barbara Boehmer
                  "Heter" is not a legitimate word, so it has no stem, so using stemming with it is useless. Fuzzy finds words with similar spelling, but "heter" and "heating" are different enough in spelling that fuzzy won't find "heating" when doing a fuzzy search for "heter". Oracle Text cannot overcome all erroneous user input. You have to expect the user to look at the results, look at what was searched for, and perhaps try another search. One solution might be to add a spell checker. I believe my last post was providing better results than your most recent change.