This discussion is archived
4 Replies Latest reply: Jun 26, 2012 1:18 AM by user9080289 RSS

Wild search in name column

user9080289 Newbie
Currently Being Moderated
hi,

iam using oracle 11.2

Iam having a query which searches a particular part of word from name
for example

CREATE TABLE TEST123
(type1 varchar2(10),
LIST1 VARCHAR2(10),
NAME VARCHAR2(302),
NAME2 VARCHAR2(500),
ANAG_NAME VARCHAR2(2000))



SELECT *
FROM test123
WHERE TYPE1 ='s' AND
(LIST1 = 'CL'
AND (NAME2 = 'AIILN' AND
(instr(NAME, 'IAN') > 0 AND instr(NAME, 'LI') > 0))
OR LIST1 = 'TT'
AND (NAME2 = 'AIILN' AND
(instr(ANAG_NAME, 'AN') > 0 AND instr(ANAG_NAME, 'L') > 0))
)

the table may contain 2 million records.

i want to replace this with ctxsys index.
since search is has wildcard search
if use create ctxsys.context index i have to use CONTAINS clause to search record i have to use %
and if i create ctxsys.ctxcat index I have to use CATSEARCH to search record i have use '*'


but both takes more time..
or some error happens.

Can any one give some suggestion..

Thanks
user

Edited by: user9080289 on Jun 22, 2012 12:10 AM
  • 1. Re: Wild search in name column
    a.aurel Explorer
    Currently Being Moderated
    first try to use REGEXP_INSTR instead of INSTR, you'll see the diference
  • 2. Re: Wild search in name column
    user9080289 Newbie
    Currently Being Moderated
    There is no difference in replacing instr to regexr_instr
  • 3. Re: Wild search in name column
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    In general, you can create a CONTEXT index on a column, then use CONTAINS instead of INSTR. You can also create a MULTI_COLUMN_DATASTORE on multiple columns and use some kind of SECTION GROUP, so that you can use one CONTAINS clause and one index hit to search in multiple columns. You can also use FILTER BY in your index creation, so that you can include structured columns in your index, so that you can use SDATA in your CONTAINS clause, so that your entire query uses one index hit. You can also use SUBSTRING_INDEX to increase the search speed when using wildcards. You may also need to increase the WILDCARD_MAXTERMS. If you are searching for a string that is separated by spaces, punctuation, or some other delimiter, then you do not need the wildcards. However, if you are searching for a string within a word, then you do need the wildcards. If you are searching for short strings, then you need to make sure you use a stoplist that does not include them. If your strings are too short, then you are likely to exceed the WILDCARD_MAXTERMS, unless you set it to 0, so that there is no limit, but then such queries may be very slow and consume a lot of resources. Please see the demonstration below. I have added a column to create the index on, but you could create the index on any text column. The columns that are searched are determined by the columns in the datastore and filtering, not the column the index in created on.

    -- test environment consisting of table, data, and query similar to original:
    SCOTT@orcl_11gR2> -- table:
    SCOTT@orcl_11gR2> CREATE TABLE TEST123
      2    (type1        varchar2(10),
      3       LIST1        VARCHAR2(10),
      4       NAME        VARCHAR2(302),
      5       NAME2        VARCHAR2(500),
      6       ANAG_NAME  VARCHAR2(2000),
      7       names        VARCHAR2(1))
      8  /
    
    Table created.
    
    SCOTT@orcl_11gR2> -- data:
    SCOTT@orcl_11gR2> INSERT ALL
      2  INTO test123 VALUES ('s', 'CL', 'IAN LI', 'AIILN', NULL, NULL)
      3  INTO test123 VALUES ('s', 'TT', NULL, 'AIILN', 'AN L', NULL)
      4  INTO test123 VALUES ('s', 'CL', 'IANLI', 'AIILN', NULL, NULL)
      5  INTO test123 VALUES ('s', 'TT', NULL, 'AIILN', 'ANL', NULL)
      6  INTO test123 VALUES ('s', 'CL', 'LIIAN', 'AIILN', NULL, NULL)
      7  INTO test123 VALUES ('s', 'TT', NULL, 'AIILN', 'LAN', NULL)
      8  SELECT * FROM DUAL
      9  /
    
    6 rows created.
    
    SCOTT@orcl_11gR2> -- query similar to original:
    SCOTT@orcl_11gR2> COLUMN name        FORMAT A10
    SCOTT@orcl_11gR2> COLUMN name2        FORMAT A10
    SCOTT@orcl_11gR2> COLUMN anag_name FORMAT A10
    SCOTT@orcl_11gR2> SELECT *
      2  FROM   test123
      3  WHERE  type1 ='s'
      4  AND    ((list1 = 'CL'
      5             AND name2 = 'AIILN'
      6             AND INSTR (name, 'IAN') > 0
      7             AND INSTR (name, 'LI') > 0)
      8            OR
      9            (list1 = 'TT'
     10             AND name2 = 'AIILN'
     11             AND INSTR (anag_name, 'AN') > 0
     12             AND INSTR (anag_name, 'L') > 0))
     13  /
    
    TYPE1      LIST1      NAME       NAME2      ANAG_NAME  N
    ---------- ---------- ---------- ---------- ---------- -
    s          CL         IAN LI     AIILN
    s          TT                    AIILN      AN L
    s          CL         IANLI      AIILN
    s          TT                    AIILN      ANL
    s          CL         LIIAN      AIILN
    s          TT                    AIILN      LAN
    
    6 rows selected.
    -- multi_column_datastore, wordlist, and text index:
    SCOTT@orcl_11gR2> -- multi_coumn_datastore and wordlist:
    SCOTT@orcl_11gR2> BEGIN
      2    -- multi_column_datastore:
      3    CTX_DDL.CREATE_PREFERENCE ('test123_mds', 'MULTI_COLUMN_DATASTORE');
      4    CTX_DDL.SET_ATTRIBUTE ('test123_mds', 'COLUMNS', 'name, name2, anag_name');
      5    -- worlist:
      6    CTX_DDL.CREATE_PREFERENCE ('test123_wl', 'BASIC_WORDLIST');
      7    CTX_DDL.SET_ATTRIBUTE ('test123_wl', 'SUBSTRING_INDEX', 'TRUE');
      8    CTX_DDL.SET_ATTRIBUTE ('test123_wl', 'WILDCARD_MAXTERMS', 50000);
      9  END;
     10  /
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> -- text index using FILTER BY, multi_column_datastore, section group, wordlist:
    SCOTT@orcl_11gR2> CREATE INDEX test123_idx
      2  ON test123 (names)
      3  INDEXTYPE IS CTXSYS.CONTEXT
      4  FILTER BY type1, list1
      5  PARAMETERS
      6    ('DATASTORE     test123_mds
      7        SECTION GROUP     CTXSYS.AUTO_SECTION_GROUP
      8        WORDLIST     test123_wl
      9        STOPLIST     CTXSYS.EMPTY_STOPLIST')
     10  /
    
    Index created.
    -- query using CONTAINS and SDATA with one index hit:
    SCOTT@orcl_11gR2> SET AUTOTRACE ON EXPLAIN
    SCOTT@orcl_11gR2> SELECT *
      2  FROM   test123
      3  WHERE  CONTAINS
      4             (names,
      5              'SDATA (type1 = ''s'')
      6            AND AIILN WITHIN name2
      7            AND ((SDATA (list1 = ''CL'')
      8                  AND (%IAN% AND %LI% WITHIN name))
      9                 OR
     10                 (SDATA (list1 = ''TT'')
     11                  AND (%AN% AND %L% WITHIN anag_name)))') > 0
     12  /
    
    TYPE1      LIST1      NAME       NAME2      ANAG_NAME  N
    ---------- ---------- ---------- ---------- ---------- -
    s          CL         IAN LI     AIILN
    s          TT                    AIILN      AN L
    s          CL         IANLI      AIILN
    s          TT                    AIILN      ANL
    s          CL         LIIAN      AIILN
    s          TT                    AIILN      LAN
    
    6 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3044757981
    
    -------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |     1 |  1435 |     4   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEST123     |     1 |  1435 |     4   (0)| 00:00:01 |
    |*  2 |   DOMAIN INDEX              | TEST123_IDX |       |       |     4   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("CTXSYS"."CONTAINS"("NAMES",'SDATA (type1 = ''s'')            AND
                  AIILN WITHIN name2            AND ((SDATA (list1 = ''CL'')                  AND
                  (%IAN% AND %LI% WITHIN name))                 OR                 (SDATA (list1 =
                  ''TT'')                  AND (%AN% AND %L% WITHIN anag_name)))')>0)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    SCOTT@orcl_11gR2>
  • 4. Re: Wild search in name column
    user9080289 Newbie
    Currently Being Moderated
    Thanks a lot for your detailed explanation.
    I will implement in my project and inform you the status.

    Regards
    User

    Edited by: user9080289 on Jun 26, 2012 1:18 AM

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points