9 Replies Latest reply: Jun 6, 2012 12:04 PM by Barbara Boehmer RSS

    Index on varchar2 column and like clause

    Hokins
      Hi, i have a big DB with 6 M records. There is a column with data-type varchar2(50) having normal index and contains short alphanumeric strings. e.g. "4CJHZ", "J238H".
      When i use column like '%J%' then index didn't work and goes for full table scan.
      I don't have any other option except to use like clause with % sigh at both ends as mentioned above. So please help me to find solution How to use index on varchar2 and make search fast.
        • 1. Re: Index on varchar2 column and like clause
          Barbara Boehmer
          You can use a context index. If you use create a wordlist with prefix_index and substring_index it can make your wildcard searches faster. With a large table and short strings, you may encounter problems with exceeding the wildcard_maxterms, so you probably want to set that to the maximum. Please see the demonstration below. You will also need to set up synchronization and optimization.
          SCOTT@orcl_11gR2> create table big_tab
            2    (a_column  varchar2(50))
            3  /
          
          Table created.
          
          SCOTT@orcl_11gR2> insert all
            2  into big_tab values ('4CJHZ')
            3  into big_tab values ('J238H')
            4  select * from dual
            5  /
          
          2 rows created.
          
          SCOTT@orcl_11gR2> insert into big_tab (a_column)
            2  select object_name
            3  from   all_objects
            4  where  upper (object_name) not like '%J%'
            5  /
          
          63941 rows created.
          
          SCOTT@orcl_11gR2> begin
            2    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
            3    ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
            4    ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3');
            5    ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4');
            6    ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
            7    ctx_ddl.set_attribute('mywordlist','WILDCARD_MAXTERMS', 50000);
            8  end;
            9  /
          
          PL/SQL procedure successfully completed.
          
          SCOTT@orcl_11gR2> create index test_idx
            2  on big_tab (a_column)
            3  indextype is ctxsys.context
            4  parameters ('wordlist mywordlist')
            5  /
          
          Index created.
          
          SCOTT@orcl_11gR2> set autotrace on explain
          SCOTT@orcl_11gR2> select a_column
            2  from   big_tab
            3  where  contains (a_column, '%J%') > 0
            4  /
          
          A_COLUMN
          --------------------------------------------------
          4CJHZ
          J238H
          
          2 rows selected.
          
          
          Execution Plan
          ----------------------------------------------------------
          Plan hash value: 2670580718
          
          ----------------------------------------------------------------------------------------
          | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
          ----------------------------------------------------------------------------------------
          |   0 | SELECT STATEMENT            |          |    23 |   897 |     9   (0)| 00:00:01 |
          |   1 |  TABLE ACCESS BY INDEX ROWID| BIG_TAB  |    23 |   897 |     9   (0)| 00:00:01 |
          |*  2 |   DOMAIN INDEX              | TEST_IDX |       |       |     4   (0)| 00:00:01 |
          ----------------------------------------------------------------------------------------
          
          Predicate Information (identified by operation id):
          ---------------------------------------------------
          
             2 - access("CTXSYS"."CONTAINS"("A_COLUMN",'%J%')>0)
          
          Note
          -----
             - dynamic sampling used for this statement (level=2)
          
          SCOTT@orcl_11gR2> 
          • 2. Re: Index on varchar2 column and like clause
            Hokins
            Hi, I have tried the solution but the cost increased 8 times more than "Table Access Full"..So the purpose of using index is to minimize the cost ..Some other solution to problem..?
            • 3. Re: Index on varchar2 column and like clause
              Roger Ford-Oracle
              Are you saying that the query took 8 times longer over the full 6 million records, or are you looking at a theoretical optimizer cost?

              Make sure you're not comparing an in-memory full table scan against a fetch of the index from disk. Flush your buffer cache (and preferably the OS file system cache) before doing the comparison.

              Are your strings a fixed length of 5 alphanumeric characters, or do they vary in length? Are there many such strings in each varchar2(50) field, or does each field only contain a single string?
              • 4. Re: Index on varchar2 column and like clause
                Hokins
                I am comparing the cost as shown in Query plan in EM. Well my strings are 5-7 alphanumeric characters not more than 7 and not less than 5..Fixed..There is one string in each row...
                • 5. Re: Index on varchar2 column and like clause
                  Roger Ford-Oracle
                  The optimizer costs don't always make sense with text indexes. Why don't you try it and see?

                  This isn't really what Oracle Text is designed for - it's supposed to find words in large(ish) blocks of text. However, it may be significantly faster than a full table scan. Or it may not.

                  One thing you could consider if a CONTEXT index doesn't work for you is to store all the substrings. If your strings are never more than 7 chars long, you could take the string ABC1234 and store it in seven rows as

                  ABC1234
                  BC1234
                  C1234
                  1234
                  234
                  34
                  4

                  Now if you create a conventional index on that, you will be able to search for any substring. For example if you wanted to find C12 you would search for LIKE 'C12%' and this time it would be able to use the index, since there is no leading wildcard involved (note you must have NO NULLS in the column for this to work).

                  A hybrid approach might be to do the same as above, but store all the words space-separated in a single row:
                  ABC1234 BC1234 C1234 1234 234 34 4
                  and then index that with Oracle Text. This would avoid the need to use the SUBSTRING_INDEX option, which might be less costly (although in fact it's quite similar to what SUBSTRING_INDEX is doing for you "under the covers").
                  • 6. Re: Index on varchar2 column and like clause
                    Hokins
                    Each string in a row is a unique name for each record. So it is not possible for me to store a string in seven rows. Secondly min length of string is 5.
                    • 7. Re: Index on varchar2 column and like clause
                      Hokins
                      I also tried the above method of using content search but i am getting an error msg. DRG-51030: wildcard query expansion resulted in too many terms. i set the WILDCARD_MAXTERMS to 50,000.
                      • 8. Re: Index on varchar2 column and like clause
                        Barbara Boehmer
                        The following is a modification of Roger's second suggestion, that uses a user_datastore to store the strings separated by spaces, eliminating the need for a substring index or wildcard prefix. You may still have a problem with exceeding the wildcard_maxterms limit.
                        SCOTT@orcl_11gR2> create table big_tab
                          2    (a_column  varchar2(50))
                          3  /
                        
                        Table created.
                        
                        SCOTT@orcl_11gR2> insert all
                          2  into big_tab values ('4CJHZ')
                          3  into big_tab values ('J238H')
                          4  select * from dual
                          5  /
                        
                        2 rows created.
                        
                        SCOTT@orcl_11gR2> insert into big_tab (a_column)
                          2  select object_name
                          3  from   user_objects
                          4  where  upper (object_name) not like '%J%'
                          5  /
                        
                        394 rows created.
                        
                        SCOTT@orcl_11gR2> create or replace procedure strings
                          2    (p_rowid in           rowid,
                          3       p_clob     in out nocopy clob)
                          4  as
                          5  begin
                          6    for r in
                          7        (select a_column
                          8         from      big_tab
                          9         where  rowid = p_rowid)
                         10    loop
                         11        for i in 1 .. length (r.a_column) loop
                         12          p_clob := p_clob || substr (r.a_column, i) || ' ';
                         13        end loop;
                         14    end loop;
                         15    p_clob := rtrim (p_clob);
                         16  end strings;
                         17  /
                        
                        Procedure created.
                        
                        SCOTT@orcl_11gR2> show errors
                        No errors.
                        SCOTT@orcl_11gR2> begin
                          2    ctx_ddl.create_preference ('my_ds', 'user_datastore');
                          3    ctx_ddl.set_attribute ('my_ds', 'procedure', 'strings');
                          4  end;
                          5  /
                        
                        PL/SQL procedure successfully completed.
                        
                        SCOTT@orcl_11gR2> create index test_idx
                          2  on big_tab (a_column)
                          3  indextype is ctxsys.context
                          4  parameters ('datastore my_ds')
                          5  /
                        
                        Index created.
                        
                        SCOTT@orcl_11gR2> set autotrace on explain
                        SCOTT@orcl_11gR2> select a_column
                          2  from   big_tab
                          3  where  contains (a_column, 'J%') > 0
                          4  /
                        
                        A_COLUMN
                        --------------------------------------------------
                        4CJHZ
                        J238H
                        
                        2 rows selected.
                        
                        
                        Execution Plan
                        ----------------------------------------------------------
                        Plan hash value: 2670580718
                        
                        ----------------------------------------------------------------------------------------
                        | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
                        ----------------------------------------------------------------------------------------
                        |   0 | SELECT STATEMENT            |          |     1 |    39 |     4   (0)| 00:00:01 |
                        |   1 |  TABLE ACCESS BY INDEX ROWID| BIG_TAB  |     1 |    39 |     4   (0)| 00:00:01 |
                        |*  2 |   DOMAIN INDEX              | TEST_IDX |       |       |     4   (0)| 00:00:01 |
                        ----------------------------------------------------------------------------------------
                        
                        Predicate Information (identified by operation id):
                        ---------------------------------------------------
                        
                           2 - access("CTXSYS"."CONTAINS"("A_COLUMN",'J%')>0)
                        
                        Note
                        -----
                           - dynamic sampling used for this statement (level=2)
                        
                        SCOTT@orcl_11gR2>
                        • 9. Re: Index on varchar2 column and like clause
                          Barbara Boehmer
                          The following stores and indexes the strings, as I believe Roger was describing in his first suggestion. It eliminates the need for a starting wildcard and uses a regular index. This is just a basic example. You could join the result to the original table if you like. You could use various refresh options.
                          SCOTT@orcl_11gR2> create table big_tab
                            2    (a_column  varchar2(50))
                            3  /
                          
                          Table created.
                          
                          SCOTT@orcl_11gR2> insert all
                            2  into big_tab values ('4CJHZ')
                            3  into big_tab values ('J238H')
                            4  select * from dual
                            5  /
                          
                          2 rows created.
                          
                          SCOTT@orcl_11gR2> create materialized view big_tab_strings
                            2  as
                            3  select a_column,
                            4           substr (a_column, column_value) string
                            5  from   big_tab,
                            6           table
                            7             (cast
                            8             (multiset
                            9                (select level
                           10                 from   dual
                           11                 connect by level <= length (a_column))
                           12              as sys.odcivarchar2list))
                           13  /
                          
                          Materialized view created.
                          
                          SCOTT@orcl_11gR2> column a_column format a15
                          SCOTT@orcl_11gR2> column string   format a15
                          SCOTT@orcl_11gR2> select * from big_tab_strings
                            2  /
                          
                          A_COLUMN        STRING
                          --------------- ---------------
                          4CJHZ           4CJHZ
                          4CJHZ           CJHZ
                          4CJHZ           JHZ
                          4CJHZ           HZ
                          4CJHZ           Z
                          J238H           J238H
                          J238H           238H
                          J238H           38H
                          J238H           8H
                          J238H           H
                          
                          10 rows selected.
                          
                          SCOTT@orcl_11gR2> create index big_tab_strings_idx
                            2  on big_tab_strings (string)
                            3  /
                          
                          Index created.
                          
                          SCOTT@orcl_11gR2> set autotrace on explain
                          SCOTT@orcl_11gR2> select *
                            2  from   big_tab_strings
                            3  where  string like 'J%'
                            4  /
                          
                          A_COLUMN        STRING
                          --------------- ---------------
                          J238H           J238H
                          4CJHZ           JHZ
                          
                          2 rows selected.
                          
                          
                          Execution Plan
                          ----------------------------------------------------------
                          Plan hash value: 1291819909
                          
                          ------------------------------------------------------------------------------------------------------
                          | Id  | Operation                      | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
                          ------------------------------------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT               |                     |     2 |   258 |     2   (0)| 00:00:01 |
                          |   1 |  MAT_VIEW ACCESS BY INDEX ROWID| BIG_TAB_STRINGS     |     2 |   258 |     2   (0)| 00:00:01 |
                          |*  2 |   INDEX RANGE SCAN             | BIG_TAB_STRINGS_IDX |     2 |       |     1   (0)| 00:00:01 |
                          ------------------------------------------------------------------------------------------------------
                          
                          Predicate Information (identified by operation id):
                          ---------------------------------------------------
                          
                             2 - access("STRING" LIKE 'J%')
                                 filter("STRING" LIKE 'J%')
                          
                          Note
                          -----
                             - dynamic sampling used for this statement (level=2)
                          
                          SCOTT@orcl_11gR2>