This discussion is archived
1 2 Previous Next 25 Replies Latest reply: Jan 14, 2013 11:16 AM by kevinUCB RSS

Like (%) search performance issues.

Rohit Jadhav Newbie
Currently Being Moderated
Hi,

I have a table

create table XFACE_IC_NO_XREF
(
COD_CUST_ID NUMBER(10),
ORIGINAL_REQ_IC_NUMBER      VARCHAR2(90),
DERIVED_IC_NO      VARCHAR2(90),
NAM_CUST_FULL      VARCHAR2(762),
FLG_IC_TYP      CHAR(3),
CODE_ORG_PARENT      VARCHAR2(36),
FLG_IC_TYP_SPONSOR      CHAR(1),
EXTERNAL_IC_SPONSOR      VARCHAR2(60),
EXTERNAL_IC_SPONSOR_DERIVED      VARCHAR2(40),
FLG_CUST_TYP           CHAR(3),
COD_ORGANISATION           VARCHAR2(45),
CUST_VERF_STATUS           VARCHAR2(1)
);
create index IN_XFACE_IC_NO_XREF_1 on XFACE_IC_NO_XREF (NAM_CUST_FULL);

above table has 10million customer data. We have a feature of Customer Search based on first character user will input example

INSERT INTO TEMP_XFACE_IC_NO_XREF(COD_CUST_ID,
ORIGINAL_REQ_IC_NUMBER,
DERIVED_IC_NO,
NAM_CUST_FULL,
FLG_IC_TYP,
CODE_ORG_PARENT,
FLG_IC_TYP_SPONSOR,
EXTERNAL_IC_SPONSOR,
FLG_CUST_TYP,
COD_ORGANISATION,
EXTERNAL_IC_SPONSOR_DERIVED,
CUST_VERF_STATUS,
REF_CUST_PHONE)
SELECT COD_CUST_ID,
ORIGINAL_REQ_IC_NUMBER,
DERIVED_IC_NO,
NAM_CUST_FULL,
FLG_IC_TYP,
CODE_ORG_PARENT,
FLG_IC_TYP_SPONSOR,
EXTERNAL_IC_SPONSOR,
FLG_CUST_TYP,
COD_ORGANISATION,
EXTERNAL_IC_SPONSOR_DERIVED,
CUST_VERF_STATUS,
REF_CUST_PHONE
FROM XFACE_IC_NO_XREF E
WHERE E.NAM_CUST_FULL LIKE '[CHINESECHARACTER]%'

Output of this query may be 1lack records or 1record. I am facing a performance issue on this area when it comes to character search.
Request your help/Support to suggest for alternative if any. I have tried working on lexers but unable to find sufficient examples or information. DB Version i am using is Oracle 11G.
  • 1. Re: Like (%) search performance issues.
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    If it is just the first character, then you might try a function-based index, like below.

    -- create function-based index:
    CREATE INDEX in_xface_ic_no_xref_1
    ON xface_ic_no_xref (SUBSTR (nam_cust_full, 1, 1));

    -- gather current statistics:
    EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'XFACE_IC_NO_XREF')

    -- use index:
    INSERT ...
    SELECT ...
    FROM xface_ic_no_xref e
    WHERE SUBSTR (e.nam_cust_full, 1, 1) = '[CHINESECHARACTER]';
  • 2. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    Hi,

    you information is helpful, but it will limit me to 1character only. may be i have not detailed the requirement, let me put it again.

    User may send 1 or more than 1 character, based on which search will happen. e.g.

    if 1 character is passed then binding will be like nam_cust_full like '[1CHINESECHAR]%'

    if more than 1 character is passed then binding will be like nam_cust_full like '[CHINESECHAR(S)]%'

    Please let me know.
  • 3. Re: Like (%) search performance issues.
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    Try something like the following:

    begin
    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
    ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
    ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', 1);
    ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
    ctx_ddl.set_attribute('mywordlist', 'wildcard_maxterms', 50000) ;
    end;
    /
    CREATE INDEX in_xface_ic_no_xref_1
    ON xface_ic_no_xref (nam_cust_full)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS
    ('WORDLIST mywordlist
    STOPLIST CTXSYS.EMPTY_STOPLIST')
    /
    INSERT ...
    SELECT ...
    FROM xface_ic_no_xref
    WHERE CONTAINS (nam_cust_full, '[CHARACTER(S)]%') > 0
    /
  • 4. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    Hi,

    Thanks for the suggestions. i worked on same but still the response time is not good query execution takes 4.5secs for the first time and subsequent executions are faster in millisecs......

    1) i created dummy table from the original table I have, And created index you suggested

    SELECT count(*)
    FROM xface_ic_no_xref_tmp WHERE CONTAINS (nam_cust_full, '[CHARACTER(S)]%') > 0 ; -- returned 100000

    2) This has normal index on nam_cust_full on original table

    SELECT count(*)
    FROM xface_ic_no_xref WHERE nam_cust_full like'[CHARACTER(S)]%') ; -- returned 99950


    Result count is different for 1 and 2 query.....

    Edited by: 969426 on Nov 6, 2012 7:49 PM
  • 5. Re: Like (%) search performance issues.
    Roger Ford Expert
    Currently Being Moderated
    To find out the differences, run
    SELECT nam_cust_full
    FROM xface_ic_no_xref_tmp WHERE CONTAINS (nam_cust_full, '[CHARACTER(S)]%') > 0
    MINUS
    SELECT name_cust_full
    FROM xface_ic_no_xref WHERE nam_cust_full like'[CHARACTER(S)]%') ;
    First queries always tend to be slow when they fetch lots of values. How fast is a second query for a DIFFERENT start character? If it's still slow, the issue is reading in the postings list ($I) table. If it's much faster, the issue is DOCID to ROWID conversion, which uses the $R table. You can make sure the $R table is preloaded to avoid this.

    I don't believe you need SUBSTRING_INDEX on, by the way. It should be sufficient to have PREFIX_INDEX on, and that should save you significant index space and index creation time.
  • 6. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    Hi,

    thanx for the advice. i will try with prefixed index first else i will check about $ tables you mentioned. and yes for every new character search it takes time.
    please let me know.

    Thanx
    @RJ.
  • 7. Re: Like (%) search performance issues.
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    Make sure you are using a bind variable for your characters to search for, to avoid unnecessary hard parsing.

    VARIABLE search_string VARCHAR2(100)
    EXEC :search_string := '[CHARACTER(S)]%'
    SELECT ...
    FROM xface_ic_no_xref
    WHERE CONTAINS (nam_cust_full, :search_string) > 0
    /

    The above query will return rows where any word in the nam_cust_full begins with the search_string value, so that may be a higher count than using LIKE. If you want just those where the first word starts with the search_string value, then you will need to include something to designate the beginning of the string in the indexed data and search_string.
  • 8. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    hi,

    thanx for the suggestion but i already have a binding variable.

    receive this in input.
    var_l_cust_name := var_l_cust_name ||'%';


    Select * from xface_ic_no_xref where nam_cust_full like var_l_cust_name;

    again i can catch your word "hard parsing" this is something which is happening to each of my first DB calls.
    example case is above select. may be that is why my first call takes time and subsequent calls are faster. but again when i change a character for search it takes time.


    As u have said about hard parsing. this is also one of the issue i am facing. every time my java application server (Webogic) is restarted, first call (via web service) to any db procedure takes time and subsequent calls are faster. as what i understand from this is first time it does physical reads and then on it does logical reads (reads from cache) so response is faster but i cannot make this understand to end user. for him, he is not bothered if he is firing the transaction for 1st time or 2nd time, its just a application for him. if any suggestions on this please let me know. i wonder why Java Application Server(Welogic) restart impacts my first calls to DB, since caching is at DB level and not at Java App Server its just a data-source which helps me connects to Database.

    Edited by: 969426 on Nov 7, 2012 7:36 AM
  • 9. Re: Like (%) search performance issues.
    Roger Ford Expert
    Currently Being Moderated
    It's probably simply that there are 100,000 results, and it has to fetch all of them in order to do the COUNT(*)

    Try this:
    SELECT * from (
       SELECT /* FIRST_ROWS(20) */ rowid
       FROM xface_ic_no_xref_tmp WHERE CONTAINS (nam_cust_full, '[CHARACTER(S)]%') > 0 
    ) WHERE ROWNUM <= 20
    I'll bet that runs a whole lot faster. Then you need to figure out whether you can do something similar in your application, or whether you really need to fetch all the results.
  • 10. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    I do have paging bt as per requirment I also have to show total record counts n then basd on input we apply paging i.e user wil ask for page number 1 show 10records. In response I wil show total records count 10000 and 10records out of it
  • 11. Re: Like (%) search performance issues.
    Roger Ford Expert
    Currently Being Moderated
    If you need the count, do the query as above then use ctx_query.count_hits to get the total count.

    http://docs.oracle.com/cd/E14072_01/text.112/e10944/cqrypkg.htm#i996910

    If you do the "top N" query first, and the count_hits second, the count_hits will be extremely fast.
  • 12. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    Hi,

    Thanx for the response, i tried with your suggestion of having count hits. it has reduced the count time significantly but still there is difference in count org table minus count on dummy.

    difference i could see is : for context index : It is also picking up records where the character occurrence is at second position.

    example : my srch String is '[CHINESECHAR]%' - this must show records which has 1st occurrence (in entire string) of given search criteria. but now it is also picking up record where this char is on second location.
    i can see purpose of having '%' in end of character is unused.
  • 13. Re: Like (%) search performance issues.
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    As previously stated, if you want to search for only the first word/token in the column, then you need to have something in both the column and the search string to designate this. In the following simplified example, I have used 'start ' to designate the beginning of the string.
    SCOTT@orcl_11gR2> CREATE TABLE xface_ic_no_xref
      2    (nam_cust_full VARCHAR2(762))
      3  /
    
    Table created.
    
    SCOTT@orcl_11gR2> INSERT ALL
      2  INTO xface_ic_no_xref (nam_cust_full) VALUES ('CHINESECHAR')
      3  INTO xface_ic_no_xref (nam_cust_full) VALUES ('OTHERCHAR CHINESECHAR')
      4  SELECT * FROM DUAL
      5  /
    
    2 rows created.
    
    SCOTT@orcl_11gR2> begin
      2    ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
      3    ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
      4    ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', 1);
      5    ctx_ddl.set_attribute('mywordlist', 'wildcard_maxterms', 50000) ;
      6  end;
      7  /
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> CREATE OR REPLACE PROCEDURE test_proc
      2    (p_rowid IN           ROWID,
      3       p_clob     IN OUT NOCOPY CLOB)
      4  AS
      5  BEGIN
      6    SELECT 'start ' || nam_cust_full
      7    INTO   p_clob
      8    FROM   xface_ic_no_xref
      9    WHERE  ROWID = p_rowid;
     10  END test_proc;
     11  /
    
    Procedure created.
    
    SCOTT@orcl_11gR2> SHOW ERRORS
    No errors.
    SCOTT@orcl_11gR2> BEGIN
      2    CTX_DDL.CREATE_PREFERENCE ('test_ds', 'USER_DATASTORE');
      3    CTX_DDL.SET_ATTRIBUTE ('test_ds', 'PROCEDURE', 'test_proc');
      4  END;
      5  /
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> CREATE INDEX in_xface_ic_no_xref_1
      2  ON xface_ic_no_xref (nam_cust_full)
      3  INDEXTYPE IS CTXSYS.CONTEXT
      4  PARAMETERS
      5    ('DATASTORE  test_ds
      6        WORDLIST   mywordlist
      7        STOPLIST   CTXSYS.EMPTY_STOPLIST')
      8  /
    
    Index created.
    
    SCOTT@orcl_11gR2> VARIABLE var_l_cust_name VARCHAR2(100)
    SCOTT@orcl_11gR2> EXEC :var_l_cust_name := 'start ' || 'CHINESECHAR' || '%'
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> SELECT *
      2  FROM   (SELECT /* FIRST_ROWS(20) */ *
      3            FROM   xface_ic_no_xref
      4            WHERE  CONTAINS (nam_cust_full, :var_l_cust_name) > 0)
      5  WHERE  ROWNUM <= 20
      6  /
    
    NAM_CUST_FULL
    --------------------------------------------------------------------------------
    CHINESECHAR
    
    1 row selected.
  • 14. Re: Like (%) search performance issues.
    Rohit Jadhav Newbie
    Currently Being Moderated
    I tired this index same day you suggested the fix but some how index sync took 1full day and still it was not completed.
    Also index size is huge. request you views.
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points