This discussion is archived
7 Replies Latest reply: Dec 17, 2012 12:39 AM by Barbara Boehmer RSS

CatSearch taking more time than full table scan

979937 Newbie
Currently Being Moderated
Hi
I have a table which has close to 140 million records. I had been exploring the option of using oracle text for search. So , I created an index(ctxcat) on the column Name by the following query.

begin
     ctx_ddl.create_preference('FT_WL', 'BASIC_WORDLIST');
     ctx_ddl.set_attribute ('FT_WL', 'prefix-index','TRUE');
end;
/


create index history_namex on history(name) indextype is ctxsys.ctxcat parameters ('WORDLIST FT_WL');

But when I executed the following query , I have found out that catsearch is taking more time. The queries and thier stats are :-

1. select * from history where catsearch(name, 'Jake%', null) > 0 and rownum < 200;

Elapsed : 00 : 00 : 00.13

Statistics :
112 recursive calls
0 db block gets
413 consistent gets
28 physical reads
0 redo size
33168 bytes sent via SQL*Net to client
663 bytes receuved via SQL*Net from client
15 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
199 rows processed

2. select * from history where name like 'Jake%' and rownum < 200;

Elapsed : 00 : 00 : 00.05

Statistics :
1 recursive calls
0 db block gets
220 consistent gets
383 physical reads
0 redo size
26148 bytes sent via SQL*Net to client
663 bytes receuved via SQL*Net from client
15 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
199 rows processed




Can anyone explain why this is happening?


PS : there is no conventional index on the name column.

Edited by: 976934 on Dec 14, 2012 3:32 AM
  • 1. Re: CatSearch taking more time than full table scan
    Roger Ford Expert
    Currently Being Moderated
    Oracle Text's ctxcat is a word-based index. It's designed to find words within text. You're looking for fields which begin with a particular string, which is a much simpler problem.

    I presume that there's a conventional (BTREE) index on the name column?

    So when you search for LIKE 'Jake%' it can look that up in the conventional index and get an instant match.

    In contrast, the ctxcat index has to
    - search for all the words in the token list that begin with "jake"
    - fetch the posting list for each of these words
    - do a bitmap-merge on all the postings
    - translate the resulting DOCID list to ROWIDs
    - feed those ROWIDS back the kernel which will then fetch the select list for the first 200 hits

    If you have a longer text field and so a search for LIKE "%Jake%' you'll probably find it's much slower than an equivalent CTXCAT (or CONTEXT) index search.
  • 2. Re: CatSearch taking more time than full table scan
    979937 Newbie
    Currently Being Moderated
    @Roger
    Thanks for the instant reply.

    But there is no index on the name column.

    That's what made me wonder why catsearch was taking so much time.
  • 3. Re: CatSearch taking more time than full table scan
    Roger Ford Expert
    Currently Being Moderated
    I'm surprised that it's so fast, then. There's no way it's doing a full-table scan from disk on 140 million records in 0.05 seconds, so either
    - the query is being satisfied from data in the SGA (you might expect this if you've previously run the same query), or
    - "Jake" is a common name, and it doesn't have to look very far through the table to get 200 results.

    What happens if you remove the "and rownum < 200" and do "select count(*)" instead of "select *"?
  • 4. Re: CatSearch taking more time than full table scan
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    I just thought I would point out that, with catsearch the wildcard is the asterisk (*), not the percent sign (%) as with contains and like, so the two searches are not equivalent, even if there is only one word in the column.
  • 5. Re: CatSearch taking more time than full table scan
    979937 Newbie
    Currently Being Moderated
    @Barbara
    I used the asterisk(*) instead of percent sign(%) , and the query took lesser response time. Can you please explain what is the difference between using these two signs?

    Using the traceonly feature, I could see that the only differences between the two queries are that :
    1. Number of recursive calls(12 incase of asteriks and 112 in case of percent sign)
    2. Number of consistent gets (160 in case of asterisk and 413 in case of percent sign)

    Edited by: 976934 on Dec 16, 2012 8:23 PM
  • 6. Re: CatSearch taking more time than full table scan
    979937 Newbie
    Currently Being Moderated
    @Roger
    I have removed the rownum < 200 and ran a count(*) query.
    The query using Catsearch was incredibly faster .
    Normal Query took around 3 min 10s.
    And the query using catsearch took colse to 5 seconds
  • 7. Re: CatSearch taking more time than full table scan
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    The asterisk (*) is simply the correct syntax for a wildcard using catsearch. If you use % instead, then you will not get the same results. Please see the section of the online documentation below, that shows that the asterisk is the wildcard for catsearch.

    http://docs.oracle.com/cd/E11882_01/text.112/e24436/csql.htm#CHDJBGHE

    Additionally, if you want to limit the rows, then you need to get the matching results in an inner sub-query, then use the rownum in an outer query. The way that you were doing it, it first limits the rows to the first 200, then checks which of those meet the criteria, instead of the other way around. So, the correct syntax should be the following, which should also be the most efficient.
    select * from   
           (select * from history 
            where  catsearch (name, 'Jake*', null) > 0)
    where  rownum < 200;

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points