7 Replies Latest reply: Dec 17, 2012 2:39 AM by Barbara Boehmer RSS

    CatSearch taking more time than full table scan

    979937
      Hi
      I have a table which has close to 140 million records. I had been exploring the option of using oracle text for search. So , I created an index(ctxcat) on the column Name by the following query.

      begin
           ctx_ddl.create_preference('FT_WL', 'BASIC_WORDLIST');
           ctx_ddl.set_attribute ('FT_WL', 'prefix-index','TRUE');
      end;
      /


      create index history_namex on history(name) indextype is ctxsys.ctxcat parameters ('WORDLIST FT_WL');

      But when I executed the following query , I have found out that catsearch is taking more time. The queries and thier stats are :-

      1. select * from history where catsearch(name, 'Jake%', null) > 0 and rownum < 200;

      Elapsed : 00 : 00 : 00.13

      Statistics :
      112 recursive calls
      0 db block gets
      413 consistent gets
      28 physical reads
      0 redo size
      33168 bytes sent via SQL*Net to client
      663 bytes receuved via SQL*Net from client
      15 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      199 rows processed

      2. select * from history where name like 'Jake%' and rownum < 200;

      Elapsed : 00 : 00 : 00.05

      Statistics :
      1 recursive calls
      0 db block gets
      220 consistent gets
      383 physical reads
      0 redo size
      26148 bytes sent via SQL*Net to client
      663 bytes receuved via SQL*Net from client
      15 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      199 rows processed




      Can anyone explain why this is happening?


      PS : there is no conventional index on the name column.

      Edited by: 976934 on Dec 14, 2012 3:32 AM
        • 1. Re: CatSearch taking more time than full table scan
          Roger Ford-Oracle
          Oracle Text's ctxcat is a word-based index. It's designed to find words within text. You're looking for fields which begin with a particular string, which is a much simpler problem.

          I presume that there's a conventional (BTREE) index on the name column?

          So when you search for LIKE 'Jake%' it can look that up in the conventional index and get an instant match.

          In contrast, the ctxcat index has to
          - search for all the words in the token list that begin with "jake"
          - fetch the posting list for each of these words
          - do a bitmap-merge on all the postings
          - translate the resulting DOCID list to ROWIDs
          - feed those ROWIDS back the kernel which will then fetch the select list for the first 200 hits

          If you have a longer text field and so a search for LIKE "%Jake%' you'll probably find it's much slower than an equivalent CTXCAT (or CONTEXT) index search.
          • 2. Re: CatSearch taking more time than full table scan
            979937
            @Roger
            Thanks for the instant reply.

            But there is no index on the name column.

            That's what made me wonder why catsearch was taking so much time.
            • 3. Re: CatSearch taking more time than full table scan
              Roger Ford-Oracle
              I'm surprised that it's so fast, then. There's no way it's doing a full-table scan from disk on 140 million records in 0.05 seconds, so either
              - the query is being satisfied from data in the SGA (you might expect this if you've previously run the same query), or
              - "Jake" is a common name, and it doesn't have to look very far through the table to get 200 results.

              What happens if you remove the "and rownum < 200" and do "select count(*)" instead of "select *"?
              • 4. Re: CatSearch taking more time than full table scan
                Barbara Boehmer
                I just thought I would point out that, with catsearch the wildcard is the asterisk (*), not the percent sign (%) as with contains and like, so the two searches are not equivalent, even if there is only one word in the column.
                • 5. Re: CatSearch taking more time than full table scan
                  979937
                  @Barbara
                  I used the asterisk(*) instead of percent sign(%) , and the query took lesser response time. Can you please explain what is the difference between using these two signs?

                  Using the traceonly feature, I could see that the only differences between the two queries are that :
                  1. Number of recursive calls(12 incase of asteriks and 112 in case of percent sign)
                  2. Number of consistent gets (160 in case of asterisk and 413 in case of percent sign)

                  Edited by: 976934 on Dec 16, 2012 8:23 PM
                  • 6. Re: CatSearch taking more time than full table scan
                    979937
                    @Roger
                    I have removed the rownum < 200 and ran a count(*) query.
                    The query using Catsearch was incredibly faster .
                    Normal Query took around 3 min 10s.
                    And the query using catsearch took colse to 5 seconds
                    • 7. Re: CatSearch taking more time than full table scan
                      Barbara Boehmer
                      The asterisk (*) is simply the correct syntax for a wildcard using catsearch. If you use % instead, then you will not get the same results. Please see the section of the online documentation below, that shows that the asterisk is the wildcard for catsearch.

                      http://docs.oracle.com/cd/E11882_01/text.112/e24436/csql.htm#CHDJBGHE

                      Additionally, if you want to limit the rows, then you need to get the matching results in an inner sub-query, then use the rownum in an outer query. The way that you were doing it, it first limits the rows to the first 200, then checks which of those meet the criteria, instead of the other way around. So, the correct syntax should be the following, which should also be the most efficient.
                      select * from   
                             (select * from history 
                              where  catsearch (name, 'Jake*', null) > 0)
                      where  rownum < 200;