Oracle Text's ctxcat is a word-based index. It's designed to find words within text. You're looking for fields which begin with a particular string, which is a much simpler problem.
I presume that there's a conventional (BTREE) index on the name column?
So when you search for LIKE 'Jake%' it can look that up in the conventional index and get an instant match.
In contrast, the ctxcat index has to
- search for all the words in the token list that begin with "jake"
- fetch the posting list for each of these words
- do a bitmap-merge on all the postings
- translate the resulting DOCID list to ROWIDs
- feed those ROWIDS back the kernel which will then fetch the select list for the first 200 hits
If you have a longer text field and so a search for LIKE "%Jake%' you'll probably find it's much slower than an equivalent CTXCAT (or CONTEXT) index search.
I'm surprised that it's so fast, then. There's no way it's doing a full-table scan from disk on 140 million records in 0.05 seconds, so either
- the query is being satisfied from data in the SGA (you might expect this if you've previously run the same query), or
- "Jake" is a common name, and it doesn't have to look very far through the table to get 200 results.
What happens if you remove the "and rownum < 200" and do "select count(*)" instead of "select *"?
I just thought I would point out that, with catsearch the wildcard is the asterisk (*), not the percent sign (%) as with contains and like, so the two searches are not equivalent, even if there is only one word in the column.
I used the asterisk(*) instead of percent sign(%) , and the query took lesser response time. Can you please explain what is the difference between using these two signs?
Using the traceonly feature, I could see that the only differences between the two queries are that :
1. Number of recursive calls(12 incase of asteriks and 112 in case of percent sign)
2. Number of consistent gets (160 in case of asterisk and 413 in case of percent sign)
I have removed the rownum < 200 and ran a count(*) query.
The query using Catsearch was incredibly faster .
Normal Query took around 3 min 10s.
And the query using catsearch took colse to 5 seconds
The asterisk (*) is simply the correct syntax for a wildcard using catsearch. If you use % instead, then you will not get the same results. Please see the section of the online documentation below, that shows that the asterisk is the wildcard for catsearch.
Additionally, if you want to limit the rows, then you need to get the matching results in an inner sub-query, then use the rownum in an outer query. The way that you were doing it, it first limits the rows to the first 200, then checks which of those meet the criteria, instead of the other way around. So, the correct syntax should be the following, which should also be the most efficient.
select * from
(select * from history
where catsearch (name, 'Jake*', null) > 0)
where rownum < 200;