You can accomplish some faceted navigation with SDATA columns/sections and the ctx_query.result_set API (see Ch. 11 of the Application Developer's Guide at http://docs.oracle.com/cd/E16655_01/text.121/e17748/resultset.htm#CCAPP9537). The crushing limitation is that SDATA only works for single-value entities, so you can 'get your feet wet', but you won't be able to compare it directly with Lucene-based solutions.
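To make the result set idea a bit more concrete, here is a rough client-side sketch in Python. It only builds the XML descriptor you would pass to ctx_query.result_set and reads facet counts back out of the XML that comes back; it does not talk to the database. The element names (ctx_result_set_descriptor, hitlist, group, groups) follow my reading of the chapter linked above, so check them against your version. The SDATA names "category" and "author" and the sample result are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_descriptor(facet_columns, page_size=10):
    """Build a result set descriptor asking for a hit list plus group counts.

    facet_columns are SDATA section names (hypothetical here).
    """
    root = ET.Element("ctx_result_set_descriptor")
    ET.SubElement(root, "count")  # total number of hits
    hitlist = ET.SubElement(
        root,
        "hitlist",
        start_hit_num="1",
        end_hit_num=str(page_size),
        order="score desc",
    )
    ET.SubElement(hitlist, "score")
    ET.SubElement(hitlist, "rowid")
    for column in facet_columns:
        # One <group> per SDATA section; <count/> requests per-value counts.
        group = ET.SubElement(root, "group", sdata=column)
        ET.SubElement(group, "count")
    return ET.tostring(root, encoding="unicode")


def parse_facets(result_xml):
    """Turn the <groups> parts of a result set into {section: {value: count}}."""
    root = ET.fromstring(result_xml)
    facets = {}
    for groups in root.iter("groups"):
        facets[groups.get("sdata")] = {
            group.get("value"): int(group.findtext("count"))
            for group in groups.findall("group")
        }
    return facets


# Hypothetical result document, shaped the way I understand the docs to describe it.
SAMPLE_RESULT = """
<ctx_result_set>
  <count>4</count>
  <groups sdata="category">
    <group value="books"><count>3</count></group>
    <group value="music"><count>1</count></group>
  </groups>
</ctx_result_set>
"""
```

Calling build_descriptor(["category", "author"]) gives you the descriptor string, and parse_facets(SAMPLE_RESULT) gives you the counts per facet value. Because each row carries exactly one value per SDATA section, each row lands in exactly one group, which is the single-value limitation in practice.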
The 'next iteration' is the addition of MVDATA and related functionality to support multivalue structured data. Looks like it was removed after the last 12c beta, which means either functionality, performance, or documentation wasn't quite ready. If you're really interested in testing it, I would hope that the right person at Oracle would see this and contact you; if not, a support ticket might get the ball rolling.
Thanks a lot for the pointers. It is likely to be some time before it's feasible to update to 12c in any event but good to know about the group counts available via SDATA in Oracle 11 at least. It's unfortunate the MVDATA enhancements missed 12c though. We'll have to do some R&D on our side to see if we can live with the SDATA single-value limitation or whether there's a feasible workaround.
The other area where we have some concerns is scalability. To scale search (using Oracle Text) we have to scale the whole database. We're not large enough to be using RAC at this point, and it's unlikely to be economically feasible for us, so this is something we'll be looking at closely as we evaluate Oracle Text against the Lucene-based competitors, which can be scaled much more cheaply.
You should definitely talk to the product team.
On the potentially good side, several of the 12c text features are planned to be backported to 22.214.171.124; I don't know whether faceted navigation is part of that plan, but it wouldn't surprise me.
As for scalability, I don't think there's a situation in which Oracle Text is going to beat Lucene, but a well-engineered text configuration on modern hardware can meet what I would call 'standard' user needs. If you are based on Oracle for your overall solution, then it's worth discussing the basic parameters of what a successful text application would need. RAM and disk speed are key factors -- RAM perhaps more so than anything else. Take a look at the Oracle Database Appliance as a 'reference implementation' of the type of hardware you need to consider. RAC wasn't a real useful option for us -- our problems were users making 'deep' queries rather than lots of concurrency.
Thanks for your input.
As for RAC not helping, I guess that implies that Oracle's search engine indexes can't be sharded so that the load for a single query is split across machines? ElasticSearch definitely allows for this, though I haven't tested how much it improves performance in practice. I assume concurrent queries are more readily scaled in all of these products.
We will reach out to our Oracle contacts as we dig into this more.
Oracle and Lucene offer fundamentally different approaches.
You wouldn't typically do sharding of the index in an Oracle solution; Oracle offers other techniques to satisfy performance requirements.
RAC is a potential part of a solution, but I've seen effective implementations that do not require it.
How many documents will you index, and how fast is an 'acceptable' response time?