    Oracle Text vs Apache Solr

      We are currently using Oracle Text to search XML documents stored in a CLOB column. We feel we have reached the limits of what the technology can do, and we are seriously considering abandoning it entirely in favor of Solr.

      Our findings closely match those of another group (http://blog.digicol.de/2009/01/13/moving-from-oracle-text-to-solrlucene/). Here are the highlights of their findings, all of which we also ran into:

      •Unstable query performance: In some installations, a few simple search terms would take up to fifty times longer than others, which caused us a lot of headaches and loss of reputation. That was an extreme case, but in general the performance of Oracle Text varied way too much. (And we invested a lot of time in finding ways to optimize it.)
      •Oracle Text is hard to scale: Since it’s integrated into the database, you have to scale that as a whole (via RAC), which is expensive and inflexible. There’s no way to split a large fulltext index across lots of servers.
      •It is missing support for total document count and faceted search; if you’d like to fetch the first ten matching documents only, you have to run a second search to get the total count.
      •Oracle support often wasn’t very helpful; Text doesn’t seem to be a high priority product for them and it’s hard to find someone at Oracle who knows Text very well.
      •While the database integration is a nice feature on one hand, on the other hand it makes it hard to customize what’s going into the fulltext index, and the fulltext index synchronization can slow down batch jobs.
      •Today, every customer wants a Google-like query syntax, which isn’t provided by Oracle.

      The big showstopper for us is the inability to provide faceted search over the result sets returned by Oracle Text. Am I missing something; is there an easy way to do this? Is there a compelling reason to soldier on with Oracle Text?
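      To make the requirement concrete: faceted search means returning, alongside the page of hits, the total hit count and per-value counts for chosen fields across the whole result set. Below is a minimal Python sketch of that computation, assuming each matching document has already been reduced to a dict of field values. The field names `author` and `year` and the sample data are hypothetical, and this is not how Oracle Text or Solr compute facets internally.

```python
from collections import Counter
from typing import Iterable, Mapping


def facet_counts(
    hits: Iterable[Mapping[str, object]], facet_fields: list[str]
) -> tuple[int, dict[str, Counter]]:
    """Return the total hit count and, per facet field, how many hits carry each value."""
    counts: dict[str, Counter] = {field: Counter() for field in facet_fields}
    total = 0
    for doc in hits:
        total += 1
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:  # documents missing the field are not counted for it
                counts[field][value] += 1
    return total, counts


# Hypothetical result set: every document matching the full-text query.
hits = [
    {"id": 1, "author": "smith", "year": 2009},
    {"id": 2, "author": "jones", "year": 2009},
    {"id": 3, "author": "smith", "year": 2010},
]
total, facets = facet_counts(hits, ["author", "year"])
print(total)                           # 3
print(facets["author"].most_common())  # [('smith', 2), ('jones', 1)]
```

      One pass over the full result set yields both the total count (the third complaint in the list above) and the facet counts; the first page of ten hits would be sliced from the same results.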
          Faceted search is coming soon. Most of the code is in place but not enabled in 11.2; it is going through full QA for general release in a future version.

          You say "if you’d like to fetch the first ten matching documents only, you have to run a second search to get the total count" - that's true if you have a mixed (text plus non-text) query.

          If it's a simple text query, you can call ctx_query.count_hits immediately AFTER fetching the first ten hits, and the count will be returned effectively for free.
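          A minimal sketch of the fetch-then-count pattern described above, assuming the python-oracledb driver (any client would do). The table `docs`, its column `xml_doc`, the index name, and the helper name are all hypothetical. It runs a pure text query, fetches the first page, and only then calls `ctx_query.count_hits` with the same index name and query string, leaving the optional EXACT argument at its default. Per the reply above, this applies to simple text queries; a mixed text plus non-text query still needs a second search for the count.

```python
from typing import Any, Sequence


def first_page_with_count(
    connection: Any, index_name: str, text_query: str, page_size: int = 10
) -> tuple[list[Sequence[Any]], int]:
    """Fetch the first page of hits for a pure text query, then ask Oracle Text for the hit count.

    `connection` is assumed to be a python-oracledb (DB-API) connection.
    The table `docs` and column `xml_doc` are hypothetical names.
    """
    with connection.cursor() as query_cursor, connection.cursor() as call_cursor:
        # Pure text query: CONTAINS is the only predicate, ordered by relevance.
        query_cursor.execute(
            "SELECT id FROM docs WHERE CONTAINS(xml_doc, :q, 1) > 0 ORDER BY SCORE(1) DESC",
            {"q": text_query},
        )
        page = query_cursor.fetchmany(page_size)

        # Only AFTER the first page is fetched, ask for the count with the same
        # index and query string; the EXACT argument keeps its default (TRUE).
        total = call_cursor.callfunc("ctx_query.count_hits", int, [index_name, text_query])
    return page, int(total)
```

          Usage would look like `page, total = first_page_with_count(conn, "DOCS_IDX", "oracle AND solr")`, where `conn` comes from `oracledb.connect(...)` and `DOCS_IDX` is a hypothetical CONTEXT index name.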
            Thanks, Roger, this is good to know. Is there a roadmap of upcoming or planned features?

            Please keep in mind that I am not bashing Oracle Text; in fact, I would much prefer to stay on it, as we have professional support and it is integrated into the database. However, it just feels like there is a lot more momentum, community support, and features available with Solr, one example being easy hooks to provide REST web services right out of the box, and another the recent addition of geospatial support. My honest question is: what does Oracle Text do better than Solr?
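            For comparison with the total-count and faceting complaints, here is a minimal sketch of how Solr's HTTP interface returns both in one request: `numFound` carries the total hit count while `rows=10` limits the page, and `facet=true` with one `facet.field` per field adds per-value counts. The host, the core name `docs`, and the field names are hypothetical. The parser assumes Solr's default JSON layout, where each `facet_fields` entry is a flat [value, count, value, count, ...] list.

```python
import json
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str,
                    facet_fields: list[str], rows: int = 10) -> str:
    """Build a Solr /select URL asking for one page of hits plus facet counts."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # facet.field is a repeatable parameter: one entry per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def parse_select_response(body: str) -> tuple[int, list, dict[str, dict[str, int]]]:
    """Extract the total hit count, the page of docs, and facet counts from a JSON response."""
    data = json.loads(body)
    total = data["response"]["numFound"]  # total matches, independent of rows
    docs = data["response"]["docs"]        # only the requested page
    facets: dict[str, dict[str, int]] = {}
    for field, flat in data.get("facet_counts", {}).get("facet_fields", {}).items():
        # Default flat layout: [value1, count1, value2, count2, ...]
        facets[field] = dict(zip(flat[0::2], flat[1::2]))
    return total, docs, facets
```

            The response body could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`; keeping URL building and parsing separate lets the parsing be checked against a saved response without a running Solr instance.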

            For our decision, we are fast approaching the point where it is not enough for Oracle Text merely to match these features. It seems to be playing catch-up already; it must go above and beyond to justify its continued use and cost. Am I being unfair in this assessment? To pick on faceted search again, it is not good enough to say that it will be available sometime in the future: exact release dates, feature lists, product roadmaps, and documentation are needed so that developers can be productive and hit the ground running.
              Any other thoughts on the advantages of Oracle Text vs Solr? This thread has been up for almost a week; is there a board with higher traffic, or a more appropriate place to pose this question? Call our Oracle rep, I guess?

              I did find an interesting document on Oracle's site that compares an Oracle-embedded version of Lucene with Solr and some other solutions, but it does not include Oracle Text.


              Another more generic article of interest was:
                No one else from Oracle has anything to say in support of their product vs competitors? Is there another forum I should be looking at?

                We have begun a POC with Solr, importing data from Oracle using DataImportHandler.
                  I believe Roger Ford is the product manager for Oracle Text, so he would be the best authority on the subject. About all the rest of us on this or any other forum could tell you is that we use it and like it. We just haven't said anything because you already got the best information from Roger.
                    That's correct. And I really prefer to confine my comments to technical questions here - I don't think of this forum as somewhere to discuss the merits of one product versus another, especially as I know very little about Solr. If you'd like to discuss this offline you should have little difficulty finding or guessing my email address - and I'd be interested in any feedback you have from your Solr POC.
                      Text search is working fine for us on 11g R2. We have 200 TB of data (5 databases of 40 TB each), with more than 500 million records with LOBs coming in every 24 hours.
                      We are using 11g R2 compression and de-duplication on LOBs, and both are working fine for us. We can retrieve almost 25,000 records with a CONTAINS clause (CONTEXT index) in 8 minutes from a 40 TB database; sometimes we get results in 5 minutes, depending on the text being searched. To achieve this we tuned the storage, OS, network, database, and SQL, and we also had to tune Oracle Text for both loading (SYNC) and searching. The 5 minutes is the total time it takes for the output to appear in the end user's web browser.

                        I'm keen to find out what you decided in the end: Solr or Oracle Text?

                        What payload and system configuration did you use for the benchmark tests?
                          Hi PT Expert,

                          Have you used only the CONTAINS operator over text, or have you also used XML DB features?


                            These guys have built something relevant to this post. I haven't tested their solution yet; please let me know if someone has.