This discussion is archived
3 Replies Latest reply: Feb 12, 2013 7:21 AM by Roger Ford RSS

Oracle Text Indexing performance in Unicode database

990511 Newbie
Currently Being Moderated
Forum folks,

I'm looking for overall performance thoughts in Text Indexing within a Unicode database. Part of our internal testing suites includes searching on values using contains filters over indexed binary and text documents. We've architected these tests such that they could be run in a suite or on their own, thus, the data is loaded at the beginning of each test and then the text indexes are created and populated prior to running any of the actual testing.

We have the same tests running on non-unicode instances of Oracle 11gR2 just fine, but when we run them against a Unicode instance, we are almost always seeing timing issues where the indexes haven't finished populating, thus our tests are saying we've only found n number of hits when we are expecting n+ 50 or in some cases n + 150 records to be returned.

We are just looking for some general information in regards to text indexing performance in a unicode database. Will we need to add sleep time to the testing to allow for the indexes to populate? How much time? We would rather not get into having to create different tests for unicode vs non-unicode, but perhaps that is necessary.

Any insight you could provide would be most appreciated.

Thanks in advance,
Dan
  • 1. Re: Oracle Text Indexing performance in Unicode database
    Roger Ford Expert
    Currently Being Moderated
    When you talk about Unicode, do you mean AL32UTF8?

    Is the data the same in both cases, or are you indexing simple 7-bit ascii data in the one database, and foreign text (maybe Chinese?) in the UTF8 database?

    With the same data, there should be virtually no difference in performance due to the AL32UTF8 database character set.

    I'm not sure I understand your testing methodology. Do you run ( load-data, index-data, run-queries ) sequentially? If so, there should be no way that the indexes can be half-created. If not, don't you have some check to see if the index creation has finished before running the query test? Were you just lucky with the "non-unicode" tests that the indexing just happened to have always finished by the time you ran the queries?
  • 2. Re: Oracle Text Indexing performance in Unicode database
    990511 Newbie
    Currently Being Moderated
    Roger,

    Thanks much for your quick reply...

    When you talk about Unicode, do you mean AL32UTF8?
    --> Yes, this is the Unicode charset we are using.

    Is the data the same in both cases, or are you indexing simple 7-bit ascii data in the one database, and foreign text (maybe Chinese?) in the UTF8 database?
    With the same data, there should be virtually no difference in performance due to the AL32UTF8 database character set.

    --> We have a data generation tool we utilize. For non-unicode data, we generate using all 256 characters in the ISO-8859-1 set. With our Unicode data for clobs, we generate using only the first 1,000 characters of UTF8 by setting up an array of code points...0 - 1000. For Blobs, we have sets of sample word documents and pdfs that are inserted, then indexed.

    I'm not sure I understand your testing methodology. Do you run ( load-data, index-data, run-queries ) sequentially?
    --> That is correct. We utilize the ctx_ddl package to populate the pending table and then to sync the index....The following is an example of the ddl we generate to create and populate the index:
    create index "DBMEARSPARK_ORA80"."RESRESUMEDOC" on "DBMEARSPARK_ORA80"."RESUME" ("RESUMEDOC") indextype is CTXSYS.CONTEXT parameters(' nopopulate sync (every "SYSTIMESTAMP + INTERVAL ''30'' MINUTE" PARALLEL 2) filter ctxsys.auto_filter ') PARALLEL 2;
    execute ctx_ddl.populate_pending('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null);
    execute ctx_ddl.sync_index('"DBMEARSPARK_ORA80"."RESRESUMEDOC"',null,null,2);

    If so, there should be no way that the indexes can be half-created. If not, don't you have some check to see if the index creation has finished before running the query test?
    --> Excellent question....is there such a check? I have not found a way to do that yet...

    Were you just lucky with the "non-unicode" tests that the indexing just happened to have always finished by the time you ran the queries?
    --> This is quite possible as well. If there is a check to see if the index is ready, then we could add that into our infrastructure.

    --> Thanks, again, for responding so quickly.

    Edited by: djulson on Feb 12, 2013 7:13 AM
  • 3. Re: Oracle Text Indexing performance in Unicode database
    Roger Ford Expert
    Currently Being Moderated
    If you're using AUTO_FILTER to process artificially-generated random text, that's probably the problem.

    It's going to be looking at those random characters and trying to figure out what file format it is.

    Try using NULL_FILTER instead.

    As for the check as to whether the index has finished, why not follow your sync_index with something like
    INSERT INTO status_table VALUES ('index complete')

    then your query application needs to just loop until it finds "index complete" in the status_table.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points