4 Replies Latest reply on Feb 21, 2013 9:04 AM by Pierre Timmermans

    Performance issue with COUNT and contains

    Pierre Timmermans

      I have a table with 2 million fairly large XML documents and a context index on it. The application always does a count before executing a search, and sometimes the count is very slow (more than one minute). I have noticed that when the count returns a big number it is slow, while if there are no matches (or very few) it goes very fast. I find this a bit counter-intuitive, because in both cases the explain plan shows that a scan of the domain index is performed.

      I reproduced the problem with a small test case and executed the count with all the different traces enabled, and it shows that the difference is almost entirely in trace value 11, which is:

      TRACE_QRY_CON_TIME (11): Time spent in CONTAINS processing (drexrcontains/drexrstart/drexrfetch)

      Which does not give me a clue as to where the time is really spent!

      Here is a quick set-up that reproduces the problem and shows that trace value 11 accumulates the most time (the traces must first be enabled with ctx_output.add_trace):

      drop table quick;
      create table quick
      (ldate date default sysdate not null,
      id number not null,
      mytext varchar2(1024)
      );

      drop sequence s_quick;
      create sequence s_quick;

      create trigger quick_ins_trg
      before insert on quick
      for each row
      begin
        select s_quick.nextval into :new.id from dual;
      end;
      /

      insert into quick(mytext)
      select object_type||' '||owner||' '||object_name
      from dba_objects;
      commit;

      create index i_quick on quick(mytext) indextype is ctxsys.context;

      -- enable each trace you want to read before running the query, e.g.:
      exec ctx_output.add_trace(11);

      select count(*)
      from quick
      where contains(mytext,'SOS') > 0;

      select ctx_output.get_trace_value(1) from dual;
      select ctx_output.get_trace_value(2) from dual;
      select ctx_output.get_trace_value(3) from dual;
      select ctx_output.get_trace_value(4) from dual;
      select ctx_output.get_trace_value(5) from dual;
      select ctx_output.get_trace_value(6) from dual;
      select ctx_output.get_trace_value(7) from dual;
      select ctx_output.get_trace_value(8) from dual;
      select ctx_output.get_trace_value(9) from dual;
      select ctx_output.get_trace_value(10) from dual;
      select ctx_output.get_trace_value(11) from dual; --> returns 1330
      select ctx_output.get_trace_value(15) from dual;

      Now reset the trace and execute this query:

      select count(*)
      from quick
      where contains(mytext,'SYS') > 0;

      the trace value for 11 is 14824
      the trace value for 9 is 102233 (Number of bytes read from $I LOBs), while in the case of no match it was 0

      It is not clear to me why Oracle has to read the $I LOB, nor whether this is the cause of the problem on the big production table.

      I don't have many suggestions for the development team, apart perhaps from partitioning the table and the index on a date column, but they won't like forcing the user to specify a date range. I could also try to put the $I table in the keep buffer pool, but the $I table is very large on production.
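      For reference, pinning the token table in the KEEP pool would look roughly like the following, shown here against the small test index i_quick. This is only a sketch: it assumes the standard Oracle Text naming convention DR$&lt;index_name&gt;$I for the token table, and the KEEP pool must first be sized with the db_keep_cache_size parameter.

```sql
-- Sketch, assuming the $I table for index I_QUICK is named DR$I_QUICK$I
-- (standard Oracle Text naming) and that db_keep_cache_size is set.
alter table DR$I_QUICK$I storage (buffer_pool keep);
-- its $X index can be moved to the KEEP pool as well:
alter index DR$I_QUICK$X storage (buffer_pool keep);
```

      Note that most of the bytes counted by trace 9 come from the token_info BLOB in the $I table, so whether this helps depends on whether the LOB segment also stays cached.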

      Any suggestions or directions are most welcome.

      Regards, Pierre
        • 1. Re: Performance issue with COUNT and contains
          Roger Ford-Oracle
          Doing COUNT(*) requires the kernel to fetch all the rowids from the domain index and count them. The time taken is roughly proportional to the count, in most cases.

          If you have a simple CONTAINS query, without additional structured criteria, you can use ctx_query.count_hits, which should be much faster, especially if you use exact=>false (which means you may get an inflated count from deleted/updated rows). If you call it AFTER running a query with the same string in the CONTAINS clause, it is essentially free.
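          Applied to the i_quick test case above, that call could be sketched like this (the index and query string are just the ones from the test case):

```sql
-- Sketch: count matches via the Text index metadata instead of having
-- the kernel fetch and count every rowid.
declare
  nbr number;
begin
  -- exact => false is faster but may overcount rows that were deleted
  -- or updated since the last sync/optimize.
  nbr := ctx_query.count_hits(index_name => 'I_QUICK',
                              text_query => 'SYS',
                              exact      => false);
  dbms_output.put_line('hits: ' || nbr);
end;
/
```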
          • 2. Re: Performance issue with COUNT and contains
            Pierre Timmermans
            Thanks a lot for the info. One additional question about your last comment: in the application, they first do a count and then select the first 20 rows from the result (using the FIRST_ROWS(20) hint and a rownum predicate to limit the result). In this case, would it still be free to do the count with ctx_query.count_hits AFTER the fetch? Since they don't fetch the whole result set, will it make a difference?

            I am still facing a performance problem, but it is not reproducible, so maybe I am not looking at the root cause and the slow search is only a symptom of something else.

            The developers use ctx_query.count_hits when they can (when there is no other criteria on the base table), and they have put logging and timing in place so I can spot problem queries.

            In today's logs, I see that the worst query took 111 seconds; it had this predicate: '(HASPATH(/MainCategory[@logType="3"])) AND (HASPATH(//DigitalAssets[@available="true"])) AND HASPATH(/MainCategory/UserGroups/UserGroup[@id="4"]/Right[@id="4"])'

            If I simulate that in SQL*Plus on prod, I get a very good result, less than one second.

            set timi on
            var spredicate varchar2(1024);
            var nbr number;
            exec :spredicate := '(HASPATH(/MainCategory[@logType="3"])) AND (HASPATH(//DigitalAssets[@available="true"])) AND HASPATH(/MainCategory/UserGroups/UserGroup[@id="4"]/Right[@id="4"])';
            exec :nbr := ctx_query.count_hits(index_name=>'MAIN_DETAIL_LOG_XML',text_query=>:spredicate,exact=>true);
            print nbr;

            The nbr is relatively small (5694) and the timing excellent. Of course, since the query was already run before, I imagine all the data is in the cache.

            On a test system, I can't reproduce the problem either, even if I stop/start the DB first.

            So I am facing a problem that I can't reproduce, neither on the production system nor on a test server with a restore of the prod DB.

            Maybe I can try that? http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html

            Or is it possible that another operation, like a sync of the index, is blocking the query?

            Thanks again, Pierre
            • 3. Re: Performance issue with COUNT and contains
              Roger Ford-Oracle
              Yes, the count_hits will still be "free" even if you have only fetched the top 20 hits.

              The query that took 111 seconds - was that a query for the top 10 hits plus a count, or just the count, or what? If possible, can you post the full SQL for the query?

              A sync will never block a query, so that shouldn't be a problem.

              Pre-loading the tables into memory is certainly a good idea for general performance, but if your test system is fast even after bouncing the database, I'm not sure that will help greatly.

              One more thing, on the test system you should drop the file system buffer cache - it could be that although the data isn't in the SGA memory, it's been cached in the system memory by the OS. On linux you can do
              sync; echo 3 > /proc/sys/vm/drop_caches
              On Windows you probably have to reboot. Not sure about other OS's.
              • 4. Re: Performance issue with COUNT and contains
                Pierre Timmermans
                The query that took 111 seconds was the count; after the count, a second "top x rows" query was executed, and that one was very fast.

                Here is the query for the count

                exec :nbr := ctx_query.count_hits(index_name => 'MAIN_DETAIL_LOG_XML', text_query => '(HASPATH(/MainCategory[@logType="3"])) AND (HASPATH(//DigitalAssets[@available="true"])) AND HASPATH(/MainCategory/UserGroups/UserGroup[@id="4"]/Right[@id="4"])', exact => true);

                and the one for the search

                select *
                from (select /*+ FIRST_ROWS(20) */ Rownum Rij, t.*
                      from (select main_detail_logging_id,
                                   Decode(1,1, x.xml_data.getclobval(), :Policy || :mSearch) Xml_Highlight,
                                   score(1) Gewicht
                            from Main_Detail_Logging_XML x
                            where contains(x.xml_data, :mSearch, 1) > 0
                            and (1=1 or :p_Relatie_Id is null)
                            and (1=1 or :mDateStart is null or :mDateStop is null)
                            and (1=1 or :p_Relatie_Id is null)
                            and (1=1 or :p_Relatie_Id is null)
                            order by score(1) desc) t
                      where Rownum <= :p_Row_Einde) r
                where Rij >= :p_Row_Start
                and (1=1 or :p_Xml_Result_Type_Id is null)
                order by Rij;

                The mSearch parameter is the same as in count_hits; the others are null except for the row range.

                I think the query is not very realistic, because the user did not specify keywords, which they normally do.

                Now that I have flushed the cache at the Linux level with your trick, I can reproduce the problem on the test system, with a more normal query string.

                stop/start/flush cache

                set timi on
                var spredicate varchar2(140);
                exec :spredicate := '(ronaldo) AND HASPATH(/MainCategory/UserGroups/UserGroup[@id="21"]/Right[@id="4"])'

                select *
                from (select /*+ FIRST_ROWS(20) */ x.*
                      from main_detail_logging_xml x
                      where contains(x.xml_data,:spredicate,1) > 0
                      order by score(1) desc)
                where rownum < 21;

                --> it takes 55 seconds

                set timi on
                var spredicate varchar2(1024);
                var nbr number;
                exec :spredicate := '(ronaldo) AND HASPATH(/MainCategory/UserGroups/UserGroup[@id="21"]/Right[@id="4"])'
                exec :nbr := ctx_query.count_hits(index_name=>'MAIN_DETAIL_LOG_XML',text_query=>:spredicate,exact=>true);
                print nbr;
                --> it takes 2.84 seconds and returns 9581

                If I invert the sequence, so first the count and then the 20 first record, then

                count: 53 seconds and returns 9581

                sql: 5.31 seconds

                If I do the queries a second time, then I get very good response time, in the range of 1 second or less.

                From my tests, it does not make a difference whether I do the count first or the SQL query first.

                I have the feeling that the time is all spent scanning the context index, and that it is slow when the index is not in the cache. The overall size of the index is 13G, so it is theoretically possible to keep it in memory, but the problem is that we have another context index (and the volume is going to increase). Another point is that there was no DBA in this company before, and the developers allocated a 12G keep buffer cache and tried to put all the XML in it (they overused the buffer_pool keep storage clause for all the big XML tables).

                I am also wondering whether it would help to resolve the "non-text" part of the search (like the HASPATH on numeric attributes, in this case UserGroup@id=21 and Right@id=4) via normal indexes. I could tell the devs to extract those attributes and store them as columns in the base table, or try to map the XMLTYPE column to an object-relational table and use the relational table for that part of the query. Are there recommendations on that?
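                One possible shape for that idea, assuming the attributes are first extracted into ordinary columns and the database is 11g or later, is a composite domain index, which evaluates structured predicates inside the Text index itself. The column and index names below are hypothetical; this is a sketch, not a tested implementation:

```sql
-- Sketch (hypothetical columns): extract the attributes once, e.g. via
-- a trigger or XMLTABLE, into plain columns on the base table...
alter table main_detail_logging_xml add (usergroup_id number, right_id number);

-- ...then build a composite domain index (11g+) so the structured
-- predicates are evaluated inside the Text index:
create index i_log_xml on main_detail_logging_xml(xml_data)
  indextype is ctxsys.context
  filter by usergroup_id, right_id;

-- The query can then use relational predicates instead of HASPATH:
select /*+ FIRST_ROWS(20) */ *
from main_detail_logging_xml
where contains(xml_data, 'ronaldo', 1) > 0
  and usergroup_id = 21
  and right_id = 4
order by score(1) desc;
```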

                Another track I was investigating is to partition the base table on a date column (the date is in the XML, but I can extract it into the base table) and then try to execute the count in parallel on all partitions. It seems difficult, because ctx_query.count_hits on a partitioned index requires the partition name as a parameter, and I did not see a way to launch count_hits in parallel on all partitions. A big advantage of date partitioning is that we could tell the user to restrict his search to a date range if he wants good response time.
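                Even without true parallelism, the per-partition counts could at least be summed in a loop; making it genuinely parallel would need something like one scheduler job per partition. A serial sketch, assuming a hypothetical local partitioned index named I_LOG_XML:

```sql
-- Sketch: sum count_hits over all partitions of a (hypothetical) local
-- partitioned context index I_LOG_XML. This runs serially; for real
-- parallelism each iteration would have to be submitted as its own job.
declare
  total number := 0;
begin
  for p in (select partition_name
            from user_ind_partitions
            where index_name = 'I_LOG_XML')
  loop
    total := total + ctx_query.count_hits(
               index_name => 'I_LOG_XML',
               text_query => 'ronaldo',
               exact      => true,
               part_name  => p.partition_name);
  end loop;
  dbms_output.put_line('total hits: ' || total);
end;
/
```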

                Thanks a lot for your help; I am new to this technology and I really like it. If there are other suggestions or directions, I am of course much interested.

                Rgds, Pierre

                Edited by: Pierre Timmermans on Feb 21, 2013 12:57 AM

                Edited by: Pierre Timmermans on Feb 21, 2013 1:03 AM