9 Replies Latest reply: Aug 11, 2014 5:22 PM by kevinUCB

    pre-loading Oracle text in memory with Oracle 12c

    Pierre Timmermans

There is a white paper from Roger Ford that explains how to load an Oracle Text index into memory: http://www.oracle.com/technetwork/database/enterprise-edition/mem-load-082296.html

       

In our application (Oracle 12c) we are indexing a big XML field (stored as XMLType in a SecureFile) with the PATH_SECTION_GROUP. If I don't load the $I table (DR$...$I) into memory using the technique explained in the white paper, I cannot get decent performance, and especially not predictable performance: it looks like if the blocks of the TOKEN_INFO column are not in memory, performance can fall sharply.
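
For reference, the core of that technique is to put the $I table (and its $X index) in the KEEP pool and then touch every block once so that it is really read into the cache. A minimal sketch, assuming an index named I_TEST (the white paper contains the complete procedures):

alter table DR$I_TEST$I storage (buffer_pool keep);
alter index DR$I_TEST$X storage (buffer_pool keep);
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR$I_TEST$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR$I_TEST$I ITAB;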

       

But after migrating to Oracle 12c I hit a different problem, which I can reproduce: when I create the index it is relatively small (as seen with ctx_report.index_size), and by applying the technique from the white paper I can pin the DR$...$I table into memory. But as soon as I run ctx_ddl.optimize_index('Index','REBUILD'), the size becomes much bigger and I can no longer pin the index in memory. I am not sure whether it is a bug or not.

       

What I found as a work-around is to build the index with the following storage options:

      ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');

      ctx_ddl.set_attribute ('TEST_STO', 'BIG_IO', 'YES' );

      ctx_ddl.set_attribute ('TEST_STO', 'SEPARATE_OFFSETS', 'NO' );

       

so that the token_info column is stored in a SecureFile. Then I can change the storage of that column to put it in the keep buffer cache, and write a procedure that reads the LOB so that it gets loaded into the keep cache. The size of the LOB column is more or less the same as when creating the index without the BIG_IO option, but it remains constant even after a ctx_ddl.optimize_index. The procedure that reads the LOB and loads it into the cache is very similar to the loadAllDollarR procedure from the white paper.

       

Because of the SDATA section there is a new DR$ table (the $S table) and an IOT on top of it. This is not covered in the white paper (which was written for Oracle 10g). In my case this $S table is heavily used, and the IOT as well, but putting them in the keep cache is not as important as the token_info column of the $I table. A final note: SEPARATE_OFFSETS = 'YES' was very bad in my case; the combined size of the two columns is much bigger than having only the TOKEN_INFO column, and both columns are read.
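
If you do want to cache them as well, the same keep-pool pattern applies. A hedged sketch, assuming an index named I_TEST (the exact names of the $S objects may differ by version):

alter table DR$I_TEST$S storage (buffer_pool keep);
select /*+ FULL(STAB) CACHE(STAB) */ count(*) from DR$I_TEST$S STAB;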

       

       

Here is an example of how to reproduce the problem of the size increasing after ctx_ddl.optimize_index.

       

      1. create the table

       

      drop table test;
      CREATE TABLE test
      (ID NUMBER(9,0) NOT NULL ENABLE,
       XML_DATA XMLTYPE
      )
      XMLTYPE COLUMN XML_DATA STORE AS SECUREFILE BINARY XML (tablespace users disable storage in row);
      

       

      2. insert a few records

      insert into test values(1,'<Book><TITLE>Tale of Two Cities</TITLE>It was the best of times.<Author NAME="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author></Book>');
      insert into test values(2,'<BOOK><TITLE>The House of Mirth</TITLE>Written in 1905<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
      insert into test values(3,'<BOOK><TITLE>Age of innocence</TITLE>She got a prize for it.<Author NAME="Edith Wharton"> Wharton was born to George Frederic Jones and Lucretia Stevens Rhinelander in New York City.</Author></BOOK>');
      

      3. create the text index

drop index i_test;
exec ctx_ddl.create_section_group('TEST_SGP','PATH_SECTION_GROUP');
begin
  ctx_ddl.add_sdata_section(group_name   => 'TEST_SGP',
                            section_name => 'SData_02',
                            tag          => 'SData_02',
                            datatype     => 'varchar2');
end;
/
exec ctx_ddl.create_preference('TEST_STO','BASIC_STORAGE');
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace USERS storage (initial 64K)');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace USERS storage (initial 64K) compress 2');
exec ctx_ddl.set_attribute('TEST_STO','BIG_IO','NO');
exec ctx_ddl.set_attribute('TEST_STO','SEPARATE_OFFSETS','NO');
      create index I_TEST
        on TEST (XML_DATA)
        indextype is ctxsys.context
        parameters('
          section group   "TEST_SGP"
          storage         "TEST_STO"
        ') parallel 2;
      

       

      4. check the index size

      select ctx_report.index_size('I_TEST') from dual;
      

       

it says:

      TOTALS FOR INDEX TEST.I_TEST

      ---------------------------------------------------------------------------

      TOTAL BLOCKS ALLOCATED:                                                104

      TOTAL BLOCKS USED:                                                      72

      TOTAL BYTES ALLOCATED:                                 851,968 (832.00 KB)

      TOTAL BYTES USED:                                      589,824 (576.00 KB)

       

5. optimize the index

       

      exec ctx_ddl.optimize_index('I_TEST','REBUILD');
      

and now recompute the size; it says

       

      TOTALS FOR INDEX TEST.I_TEST

      ---------------------------------------------------------------------------

      TOTAL BLOCKS ALLOCATED:                                               1112

      TOTAL BLOCKS USED:                                                    1080

      TOTAL BYTES ALLOCATED:                                 9,109,504 (8.69 MB)

      TOTAL BYTES USED:                                      8,847,360 (8.44 MB)

       

which shows that it went from 576 KB to 8.44 MB. With a big index the relative difference is not as large, but it still went from 14 GB to 19 GB.

       

6. Workaround: use the BIG_IO option so that the token_info column of the DR$...$I table is stored in a SecureFile and the size stays relatively small. Then you can put the column in the keep pool and load it into the cache with a procedure similar to the one below:

      alter table DR$I_TEST$I storage (buffer_pool keep);
      alter table dr$i_test$i modify lob(token_info) (cache storage (buffer_pool keep));
rem now we must read the LOB so that it is loaded into the keep buffer pool; use the procedure below
      

       

create or replace procedure loadTokenInfo is
  type c_type is ref cursor;
  c2   c_type;
  b    blob;
  buff raw(100);       -- dbms_lob.read on a BLOB requires a RAW buffer
  siz  number;
  off  number;
  cntr number;
begin
  open c2 for 'select token_info from DR$I_TEST$I';
  loop
    fetch c2 into b;
    exit when c2%notfound;
    siz  := 10;        -- read just a few bytes ...
    off  := 1;
    cntr := 0;
    if dbms_lob.getlength(b) > 0 then
      begin
        loop
          dbms_lob.read(b, siz, off, buff);
          cntr := cntr + 1;
          off  := off + 4096;  -- ... every 4K, enough to drag each chunk into the buffer pool
        end loop;
      exception when no_data_found then  -- raised once we read past the end of the LOB
        if cntr > 0 then
          dbms_output.put_line('4K chunks fetched: '||cntr);
        end if;
      end;
    end if;
  end loop;
  close c2;
end;
/
      

       

      Rgds, Pierre

        • 1. Re: pre-loading Oracle text in memory with Oracle 12c
          kevinUCB

          Well, this reply won't have much in the way of answers, but your post makes for an interesting start to a conversation.

I have a similar use case, and also use SDATA fairly heavily. I asked Roger at OpenWorld last year about where to rank the $S table/index in terms of keeping it in memory; the reply was that it's worth it if you use it a lot. In my case, it's a little over 1 GB for a 15 GB DR$I table, so we do keep it in memory.

          Coincidentally, I did ask at that time about perhaps updating the white paper. Perhaps doing it in the forum, "in order to celebrate the ten-year anniversary of initial publication", would help spur things on a bit?

           

          The Text Application Developer's Guide has a table [http://docs.oracle.com/cd/E16655_01/text.121/e17748/csection.htm#CACCGFDE] that indicates that SDATA is not supported in PATH_SECTION_GROUP. We use XML_SECTION_GROUP; we had to modify our XML a bit to disambiguate /doc/title/original/value and /doc/title/translated/value, but otherwise it behaves properly.
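
For the record, the setup looks roughly like this (a sketch with hypothetical tag names; with XML_SECTION_GROUP, sections map to tag names rather than paths, which is why we had to rename tags to disambiguate):

exec ctx_ddl.create_section_group('DOC_SGP','XML_SECTION_GROUP');
exec ctx_ddl.add_zone_section('DOC_SGP','original_title','originaltitle');
exec ctx_ddl.add_zone_section('DOC_SGP','translated_title','translatedtitle');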

           

          As for your substantive problems, I'm a bit surprised that the index gets bigger after a rebuild. Is the increased size through all components of the index, or can you look at the before/after for INDEX_SIZE and see that it's specifically the $I table?

          Have you run a full INDEX_STATS before and after to determine if the rebuilt index is perhaps more fragmented (does optimize rebuild on an index defined as PARALLEL 2 execute the rebuild in parallel)?
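
For reference, a quick way to run it, reusing the index name from your example (the report can be very large, so this prints only the first 4000 characters):

set serveroutput on
declare
  rpt clob := null;
begin
  ctx_report.index_stats('I_TEST', rpt);
  dbms_output.put_line(dbms_lob.substr(rpt, 4000, 1));
  dbms_lob.freetemporary(rpt);
end;
/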

          Is it possible that the change to BIG_IO uses a different/larger block size and therefore allocates more space?

           

In the end, I'm not sure why making DR$I faster to read from disk would improve query performance if you're spending the time to cache the text index in RAM. It might speed up the initial load (perhaps by a lot?), but once everything is in memory, shouldn't I/O stop being a factor?

           

          Finally, your LoadTokenInfo procedure mimics the loadAllDollarR procedure in the white paper -- do you also run the commands that cache all the other objects? If so, how long does it take?

          • 2. Re: pre-loading Oracle text in memory with Oracle 12c
            Pierre Timmermans

It is very interesting what you say about XML_SECTION_GROUP. I would like to try it instead of PATH_SECTION_GROUP, but until now I was afraid it would be too difficult. Our base XML is very big and contains a lot of attributes that are not relevant for the search (internal ids, among others). I tried to convince the developer to strip the XML and build one XML for the search with the smallest set of attributes/sections, but without success. With XML_SECTION_GROUP we would have to tell Oracle explicitly which sections must be indexed; I hope this would result in a smaller index, and specifically in a smaller DR$I. One problem might be that the INPATH operator is not allowed with XML_SECTION_GROUP. Also, this XML is so big that it would take a lot of time to identify all the sections.

             

The fact that the index gets bigger after the rebuild (with ctx_ddl.optimize_index) may be a bug or a side effect; I can reproduce it easily (code in my first post), and it is something I did not see on Oracle 10g. Doing a rebuild with "alter index rebuild" puts the index back at its original size. I tried multiple settings but could not make sense of it. With BIG_IO the situation improves: the difference, I think, is that without BIG_IO the token_info field is stored in line, while with BIG_IO it is stored out of line.


Unfortunately index_stats does not work on my index; I get the strange error I mentioned in a previous post, so I could not see where the increase in size comes from. But since I can reproduce the problem with only 3 records in the base table, it cannot be related to fragmentation.

             

For the caching, my impression is that the $S table and the IOT on top of it are much smaller (about 4 GB) and stay "naturally" in the normal buffer cache, so the keep buffer cache is not needed for them. But my experience with the DR$I table is that it is much bigger and does not get into, or stay in, the buffer cache "naturally" unless it is forced. If I don't force it, performance gets very unpredictable: suddenly a query (especially the count of the result hits) causes a massive amount of disk reads and takes forever. So without pre-loading the DR$I table the user experience is badly affected; it takes a lot of searches before the cache is loaded. Possibly there are other issues in our application, of course. Also, it is important to know that when you do a ctx_ddl.optimize rebuild, the storage settings are lost and you have to alter the storage parameters again, as shown below.
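
Concretely, after each optimize rebuild I have to re-run the same statements as in my first post:

alter table DR$I_TEST$I storage (buffer_pool keep);
alter table DR$I_TEST$I modify lob(token_info) (cache storage (buffer_pool keep));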

             

The loadTokenInfo procedure is a replica of the function provided by R. Ford in the white paper; it is needed only when one uses the BIG_IO option, I assume because the token_info is then stored out of line.

             

Are your users happy with the performance? Do you pin the 15 GB of your DR$I table in memory? Isn't it all or nothing for the DR$I table? I have the feeling that if the keep cache cannot accommodate the whole DR$I, then nothing gets cached.

             

When I test with sql_trace=true, I see that for a very big search (a keyword used a lot) the query execution time is about 2 seconds. But none of those 2 seconds is disk I/O once everything is in the cache; it is not very clear where the 2 seconds are spent, maybe just on logical reads. My hope is that those 2 seconds are the upper bound for the user.

             

For our biggest client I am preparing a migration to a better system (at least the SAN will be better configured!) and the app will run on a RAC. Which raises another question: should I load-balance the searches over both nodes, or make them sticky to the node where the sync of the index is also made sticky? To my surprise, my first tests showed that load-balancing works better. We have to deliver the new system soon (mid August), so I am a bit under pressure; I really hope I can do some testing with XML_SECTION_GROUP.

             

            Thanks and regards, Pierre

ps: if I can help convince Roger Ford to update the white paper, I certainly will!

            • 3. Re: pre-loading Oracle text in memory with Oracle 12c
              Pierre Timmermans

I have investigated further; here is more feedback, in the hope that it helps the next person who hits the same problems. I run Oracle 12c; the text index is on an XMLType column stored as a SecureFile.

               

- if I create the text index normally (without BIG_IO, i.e. BIG_IO = 'NO'), the overall index size is about 19 GB and the DR$I table is 17.9 GB. With a db_keep_cache_size of 22 GB I can hold the index in the cache by applying the techniques described in the white paper. The performance is decent (1 to 4 seconds; 4 seconds for a search term used very frequently and 1 second for most reasonable queries). The performance is also predictable.

- as soon as I run ctx_ddl.optimize_index('REBUILD'), the index size becomes 38 GB. I can no longer pin the index in the keep buffer cache, and the performance becomes highly unpredictable and sometimes very bad (minutes).

- if I run alter index rebuild, the size goes back to what it was. The problem with alter index rebuild is that the index is not available while it runs (it takes 40 minutes), and if the rebuild fails I am left without an index. It is a serious operational risk.

- if I create the index with the option BIG_IO = 'YES', the size of the index is about 19 GB and does not increase after a rebuild. But the performance is not very good: it is consistently between 4 and 8 seconds. So the work-around described in my first message is not good.

               

Note: if I don't pin the index in memory using the techniques described in the white paper, the performance is a disaster: a search on a widely used token takes minutes and plenty of physical reads.

               

I will try to submit a service request for the problem of ctx_ddl.optimize 'REBUILD' exploding the size of the index; I can reproduce it easily. I am not sure I will manage to, because my MetaLink account does not allow me to submit technical SRs. It is probably an administrative problem, but I know from experience those are very difficult to solve (like when you call Belgium, which tells you to call Romania, which tells you to call someone else... :-))

               

I am investigating another track, which is to strip the base XML as much as possible of attributes that are not searchable; I can obviously reduce the size of the index that way.

               

              Rgds, Pierre

              • 4. Re: pre-loading Oracle text in memory with Oracle 12c
                kevinUCB

Have you experimented with using optimize_index('FULL') rather than 'REBUILD'? During our initial experiments we had a problem where REBUILD failed when the new index was ready to be swapped in for the old one; while we waited for support [to fail to address the problem, grumble, grumble], we used optimize FULL to see how close to a fully defragmented index we could get.

We used a machine with a lot of CPU cores, and with a high degree of parallelism we got very close to complete defragmentation (the size was a bit bigger, but the number of rows essentially the same). We also gained the ability to run it whenever we thought we could: after a bulk load/delete, and/or on weekends.
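
For example, something like this (a sketch, reusing the index name from the original post; maxtime is in minutes, and the degrees are just what we use):

exec ctx_ddl.optimize_index('I_TEST', 'FULL', maxtime => 60, parallel_degree => 12);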

                As for stripping down the base XML, if you know the data has elements/attributes that are just noise for text search but can easily be eliminated programmatically, then it's to your benefit. However, when we looked at it, the number of rows we were going to eliminate in $I wasn't all that much in the big scheme of things. We needed to get rid of hundreds of thousands of rows to make the business case that we should eliminate stuff from the index, and nothing we could do had that kind of impact. We did get rid of some duplication of data inside the XML that helped make relevance scores a bit better. You can also look at stopclasses in 12c; it doesn't look like it will help us much, but for certain types of data it would probably be very effective.

                • 5. Re: pre-loading Oracle text in memory with Oracle 12c
                  Pierre Timmermans

I have set aside the problem of optimize_index causing an explosion of the size of the $I table; I will open an SR on MetaLink when possible and see what they say. My short-term work-around will be to drop/create the index. But as soon as possible I will experiment with FULL (I think I tried it, but it did not help).

                   

I saw an old forum thread where someone reclaimed empty space with "alter table DR$..$I shrink space cascade;" (it requires enabling row movement first). I can try that as well.
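
Concretely, that would be something like this (a sketch, using the index from my example):

alter table DR$I_TEST$I enable row movement;
alter table DR$I_TEST$I shrink space cascade;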

                   

Dropping and creating the index takes only 20 minutes (with parallel 20). It means downtime, but I think that is acceptable once a week during the night. There is probably an operational risk; I encountered a few problems in my tests, though I think they were mostly related to the BIG_IO option, which makes the process longer (I abandoned the BIG_IO track, as it slows down performance). Some problems I had:

                  - deadlock when Oracle is switching the shadow index (only with optimize rebuild)

- node eviction: the node where the rebuild was taking place got rebooted by the clusterware because it was waiting too long on a resource (I could not investigate in detail yet)

                   

As for stripping the base XML of all unnecessary tags: I suppose in your case you don't need it, because with XML_SECTION_GROUP only what you explicitly declare is indexed. In my case, we have tons of attributes with ids that are not searchable but take up space in the DR$I table. Like you say, it is pure noise, so I hope stripping them will make a big difference for me. I have written an XQuery expression that I will put in a trigger in order to remove all the noise (I got help here: xquery insert node and assign value based on an XPATH expression); I am testing it today, see the sketch below. Of course a better solution would be to use XML_SECTION_GROUP, but I need the developer for that.
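
For illustration, the kind of statement I am testing looks like this (a sketch; @internalId is a made-up attribute name standing in for the real list of noise attributes):

update test t
   set t.xml_data = xmlquery(
         'copy $d := . modify delete nodes $d//@internalId return $d'
         passing t.xml_data returning content);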

                   

Finally, I realized that the developer added a complex HASPATH expression to all searches, something like HASPATH(//DigitalAssets/DigitalAsset[@available="true" and @videoFormatId="11"]/VideoLocations/VideoLocation[@typeId="8"]). This condition is true in 99.9% of the cases. To my surprise, I discovered it has a significant negative impact (not always, but when the other search terms are not selective): I have the impression that Oracle sometimes uses this expression as the driving one, which hampers performance a lot, and sometimes not. I will replace it with an SDATA section and see if that improves things (I think SDATA will never be the driving part); the idea is sketched below.
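
A sketch with a made-up section name (note that, as kevinUCB pointed out, SDATA is not officially supported with PATH_SECTION_GROUP):

exec ctx_ddl.add_sdata_section('TEST_SGP','asset_avail','DigitalAsset@available','varchar2');
select id from test where contains(xml_data, 'someterm AND SDATA(asset_avail = ''true'')') > 0;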

                   

The last thing I can investigate, only for this particular client, is to partition the table in two and make the context index local. It makes a lot of sense, because one partition will be very small and searches will always hit that partition first by default (unless the user specifically asks for the bigger type of information).

                   

A final thought: I have the impression that the context index on XMLType is a bit deprecated; Oracle seems to recommend the XQuery kind of indexes instead. That is unfortunate, because switching now is completely impossible for me.

                   

                  Thanks,

                   

                  Pierre

                  • 6. Re: pre-loading Oracle text in memory with Oracle 12c
                    Pierre Timmermans

Further investigation of the explosion of the size of the DR$I table after a ctx_ddl.optimize: there is a MetaLink note, Doc ID 799674.1, about exactly that. It was opened as a bug but later closed as not a bug. The suggested work-around is to create a tablespace with a bigger block size and store the index there; in that case the waste of space should be much smaller.

                     

However, the key point for me is to load the DR$I into memory and pin it, which is possible with the keep pool. Unfortunately the db_keep_cache_size parameter is only supported for the standard block size. There are other parameters, db_nK_cache_size, but I suppose they will not pin the data in memory and keep it there.

                     

So I am afraid I am stuck. The guy who closed the SR as not a bug probably did not realize that the size of the DR$I table is of huge importance for people relying on the white paper of R. Ford to load the index in memory :-(

                     

A possibility would be to create the whole database with a bigger block size; I'll have to try that, but I need time...

                     

                    Rgds, Pierre

                    • 7. Re: pre-loading Oracle text in memory with Oracle 12c
                      kevinUCB

We've been moving away from using the keep_pool for a while; the benefit of "knowing" that the text index is pinned is less than the cost of having to calculate, monitor, and manage the keep pool size.

                      "I'd rather throw RAM at the problem than DBA time" is how we described it.

So if you eliminate the keep cache, still "warm up" the database by preloading the text index into the db cache, and run optimize(FULL), do you think you can keep the index in memory? If you're doing other stuff that sucks up a bunch of memory, that might not work, but if search load is constant you can at least test it out and see whether the text components get aged out.

                       

I'm not totally convinced that block size should matter that much -- perhaps some tweaking of the storage for the DR$I table might make segment management a bit better, but ultimately the index shouldn't blow up to double the size; that just baffles me. The infrastructure is designed to keep the LOBs small enough to store inline, so you might get rid of a bit of overhead, but I don't see how that would let you cut the number of blocks allocated in half. (As an aside, the example in the bug looks kind of synthetic -- how did they start out with all blocks full, then "optimize" their way to a bunch of partially full blocks?)

                      • 8. Re: Re: pre-loading Oracle text in memory with Oracle 12c
                        Pierre Timmermans

                        I have been working a lot on that issue recently, I can give some more info.

                         

First, I totally agree with you: I don't like to use the keep pool and I would love to avoid it. On the other hand, we have a specific use case: 90% of the activity in the DB is done by queuing and dbms_scheduler jobs for which response time does not matter, and all those processes are probably filling the buffer cache. We also have a customer-facing application that uses the text index to search the database, and for it performance is critical.

                         

What kind of performance do you have with your application?

                         

In my case, I have learned the hard way that having the index in memory (the DR$I table, in fact) is the key: if it is not, performance is poor. I find it reasonable to pin the DR$I table in memory, and if you look at competitors this is what they do: MongoDB explicitly says that the index must be in memory, and Elasticsearch uses JVMs that are also in memory. And indeed, if you look at an AWR report, you will see that Oracle is continuously accessing the DR$I table, with a SQL similar to

                         

                        
                        SELECT /*+ DYNAMIC_SAMPLING(0) INDEX(i) */     
                        TOKEN_FIRST, TOKEN_LAST, TOKEN_COUNT, ROWID     
                        FROM DR$idxname$I
                        WHERE TOKEN_TEXT = :word AND TOKEN_TYPE = :wtype     
                        ORDER BY TOKEN_TEXT,  TOKEN_TYPE,  TOKEN_FIRST
                        
                        

                         

which is executed continuously.

                         

I think that the algorithm used by Oracle to decide which blocks to keep in the cache is too complex. I just realized that 12.1.0.2 (released last week) finally has a "killer" feature: the In-Memory option, with which you can pin tables or columns in memory, with compression, etc. This looks ideal for the text index; I hope R. Ford will finally update his white paper :-)
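
Something like this is what I have in mind (a sketch, untested; it requires the in-memory area to be configured first, and I still have to check whether the token_info LOB is actually populated in memory):

-- alter system set inmemory_size = 20G scope=spfile;   (needs a restart; made-up size)
alter table DR$I_TEST$I inmemory memcompress for query low;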

                         

But my other problem was that optimize_index in REBUILD mode caused the DR$I table to double in size: it seems crazy that this was closed as not a bug, but it was, and I can't do anything about it. It is a bug in my opinion, because the create index command and the "alter index rebuild" command both result in a much smaller index, so why would the people who developed the optimize function (is it another team, using another algorithm?) make the index two times bigger?

                         

The track I have been following instead is to put the index in a 16K tablespace: in that case the space used by the index remains more or less flat (it increases, but much more reasonably). The difficulty then is to pin the index in memory, because the trick of R. Ford no longer works.

                         

                        What worked:

                         

First set the keep pool to zero and set db_16k_cache_size instead. Then change the storage preference to make sure that everything you want to cache (mostly the DR$I table) goes into the tablespace with the non-standard block size of 16K.
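
In SQL this looks roughly like the following (a sketch with made-up names, sizes and paths):

alter system set db_keep_cache_size = 0 scope=both;
alter system set db_16k_cache_size = 20G scope=both;
create tablespace TS_CTX_16K datafile '/oradata/ctx16k_01.dbf' size 25G blocksize 16K;
exec ctx_ddl.set_attribute('TEST_STO','I_TABLE_CLAUSE','tablespace TS_CTX_16K');
exec ctx_ddl.set_attribute('TEST_STO','I_INDEX_CLAUSE','tablespace TS_CTX_16K compress 2');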

                         

Then comes the tricky part: pre-loading the data into the buffer cache. The problem is that with Oracle 12c, Oracle will use direct path reads for full table scans, which basically means it bypasses the cache and reads directly from the files into the PGA!!! There is an event to avoid that; I was lucky to find it on a blog (I can't remember which one, sorry for the missing credit).

                         

I ended up doing the following; event 10949 is what avoids the direct path read issue.

                         

alter session set events '10949 trace name context forever, level 1';
alter table DR#idxname0001$I cache;
alter table DR#idxname0002$I cache;
alter table DR#idxname0003$I cache;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0001$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0002$I ITAB;
SELECT /*+ FULL(ITAB) CACHE(ITAB) */ SUM(TOKEN_COUNT), SUM(LENGTH(TOKEN_INFO)) FROM DR#idxname0003$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0001$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0002$I ITAB;
SELECT /*+ INDEX(ITAB) CACHE(ITAB) */ SUM(LENGTH(TOKEN_TEXT)) FROM DR#idxname0003$I ITAB;
                        

                         

It worked. Greatly relieved, I expected to take some time off, but there was one last surprise. The command

                         

                        exec ctx_ddl.optimize_index(idx_name=>'idxname',part_name=>'partname',optlevel=>'REBUILD');
                        

                         

gave the following

                         

                        ERROR at line 1:

                        ORA-20000: Oracle Text error:

                        DRG-50857: oracle error in drftoptrebxch

                        ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

                        ORA-06512: at "CTXSYS.DRUE", line 160

                        ORA-06512: at "CTXSYS.CTX_DDL", line 1141

                        ORA-06512: at line 1

                         

This is pretty much exactly what MetaLink note 1645634.1 describes, but for a non-partitioned index. The work-around given there seemed very logical, but it did not work in the case of a partitioned index. After experimenting, I found out that the bug occurs when the partitioned index is created with the dbms_pclxutil.build_part_index procedure (which enables intra-partition parallelism in the index creation process). This is a very annoying and stupid bug; maybe there is a work-around, but I did not find it on MetaLink.
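
For reference, the build was driven by something like this (a sketch with made-up degrees of parallelism; the local index is first created UNUSABLE and then populated partition by partition by background jobs):

exec dbms_pclxutil.build_part_index(jobs_per_batch => 2, procs_per_job => 4, tab_name => 'TEST', idx_name => 'I_TEST');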

                         

Other points of attention with the text index creation (stuff that surprised me at first!):

- if you use the dbms_pclxutil package, the ctx_output logging does not work, because the index is created immediately and then populated in the background via dbms_job.

- in combination with the fact that, on a RAC, you may see no activity at all on your box, this can be very frightening: Oracle can choose to start the workers on the other node.

                         

I understand much better now how text indexing works; I think it is a great technology that can scale via partitioning. But as always, the design of the application is crucial: most of our problems come from the fact that we did not choose the right sectioning (we chose PATH_SECTION_GROUP while XML_SECTION_GROUP is so much better IMO). Maybe later I can convince the dev to change the sectioning, especially because SDATA and MDATA sections are not supported with PATH_SECTION_GROUP (although it seems to work, even though we had one occurrence of a bad result linked to the existence of SDATA in the index definition). Also, the whole problem of mixed structured/unstructured searches is completely tackled if one uses XML_SECTION_GROUP with MDATA/SDATA (but of course the app was written for Oracle 10...).

                         

                        Regards, Pierre

                        • 9. Re: pre-loading Oracle text in memory with Oracle 12c
                          kevinUCB

                          Interesting.

                          I agree with just about everything you've seen; I think XML_SECTION_GROUP has worked very well for what we need (and your use case is even better than mine).

I also ran into the bug with ORA-14097 (column type or size mismatch in ALTER TABLE EXCHANGE PARTITION). It was my most frustrating support experience, because they closed it even though I had given them enough info to reproduce it. Maybe I'll try again.

                           

My performance goal is 3 sec. engine time and 3-6 sec. front-end UI response. "Reasonably selective" searches (<20,000 results) generally meet this goal. You are so right about getting the $I table and $X index into memory, and, as I said earlier, putting the $S table into the cache helped us a lot.

                           

I'd still encourage you to try optimize FULL with a short time limit (we do parallel 12 for 60 minutes, for example) and see what that does to your index size. It sounds like the 16K cache is working for you, but it's still something you'd have to tune and monitor as your collection grows. I wonder if the rebuild takes all that extra space because it's copying within the same tablespace and therefore filling blocks that would otherwise be able to fit more rows contiguously.