5 Replies Latest reply: Jul 26, 2013 3:15 PM by Barbara Boehmer

    contains on msword documents return wrong results

    pranieri-Oracle

      Hello All,

      I have a customer (IHAC) who uses Oracle 11.2.0.1. He has an application that accesses a document table (holding .doc and .pdf documents) on which an Oracle Text index has been created.

      The index DDL is as follows:

      begin

        CTX_DDL.create_preference('SIDDA_LEXER','BASIC_LEXER');

        CTX_DDL.set_attribute('SIDDA_LEXER','index_themes','NO');

        CTX_DDL.set_attribute('SIDDA_LEXER', 'PRINTJOINS', '.-');

        CTX_DDL.create_preference('SIDDA_WORDLIST','BASIC_WORDLIST');

        CTX_DDL.set_attribute('SIDDA_WORDLIST','STEMMER', 'ITALIAN');

        CTX_DDL.set_attribute('SIDDA_WORDLIST','FUZZY_MATCH', 'ITALIAN');

        CTX_DDL.create_preference ('SIDDA_FILTER','AUTO_FILTER');

        CTX_DDL.set_attribute('SIDDA_FILTER', 'TIMEOUT', '900');

      end;

      /

      CREATE INDEX IDX_SIDDA_AED_INTR ON BDN.SIDDA_AED_OOLE(SIDDA_DOC) INDEXTYPE IS CTXSYS.CONTEXT

      PARAMETERS ('FILTER BDN.SIDDA_FILTER

                   LEXER BDN.SIDDA_LEXER

                   WORDLIST BDN.SIDDA_WORDLIST

                   STOPLIST BDN.SIDDA_STOPLIST

                        MEMORY 52428800');

       

      The customer queries the document table with statements of the following kind:

       

      select * from sidda_aed_oole a

      where contains(sidda_doc, '<search word>') > 0;

       

      For some <search word> values the results are correct; for others, the query returns documents in which the <search word> does not appear.

       

      Any ideas?

       

      Thank you

       

      Paola

        • 1. Re: contains on msword documents return wrong results
          Herald ten Dam

          Hi,

           

          see if you have errors on your index/documents:

           

          select * from CTX_USER_INDEX_ERRORS;

           

          With err_textkey you can look up which document it was; it holds the rowid. The err_text column gives the error encountered.
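
          A minimal sketch of that lookup, assuming the index and table names from the original post (IDX_SIDDA_AED_INTR, sidda_aed_oole) and that you are connected as the index owner. The rowid of a failed document is exposed in the ERR_TEXTKEY column of the view:

          ```sql
          -- List indexing errors for this index, newest first.
          select err_timestamp, err_textkey, err_text
          from   ctx_user_index_errors
          where  err_index_name = 'IDX_SIDDA_AED_INTR'
          order  by err_timestamp desc;

          -- Identify the table rows that failed to index.
          -- err_textkey is a string, so compare it with the rowid as text;
          -- add whatever identifying columns the table has to the select list.
          select a.rowid
          from   sidda_aed_oole a
          where  rowidtochar(a.rowid) in
                 (select e.err_textkey
                  from   ctx_user_index_errors e
                  where  e.err_index_name = 'IDX_SIDDA_AED_INTR');
          ```

          If a document returned as a false match also shows up here, its text was probably never indexed correctly.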

           

          Herald ten Dam

          http://htendam.wordpress.com

          • 2. Re: contains on msword documents return wrong results
            pranieri-Oracle

            In the CTX_USER_INDEX_ERRORS view we find some errors, but they refer to other documents.

            As I said, the problem is that the CONTAINS search also returns documents in which the search word is not present, and for these documents we do not find any errors.

            • 3. Re: contains on msword documents return wrong results
              Herald ten Dam

              Hi,

               

              I see that you created the index without a sync option. Is the index synced? Do you use ctx_ddl.sync_index('IDX_SIDDA_AED_INTR')? If it is not synced, you may be looking at stale data that the index still finds.
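
              A rough way to check this, assuming you are connected as the index owner: rows that have changed but are not yet reflected in the index are listed in CTX_USER_PENDING, and a manual sync processes them.

              ```sql
              -- Rows waiting to be synchronized into the index (no rows = nothing pending).
              select pnd_index_name, pnd_rowid, pnd_timestamp
              from   ctx_user_pending
              where  pnd_index_name = 'IDX_SIDDA_AED_INTR';

              -- Bring the index up to date with the base table.
              begin
                ctx_ddl.sync_index('IDX_SIDDA_AED_INTR');
              end;
              /
              ```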

               

              Herald ten Dam

              http://htendam.wordpress.com

              • 4. Re: contains on msword documents return wrong results
                pranieri-Oracle

                The index was created after all the data had been inserted. Since the index was created, the documents in the table have not been modified.

                • 5. Re: contains on msword documents return wrong results
                  Barbara Boehmer

                  This can happen if you are searching for a string that contains a word that is in your stoplist. When you do that, the query returns any document that contains the search string with any word in place of the stopword. Please see the example below, which demonstrates this behavior. Think of searching for a stopword as searching for any word in its place.

                   

                  SCOTT@orcl12c> create table sidda_aed_oole

                    2    (sidda_doc  clob)

                    3  /

                   

                  Table created.

                   

                  SCOTT@orcl12c> insert all

                    2  into sidda_aed_oole values ('searchword')

                    3  into sidda_aed_oole values ('other stuff')

                    4  into sidda_aed_oole values ('additional stuff')

                    5  select * from dual

                    6  /

                   

                  3 rows created.

                   

                  SCOTT@orcl12c> begin

                    2    ctx_ddl.create_stoplist ('sidda_stoplist', 'basic_stoplist');

                    3    ctx_ddl.add_stopword ('sidda_stoplist', 'somestopword');

                    4  end;

                    5  /

                   

                  PL/SQL procedure successfully completed.

                   

                  SCOTT@orcl12c> CREATE INDEX IDX_SIDDA_AED_INTR ON SIDDA_AED_OOLE(SIDDA_DOC)

                    2  INDEXTYPE IS CTXSYS.CONTEXT

                    3  PARAMETERS ('STOPLIST SIDDA_STOPLIST')

                    4  /

                   

                  Index created.

                   

                  SCOTT@orcl12c> select * from sidda_aed_oole a

                    2  where  contains (sidda_doc, 'searchword') > 0

                    3  /

                   

                  SIDDA_DOC

                  --------------------------------------------------------------------------------

                  searchword

                   

                  1 row selected.

                   

                  SCOTT@orcl12c> select * from sidda_aed_oole a

                    2  where  contains (sidda_doc, 'somestopword stuff') > 0

                    3  /

                   

                  SIDDA_DOC

                  --------------------------------------------------------------------------------

                  other stuff

                  additional stuff

                   

                  2 rows selected.

                   

                  SCOTT@orcl12c> select * from sidda_aed_oole a

                    2  where  contains (sidda_doc, 'other somestopword') > 0

                    3  /

                   

                  SIDDA_DOC

                  --------------------------------------------------------------------------------

                  other stuff

                   

                  1 row selected.

                   

                  SCOTT@orcl12c> select * from sidda_aed_oole a

                    2  where  contains (sidda_doc, 'additional somestopword') > 0

                    3  /

                   

                  SIDDA_DOC

                  --------------------------------------------------------------------------------

                  additional stuff

                   

                  1 row selected.
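
                  To check whether this applies to a given search, one option is to list the contents of the stoplist. This is only a sketch: it assumes you are connected as the owner of SIDDA_STOPLIST, and &search_word is a SQL*Plus substitution variable introduced here for illustration.

                  ```sql
                  -- Stopwords defined for the stoplist used by the index.
                  select spw_word
                  from   ctx_user_stopwords
                  where  spw_stoplist = 'SIDDA_STOPLIST'
                  order  by spw_word;

                  -- Non-zero result: the given term is a stopword in that stoplist.
                  select count(*) as is_stopword
                  from   ctx_user_stopwords
                  where  spw_stoplist = 'SIDDA_STOPLIST'
                  and    upper(spw_word) = upper('&search_word');
                  ```

                  If any word in a search phrase that returns unexpected documents appears in that list, the behavior shown above is the likely explanation.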