1 2 Previous Next 23 Replies Latest reply on Mar 14, 2008 11:08 AM by Sohil Bhavsar Go to original post
      • 15. Re: Using Oracle Text with Apex
        VANJ
        roll your own SNIPPET procedure using the offsets returned by CTX_DOC.HIGHLIGHT

        Hm, maybe I am being dense, but I don't see how that can be done. Mind giving me another nudge in the right direction? Thanks!

        [Ideally, I would display those snippets right on the main search results page itself, search engine style]
        • 16. Re: Using Oracle Text with Apex
          dccase
          I haven't tried this, so you're on your own...

          The output of CTX_DOC.HIGHLIGHT is a list of offsets (starting and ending positions) of the search results in your document.

          You could probably loop through those offsets and grab a chunk of text starting X characters before and Y characters after each match.

          SUBSTR can work on CLOB's in 9i.
          • 17. Re: Using Oracle Text with Apex
            Dimitri Gielis
            This thread was spotted to be included in the weekly APEX forum wrap-up.
            (in the most interesting topics part)

            Thanks,
            Dimitri
            -- APEX Forum wrap-up --
            @ http://dgielis.blogspot.com/search/label/apex%20forum
            • 18. Re: Using Oracle Text with Apex
              VANJ
              Just for completeness (and since Oracle Text introduced the ctx_doc.snippet function in 10g), here is a DIY version of the function for use in pre-10g versions.

              Basically, as Doug suggested, it just uses the builtin ctx_doc.highlight API and loops over the result and returns some text before and after the matches, with some optional parameters thrown in for good measure. The search string is highlighted, the "context" before and after it is not.

              Works like a charm. Hope this helps.
                  FUNCTION snippet (
                      p_clob          IN CLOB,
                      p_index_name    IN VARCHAR2,
                      p_textkey       IN VARCHAR2,
                      p_text_query    IN VARCHAR2,
                      p_start_context IN INTEGER  DEFAULT 25,
                      p_end_context   IN INTEGER  DEFAULT 25,
                      p_max_matches   IN INTEGER  DEFAULT 5,
                      p_starttag      IN VARCHAR2 DEFAULT '< span class="hl">',
                      p_endtag        IN VARCHAR2 DEFAULT '< /span>',
                      p_delimiter     IN VARCHAR2 DEFAULT '< br/>'
                  )
                  RETURN VARCHAR2
                  IS
                      l_restab    ctx_doc.highlight_tab;
                      l_snippet   VARCHAR2(32767);
                      l_start     INTEGER;
                      l_length    INTEGER;
                      l_clob_size INTEGER;
                      l_matched   VARCHAR2(32767);
                  BEGIN
                      ctx_doc.set_key_type('PRIMARY_KEY');

                      ctx_doc.highlight (
                          index_name    => p_index_name,
                          textkey       => p_textkey,
                          text_query    => p_text_query,
                          restab        => l_restab
                      );

                      l_clob_size := dbms_lob.getlength(p_clob);
                      FOR i IN 1..least(l_restab.count,p_max_matches)
                      LOOP
                          l_start := l_restab(i).offset - p_start_context;
                          IF (l_start <= 0)
                          THEN
                              l_start := 1;
                          END IF;

                          l_length := l_restab(i).length + p_end_context;
                          IF (l_length + l_restab(i).offset >= l_clob_size)
                          THEN
                              l_length := l_clob_size - l_restab(i).offset + 1;
                          END IF;

                          l_matched := p_starttag || dbms_lob.substr(p_clob,l_restab(i).length,l_restab(i).offset) || p_endtag;
                          l_snippet := l_snippet || p_delimiter || dbms_lob.substr(p_clob,l_restab(i).offset-l_start-1,l_start) || l_matched || dbms_lob.substr(p_clob,l_length,l_restab(i).offset+l_restab(i).length);
                          IF (length(l_snippet) > 4000)
                          THEN
                              RETURN l_snippet;
                          END IF;
                      END LOOP;

                      RETURN l_snippet;
                  END snippet;
              Hope this helps
              • 19. Re: Using Oracle Text with Apex
                abe50
                Vikas,

                When you have time, could you create some kind of example on your example application page. If not, could you tell us the steps on how to implement it.

                Thanks,
                -Abe
                • 20. Re: Using Oracle Text with Apex
                  VANJ
                  I cannot create an example on apex.oracle.com because it doesn't provide access to the server filesystem and/or create directory objects.

                  Here are the steps

                  1.

                  Create the table to store the documents. Load the documents into the table.
                  CREATE TABLE documents
                  (
                    REC_ID   INTEGER       PRIMARY KEY,
                    FILENAME VARCHAR2(100) UNIQUE,
                    doc_clob      CLOB          DEFAULT empty_clob()
                  );
                  insert into documents(rec_id,filename) values (1,'file1.txt');
                  insert into documents(rec_id,filename) values (2,'file2.txt');
                  insert into documents(rec_id,filename) values (3,'file3.txt');
                  BEGIN
                      FOR rec IN (SELECT filename,doc_clob FROM documents ORDER BY rec_id)
                      LOOP
                          lob_util.load_file('my_dir',rec.filename,rec.doc_clob);
                      END LOOP;
                  END;
                  /
                  2. The LOB_UTIL package contains a bunch of utility functions. The source is

                  CREATE OR REPLACE PACKAGE lob_util
                  AS
                      PROCEDURE load_file(
                          p_directory IN     VARCHAR2,
                          p_filename  IN     VARCHAR2,
                          p_lob       IN OUT CLOB
                      );

                      FUNCTION parse_search_string (p_search_str IN VARCHAR2)
                      RETURN VARCHAR2;

                      PROCEDURE markup (
                           p_index_name IN VARCHAR2,
                           p_textkey    IN VARCHAR2,
                           p_text_query IN VARCHAR2,
                           p_restab     IN OUT CLOB,
                           p_starttag   IN VARCHAR2 DEFAULT '[span class="hl"]',
                           p_endtag     IN VARCHAR2 DEFAULT '[span]'
                      );

                      FUNCTION snippet (
                          p_clob          IN CLOB,
                          p_index_name    IN VARCHAR2,
                          p_textkey       IN VARCHAR2,
                          p_text_query    IN VARCHAR2,
                          p_start_context IN INTEGER  DEFAULT 25,
                          p_end_context   IN INTEGER  DEFAULT 25,
                          p_max_matches   IN INTEGER  DEFAULT 5,
                          p_starttag      IN VARCHAR2 DEFAULT '[span class="hl"]',
                          p_endtag        IN VARCHAR2 DEFAULT '[span]',
                          p_delimiter     IN VARCHAR2 DEFAULT '<br/]'
                      )
                      RETURN VARCHAR2;
                  END lob_util;
                  /
                  CREATE OR REPLACE PACKAGE BODY lob_util
                  AS
                      PROCEDURE load_file(
                          p_directory IN     VARCHAR2,
                          p_filename  IN     VARCHAR2,
                          p_lob       IN OUT CLOB
                      )
                      IS
                          l_bfile BFILE;
                          l_dest_offset INTEGER := 1;
                          l_src_offset  INTEGER := 1;
                          l_lang        INTEGER := 0;
                          l_csid        INTEGER := 0;
                          l_warning     INTEGER := 0;
                      BEGIN
                          l_bfile := bfilename(upper(p_directory),p_filename);
                          dbms_lob.fileopen(l_bfile);
                          dbms_lob.loadclobfromfile(p_lob,l_bfile,dbms_lob.getlength(l_bfile),l_dest_offset,l_src_offset,l_csid,l_lang,l_warning);
                          dbms_lob.fileclose(l_bfile);
                      END load_file;

                      FUNCTION parse_search_string( p_search_str IN VARCHAR2 )
                      RETURN VARCHAR2
                      is
                          l_temp_value   varchar2(32767);
                          l_temp_value2  varchar2(32767);
                          l_return_value varchar2(32767) := NULL;
                          l_start_token  number := 1;
                          l_in_token     boolean := FALSE;
                          l_num_tokens   number := 0;
                          l_quotes       number;
                          l_phrases      dbms_sql.varchar2s;
                          n              number;
                      begin
                          if nvl(length(p_search_str),0) = 0 then
                              return NULL;
                          end if;
                     
                           l_quotes := length(p_search_str) - length(replace(p_search_str,'"', ''));
                           if ( l_quotes > 0 and mod(l_quotes,2) = 0 )
                           then
                                l_temp_value2 := lower(p_search_str);
                                for i in 1 .. l_quotes/2
                                loop
                                     n := instr( l_temp_value2, '"' );
                                     l_temp_value := l_temp_value || substr(     l_temp_value2, 1, n-1 );
                                     l_temp_value2 := substr( l_temp_value2, n+1 );
                                     n := instr( l_temp_value2, '"' );
                                     l_phrases(i) := substr( l_temp_value2, 1, n-1 );
                                     l_temp_value2 := substr( l_temp_value2, n+1 );
                                end loop;
                                l_temp_value := l_temp_value || l_temp_value2;
                           else
                                l_temp_value := lower(p_search_str);
                           end if;
                     
                          --
                          l_temp_value := trim(replace( l_temp_value, '{', ' '));
                          l_temp_value := trim(replace( l_temp_value, '}', ' '));
                          l_temp_value := trim(replace( l_temp_value, ':', ' '));
                          l_temp_value := trim(replace( l_temp_value, ';', ' '));
                          l_temp_value := trim(replace( l_temp_value, '"', ' '));
                          l_temp_value := trim(replace( l_temp_value, ':'',', ' '));
                          l_temp_value := trim(replace( l_temp_value, '(', ' '));
                          l_temp_value := trim(replace( l_temp_value, ')', ' '));
                          l_temp_value := trim(replace( l_temp_value, '!', ' '));
                          l_temp_value := trim(replace( l_temp_value, '&', ' '));
                          l_temp_value := trim(replace( l_temp_value, '+', ' '));
                          l_temp_value := trim(replace( l_temp_value, '\', ' '));
                          l_temp_value := trim(replace( l_temp_value, '-', ' '));
                          l_temp_value := trim(replace( l_temp_value, ',', ' '));
                          l_temp_value := trim(replace( l_temp_value, ' and ', ' ' ));
                          l_temp_value := trim(replace( l_temp_value, ' or ', ' ' ));
                          --
                          if length(l_temp_value) > 0 then
                              l_in_token := TRUE;
                          end if;
                          --
                          for i in 1..nvl(length(l_temp_value),0) loop
                              if substr(l_temp_value,i,1) = ' ' then
                                  if l_in_token = TRUE then
                                          l_return_value := l_return_value || '{' ||
                                          substr(l_temp_value,l_start_token, i-l_start_token) || '}' || ' and ';
                                      l_in_token := FALSE;
                                      l_num_tokens := l_num_tokens + 1;
                                  end if;
                              elsif l_in_token = FALSE then
                                  l_in_token := TRUE;
                                  l_start_token := i;
                              end if;
                              exit when l_num_tokens > 30;
                          end loop;
                          if l_in_token = TRUE then
                              l_return_value := l_return_value || '{' || substr(l_temp_value,l_start_token) || '} and ';
                          end if;
                     
                           for i in 1 .. nvl(l_phrases.count,0)
                           loop
                                if ( l_phrases(i) is not null )
                                then
                                     l_return_value := l_return_value || ' {' || l_phrases(i) || '} and ';
                                end if;
                           end loop;
                           return trim( substr( l_return_value, 1, length(l_return_value)-4 ) );
                      end parse_search_string;

                      PROCEDURE markup (
                           p_index_name IN VARCHAR2,
                           p_textkey    IN VARCHAR2,
                           p_text_query IN VARCHAR2,
                           p_restab     IN OUT CLOB,
                           p_starttag   IN VARCHAR2 DEFAULT '[span class="hl"]',
                           p_endtag     IN VARCHAR2 DEFAULT '[span]'
                      )
                      IS
                      BEGIN
                          ctx_doc.set_key_type('PRIMARY_KEY');
                          ctx_doc.markup (
                              index_name    => p_index_name,
                              textkey       => p_textkey,
                              text_query    => p_text_query,
                              restab        => p_restab,
                              starttag      => p_starttag,
                              endtag        => p_endtag    
                          );
                      END markup;

                      FUNCTION snippet (
                          p_clob          IN CLOB,
                          p_index_name    IN VARCHAR2,
                          p_textkey       IN VARCHAR2,
                          p_text_query    IN VARCHAR2,
                          p_start_context IN INTEGER  DEFAULT 25,
                          p_end_context   IN INTEGER  DEFAULT 25,
                          p_max_matches   IN INTEGER  DEFAULT 5,
                          p_starttag      IN VARCHAR2 DEFAULT '[span class="hl"]',
                          p_endtag        IN VARCHAR2 DEFAULT '[span]',
                          p_delimiter     IN VARCHAR2 DEFAULT '<br/]'
                      )
                      RETURN VARCHAR2
                      IS
                          l_restab    ctx_doc.highlight_tab;
                          l_snippet   VARCHAR2(32767);
                          l_start     INTEGER;
                          l_end       INTEGER;
                          l_length    INTEGER;
                          l_clob_size INTEGER;
                          l_matched   VARCHAR2(32767);
                          l_clob      CLOB;
                      BEGIN
                          ctx_doc.set_key_type('PRIMARY_KEY');

                          /*
                          ctx_doc.markup (
                              index_name    => p_index_name,
                              textkey       => p_textkey,
                              text_query    => p_text_query,
                              restab        => l_clob,
                              starttag      => p_starttag,
                              endtag        => p_endtag    
                          );

                          l_clob_size := dbms_lob.getlength(l_clob);
                          FOR i IN 1..p_max_matches LOOP
                              l_start := dbms_lob.instr(l_clob,p_starttag,1,i);
                              IF (l_start = 0)
                              THEN
                                  EXIT;
                              END IF;

                              l_start := l_start - p_start_context;
                              IF (l_start <= 0)
                              THEN
                                  l_start := 1;
                              END IF;

                              l_end   := dbms_lob.instr(l_clob,p_endtag,1,i) + length(p_endtag) + p_end_context;
                              IF (l_end > l_clob_size)
                              THEN
                                  l_end := l_clob_size;
                              END IF;

                              l_snippet := l_snippet || p_delimiter || dbms_lob.substr(l_clob,l_end-l_start+1,l_start);
                              IF (length(l_snippet) > 4000)
                              THEN
                                  RETURN ltrim(l_snippet,p_delimiter);
                              END IF;
                          END LOOP;

                          RETURN ltrim(l_snippet,p_delimiter);
                          */

                          ctx_doc.highlight (
                              index_name    => p_index_name,
                              textkey       => p_textkey,
                              text_query    => p_text_query,
                              restab        => l_restab
                          );

                          l_clob_size := dbms_lob.getlength(p_clob);
                          FOR i IN 1..least(l_restab.count,p_max_matches)
                          LOOP
                              l_start := l_restab(i).offset - p_start_context;
                              IF (l_start <= 0)
                              THEN
                                  l_start := 1;
                              END IF;

                              l_length := l_restab(i).length + p_end_context;
                              IF (l_length + l_restab(i).offset >= l_clob_size)
                              THEN
                                  l_length := l_clob_size - l_restab(i).offset + 1;
                              END IF;

                              l_matched := p_starttag || dbms_lob.substr(p_clob,l_restab(i).length,l_restab(i).offset) || p_endtag;
                              l_snippet := l_snippet || p_delimiter || dbms_lob.substr(p_clob,l_restab(i).offset-l_start-1,l_start) || l_matched || dbms_lob.substr(p_clob,l_length,l_restab(i).offset+l_restab(i).length);
                              IF (length(l_snippet) > 4000)
                              THEN
                                  RETURN l_snippet;
                              END IF;
                          END LOOP;

                          RETURN l_snippet;
                      END snippet;
                  BEGIN
                      NULL;
                  END lob_util;
                  /
                  3. The APEX report region query on the table is
                  select
                  rec_id,
                  score(1) score,
                  filename,
                  dbms_lob.getlength(doc_clob) filesize,
                  lob_util.snippet(doc_clob,'text_idx',rec_id,:P300_SEARCH_STRING,:P302_START_CONTEXT,:P302_END_CONTEXT) snippet
                  from documents
                  where 1=1
                  and contains(doc_clob,:P300_SEARCH_STRING,1) > 0
                  order by 2 desc,3 asc
                  text_idx is a simple Text index on the CLOB column created using

                  create index text_idx on documents(doc_clob) indextype is ctxsys.context;
                  Hope this helps.
                  • 21. Re: Using Oracle Text with Apex
                    abe50
                    Vikas,

                    Many Many Thanks. I can't wait to try it.

                    abe
                    • 22. Re: Using Oracle Text with Apex
                      Sohil Bhavsar
                      I have made a search application using oracle text's BFILE to search

                      from word documents. It will give doc filename as a result and then I will provide

                      facility to download it from server. It is working fine. But I want to display results

                      like GOOGLE i.e. by highlighting the search keyword in a paragraph.

                      Can it be possible using Oracle Text?

                      if possible plz give more details with example,
                      os:windows xp ,
                      oracle 10g.
                      • 23. Re: Using Oracle Text
                        Sohil Bhavsar
                        I have made a search application using oracle text's BFILE to search

                        from word documents. It will give doc filename as a result and then I will provide

                        facility to download it from server. It is working fine. But I want to display results

                        like GOOGLE i.e. by highlighting the search keyword in a paragraph.

                        Can it be possible using Oracle Text?

                        if possible plz give more details with example,
                        os:windows xp ,
                        oracle 10g.
                        1 2 Previous Next