5 Replies Latest reply: Sep 4, 2012 3:20 AM by 570248 RSS

    Plain text from RTF - ctx_doc.filter

    570248
      Hello,

      starting from the documentation (http://docs.oracle.com/cd/E11882_01/text.112/e24436/cdocpkg.htm#i997868) I'm trying to write a function which would convert RTF stored in CLOB in some table into plain text. Here's where I got so far with a test case:

      CREATE TABLE rtf_table (id NUMBER PRIMARY KEY, rtf CLOB)
      /
      INSERT INTO rtf_table
      VALUES (1,
      '{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset238 CorporateSLight;}}
      \viewkind4\uc1\pard\lang1050\f0\fs24\ This is
      \par regular rtf file
      \par spanning
      \par several
      \par lines...
      \par The End
      \par }' )
      /
      COMMIT
      /
      CREATE INDEX ix_ctx_rtf ON rtf_table (rtf)
      INDEXTYPE IS ctxsys.CONTEXT PARAMETERS ('filter ctxsys.null_filter')
      /

      CREATE OR REPLACE FUNCTION get_rtf (
      p_id IN NUMBER,
      p_plain IN NUMBER
      )
      RETURN VARCHAR2
      IS
      mklob CLOB;
      amt NUMBER := 40;
      line VARCHAR2 (80);
      plain BOOLEAN;
      BEGIN
      IF p_plain = 0
      THEN
      plain := FALSE;
      ELSE
      plain := TRUE;
      END IF;

      ctx_doc.filter ('IX_CTX_RTF', p_id, mklob, plain);
      -- mklob is NULL when passed-in, so ctx-doc.filter will allocate a temporary
      -- CLOB for us and place the results there.
      DBMS_LOB.READ (mklob, amt, 1, line);
      -- have to de-allocate the temp lob
      DBMS_LOB.freetemporary (mklob);
      RETURN line;
      END;
      /

      SELECT a.*, get_rtf (a.id, 1) AS plain,
      get_rtf (a.id, 0) AS other
      FROM rtf_table a
      /

      ID RTF PLAIN OTHER
      -------------------------------------- ---------- ------------------------------------------------ ------------------------------------------------
      1 (CLOB) {\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcha         {\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcha        

      1 row(s) retrieved


      As you can see, I cannot get plain text out. I am probably missing something obvious but I just can't figure it out.

      Thanks in advance,
      Bruno
        • 1. Re: Plain text from RTF - ctx_doc.filter
          Roger Ford-Oracle
          You're using NULL_FILTER. If you want it converted, you need to use AUTO_FILTER.

          Not sure what your actual use-case is, but are you aware you can use CTX_DOC.POLICY_FILTER to filter a document without having to create an index?
          • 2. Re: Plain text from RTF - ctx_doc.filter
            Roger Ford-Oracle
            Try this (in SQL*Plus)
            exec ctx_ddl.drop_policy ('mypol')
            exec ctx_ddl.create_policy ('mypol', 'ctxsys.auto_filter' )
            
            variable clobtofilter clob
            variable filteroutput clob
            
            begin
              :clobtofilter := '{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset238 CorporateSLight;}}
            \viewkind4\uc1\pard\lang1050\f0\fs24\ This is
            \par regular rtf file
            \par spanning
            \par several
            \par lines...
            \par The End
            \par }';
              
             dbms_lob.createtemporary( :filteroutput, true );
             ctx_doc.policy_filter ('mypol', :clobtofilter, :filteroutput, true );
            end;
            /
            
            set long 500000
            print filteroutput
            • 3. Re: Plain text from RTF - ctx_doc.filter
              570248
              Roger, thank you very much for this fast solution (tried it, works like a charm). It's really late here in Europe :-(

              Just one more question if I may - if I read documentation correctly, using index can only give me HTML from RTF, I have to use your approach with policy_filter if I want plain text?

              Thanks again, best regards,
              Bruno
              • 4. Re: Plain text from RTF - ctx_doc.filter
                Roger Ford-Oracle
                I'm in Europe too (UK, in fact) - so didn't get to answer this last night.

                There shouldn't be any difference between policy_filter and (index) filter. I'm actually surprised that the test I created yesterday seemed to output plain text rather than HTML - I was expecting to get HTML. I'm not quite sure why that is. If you definitely need plain text, I would look at some real examples, and if you're getting HTML then you'll have to figure out a way to remove all the HTML tags yourself. No doubt a PL/SQL procedure with REGEXP_REPLACE ought to be able to do it.

                - Roger
                • 5. Re: Plain text from RTF - ctx_doc.filter
                  570248
                  Your example gave me plain text (exactly what I need), and if I change last parameter to false (ctx_doc.policy_filter ('mypol', :clobtofilter, :filteroutput, false );) I get HTML.

                  When I first read Tom's answer on this topic (http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068) I misunderstood that I can get plain text only if I use index.

                  Anyway, this solves my problem so thanks again.

                  Bruno