This discussion is archived
5 Replies Latest reply: Sep 4, 2012 1:20 AM by 570248 RSS

Plain text from RTF - ctx_doc.filter

570248 Newbie
Currently Being Moderated
Hello,

starting from the documentation (http://docs.oracle.com/cd/E11882_01/text.112/e24436/cdocpkg.htm#i997868) I'm trying to write a function which would convert RTF stored in CLOB in some table into plain text. Here's where I got so far with a test case:

CREATE TABLE rtf_table (id NUMBER PRIMARY KEY, rtf CLOB)
/
INSERT INTO rtf_table
VALUES (1,
'{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset238 CorporateSLight;}}
\viewkind4\uc1\pard\lang1050\f0\fs24\ This is
\par regular rtf file
\par spanning
\par several
\par lines...
\par The End
\par }' )
/
COMMIT
/
CREATE INDEX ix_ctx_rtf ON rtf_table (rtf)
INDEXTYPE IS ctxsys.CONTEXT PARAMETERS ('filter ctxsys.null_filter')
/

CREATE OR REPLACE FUNCTION get_rtf (
p_id IN NUMBER,
p_plain IN NUMBER
)
RETURN VARCHAR2
IS
mklob CLOB;
amt NUMBER := 40;
line VARCHAR2 (80);
plain BOOLEAN;
BEGIN
IF p_plain = 0
THEN
plain := FALSE;
ELSE
plain := TRUE;
END IF;

ctx_doc.filter ('IX_CTX_RTF', p_id, mklob, plain);
-- mklob is NULL when passed-in, so ctx-doc.filter will allocate a temporary
-- CLOB for us and place the results there.
DBMS_LOB.READ (mklob, amt, 1, line);
-- have to de-allocate the temp lob
DBMS_LOB.freetemporary (mklob);
RETURN line;
END;
/

SELECT a.*, get_rtf (a.id, 1) AS plain,
get_rtf (a.id, 0) AS other
FROM rtf_table a
/

ID RTF PLAIN OTHER
-------------------------------------- ---------- ------------------------------------------------ ------------------------------------------------
1 (CLOB) {\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcha         {\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcha        

1 row(s) retrieved


As you can see, I cannot get plain text out. I am probably missing something obvious but I just can't figure it out.

Thanks in advance,
Bruno
  • 1. Re: Plain text from RTF - ctx_doc.filter
    Roger Ford Expert
    Currently Being Moderated
    You're using NULL_FILTER. If you want it converted, you need to use AUTO_FILTER.

    Not sure what your actual use-case is, but are you aware you can use CTX_DOC.POLICY_FILTER to filter a document without having to create an index?
  • 2. Re: Plain text from RTF - ctx_doc.filter
    Roger Ford Expert
    Currently Being Moderated
    Try this (in SQL*Plus)
    exec ctx_ddl.drop_policy ('mypol')
    exec ctx_ddl.create_policy ('mypol', 'ctxsys.auto_filter' )
    
    variable clobtofilter clob
    variable filteroutput clob
    
    begin
      :clobtofilter := '{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset238 CorporateSLight;}}
    \viewkind4\uc1\pard\lang1050\f0\fs24\ This is
    \par regular rtf file
    \par spanning
    \par several
    \par lines...
    \par The End
    \par }';
      
     dbms_lob.createtemporary( :filteroutput, true );
     ctx_doc.policy_filter ('mypol', :clobtofilter, :filteroutput, true );
    end;
    /
    
    set long 500000
    print filteroutput
  • 3. Re: Plain text from RTF - ctx_doc.filter
    570248 Newbie
    Currently Being Moderated
    Roger, thank you very much for this fast solution (tried it, works like a charm). It's really late here in Europe :-(

    Just one more question if I may - if I read documentation correctly, using index can only give me HTML from RTF, I have to use your approach with policy_filter if I want plain text?

    Thanks again, best regards,
    Bruno
  • 4. Re: Plain text from RTF - ctx_doc.filter
    Roger Ford Expert
    Currently Being Moderated
    I'm in Europe too (UK, in fact) - so didn't get to answer this last night.

    There shouldn't be any difference between policy_filter and (index) filter. I'm actually surprised that the test I created yesterday seemed to output plain text rather than HTML - I was expecting to get HTML. I'm not quite sure why that is. If you definitely need plain text, I would look at some real examples, and if you're getting HTML then you'll have to figure out a way to remove all the HTML tags yourself. No doubt a PL/SQL procedure with REGEXP_REPLACE ought to be able to do it.

    - Roger
  • 5. Re: Plain text from RTF - ctx_doc.filter
    570248 Newbie
    Currently Being Moderated
    Your example gave me plain text (exactly what I need), and if I change last parameter to false (ctx_doc.policy_filter ('mypol', :clobtofilter, :filteroutput, false );) I get HTML.

    When I first read Tom's answer on this topic (http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068) I misunderstood that I can get plain text only if I use index.

    Anyway, this solves my problem so thanks again.

    Bruno

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points