3 Replies Latest reply on Oct 21, 2009 2:08 PM by 442199

    Semantic Indexing for Documents using RDFCTX_WS_EXTRACTOR

    671506
      Hi all,
      I'm trying out this interesting new feature of Release 2. In our project ( http://www.ba2015.org/swportal/servlet/Entry?action=v&ds=ftd ) we use a third-party extractor exposed as a web service.
      The problem is that I can't find a way to serialize a CLOB in a web service callout. Does anyone have an example of using RDFCTX_WS_EXTRACTOR, which I hope supports sending a large CLOB document?
      Thank you in advance.
        • 1. Re: Semantic Indexing for Documents using RDFCTX_WS_EXTRACTOR
          442199
          Hello,

          A document index can be created directly on a CLOB column, in which case the web service callout serializes the document and embeds it in the SOAP request. For example, the following steps show the use of CLOB documents with the OpenCalais (web service) extractor.
          SQL> create table statenotes (docid NUMBER, notes CLOB);
          
          Table created.
          
          SQL> 
          SQL> insert into statenotes values(1,'Montgomery is the capital of Alabama. Birmingham is the state''s largest city.');
          
          1 row created.
          
          SQL> insert into statenotes values(2,'Anchorage is the capital of Alaska. Anchorage is the state''s largest city.');
          
          1 row created.
          
          SQL> set echo off;
          more data ..
          SQL> update statenotes set notes = notes||' '||notes||' '||notes||' '||notes||' '||notes;
          SQL> commit;
          SQL> update statenotes set notes = notes||' '||notes||' '||notes||' '||notes||' '||notes;
          SQL> commit;
          SQL> update statenotes set notes = notes||' '||notes||' '||notes||' '||notes||' '||notes;
          SQL> commit;
          SQL> 
          SQL> select dbms_lob.getlength(notes) from statenotes;
          
          DBMS_LOB.GETLENGTH(NOTES)
          -------------------------
                               9749
                               9374
          
          SQL> -- Create extractor policies -- calais_extractor is the subtype of RDFCTX_WS_EXTRACTOR which 
          SQL> -- sets the OpenCalais end-point and soap action in its constructor 
          SQL> begin
            2    sem_rdfctx.create_policy (policy_name => 'CAPITAL_CITIES',
            3                       extractor  => mdsys.calais_extractor(null));
            4  end;
            5  /
          
          PL/SQL procedure successfully completed.
          
          SQL> create index notesindex on statenotes (notes) indextype is
            2       mdsys.semcontext parameters ('CAPITAL_CITIES');
          
          Index created.
          
          SQL> select count(*) from mdsys.rdfctx_index_exceptions;
          
            COUNT(*)
          ----------
                0
          
          SQL> select docId from statenotes  where sem_contains(notes, 'select ?o1 where  { ?s ?p "Alabama"^^xsd:string }', 1) = 1 order by 1;
            
               DOCID
          ----------
                1
          Alternatively, the documents to be indexed can be external to the database, in which case the table column simply stores a reference to the file. Please review the following section of the Developer's Guide for this option.

          http://download.oracle.com/docs/cd/E11882_01/appdev.112/e11828/indexing_for_docs.htm#BEIIGBIE

          Hope this helps,
          -Aravind.
          • 2. Re: Semantic Indexing for Documents using RDFCTX_WS_EXTRACTOR
            671506
            Hi, thanks for your answer.
            I have already stored my documents in a table using a CLOB column, but the problem is that I cannot send the CLOB to the third-party web service, which is not Calais.

            I created a custom type extending mdsys.rdfctx_extractor as follows:


            CREATE OR REPLACE TYPE cogito_extractor2 UNDER mdsys.rdfctx_extractor (
            xsl_trans SYS.XMLTYPE,
            CONSTRUCTOR FUNCTION cogito_extractor2 (xsl_trans SYS.XMLTYPE)
            RETURN SELF AS RESULT,
            OVERRIDING MEMBER FUNCTION getdescription
            RETURN VARCHAR2,
            OVERRIDING MEMBER FUNCTION rdfreturntype
            RETURN VARCHAR2,
            OVERRIDING MEMBER FUNCTION extractrdf (document CLOB, docid VARCHAR2)
            RETURN CLOB
            );
            /


            CREATE OR REPLACE TYPE BODY cogito_extractor2
            AS
            CONSTRUCTOR FUNCTION cogito_extractor2 (xsl_trans SYS.XMLTYPE)
            RETURN SELF AS RESULT
            IS
            BEGIN
            SELF.extr_type := 'Cogito Extractor ';
            -- XML style sheet to generate RDF/XML from proprietary XML documents
            SELF.xsl_trans := xsl_trans;
            RETURN;
            END cogito_extractor2;
            OVERRIDING MEMBER FUNCTION getdescription
            RETURN VARCHAR2
            IS
            BEGIN
            RETURN 'Extractor for Cogito.';
            END getdescription;
            OVERRIDING MEMBER FUNCTION rdfreturntype
            RETURN VARCHAR2
            IS
            BEGIN
            RETURN 'RDF/XML';
            END rdfreturntype;
            OVERRIDING MEMBER FUNCTION extractrdf (document CLOB, docid VARCHAR2)
            RETURN CLOB
            IS
            clobRisposta CLOB;
            ce_xmlt ANYDATA;
            errormessage varchar2(4000);
            risposta varchar2(32767);
            flag PLS_INTEGER;
            BEGIN
            BEGIN
            INSERT INTO RDF_INDEX_TEST.LOG(messaggio) values('Indexing doc with id:'||docid);
            EXECUTE IMMEDIATE 'begin :1 := info_extract_xml(:2, :3); end;'
            USING IN OUT ce_xmlt, IN document,in docid;
            flag:=ce_xmlt.getclob(clobRisposta);
            -- clobRisposta now holds the web service response; it must not be
            -- overwritten from the never-assigned variable risposta
            EXCEPTION WHEN OTHERS THEN
            errormessage := sqlerrm;
            INSERT INTO RDF_INDEX_TEST.LOG(messaggio) values('ERROR:'||errormessage);
            END;
            commit;

            RETURN clobRisposta;

            -- Now pass the ce_xmlt through RDF/XML transformation --
            --RETURN ce_xmlt.transform (SELF.xsl_trans).getclobval ();
            END extractrdf;
            END;
            /


            Then I created a function called info_extract_xml, which should send the CLOB to the web service and receive the RDF/XML:

            CREATE OR REPLACE FUNCTION RDF_INDEX_TEST.info_extract_xml (document IN CLOB, docId IN VARCHAR2)
            RETURN ANYDATA
            AS
            l_service UTL_DBWS.service;
            l_call UTL_DBWS.call;
            l_result ANYDATA;

            l_wsdl_url VARCHAR2(32767);
            l_namespace VARCHAR2(32767);
            l_service_qname UTL_DBWS.qname;
            l_port_qname UTL_DBWS.qname;
            l_operation_qname UTL_DBWS.qname;
            l_input_params UTL_DBWS.anydata_list;
            BEGIN
            dbms_output.PUT_LINE('calling the service');
            l_wsdl_url := 'http://graimond.dev.ba2015.org:8888/ExtractInfoRDF/ExtracInfoRDFServiceSoapHttpPort?WSDL';
            l_namespace := 'http://webservices.web.cogitoexporter.edem.ccoraclebari.oracle.it/';

            l_service_qname := UTL_DBWS.to_qname(l_namespace, 'ExtracInfoRDFService');
            l_port_qname := UTL_DBWS.to_qname(l_namespace, 'ExtracInfoRDFServiceSoapHttpPort');
            l_operation_qname := UTL_DBWS.to_qname(l_namespace, 'extractInfoFromDocument');

            l_service := UTL_DBWS.create_service (
            wsdl_document_location => URIFACTORY.getURI(l_wsdl_url),
            service_name => l_service_qname);

            l_call := UTL_DBWS.create_call (
            service_handle => l_service,
            port_name => l_port_qname,
            operation_name => l_operation_qname);

            l_input_params(0) := ANYDATA.ConvertClob(document);
            l_input_params(1) := ANYDATA.ConvertVarchar2(docId);

            l_result := UTL_DBWS.invoke (
            call_handle => l_call,
            input_params => l_input_params);
            UTL_DBWS.release_call (call_handle => l_call);
            UTL_DBWS.release_service (service_handle => l_service);
            return l_result ;
            END;
            /

            Then I create the index:

            CREATE INDEX CORPUSINDEX ON CORPUS
            (DOCUMENT)
            INDEXTYPE IS MDSYS.SEMCONTEXT
            PARAMETERS('SEM_EXTR')
            NOPARALLEL;

            where CORPUS is my table and DOCUMENT is the column storing the documents.

            The problem is that when the function is called, the call "ANYDATA.ConvertClob(document);" fails because it is impossible to serialize a CLOB object and send it to the web service.
            Do you have any idea how I can proceed?

            Thank you
            • 3. Re: Semantic Indexing for Documents using RDFCTX_WS_EXTRACTOR
              442199
              Hello,

              I recommend using UTL_HTTP to write the CLOB data as a stream instead of using UTL_DBWS and having to convert the data. In fact, this is exactly how the Calais extractor is implemented. Also, you may be able to implement your custom extractor in fewer steps by extending rdfctx_ws_extractor instead of rdfctx_extractor. The ws (web service) extractor already has provisions to read from and write to given web service end points. So, your extractor type implementation would be something as follows (untested code).
              create or replace type cogito_extractor under rdfctx_ws_extractor (
              
                constructor function cogito_extractor (xsl_trans SYS.XMLTYPE) 
                  return self as result
              ); 
              / 
              
              create or replace type body cogito_extractor as
              
                constructor function cogito_extractor (xsl_trans SYS.XMLTYPE) 
                  return self as result is 
                begin
                  self.extr_type := 'COGITO';
                  -- correct the following end-point and SOAP action settings if necessary
                  self.ws_end_point := 'http://graimond.dev.ba2015.org:8888/ExtractInfoRDF/ExtracInfoRDFServiceSoapHttpPort'; 
                  self.ws_soap_act :=  '???/ExtracInfoRDFService';
                  -- this is the style sheet that operates on the full SOAP response. So you should identify the payload from
                  -- the SOAP response and further operate on it to generate RDF/XML (if it is not already RDF/XML). 
                  self.ws_xsltrans := xsl_trans; 
              
                  -- correct the SOAP envelope to include the correct message format 
                  -- Note the use of $ora$doc in the SOAP envelope as the placeholder for the document. 
                  self.ws_envelope :=
                   '<soap12:Envelope  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                                               xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
                                               xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
                      <soap12:Body>
                         <XYZ xmlns="http://webservices.web.cogitoexporter.edem.ccoraclebari.oracle.it/">
                            <content><![CDATA[$ora$doc]]></content>
                            </XYZ>
                     </soap12:Body>
                  </soap12:Envelope>';
                  return;
                end cogito_extractor; 
              end; 
              / 
              The reason the above extractor implementation should work is that the rdfctx_ws_extractor type already has the extractRdf method implemented (using the UTL_HTTP based web-service call outs, which also handle CLOBs). So, you should be able to configure any web-service extractor by simply setting the end-point, SOAP action, and the SOAP envelope.
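
              For illustration only, here is a minimal sketch of how a CLOB can be streamed to an HTTP end point with UTL_HTTP. It is not the actual rdfctx_ws_extractor implementation; the function name post_clob, its parameters, and the 8000-character chunk size are all assumptions made for this example.

              ```sql
              -- Hypothetical helper (not part of any Oracle package): POST a CLOB
              -- as the request body in pieces and return the response as a CLOB.
              create or replace function post_clob (
                p_url          in varchar2,  -- web service end point
                p_content_type in varchar2,  -- e.g. 'text/xml; charset=UTF-8'
                p_body         in clob       -- complete request body (SOAP envelope)
              ) return clob
              is
                l_req    utl_http.req;
                l_resp   utl_http.resp;
                l_result clob;
                l_buf    varchar2(32767);
                l_chunk  constant pls_integer := 8000;  -- characters per write (arbitrary)
                l_len    constant pls_integer := dbms_lob.getlength(p_body);
                l_pos    pls_integer := 1;
              begin
                l_req := utl_http.begin_request(p_url, 'POST', 'HTTP/1.1');
                utl_http.set_header(l_req, 'Content-Type', p_content_type);
                -- Chunked transfer avoids computing the body's byte length up front.
                utl_http.set_header(l_req, 'Transfer-Encoding', 'chunked');

                -- Write the CLOB piece by piece instead of converting it to one value.
                while l_pos <= l_len loop
                  utl_http.write_text(l_req, dbms_lob.substr(p_body, l_chunk, l_pos));
                  l_pos := l_pos + l_chunk;
                end loop;

                l_resp := utl_http.get_response(l_req);
                dbms_lob.createtemporary(l_result, true);
                begin
                  -- read_text raises end_of_body once the response is exhausted.
                  loop
                    utl_http.read_text(l_resp, l_buf, l_chunk);
                    dbms_lob.writeappend(l_result, length(l_buf), l_buf);
                  end loop;
                exception
                  when utl_http.end_of_body then
                    utl_http.end_response(l_resp);
                end;
                return l_result;
              end post_clob;
              /
              ```

              The sketch leaves out error handling for failed requests and any service-specific headers such as the SOAP action. The calling schema would also need network access to the end point's host for UTL_HTTP to connect.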

              Even before creating the index, you can test your extractor type and check that it generates the desired output using the following query.
               
              
              select mdsys.sem_rdfctx.extract_rdfxml('This is the text that will be sent to the extractor',
                         cogito_extractor(sys.xmltype('<your style sheet goes here>'))) from dual; 
              Hope this helps,
              -Aravind.