1 Reply Latest reply: Jun 10, 2013 12:53 PM by Nelson Maia

    OSES 11g Crawl OUCM 11g

    881570
      We are trying to integrate OSES 11.1.2 with OUCM 11.1.1.5. We have done the following on the OUCM side:

      1. Completed the UCM snapshot
      2. Configured SESCrawlerExport

      We can see that the configdefault.xml file has been created with the feeds.

      Below is the crawler log, including the errors it reports:


      19:18:05:438 INFO     main          =================== Crawling status ===================
      19:18:05:439 INFO     main          Crawling started at 8/10/11 7:18 PM
      19:18:05:841 INFO     main          WebCrawler.init: set system property: oracle.search.event.27000 = 0
      19:18:05:843 INFO     main          Secure Caching is ON
      19:18:05:908 INFO     main          URL manager connecting to Oracle...
      19:18:05:931 INFO     main          connected
      19:18:05:965 INFO     main          Time of last crawl is Wed Aug 10 16:22:58 GMT+05:30 2011
      19:18:05:989 INFO     main          Queue manager connecting to Oracle...
      19:18:06:008 INFO     main          connected
      19:18:06:323 INFO     main          Continue remaining tasks in the indexing pipeline left from the previous run...
      19:18:06:326 INFO     main          Done
      19:18:06:326 INFO     main          Invoking "oracle.search.plugin.stellent.StellentCrawlerManager"
      19:18:06:329 INFO     main          URL manager connecting to Oracle...
      19:18:06:352 INFO     main          connected
      19:18:06:354 INFO     main          Initializing crawler plug-in manager "oracle.search.plugin.stellent.StellentCrawlerManager"
      19:18:06:360 INFO     main          URIHandler initialized for the URI http://ucmhost:16200/cs/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONFIG&source=default
      19:18:06:391 INFO     main          HTTP status code: 200
      19:18:06:470 INFO     main          URIHandler initialized for the URI http://ucmhost:16200/cs/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONTROL&source=default
      19:18:06:480 INFO     main          HTTP status code: 200
      19:18:06:480 INFO     main          RSS SAX queue - init on http://ucmhost:16200/cs/idcplg?IdcService=SES_CRAWLER_DOWNLOAD_CONTROL&source=default
      19:18:06:480 INFO     main          Initialized error feed at /tmp/ses_xml_39670_scratch/idcplg.err
      19:18:06:486 INFO     main          Created thread to parse the feed: Thread-2
      19:18:06:486 INFO     main          Starting Thread-2
      19:18:06:487 ERROR     Thread-2          EQP-60303: Exiting saxthread due to errors
      19:18:06:487 ERROR     Thread-2     EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope oracle.search.sdk.crawler.PluginException     oracle.search.plugin.rss.SAXThread:checkNamespace:200     oracle.search.plugin.rss.SAXThread:startElement:218     oracle.xml.parser.v2.NonValidatingParser:parseElement:1296     oracle.xml.parser.v2.NonValidatingParser:parseRootElement:340     oracle.xml.parser.v2.NonValidatingParser:parseDocument:307     oracle.xml.parser.v2.XMLParser:parse:212     oracle.xml.jaxp.JXSAXParser:parse:292     oracle.search.plugin.rss.SAXThread:run:159     java.lang.Thread:run:595
      19:18:06:487 ERROR     Thread-2     EQP-60305: Exception when parsing channel: EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope. Verify that the feed conforms to the feed schema and there are no XML parsing errors in the feed. java.lang.Exception     oracle.search.plugin.rss.SAXThread:run:171     java.lang.Thread:run:595
      19:18:06:488 ERROR     Thread-2     caused by:EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope
      19:18:06:488 ERROR     Thread-2          EQP-60305: Exception when parsing channel: EQP-60305: Exception when parsing channel: EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope. Verify that the feed conforms to the feed schema and there are no XML parsing errors in the feed.. Verify that the feed conforms to the feed schema and there are no XML parsing errors in the feed.
      19:18:06:490 ERROR     Thread-2          EQP-60307: Error when processing item channel_error: EQP-60305: Exception when parsing channel: EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope. Verify that the feed conforms to the feed schema and there are no XML parsing errors in the feed.
      Error Stack:
      java.lang.Exception: EQP-60305: Exception when parsing channel: EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope. Verify that the feed conforms to the feed schema and there are no XML parsing errors in the feed.
           at oracle.search.plugin.rss.SAXThread.run(SAXThread.java:171)
           at java.lang.Thread.run(Thread.java:595)
      Caused by: EQG-30237: Crawler plug-in warning: EQP-80330: Unrecognized QName <http://schemas.xmlsoap.org/soap/envelope/>:Envelope
           at oracle.search.plugin.rss.SAXThread.checkNamespace(SAXThread.java:200)
           at oracle.search.plugin.rss.SAXThread.startElement(SAXThread.java:218)
           at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1296)
           at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:340)
           at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:307)
           at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:212)
           at oracle.xml.jaxp.JXSAXParser.parse(JXSAXParser.java:292)
           at oracle.search.plugin.rss.SAXThread.run(SAXThread.java:159)
           ... 1 more
      . Verify that the details of the item such as URL, ACL, etc. are correct.
      19:18:06:490 INFO     main          EQP-80325: Empty control queue. Nothing to process.
      19:18:06:491 INFO     main          Loading document service pipeline : Default pipeline
      19:18:06:492 INFO     Thread-2          Waiting for items to be consumed
      19:18:06:492 INFO     Thread-2          Thread-2: All items have been consumed
      19:18:06:492 INFO     Thread-2          Unable to move the erroneous channel file idcplg to idcplg.prcsdErr
      19:18:06:493 INFO     Thread-2          posting status feed to http://ucmhost:16200/cs/idcplg?IdcService=SES_CRAWLER_STATUS&IsJava=1&source=default&StatusFeed=
      19:18:06:504 INFO     Thread-2          HTTP status code: 200
      19:18:06:504 INFO     Thread-2          idcplg.err uploaded successfully
      19:18:06:505 INFO     Thread-2          Finished processing RSS channel
      19:18:06:574 INFO     main          Loading document service manager "oracle.search.plugin.doc.extractor.DocumentSummarizerManager"...
      19:18:06:582 INFO     main          Stopword directory = /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/lib/plugins/doc/extractor/phrasestopwords
      19:18:06:588 INFO     main          Max number of terms = 20
      19:18:06:588 INFO     main          Max number of phrases = 10
      19:18:06:588 INFO     main          Max terms per phrase = 6
      19:18:06:589 INFO     main          Min term frequency = 3
      19:18:06:589 INFO     main          Min phrase frequency = 2
      19:18:06:589 INFO     main          Enable sentence extraction = false
      19:18:06:589 INFO     main          Max sentences = 3
      19:18:06:655 INFO     main          Initializing language detection module...
      19:18:06:655 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//danish.dat for language da
      19:18:06:721 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//dutch.dat for language nl
      19:18:06:733 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//english.dat for language en
      19:18:06:737 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//french.dat for language fr
      19:18:06:746 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//german.dat for language de
      19:18:06:752 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//italian.dat for language it
      19:18:06:758 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//portugue.dat for language pt
      19:18:06:763 INFO     main          Loading training data /home/oracle/oracle/product/11.1.2.0.0/sesFinal/seshome/search/data/training//spanish.dat for language es
      19:18:06:768 INFO     main          Done
      19:18:06:954 INFO     filter_1          Initializing crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:954 INFO     crawler_3          Initializing crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:954 INFO     filter_1          Crawler plug-in "Oracle Content Server crawler plug-in" crawl starts
      19:18:06:954 INFO     crawler_3          Crawler plug-in "Oracle Content Server crawler plug-in" crawl starts
      19:18:06:954 INFO     filter_1          No more items to process
      19:18:06:954 INFO     filter_0          Initializing crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:955 INFO     filter_1          No more items. Crawler thread exiting.
      19:18:06:955 INFO     filter_0          Crawler plug-in "Oracle Content Server crawler plug-in" crawl starts
      19:18:06:955 INFO     filter_1          Crawler plug-in "Oracle Content Server crawler plug-in" crawl finishes
      19:18:06:955 INFO     crawler_3          No more items to process
      19:18:06:954 INFO     crawler_2          Initializing crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:955 INFO     crawler_2          Crawler plug-in "Oracle Content Server crawler plug-in" crawl starts
      19:18:06:955 INFO     crawler_2          No more items to process
      19:18:06:955 INFO     crawler_2          No more items. Crawler thread exiting.
      19:18:06:955 INFO     crawler_3          No more items. Crawler thread exiting.
      19:18:06:955 INFO     crawler_2          Crawler plug-in "Oracle Content Server crawler plug-in" crawl finishes
      19:18:06:955 INFO     crawler_3          Crawler plug-in "Oracle Content Server crawler plug-in" crawl finishes
      19:18:06:954 INFO     crawler_4          Initializing crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:955 INFO     filter_0          No more items to process
      19:18:06:956 INFO     filter_0          No more items. Crawler thread exiting.
      19:18:06:956 INFO     filter_0          Crawler plug-in "Oracle Content Server crawler plug-in" crawl finishes
      19:18:06:956 INFO     filter_0          Shut down document service agent "Default pipeline"
      19:18:06:956 INFO     crawler_4          Crawler plug-in "Oracle Content Server crawler plug-in" crawl starts
      19:18:06:956 INFO     crawler_4          No more items to process
      19:18:06:956 INFO     crawler_4          No more items. Crawler thread exiting.
      19:18:06:956 INFO     crawler_4          Crawler plug-in "Oracle Content Server crawler plug-in" crawl finishes
      19:18:06:956 INFO     crawler_4          Shut down crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:957 INFO     crawler_4          Crawler thread stopping due to stop crawl command
      19:18:06:957 INFO     crawler_4          Shut down crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:958 INFO     crawler_4          Crawler thread stopping due to stop crawl command
      19:18:06:958 INFO     crawler_4          Shut down crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:958 INFO     crawler_4          Crawler thread stopping due to stop crawl command
      19:18:06:958 INFO     crawler_4          Shut down crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:958 INFO     crawler_4          Crawler thread stopping due to stop crawl command
      19:18:06:958 INFO     crawler_4          Shut down crawler plug-in "Oracle Content Server crawler plug-in"
      19:18:06:958 INFO     crawler_4          Crawler thread stopping due to stop crawl command
      19:18:06:961 INFO     cache_0          Caching thread cache_0 returns without getting a file
      19:18:06:961 INFO     cache_0          Shutting down all caching threads...
      19:18:06:962 INFO     cache_1          Caching thread cache_1 returns without getting a file
      19:18:06:962 INFO     cache_2          Caching thread cache_2 returns without getting a file
      19:18:06:962 INFO     cache_0          Total number of documents cached = 0
      19:18:06:962 INFO     cache_0          Total data collected = 0 bytes
      19:18:06:962 INFO     cache_0          Indexing log file is "search_i1ds25.log" under oracle_home/ctx/log/
      19:18:06:967 INFO     cache_0          Indexing started at 8/10/11 7:18 PM
      19:18:06:967 INFO     cache_0          Task ID = 94
      19:18:07:069 INFO     cache_0          Indexing completed at 8/10/11 7:18 PM
      19:18:12:072 INFO     cache_0          Done
      19:18:12:072 INFO     main          Shutting down crawler...
      19:18:12:072 INFO     main          Shut down crawler plug-in "oracle.search.plugin.stellent.StellentCrawlerManager"
      19:18:12:112 INFO     monitor          Remote command "reportstatistics" received, argument = "quit"
      19:18:12:112 INFO     monitor          Executing remote command "reportstatistics"
      19:18:12:113 INFO     monitor          Send back remote command execution result
      19:18:12:140 INFO     main          Shutting down all crawling threads...
      19:18:12:142 INFO     main          Done
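
      The EQP-80330 entries suggest that the document the crawler received for the control feed has a SOAP Envelope as its root element, which the RSS SAX parser does not recognize. Below is a minimal sketch for checking the root element of a saved response before handing it to the crawler. The function names root_qname and looks_like_soap and both sample payloads are hypothetical, and the sketch only inspects XML text; it does not contact the content server.

```python
import xml.etree.ElementTree as ET

# Namespace quoted in the EQP-80330 message in the log above.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def root_qname(xml_text):
    """Return (namespace, local_name) for the root element of xml_text."""
    root = ET.fromstring(xml_text)
    tag = root.tag
    # ElementTree writes namespaced tags as "{namespace}local".
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return "", tag


def looks_like_soap(xml_text):
    """True when the root element is a SOAP 1.1 Envelope."""
    return root_qname(xml_text) == (SOAP_ENV_NS, "Envelope")


# Hypothetical payloads for illustration only; neither is taken from
# a real content server response.
soap_sample = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body/></SOAP-ENV:Envelope>"
)
feed_sample = '<rss version="2.0"><channel/></rss>'

if __name__ == "__main__":
    for name, payload in (("soap_sample", soap_sample),
                          ("feed_sample", feed_sample)):
        print(name, root_qname(payload), looks_like_soap(payload))
```

      Saving the response of the SES_CRAWLER_DOWNLOAD_CONTROL URL shown in the log to a file and passing its contents to looks_like_soap would show whether the server is answering with a SOAP document instead of a feed.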

      Any help or guidance would be appreciated.

      Additionally, we tried File and Web sources. Both crawled fine and we were able to search the results, which suggests to me that the OSES installation itself is not the problem.

      Thanks