Did you do this using large volumes of documents at a time?
In this case, for the failed conversion:
When jobs created in the refineryjobs table sit there longer than some configurable time without any result (sorry, I do not know the actual default value, but I think it was a few hours), the records are simply cleaned up and the document stays in GenWWW forever. The easiest way to fix this for LARGE numbers of documents is to create an IdcCommand script with RESUBMIT_FOR_CONVERSION calls; for smaller volumes you can use the Repository Manager.
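For anyone who wants to try the IdcCommand route, here is a rough sketch that writes out a command file with one RESUBMIT_FOR_CONVERSION call per revision. It assumes the usual IdcCommand layout (key=value lines ending with an `<<EOD>>` marker) and that the service takes a `dID` parameter; both are from memory, so check them against the services reference for your version. The function name and the dID values are made up for illustration.

```python
from typing import Iterable


def build_resubmit_commands(revision_ids: Iterable[int]) -> str:
    """Return IdcCommand file text with one RESUBMIT_FOR_CONVERSION call per dID.

    Assumption: each service call is a block of key=value lines terminated
    by <<EOD>>, and the service identifies the revision by dID.
    """
    blocks = []
    for did in revision_ids:
        blocks.append(
            "IdcService=RESUBMIT_FOR_CONVERSION\n"
            f"dID={did}\n"
            "<<EOD>>\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Hypothetical revision IDs; in practice these would be the stuck revisions.
    stuck = [1001, 1002, 1003]
    print(build_resubmit_commands(stuck), end="")
```

Redirect the output to a file and point IdcCommand at it. It is probably wise to start with a handful of IDs and watch the results before feeding it the whole list.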
Note: this does not explain why they go through the conversion in the first place ... maybe the calculated dConversion value gets passed along in the archive?
Thanks for the reply, but no, the volume is pretty small: I would say a max of a few hundred per day.
It seems counter-intuitive, but I'm wondering if perhaps it's really more of an indexing issue than a conversion issue.
I've seen documents pulled in via Archiver in 11g fail during the import/indexing process because the text file that the indexing process exports from the document could not be written out. You get some funky message in the system audit trace about the text export having timed out.
Once the text export bombed for one document, none of the other documents in the archive would process. I set "TextExtractorTimeoutInSec=60" in config.cfg (by default it's 3 seconds I think), and the docs finished processing cleanly.
Worth a shot. YMMV.
Not likely an indexer issue. The GENWWW state is valid while the document is going through conversion. Once conversion is complete it is set to Done. Only then is it ready for indexing. You can prove this by switching off the indexer auto update cycle and switching on systemdatabase and services verbose tracing. You can actually trace the inserts of GENWWW and DONE into the database even though no indexing is possible at that point.
So in this case, the likelihood is that the web viewable file isn't being created. If you're using pass through, then this is a file copy which is failing for some reason. If not, you'll need to look at the IBR logs to find out what's going on.
Hope this helps, and is of interest.
Frank Abela wrote: Not likely an indexer issue. The GENWWW state is valid while the document is going through conversion. Once conversion is complete it is set to Done. Only then is it ready for indexing. You can prove this by switching off the indexer auto update cycle and switching on systemdatabase and services verbose tracing. You can actually trace the inserts of GENWWW and DONE into the database even though no indexing is possible at that point.
With all due respect, I'm not trying to sound like a jerk. But it's likely an indexer issue - here's why:
I'd agree with you if we were simply talking about normal checkins and updates with an attached IBR instance. We aren't talking about the normal here.
- You have documents being ingested via archiver, with a webviewable document already present in the archive, so IBR shouldn't be the issue - you aren't "creating" a webviewable in this case. Unless "AlwaysResubmit=1" is being set somewhere, these documents have no need to go back to IBR - plus according to the OP (Andrew, I think), IBR is not even enabled in the target environment.
stellentpmp wrote: Once I resubmit it (via Repository Manager or doc info page), the conversion succeeds about 99% of the time
If IBR is broke, doesn't it seem like 99% is a pretty high success rate for a broken system for simply clicking a button - and no IBR is present to do the conversions to boot? (How can IBR be broke in an environment where IBR isn't even present?)
- As I mentioned in my previous post, I saw ALMOST the EXACT same behavior less than a month ago when archiving documents into a system (documents archived had webviewables with no real need for reconversion). The archive imported, but only about 20% of the content indexed. Resubmitting individual items failed sporadically. Forcing collection rebuilds to index everything would start, but stop for no reason. When I looked at the system console output, there would always be a message ("text export timed out") on different items (never the same ones) and once this error was thrown, the entire indexing process stopped cold with no recovery. Increasing the timeout allowed the text export process to finish, and the collection was successfully rebuilt.
Now the reason for the need to increase the timeout is unknown, but that may be due to file system access and the speed of the access to the file system where the text export file is being written, which is probably the "~temp" directory under "vault".
Frank Abela wrote: If you're using pass through, then this is a file copy which is failing for some reason.
Again, this would be a function of file system access, which fits with the text export failure scenario mentioned above. It would be interesting to see if the behavior is worse at a given time of the day, or is it truly random.
I only included the IBR discussion just in case anybody found this thread. The OP identified that they are using pass through, but he also says that the docs ARE being put through conversion on import. In that case, the 'conversion process' is to merely copy the file to the web viewable location and this is failing.
It cannot be an indexer issue, because while the document is in GENWWW state, the indexer isn't even involved in the process, irrespective of whether this is a manual ingestion or an archiver process. The indexer doesn't even know about the document until its status is set to Done.
Be that as it may, it should be simple enough for the OP to verify if the web viewables get created for these stuck documents.
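A quick way to run that check across a batch of stuck documents might look like the sketch below. It only tests whether each expected web viewable file exists and is non-empty. The mapping from content ID to file path is something you would have to supply yourself, since the weblayout location depends on your security groups and document types; the IDs and file names in the demo are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def find_missing_web_viewables(expected: Dict[str, Path]) -> List[str]:
    """Return the content IDs whose expected web viewable is absent or empty."""
    missing = []
    for content_id, path in expected.items():
        # A zero-byte file suggests the copy started but never completed.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(content_id)
    return sorted(missing)


if __name__ == "__main__":
    # Demo against a throwaway directory; real paths would point into weblayout.
    with tempfile.TemporaryDirectory() as tmp:
        present = Path(tmp) / "doc_a.pdf"
        present.write_bytes(b"%PDF-1.4")
        expected = {"DOC_A": present, "DOC_B": Path(tmp) / "doc_b.pdf"}
        print(find_missing_web_viewables(expected))
```

If the stuck documents all show up as missing, that points at the copy step rather than anything downstream of it.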
Of course, my argument doesn't explain how you saw GENWWW documents failing on text export - that simply shouldn't be possible. Hands up, can't explain that!
Nice to see healthy discussion here. :)
William Phelps wrote: It would be interesting to see if the behavior is worse at a given time of the day, or is it truly random
From what I've seen, it's been pretty random. There haven't been any failures today; but then again, it is Friday :) When you said "may be due to file system access and the speed of the access to the file system", it came to me that MAYBE these failures coincide with our "less than suitable" (for lack of a better term) infrastructure/network. So I wouldn't be surprised if the "conversion" just fails due to a timeout. I plan to test the "TextExtractorTimeoutInSec" config over the weekend and will report back my findings.
Frank Abela wrote: Be that as it may, it should be simple enough for the OP to verify if the web viewables get created for these stuck documents.
What do you mean by "created"? Since the documents are stuck in GENWWW, there will never be a web viewable. If you're asking about after resubmitting the documents, then yes, the web viewables get created about 99% of the time.
The whole point of my post was to determine WHY these docs are even going through conversion in the first place; archiver should just be dumping/placing the archived web content as the web viewable, but that doesn't appear to be the case. I'm not sure if AlwaysResubmit=1 is set anywhere because I inherited this fubar system, so I really don't know the history and/or design implementations. I'll see what I can dig up. What I do know is that I'm tempted to just rebuild a new environment from scratch if I can't figure this out :)
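On an inherited system, one way to find out whether a setting such as AlwaysResubmit is defined anywhere is to scan the configuration files for it. The sketch below walks a directory tree, reads every `*.cfg` file, and reports uncommented `key=value` lines for the requested key. It assumes `#` starts a comment and compares key names case-insensitively to catch variants. Components can also contribute settings from other file types, so an empty result does not prove the setting is absent.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_setting(root: Path, key: str) -> List[Tuple[Path, str]]:
    """Return (file, value) for each uncommented key=value line in *.cfg files under root."""
    hits = []
    for cfg in sorted(root.rglob("*.cfg")):
        for line in cfg.read_text(errors="replace").splitlines():
            stripped = line.strip()
            # Skip comment lines and anything that is not a key=value pair.
            if stripped.startswith("#") or "=" not in stripped:
                continue
            name, value = stripped.split("=", 1)
            if name.strip().lower() == key.lower():
                hits.append((cfg, value.strip()))
    return hits


if __name__ == "__main__":
    # Demo against a throwaway tree; point root at the real install directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config.cfg").write_text("#AlwaysResubmit=0\nAlwaysResubmit=1\n")
        for path, value in find_setting(root, "AlwaysResubmit"):
            print(f"{path.name}: {value}")
```

The same function can be reused to confirm that TextExtractorTimeoutInSec actually landed in the file the server reads.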
Thanks for everyone's input!
I was wondering if you'd had any luck trying to see if this was related to timeouts. In the last week or so, we've gotten several reports that files are getting stuck in GenWWW and are never made web viewable. Given the time frame involved, and given that we've made no changes to our infrastructure, this does seem to be a likely culprit. I also stumbled on this thread, which may shed some light:
All content except for PDF are stuck in GenWWW - IBR IS converting them..
"So according to some of the Stellant guys that work with Oracle support
on rare occasions job ID's can get mistakenly swapped in the middle of a
conversion job causing UCM to lose sight of the refineryjobs work table.
The temporary solution we used to resubmit all of our conversion jobs was
to truncate this table and resubmit files one at a time (at first - just
to keep an eye on them) and we successfully cleaned up several hundred
documents stuck in GenWWW."
If you've had any luck, I'd love to hear about it!
Thanks in advance!
This is very interesting also. See page 10-9:
I decided against testing the timeout configuration variable for the time being because I wanted to perform a complete collection rebuild first to see if that would solve the problem. It did not, so I actually plan to add the configuration variable tonight or later this week.
Wish me luck, and I will report back my findings!
Good luck indeed!
Meanwhile, I did determine that this does not solve the issue:
My sys admins indicate that the load on the servers is acceptable, but might need to be increased. Hoping that will make a difference.
Turns out our guys upgraded from Visual C++ 2005 to 2010. Documentation says:
OutsideIn Technology requires the Visual C++ libraries included in the Visual C++ Redistributable Package for a Windows operating system. Three versions of this package (x86, x64, and IA64) are available from the Microsoft Download Center at
Search for and download the version of the package that corresponds to the version of your Windows operating system:
The required version of each of these downloads is the Microsoft Visual C++ 2005 SP1 Redistributable Package. The redistributable module that Outside In requires is msvcr80.dll.
The WinNativeConverter has some vb.Net code, so it also requires Microsoft .NET Framework 3.5 Service Pack 1.
We'll be trying to reinstall the latest supported version of Visual C++ SP1 Redistributable next.
Mystery solved on our end. There was a missing temp directory, and so converted files had nowhere to go. No idea how the directory disappeared, but it did. I know I saw a post on this somewhere, but I haven't located it yet. Something to keep in mind in case you aren't able to solve your issues. Good luck!
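Since a vanished working directory is easy to miss, a small check like the one below could be run against whatever directories your server writes to during conversion. It is only a sketch: the function name is made up, and which directories to pass in depends on your install (the "~temp" directory under "vault" mentioned earlier in the thread would be one candidate). It tries to create and delete a real file rather than trusting permission bits.

```python
import tempfile
from pathlib import Path


def check_work_dir(path: Path) -> str:
    """Return 'ok', 'missing', or 'not writable' for a directory the server must write to."""
    if not path.is_dir():
        return "missing"
    try:
        # Create and immediately remove a scratch file to prove writes succeed.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return "not writable"
    return "ok"


if __name__ == "__main__":
    # Demo against a throwaway directory; substitute the real working directories.
    with tempfile.TemporaryDirectory() as tmp:
        print(check_work_dir(Path(tmp)))
        print(check_work_dir(Path(tmp) / "gone"))
```

Run it as the same OS user the content server runs under, otherwise the result says little about what the server itself can do.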
Well, unfortunately, setting TextExtractorTimeoutInSec did not work for me. Back to the drawing board!