I work for a software development company. We have developed a CRM package. The basic architecture of our application is as follows:
This app contains 12 partitions, including several replicatable database connection partitions. The majority of our policies exist in partition 1.
This app serves as a gateway through which clients access our server component.
We have a number of adapters, including flat-file-based adapters, XML web services, etc. All of these exist to support one UI for our application.
Recently our Gateway has begun receiving DistributedAccessExceptions from partition 1 of our Server Component. The Gateway is designed to handle this by taking itself offline. The problem is that we haven't been able to figure out what is causing the DistributedAccessException, and believe me, we've been trying for a while.
The actual exception that we receive is:
Task ####: CM Keepalive terminating unresponsive connection for hose #### to location Internet Location - Host: Port Number: #### Dot: ###.##.##.###
According to what I could find on the Sun Support site, this is typically caused "If a client computer has recently crashed..", but in our case no clients are connected to the app server, nor have any been since the app was brought online (I verified this several times).
Does anyone else know of a possible cause of this? I'd greatly appreciate ANY advice or knowledge anyone else has on this!
This error doesn't usually result in a backtrace. Is there a way to force that? Here is an exact example of what we see:
Task 4061: CM Keepalive terminating unresponsive connection for hose 1396 to location Internet Location - Host: Port Number: 1174 Dot: 172.17.40.232
aud Wed Jan 21 09:03:31 : Shutting down partition as requested.
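To see whether these terminations always involve the same peer, the log lines can be tallied by address. The following is a minimal sketch in Python, not a Forté tool: the regular expression is inferred from the single sample line above, so other Forté releases may format the message differently, and the names `parse_keepalive` and `count_by_peer` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread (an assumption,
# not a documented Forté log format). The Host field may be empty.
KEEPALIVE_RE = re.compile(
    r"Task (?P<task>\d+): CM Keepalive terminating unresponsive connection "
    r"for hose (?P<hose>\d+) to location Internet Location - "
    r"Host:\s*(?P<host>\S*?)\s*Port Number:\s*(?P<port>\d+)\s+"
    r"Dot:\s*(?P<dot>[\d.]+)"
)


def parse_keepalive(line):
    """Return the fields of a keepalive termination line, or None if no match."""
    m = KEEPALIVE_RE.search(line)
    if m is None:
        return None
    return {
        "task": int(m.group("task")),
        "hose": int(m.group("hose")),
        "host": m.group("host"),
        "port": int(m.group("port")),
        "dot": m.group("dot"),
    }


def count_by_peer(lines):
    """Count keepalive terminations per peer IP address (the 'Dot' field)."""
    counts = Counter()
    for line in lines:
        rec = parse_keepalive(line)
        if rec is not None:
            counts[rec["dot"]] += 1
    return counts
```

Feeding the partition log through `count_by_peer` shows whether one address accounts for most of the terminations, which would point at a specific machine rather than a general threading problem.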
I think the 'Attempt to send to a partition (F92A9860-B089-11D7-800D-8B9CD7FEAA77:0x209b) which has no locations associated with its shell partition.' error occurs because one of your replicates was shut down, but Forté did not remove its entry from the naming service. Check whether the number of ftexecs on the machine matches the number of partitions (including nones) shown in the eConsole. How long has this been happening? Is it recent?
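The ftexec comparison can be scripted. This is a rough sketch in Python under some assumptions: the processes appear under the command name `ftexec` in `ps` output, the machine has a POSIX `ps` that accepts `-e -o comm=`, and the partition count is read off the eConsole by hand. The function names are hypothetical.

```python
import subprocess


def count_ftexec(ps_output, name="ftexec"):
    """Count lines of `ps -e -o comm=` output whose command basename is `name`."""
    count = 0
    for line in ps_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Some systems print the full path, so compare only the basename.
        if parts[0].rsplit("/", 1)[-1] == name:
            count += 1
    return count


def running_ftexec_count():
    """Run `ps` on this machine and count the ftexec processes it reports."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_ftexec(result.stdout)


def report(partitions_in_econsole):
    """Compare the live ftexec count with the number read from the eConsole."""
    running = running_ftexec_count()
    if running == partitions_in_econsole:
        return "counts match: %d" % running
    return "mismatch: %d ftexec processes vs %d partitions in eConsole" % (
        running,
        partitions_in_econsole,
    )
```

Calling `report(12)` on the server, with 12 replaced by whatever the eConsole shows, gives a quick yes/no on whether a stale entry is likely.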
The Keepalive error may be caused by an unresponsive Forté partition, due to Forté threading on UNIX. Check v$session_longops on Oracle.
These traces are Forté-specific.
Add the flags to the shell script that starts the partition. Alternatively, there is a monitoring app called eConsole where you can see the Forté nodes and the Forté partitions running on those nodes. Use it to get to the partition that fails and add the log flags there. Please RTFM (F = Forté) on how to do that.
trc:lo:25:* should not be used, as it will report many errors that are in fact not real errors.
Given the little Forté knowledge you seem to have, you'd better get your manager to hire a Forté guy.