I am using Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level (none) on AIX 6.1 running on POWER7. The application has one Tuxedo queue space (at group level) that contains several queues. The queues are divided logically by requirement, so each queue has its own TMQFORWARD configured to execute a different service. Each queue has more than one TMQFORWARD instance.
I regularly get the following messages in the ULOG:
000420.uaix4072!TMQFORWARD.19398676.1.0: gtrid x0 x4ff1a015 x4cbd8: Q_CAT:1447: WARN: [Semaphore appears stuck - currently held by 10223792]
000420.uaix4072!TMQFORWARD.19398676.1.0: gtrid x0 x4ff1a015 x4cbd8: : additional deadlock diagnostic (-2/0/781/2/2/-1/-1/10223792)
However, I do not get any such message with the same application running on Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 042 on HP Itanium, HP-UX B.11.31.
What could be the probable cause? Is it related to the Tuxedo patch level or to the Tuxedo configuration?
There is no obvious patch on Tuxedo 10gR3 from 000=>042 which would have made a difference.
Your best bet is to open an SR (provide the ULOG, UBB, machine architecture, ps -ef, ipcs -qa, CPU usage info, truss info, and environment variables)
and have Oracle Tuxedo support review what is occurring.
The machine architecture (CPUs, memory, IPC resources), the number of Tuxedo servers, interaction with other servers, and the load can all contribute to Q_CAT:1447.
Here is the documentation for it:
1447 WARN: [Semaphore appears stuck - currently held by pid]
While trying to lock a portion of the queue space (using a user-level semaphore), the process is unable to get the lock for a long period.
This WARNING message is from the qmlock() function, which locks any semaphore used by the queue space. The process identifier printed in this WARNING message
should give you some indication of which process is trying to lock the semaphore.
If the process is hung, it must be stopped and the IPC resources must be removed using the qmadmin ipcrm command.
Things to check:
1) Check the ULOG for other errors.
2) Check the IPC resources (e.g. ipcs -qa) to see what process 10223792 is doing and whether it is in a deadlock with another process.
3) Check for processes using a very high CPU %. Truss that process to see what it is executing (also truss 10223792 if it is not the high-CPU process).
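To make checks 1) and 2) easier over a large ULOG, here is a minimal Python sketch that counts the Q_CAT:1447 warnings per holding PID. The regular expression is an assumption based only on the two sample lines in the question, and the function name summarize_stuck_semaphores is made up for illustration; adjust both if your ULOG lines look different.

```python
import re
from collections import Counter

# Pattern inferred from the sample ULOG lines in the question (an assumption,
# not an official format definition): time.host!process.pid. ... held by PID]
STUCK_RE = re.compile(
    r"^(?P<time>\d{6})\.(?P<host>[^!]+)!(?P<proc>[^.]+)\.(?P<pid>\d+)\."
    r".*Q_CAT:1447:.*currently held by (?P<holder>\d+)\]"
)


def summarize_stuck_semaphores(lines):
    """Count Q_CAT:1447 warnings per (holder PID, reporting process name)."""
    counts = Counter()
    for line in lines:
        match = STUCK_RE.search(line)
        if match:
            key = (int(match.group("holder")), match.group("proc"))
            counts[key] += 1
    return counts


# The two lines quoted in the question; only the first is a Q_CAT:1447 warning.
sample = [
    "000420.uaix4072!TMQFORWARD.19398676.1.0: gtrid x0 x4ff1a015 x4cbd8: "
    "Q_CAT:1447: WARN: [Semaphore appears stuck - currently held by 10223792]",
    "000420.uaix4072!TMQFORWARD.19398676.1.0: gtrid x0 x4ff1a015 x4cbd8: "
    ": additional deadlock diagnostic (-2/0/781/2/2/-1/-1/10223792)",
]

result = summarize_stuck_semaphores(sample)
for (holder, proc), n in result.items():
    print(f"holder pid {holder}: {n} warning(s) reported by {proc}")
```

If one PID shows up as the holder again and again, that is the process to look at with ps and truss. To run it on a real log, pass an open file object instead of the sample list.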
There is also an environment variable that may help: TM_QM_NAPTIME.
Here are notes from Doc ID 976199.:
The variable "TM_QM_NAPTIME" is specific to queue access management and is meant to avoid contention; it allows time between retries when accessing a semaphore.
TM_QM_NAPTIME - Time in nanoseconds that a process pauses between attempts to acquire the lock. A value of 5000 is a good place to start.
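To illustrate what a nap between lock attempts means in general, here is a toy Python model. It is not Tuxedo's internal implementation; the function acquire_with_nap, its parameters, and the use of threading.Lock are all assumptions made for the example. It only shows the idea of retrying a non-blocking acquire and pausing for a configurable number of nanoseconds between attempts.

```python
import threading
import time


def acquire_with_nap(lock, nap_ns=5000, max_attempts=1000):
    """Try a non-blocking acquire, sleeping nap_ns nanoseconds between attempts.

    Returns the number of attempts used, or None if the lock was never obtained.
    """
    for attempt in range(1, max_attempts + 1):
        if lock.acquire(blocking=False):
            return attempt
        # Pause before retrying so waiters do not hammer the lock.
        time.sleep(nap_ns / 1_000_000_000)  # nanoseconds -> seconds
    return None


lock = threading.Lock()

# Uncontended case: the lock is free, so the first attempt succeeds.
free_attempts = acquire_with_nap(lock)

# Contended case: the lock is still held, so the caller gives up after 3 tries.
busy_attempts = acquire_with_nap(lock, max_attempts=3)

lock.release()
print(free_attempts, busy_attempts)
```

A longer nap means fewer wasted attempts while the lock is held, at the cost of reacting more slowly once it is released, which is why the note suggests starting at 5000 and tuning from there.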