This error message is received, e.g., when the qmaster is not running, but I think your qmaster is running.
qsub is not able to communicate with the qmaster. This looks like a setup problem, e.g. hostname resolution, or the environment may not be set correctly.
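To rule out the hostname-resolution problem, one quick check is to resolve both the qmaster host and the local host from the same Python environment that calls qsub. This is a minimal sketch that assumes the standard layout, where the active master's name is stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster, and that SGE_CELL defaults to "default"; the helper names are hypothetical.

```python
import os
import socket
from pathlib import Path
from typing import Optional


def read_act_qmaster(sge_root: str, sge_cell: str = "default") -> str:
    """Return the hostname stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster."""
    path = Path(sge_root) / sge_cell / "common" / "act_qmaster"
    return path.read_text().strip()


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


if __name__ == "__main__":
    root = os.environ.get("SGE_ROOT")
    cell = os.environ.get("SGE_CELL", "default")
    if root is None:
        print("SGE_ROOT is not set; source the SGE settings file first")
    else:
        try:
            master = read_act_qmaster(root, cell)
        except OSError as exc:
            print(f"cannot read act_qmaster: {exc}")
        else:
            local = socket.gethostname()
            print(f"qmaster {master!r} -> {resolve_host(master)}")
            print(f"local   {local!r} -> {resolve_host(local)}")
```

If either name resolves to None, that points at the resolver configuration on this host rather than at qsub itself.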
Do you call or source the settings file before exec'ing qsub?
Could you run the "env" command to check whether all the necessary settings are in place?
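A Python equivalent of reading through the "env" output might look like the sketch below. The list of expected variables is an assumption: what the settings file exports varies between installations (the ports may come from /etc/services instead), so adjust EXPECTED_VARS for your site.

```python
import os
import shutil
from typing import Dict, List, Mapping, Optional, Sequence

# Assumption: variables a typical SGE settings file exports. Adjust for your site.
EXPECTED_VARS = ("SGE_ROOT", "SGE_CELL", "SGE_QMASTER_PORT", "SGE_EXECD_PORT")


def sge_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return every SGE_* variable from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.startswith("SGE_")}


def missing_settings(env: Optional[Mapping[str, str]] = None,
                     expected: Sequence[str] = EXPECTED_VARS) -> List[str]:
    """Return the expected variables that are absent from env."""
    env = os.environ if env is None else env
    return [name for name in expected if name not in env]


if __name__ == "__main__":
    for name, value in sorted(sge_settings().items()):
        print(f"{name}={value}")
    print("missing:", ", ".join(missing_settings()) or "none")
    print("qsub on PATH:", shutil.which("qsub"))
```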
On the other hand, qsub already knows your master host and port.
Does qsub work outside of Python?
It appears as if the environment is preserved. Calling 'env' in the subprocess gives basically identical output to the normal environment. For example, SGE_ROOT points to the right location, as does PE_HOSTFILE etc. As you noted, it looks as if qsub can source settings correctly and is trying to reach the right host, but for some reason it's not getting a response from qmaster.
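One way to make the "basically identical" comparison exact is to have a multiprocessing worker return its environment and diff it against the parent's. This is a minimal sketch using multiprocessing.Pool; child_environ and env_diff are illustrative names, not part of any SGE or Python API.

```python
import multiprocessing as mp
import os
from typing import Dict, Mapping, Optional, Tuple


def child_environ() -> Dict[str, str]:
    """Runs in the worker process and returns a copy of its environment."""
    return dict(os.environ)


def env_diff(parent: Mapping[str, str],
             child: Mapping[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing variable to (parent_value, child_value); None means unset."""
    keys = set(parent) | set(child)
    return {k: (parent.get(k), child.get(k))
            for k in sorted(keys) if parent.get(k) != child.get(k)}


if __name__ == "__main__":
    with mp.Pool(1) as pool:
        child = pool.apply(child_environ)
    diff = env_diff(dict(os.environ), child)
    for name, (before, after) in diff.items():
        print(f"{name}: parent={before!r} child={after!r}")
    print(f"{len(diff)} variable(s) differ")
```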
The qsub command works fine outside of Python. It also works fine inside Python under most circumstances. The problem only seems to appear when using the 'multiprocessing' module (http://docs.python.org/library/multiprocessing.html) to create the subprocesses. It would be useful to know whether anyone else is having this problem or whether it is unique to our SGE setup; I can post a short script here that can be used to test it.
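A short test script along those lines could run the same command once directly and once from a multiprocessing worker, then print both results side by side. This is a hypothetical sketch: the default test job (qsub -b y /bin/true) is only an example, and any other command can be passed on the command line instead.

```python
import multiprocessing as mp
import shutil
import subprocess
import sys
from typing import List, Tuple


def run_command(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def compare(cmd: List[str]) -> Tuple[Tuple[int, str], Tuple[int, str]]:
    """Run cmd directly, then again inside a single multiprocessing worker."""
    direct = run_command(cmd)
    with mp.Pool(1) as pool:
        via_pool = pool.apply(run_command, (cmd,))
    return direct, via_pool


if __name__ == "__main__":
    # Example test job (an assumption): submit /bin/true as a binary job.
    cmd = sys.argv[1:] or ["qsub", "-b", "y", "/bin/true"]
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; source the SGE settings file first")
    else:
        (rc_direct, out_direct), (rc_pool, out_pool) = compare(cmd)
        print(f"direct:   rc={rc_direct}\n{out_direct}")
        print(f"via Pool: rc={rc_pool}\n{out_pool}")
```

If the direct run succeeds and the Pool run fails with the same command and environment, that isolates the problem to how the worker process is created.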
Any assistance/direction would be appreciated!
Edited by: Chris Davoren on Apr 13, 2011 5:07 PM
Chris,
It looks like the multiprocessing module either does not pass down the environment or somehow interferes with the communication.
You can turn on communication-layer debugging to find out more.
Setting SGE_COMMLIB_DEBUG=3 in your environment before executing the qsub command may show more detail if there is a communication problem.
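From Python, the variable can be set for the qsub child process alone by passing a modified environment to subprocess, which works the same way inside a multiprocessing worker. A minimal sketch, assuming stdout and stderr should be merged so any debug output is captured together with qsub's own messages; the helper names and the example job are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Mapping, Optional


def debug_env(level: int = 3, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of base (defaults to os.environ) with SGE_COMMLIB_DEBUG set."""
    env = dict(os.environ if base is None else base)
    env["SGE_COMMLIB_DEBUG"] = str(level)
    return env


def run_with_commlib_debug(cmd: List[str], level: int = 3) -> subprocess.CompletedProcess:
    """Run cmd with communication-library debugging enabled, capturing all output."""
    return subprocess.run(cmd, env=debug_env(level), stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)


if __name__ == "__main__":
    if shutil.which("qsub") is None:
        print("qsub not found on PATH")
    else:
        result = run_with_commlib_debug(["qsub", "-b", "y", "/bin/true"])
        print(f"rc={result.returncode}")
        print(result.stdout)
```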
That sounds like a great idea. I will test it as soon as I get the opportunity - hopefully it will point me in the right direction. Thank you.