4 Replies Latest reply on Apr 13, 2011 9:40 PM by 854888

    qsub and python multiprocessing

    854888
      Hi all,

      I seem to be having trouble calling qsub from inside a newly spawned Process (from the Python 2.6 multiprocessing module). It fails consistently with the error:

      Unable to run job: unable to contact qmaster using port 5360 on host "[our head node name]".

      As far as I can tell, the listed port and hostname are correct. Could anyone shed some light on how qsub's ability to contact the master might be compromised in this way?

      Invoking qsub via os.system() or subprocess.Popen() doesn't appear to have problems.

      Cheers,
      Chris.
        • 1. Re: qsub and python multiprocessing
          829499
          Hi Chris,

          This error message appears, for example, when the qmaster is not running, but I think your qmaster is running.
          So qsub is not able to communicate with the qmaster. This looks like a setup problem, e.g. hostname resolution, or the environment may not be correct.
          Do you call or source the settings file before executing qsub?
          Could you run an "env" command to check whether all the necessary settings are in place?
          On the other hand, qsub already knows your master host and the port.

          Does qsub work outside of Python?

          Regards,
          Marco
          • 2. Re: qsub and python multiprocessing
            854888
            Hi dom,

            It appears as if the environment is preserved. Calling 'env' in the subprocess gives basically identical output to the normal environment. For example, SGE_ROOT points to the right location, as does PE_HOSTFILE etc. As you noted, it looks as if qsub can source settings correctly and is trying to reach the right host, but for some reason it's not getting a response from qmaster.
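A comparison like the one described above can be automated rather than done by eye. The following is only a sketch, written for a current Python 3 rather than 2.6; the function names (child_env, env_diff, compare_parent_and_child_env) are made up for illustration and are not part of any SGE or Python API.

```python
import multiprocessing
import os


def child_env(queue):
    # Runs in the child process: send back a copy of its environment.
    queue.put(dict(os.environ))


def env_diff(parent, child):
    """Return {name: (parent_value, child_value)} for variables that differ."""
    names = set(parent) | set(child)
    return {
        name: (parent.get(name), child.get(name))
        for name in names
        if parent.get(name) != child.get(name)
    }


def compare_parent_and_child_env():
    # Start a child via multiprocessing and diff its environment with ours.
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=child_env, args=(queue,))
    proc.start()
    child = queue.get()
    proc.join()
    return env_diff(dict(os.environ), child)


if __name__ == "__main__":
    diff = compare_parent_and_child_env()
    for name, (parent_value, child_value) in sorted(diff.items()):
        print(name, "parent:", parent_value, "child:", child_value)
    print(len(diff), "variable(s) differ")
```

An empty diff would suggest that the environment itself is not what separates the working and failing cases.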

            The qsub command works fine outside of python. It also works fine inside python under most circumstances. The problem only seems to appear when using the 'multiprocessing' module (http://docs.python.org/library/multiprocessing.html) to create the subprocesses. It might be useful to know if anyone else is having this problem, or if it is unique to our SGE setup - I can post a short script here that can be used to test it.
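For anyone who wants to try this on their own cluster, a test script could look roughly like the sketch below. It is an assumption about how such a test might be structured, not the script actually used here: it is written for a current Python 3, the helper names are invented, and the command to run (for example qsub followed by a job script of your own) is taken from the command line.

```python
import multiprocessing
import subprocess
import sys


def run_command(command):
    """Run command; return (exit_code, combined stdout/stderr text)."""
    try:
        proc = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
    except OSError as exc:
        # The command could not be started at all (e.g. not on PATH).
        return None, "could not start command: %s" % exc
    output, _ = proc.communicate()
    return proc.returncode, output.decode(errors="replace")


def _worker(command, queue):
    # Executed inside the multiprocessing child.
    queue.put(run_command(command))


def run_in_child(command):
    """Run command from inside a freshly started multiprocessing.Process."""
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=_worker, args=(command, queue))
    child.start()
    result = queue.get()
    child.join()
    return result


if __name__ == "__main__":
    command = sys.argv[1:]
    if command:
        print("direct:            ", run_command(command))
        print("via multiprocessing:", run_in_child(command))
    else:
        print("usage: python repro.py <command> [args...]")
```

If the direct call succeeds while the call made from the multiprocessing child reports the qmaster error, the problem is reproduced independently of the rest of the application.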

            Any assistance/direction would be appreciated!

            Cheers,
            Chris.

            Edited by: Chris Davoren on Apr 13, 2011 5:07 PM
            • 3. Re: qsub and python multiprocessing
              829499
              Chris,

              It looks like the multiprocessing module either does not pass down the environment or somehow interferes with the communication.
              You can turn on communication layer debugging to find out more.

              Setting SGE_COMMLIB_DEBUG=3
              in your environment before executing the qsub command may show more detail if the communication is failing.
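When qsub is launched from Python, one way to apply this setting is to pass a modified environment to the child process instead of exporting it in the shell. This is a minimal sketch for a current Python 3; only the variable name SGE_COMMLIB_DEBUG and the value 3 come from the suggestion above, and the helper names are hypothetical.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Copy an environment and switch on SGE commlib debugging in the copy."""
    env = dict(os.environ if base is None else base)
    env["SGE_COMMLIB_DEBUG"] = "3"
    return env


def run_with_commlib_debug(command):
    """Run command with SGE_COMMLIB_DEBUG=3 set; return its exit code."""
    return subprocess.call(command, env=debug_env())


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Example: python qsub_debug.py qsub <your job script>
        sys.exit(run_with_commlib_debug(sys.argv[1:]))
    print("usage: python qsub_debug.py <command> [args...]")
```

The parent's own environment is left untouched, so the extra debug output is limited to the one command being investigated.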

              Regards,
              Marco
              • 4. Re: qsub and python multiprocessing
                854888
                That sounds like a great idea. I will test it as soon as I get the opportunity - hopefully it will point me in the right direction. Thank you.