8 Replies Latest reply on Dec 16, 2011 2:18 AM by omarh-oracle - oracle

    SGE crash with error in qmaster message: got NULL element for JB_type

    905625
      We are running SGE 6.2u5p1 on CentOS release 5.3. When we use "qdel -f" to cancel individual subtasks of an array job, submitted with a command like "qsub -q prod_b.q -t 1-10 /opt/sge/examples/jobs/simple.sh", SGE occasionally crashes with the following error in the qmaster messages log:

      <snip>
      12/15/2011 09:50:03|worker|ghostxen|E|warning: root forced the deletion of job-array task 12354.2
      12/15/2011 09:50:03|worker|ghostxen|E|warning: root forced the deletion of job-array task 12377.8
      12/15/2011 09:50:03|worker|ghostxen|E|warning: root forced the deletion of job-array task 12395.10
      12/15/2011 09:50:03|worker|ghostxen|E|warning: root forced the deletion of job-array task 12419.6
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12279.1
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12285.8
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12306.7
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12341.7
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12348.5
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12383.5
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12390.3
      12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12425.3
      12/15/2011 09:50:05|worker|ghostxen|E|can't update remote queue state (910) on queue "prod_b.q@ip-10-167-14-88"
      12/15/2011 09:50:05|worker|ghostxen|C|!!!!!!!!!! got NULL element for JB_type !!!!!!!!!!
      <snip>
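
      For anyone triaging a similar log, here is a minimal Python sketch that splits qmaster message lines on "|" and picks out the critical entries. It assumes the five-field layout seen in the excerpt above (timestamp, thread, host, level, message) and that level "C" marks a critical entry; the names QmasterEntry, parse_line and critical_entries are made up for illustration and are not part of SGE.

```python
from typing import Iterable, List, NamedTuple, Optional


class QmasterEntry(NamedTuple):
    """One line of the qmaster messages log (field layout assumed from the excerpt)."""
    timestamp: str
    thread: str
    host: str
    level: str
    message: str


def parse_line(line: str) -> Optional[QmasterEntry]:
    """Split a log line into its five fields; return None if it does not match."""
    # Split at most 4 times so any "|" inside the message text is preserved.
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    return QmasterEntry(*parts)


def critical_entries(lines: Iterable[str]) -> List[QmasterEntry]:
    """Return only entries whose level field is "C" (assumed to mean critical)."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == "C"]


# Sample lines copied from the excerpt above.
sample = [
    "<snip>",
    "12/15/2011 09:50:04|worker|ghostxen|E|warning: root forced the deletion of job-array task 12425.3",
    '12/15/2011 09:50:05|worker|ghostxen|E|can\'t update remote queue state (910) on queue "prod_b.q@ip-10-167-14-88"',
    "12/15/2011 09:50:05|worker|ghostxen|C|!!!!!!!!!! got NULL element for JB_type !!!!!!!!!!",
]

for entry in critical_entries(sample):
    print(entry.timestamp, entry.message)
```

      Running the same filter over the full messages file would show whether the JB_type entry always follows a "can't update remote queue state" error, which might help narrow down the trigger.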

      Any ideas? What other information do you need?