We have a cluster of machines, each with 4 GPUs. Each job should be able to ask for 1-4 GPUs. Here's the catch: I would like SGE to tell each job which GPU(s) it should use. Unlike a CPU, a GPU works best when only one process accesses it at a time. So I would like to have something like:
Job #1 GPUs: 0, 1, 3
Job #2 GPU: 2
Job #3: waits until the 1-4 GPUs it asked for are available
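A toy model of the per-node behavior described above may make the requirement concrete. This is not SGE code; the class `NodeGpus` and its methods are hypothetical names for illustration. Each GPU has at most one owner, a request for N GPUs either gets N specific indices or gets nothing (meaning the job stays queued), and releasing a job frees its indices.

```python
from typing import Optional


class NodeGpus:
    """Hypothetical model of one node's GPUs: each GPU is owned by at most one job."""

    def __init__(self, num_gpus: int = 4) -> None:
        # owner[i] is the name of the job holding GPU i, or None if it is free.
        self.owner: list[Optional[str]] = [None] * num_gpus

    def request(self, job: str, count: int) -> Optional[list[int]]:
        """Return the GPU indices granted to `job`, or None if it must wait."""
        if not 1 <= count <= len(self.owner):
            raise ValueError(f"count must be between 1 and {len(self.owner)}")
        free = [i for i, o in enumerate(self.owner) if o is None]
        if len(free) < count:
            return None  # not enough free GPUs: the job stays queued
        granted = free[:count]  # lowest free indices; any choice of free ones would do
        for i in granted:
            self.owner[i] = job
        return granted

    def release(self, job: str) -> None:
        """Free every GPU held by `job`."""
        self.owner = [None if o == job else o for o in self.owner]
```

For example, on a fresh node `request("job1", 3)` grants `[0, 1, 2]`, `request("job2", 1)` grants `[3]`, and a third request returns `None` until `release("job1")` is called. The exact indices differ from the list above only because this sketch always picks the lowest free ones.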
The problem I've run into is that SGE will let me create a GPU resource with 4 units on each node, but it won't tell a job which specific GPU(s) to use (only that it got 1, or 3, or however many it requested).
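One possible workaround, sketched here under stated assumptions rather than as an SGE feature: keep the 4-unit consumable so SGE still limits *how many* GPUs are in use on a node, and have the job itself claim *which* ones by atomically creating one lock file per GPU in a node-local directory. The functions `claim_gpus` and `release_gpus` and the directory `/tmp/gpu_locks` are hypothetical; the lock directory should be node-local, since exclusive file creation may not be reliable on network filesystems.

```python
import os
from typing import Optional

LOCK_DIR = "/tmp/gpu_locks"  # hypothetical node-local lock directory


def release_gpus(gpus: list[int], lock_dir: str = LOCK_DIR) -> None:
    """Remove the lock files for the given GPU indices."""
    for gpu in gpus:
        try:
            os.remove(os.path.join(lock_dir, f"gpu{gpu}.lock"))
        except FileNotFoundError:
            pass  # already released


def claim_gpus(count: int, num_gpus: int = 4, lock_dir: str = LOCK_DIR) -> Optional[list[int]]:
    """Try to claim `count` GPUs; return their indices, or None if too few are free.

    A partial claim is rolled back so no locks are left behind on failure.
    """
    os.makedirs(lock_dir, exist_ok=True)
    claimed: list[int] = []
    for gpu in range(num_gpus):
        if len(claimed) == count:
            break
        path = os.path.join(lock_dir, f"gpu{gpu}.lock")
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process on this node can hold a given GPU's lock at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # GPU already taken by another job
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        claimed.append(gpu)
    if len(claimed) < count:
        release_gpus(claimed, lock_dir)
        return None
    return claimed
```

A job would call `claim_gpus(n)` with the same `n` it requested from SGE and, on success, export the result, e.g. `os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, gpus))`, so CUDA programs only see their assigned devices. The main weakness of this design is stale locks if a job dies without releasing them, so the release step would need to run in some cleanup hook (for instance an epilog script) rather than only at the end of the job.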
I thought of creating 4 resources (gpu0, gpu1, gpu2, gpu3), but I'm not sure whether the -l flag accepts a glob pattern, and I can't figure out how SGE would tell the job which gpu resources it received. Any ideas?