I am trying to understand how SGE allocates instructions across multiple processing cores when I submit a single executable to a job queue. This matters to me because I am weighing the advantages of manually implementing message passing against letting SGE automatically parallelize an executable that may have dependencies.
For example, suppose we submit a small program that calculates the sum of the numbers 1 through n, where the code itself is written for a serial environment (n-1 steps, one addition per step). Say we have access to n/2 processing units. Theoretically, by adding the numbers in pairs and then adding the partial sums in pairs, we could finish the computation in log_2(n) steps (ignoring other time factors). If I were to submit this serial code and request n/2 processes with qsub, how would the instructions be allocated among the processors?
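To make the step counts concrete, here is a minimal Python sketch of the two approaches I have in mind. The function names `serial_sum` and `tree_sum` are my own, purely for illustration, and nothing here involves SGE itself. It only counts steps: the serial loop needs n-1 additions, while the pairwise version needs about log_2(n) rounds if every addition within a round could run on its own processor.

```python
def serial_sum(n):
    """Sum 1..n one addition at a time; returns (total, number of additions)."""
    total = 1
    steps = 0
    for k in range(2, n + 1):
        total += k
        steps += 1
    return total, steps


def tree_sum(n):
    """Pairwise (tree) reduction of 1..n; returns (total, number of rounds)."""
    values = list(range(1, n + 1))
    rounds = 0
    while len(values) > 1:
        # The additions within one round are independent of each other,
        # so with n/2 processors they could all happen at the same time.
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            # An unpaired last element is carried into the next round.
            paired.append(values[-1])
        values = paired
        rounds += 1
    return values[0], rounds
```

For n = 8, `serial_sum(8)` returns `(36, 7)` and `tree_sum(8)` returns `(36, 3)`, which matches the n-1 versus log_2(n) comparison above.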
Does anyone have any experience with this?
P.S. I would test this myself, but my system administrators would have to rebuild the entire system to give me this functionality on the platform I am interested in (long story).