The qmaster has 1GbE only. Our original plan was to install and set up GE over the 1GbE network and to keep all management and monitoring traffic on that network. We then configured a 10GbE NIC on each execution host for running MPI jobs.
The problem is that when we added the 10GbE hostnames to a GE host list and created a new queue, the qmaster treated the 10GbE hostnames as new hosts, separate from the 1GbE ones. The qmaster has no 10GbE interface, so it is not on the MPI network.
As a result, the entire MPI queue is down, because the qmaster cannot contact the hosts over the 10GbE network.
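To make the setup concrete, the relevant part of the new queue's definition looks roughly like this, in the form `qconf -sq` would print it. The queue name `mpi.q` and the `-10g` hostnames are placeholders for illustration, not our real names, and all other queue attributes are omitted:

```
qname     mpi.q
hostlist  node01-10g node02-10g
```

The same execution hosts are already known to the qmaster under their 1GbE hostnames (say `node01` and `node02`, again placeholders), which the qmaster can reach.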
Is there any configuration in GE that can overcome this issue?