I am not sure whether my problem is unusual, but any help would be much appreciated.
We have a small GE cluster in which each execution host has two NICs, one 1GbE and one 10GbE. Each NIC has its own IP address and hostname, for example:
The qmaster has 1GbE ONLY. Our original plan was to install and set up GE over the 1GbE network and keep all management and monitoring traffic on it. We then configured the 10GbE NIC on each execution host for running MPI jobs.
Now the problem: we added the 10GbE hostnames to a GE host list and then created a new queue for them. qmaster simply treats each 10GbE hostname as a new host, separate from its 1GbE counterpart. Since qmaster has no 10GbE interface, it is not on the MPI network.
As a result, the entire MPI queue is down, because qmaster cannot reach the hosts over the 10GbE network.
Is there any GE configuration that can work around this?