I am trying to "tune" a 16-node cluster with mpitune. My application runs fine on this segment of the cluster (see a different thread for a problem where other nodes with non-ib0 cards do not participate in the cluster; that's not the problem here).
The debug information from mpitune shows that several of the machines get "Skiped" [sic] during some sort of DNS resolution step.
At first I figured that the process just didn't recognize my host file (which, incidentally, does not contain head-n2), but this looks like more of a problem with resolving the names themselves. It is almost as if cluster-n1 and cluster-n11 are treated as duplicates and get skipped. The names definitely resolve to different IP addresses.
Is there another reason these nodes may get skipped?
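For anyone who wants to rule out the resolution side on their own cluster, here is a minimal sketch of the kind of check I mean. It is not what mpitune does internally (I don't know how it compares hosts); it only groups names by the address they resolve to. The function name `find_duplicate_addresses` and its `resolve` parameter are made up for this illustration.

```python
import socket
from collections import defaultdict


def find_duplicate_addresses(hostnames, resolve=socket.gethostbyname):
    """Return (duplicates, unresolved) for a list of hostnames.

    duplicates maps each IPv4 address shared by more than one name
    to the list of names that resolved to it.
    unresolved lists the names that could not be resolved at all.
    The resolve parameter is a hypothetical hook so the lookup can be swapped out.
    """
    by_address = defaultdict(list)
    unresolved = []
    for name in hostnames:
        try:
            by_address[resolve(name)].append(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            unresolved.append(name)
    duplicates = {
        addr: names for addr, names in by_address.items() if len(names) > 1
    }
    return duplicates, unresolved
```

Calling `find_duplicate_addresses(["cluster-n1", "cluster-n11"])` on the node where mpitune runs should return an empty dictionary of duplicates if the two names really do map to different addresses there.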
Thank you, Dmitry. This change helped quite a bit, and I was able to see my complete cluster with the --skip-check-hosts option. I'm still not getting good results, but at least I'm getting further in the process. I'll post a separate thread for the next issue, since it is not related to this one.