Hello, we have a small cluster running the Rocks Cluster Distribution.
Intel Cluster Toolkit is installed on a shared filesystem, the paths are OK, and passwordless ssh access is working.
I selected 3 nodes, 1 head node and 2 compute nodes, and put 3 lines into the file mach: headnode, node1, node2.
On the head node I run:
mpirun -r ssh -machinefile mach -np 3 ./test.mpi
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
Does anybody know what the problem is?
Intel MPI Library for Linux, 64-bit applications, Version 3.2.2 Build 20090827
Copyright (C) 2003-2009 Intel Corporation. All rights reserved.
I also tried replacing -machinefile with -f:
mpirun -r ssh -f mach -np 3 ./test.mpi
there are not enough hosts on which to start all processes
I think the number of mpd processes to start is incremented automatically (a local mpd process is added). How can we avoid this?
The command 'hostname -s' returns the same string as in the mach file on all nodes. However, the command 'hostname' returns node_name.domain_name on our head node and node_name.local on the other nodes. Could this be the reason?
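To make the hostname comparison concrete, here is a rough sketch of how it could be checked for every host in the mach file. It only reports what each node returns and does not change anything. The function names and the REMOTE_SH variable are hypothetical helpers made up for this example (REMOTE_SH defaults to ssh), not part of Intel MPI or Rocks.

```shell
#!/bin/sh
# Sketch only: short_name, check_hosts and REMOTE_SH are hypothetical names.

# Strip the domain part from a host name, e.g. node1.local -> node1.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# For each host listed in the machine file, run `hostname` there and
# compare its short form with the entry in the file.
check_hosts() {
    machfile=$1
    while read -r host; do
        [ -n "$host" ] || continue
        # stdin is redirected so the remote command does not swallow
        # the rest of the machine file.
        full=$(${REMOTE_SH:-ssh} "$host" hostname </dev/null)
        short=$(short_name "$full")
        if [ "$short" = "$host" ]; then
            status=match
        else
            status=MISMATCH
        fi
        printf '%s full=%s short=%s %s\n' "$host" "$full" "$short" "$status"
    done < "$machfile"
}
```

Running `check_hosts mach` on the head node would then print one line per host, which makes the difference between the full names on the head node and on the compute nodes easy to see.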
We want to use only the mpirun command, because it can be integrated with our PBS (Torque) setup: mpirun understands the $PBS_NODEFILE variable. Also, the command 'mpirun -r ssh -np $proc ./test.mpi' (where $proc is the output of cat $PBS_NODEFILE | wc -l) runs normally in the PBS script only if the requested resources don't include the head node; otherwise it hangs. We think this may be connected to the hostname problem. Could you also help us with this?
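For reference, the PBS script does roughly the following. This is a simplified sketch rather than our exact script: count_procs, includes_host and the HEADNODE variable are hypothetical names used only here, and "headnode" is the name from the mach file above. The mpirun line is the same one quoted in the text.

```shell
#!/bin/sh
# Sketch only: count_procs, includes_host and HEADNODE are hypothetical.

# Number of MPI processes = number of lines in the node file.
count_procs() {
    wc -l < "$1" | tr -d ' '
}

# Succeeds if the node file lists the given host, comparing short names
# so that node1 and node1.local are treated as the same machine.
includes_host() {
    want=${2%%.*}
    while read -r h; do
        [ "${h%%.*}" = "$want" ] && return 0
    done < "$1"
    return 1
}

# Only act inside a PBS job, where Torque sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ]; then
    proc=$(count_procs "$PBS_NODEFILE")
    if includes_host "$PBS_NODEFILE" "${HEADNODE:-headnode}"; then
        echo "note: head node is part of this allocation" >&2
    fi
    mpirun -r ssh -np "$proc" ./test.mpi
fi
```

The note on stderr is only there to mark the jobs in which the hang occurs, so they can be told apart from the ones that run normally.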