Hello, we have a small cluster running the Rocks Cluster Distribution.
The Intel Cluster Toolkit is installed on a shared filesystem, the paths are OK, and passwordless SSH access works.
I selected 3 nodes, 1 head node and 2 compute nodes, and put 3 lines into a file named mach: headnode, node1, node2.
On the head node I run:
mpirun -r ssh -machinefile mach -np 3 ./test.mpi
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
Does anybody know what the problem is?
mpiexec -V
Intel MPI Library for Linux, 64-bit applications, Version 3.2.2 Build 20090827
Copyright (C) 2003-2009 Intel Corporation. All rights reserved.
Thanks!
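For reference, the mach machinefile described above can be recreated like this; the host names are the ones from the post, and they must match what each node actually resolves to:

```shell
# Create the 'mach' machinefile with the three hosts from the post.
cat > mach <<'EOF'
headnode
node1
node2
EOF
```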
mpirun is a utility that runs mpdboot and then mpiexec, so the options for mpdboot come first, followed by the options for mpiexec. '-machinefile' is an option for mpiexec.
Could you try changing '-machinefile' to '-f'? Does that work?
Regards!
Dmitry
I tried replacing -machinefile with -f:
mpirun -r ssh -f mach -np 3 ./test.mpi
totalnum=4 numhosts=3
there are not enough hosts on which to start all processes
I think the count of mpd processes to start is incremented automatically (a local mpd process is added). How can I avoid this?
You can use these commands:
1. mpdboot -r ssh -f mach -n 3 - starts an mpd ring on 3 nodes, including the local host.
2. mpiexec -nolocal -n 3 ./test.mpi - starts your application on node1 and node2 only.
3. mpdcleanup - shuts the mpd ring down afterwards.
Note that '-n' has a different meaning for mpdboot and for mpiexec.
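The three steps above can be put into one small script. This is only a sketch: it assumes the mach file and ./test.mpi from earlier in the thread, and it will only do anything useful on the cluster itself:

```shell
#!/bin/sh
# Sketch of the three steps above (cluster-only; assumes 'mach' and
# './test.mpi' from this thread exist on the shared filesystem).

# 1. Start an mpd ring on 3 hosts. Here '-n' counts mpd daemons,
#    and the local host is always part of the ring.
mpdboot -r ssh -f mach -n 3

# 2. Launch 3 MPI ranks, skipping the local host, so the job runs
#    on node1 and node2 only. Here '-n' counts MPI processes.
mpiexec -nolocal -n 3 ./test.mpi

# 3. Tear the mpd ring down again.
mpdcleanup
```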
Regards!
Dmitry
The command 'hostname -s' returns the same string as in mach file on all nodes. However, the command 'hostname' returns node_name.domain_name on our head node, and node_name.local on other nodes - could this be the reason?
We want to use only the mpirun command, because it can be used from our PBS (Torque) scripts: mpirun understands the $PBS_NODEFILE variable. Also, the command 'mpirun -r ssh -np $proc ./test.mpi' (where $proc = `cat $PBS_NODEFILE | wc -l`) runs normally in the PBS script only if the requested resources don't include the head node; otherwise it hangs. We think this may be connected to the hostname problem. Could you also help us with this?
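The $proc computation can be sketched like this. Here sample_nodefile is a stand-in for the $PBS_NODEFILE that Torque would generate inside a real job, and the mpirun line is shown as a comment because it only makes sense on the cluster:

```shell
# Simulate the node file Torque would point $PBS_NODEFILE at
# (stand-in file; inside a real job, Torque writes this for you).
printf 'headnode\nnode1\nnode2\n' > sample_nodefile

# One MPI rank per node-file line, as in the post's $proc.
proc=$(wc -l < sample_nodefile)

# Inside a real PBS job script this would then be:
#   mpirun -r ssh -np "$proc" ./test.mpi
echo "$proc"
```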
If you don't mind, you could upgrade your Intel MPI Library to version 4.0 update 1 and use mpiexec.hydra instead of mpirun. It should be optimal for your purposes. Just run:
mpiexec.hydra -rmk pbs ./your_application
and this new process manager will read the needed information from the PBS environment.
4.0 update 1 is installed in a different directory, so you can use either 4.0.1 or 3.2.2.
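In a Torque job script this could look like the following sketch; the resource request and the test.mpi binary name are assumptions taken from earlier in the thread:

```shell
#!/bin/sh
#PBS -l nodes=3
# Sketch of a Torque job script using the Hydra process manager
# (Intel MPI 4.0 update 1 or later). With '-rmk pbs', Hydra reads
# the host list and process count from the PBS environment, so no
# -np or machinefile is needed.
cd "$PBS_O_WORKDIR"
mpiexec.hydra -rmk pbs ./test.mpi
```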
Regards!
Dmitry