Hi everyone,
I'm trying to get the Intel MPI library to work on a cluster with 16 nodes. I'm following the instructions in the "Setting up MPD Daemons" section of the "Getting_Started.pdf" file. I'm at the point where I am supposed to start the MPD daemons with mpdboot. I use the command:
mpdboot -v -n 16 -r ssh -f .mpd.hosts
Things start to boot properly, but then I get an error message saying that the syntax of the mpdboot.py file is incorrect:
mpdboot_rank_0 (mpdboot 256): starting local mpd on cluster2
mpdboot_rank_0 (mpdboot 308): starting remote mpd on c2n2
mpdboot_rank_0 (mpdboot 322): starting remote mpd on c2n3
File "/opt/intel_mpi_10/bin/mpdboot.py", line 84
argidx += 2
^
SyntaxError: invalid syntax
File "/opt/intel_mpi_10/bin/mpdboot.py", line 84
argidx += 2
^
SyntaxError: invalid syntax
Does anyone know what's wrong? Also, is there any way to use LAM (as in lamboot, lamstart, etc) instead of MPD?
Thanks,
Alexis
Hi Alexis,
What happens if you just type 'mpdboot' on the local host? Do you get the same error or does 'mpdtrace' show an mpd daemon running?
Best regards,
Henry
Thanks for the reply. It seems the problem is that Python 2.2 is properly installed on node 0 of my cluster, whereas all of the other nodes only have Python 1.5. I am going to try to install an updated Python on the other nodes to fix the problem. Is there any way, though, that I can get the other nodes to use the Python on the main machine?
On a separate note, is there any way to get lamboot to work with Intel MPI? I can boot all my nodes with lamboot, and run MPI with mpirun when I test with a simple C driver program. However, when I use mpirun to run something that I compiled with mpiifort, I get an error message that there is no mpd running on this host. Is there a flag that I can use (when compiling, maybe?) so that the code generated by mpiifort looks for the LAM topology?
I hope that this makes some sense. I really don't know anything yet about writing or running parallel code!
Thanks,
Alexis
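The SyntaxError on `argidx += 2` fits this diagnosis: augmented assignment (`+=`) was only added to the language in Python 2.0, so a 1.5 interpreter cannot even parse mpdboot.py. As a rough sketch, the version strings collected from each node could be checked like this. The host names and sample strings are hypothetical illustration data, and the minimum of 2.0 is only what `+=` itself requires; the version mpdboot actually needs may be higher.

```python
import re

# `+=` (augmented assignment) first appeared in Python 2.0.
# Assumption: this is only a lower bound, not mpdboot's documented requirement.
MIN_VERSION = (2, 0)


def parse_version(sys_version_output):
    """Extract (major, minor) from the start of a sys.version string."""
    match = re.match(r"\s*(\d+)\.(\d+)", sys_version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % sys_version_output)
    return int(match.group(1)), int(match.group(2))


def nodes_needing_upgrade(versions_by_host, minimum=MIN_VERSION):
    """Return the sorted host names whose interpreter is older than `minimum`."""
    return sorted(
        host
        for host, output in versions_by_host.items()
        if parse_version(output) < minimum
    )


if __name__ == "__main__":
    # Hypothetical sample data. In practice each string would be gathered with
    # something like:  ssh <host> python -c "import sys; print sys.version"
    sample = {
        "cluster2": "2.2.1 (#1, Aug 30 2002, 12:15:30)",
        "c2n2": "1.5.2 (#1, Jan 31 2003, 10:58:35)",
        "c2n3": "1.5.2 (#1, Jan 31 2003, 10:58:35)",
    }
    print(nodes_needing_upgrade(sample))
```

The one-liner in the comment uses the old print statement on purpose, so that it also runs on the 1.5 interpreters being checked.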
I finally got Python working on all my nodes, and things seem to be working fine now, except for one thing. The cluster is composed of 16 dual-processor machines, so in effect there are 32 processors available. Each node has a name like c2nX, where X is between 2 and 15. The first node is called cluster2. So here are the names:
cluster2
c2n2
c2n3
c2n4
...
c2n15
The problem is that each node has 2 processors, but I don't know how to boot both of them up using mpd. With LAM, I just repeat the name of each node twice in the host list like so:
cluster2
cluster2
c2n2
c2n2
etc.
and when I boot using this file with lamboot, it knows to use two cpus for each node. However, when I try to do the equivalent with mpdboot, it tells me that "there are not enough hosts on which to start all processes".
So my question is: how would I go about booting both CPUs on each node? Similarly, these are Xeon boxes with hyperthreading. How would I boot 2 virtual CPUs * 2 real CPUs per node (thus having 4 effective CPUs per node)?
Thanks!
Alexis -
What results are you getting with mpdtrace after you boot the daemons on the system? Do all of the nodes get included?
You should only need to have one daemon on each node of the cluster and the daemon should be able to determine that there are two processors on each node. However, if you're not getting all 16 nodes covered, you might be seeing that message.
If you are getting all daemons started, look into the Intel MPI documentation on how to target nodes with a specific number of processes and the application that should be run on those processes. Try starting your job that way. Use a configuration file, otherwise you'll have a lot of command line typing to launch processes on 16 nodes.
If you're still getting the error message, you should report your problem to Intel Premier Support.
--clay
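To illustrate the configuration-file suggestion, here is a minimal sketch that generates one argument line per node. It assumes the MPICH2-style `-n` and `-host` options and a file passed to mpiexec with `-configfile`; check the option names against the Intel MPI reference for your installed version. The program name `./a.out` and the short host list are placeholders.

```python
def config_lines(hosts, procs_per_host, program):
    """Build one mpiexec argument set per host, e.g. '-n 2 -host c2n2 ./a.out'."""
    return [
        "-n %d -host %s %s" % (procs_per_host, host, program)
        for host in hosts
    ]


if __name__ == "__main__":
    # Placeholder host list; extend it to cover every node in the ring.
    hosts = ["cluster2", "c2n2", "c2n3"]
    for line in config_lines(hosts, 2, "./a.out"):
        print(line)
```

Saving the printed lines to a file and launching with `mpiexec -configfile <file>` would then request two processes on each listed node, while mpdboot still starts only one daemon per node.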
I have a similar problem. I installed Intel MPI on a dual-processor, dual-core machine. In other words, my machine has 4 processors. However, when I try to start mpdboot, it tells me that "there are not enough hosts on which to start all processes".
How would I go about booting all CPUs on my machine?
Thanks
Thanks for your reply.
I am trying to run the example test.f in the test directory of the MPI installation.
My mpd.hosts is:
localhost
localhost
localhost
localhost
but when I try to boot using this file with "mpdboot -n 4", it tells me that:
"totalnum=4 numhosts=1"
"there are not enough hosts on which to start all processes"
and when I run "mpirun -n 2 a.out" it tells me:
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes
What is wrong?
Apparently, you have just the one node visible, so only one copy of mpd could be started. Why not try to get the mpdboot right first; then mpiexec should work.
mpirun combines mpdboot and mpiexec. It may be confused by listing a node more than once in mpd.hosts.
Can you ping localhost? On one of my machines, the only working entry in mpd.hosts is the current IP address, so I am at the mercy of the people running the LAN, as well as requiring a working Ethernet driver. An active Ethernet connection appears to be required, even if you don't try to simulate multiple nodes on a single node.
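The "totalnum=4 numhosts=1" message is what you would expect if mpdboot counts distinct host names rather than lines in mpd.hosts. The sketch below mimics that check under exactly that assumption. It is not mpdboot's actual code, and the function names are made up for illustration.

```python
def unique_hosts(lines):
    """Return host names in first-seen order, skipping blanks, comments and repeats."""
    seen = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#") and name not in seen:
            seen.append(name)
    return seen


def enough_hosts(totalnum, lines):
    """True if `totalnum` daemons fit when only one daemon is allowed per host."""
    return totalnum <= len(unique_hosts(lines))


if __name__ == "__main__":
    hosts_file = ["localhost", "localhost", "localhost", "localhost"]
    numhosts = len(unique_hosts(hosts_file))
    for totalnum in (1, 4):
        print("totalnum=%d numhosts=%d ok=%s"
              % (totalnum, numhosts, enough_hosts(totalnum, hosts_file)))
```

Under this model, four "localhost" lines still count as one host, so a request for one daemon succeeds and any larger request fails, which matches the behaviour reported in this thread.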
I have only one node, but it is a dual-processor, dual-core machine, so I have 4 processors on that one node.
I can ping and ssh to localhost. When I run "mpdboot -n 1" it works correctly, but when I use another number, for example "mpdboot -n 2" or "mpdboot -n 3", it tells me that:
"there are not enough hosts on which to start all processes"
Is it possible to simulate multiple nodes on a single node with Intel MPI? How?
Thanks
mpdboot, start mpd daemons on the specified number of nodes by providing a list of node names in .
The mpd daemons are started using the rsh command by default. If rsh connectivity is not enabled, use the -r ssh option to switch over to ssh. Make sure that all nodes in the cluster can connect to each other via the rsh command without a password or, if the -r ssh option is used, via the ssh command without a password.
-1 Remove the restriction of starting only one mpd per machine.
I finally solved my problem with the following command:
mpdboot -n -r ssh -1
I tried the above and received the following.
[admin@localhost ready_GNU]$ mpdboot -n -r ssh -1 -verbose
Traceback (most recent call last):
File "<stdin>", line 1068, in <module>
File "<stdin>", line 386, in mpdboot
ValueError: invalid literal for int() with base 10: '-r'
I am using the following version
[admin@localhost ready_GNU]$ mpdboot --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.1 Build 20120831
Copyright (C) 2003-2012 Intel Corporation. All rights reserved.
Any help appreciated.
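The ValueError in this traceback suggests that the argument following -n is converted straight to an integer, so the command as quoted (with no number after -n) hands '-r' to int(). The sketch below reproduces that failure with a hypothetical stand-in parser. It is not the real mpdboot code, and the node count of 4 in the working call is only an example value.

```python
def parse_node_count(argv):
    """Hypothetical stand-in: read the value after -n and convert it to int."""
    idx = argv.index("-n")
    return int(argv[idx + 1])


if __name__ == "__main__":
    # With a number after -n the conversion succeeds (4 is an example value).
    print(parse_node_count(["-n", "4", "-r", "ssh", "-1"]))

    # Without one, the next flag is passed to int() and raises ValueError.
    try:
        parse_node_count(["-n", "-r", "ssh", "-1"])
    except ValueError as exc:
        print("ValueError:", exc)
```

If that reading is right, supplying the number of daemons immediately after -n should avoid this particular error.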
Hi Maurice,
You should not need to use mpdboot anymore. Try just using mpirun. It will use the Hydra process manager, which does not need a daemon running ahead of time.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
