Seifer_Lin
Beginner
339 Views

MPICH_PORT_RANGE for intel mpi ?

Hi all:
Our software product now uses the Intel MPI Library for parallel computing. Before that, we used MPICH2.
For Windows, you need to add mpiexec, smpd, and the MPI apps (the app which calls MPI functions) into the firewall exception list to make the parallel computing work properly.
With MPICH2, the following command restricts the MPI apps to ports 50000-51000:
mpiexec -env MPICH_PORT_RANGE 50000:51000 MPIApp.exe
Therefore, you can just open ports 50000-51000 in the firewall instead of creating lots of exception entries in the list (especially if your software product contains many MPI apps).
My question is: is there an equivalent of MPICH_PORT_RANGE for Intel MPI?
I have tried Intel MPI with [ -genv MPICH_PORT_RANGE ] on the following simple code, and MPI_Barrier() never returns.
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
    int cpuid = 0;  /* rank of this process */
    int ncpu = 0;   /* total number of processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &cpuid);
    MPI_Comm_size(MPI_COMM_WORLD, &ncpu);
    printf("Before barrier\n"); fflush(stdout);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("After barrier\n"); fflush(stdout);
    MPI_Finalize();
    return 0;
}
Thanks very much!
7 Replies
Dmitry_K_Intel2
Employee
339 Views

Hi Seifer,

Unfortunately you cannot use MPICH_PORT_RANGE.
The firewall needs to let through the socket traffic from both the smpd.exe AND the program itself.
However, you can limit the port range used by smpd.

c:\smpd stop
c:\set SMPD_PORT_RANGE=50000:51000
c:\smpd

smpd will use ports from the range (50000-51000), and I hope that this will solve your problem.

Regards!
Dmitry
Seifer_Lin
Beginner
339 Views

Hi Dmitry:
Thanks for your help. But after doing the steps, smpd still uses random ports (shown by TcpView of Windows).
By the way, we also bought Intel MPI for Linux.
I have two machines running CentOS 5.5 (32-bit), and I did the following for testing.
(1) Add
-A INPUT -p tcp -m tcp --dport 10000:11000 -j ACCEPT
into /etc/sysconfig/iptables on both machines.
(2) Add
MPD_PORT_RANGE=10000:11000
into ~/.mpd.conf
(3)Executing
mpdboot.py -n 2 -f ~/mpdhost.txt -r rsh
The contents of mpdhost.txt:
192.168.120.162
192.168.120.163
mpdboot seems OK. From netstat, I get
tcp 0 0 0.0.0.0:10001 0.0.0.0:* LISTEN 3505/python
tcp 0 0 0.0.0.0:10002 0.0.0.0:* LISTEN 3505/python
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 6973/python
tcp 0 0 192.168.120.162:10000 192.168.120.163:42694 ESTABLISHED 6973/python
tcp 0 0 192.168.120.162:10000 192.168.120.163:42695 ESTABLISHED 6973/python
After executing mpdtrace.py, I get
192.168.120.162
192.168.120.163
(4)Executing the MPIApp.out
mpiexec.py -machinefile machine.txt -n 2 MPIApp.out
The contents of machine.txt:
192.168.120.162
192.168.120.163
And I get
Assertion failed in file ../../socksm.c at line 2577: (it_plfd->revents & 0x008) == 0
internal ABORT - process 0
rank 0 in job 1 192.168.120.162_10000 caused collective abort of all ranks
exit status of rank 0: return code 1
(5)Do step (4) again after stopping the iptables
The MPIApp.out runs just fine....
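To see whether any listening socket falls outside the range opened in iptables, the netstat output from step (3) can be filtered. The following is only a minimal sketch: it assumes netstat-style lines with the local address in column 4 and the state in column 6, as in the output above, and the function name `listeners_outside_range` is hypothetical (it is not part of Intel MPI).

```shell
#!/bin/sh
# Print LISTEN sockets whose local port lies outside LOW:HIGH.
# Reads netstat-style lines on stdin; $1 is the range, e.g. 10000:11000.
listeners_outside_range() {
    range=$1
    low=${range%%:*}     # text before the colon
    high=${range##*:}    # text after the colon
    awk -v low="$low" -v high="$high" '
        $6 == "LISTEN" {
            # The port is the last colon-separated piece of the local address.
            n = split($4, parts, ":")
            port = parts[n] + 0
            if (port < low + 0 || port > high + 0) print
        }'
}
```

A possible use is `netstat -tlnp | listeners_outside_range 10000:11000`; any line it prints is a listener that the iptables rule from step (1) would not cover.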
For Intel MPI (Windows), we can tell our customers to just add mpiexec, smpd, and the MPI apps to the Windows Firewall exception list.
For Intel MPI (Linux), is there any way to set the port range used by MPIApp ?
Thanks very much.
Regards,
Seifer
Seifer_Lin
Beginner
339 Views

Hi Dmitry:

I repeated step (4) of my last post with a new command line that adds the MPICH_PORT_RANGE parameter:
mpiexec.py -genv MPICH_PORT_RANGE 10000:11000 -machinefile machine.txt -n 2 MPIApp.out
And everything works fine. ^^
Therefore,
Intel MPI for Windows --> Can't use MPICH_PORT_RANGE
Intel MPI for Linux --> MPICH_PORT_RANGE works fine
It would be appreciated if Intel MPI for Windows provided a way to limit the port range of MPI apps.
regards,
Seifer
Dmitry_K_Intel2
Employee
339 Views

Hi Seifer,

Thank you for the update. I was just trying to reproduce your problem...

Yes, on Linux MPICH_PORT_RANGE IS supported. It should be supported on Windows as well, but there is either an error or a misunderstanding inside the library, and the application doesn't work.

Note that there are separate socket connections for the mpd (smpd) daemons and for the internal net module (tcp communication between mpd and the application). They use different environment variables: MPD_PORT_RANGE and MPICH_PORT_RANGE.
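Since the two variables and the firewall rule all have to agree, one way to keep them consistent on Linux is to derive them from a single range string. This is only a sketch built from the commands already quoted in this thread (the iptables line, the ~/.mpd.conf entry and the -genv argument); the helper names are hypothetical and nothing here is an Intel MPI interface.

```shell
#!/bin/sh
# Each helper takes one LOW:HIGH range (e.g. 10000:11000) and prints
# the matching setting, so all three stay in sync.

# Line for /etc/sysconfig/iptables that opens the range for TCP.
iptables_rule() { printf '%s\n' "-A INPUT -p tcp -m tcp --dport $1 -j ACCEPT"; }

# Line for ~/.mpd.conf (ports used by the mpd daemons).
mpd_conf_line() { printf '%s\n' "MPD_PORT_RANGE=$1"; }

# Arguments for mpiexec.py (ports used by the application's tcp traffic).
mpiexec_args() { printf '%s\n' "-genv MPICH_PORT_RANGE $1"; }
```

For example, `mpd_conf_line 10000:11000 >> ~/.mpd.conf` would append the daemon setting, and `mpiexec.py $(mpiexec_args 10000:11000) -machinefile machine.txt -n 2 MPIApp.out` would pass the same range to the application.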

On Windows: could you please try adding '-genv I_MPI_PLATFORM 0' and check your test case with the barrier? If it doesn't work, also add '-genv I_MPI_DEBUG 500' and check one more time. I'm not sure that the tcp connections will use ports from the range, but at least it doesn't hang in my experiments.

Regards!
Dmitry
Seifer_Lin
Beginner
339 Views

Hi Dmitry:
I configured the Windows firewall as follows on both nodes (192.168.120.36 & 192.168.120.11):
(1) smpd.exe is added into the firewall exception list.
(2) mpiexec.exe is added into the firewall exception list.
(3) TCP port 10000~11000 is opened in the firewall.
[Test1]: One MPI process at each node by the following command
mpiexec.exe -genv MPICH_PORT_RANGE 10000:11000 -hosts 2 192.168.120.36 1 192.168.120.11 1 \\192.168.120.36\share\test_intel_mpi.exe
Before barrier
Before barrier
After barrier
After barrier
[Test2]: Two MPI processes at each node by the following command
mpiexec.exe -genv MPICH_PORT_RANGE 10000:11000 -hosts 2 192.168.120.36 2 192.168.120.11 2 \\192.168.120.36\share\test_intel_mpi.exe
And no printf is shown..., even the "Before barrier" is NOT shown. :( (I've used fflush after printf.)
[Test3]: Same as Test2, by adding more debug options
mpiexec.exe -genv I_MPI_PLATFORM 0 -genv I_MPI_DEBUG 500 -genv MPICH_PORT_RANGE 10000:11000 -hosts 2 192.168.120.36 2 192.168.120.11 2 \\192.168.120.36\share\test_intel_mpi.exe
[0] MPI startup(): Intel MPI Library, Version 4.0 Update 1 Build 20100910
[0] MPI startup(): Copyright (C) 2003-2010 Intel Corporation. All rights reserved.
[0] MPI startup(): fabric dapl failed: will try use tcp fabric
[1] MPI startup(): fabric dapl failed: will try use tcp fabric
[2] MPI startup(): fabric dapl failed: will try use tcp fabric
[3] MPI startup(): fabric dapl failed: will try use tcp fabric
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): shm and tcp data transfer modes
[2] MPI startup(): shm and tcp data transfer modes
And no printf from the MPIApp is shown..., even the "Before barrier" is NOT shown.:(
regards,
Seifer
Dmitry_K_Intel2
Employee
339 Views

Hi Seifer,

Does it work with I_MPI_PLATFORM but without I_MPI_DEBUG?

As I wrote before there are 2 different programs: smpd and mpiexec. You need to set both SMPD_PORT_RANGE (and restart smpd service) and MPICH_PORT_RANGE. But I'm not sure that MPICH_PORT_RANGE works properly - you can check ports by tcpview.

The Windows firewall is a constant headache. We recommend turning it off.
From my point of view, a firewall should be set up and configured on a dedicated computer for external connections, and the internal network should sit behind that firewall without restrictions.

BTW: by default smpd listens on port 8678. This number can be changed with the '-port' option.

Regards!
Dmitry

Seifer_Lin
Beginner
339 Views

Hi Dmitry:
I have some problems with Intel MPI for Linux.
I have 2 machines, 192.168.120.162 (node1) and 192.168.120.163 (node2). Both of them open the port range 10000:11000 via their iptables settings.
MPD_PORT_RANGE=10000:11000 is set in ~/.mpd.conf
I run the following command at 192.168.120.162
mpdboot.py -n 2 -f ~/mpdhost.txt
The contents of ~/mpdhost.txt:
192.168.120.162
192.168.120.163
mpdboot.py is done successfully.
Now I created a file named ~/machinefile.txt, which contains the following lines
192.168.120.162
192.168.120.163
Now I run the following command on 192.168.120.162 (node1)
mpiexec.py -l -machinefile ~/machinefile.txt -genv MPICH_PORT_RANGE 10000:11000 -n 2 hostname
everything is OK.
The output is
0: node1
1: node2
Now I changed ~/machinefile.txt to the following
192.168.120.163
192.168.120.162
(I swapped the order of my nodes.)
Then I run the following command again on 192.168.120.162 (node1)
mpiexec.py -l -machinefile ~/machinefile.txt -genv MPICH_PORT_RANGE 10000:11000 -n 2 hostname
Then it hangs........
But if iptables is turned off, it never hangs.
My experience is that the node that launches mpiexec.py must be the first node in machinefile.txt if iptables is turned on.
For example:
mpiexec.py is launched on node1, then the contents of machinefile.txt must be
node1
node2
when mpiexec.py is launched on node2, then the contents of machinefile.txt must be
node2
node1
otherwise, it will just hang.... even the app is only a "hostname"!
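The ordering rule described above can be applied mechanically before each launch. The following is a minimal sketch that only encodes this observation (launching node listed first) and does not explain or fix the hang itself; the function name `local_first` is hypothetical, and it assumes the launching node is written in the machinefile exactly as passed in the first argument.

```shell
#!/bin/sh
# Print a machinefile with the launching host moved to the top.
# $1 = launching host exactly as written in the file, $2 = machinefile path
local_first() {
    launch_host=$1
    file=$2
    # Lines that match the launching host exactly come first...
    grep -Fx -- "$launch_host" "$file" || true
    # ...followed by all other lines in their original order.
    grep -Fxv -- "$launch_host" "$file" || true
}
```

For example, on node1 one could run `local_first 192.168.120.162 ~/machinefile.txt > ~/machinefile.ordered` and pass the reordered file to `mpiexec.py -machinefile`.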
Therefore, it seems that in the hanging cases, mpd uses random ports even though MPD_PORT_RANGE is set in ~/.mpd.conf.
Is there any workaround?
Thanks very much!
(BTW, I still have no time to try your suggestions for Windows smpd.)
best regards,
Seifer