Beginner

Long wait before the parallel program starts

Dear all,

I launch a parallel program with Intel MPI 4.1.3.047, running 240 processes on 10 compute nodes. Every time I launch the program, the executables start on every node after 2-3 seconds. However, the CPU usage of every process is 0% and the program is waiting for something. The wait can last up to 10 minutes. To find out where it is waiting, I put the following test code in my program:

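program test
  implicit none
  include 'mpif.h'
  ! define variables
  integer :: ierr, comm, mysize, myid
  write(*,*) 1                      ! printed before MPI starts up
  call MPI_Init(ierr)
  write(*,*) 2                      ! printed once MPI_Init returns
  comm = MPI_COMM_WORLD
  call MPI_COMM_SIZE(comm, mysize, ierr)
  write(*,*) 3
  call MPI_COMM_RANK(comm, myid, ierr)
  if (myid == 0) write(*,*) 4       ! only rank 0 prints
  ...
end program test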

The number 1 is printed quickly (about 2-3 seconds after launching the program), but it then takes about 10 minutes before the number 2 is printed. So my question is: what makes MPI_Init take so long?

Thanks,

Zhanghong Tang

3 Replies
Beginner

Dear all,

I am still trying to solve this problem. I forgot to mention that I have created a domain in my cluster that includes 10 nodes: N01, N02, ..., N10. The IP addresses of these nodes are:

10.0.0.5
10.0.0.2
10.0.0.3
10.0.0.4
10.0.0.1
10.0.0.6
10.0.0.7
10.0.0.8
10.0.0.9
10.0.0.10

I installed Windows 2012 HPC on N05 (10.0.0.1) and N06 (10.0.0.6), and N05 is the head node. Further testing showed that if I launch the processes without N05, i.e., without the head node, they start very quickly (about 3 seconds after I enter the command line). But if the head node is included, the wait is more than 10 minutes. What could cause this?

Thanks

Employee

Hi,

You are using a rather old version of the Intel MPI Library (4.1.3). Is it possible for you to switch to the latest one?

The delay may be caused by specific network settings. Check the connections between the compute nodes (for example, with the ping utility).
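
Beyond ping, one possible way to check the connections is to time how long each node takes to resolve and reach the others. The Python sketch below is only an illustration, not an Intel MPI tool: the helper names `timed`, `check_host` and `print_report` are made up for this example, and the optional `port` is a placeholder for whatever TCP port you know to be open on the nodes. It measures forward name resolution, reverse lookup and (optionally) a TCP connect. A lookup that takes many seconds for one node but not the others would be a hint worth following up, although nothing in this thread confirms that name resolution is the cause here.

```python
import socket
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, error, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs), None, time.perf_counter() - start
    except OSError as exc:  # covers lookup failures, timeouts, refused connections
        return None, exc, time.perf_counter() - start


def check_host(host, port=None, reverse=True, timeout=5.0):
    """Time name resolution (and optionally a TCP connect) for one host.

    `port` is a hypothetical parameter: pass a port you know is open on
    the target node, or leave it as None to skip the connect test.
    """
    report = {"host": host}

    # Forward lookup: host name -> IPv4 address.
    addr, err, secs = timed(socket.gethostbyname, host)
    report.update(address=addr, resolve_seconds=secs,
                  resolve_error=str(err) if err else None)
    if addr is None:
        return report

    # Reverse lookup: IPv4 address -> host name.
    if reverse:
        found, err, secs = timed(socket.gethostbyaddr, addr)
        report.update(reverse_name=found[0] if found else None,
                      reverse_seconds=secs)

    # Optional TCP connect to the given port.
    if port is not None:
        conn, err, secs = timed(socket.create_connection, (addr, port),
                                timeout=timeout)
        if conn is not None:
            conn.close()
        report.update(connect_seconds=secs,
                      connect_error=str(err) if err else None)
    return report


def print_report(hosts, port=None):
    """Print one line of timings per host."""
    for host in hosts:
        print(check_host(host, port=port))
```

Running `print_report(["N01", "N02", "N05"])` on the head node and again on a compute node, with the node names from this thread, would show whether any one direction is noticeably slower than the rest.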

Beginner

Dear Artem,

Thank you very much for your kind reply. I have tested many times and found that the latest version doesn't work; only version 4.1.3.047 works for me. See here:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/644828

Could you please give more details on how to check the connections?

Thanks
