Intel® oneAPI HPC Toolkit

Long wait before the parallel program starts

Zhanghong_T_
Novice

Dear all,

I launch my parallel program with Intel MPI 4.1.3.047, running 240 processes on 10 compute nodes. Every time I launch the program, the executables start on every node within 2-3 seconds. However, the CPU usage of every process stays at 0%, as if the program is waiting for something, and the wait can last up to 10 minutes. To find out where it waits, I added the following test code to my program:

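program test
  use mpi                ! MPI Fortran module (older codes use include 'mpif.h')
  implicit none
  integer :: ierr, comm, mysize, myid
  write(*,*) 1           ! printed before MPI_Init
  call MPI_Init(ierr)
  write(*,*) 2           ! printed only after MPI_Init returns
  comm = MPI_COMM_WORLD
  call MPI_Comm_size(comm, mysize, ierr)
  write(*,*) 3
  call MPI_Comm_rank(comm, myid, ierr)
  if (myid == 0) write(*,*) 4
  ! ... (rest of the program)
  call MPI_Finalize(ierr)
end program test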

The number 1 is printed quickly (about 2-3 seconds after launching the program), but it takes about 10 minutes before the number 2 is printed. So my question is: what could cause MPI_Init to take so long?
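To put a number on the delay for each process, a minimal sketch like the following can time MPI_Init on every rank and print the host it runs on. It is a hypothetical diagnostic, not part of the original program; it assumes the `mpi` module is visible to the compiler (for example, when building with the MPI Fortran compiler wrapper) and uses the intrinsic `system_clock`, because `MPI_Wtime` is not guaranteed to be usable before `MPI_Init`.

```fortran
! Hypothetical diagnostic: report how long MPI_Init took on each rank/host.
program time_mpi_init
  use mpi
  use iso_fortran_env, only: output_unit, int64, real64
  implicit none
  integer :: ierr, myid, nprocs, namelen
  integer(int64) :: t0, t1, rate
  real(real64) :: init_seconds
  character(len=MPI_MAX_PROCESSOR_NAME) :: host

  ! Wall-clock time around MPI_Init, measured with the Fortran intrinsic clock
  call system_clock(t0, rate)
  call MPI_Init(ierr)
  call system_clock(t1)
  init_seconds = real(t1 - t0, real64) / real(rate, real64)

  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Get_processor_name(host, namelen, ierr)

  ! Every rank reports; lines from different ranks may interleave
  write(*, '(a,i0,a,i0,3a,f10.3,a)') 'rank ', myid, ' of ', nprocs, &
       ' on ', host(1:namelen), ': MPI_Init took ', init_seconds, ' s'
  flush(output_unit)

  call MPI_Finalize(ierr)
end program time_mpi_init
```

Ranks usually exchange connection information during MPI_Init, so all ranks may report similar times. In that case, comparing runs with different host lists says more than the per-rank numbers alone.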

Thanks,

Zhanghong Tang

Zhanghong_T_
Novice

Dear all,

I am still trying to solve the problem. I forgot to mention that I have created a domain in my cluster containing the 10 nodes N01, N02, ..., N10. Their IP addresses are:

N01: 10.0.0.5
N02: 10.0.0.2
N03: 10.0.0.3
N04: 10.0.0.4
N05: 10.0.0.1
N06: 10.0.0.6
N07: 10.0.0.7
N08: 10.0.0.8
N09: 10.0.0.9
N10: 10.0.0.10

I installed Windows 2012 HPC on N05 (10.0.0.1) and N06 (10.0.0.6), and N05 is the head node. Further testing showed that if I launch processes without N05 (the head node), they start very quickly (about 3 seconds after I enter the command line), but if the head node is included, the wait exceeds 10 minutes. What could cause this problem?

Thanks

Artem_R_Intel1
Employee

Hi,

You are using a fairly old version of the Intel MPI Library (4.1.3). Is it possible for you to switch to the latest one?

The delay may be caused by specific network settings. Check the connections between the compute nodes (for example, with the ping utility).
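As a rough starting point, a sketch like the one below pings every node address listed earlier in the thread once and reports those that do not answer. It is hypothetical, not an Intel tool. It assumes Windows `ping` syntax (`-n` count, `-w` timeout in milliseconds) and that a zero exit status means a reply was received. Windows ping can return 0 even for a "Destination host unreachable" reply, so a pass here is necessary rather than sufficient.

```fortran
! Hypothetical connectivity check: ping each cluster node once from this machine.
! Assumes Windows ping options: -n <count>, -w <timeout in ms>.
program check_nodes
  implicit none
  ! Addresses in node order N01..N10, as listed in the thread
  character(len=9), parameter :: nodes(10) = [character(len=9) :: &
       '10.0.0.5', '10.0.0.2', '10.0.0.3', '10.0.0.4', '10.0.0.1', &
       '10.0.0.6', '10.0.0.7', '10.0.0.8', '10.0.0.9', '10.0.0.10']
  integer :: i, exitstat, cmdstat, failures

  failures = 0
  do i = 1, size(nodes)
     exitstat = -1
     ! One echo request, 1000 ms timeout; ping's own output is discarded
     call execute_command_line('ping -n 1 -w 1000 ' // trim(nodes(i)) // ' > nul', &
          wait=.true., exitstat=exitstat, cmdstat=cmdstat)
     if (cmdstat /= 0 .or. exitstat /= 0) then
        failures = failures + 1
        write(*, '(a,i2.2,3a)') 'N', i, ' (', trim(nodes(i)), '): no reply'
     else
        write(*, '(a,i2.2,3a)') 'N', i, ' (', trim(nodes(i)), '): ok'
     end if
  end do
  write(*, '(i0,a)') failures, ' node(s) did not answer'
end program check_nodes
```

Running it on each node, and in particular on the head node N05, checks the links in both directions. Ping only tests ICMP, so it will not show whether a firewall blocks the TCP connections MPI uses between nodes.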

Zhanghong_T_
Novice

Dear Artem,

Thank you very much for your kind reply. I have tested many times and found that the latest version doesn't work for me; only version 4.1.3.047 does. See here:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/644828

Could you please give more details on how to check the connections?

Thanks
