Intel® oneAPI HPC Toolkit

Porting code from MPI/Pro 1.7 to Intel MPI 3.1

I am in the process of switching from MPI/Pro 1.7 to Intel MPI 3.1 and I am seeing very strange (and poor) performance issues that have stumped me.

I am seeing poor performance throughout the entire code, but the front end is a good illustration of the problems. The front end consists of two processes: process 0 (the I/O process) reads in a data header and data and passes them to process 1 (the compute process). Process 1 then processes the data and sends the output field(s) back to process 0, which saves them to disk.

Here is the outline of the MPI framework for the two processes for the simple case of 1 I/O process and 1 compute process:

Process 0:

for (ifrm = 0; ifrm <= totfrm; ifrm++) {

    if (ifrm != totfrm) {
        data_read (..., InpBuf, HD1, ...);
        MPI_Ssend (HD1, ...);
        MPI_Ssend (InpBuf, ...);
    }

    if (ifrm > 0) {
        MPI_Recv (OutBuf, ...);
        sav_data (OutBuf, ...);
    }

} // for (ifrm=0...

// No more data, send termination message
MPI_Send (MPI_BOTTOM, 0, ...);

Process 1:

// Initialize persistent communication requests
MPI_Recv_init (HdrBuf, ..., &req_recvhdr);
MPI_Recv_init (InpBuf, ..., &req_recvdat);
MPI_Ssend_init (OutBuf, ..., &req_sendout);

// Get header and data for first frame
MPI_Start (&req_recvhdr);
MPI_Start (&req_recvdat);

while (1) {

    MPI_Wait (&req_recvhdr, &status);
    MPI_Get_count (&status, ..., &count);
    if (count == 0) {
        // execute termination code
        break;
    }

    MPI_Wait (&req_recvdat, MPI_STATUS_IGNORE);

    // Start receive on next frame while processing current one
    MPI_Start (&req_recvhdr);
    MPI_Start (&req_recvdat);

    // process data

    if (curr_frame > start_frame) {
        MPI_Wait (&req_sendout, MPI_STATUS_IGNORE);
    }

    // process data

    // Send output field(s) back to I/O process
    MPI_Start (&req_sendout);

} // while (1)

The problem I am having is that the MPI_Wait calls are chewing up a lot of CPU cycles for no obvious reason and in a very erratic way. Under MPI/Pro, the above MPI framework behaves in a very reliable and predictable way. With Intel MPI, however, the code can spend almost no time (expected) or several minutes (very unexpected) in one of the MPI_Wait calls. The two waits giving me the most trouble are the ones associated with req_recvhdr and req_sendout.

The code is compiled with the 64-bit versions of the Intel compiler 10.1 and Intel MKL 10.0 and runs on RHEL4 nodes. Both processes run on the same core.

As I said, this framework works well under MPI/Pro, and I am stumped as to where the problem lies or what I should try in order to fix the code. Any insight or guidance you could provide would be greatly appreciated.
Hi jburri,

Thanks for posting to the Intel HPC forums and welcome!

You probably need to use wait mode: please try setting the environment variable I_MPI_WAIT_MODE to 'on'.
You could also try setting the environment variable I_MPI_RDMA_WRITE_IMM to 'enable'.
And you could experiment with different values of the I_MPI_SPIN_COUNT variable.
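For example, the settings could be applied like this before launching the job (a sketch only — the launch command and the spin-count value shown are placeholders; the right value for I_MPI_SPIN_COUNT depends on your workload):

```shell
# Switch Intel MPI from its default busy-wait polling to wait mode,
# so blocked ranks release the CPU instead of spinning
export I_MPI_WAIT_MODE=on

# Use RDMA write with immediate data on the RDMA-capable fabric
export I_MPI_RDMA_WRITE_IMM=enable

# Number of times the library spins before yielding the processor
# (tune this value experimentally; 100 here is just an example)
export I_MPI_SPIN_COUNT=100

# Launch as usual; "your_app" is a placeholder for your executable
mpiexec -n 2 ./your_app
```

Wait mode is particularly relevant when two ranks share one core, since a rank spinning in MPI_Wait can starve the rank it is waiting on.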

Best wishes,

Thanks, Dmitry. I will play around with those parameters and see what the impact is on performance.