Haoyan_H_
Beginner

MPI - Code hangs when send/recv large data

Hi all,

I have been puzzled by strange behaviour of the Intel MPI Library for days. When I send small messages everything works fine; when I send a large message, however, the following code hangs.

 

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>


int main(void){
	int length = MSG_LENGTH;
	char* buf = malloc(length);
	int size, rank;

	MPI_Init(NULL, NULL);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	MPI_Comm_size(MPI_COMM_WORLD, &size);

	/* This test needs exactly one sender and one receiver. */
	if(size < 2){
		fprintf(stderr, "Run with at least 2 ranks\n");
		MPI_Abort(MPI_COMM_WORLD, 1);
	}

	if(rank == 0){
		MPI_Send(buf, length, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
		printf("Sent\n");
	}else{
		MPI_Recv(buf, length, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
		printf("Received\n");
	}
	free(buf);
	MPI_Finalize();
	return 0;
}

 

This is the test makefile:

 

default: compile test-small test-large

compile:
	mpiicc mpi.c -DMSG_LENGTH=1024 -o mpi-small
	mpiicc mpi.c -DMSG_LENGTH=1048576 -o mpi-large

test-small: compile
	@echo "Testing Recv/Send with small data"
	mpiexec.hydra -n 2 ./mpi-small
	@echo "Test done"

test-large: compile
	@echo "Testing Recv/Send with large data"
	mpiexec.hydra -n 2 ./mpi-large
	@echo "Test done"

 

Thank you very much!

5 Replies
Artem_R_Intel1
Employee

Hi Haoyan,

Could you please try to run the following test scenario and provide its output:

mpirun -n 2 IMB-MPI1 pingpong

Also please specify OS and Intel MPI Library version (for example, with 'mpirun -V').

Haoyan_H_
Beginner

Artem R. (Intel) wrote:

Could you please try to run the following test scenario and provide its output: mpirun -n 2 IMB-MPI1 pingpong. Also please specify OS and Intel MPI Library version.

The test scenario gives:

#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Sat Feb 20 17:04:59 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.16.0-60-generic
# Version               : #80~14.04.1-Ubuntu SMP Wed Jan 20 13:37:48 UTC 2016
# MPI Version           : 3.0
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# IMB-MPI1 pingpong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.06         0.00
            1         1000         1.23         0.78
            2         1000         1.23         1.55
            4         1000         1.23         3.10
            8         1000         1.21         6.32
           16         1000         1.21        12.63
           32         1000         1.18        25.87
           64         1000         1.32        46.08
          128         1000         1.24        98.73
          256         1000         1.23       198.01
          512         1000         1.53       319.15
         1024         1000         1.64       595.09
         2048         1000         2.04       959.76
         4096         1000         2.97      1315.46
         8192         1000         4.52      1727.86
        16384         1000         8.19      1907.81
        32768         1000        15.59      2004.62
***hangs here

 

And my MPI version is "Intel(R) MPI Library for Linux* OS, Version 5.1.2 Build 20151015 (build id: 13147)".

Artem_R_Intel1
Employee

Hi Haoyan,

As far as I can see you are using Ubuntu* 14.04.1, and the hang is a known issue (see the topic "MPI having bad performance in user mode, runs perfectly in root").

Could you please try the workaround:

export I_MPI_SHM_LMT=shm
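For reference, a minimal sketch of applying the workaround and re-running the failing case from the makefile above. The explanation in the comments is an assumption based on the "cannot read from remote process" error later in this thread: the default large-message transfer path appears to use cross-memory attach, which Ubuntu's ptrace hardening can block for non-root users, while `shm` selects a copy-in/copy-out path through shared memory.

```shell
# Assumed cause: the default large-message transfer mechanism reads the
# peer process's memory directly, which Ubuntu's ptrace restrictions can
# forbid for unprivileged users. "shm" switches to plain shared-memory
# double buffering instead.
export I_MPI_SHM_LMT=shm

# Re-run the large-message test (binary name from the makefile above).
mpiexec.hydra -n 2 ./mpi-large
```

To make the setting persistent for batch jobs, the export can go in the job script before the mpiexec.hydra invocation.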

Haoyan_H_
Beginner

Hi Artem,

Yes, setting I_MPI_SHM_LMT solves this problem. Thank you very much!

#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Sat Feb 20 17:50:21 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.16.0-60-generic
# Version               : #80~14.04.1-Ubuntu SMP Wed Jan 20 13:37:48 UTC 2016
# MPI Version           : 3.0
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# IMB-MPI1 pingpong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.24         0.00
            1         1000         1.31         0.73
            2         1000         1.32         1.44
            4         1000         1.31         2.91
            8         1000         1.27         6.02
           16         1000         1.27        12.02
           32         1000         1.24        24.65
           64         1000         1.34        45.55
          128         1000         1.29        94.55
          256         1000         1.30       188.08
          512         1000         1.55       315.54
         1024         1000         1.88       520.42
         2048         1000         2.09       932.92
         4096         1000         2.97      1314.82
         8192         1000         4.47      1749.54
        16384         1000         8.05      1940.14
        32768         1000        15.92      1962.43
        65536          640        18.32      3412.26
       131072          320        33.20      3765.21
       262144          160        58.60      4266.03
       524288           80       116.81      4280.56
      1048576           40       233.65      4279.90
      2097152           20       470.89      4247.24
      4194304           10       915.41      4369.64


# All processes entering MPI_Finalize
Tom_K_1
Beginner

Just to update: this problem is still not resolved as of Intel MPI 2017 Update 1 (with Ubuntu 16.04 LTS).

Rather than hanging, however, I now get an error message after the 32768 row:

Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x1639880, count=65536, MPI_BYTE, src=0, tag=MPI_ANY_TAG, comm=0x84000002, status=0x7fff751dbb50) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

 

The test passes with the workaround given above as before.
