Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_Bcast writing to buffer on root node

jtepper
Beginner

When using MPI_Bcast, one should expect that the buffer in the root process is left unchanged. While this is certainly true of Intel's implementation, there is a peculiarity: when the buffer exceeds a certain size (somewhere between 3.7 KB and 7.5 KB), the root process needs write access to the buffer. In the application I am currently developing, I first mapped the data to be broadcast as read-only in the root process's virtual memory. Unfortunately this led to a segfault, and changing the mapping fixed the problem. I later confirmed that the buffer is identical before and after MPI_Bcast.

Can anyone tell me why I need write access to the buffer in the root process? This is not true in any other implementation of MPI that I've used.
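To be concrete about what I mean by "mapped as read-only", the broadcast source on the root looks roughly like this (the file name and length are placeholders, not the actual values from my application):

[cpp]/* Placeholder names: "data_file" and buf_len stand in for the real
 * file and size used in my application. */
size_t buf_len = 1 << 20;
int fd = open("data_file", O_RDONLY, 0);
void *buf = mmap(NULL, buf_len, PROT_READ, MAP_PRIVATE, fd, 0);

/* Broadcasting from this read-only mapping segfaults once the message
 * exceeds a few kilobytes; a writable mapping does not. */
MPI_Bcast(buf, (int)buf_len, MPI_CHAR, 0, MPI_COMM_WORLD);[/cpp]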

4 Replies
Dmitry_K_Intel2
Employee


Hi jtepper,

Thanks for your question! That's a very interesting finding. Which version of the Intel MPI Library are you using?
I've checked the current implementation of MPI_Bcast with a developer, and we could not find any place where MPI_Bcast would require write access to the buffer in the root process.

BTW, according to the MPI standard there is no requirement to support read-only buffers.

Best wishes,
Dmitry

jtepper
Beginner
Thanks Dmitry,
I'm using impi v. 3.1 on x86-64 nodes.

As an example, consider the following code:
-------------
[cpp]#include <mpi.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char **argv) {
    int rank, fd, data_len = 800*600*1;
    void *data;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Root: map the file read-only. */
        fd = open("picture", O_RDWR, 0);
        data = mmap(NULL, data_len, PROT_READ,
                    MAP_PRIVATE, fd, 0);
    } else {
        /* Other ranks: receive into a heap buffer. */
        data = malloc(data_len);
    }

    MPI_Bcast(data, data_len, MPI_CHAR,
              0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}[/cpp]
-------------

Note that I mmap the data as read-only. When the mapping is writable, everything runs smoothly. However, when run as above with multiple processes per node, the following occurs:

------------
0:
0: Program received signal SIGSEGV, Segmentation fault.
0: 0x00002ad24740e8c6 in I_MPI_memcpy_shm_rd ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: (gdb) 0: (gdb) where
0: #0 0x00002ad24740e8c6 in I_MPI_memcpy_shm_rd ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #1 0x00002ad24740e3a5 in MPIDI_CH3I_SHM_read_progress ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #2 0x00002ad247407cca in MPIDI_CH3I_RDSSM_Progress ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #3 0x00002ad247408896 in MPIDI_CH3_Progress_wait ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #4 0x00002ad247454f95 in MPIC_Wait ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #5 0x00002ad247454c0e in MPIC_Sendrecv ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #6 0x00002ad2473f9a74 in MPIR_Bcast ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #7 0x00002ad2473f8d4a in PMPI_Bcast ()
0: from /opt/intel/impi/3.1/lib64/libmpi.so.3.2
0: #8 0x0000000000400e36 in main ()
0: (gdb)
----------

When more than one node is used, this issue produces an error that looks like insufficient resources in the communications layer (which is not the case). To produce the stack trace above, I ran all the processes on a single node. Furthermore, when there is only a single process per node, the issue does not come up. Of course, changing the mapping fixes the issue in all cases.
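For completeness, one way to "change the mapping" so that the reproducer above runs cleanly is simply to make the root's mapping writable; in the code above, the only line that changes is the mmap call (this particular flag combination is illustrative; a private copy-on-write mapping keeps the file itself unmodified):

[cpp]/* Writable, private (copy-on-write) mapping: MPI_Bcast can touch the
 * pages without faulting, and the underlying file stays unmodified. */
data = mmap(NULL, data_len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE, fd, 0);[/cpp]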
Dmitry_K_Intel2
Employee
Hi jtepper,

I was able to reproduce this behaviour, and I have submitted a feature request, since the behaviour cannot be considered erroneous.
BTW: Have you tried MPICH2?

Best wishes,
Dmitry

jtepper
Beginner
Dmitry,
Thanks again for taking the time to look into this. I gave MPICH2 a shot, and it doesn't seem to impose the same restriction on memory mappings, although I didn't test it as extensively as I did with impi. In any case, I've just changed the mapping in my app.

So, to be clear, you're saying that while this is not a bug, support for read-only memory is enough of an issue that it warrants a feature request? I would probably agree with that.

Thanks again for your help,
~Josh
