Intel® MPI Library

MPI_Allgatherv with large message sizes

Steven_V_
Beginner

Hi,

I'm trying to collect data with MPI_Allgatherv into a large receive buffer whose total size exceeds 2 GB. As far as I understand from this thread (http://software.intel.com/en-us/forums/topic/361060), this is not supported. Unfortunately, when I try to use the -ilp64 option with mpiifort, I run into several problems:

1) When using include 'mpif.h' to include MPI, after the following commands:

mpiifort -warn -O1 -g -traceback -check bounds -i8 -c gather.f

mpiifort -warn -O1 -g -traceback -check bounds -ilp64  gather.o -o gather.exe-ilp64

mpirun -ilp64 ./gather.exe-ilp64

it aborts with:

Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign
Assertion failed in file ../../i_rtc_cache.c at line 638: buf_end_palign > buf_start_palign

2) When including the MPI types through a "use mpi" statement, I can't compile the test program with '-i8', as it tells me the interface is incompatible. I guess this is because it doesn't know that I want to use the ILP64 interface. When compiling and linking in one go, it does work with only '-ilp64', but not if I add '-i8':

mpiifort -warn -O1 -g -traceback -check bounds -ilp64  gather.f -o gather.exe-ilp64

mpirun -ilp64 ./gather.exe-ilp64

After this, the program still crashes, but now with the following error message:

Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b33c8000010, scount=0, dtype=0x4c000829, rbuf=0x2b34b66b3010, rcounts=0x7fff2b9e7b70, displs=0x7fff2b9e7b60, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1071939176
 BUFRECV =    5.55500000000000     
Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b53e8000010, scount=0, dtype=0x4c000829, rbuf=0x2b54d66b3010, rcounts=0x7fff75f545f0, displs=0x7fff75f545e0, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -484441656

or with

Fatal error in PMPI_Allgatherv: Invalid count, error stack:
PMPI_Allgatherv(1430): MPI_Allgatherv(sbuf=0x2b8c98000010, scount=0, dtype=0x4c000829, rbuf=0x2b8d866b3010, rcounts=0x7fff83e96470, displs=0x7fff83e96460, dtype=0x4c000829, MPI_COMM_WORLD) failed
PMPI_Allgatherv(1375): Negative count, value is -1883799144
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source             
libpthread.so.0    00002B5F11907251  Unknown               Unknown  Unknown
libdaploucm.so.2   00002B5F12F7869C  Unknown               Unknown  Unknown
libmpi_dbg.so.4    00002B5F10E8676F  Unknown               Unknown  Unknown
libmpi_dbg.so.4    00002B5F10E83718  dapl_rc_poll_recv         296  dapl_poll_rc.c
libmpi_dbg.so.4    00002B5F10E8330D  MPID_nem_dapl_rc_         124  dapl_poll_rc.c
libmpi_dbg.so.4    00002B5F10FC18C7  MPID_nem_network_          23  mpid_nem_network_poll.c
libmpi_dbg.so.4    00002B5F10DCD90E  MPIDI_CH3I_Progre         735  ch3_progress.c
libmpi_dbg.so.4    00002B5F10F2B592  MPIC_Wait                 568  helper_fns.c
libmpi_dbg.so.4    00002B5F10F290E9  MPIC_Sendrecv             206  helper_fns.c
libmpi_dbg.so.4    00002B5F10F2BA18  MPIC_Sendrecv_ft          717  helper_fns.c
libmpi_dbg.so.4    00002B5F10D7890E  MPIR_Allgatherv_i         770  allgatherv.c
libmpi_dbg.so.4    00002B5F10D7965F  MPIR_Allgatherv           955  allgatherv.c
libmpi_dbg.so.4    00002B5F10D799B0  MPIR_Allgatherv_i        1000  allgatherv.c
libmpi_dbg.so.4    00002B5F10D7C822  PMPI_Allgatherv          1400  allgatherv.c
libmpigf.so.4      00002B5F10AA4279  Unknown               Unknown  Unknown
libmpi_ilp64.so    00002B5F108709C3  Unknown               Unknown  Unknown
gather.exe-ilp64   0000000000403D1B  MAIN__                     56  gather.f
gather.exe-ilp64   0000000000402F1C  Unknown               Unknown  Unknown
libc.so.6          00002B5F11DBECDD  Unknown               Unknown  Unknown
gather.exe-ilp64   0000000000402E19  Unknown               Unknown  Unknown

So, that makes me wonder if I actually compiled it properly?

The test program is attached. Output of mpiifort -show:

ifort -I/software/intel/impi/4.1.3.048/intel64/include -I/software/intel/impi/4.1.3.048/intel64/include -L/software/intel/impi/4.1.3.048/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /software/intel/impi/4.1.3.048/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread

and ifort --version:

ifort.orig (IFORT) 13.1.3 20130607
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

grtz

Steven

James_T_Intel
Moderator

Hi Steven,

You are (in the first step) correctly compiling and linking with ILP64.  However, this does not enable support for messages larger than 2 GB.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Steven_V_
Beginner

Thanks for the confirmation! I worked around it by splitting the data into several smaller messages.
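
For anyone landing here later, below is a rough sketch of how such a split could be planned. It is not the code from the attached test program and it makes no MPI calls; it only works out the bookkeeping. The function name plan_rounds, the dictionary keys, and the idea of receiving each round into a staging buffer before copying into the full buffer are all assumptions made for illustration. The per-round limit is passed in as an element count, which you would derive yourself from the byte limit and your element size.

```python
def plan_rounds(recv_counts, max_elems_per_round):
    """Plan several small allgatherv rounds in place of one oversized call.

    recv_counts[i] is the number of elements rank i contributes in total.
    Every round gathers at most max_elems_per_round elements over all ranks.
    (Hypothetical helper: it only computes counts and offsets.)
    """
    nranks = len(recv_counts)
    if nranks == 0:
        return []

    # Largest piece one rank may send per round so the round total fits.
    chunk = max_elems_per_round // nranks
    if chunk < 1:
        raise ValueError("limit is too small for this number of ranks")

    # Start of each rank's block in the full receive buffer.
    final_displs = []
    offset = 0
    for count in recv_counts:
        final_displs.append(offset)
        offset += count

    rounds = []
    done = 0  # elements every rank has already sent (capped by its own count)
    longest = max(recv_counts)
    while done < longest:
        # Ranks that have nothing left contribute zero elements this round.
        counts = [max(0, min(chunk, c - done)) for c in recv_counts]

        # Displacements inside the per-round staging buffer.
        staging_displs = []
        pos = 0
        for c in counts:
            staging_displs.append(pos)
            pos += c

        # Where each received piece belongs in the full receive buffer.
        final_offsets = [d + done for d in final_displs]

        rounds.append({
            "counts": counts,
            "staging_displs": staging_displs,
            "final_offsets": final_offsets,
        })
        done += chunk
    return rounds


if __name__ == "__main__":
    # Tiny example: three ranks, at most 6 elements gathered per round.
    for r in plan_rounds([5, 2, 0], 6):
        print(r)
```

In a real program each round would become one MPI_Allgatherv call using that round's counts and staging displacements, followed by a local copy of each piece to its final offset. The staging buffer keeps the displacements handed to MPI small even when the full buffer is larger than 2 GB.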

phonlawat_k_
Beginner

If it doesn't enable that, how about this paper: http://ijssst.info/Vol-12/No-1/paper3.pdf ? As I understand it, the authors compare the performance of Intel MPI and MVAPICH over InfiniBand using the Intel MPI Benchmarks on Intel Westmere processors. In their experiments they vary the message size up to 16 MB. The Allgather results suggest that everything works, but as I see it the test should not run at all if messages larger than 2 GB are unsupported. Or am I misunderstanding the Intel MPI limitation?
