Hi,
I am using the ifort compiler v. 13.0.1 20121010 together with Intel MPI v. 4.1.0.024 on an x86_64 Linux cluster. Using 64-bit integers as the default (the ILP64 model) in my little Fortran program, I obtain wrong results when I use MPI_IN_PLACE in MPI_REDUCE calls (both for integer and real(8)).
My code is as follows:
[fortran]
program test
  include "mpif.h"
  ! use mpi
  integer :: iraboof
  integer :: mytid, numnod, ierr
  real(8) :: rraboof

  mytid = 0

  ! initialize the MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)

  ! integer reduction: the root reduces in place, the other ranks send
  ! (the receive buffer is ignored on non-root ranks)
  iraboof = 1
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, iraboof, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(iraboof, 0, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', iraboof, numnod
  end if

  ! real(8) reduction: same pattern
  rraboof = 1.0d0
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, rraboof, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(rraboof, 0, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', rraboof, numnod
  end if

  call mpi_finalize(ierr)
end program
[/fortran]
Compilation is done with
[bash]
mpiifort -O3 -i8 impi.F90
[/bash]
It compiles and links fine:
[bash]
ldd ./a.out
linux-vdso.so.1 => (0x00007ffff7893000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003357c00000)
libmpi_ilp64.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi_ilp64.so.4 (0x00002ad1a4a3f000)
libmpi.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi.so.4 (0x00002ad1a4c69000)
libmpigf.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpigf.so.4 (0x00002ad1a528e000)
librt.so.1 => /lib64/librt.so.1 (0x0000003358800000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003358000000)
libm.so.6 => /lib64/libm.so.6 (0x0000003357800000)
libc.so.6 => /lib64/libc.so.6 (0x0000003357400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003359c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003357000000)
[/bash]
Running the program, however, I obtain
[bash]
mpirun -np 4 ./a.out
raboof mpi reduce 3 4
raboof mpi reduce 3.00000000000000 4
[/bash]
whereas it should produce
[bash]
mpirun -np 4 ./a.out
raboof mpi reduce 4 4
raboof mpi reduce 4.00000000000000 4
[/bash]
which is what I also obtain with other MPI libraries.
I would appreciate any comments or help.
with best regards,
stefan
P.S.: When I use the Fortran 90 interface ("use mpi"), I obtain the following warnings at compile time:
[bash]
mpiifort -O3 -i8 impi.F90
impi.F90(9): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_init(ierr)
-----------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [MYTID]
call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [NUMNOD]
call mpi_comm_size(mpi_comm_world, numnod,ierr)
--------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_comm_size(mpi_comm_world, numnod,ierr)
---------------------------------------------^
[/bash]
and a crash at runtime:
[bash]
mpirun -np 4 ./a.out
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
[/bash]
Your ldd result, showing that you linked against the gfortran-compatible library, looks like a problem. This shouldn't happen if you use mpiifort consistently; the gfortran and ifort libraries can't coexist. Adding -# to the mpiifort command should give much more detail about what goes into the script and what it passes on to ld.
Dear Tim,
Thanks for your immediate reply. Please find below the output from compiling my program (the one above, in the file impi.F90) with your suggested flag:
[bash]
mpiifort -i8 -# impi.F90
[/bash]
This compilation yields:
[bash]
mpiifort -i8 -# impi.F90
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fpp \
-D__INTEL_COMPILER=1300 \
-D__unix__ \
-D__unix \
-D__linux__ \
-D__linux \
-D__gnu_linux__ \
-Dunix \
-Dlinux \
-D__ELF__ \
-D__x86_64 \
-D__x86_64__ \
-D_MT \
-D__INTEL_COMPILER_BUILD_DATE=20121010 \
-D__INTEL_OFFLOAD \
-D__i686 \
-D__i686__ \
-D__pentiumpro \
-D__pentiumpro__ \
-D__pentium4 \
-D__pentium4__ \
-D__tune_pentium4__ \
-D__SSE2__ \
-D__SSE__ \
-D__MMX__ \
-I. \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/mkl/include \
-I/global/apps/intel/2013.1/tbb/include \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include \
-I/usr/local/include \
-I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include \
-I/usr/include \
-4Ycpp \
-4Ncvf \
-f_com=yes \
impi.F90 \
/tmp/ifortBOT7lB.i90
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fortcom \
-D__INTEL_COMPILER=1300 \
-D__unix__ \
-D__unix \
-D__linux__ \
-D__linux \
-D__gnu_linux__ \
-Dunix \
-Dlinux \
-D__ELF__ \
-D__x86_64 \
-D__x86_64__ \
-D_MT \
-D__INTEL_COMPILER_BUILD_DATE=20121010 \
-D__INTEL_OFFLOAD \
-D__i686 \
-D__i686__ \
-D__pentiumpro \
-D__pentiumpro__ \
-D__pentium4 \
-D__pentium4__ \
-D__tune_pentium4__ \
-D__SSE2__ \
-D__SSE__ \
-D__MMX__ \
-mGLOB_pack_sort_init_list \
-I. \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/mkl/include \
-I/global/apps/intel/2013.1/tbb/include \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include \
-I/usr/local/include \
-I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include \
-I/usr/include \
"-integer_size 64" \
-O2 \
-simd \
-offload_host \
-mP1OPT_version=13.0-intel64 \
-mGLOB_diag_file=/tmp/ifort7GVk2e.diag \
-mGLOB_source_language=GLOB_SOURCE_LANGUAGE_F90 \
-mGLOB_tune_for_fort \
-mGLOB_use_fort_dope_vector \
-mP2OPT_static_promotion \
-mP1OPT_print_version=FALSE \
-mCG_use_gas_got_workaround=F \
-mP2OPT_align_option_used=TRUE \
-mGLOB_gcc_version=447 \
"-mGLOB_options_string=-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -ldl -i8 -# -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpi_ilp64 -lmpi -lmpigf -lmpigi -lrt -lpthread" \
-mGLOB_cxx_limited_range=FALSE \
-mCG_extend_parms=FALSE \
-mGLOB_compiler_bin_directory=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64 \
-mGLOB_as_output_backup_file_name=/tmp/ifortK2gIZoas_.s \
-mIPOPT_activate \
-mIPOPT_lite \
-mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 \
-mGLOB_product_id_code=0x22006d91 \
-mCG_bnl_movbe=T \
-mGLOB_extended_instructions=0x8 \
-mP3OPT_use_mspp_call_convention \
-mP2OPT_subs_out_of_bound=FALSE \
-mGLOB_ansi_alias \
-mPGOPTI_value_profile_use=T \
-mP2OPT_il0_array_sections=TRUE \
-mP2OPT_offload_unique_var_string=ifort607026576Zo54LN \
-mP2OPT_hlo_level=2 \
-mP2OPT_hlo \
-mP2OPT_hpo_rtt_control=0 \
-mIPOPT_args_in_regs=0 \
-mP2OPT_disam_assume_nonstd_intent_in=FALSE \
-mGLOB_imf_mapping_library=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/libiml_attr.so \
-mIPOPT_obj_output_file_name=/tmp/ifort7GVk2e.o \
-mIPOPT_whole_archive_fixup_file_name=/tmp/ifortwarchNyvxkL \
"-mGLOB_linker_version=2.20.51.0.2-5.36.el6 20100205" \
-mGLOB_long_size_64 \
-mGLOB_routine_pointer_size_64 \
-mGLOB_driver_tempfile_name=/tmp/iforttempfilenQtt0t \
-mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS \
-mGLOB_async_unwind_tables=TRUE \
-mGLOB_obj_output_file=/tmp/ifort7GVk2e.o \
-mGLOB_source_dialect=GLOB_SOURCE_DIALECT_FORTRAN \
-mP1OPT_source_file_name=impi.F90 \
-mP2OPT_symtab_type_copy=true \
/tmp/ifortBOT7lB.i90
ld \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crti.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtbegin.o \
--eh-frame-hdr \
--build-id \
-dynamic-linker \
/lib64/ld-linux-x86-64.so.2 \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-o \
a.out \
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64/for_main.o \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-L/global/apps/intel/2013.1/mkl/lib/intel64 \
-L/global/apps/intel/2013.1/tbb/lib/intel64 \
-L/global/apps/intel/2013.1/ipp/lib/intel64 \
-L/global/apps/intel/2013.1/composerxe/lib/intel64 \
-L/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64 \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/ \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64 \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/ \
-L/lib/../lib64 \
-L/lib/../lib64/ \
-L/usr/lib/../lib64 \
-L/usr/lib/../lib64/ \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/ \
-L/global/apps/intel/2013.1/mkl/lib/intel64/ \
-L/global/apps/intel/2013.1/tbb/lib/intel64/ \
-L/global/apps/intel/2013.1/ipp/lib/intel64/ \
-L/global/apps/intel/2013.1/composerxe/lib/intel64/ \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../ \
-L/lib64 \
-L/lib/ \
-L/usr/lib64 \
-L/usr/lib \
-ldl \
/tmp/ifort7GVk2e.o \
--enable-new-dtags \
-rpath \
/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-rpath \
/opt/intel/mpi-rt/4.1 \
-lmpi_ilp64 \
-lmpi \
-lmpigf \
-lmpigi \
-lrt \
-lpthread \
-Bstatic \
-lifport \
-lifcore \
-limf \
-lsvml \
-Bdynamic \
-lm \
-Bstatic \
-lipgo \
-lirc \
-Bdynamic \
-lpthread \
-Bstatic \
-lsvml \
-Bdynamic \
-lc \
-lgcc \
-lgcc_s \
-Bstatic \
-lirc_s \
-Bdynamic \
-ldl \
-lc \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtend.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crtn.o
rm /tmp/ifortlibgccyi9h59
rm /tmp/ifortgnudirs06mNow
rm /tmp/ifort7GVk2e.o
rm /tmp/ifortBOT7lB.i90
rm /tmp/ifortakfVFX.c
rm /tmp/ifortdashvdk0IZj
rm /tmp/ifortargC1wikG
rm /tmp/ifortgas65oTE2
rm /tmp/ifortK2gIZoas_.s
rm /tmp/ifortldashv7B4mF7
rm /tmp/iforttempfilenQtt0t
rm /tmp/ifortargvFMClQ
rm /tmp/ifortgnudirsMR2abY
rm /tmp/ifortgnudirsHeROwk
rm /tmp/ifortgnudirsDsnJSG
rm /tmp/ifortldashvJ79Ve3
rm /tmp/ifortgnudirsXiurBp
rm /tmp/ifortgnudirsp3WeYL
rm /tmp/ifortgnudirsmUDkl8
rm /tmp/ifort7GVk2e.o
[/bash]
Hi Stefan,
The problem is not related to gfortran. The libmpigf.so library is used for both gfortran and ifort with the Intel® MPI Library. I am able to reproduce the same behavior here. I'll check with the developers, but I expect that MPI_IN_PLACE may not be correctly handled in ILP64.
As a note, the MPI Fortran module is not supported for ILP64 programming in the Intel® MPI Library. Please see Section 3.5.6 of the Intel® MPI Library Reference Manual for more information on ILP64 support.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Thanks for your detailed answer. I am looking forward to hearing the developers' feedback. A piece of MPI-parallelized code similar to the one above constitutes a central part of a core functionality in a quantum chemistry program package (called "Dirac") of which I am a contributing developer. It would be great to know that, with one of the next releases, Intel MPI could fully support the ILP64 model.
with best regards,
stefan
Hi Stefan,
Try compiling and running with -ilp64.
[plain]mpiifort -ilp64 -O3 test.f90 -o test[/plain]
[plain]mpirun -ilp64 -n 4 ./test[/plain]
This works for me.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Indeed, reduce with MPI_IN_PLACE now works with that setup for me as well. However, MPI_COMM_SIZE no longer works:
[fortran]
program test
  include "mpif.h"
  integer :: mytid, numnod, ierr

  mytid = 0

  ! initialize the MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)

  print *, 'mytid, numnod ', mytid, numnod

  call mpi_finalize(ierr)
end program
[/fortran]
Compiling and running the above test program with
[bash]
mpiifort -ilp64 -O3 test.F90
mpirun -ilp64 -np 4 ./a.out
mytid, numnod 1 0
mytid, numnod 0 0
mytid, numnod 2 0
mytid, numnod 3 0
[/bash]
yields a "0" for the size of the communicator MPI_COMM_WORLD.
Any idea what could be wrong?
with best regards,
stefan
Hi Stefan,
So I see. I am able to get the correct results by compiling and linking with -ilp64, but without -i8, and changing the declaration of numnod to integer*8. Let me check with the developers and see what we can do about this.
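Spelled out for the test program above, that variant would look like this (a sketch of the combination just described, not a supported configuration):
[fortran]
program test
  include "mpif.h"
  ! Sketch: build and run with -ilp64 but WITHOUT -i8, so default integers
  ! stay 32-bit; only numnod is widened explicitly for the ILP64 library.
  integer   :: mytid, ierr
  integer*8 :: numnod
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)
  print *, 'mytid, numnod ', mytid, numnod
  call mpi_finalize(ierr)
end program
[/fortran]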
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Thanks for your feedback; I now get exactly the same results as you described above. What I should perhaps emphasize is that I was aiming at a working compilation with 64-bit integers as the default size (-i8 or -integer-size 64), which implies the ILP64 model as far as I can see.
What exactly does the -ilp64 flag set during compilation? Obviously, it does not imply 64-bit default integers in the Fortran code as such. Does it only enable linking to the ILP64 Intel libraries?
with best regards,
stefan
Hi Stefan,
Using -ilp64 links to libmpi_ilp64 instead of libmpi. The correct way to utilize this is to compile with -i8, then link and run with -ilp64. However, this is not giving correct results either.
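Spelled out, that intended sequence would be roughly the following (illustrative file names; as noted, it currently still misbehaves):
[bash]
# Compile with 64-bit default integers...
mpiifort -i8 -c impi.F90
# ...then link and run against the ILP64 interface library
mpiifort -ilp64 impi.o -o a.out
mpirun -ilp64 -np 4 ./a.out
[/bash]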
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Thanks for the clarification and your patience. Let's see what the developers can come up with.
with best regards,
stefan
Hi Stefan,
There are two workarounds for this. The first is to not use MPI_IN_PLACE in a program compiled with -i8. The second is to modify mpif.h, changing
[plain] INTEGER MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED[/plain]
to
[plain] INTEGER*4 MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED[/plain]
This works for your test program. Try it on your full program.
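For reference, a minimal sketch of the first workaround, replacing the in-place reduction on the root with a reduction into a separate receive buffer (the variable iresult is illustrative):
[fortran]
program no_in_place
  include "mpif.h"
  integer :: iraboof, iresult
  integer :: mytid, ierr
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  iraboof = 1
  iresult = 0
  ! Every rank, including the root, passes a distinct receive buffer
  ! instead of MPI_IN_PLACE; the result is only significant at the root.
  call mpi_reduce(iraboof, iresult, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  if (mytid == 0) then
    iraboof = iresult   ! copy the sum back, mimicking the in-place result
    print *, 'raboof mpi reduce', iraboof
  end if
  call mpi_finalize(ierr)
end program
[/fortran]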
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Stefan,
If you're still watching this, how did the workarounds work for your program?
