Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

ILP64 model: using MPI_IN_PLACE in MPI_REDUCE seems to yield wrong results

Stefan_K_2
Beginner
1,308 Views

Hi,

I am using the ifort compiler v. 13.0.1 20121010 together with Intel MPI v. 4.1.0.024 on an x86_64 Linux cluster. Using 64-bit integers as the default (ILP64 model) in my small Fortran program, I obtain wrong results when I use MPI_IN_PLACE in MPI_REDUCE calls (for both integer and real(8)).

My code is as follows:

[fortran]

program test
  include "mpif.h"
  ! use mpi
  integer :: iraboof
  integer :: mytid, numnod, ierr
  real(8) :: rraboof

  mytid = 0
  ! initialize MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)

  iraboof = 1
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, iraboof, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(iraboof, 0, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', iraboof, numnod
  end if
  rraboof = 1.0d0
  if (mytid == 0) then
    call mpi_reduce(MPI_IN_PLACE, rraboof, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  else
    call mpi_reduce(rraboof, 0, 1, mpi_real8, mpi_sum, 0, mpi_comm_world, ierr)
  end if
  if (mytid == 0) then
    print *, 'raboof mpi reduce', rraboof, numnod
  end if
  call mpi_finalize(ierr)
end program

[/fortran] 

Compilation is done with

[bash]

mpiifort -O3 -i8 impi.F90

[/bash]

It compiles and links fine:

[bash]

ldd ./a.out

linux-vdso.so.1 => (0x00007ffff7893000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003357c00000)
libmpi_ilp64.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi_ilp64.so.4 (0x00002ad1a4a3f000)
libmpi.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpi.so.4 (0x00002ad1a4c69000)
libmpigf.so.4 => /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/libmpigf.so.4 (0x00002ad1a528e000)
librt.so.1 => /lib64/librt.so.1 (0x0000003358800000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003358000000)
libm.so.6 => /lib64/libm.so.6 (0x0000003357800000)
libc.so.6 => /lib64/libc.so.6 (0x0000003357400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003359c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003357000000)

[/bash]

Running the program, however, I obtain

[bash]

mpirun -np 4 ./a.out
raboof mpi reduce 3 4
raboof mpi reduce 3.00000000000000 4

[/bash]

whereas it should produce

[bash]

mpirun -np 4 ./a.out 
raboof mpi reduce 4 4
raboof mpi reduce 4.00000000000000 4

[/bash]

which is what I also obtain with other MPI libraries.

I would appreciate any comment/help. 

with best regards,

stefan

P.S.: When I use the Fortran 90 interface ("use mpi") I obtain the following warnings at compile time:

[bash]

mpiifort -O3 -i8 impi.F90
impi.F90(9): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_init(ierr)
-----------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [MYTID]
call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------^
impi.F90(10): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_comm_rank(mpi_comm_world, mytid,ierr)
--------------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [NUMNOD]
call mpi_comm_size(mpi_comm_world, numnod,ierr)
--------------------------------------^
impi.F90(11): warning #6075: The data type of the actual argument does not match the definition. [IERR]
call mpi_comm_size(mpi_comm_world, numnod,ierr)
---------------------------------------------^

[/bash]

and a crash at runtime:

[bash]

mpirun -np 4 ./a.out
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE
Fatal error in PMPI_Reduce: Invalid buffer pointer, error stack:
PMPI_Reduce(1894): MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x693828, count=1, MPI_INTEGER, MPI_SUM, root=0, MPI_COMM_WORLD) failed
PMPI_Reduce(1823): sendbuf cannot be MPI_IN_PLACE

[/bash]

12 Replies
TimP
Honored Contributor III

Your ldd result shows that you linked against the gfortran-compatible library, which looks like a problem. This shouldn't happen if you use mpiifort consistently; the gfortran and ifort libraries can't coexist. Adding -# to the mpiifort command should show in much more detail what the script passes on to ld.

Stefan_K_2
Beginner

Dear Tim,

Thanks for your immediate reply. Please find below the output from compiling my program (the one above, in the file impi.F90) with your suggested flag:

[bash]

mpiifort -i8 -# impi.F90

[/bash]

This compilation yields:

[bash]

mpiifort -i8 -# impi.F90
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fpp \
-D__INTEL_COMPILER=1300 \
-D__unix__ \
-D__unix \
-D__linux__ \
-D__linux \
-D__gnu_linux__ \
-Dunix \
-Dlinux \
-D__ELF__ \
-D__x86_64 \
-D__x86_64__ \
-D_MT \
-D__INTEL_COMPILER_BUILD_DATE=20121010 \
-D__INTEL_OFFLOAD \
-D__i686 \
-D__i686__ \
-D__pentiumpro \
-D__pentiumpro__ \
-D__pentium4 \
-D__pentium4__ \
-D__tune_pentium4__ \
-D__SSE2__ \
-D__SSE__ \
-D__MMX__ \
-I. \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/mkl/include \
-I/global/apps/intel/2013.1/tbb/include \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include \
-I/usr/local/include \
-I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include \
-I/usr/include \
-4Ycpp \
-4Ncvf \
-f_com=yes \
impi.F90 \
/tmp/ifortBOT7lB.i90

/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/fortcom \
-D__INTEL_COMPILER=1300 \
-D__unix__ \
-D__unix \
-D__linux__ \
-D__linux \
-D__gnu_linux__ \
-Dunix \
-Dlinux \
-D__ELF__ \
-D__x86_64 \
-D__x86_64__ \
-D_MT \
-D__INTEL_COMPILER_BUILD_DATE=20121010 \
-D__INTEL_OFFLOAD \
-D__i686 \
-D__i686__ \
-D__pentiumpro \
-D__pentiumpro__ \
-D__pentium4 \
-D__pentium4__ \
-D__tune_pentium4__ \
-D__SSE2__ \
-D__SSE__ \
-D__MMX__ \
-mGLOB_pack_sort_init_list \
-I. \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include \
-I/global/apps/intel/2013.1/mkl/include \
-I/global/apps/intel/2013.1/tbb/include \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include/intel64 \
-I/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/include \
-I/usr/local/include \
-I/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include \
-I/usr/include \
"-integer_size 64" \
-O2 \
-simd \
-offload_host \
-mP1OPT_version=13.0-intel64 \
-mGLOB_diag_file=/tmp/ifort7GVk2e.diag \
-mGLOB_source_language=GLOB_SOURCE_LANGUAGE_F90 \
-mGLOB_tune_for_fort \
-mGLOB_use_fort_dope_vector \
-mP2OPT_static_promotion \
-mP1OPT_print_version=FALSE \
-mCG_use_gas_got_workaround=F \
-mP2OPT_align_option_used=TRUE \
-mGLOB_gcc_version=447 \
"-mGLOB_options_string=-I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -I/global/apps/intel/2013.1/impi/4.1.0.024/intel64/include -ldl -i8 -# -L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpi_ilp64 -lmpi -lmpigf -lmpigi -lrt -lpthread" \
-mGLOB_cxx_limited_range=FALSE \
-mCG_extend_parms=FALSE \
-mGLOB_compiler_bin_directory=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64 \
-mGLOB_as_output_backup_file_name=/tmp/ifortK2gIZoas_.s \
-mIPOPT_activate \
-mIPOPT_lite \
-mGLOB_machine_model=GLOB_MACHINE_MODEL_EFI2 \
-mGLOB_product_id_code=0x22006d91 \
-mCG_bnl_movbe=T \
-mGLOB_extended_instructions=0x8 \
-mP3OPT_use_mspp_call_convention \
-mP2OPT_subs_out_of_bound=FALSE \
-mGLOB_ansi_alias \
-mPGOPTI_value_profile_use=T \
-mP2OPT_il0_array_sections=TRUE \
-mP2OPT_offload_unique_var_string=ifort607026576Zo54LN \
-mP2OPT_hlo_level=2 \
-mP2OPT_hlo \
-mP2OPT_hpo_rtt_control=0 \
-mIPOPT_args_in_regs=0 \
-mP2OPT_disam_assume_nonstd_intent_in=FALSE \
-mGLOB_imf_mapping_library=/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/bin/intel64/libiml_attr.so \
-mIPOPT_obj_output_file_name=/tmp/ifort7GVk2e.o \
-mIPOPT_whole_archive_fixup_file_name=/tmp/ifortwarchNyvxkL \
"-mGLOB_linker_version=2.20.51.0.2-5.36.el6 20100205" \
-mGLOB_long_size_64 \
-mGLOB_routine_pointer_size_64 \
-mGLOB_driver_tempfile_name=/tmp/iforttempfilenQtt0t \
-mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS \
-mGLOB_async_unwind_tables=TRUE \
-mGLOB_obj_output_file=/tmp/ifort7GVk2e.o \
-mGLOB_source_dialect=GLOB_SOURCE_DIALECT_FORTRAN \
-mP1OPT_source_file_name=impi.F90 \
-mP2OPT_symtab_type_copy=true \
/tmp/ifortBOT7lB.i90

ld \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crti.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtbegin.o \
--eh-frame-hdr \
--build-id \
-dynamic-linker \
/lib64/ld-linux-x86-64.so.2 \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-o \
a.out \
/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64/for_main.o \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-L/global/apps/intel/2013.1/mkl/lib/intel64 \
-L/global/apps/intel/2013.1/tbb/lib/intel64 \
-L/global/apps/intel/2013.1/ipp/lib/intel64 \
-L/global/apps/intel/2013.1/composerxe/lib/intel64 \
-L/global/hds/home/install/intel/2013.1/composer_xe_2013.1.117/compiler/lib/intel64 \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/ \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64 \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/ \
-L/lib/../lib64 \
-L/lib/../lib64/ \
-L/usr/lib/../lib64 \
-L/usr/lib/../lib64/ \
-L/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib/ \
-L/global/apps/intel/2013.1/mkl/lib/intel64/ \
-L/global/apps/intel/2013.1/tbb/lib/intel64/ \
-L/global/apps/intel/2013.1/ipp/lib/intel64/ \
-L/global/apps/intel/2013.1/composerxe/lib/intel64/ \
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../ \
-L/lib64 \
-L/lib/ \
-L/usr/lib64 \
-L/usr/lib \
-ldl \
/tmp/ifort7GVk2e.o \
--enable-new-dtags \
-rpath \
/global/apps/intel/2013.1/impi/4.1.0.024/intel64/lib \
-rpath \
/opt/intel/mpi-rt/4.1 \
-lmpi_ilp64 \
-lmpi \
-lmpigf \
-lmpigi \
-lrt \
-lpthread \
-Bstatic \
-lifport \
-lifcore \
-limf \
-lsvml \
-Bdynamic \
-lm \
-Bstatic \
-lipgo \
-lirc \
-Bdynamic \
-lpthread \
-Bstatic \
-lsvml \
-Bdynamic \
-lc \
-lgcc \
-lgcc_s \
-Bstatic \
-lirc_s \
-Bdynamic \
-ldl \
-lc \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/crtend.o \
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crtn.o

rm /tmp/ifortlibgccyi9h59
rm /tmp/ifortgnudirs06mNow
rm /tmp/ifort7GVk2e.o
rm /tmp/ifortBOT7lB.i90
rm /tmp/ifortakfVFX.c
rm /tmp/ifortdashvdk0IZj
rm /tmp/ifortargC1wikG
rm /tmp/ifortgas65oTE2
rm /tmp/ifortK2gIZoas_.s
rm /tmp/ifortldashv7B4mF7
rm /tmp/iforttempfilenQtt0t
rm /tmp/ifortargvFMClQ
rm /tmp/ifortgnudirsMR2abY
rm /tmp/ifortgnudirsHeROwk
rm /tmp/ifortgnudirsDsnJSG
rm /tmp/ifortldashvJ79Ve3
rm /tmp/ifortgnudirsXiurBp
rm /tmp/ifortgnudirsp3WeYL
rm /tmp/ifortgnudirsmUDkl8
rm /tmp/ifort7GVk2e.o

[/bash]

James_T_Intel
Moderator

Hi Stefan,

The problem is not related to gfortran; the libmpigf.so library is used by the Intel® MPI Library with both gfortran and ifort. I am able to reproduce the same behavior here. I'll check with the developers, but I expect that MPI_IN_PLACE may not be handled correctly in ILP64.

As a note, the MPI Fortran module is not supported for ILP64 programming in the Intel® MPI Library.  Please see Section 3.5.6 of the Intel® MPI Library Reference Manual for more information on ILP64 support.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan_K_2
Beginner

Hi James,

Thanks for your detailed answer. I look forward to hearing the feedback from the developers. A similar MPI-parallelized section of code constitutes a central piece of core functionality in a quantum chemistry program package (called "Dirac") to which I am a contributing developer. It would be great to know that with one of the next releases Intel MPI could fully support the ILP64 model.

with best regards,

stefan

James_T_Intel
Moderator

Hi Stefan,

Try compiling and running with -ilp64.

[plain]mpiifort -ilp64 -O3 test.f90 -o test[/plain]

[plain]mpirun -ilp64 -n 4 ./test[/plain]

This works for me.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan_K_2
Beginner

Hi James,

Indeed, MPI_REDUCE with MPI_IN_PLACE now works with that setup for me as well. However, MPI_COMM_SIZE no longer works:

[fortran]

program test
  include "mpif.h"
  integer :: mytid, numnod, ierr

  mytid = 0
  ! initialize MPI environment
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, mytid, ierr)
  call mpi_comm_size(mpi_comm_world, numnod, ierr)

  print *, 'mytid, numnod ', mytid, numnod

  call mpi_finalize(ierr)
end program

[/fortran]

Compiling and running the above test program with 

[bash]

mpiifort -ilp64 -O3 test.F90
mpirun -ilp64 -np 4 ./a.out
mytid, numnod 1 0
mytid, numnod 0 0
mytid, numnod 2 0
mytid, numnod 3 0

[/bash]

yields a "0" for the size of the communicator MPI_COMM_WORLD. 

Any idea what could be wrong?

with best regards,

stefan

James_T_Intel
Moderator

Hi Stefan,

So I see.  I am able to get the correct results by compiling and linking with -ilp64, but without -i8, and changing the declaration of numnod to integer*8.  Let me check with the developers and see what we can do about this.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan_K_2
Beginner

Hi James,

Thanks for your feedback; I now get exactly what you described above. What I should perhaps emphasize is that I was aiming at a working compilation with 64-bit integers as the default size (-i8 or -integer-size 64), which implies the ILP64 model as far as I can see.

What exactly does the [bash]-ilp64[/bash] flag set during compilation? Obviously it does not imply 64-bit default integers in the Fortran code as such. Does it only enable linking against the ILP64 Intel libraries?

with best regards,

stefan 

James_T_Intel
Moderator

Hi Stefan,

Using -ilp64 links against libmpi_ilp64 instead of libmpi. The correct way to use this is to compile with -i8, then link and run with -ilp64. However, this is not giving correct results either.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Stefan_K_2
Beginner

Hi James,

Thanks for the clarification and your patience. Let's see what the developers can come up with.

with best regards,

stefan

James_T_Intel
Moderator

Hi Stefan,

There are two workarounds for this. The first is not to use MPI_IN_PLACE in a program compiled with -i8. The second is to modify mpif.h: change

[plain]       INTEGER MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED[/plain]

to

[plain]       INTEGER*4 MPI_BOTTOM, MPI_IN_PLACE, MPI_UNWEIGHTED[/plain]

This works for your test program. Try it on your full application as well.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James_T_Intel
Moderator

Stefan,

If you're still watching this, how did the workarounds work for your program?
