Intel® Fortran Compiler

Segmentation fault with mpiifort but works with mpif90

AThar2

I have run into a rather peculiar situation when running with MPI shared memory.

I am trying to measure bandwidth using shared memory in MPI. When running over different array sizes (in bytes), I get a segmentation fault on reaching 0.83886E+00 Mb of memory.

However, when I compile with `mpif90` it works completely fine. (I am using the wrapper that comes with the Intel package installation in dir /intel/impi/to/bin, although it uses GNU Fortran as the underlying compiler.)

 

I can only think of two possibilities:

1) An Intel bug

2) I have a bug that the mpif90 (GNU Fortran) build does not catch, but the mpiifort build does.

 

I have tried to cut the code down as much as possible to keep it simple.

 

The MPI communication takes place between two codes (test_sup1 and test_sup2), which in this case are almost identical. test_sup1 is the sender and test_sup2 is the receiver. The only thing that really differs is the measure_bandwidth routine.

To compile and run, I am using the following commands:

 

mpiifort -O0 test_sup2.f -o test2.a ; mpiifort -O0 test_sup1.f -o test1.a


mpirun -np 1 ./test1.a : -np 1 ./test2.a

 

 

Can anybody please help me figure out what is wrong here?

AThar2

I have now tested it on two different machines (Linux and macOS). Both give the same outcome. I tried valgrind, and it did not complain when running the gfortran build. However, with the mpiifort build it reported an error when I reached 0.83886E+00 Mb. The error message was:

 

==29835== Invalid write of size 8

==29835==    at 0x4087FB: parallel_mp_measure_bandwitdh_ (test_sup1.f:366)

==29835==    by 0x40B0F9: MAIN__ (osc_1.f:486)

==29835==    by 0x4038A1: main (in test_sup1.a)

==29835==  Address 0x1ffedfcf30 is on thread 1's stack

==29835==  in frame #0, created by parallel_mp_measure_bandwitdh_ (test_sup1.f:351)

 

 

The error refers to the MPI_SEND call at line 366. However, when I put some print statements before that call to see what was going on, the error was attributed to the print statement instead.

Does anybody have a clue about what is going on, please?
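One thing that may be worth ruling out, since valgrind reports the faulting address as being on the thread's stack and the failure depends on array size: whether the buffer is being placed on the stack and exceeds the shell's stack limit. This is only a guess, as the source of measure_bandwidth is not shown here. The sketch below uses a hypothetical helper, `fits_on_stack`, that compares a buffer size in bytes with a stack limit in KiB as printed by `ulimit -s`. The 8388608-byte value in the last line is an assumption (8 MiB, a common default stack limit on Linux), not a number taken from the program.

```shell
#!/bin/sh
# Hypothetical helper (not part of any MPI or compiler toolchain):
# reports whether a buffer of BYTES bytes is smaller than a stack
# limit given in KiB, or as the word "unlimited", as printed by `ulimit -s`.
fits_on_stack() {
    bytes=$1
    limit_kib=$2
    if [ "$limit_kib" = "unlimited" ]; then
        echo "fits"
        return 0
    fi
    limit_bytes=$(( $limit_kib * 1024 ))
    # A buffer equal to the whole limit cannot fit, because the stack
    # also holds the other frames of the program.
    if [ "$bytes" -lt "$limit_bytes" ]; then
        echo "fits"
    else
        echo "too large"
    fi
}

# Compare an assumed 8 MiB buffer with the current soft stack limit.
fits_on_stack 8388608 "$(ulimit -s)"
```

If this prints "too large" for the size at which the crash occurs, two experiments could confirm or rule out the idea: raise the limit with `ulimit -s unlimited` in the shell that launches mpirun (if the system permits it), or rebuild with the Intel compiler's `-heap-arrays` option, which places automatic arrays and temporaries on the heap instead of the stack.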

 
