Segmentation fault with mpiifort but works with mpif90

I have run into a rather peculiar situation when running with MPI shared memory.

I am trying to measure bandwidth using shared memory in MPI. When running over different array sizes (in bytes), I get a segmentation fault on reaching 0.83886E+00 Mb of memory.

However, when running with `mpif90` (I am using the wrapper that comes with the Intel package installation in dir /intel/impi/to/bin, although it uses GNU Fortran as the compiler), it works completely fine.

 

I can only think of two things:

1) An Intel bug

2) I have a bug which mpif90 does not catch but mpiifort does.

 

I have tried to cut down the code as much as possible to keep it simple.

 

The MPI communication takes place between two programs (test_sup1 and test_sup2), which in this case are almost identical. test_sup1 is the sender and test_sup2 is the receiver. The only thing that really differs is the measure_bandwidth routine.

To compile and run I am using the following commands:

 

mpiifort -O0 test_sup2.f -o test2.a ; mpiifort -O0 test_sup1.f -o test1.a


mpirun -np 1 ./test1.a : -np 1 ./test2.a

 

 

Can anybody please help me figure out what is wrong here?

1 Reply

I have now tested it on two different machines (Linux and macOS). Both give the same outcome. I tried using valgrind, and it did not complain when running the gfortran build. However, it did report an error (when running the mpiifort build) when I reached 0.83886E+00 Mb. The error message was:

 

==29835== Invalid write of size 8

==29835==    at 0x4087FB: parallel_mp_measure_bandwitdh_ (test_sup1.f:366)

==29835==    by 0x40B0F9: MAIN__ (osc_1.f:486)

==29835==    by 0x4038A1: main (in test_sup1.a)

==29835==  Address 0x1ffedfcf30 is on thread 1's stack

==29835==  in frame #0, created by parallel_mp_measure_bandwitdh_ (test_sup1.f:351)

 

 

The error does refer to the MPI_SEND call at line 366. However, when I put some print statements before that call to see what was going on, the error pointed to the print statement instead.

Does anybody have a clue what is going on, please?
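
One possibility worth ruling out, given that valgrind places the invalid write on the thread's stack and that the reported location moves to the print statement when one is added, is that the routine's local storage exceeds the stack limit. This is only a guess, since the routine's declarations are not shown here. The following is a minimal shell sketch for comparing a buffer size with the soft stack limit; `fits_on_stack` is a hypothetical helper and `buf_bytes` is an assumed value, not one taken from the test code.

```shell
#!/usr/bin/env bash
# Sketch: would a buffer of a given size fit under the soft stack limit?
# fits_on_stack is a hypothetical helper written for this illustration.

fits_on_stack() {
    local bytes=$1 limit_kb=$2
    # "unlimited" means the shell enforces no cap
    [ "$limit_kb" = "unlimited" ] && return 0
    # bash reports the stack limit in units of 1024 bytes
    [ "$bytes" -lt $((limit_kb * 1024)) ]
}

limit_kb=$(ulimit -s)   # soft stack limit of the current shell
buf_bytes=8388608       # assumed size (8 MiB), for illustration only

echo "soft stack limit: ${limit_kb} (KiB)"
if fits_on_stack "$buf_bytes" "$limit_kb"; then
    echo "${buf_bytes} bytes would fit"
else
    echo "${buf_bytes} bytes would not fit"
fi
```

If the limit does turn out to be the issue, raising it with `ulimit -s` in the shell that launches `mpirun`, or compiling with ifort's `-heap-arrays` option (which places automatic and temporary arrays on the heap), might be worth trying.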

 
