I have run into a rather peculiar situation when running with MPI shared memory.
I am trying to measure bandwidth using shared memory in MPI. When running with different array sizes (in bytes), I get a segmentation fault once the array reaches 0.83886E+00 Mb of memory.
However, when I build with `mpif90` (the wrapper that comes with the Intel package installation in /intel/impi/to/bin, although it is MPI using GNU Fortran), it works completely fine.
I can only think of two things:
1) An Intel bug
2) I have a bug that mpif90 does not catch but Intel MPI does.
I have tried to cut the code down as much as possible to keep it simple.
The MPI communication is between two programs (test_sup1 and test_sup2), which in this case are almost identical. test_sup1 is the sender and test_sup2 is the receiver. The only thing that really differs is the measure_bandwidth routine.
To compile and run, I am using the following commands:
mpiifort -O0 test_sup2.f -o test2.a
mpiifort -O0 test_sup1.f -o test1.a
mpirun -np 1 ./test1.a : -np 1 ./test2.a
Can anybody please help me figure out what is wrong here?
I have now tested it on two different machines (Linux and macOS). Both give the same outcome. I tried using valgrind, and it did not complain when the code was built with gfortran. However, it did report an error (when built with mpiifort) once I reached 0.83886E+00 Mb. The error message was:
==29835== Invalid write of size 8
==29835== at 0x4087FB: parallel_mp_measure_bandwitdh_ (test_sup1.f:366)
==29835== by 0x40B0F9: MAIN__ (osc_1.f:486)
==29835== by 0x4038A1: main (in test_sup1.a)
==29835== Address 0x1ffedfcf30 is on thread 1's stack
==29835== in frame #0, created by parallel_mp_measure_bandwitdh_ (test_sup1.f:351)
The error refers to the MPI_SEND call at line 366. However, when I put some print statements before that call to see what was going on, the error was reported at the print statement instead.
Does anybody have a clue on what is going on please?
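
One thing that may be worth ruling out, since valgrind places the invalid write on the thread's stack: the buffer might simply be too large for the stack limit. As far as I know, Intel Fortran puts automatic arrays and array temporaries on the stack by default, while gfortran tends to use the heap for them, which could explain the difference between the two compilers. This is only a guess, because the routine's declarations are not shown. The small sketch below compares a buffer size with the shell's soft stack limit. `fits_on_stack` and `BUF_BYTES` are hypothetical names, and the default value is a placeholder, not a number taken from the code above.

```shell
#!/bin/sh
# fits_on_stack BYTES LIMIT_KB   (hypothetical helper, for illustration only)
# Succeeds (exit status 0) if a buffer of BYTES bytes is smaller than a
# stack limit of LIMIT_KB KiB, or if the limit is "unlimited".
fits_on_stack() {
    if [ "$2" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -lt $(( $2 * 1024 )) ]
}

# BUF_BYTES is a placeholder; set it to the array size that crashes.
BUF_BYTES=${BUF_BYTES:-8388608}

# Soft stack limit as reported by the shell (KiB in bash and dash),
# or the literal string "unlimited".
STACK_KB=$(ulimit -s)

echo "soft stack limit: $STACK_KB (KiB)"
if fits_on_stack "$BUF_BYTES" "$STACK_KB"; then
    echo "$BUF_BYTES bytes fits below the stack limit"
else
    echo "$BUF_BYTES bytes does not fit below the stack limit"
fi
```

If the buffer turns out not to fit, two things that might be worth trying are raising the limit before `mpirun` (for example `ulimit -s unlimited`, where the system allows it) or rebuilding with the Intel compiler's `-heap-arrays` option so that such arrays are placed on the heap. If the crash goes away, the stack was most likely the cause. If not, this theory can be dropped.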
