I have attached two codes that are meant to communicate with each other. I use MPI shared memory across these codes, and I am trying to profile the latency first, before moving on to bandwidth.
What I do not understand is that the latency for communicating one byte is ~40-50 microseconds when I time a single transfer, but when I instead average over, say, 10000 repeats I get a latency of 9.3 microseconds. The figure just seems to keep decreasing as the number of repeats increases.
Does this make sense? Also, the latency seems quite small for MPI at a higher number of repeats.
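To make the question concrete, here is a minimal Python sketch of one possible explanation. It is not taken from the attached codes. It assumes the total time for n transfers is a one-time cost (for example, first-touch or setup overhead) plus n times a steady per-transfer cost. The function name and both constants are hypothetical, picked only so the model roughly reproduces my two measurements. I do not know whether this is what actually happens in my runs.

```python
# Hypothetical model, not a measurement:
# total time for n transfers = one-time cost + n * steady per-transfer cost.
def average_latency_us(n_repeats, one_time_us, steady_us):
    """Average time per transfer (microseconds) under the assumed model."""
    return (one_time_us + n_repeats * steady_us) / n_repeats

# Assumed values, chosen to roughly match ~45 us for a single transfer
# and ~9.3 us per transfer when averaging over 10000 repeats.
ONE_TIME_US = 36.0
STEADY_US = 9.3

for n in (1, 10, 100, 1000, 10000):
    avg = average_latency_us(n, ONE_TIME_US, STEADY_US)
    print(f"{n:>6} repeats: {avg:6.2f} us per transfer")
```

If this model is roughly right, the average should level off at the steady cost rather than keep decreasing, so timing a few warm-up transfers separately from the measured loop would show whether a one-time cost is responsible.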
Can anybody please look at the code and advise whether I am measuring it sensibly?
The code is run with the following command:
mpirun -np Nproc osc_1.a : -np 1 osc_2.a
Keep np at 1 for osc_2, and always place osc_2 after the colon.
In the code:
The subroutine MEASURE_LATENCY in osc_1.f places a single character in the shared memory, which is then loaded in osc_2.f by an equivalent measure_latency routine.
I am also measuring the time in osc_2.f, but only as a sanity check that it is approximately the same as the time measured in osc_1.f.