Intel® Fortran Compiler
MPICH2

gib
New Contributor II
Hello. I have been on a wild goose chase trying to track down a multiplicity of mutating errors in my code, which uses MPICH2 on a quad-core Intel box. These are Heisenbugs with a vengeance. I see access violations, stack overflows, array indices out of range, and subroutine calls whose side effects change the values of local variables. I use allocatable arrays widely, but as far as I can tell making them static doesn't improve matters, although it does change the symptoms unpredictably. I get different results with debug and release builds, and changing the compiler optimization switch often changes the results too. At times I've thought everything was working correctly, but then adding or removing a write statement showed me I was wrong.

In 99.99% of previous situations like this the fault has turned out to be mine, but I've been over my code exhaustively and I'm starting to wonder if it could be a problem with MPICH. I'd like to know of anyone else's experience with MPICH2. Failing that, any suggestions about how I might track down the source of these problems? I have a sneaking suspicion that it could be stack-related, but activating the runtime stack checking doesn't show anything. Thanks.
Gib
4 Replies
Steven_L_Intel1
Employee
One of the issues with MPI is that the arrays or variables you pass may get read or written after the call returns. In such cases, at present, you must declare those arrays VOLATILE. There is a proposal in progress to enhance the Fortran standard so that ASYNCHRONOUS applies better to such calls, but right now it has some limitations.

See if adding VOLATILE to arrays you pass to MPI send and receive calls helps.
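
As an illustration of the case where a buffer is touched after the call returns: this applies to the non-blocking routines such as MPI_ISEND and MPI_IRECV, where the library may read or write the buffer until the matching MPI_WAIT completes. The following is only a minimal sketch under assumptions that are not from the original code: the program name, message length, tag value, and rank layout are all hypothetical, and it assumes the MPI library provides the `mpi` module (older installations may need `include 'mpif.h'` instead).

```fortran
! Sketch only: rank 1 sends an array to rank 0 with non-blocking calls.
! Names, sizes and the tag are hypothetical; run with at least 2 ranks.
program nonblocking_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1000            ! hypothetical message length
  integer, parameter :: tag = 99            ! hypothetical message tag
  ! VOLATILE tells the compiler that buf may change outside its view,
  ! so it should not keep elements cached in registers across the calls.
  double precision, volatile :: buf(n)
  integer :: rank, nprocs, request, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  if (nprocs >= 2) then
    if (rank == 0) then
      call MPI_IRECV(buf, n, MPI_DOUBLE_PRECISION, 1, tag, &
                     MPI_COMM_WORLD, request, ierr)
      ! buf must not be read or written here: the transfer may be in flight.
      call MPI_WAIT(request, status, ierr)
      ! Only after MPI_WAIT returns is buf safe to use.
      print *, 'rank 0 received, buf(1) =', buf(1)
    else if (rank == 1) then
      buf = 1.0d0
      call MPI_ISEND(buf, n, MPI_DOUBLE_PRECISION, 0, tag, &
                     MPI_COMM_WORLD, request, ierr)
      ! buf must not be modified until the send has completed.
      call MPI_WAIT(request, status, ierr)
    end if
  end if

  call MPI_FINALIZE(ierr)
end program nonblocking_sketch
```

With the blocking MPI_SEND and MPI_RECV the buffer is safe to reuse as soon as the call returns, so this precaution should only matter if the code uses the non-blocking variants.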
gib
New Contributor II
That sounds very promising, Steve, since it has occurred to me that slowing execution down with write statements seems to help. But this information surprises me greatly, since it means that you can't be sure about when the data transfer has completed. Isn't this a major problem?
Steven_L_Intel1
Employee
Ah, my unfamiliarity with MPI is showing. Normally, MPI_SEND and MPI_RECV block until complete. But it looks as if a non-blocking send/receive is being proposed.

In any event, your symptoms sound like data corruption. Are you sure that the lengths specified on MPI_RECV calls are correct? You may want to download a trial version of Intel Trace Analyzer and Collector to see if it can spot errors.
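
One way to check receive lengths by hand is to probe the pending message and compare its size with the buffer before receiving it. If the count passed to MPI_RECV is larger than the actual array, an incoming message can overwrite neighbouring memory, and compiler bounds checking will not see it because the write happens inside the MPI library. The sketch below is an assumption-laden illustration, not the original code: the program name, buffer capacity, tag, and message size are hypothetical, and it assumes the `mpi` module is available.

```fortran
! Sketch only: rank 0 checks the size of an incoming message before
! receiving it. All names and sizes are hypothetical; run with 2+ ranks.
program recv_length_sketch
  use mpi
  implicit none
  integer, parameter :: maxlen = 100        ! hypothetical buffer capacity
  integer, parameter :: tag = 7             ! hypothetical message tag
  double precision :: buf(maxlen)
  integer :: rank, nprocs, incoming, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  if (nprocs >= 2) then
    if (rank == 1) then
      buf(1:40) = 2.0d0
      call MPI_SEND(buf, 40, MPI_DOUBLE_PRECISION, 0, tag, &
                    MPI_COMM_WORLD, ierr)
    else if (rank == 0) then
      ! Inspect the pending message without receiving it yet.
      call MPI_PROBE(1, tag, MPI_COMM_WORLD, status, ierr)
      call MPI_GET_COUNT(status, MPI_DOUBLE_PRECISION, incoming, ierr)
      if (incoming > maxlen) then
        print *, 'incoming message of', incoming, &
                 'elements exceeds buffer of', maxlen
        call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
      end if
      ! Receive exactly as many elements as were sent.
      call MPI_RECV(buf, incoming, MPI_DOUBLE_PRECISION, 1, tag, &
                    MPI_COMM_WORLD, status, ierr)
      print *, 'received', incoming, 'elements'
    end if
  end if

  call MPI_FINALIZE(ierr)
end program recv_length_sketch
```

A temporary check of this kind around each suspect MPI_RECV may help narrow down which transfer, if any, is writing past the end of its array.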
gib
New Contributor II
You raise my hopes, only to dash them! I do seem to have some sort of data corruption, but most of the time it isn't picked up by the bounds checking, and the instances that it does detect are spurious.

Thanks for suggesting the Trace Analyzer; I will try it.

Gib