Intel® Fortran Compiler

Minimizing differences between machines and compilations

esatel
Beginner
Hi everyone,
I have an MPI numerical code that is misbehaving on one CentOS/Rocks machine with Myrinet but seems to do fine on other clusters and workstations, mostly RHEL with InfiniBand or shared memory. It would be much easier to identify where things are going wrong if the code produced exactly the same answer when compiled on the two machines and run with the same number of cores. I realize there are numerous reasons this might not happen, but the operating systems are similar enough that I am hopeful.

Is there a recommended set of compiler switches that will encourage this? In other words, things like ensuring reproducible arithmetic, initializing variables so their trailing digits are not garbage, and promoting floating constants that "should have been doubles" to doubles (or at least making the coercion predictable).

Many of the things I am asking for are of course available as à la carte command-line switches; I am just looking for a minimal complete set, since I am sure there is something I won't think of, and I don't want to perturb the layout of data if possible (i.e., I want to "debug the release version").
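
To show the kind of constant issue I mean, here is a toy example (not from my actual code):

    program constant_demo
      implicit none
      double precision :: x, y
      x = 0.1     ! single-precision literal, rounded to 4 bytes before the assignment
      y = 0.1d0   ! double-precision literal
      print *, x - y   ! roughly 1.5e-9, not zero
    end program constant_demo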

Thanks,
Eli
Steven_L_Intel1
Employee
-fimf-arch-consistency=true is what you should start with. There is nothing that will "ensure reproducible algebra". Do make sure you are not using -ax switches. Also, if you are calling Intel MKL, you can run into architecture-dependent results.
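For example, a first pass at the compile line might look like this (a sketch; "mpif90" and the source file name are placeholders for whatever your build actually uses):

    mpif90 -O2 -fimf-arch-consistency=true -c solver.f90

Then check your makefiles to be sure nothing adds an -ax option (such as -axSSE4.2), since those generate CPU-dispatched code paths that can differ between machines.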
TimP
Honored Contributor III
You might choose a single architecture option that is supported by all the machines, such as -msse3. You should set the -fimf-arch-consistency=true option if consistent results from math functions are more important than performance. "-fp-model source" will avoid several optimizations whose numerical results differ slightly with data alignment.
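Putting those three suggestions together, a build line along these lines (wrapper and file names are placeholders) would be a reasonable starting point on both clusters:

    mpif90 -O2 -msse3 -fp-model source -fimf-arch-consistency=true -o model model.f90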
You will still encounter legitimate cases under MPI where numerical results vary with the order of arrival of data. If there are conservative settings, or settings which use extra-precision accumulation for allreduce and the like, you would want to try those.
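If you want to test whether reduction order is what you are seeing, one sketch (for a single double-precision value; not something from your code) is to gather the partial sums and add them in fixed rank order on one process, then broadcast:

    ! deterministic alternative to MPI_ALLREDUCE(MPI_SUM) for one
    ! double-precision value; sums contributions in fixed rank order
    subroutine fixed_order_sum(local, global, comm)
      use mpi
      implicit none
      double precision, intent(in)  :: local
      double precision, intent(out) :: global
      integer, intent(in) :: comm
      double precision, allocatable :: parts(:)
      integer :: rank, nprocs, ierr, i

      call MPI_Comm_rank(comm, rank, ierr)
      call MPI_Comm_size(comm, nprocs, ierr)
      allocate(parts(nprocs))

      ! collect every rank's contribution on rank 0
      call MPI_Gather(local, 1, MPI_DOUBLE_PRECISION, &
                      parts, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)

      ! sum in rank order so the rounding sequence never changes
      if (rank == 0) then
         global = 0.0d0
         do i = 1, nprocs
            global = global + parts(i)
         end do
      end if

      ! give every rank the same answer
      call MPI_Bcast(global, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
      deallocate(parts)
    end subroutine fixed_order_sum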
If practical, wholesale promotion of a single precision application to double (ifort and other Fortrans have such options) would hide accidental numerical variations and assure you that variations you do find indicate a problem, but those will alter layout of data (depending on what you mean by that).
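For what it's worth, the ifort spellings of that option are, I believe, -r8 or -real-size 64. A quick way to confirm the promotion took effect (a toy sketch):

    program promo_check
      implicit none
      real :: a                 ! default real
      double precision :: b
      ! compiled with "ifort -r8 promo_check.f90" this prints 8 8;
      ! without -r8 it prints 4 8
      print *, kind(a), kind(b)
    end program promo_check

Keep in mind that promoting default reals to 8 bytes changes storage sizes, so MPI datatype arguments, unformatted record lengths, and anything else tied to 4-byte reals would need a second look.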
esatel
Beginner
Thanks Tim and Steve.

I'll start with the flags you both suggested. I hadn't thought about the order of operations implied by allreduce ... I doubt it is the source of my woes, but it will certainly hurt when I'm trying to debug.