Solved: same source code on two different machines gives different output: diagnostic tests?

nooj · 07-28-2010

I recently bought a new Mac i7 laptop and promptly installed ifort. Things were great. I never noticed any problems compiling or running my code. I went back to using the compute servers for a week, and came back to the laptop. Now the exact same code, built with the same compiler, gives different results on the laptop than on the servers (which run a different OS version on a different architecture).

Are there any diagnostic tests I can run? I've installed several programs on the laptop in the past week, most notably MathType.

Both machines are Macs, both running ifort version 11.1.088 20100401.

I've tried reinstalling ifort on the laptop.

The compiler options are the same for both.

In one test (using interprocedural optimization), the output is identical on both machines up to the point where the laptop executable crashes (forrtl: error (72): floating overflow); the desktop continues correctly.

In another test (without interprocedural optimization), the laptop run simply diverges: the calculations differ, and the Newton-Raphson convergence loop performs poorly, eventually causing my code to choose nonphysical solutions, which leads to a crash. The weirdest thing is that in this case the code "works": output data files get written correctly, and the bulk of the code behaves correctly.

I've checked everything I can think of to check, and the two computers should produce the same results.

Tomorrow I leave for Singapore to give a presentation. This laptop is my primary work computer while I am away.

- Nooj


nooj · 07-28-2010

Compiling my code with -O0 -g and running it under valgrind gave a thousand false positives, including an error on this line:

pt-data.f:113 write(*,*) "pt-data values, in mmHg:"

==4624== Conditional jump or move depends on uninitialised value(s)
==4624== at 0x75CDCC: for__open_proc (in /h2/nooj/research/code/victor/GNR/obj/ices/valgrind/AneurysmElast_speedup.valgrind.ices)
==4624== by 0x71E464: for__open_default (in /h2/nooj/research/code/victor/GNR/obj/ices/valgrind/AneurysmElast_speedup.valgrind.ices)
==4624== by 0x74DC54: for_write_seq_lis (in /h2/nooj/research/code/victor/GNR/obj/ices/valgrind/AneurysmElast_speedup.valgrind.ices)
==4624== by 0x5CA643: patientdata_mp_init_patientdata_ (pt-data.f:113)
==4624== by 0x4403E5: MAIN__ (driver.f:110)
==4624== by 0x4035BB: main (in /h2/nooj/research/code/victor/GNR/obj/ices/valgrind/AneurysmElast_speedup.valgrind.ices)

I even compiled valgrind (version 3.5.0) with whatever compiler you're supposed to use to reduce false positives.

nooj · 07-28-2010

I have tested the code on other, identical laptops (i7) and desktops (quad-core Intel Xeon), and I have confirmed that the same compiler gives the same result on the same architecture and different results on different architectures.

My colleague has noticed numerical differences between the architectures with gcc in the past.

My question now is: have other people seen similar behavior? Were you able to mitigate it by choosing ifort flags that favor precision over speed? Currently, I favor speed over precision.

- nooj

Steven_L_Intel1 · 07-28-2010

When you say "different architectures", what exactly do you mean? 32-bit vs. 64-bit? Core i7 vs. Core 2?

The math library will take different paths depending on the capability of the processor, and this can yield slightly different results. In the next major release we're adding an option to use the same (albeit slower) math library on all architectures. You can also try compiling with -mia32 and see if it helps.
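To illustrate why two code paths can disagree slightly even though both are "correct", here is a minimal sketch in plain Fortran (not Intel's math library) showing that adding the same three numbers in two different orders gives two different doubles. It assumes IEEE double precision with round-to-nearest and no extended-precision intermediates (the usual x86-64 behavior); the module and procedure names are hypothetical.

```fortran
! Sketch: floating-point addition is not associative, so adding the same
! numbers in a different order can change the last bits of the result.
! Module and procedure names are hypothetical.
module sum_order_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Add the elements from first to last.
  function forward_sum(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, size(x)
       s = s + x(i)
    end do
  end function forward_sum

  ! Add the same elements from last to first.
  function reverse_sum(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    integer :: i
    s = 0.0_dp
    do i = size(x), 1, -1
       s = s + x(i)
    end do
  end function reverse_sum
end module sum_order_mod

program demo_sum_order
  use sum_order_mod
  implicit none
  ! 1e-16 is less than half the spacing of doubles just above 1.0, so adding
  ! it to 1.0 first loses it; adding the two small terms first does not.
  real(dp) :: x(3) = [1.0_dp, 1.0e-16_dp, 1.0e-16_dp]
  print '(a, es25.17)', 'forward: ', forward_sum(x)
  print '(a, es25.17)', 'reverse: ', reverse_sum(x)
end program demo_sum_order
```

If the compiler is allowed to reorder sums like these (for example, when vectorizing a reduction), results can change from build to build and processor to processor. ifort's -fp-model precise and -fp-model strict options restrict such value-unsafe optimizations at some cost in speed, although on their own they do not make the math library take the same path on every processor.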

mecej4 · 07-28-2010

Some of the symptoms described indicate that you may have some uninitialized variables, and perhaps some variables which should be given the SAVE attribute.
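To make the SAVE point concrete, here is a minimal sketch (hypothetical names) of a routine whose local counter must keep its value between calls. If the counter were an ordinary local that is only incremented, with neither SAVE nor an initializer, the standard would leave its value on entry undefined, and it could well differ between compilers, options such as -automatic, and machines.

```fortran
! Sketch: a local variable that must survive between calls needs SAVE.
! Module and function names are hypothetical.
module counter_mod
  implicit none
contains
  ! Returns 1 on the first call, 2 on the second, and so on.
  integer function next_call_number()
    ! SAVE keeps ncalls alive between calls. Initializing a local in its
    ! declaration also implies SAVE; the attribute is written out for clarity.
    integer, save :: ncalls = 0
    ncalls = ncalls + 1
    next_call_number = ncalls
  end function next_call_number
end module counter_mod

program demo_counter
  use counter_mod
  implicit none
  integer :: i
  do i = 1, 3
     print '(a, i0)', 'call number ', next_call_number()
  end do
end program demo_counter
```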

However, if you are doing a long calculation with real numbers, and the algorithm is not well-conditioned, some variability in the results is to be expected.
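The conditioning point can be made concrete with a relative condition number. For f(a) = a - 1 it is |a/(a - 1)|, which is about 2 at a = 2 but about 10^15 near a = 1, so a last-bit difference between two machines can grow into a difference in the leading digits. The sketch below (hypothetical names, assuming IEEE double precision) measures that amplification directly by nudging the input up one unit in the last place with the NEAREST intrinsic.

```fortran
! Sketch: f(a) = a - 1 is well-conditioned away from a = 1 and
! ill-conditioned near it. Module and procedure names are hypothetical.
module conditioning_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Ratio of the relative change in f(a) = a - 1 to the relative change
  ! in a, when a is moved up by one unit in the last place.
  function amplification(a) result(amp)
    real(dp), intent(in) :: a
    real(dp) :: amp, b, rel_in, rel_out
    b = nearest(a, 1.0_dp)                  ! next double above a
    rel_in  = (b - a) / a
    rel_out = ((b - 1.0_dp) - (a - 1.0_dp)) / (a - 1.0_dp)
    amp = rel_out / rel_in
  end function amplification
end module conditioning_mod

program demo_conditioning
  use conditioning_mod
  implicit none
  print '(a, es10.3)', 'amplification at a = 2:         ', amplification(2.0_dp)
  print '(a, es10.3)', 'amplification at a = 1 + 1e-15: ', amplification(1.0_dp + 1.0e-15_dp)
end program demo_conditioning
```

The second case amplifies a one-ulp input change by roughly fifteen orders of magnitude, which is the kind of sensitivity that can turn harmless last-bit differences into visibly different answers.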

nooj · 07-28-2010

Steve -

By "different architectures" I mean:
laptop: 2.66GHz Intel Core i7, OSX 10.6.4
desktop: 2.93 GHz Quad-Core Intel Xeon, OSX 10.5.8

-mia32 did not appear to have changed anything.

I am beginning to suspect that this new i7 chip is at the core of the problem somehow (pun intended). I have run this code on four different Linux systems plus the quad-core Xeon Mac and never seen an appreciable difference in the output; certainly nothing worse than a difference in the third significant figure. (I use 8-byte reals everywhere.) I have access to other Mac laptops and will keep trying to figure this out.

mecej4 -

I'm pretty careful about uninitialized variables, but not 100%. I have been burned by that before, and try very hard to initialize everything to 0d0 (or whatever is appropriate) at the start. If there were a flag that made the program crash as soon as it read a memory location my program had never set, I would use it.

Here in the real world, I use -ftrapuv, -C, and -automatic, and never use SAVE, because I was trained not to think of local variables as retaining their values, and because modules turn out to be more useful for me anyway. (I also have a legacy named common block.)
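Short of a flag that traps every read of an unset value, one portable approximation is to fill work arrays with NaN before use and check for NaNs afterwards: any element the program never assigned is still NaN, and because NaN propagates through ordinary arithmetic, so is anything computed from such an element. This is similar in spirit to -ftrapuv but is not the same mechanism. A minimal sketch using the Fortran 2003 IEEE_ARITHMETIC module (module and procedure names are hypothetical):

```fortran
! Sketch: fill work arrays with quiet NaNs, then count the NaNs left over to
! find elements that were never set (or were computed from unset ones).
! Module and procedure names are hypothetical.
module nan_fill_mod
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Overwrite every element with a quiet NaN.
  subroutine fill_nan(a)
    real(dp), intent(out) :: a(:)
    a = ieee_value(0.0_dp, ieee_quiet_nan)
  end subroutine fill_nan

  ! Number of elements that still hold NaN.
  integer function count_unset(a)
    real(dp), intent(in) :: a(:)
    count_unset = count(ieee_is_nan(a))
  end function count_unset
end module nan_fill_mod

program demo_nan_fill
  use nan_fill_mod
  implicit none
  real(dp) :: work(5)
  call fill_nan(work)
  work(1) = 1.0_dp
  work(2) = 2.0_dp
  work(4) = work(1) + work(3)   ! bug: work(3) was never set, so work(4) is NaN too
  ! work(3), work(4) and work(5) are reported
  print '(a, i0)', 'elements never properly set: ', count_unset(work)
end program demo_nan_fill
```

This only covers REAL data, and only the arrays you remember to fill; integer and logical variables need a different check.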

Long story short, I'm pretty familiar with the differences in memory management between SAVE and modules, but could you explain the relative merits of these two approaches:
1. a program main which USEs a module, and one subroutine which USEs the module in which to store data between calls.
2. instead having that subroutine define the variables locally with the SAVE attribute (and not using the module).

Based on my deep-seated assumption of local variables never retaining their values, I think of SAVE as being for coders who are either lazy or desperate. I could use some insight from an expert on how SAVE can make my life easier.

Thanks to both of you for your insights.

- Nooj

Steven_L_Intel1 · 07-28-2010

Which Xcode version are you using?

As for SAVE, it sounds as if you don't need it, as your code does not assume local variables retain their values across calls.

I can't tell from what you've posted: does the exact same executable give different results on different Macs?

nooj · 07-28-2010

laptop: 2.66GHz Intel Core i7, OSX 10.6.4
Xcode 3.2.2 64-bit
Xcode IDE: 1650.0
Xcode Core: 1648.0
ToolSupport: 1631.0
ifort --version: 11.1.088 20100401

desktop: 2.93 GHz Quad-Core Intel Xeon, OSX 10.5.8
Xcode 3.1.2
Xcode IDE: 1149.0
Xcode Core: 1148.0
ToolSupport: 1102.0
ifort --version: 11.1.088 20100401

using the laptop executable (without -mia32) on desktop:
dyld: unknown required load command 0x80000022
Trace/BPT trap

using the desktop executable on laptop:
works correctly!

This is weird, because I said -mia32 had no effect. I will try it again.

mecej4 · 07-28-2010

I don't have a Mac and I don't know if this applies, but it is easy to rule it out: Xcode 3.2.2, which you use on your laptop, has been reported to create some problems for which a workaround is available. See Ron's sticky note at the top of this forum:
http://software.intel.com/en-us/forums/showthread.php?t=73942&o=d&s=lr .

mecej4 · 07-28-2010

Concerning the choice between:

1. a program main which USEs a module, and one subroutine which USEs the module in which to store data between calls.
2. instead having that subroutine define the variables locally with the SAVE attribute (and not using the module).

Both have their places, pluses and minuses, and can be used to good effect. Placing shared data in a module is useful if that data is used repeatedly in a number of subprograms, or in a number of projects.

Modules are a good place to keep static data as well as data that is variable but needs to be shared. On the other hand, public data in a module can be overexposed, and heavy usage of modules makes makefile rules more complicated.

As you have noted, SAVE, especially SAVE with no restricting variable list, can be abused as a false cure for not doing proper initialization. On the other hand, specifying SAVE for selected local variables keeps those private; SAVE is also very valuable for implementing reverse communication; see, for example,

http://www.caam.rice.edu/software/ARPACK/UG/node9.html
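As a rough illustration of reverse communication with SAVE'd state (a hypothetical bisection routine, not ARPACK's actual interface), the solver below keeps its bracket in SAVE'd locals, private to the routine, and returns to the caller whenever it needs a function value. The caller never has to pass the function itself.

```fortran
! Sketch of reverse communication with SAVE'd state. bisect_rc and its
! argument conventions are hypothetical. The solver returns with ido = 1
! whenever it needs f(x), and with ido = 99 when it has converged.
module rc_bisect_mod
  implicit none
contains
  subroutine bisect_rc(ido, x, fx, a0, b0, tol)
    integer, intent(inout) :: ido                 ! 0 on first call; 1 = evaluate f(x); 99 = done
    double precision, intent(inout) :: x          ! point where f is wanted / final root
    double precision, intent(in) :: fx            ! f(x) supplied by the caller
    double precision, intent(in) :: a0, b0, tol   ! initial bracket and tolerance
    ! State that must survive between calls; SAVE keeps it private to this routine.
    double precision, save :: a, b, fa
    logical, save :: have_fa

    if (ido == 0) then
       a = a0
       b = b0
       have_fa = .false.
       x = a                  ! first request: f at the left end of the bracket
       ido = 1
       return
    end if

    if (.not. have_fa) then
       fa = fx                ! the caller has just supplied f(a)
       have_fa = .true.
    else if (fa * fx <= 0.0d0) then
       b = x                  ! sign change lies in [a, x]
    else
       a = x                  ! sign change lies in [x, b]
       fa = fx
    end if

    x = 0.5d0 * (a + b)       ! next point to evaluate, or the final estimate
    if (b - a < tol) then
       ido = 99
    else
       ido = 1
    end if
  end subroutine bisect_rc
end module rc_bisect_mod

program demo_rc_bisect
  use rc_bisect_mod
  implicit none
  integer :: ido
  double precision :: x, fx

  ido = 0
  x = 0.0d0
  fx = 0.0d0
  do
     call bisect_rc(ido, x, fx, 1.0d0, 2.0d0, 1.0d-12)
     if (ido == 99) exit
     fx = x * x - 2.0d0       ! the caller evaluates f(x) = x**2 - 2
  end do
  print '(a, f16.13)', 'root of x**2 - 2: ', x
end program demo_rc_bisect
```

Because the state lives in SAVE'd variables, only one such solve can be in progress at a time; a design that needs several concurrent solves would move that state into an argument or a derived type.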

nooj · 07-29-2010

The Xcode 3.2.2 workaround was the solution for me. Thank you all for your time and attention, and my apologies for not reading the article you put in such a prominent location!

I'll check out that link when I have a chance. Is memory access particularly faster for either SAVE or modules?

- nooj

mecej4 · 07-31-2010

About the question "Is memory access particularly faster for either SAVE or modules?":

The answer would be quite dependent on the compiler implementation, the complexity of the data structures involved and the type and size of cache used.

For a simple program, such as the following code to select an item from an array, the SAVEd variables version

[fortran]subroutine sub(ipar,iv)
! initializing ivar in its declaration gives it the SAVE attribute implicitly
integer, dimension(4) :: ivar = (/1,2,3,4/)
integer, intent(in) :: ipar
integer, intent(out) :: iv

iv=ivar(ipar)
return

end subroutine sub

program tst
integer :: ipa,iva

ipa=3
call sub(ipa,iva)
write(*,*)iva

end program tst[/fortran]

and the module version

[fortran]module svars_mod
  integer :: ivar(4)
end module svars_mod

subroutine sub(ipar,iv)
use svars_mod
integer, intent(in) :: ipar
integer, intent(out) :: iv

iv=ivar(ipar)
return

end subroutine sub

program tst
use svars_mod
integer :: ipa,iva

ivar = (/1,2,3,4/)
ipa=3
call sub(ipa,iva)
write(*,*)iva
end program tst[/fortran]

produce fairly similar machine code. In particular, the shared data in array IVAR is put into the .data section of the a.out file in both versions.
