
Uwe_H_ · ‎10-05-2016

For our modeling software we run regression tests every night. After upgrading the Fortran compiler from version 15 to 17, these tests still give exactly the same results on the test server. On my local machine, however, I started getting slightly different results. The difference is so small that I wouldn't worry, but I need to be able to get the same results as on the test server.

I ended up setting up a virtual machine to run the tests. When I run that VM on a server the tests pass; when I run the same VM on my local machine (no other changes) the tests fail. This leads me to believe that the CPU is what makes the difference. The server has an Intel Xeon CPU E5-2687W v2 @ 3.40GHz. Locally I have an Intel i5-4570 3.2GHz.

The project in question uses /fpe:0 /fp:source, and I'm not keen on changing this. So if I should go for a different CPU locally, is there any way to tell which one would behave the same way as the one in the server? Or do you see any other options?

Thanks
Uwe

Steven_L_Intel1 · ‎10-05-2016

See this presentation. 443546

Uwe_H_ · ‎10-12-2016

Hi Steve,

Thanks for the interesting read. It took me a while to test this. I changed all involved projects (~25) to use FloatingPointModel="strict", FloatingPointSpeculation="fpSpeculationStrict" and /Qimf-arch-consistency:true, and I also disabled OpenMP (which of course is not an option for production). Comparing the results of our test cases run on a Xeon machine with the results from my Core i5 machine shows that they are closer now, but still far from identical. Do you have any further suggestions? Is there perhaps a specific optimization introduced with compiler version 16 or 17 that I could turn off?

thanks
Uwe

Steven_L_Intel1 · ‎10-12-2016

Most optimizations don't have specific controls. I would suggest finding which particular operation gave different results and then you'll have a better chance of seeing if there's a way to reduce differences. It could be a vectorization difference or even stack alignment.
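As a rough illustration of why a vectorization difference can matter, here is a minimal Python sketch that models single-precision (real*4) addition and sums the same four numbers in two orders: strictly left to right, and in two interleaved "lanes" the way a vectorized loop might accumulate partial sums. The helper names (`f32`, `sequential_sum`, `lane_sum`) and the input values are made up for the example, and the lane model is only an assumption about how a compiler may reassociate the sum, not a description of what ifort actually generates.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sequential_sum(values):
    """Sum left to right, rounding to single precision after every add."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def lane_sum(values, lanes):
    """Sum with `lanes` interleaved partial sums, then combine the lanes.

    This mimics the reassociation a vectorized reduction may perform.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] = f32(partial[i % lanes] + v)
    total = 0.0
    for p in partial:
        total = f32(total + p)
    return total


# Hypothetical input chosen to make the effect obvious.
values = [1.0e8, 1.0, -1.0e8, 1.0]
print(sequential_sum(values))  # 1.0 (the first 1.0 is absorbed by 1.0e8)
print(lane_sum(values, 2))     # 2.0 (the large terms cancel in their own lane)
```

Both results are valid single-precision sums of the same data; they differ only because floating-point addition is not associative. Real data will show far smaller discrepancies, but the mechanism is the same.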

It is of course also possible that your program is referencing uninitialized memory.
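Whatever the cause, the search for the responsible operation usually starts with the first output value that differs between the two machines. A minimal sketch of that comparison is below, assuming the two result sets have already been read into equally ordered sequences of numbers; the function name `first_divergence`, the `rel_tol` parameter and the sample values are hypothetical.

```python
def first_divergence(reference, candidate, rel_tol=0.0):
    """Return (index, reference_value, candidate_value) for the first pair
    whose difference exceeds rel_tol * |reference_value|, or None if every
    pair is within tolerance. With rel_tol=0.0 any difference counts.
    """
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if abs(cand - ref) > rel_tol * abs(ref):
            return i, ref, cand
    return None


# Made-up sample data, not taken from the actual result files.
reference = [0.4, 0.5, 0.25]
candidate = [0.40000003, 0.5, 0.26]

print(first_divergence(reference, candidate))                # exact comparison
print(first_divergence(reference, candidate, rel_tol=1e-6))  # with a tolerance
```

Running the exact comparison first shows where the two runs start to drift apart; running it again with the accepted tolerance shows where the drift first becomes a test failure. Adding intermediate values to the output and repeating the comparison narrows the search down towards a single routine or loop.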

Uwe_H_ · ‎10-12-2016

I actually did find a few uninitialized variables when investigating occasional program crashes, so I can't rule it out. But as the results are reproducible when run on the same machine, I still favor the different CPU as the reason. Trying to find the operation that produces different results seems logical. For practical reasons, however, I will now consider how to change my workflow so that I can avoid running those tests locally at all.

Thanks!
Uwe

Steven_L_Intel1 · ‎10-12-2016

What kind of difference are we talking about? What is the precision of your inputs?

Uwe_H_ · ‎10-12-2016

All the relevant data is real*4. Picking one of the result files (containing ~360k values), the biggest difference is 0.001666605 where the reference value is 0.2652121. The first difference (it's a simulation over time), however, is just around 3e-8 with a value around 0.4, which would be within our accepted tolerance, so at least part of the larger difference may be due to accumulation.
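To put these numbers in perspective, the following Python sketch computes the spacing between neighboring single-precision values (one "unit in the last place", or ulp) near 0.4 and expresses the largest reported difference in relative terms and in ulps. The helper `f32_ulp` is a hypothetical name introduced for this example and only handles positive finite inputs.

```python
import struct


def f32_ulp(x):
    """Gap between x (rounded to single precision) and the next larger
    single-precision value. Assumes x is positive and finite."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


# Spacing of real*4 values near 0.4: this is 2**-25, roughly 2.98e-8.
print(f32_ulp(0.4))

# Largest reported difference, relative to the reference value.
rel_diff = 0.001666605 / 0.2652121
print(rel_diff)

# The same difference counted in single-precision ulps at that value.
ulps_off = 0.001666605 / f32_ulp(0.2652121)
print(ulps_off)
```

Under these assumptions, a first difference of about 3e-8 at a value near 0.4 is on the order of a single ulp, which is what a one-bit rounding difference in one operation would produce. The largest difference, by contrast, is about 0.6 % of the reference value, or tens of thousands of ulps, which fits the idea that a tiny initial discrepancy is amplified over the simulated time steps.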

Different CPU, different results