Solved: Need help: different results with the same computation

stephaneBelsim · 03-08-2021

Hello. Using ifort, I get a numerical difference in the last digit when the same computation is done twice in a row. As I understand it, although this digit isn't reliable, doing the same computation twice should give exactly the same result.

In the end, since this number is subtracted from another number of the same order of magnitude, many digits of the result differ. These differences mean nothing (the "useful" parts of the numbers are the same), but they make our integration tests difficult: we get a lot of differences and must check that they are all purely numerical. The strange thing is that this appears only in a limited set of equations, even though all our equations are, by nature, differences between numbers of the same order, and many use the same kind of sum.

Here is the code snippet that produces the strange results:

[screenshot of the code, not reproduced here]

As you can see:

- This is very old code. It uses syntax we shouldn't use anymore. But hey, it works, we don't have the resources to change the whole code (though we don't use the old syntax in new routines), and we never had this problem before.

- Obviously, I can give some code snippets, but I can't give the whole code. And I'm not sure I can reproduce this problem in a very short example.

- Still, I tried to create a minimal example that triggers the problem without "invisible" operations or changes in memory.

So, here is how the example works:
- The program needs a variable "Dtot", which is a sum of other variables.

- I get different results for this sum when running the same data several times in a row.

- I never get different results in debug mode, so I needed to write the results to a file. I also never get different results when I ask the program to write Dtot as it computes it, so I commented out the write statement in the first sum. Instead, I make the program redo the computation: it re-initializes Dtot with the same value, redoes the sum, and writes the result at each step.

Here is a comparison (using WinMerge) of the "fort.253" files obtained from two runs of the whole program with exactly the same data:

[screenshot of the WinMerge comparison, not reproduced here]

As you can see, the two runs give different results for Dtot. More importantly, in the right-hand window, the run gives different results when computing Dtot twice in a row: the last digit is "2" in the sum that doesn't write anything, and "1" in the sum that writes the results to the file.

Additional note: this is the second time this subroutine is called in this run (the first time, there is no difference between the two runs; the line "sysimp for r3v" is there to show where we enter and exit this subroutine). In any case, the run in the right-hand window shouldn't give a different sum depending on whether we write the result to a file or not.

What I need is someone with an idea of why we can get different results when the same computation is done twice: even if the last digit isn't reliable because of rounding errors, it should always be the same. I know it's really hard to debug this without our whole code or our data, but I hope you understand that I can't publish either here.

mecej4 · 03-08-2021

The same source code in different surroundings can, especially when optimization has been turned on, result in different machine code. It is possible that one version of the machine code stores intermediate results into memory and pulls them back into registers later. If you are using x87 instructions, storing 80-bit floats into 8 bytes (64 bits) of memory, and then fetching back to an 80-bit register can give you a slightly different value than if there had been no storing in memory.
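As a rough illustration of that effect (a sketch with made-up names, not the original poster's code), the program below adds the same terms twice: once rounding to the narrow format after every addition, as happens when an intermediate value is stored to memory, and once keeping a wider running sum that is rounded only at the end, as happens when the value stays in a wide register. Here `real32` and `real64` merely stand in for the 64-bit memory format and the 80-bit x87 register, on the assumption that an 80-bit kind is not available in every compiler.

```fortran
program accumulate_precision
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  integer, parameter :: n = 100000      ! arbitrary number of terms
  integer :: i
  real(real32) :: term, sum_narrow
  real(real64) :: sum_wide

  sum_narrow = 0.0_real32
  sum_wide   = 0.0_real64
  do i = 1, n
     term = 1.0_real32 / real(i, real32)
     ! Rounded to the narrow format after every addition,
     ! like an intermediate result spilled to memory.
     sum_narrow = sum_narrow + term
     ! Same terms, but the running sum keeps extra bits,
     ! like an intermediate result held in a wider register.
     sum_wide = sum_wide + real(term, real64)
  end do

  print *, 'rounded at every step   :', sum_narrow
  print *, 'rounded once at the end :', real(sum_wide, real32)
  print *, 'identical?              :', sum_narrow == real(sum_wide, real32)
end program accumulate_precision
```

Which of the two behaviours the compiler picks for a given loop can depend on the optimization level and on the surrounding code (for example, whether a write statement sits inside the loop), which is why the floating-point model options are worth exploring.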

Look at the /fp options in the documentation and explore the influence of using those options.

Code snippets that you post are a lot more readable and useful as text rather than as images. Click on the "..." button on the toolbar, then the "</>" button, set the language to Fortran and paste in the code.

If you want comments on what machine code is being generated, you will need to report the CPU, the OS, the compiler options, etc. See the guidelines.

stephaneBelsim · 03-15-2021

Thanks for your answer. I'm not sure I understand the technical part of it, but I changed the /fp option to "precise" and it seems to solve my problem. Because I don't fully understand the technical part, I'm not sure whether there can be any other side effects (besides a slightly slower computation, but I don't think we'll even notice it in our program). Note: I can't find the "consistent" option in Visual Studio 2013 (and I didn't try to find out our compiler version, since "Precise" seems to be enough).

jimdempseyatthecove · 03-08-2021

Are any of the arrays (W and/or FONCTI) allocatable, and reallocated between runs?

If so, and if the compiler generated vectorized code, then the alignment and/or lack of alignment can affect the code path taken.

For arrays that are not (known to be) aligned, the compiler, when generating vector code, will (depending on what it knows) insert what is called peel code, which attempts to align the data fetches on vector boundaries. The peel code runs in scalar mode. This section is then followed by vector code, which advances one or more vectors at a time up to the last whole vector in the array(s); then, if there is a remainder, the remainder is processed in scalar mode. Therefore, if the starting position of either or both arrays changes (and the array alignment is not known), the code paths may differ between runs.
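To make the consequence for a sum concrete, here is a small sketch that imitates the peel / vector / remainder structure by hand. It is only a model of what vectorized code might do, not actual compiler output; the 4-lane width, the array contents and all names are assumptions for illustration. The same array is summed with zero and with one peeled element, and the bit patterns are printed so that a last-bit difference, if any, is visible.

```fortran
module peel_sum_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: lanes = 4   ! assumed vector width, for illustration only
contains
  ! Sum x the way a vectorized loop might: 'peel' leading elements in scalar
  ! mode, then whole groups of 'lanes' elements into per-lane partial sums,
  ! then the leftover elements in scalar mode.
  function sum_with_peel(x, peel) result(total)
    real(real64), intent(in) :: x(:)
    integer, intent(in) :: peel
    real(real64) :: total, lane_sum(lanes)
    integer :: i, k, first_vec, last_vec

    total = 0.0_real64
    do i = 1, min(peel, size(x))            ! scalar peel loop
       total = total + x(i)
    end do

    first_vec = min(peel, size(x)) + 1
    last_vec  = first_vec - 1 + ((size(x) - first_vec + 1) / lanes) * lanes
    lane_sum  = 0.0_real64
    do i = first_vec, last_vec, lanes       ! "vector" body, one group per trip
       do k = 1, lanes
          lane_sum(k) = lane_sum(k) + x(i + k - 1)
       end do
    end do
    do k = 1, lanes                         ! combine the per-lane partial sums
       total = total + lane_sum(k)
    end do

    do i = last_vec + 1, size(x)            ! scalar remainder loop
       total = total + x(i)
    end do
  end function sum_with_peel
end module peel_sum_demo

program peel_demo
  use, intrinsic :: iso_fortran_env, only: int64
  use peel_sum_demo
  implicit none
  real(real64) :: x(1003), s0, s1
  integer :: i

  do i = 1, size(x)
     x(i) = 1.0_real64 / real(i, real64)    ! arbitrary test data
  end do

  s0 = sum_with_peel(x, 0)   ! as if the array started on a vector boundary
  s1 = sum_with_peel(x, 1)   ! as if one element had to be peeled first

  print '(a, es24.16, 2x, z16.16)', 'peel = 0: ', s0, transfer(s0, 0_int64)
  print '(a, es24.16, 2x, z16.16)', 'peel = 1: ', s1, transfer(s1, 0_int64)
  print *, 'bit-identical? ', s0 == s1
end program peel_demo
```

Both calls add exactly the same numbers; only the grouping of the additions changes, and floating-point addition is not associative, so the last bits are allowed to differ.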

A potential correction is either not to reallocate the arrays or, better yet, to allocate them with the alignment attribute and, since this is old code and the arrays are likely dummy arguments, to add an attribute to the dummies declaring that they are aligned.
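A minimal sketch of what that could look like with Intel compiler directives follows. The names (`total_of`, `w`, `dtot`), the array size and the 64-byte boundary are assumptions for illustration, not taken from the posted code; other compilers treat the `!DIR$` lines as ordinary comments. Check the directive documentation for your compiler version before relying on it, because a false alignment promise can produce wrong code.

```fortran
module aligned_sum_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  function total_of(w, n) result(dtot)
    integer, intent(in) :: n
    real(real64), intent(in) :: w(n)   ! explicit-shape dummy, as in older code
    real(real64) :: dtot
    integer :: i
    ! Promise to the compiler that w starts on a 64-byte boundary.
    ! This must really be true for every caller.
    !DIR$ ASSUME_ALIGNED w: 64
    dtot = 0.0_real64
    do i = 1, n
       dtot = dtot + w(i)
    end do
  end function total_of
end module aligned_sum_demo

program aligned_demo
  use aligned_sum_demo
  implicit none
  real(real64), allocatable :: w(:)
  ! Ask for 64-byte aligned storage whenever w is allocated.
  !DIR$ ATTRIBUTES ALIGN: 64 :: w
  real(real64) :: dtot

  allocate(w(1000))
  w = 1.0_real64
  dtot = total_of(w, size(w))
  print *, 'dtot =', dtot
  ! A sum of 1000 ones is exact in any summation order.
  if (dtot /= 1000.0_real64) error stop 'unexpected sum'
end program aligned_demo
```

With the alignment known, the compiler has no reason to generate a peel loop whose length depends on where the array happens to land in memory.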

By the way, on future posts, click on the ... toolbar button, then click on </> button, then Pull down Markup and select Fortran. In the resulting text box, paste your source code.

Subroutine xxx
...

This is much easier to read than a low-res screenshot, ... and can be copied by other readers for use in testing.

Jim Dempsey

jimdempseyatthecove · 03-08-2021

One more thing. Please be aware that the compiler now has realloc_lhs on by default. Therefore, depending on your coding style, it may not be obvious from the source code whether a reallocation occurred. In particular, for an allocatable array:

array = array expression

may result in reallocation. Whereas

array(:) = array expression

will not.
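A small self-contained sketch of the difference (the array name and values are made up, and it assumes the compiler follows the Fortran 2003 rules, i.e. realloc_lhs is in effect):

```fortran
program realloc_lhs_demo
  implicit none
  real, allocatable :: a(:)

  allocate(a(3))
  a = [1.0, 2.0, 3.0]             ! same shape: no reallocation needed
  print *, 'size after same-shape assignment :', size(a)

  a = [1.0, 2.0, 3.0, 4.0, 5.0]   ! different shape: a is reallocated to 5 elements
  print *, 'size after whole-array assignment:', size(a)
  if (size(a) /= 5) error stop 'no automatic reallocation (realloc_lhs off?)'

  a(:) = 2.0 * a                  ! array section on the left: never reallocates,
                                  ! so the shapes must already conform
  print *, 'size after a(:) assignment       :', size(a)
  if (size(a) /= 5) error stop 'size changed unexpectedly'
end program realloc_lhs_demo
```

When the whole-array form does reallocate, the new storage can land at a different address, and therefore with a different alignment, than the old one.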

Jim Dempsey