I have a Fortran 77 code that spans several hundred files and thousands of lines. Unfortunately, it uses Cray pointers and is written in a convoluted way that favors saving memory over clarity. It has been used as a 32-bit application for years, and I recently recompiled it as a 64-bit application using the Intel compiler. But my output differs for all variables by almost 80%. Most of the difference accumulates over several time steps.
Since both versions compile cleanly, I have no way of identifying the offending variables or the piece of code causing this difference. What are some techniques I can use to get the two outputs to agree within 1% or less?
Of course you have a way of identifying the offending variables/code - it just takes work. "Instrument" the code to display intermediate values that are dumped to a log file. Run the 32-bit and 64-bit versions and compare logs to see where results start to diverge. Then refocus on that step - after some iterations, you should be able to identify the operation that gives a different result.
What you may find is that some operation produces a slightly different REAL value, and since you say there is accumulation, this can build up. You may find a coding error (uninitialized variable, out-of-bounds array access, mismatched types) or it may just be a platform difference. See Improving Numerical Reproducibility in C/C++/Fortran for more.
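A minimal sketch of this kind of instrumentation, written in free-form Fortran for readability. The module and routine names (dbglog, dbg_open, dbg_real), the unit number, and the file name trace.log are hypothetical and not from the original code; single-precision values would need to be converted with dble() before being passed in.

```fortran
! Hypothetical tracing helper; all names here are illustrative only.
module dbglog
  implicit none
  integer, parameter :: dbg_unit = 97   ! assumed to be an unused unit number
contains
  subroutine dbg_open(fname)
    character(len=*), intent(in) :: fname
    open(unit=dbg_unit, file=fname, status='replace', action='write')
  end subroutine dbg_open

  subroutine dbg_real(label, istep, x)
    character(len=*), intent(in) :: label
    integer, intent(in) :: istep
    double precision, intent(in) :: x
    ! 17 significant digits are enough to round-trip a 64-bit real
    write(dbg_unit, '(A,1X,I8,1X,ES24.16E3)') label, istep, x
  end subroutine dbg_real
end module dbglog

program demo
  use dbglog
  implicit none
  integer :: istep
  double precision :: acc
  call dbg_open('trace.log')
  acc = 0.0d0
  do istep = 1, 5
    acc = acc + 0.1d0 * istep
    ! log the intermediate value after every update
    call dbg_real('acc after update', istep, acc)
  end do
  close(dbg_unit)
end program demo
```

Building the same instrumented source as 32-bit and as 64-bit and writing to differently named logs gives two files that can be compared line by line.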
Thanks for the reply. I will try to follow your suggestions. I read somewhere that you once debugged a large code with Cray pointers when moving it from 32-bit to 64-bit. How do you debug a situation where a Cray pointer address is being assigned to an integer?
pointer (kp007, ro (0:ip2,0:jp2) )
but kp007 has (or has not) been declared, and is hence assumed to be an integer?
If you are using that extension, the pointer variable (kp007 in your example) gets automatically declared as the correct-size integer by its appearance in the POINTER declaration. If it was previously declared to be a different type/kind, you'll get a warning. For example:
t.f90(2): warning #7226: An integer pointer variable has been explicitly given a data type that is not the integer type and kind for an address on the current platform. [FOO]
If your issue was with Cray pointer assignment (32-bit vs 64-bit issue), I would have expected a crash.
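As a rough illustration of the point about the pointer variable's size, here is a small sketch that reuses the names kp007 and ro from the question. The array storage and its bounds are made up for the example. Cray pointers and LOC are extensions rather than standard Fortran; the Intel compiler accepts them, and gfortran needs the -fcray-pointer option as far as I know.

```fortran
program cray_ptr_size
  ! implicit typing is left on deliberately: kp007 is never declared,
  ! so its type comes only from the POINTER statement below
  real :: storage(0:3)
  real :: ro(0:3)
  pointer (kp007, ro)      ! kp007 becomes an address-sized integer

  storage = (/ 1.0, 2.0, 3.0, 4.0 /)
  kp007 = loc(storage)     ! LOC is an extension that returns an address

  ! bit_size is a standard intrinsic; dividing by 8 gives bytes
  print *, 'pointer integer size in bytes:', bit_size(kp007) / 8
  print *, 'ro(2) read through the pointer:', ro(2)
end program cray_ptr_size
```

One would typically expect the reported size to be 4 in a 32-bit build and 8 in a 64-bit build. A pointer variable that was explicitly declared as a 4-byte integer elsewhere in the code is the case the warning above is meant to catch.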
The old 32-bit code was likely using x87 FPU (8087) instructions, whereas the 64-bit code is using SSE or AVX instructions. The FPU uses an 80-bit format internally, so expressions that could be held on the FPU's internal stack would have higher precision than the same expressions evaluated with SSE/AVX on real or real(8) values, which are rounded to 32 or 64 bits after each operation (except for the newer FMA instructions).
Follow Steve's advice and try to locate the point where the calculations diverge significantly.
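To get a feel for how much the precision of intermediate results can matter when values accumulate, here is a small sketch. It does not emulate the 80-bit x87 format; it only contrasts a 32-bit accumulator with a 64-bit one fed the same single-precision inputs. The loop count and the increment are arbitrary choices for the example.

```fortran
program accum_precision
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
  real(sp) :: s_sp, term
  real(dp) :: s_dp
  integer :: i

  term = 0.1_sp
  s_sp = 0.0_sp
  s_dp = 0.0_dp
  do i = 1, 1000000
    s_sp = s_sp + term            ! narrow accumulator
    s_dp = s_dp + real(term, dp)  ! same input, wider accumulator
  end do

  print *, 'single accumulator:', s_sp
  print *, 'double accumulator:', s_dp
  print *, 'relative difference:', abs(real(s_sp, dp) - s_dp) / s_dp
end program accum_precision
```

The exact numbers depend on the compiler, the options, and the instruction set in use, which is the same effect that can separate a 32-bit build from a 64-bit build.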
Locating the point where the calculations diverge significantly has been my focus so far. The problem is that I have over 500 Cray pointers and thousands of local variables (I haven't counted my local variables, so I am just guessing the number based on the number of Cray pointers). How do I compare the output at each time step? I would have to dump all those variables to files and compare them at the end of each time step or subroutine. Is there an automatic way of doing this?
After a bit of googling, I found something called comparative debugging, but no tools to perform it. Is there a way to set a breakpoint in two executables at the same location, compare memory, and return the name of the array variable that includes a given memory address?
Yes, dump the variables to files. I did this many times over the years. There is no automated way I know of. Ignore for now that you use pointers - it isn't relevant yet. You are doing computations on variables, and at some point displaying a result. Locate intermediate steps and dump those to a file, then diff the files.
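If a plain text diff is too noisy because of last-digit differences, a small comparison program with a tolerance can help. The sketch below assumes each log line holds a 32-character label followed by one value; that layout, the file names run32.log and run64.log, and the tolerance are all assumptions made for the example. It writes two tiny stand-in logs first so that it runs on its own.

```fortran
program compare_logs
  implicit none
  character(len=32) :: lab_a, lab_b
  double precision :: va, vb, rel
  double precision, parameter :: tol = 1.0d-6   ! assumed tolerance
  integer :: ios_a, ios_b, lineno

  ! stand-in logs; real runs would produce these files instead
  call write_demo('run32.log', 0.0d0)
  call write_demo('run64.log', 1.0d-3)

  open(unit=11, file='run32.log', status='old', action='read')
  open(unit=12, file='run64.log', status='old', action='read')
  lineno = 0
  do
    read(11, '(A32,1X,ES24.16E3)', iostat=ios_a) lab_a, va
    read(12, '(A32,1X,ES24.16E3)', iostat=ios_b) lab_b, vb
    if (ios_a /= 0 .or. ios_b /= 0) exit   ! stop at end of either file
    lineno = lineno + 1
    ! relative difference, guarded against division by zero
    rel = abs(va - vb) / max(abs(va), abs(vb), tiny(1.0d0))
    if (rel > tol) then
      write(*, '(A,I0,2A)') 'first divergence at line ', lineno, ': ', trim(lab_a)
      write(*, '(2(1X,ES24.16E3))') va, vb
      exit
    end if
  end do
  close(11)
  close(12)

contains

  subroutine write_demo(fname, drift)
    character(len=*), intent(in) :: fname
    double precision, intent(in) :: drift
    character(len=32) :: label
    integer :: k
    open(unit=10, file=fname, status='replace', action='write')
    do k = 1, 4
      write(label, '(A,I0)') 'acc step ', k
      ! the drift term is zero for steps 1 and 2
      write(10, '(A32,1X,ES24.16E3)') label, 0.5d0 * k + drift * max(0, k - 2)
    end do
    close(10)
  end subroutine write_demo
end program compare_logs
```

Stopping at the first line that exceeds the tolerance points at the earliest logged step where the two builds disagree, which is where to add finer-grained dumps next.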
When you dump to files for comparison purposes, you should not be concerned with the speed of the dump or the compactness of the dump file. Rather, you should focus on reducing the time it takes to find the problems. To this end, make sure that you annotate your dumps (copiously).
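One possible shape for an annotated dump, as a sketch only. The routine name dump2d, the unit number, the file name dump.log, and the sample array are hypothetical. Each block records the calling routine, the variable name, and the time step, then a min/max summary, then every element with its indices.

```fortran
program annotated_dump
  implicit none
  real :: ro(0:3, 0:2)
  integer :: i, j

  ! sample data standing in for a real field
  do j = 0, 2
    do i = 0, 3
      ro(i, j) = real(i) + 10.0 * real(j)
    end do
  end do

  open(unit=98, file='dump.log', status='replace', action='write')
  call dump2d(98, 'solver_step', 'ro', 1, ro, 0, 0)
  close(98)

contains

  subroutine dump2d(iu, routine, name, istep, a, ilo, jlo)
    integer, intent(in) :: iu, istep, ilo, jlo
    character(len=*), intent(in) :: routine, name
    real, intent(in) :: a(ilo:, jlo:)   ! keep the caller's lower bounds
    integer :: i, j
    ! header lines start with '#' so they are easy to filter out
    write(iu, '(5A,I0)') '# routine=', routine, ' var=', name, ' step=', istep
    write(iu, '(A,2(1X,ES16.9))') '# min/max', minval(a), maxval(a)
    do j = lbound(a, 2), ubound(a, 2)
      do i = lbound(a, 1), ubound(a, 1)
        write(iu, '(2A,I0,A,I0,A,1X,ES16.9)') name, '(', i, ',', j, ')', a(i, j)
      end do
    end do
  end subroutine dump2d
end program annotated_dump
```

The summary line makes it quick to see which arrays differ at all before reading individual elements, and the index annotations identify the exact element once a block is known to differ.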
T, Varun wrote:
.. After a bit of googling, I found something called comparative debugging, but no tools to perform it. Is there a way to set a breakpoint in two executables at the same location, compare memory, and return the name of the array variable that includes a given memory address?
Note that almost all of the practical techniques for tracing and instrumenting code described in the vast body of programming literature, especially the material that is easily accessible online for C, C++, etc., can be applied to Fortran as well with a bit of common sense. You can look them up and evaluate how to apply them in your code.
Also, consider using a graphical debugging environment such as Code::Blocks or Eclipse on your platform, if you're not doing so already.
On Windows, the Microsoft Visual Studio IDE is a great help: things like running two concurrent debugging sessions and stepping through the code for the same simulation are straightforward.