I am having problems running a Fortran code compiled with ifort. I originally wrote the code on a Core 2 machine running a 32-bit Debian OS (Lenny), using the Lahey Fortran compiler, and everything ran very smoothly. To speed things up, I switched to an i5 machine running the 64-bit version of the same OS and changed the compiler to ifort 11.1.
Before I state my problem, I would like to mention that my code is error-free and produces the desired results with the Lahey compiler. I cannot post it here because it's huge. When I compile the code with ifort, the results I get are completely different from the ones I obtain with Lahey. I am using the following compiler options: -g -autodouble -shared-intel -mcmodel=medium -m64 -fp-model precise. I am not sure whether these options are optimal or whether my problem is architecture-related. Maybe it's due to the way the compiler handles floating-point arithmetic. I do not have much experience with ifort, and my choice of compiler options is based on the descriptions available in the manual.
I was wondering if someone has faced similar problems when switching between compilers. If so, what is the optimal and most reliable combination of compiler flags for the newer Intel processors?
What compiler options did you use with the Lahey compiler, and do they imply the same selections as those you used for IFort? Specifically, why did you use the -autodouble and -fp-model precise options with IFort?
As far as optimization for speed is concerned, with IFort you may start out with -fast, after you have succeeded in obtaining results that match (to the extent that is reasonable) the 32-bit results.
The Lahey Fortran compiler options I use are: --dbl --chk --nsav --staticlink --trace
Here's a description of what each does:
--dbl extends all single-precision REAL and single-precision COMPLEX variables, arrays, constants, and functions to REAL (KIND=8) and COMPLEX (KIND=8) respectively.
--chk generates a fatal runtime error message when substring and array subscripts are out of range, when non-common variables are accessed before they are initialized, when array expression shapes do not match, or when procedure arguments do not match in type, attributes, size, or shape.
--nsav allocates variables on the stack.
--staticlink creates an executable linked with the static LF95 Fortran runtime libraries, and the shared versions of the Linux system libraries. Specifying --staticlink will result
--trace causes a call traceback with routine names and line numbers to be generated with runtime error messages.
With ifort, -autodouble is the equivalent of -real-size 64. As for -fp-model precise, I use it to limit optimizations of floating-point arithmetic, because in my code some double-precision variables take on very tiny or very large values (something related to the physics of my problem).
I will try the -fast option tomorrow and I'll let you know how it goes.
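To illustrate why the promotion options (--dbl with Lahey, -autodouble with ifort) matter for very tiny values, here is a small sketch in Python rather than Fortran. It only mimics rounding a value to IEEE single precision; the helper name to_single is made up for this illustration and has nothing to do with either compiler.

```python
import struct

def to_single(value):
    """Round a Python float (an IEEE double) to the nearest IEEE single and back.

    Hypothetical helper for illustration only: it mimics what happens to a
    value stored in a default (4-byte) REAL when no promotion option is used.
    """
    return struct.unpack("f", struct.pack("f", value))[0]

# A constant such as 0.1 loses digits when held in single precision.
print("0.1 as double:", repr(0.1))
print("0.1 as single:", repr(to_single(0.1)))

# A very tiny value that is fine in double precision underflows in single.
print("1e-50 as double:", repr(1e-50))
print("1e-50 as single:", repr(to_single(1e-50)))
```

If one build promotes constants and variables to 8 bytes and the other does not, differences of this kind can show up in the results.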
Again, thanks for your help. Any feedback is really appreciated.
What you have done seems OK. However, you are making three changes simultaneously (32-bit to 64-bit a.out, 32-bit OS to 64-bit OS, and Lahey to IFort). And, as you have stated, the code is large. You may consider making the changes more gradually, noting whether each change causes noticeable changes to the program results. If you use any unformatted files, note that they may not be compatible.
1. Run the 32-bit a.out produced by the 32-bit Lahey compiler on the 64-bit OS and also on the old computer, without using the --dbl option; you may need to copy the Lahey runtime to the 64-bit OS, and install 'compat-32' support on the 64-bit OS, if you have not done so already.
2. Generate a 32-bit a.out using the Intel compiler.
3. Add the --dbl (Lahey) / -real-size 64 (IFort) options and repeat.
Thanks for the tip. I will give it a shot. I have been experimenting with the code, and apparently everything works fine on the 64-bit OS with ifort except for one chunk of the code. However, I couldn't find anything unusual in it. The data types of the variables and their dynamic values are compatible. I believe it is something related to floating-point exceptions. This part of the code uses Ridders' algorithm to approximate the derivative of continuous smooth functions. This algorithm returns the derivative using central differencing and polynomial extrapolation (Richardson tabulation). When applying central differencing, the mesh spacing is decreased gradually to the order of the machine round-off error. This error is machine-dependent.
Do you recommend any compiler options that prevent problems when values close to the round-off error are involved in the computations?
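For readers who have not seen the method, here is a minimal Python sketch of Ridders' extrapolation as it is commonly presented in textbooks (central differences with a shrinking step, combined in a Richardson-style tableau). It is not the code discussed in this thread; the function name ridders_derivative and the default values of ntab, con and safe are assumptions made for the illustration.

```python
import math

def ridders_derivative(f, x, h, ntab=10, con=1.4, safe=2.0):
    """Estimate f'(x); returns (derivative, error_estimate).

    h    : initial step (should be a scale over which f changes noticeably)
    ntab : maximum size of the extrapolation tableau (assumed default)
    con  : factor by which the step shrinks at each stage (assumed default)
    safe : stop when the error grows by more than this factor (assumed default)
    """
    if h == 0.0:
        raise ValueError("h must be nonzero")
    con2 = con * con
    a = [[0.0] * ntab for _ in range(ntab)]
    hh = h
    a[0][0] = (f(x + hh) - f(x - hh)) / (2.0 * hh)
    best, err = a[0][0], math.inf
    for i in range(1, ntab):
        hh /= con                                   # try a smaller step
        a[0][i] = (f(x + hh) - f(x - hh)) / (2.0 * hh)
        fac = con2
        for j in range(1, i + 1):
            # Extrapolate to higher order using the two neighbouring entries.
            a[j][i] = (a[j - 1][i] * fac - a[j - 1][i - 1]) / (fac - 1.0)
            fac *= con2
            errt = max(abs(a[j][i] - a[j - 1][i]),
                       abs(a[j][i] - a[j - 1][i - 1]))
            if errt <= err:                         # keep the best estimate so far
                err, best = errt, a[j][i]
        # Quit early once higher orders become significantly worse.
        if abs(a[i][i] - a[i - 1][i - 1]) >= safe * err:
            break
    return best, err

value, estimate = ridders_derivative(math.sin, 1.0, 0.5)
print("derivative:", value, "error estimate:", estimate)
print("exact     :", math.cos(1.0))
```

Note that in this form the step never shrinks below h / con**(ntab - 1), so the smallest step is controlled by the caller's choice of h rather than driven all the way down to round-off level.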
I forgot to mention that I used gfortran as well (64-bit) and I obtained the same result with that chunk I described above. So it's something specific to that chunk. I shouldn't have blamed it on ifort! I'll look into it and let you know how it goes.
I have no experience with Ridders' extrapolation method, but its characteristics suggest the following questions (some parts may be mere speculation!).
Were you generating x87 (80 bit) instructions with the Lahey compiler? Or SSE instructions (64 bit)? Does the code chunk that you alluded to require the use of denormals?
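To make the denormal question concrete, here is a small Python sketch of what gradual underflow looks like in IEEE double precision. Python floats are IEEE doubles, so this says nothing about x87 80-bit arithmetic, and it does not reproduce what a compiled Fortran program does when denormals are flushed to zero (ifort has an -ftz / -no-ftz switch for that). The helper names is_subnormal and halvings_until_zero are made up for the illustration.

```python
import sys

SMALLEST_NORMAL = sys.float_info.min   # smallest positive normal double
SMALLEST_SUBNORMAL = 5e-324            # smallest positive denormal double

def is_subnormal(v):
    """True if v is nonzero but smaller in magnitude than the smallest normal double."""
    return v != 0.0 and abs(v) < SMALLEST_NORMAL

def halvings_until_zero(start=1.0):
    """Count how many times a positive value can be halved before it becomes exactly 0.0."""
    h, n = start, 0
    while h > 0.0:
        h /= 2.0
        n += 1
    return n

print("smallest normal   :", SMALLEST_NORMAL)
print("smallest subnormal:", SMALLEST_SUBNORMAL)
print("half of smallest normal is subnormal:", is_subnormal(SMALLEST_NORMAL / 2.0))
print("halvings of 1.0 until zero:", halvings_until_zero())
```

With flush-to-zero enabled, a step that is repeatedly reduced reaches exactly zero much sooner than it does here, and a later division by that step then produces an infinity or NaN. That is one way the same source can behave differently under different compilers or flags.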
An extrapolation method to compute derivatives seems appropriate for use when the function values can be calculated to machine precision. If this is not true, it is more appropriate to use a single central-difference approximation, without Richardson extrapolation, but with an estimated optimum step.
If you solve a PDE by using a standard central-difference approximation, there is an inherent discretization error. Given function values f(x) that have relative error \delta, the error in a central-difference approximation (with steps +h and -h) to the derivative consists of two parts: the discretization error, which is proportional to h^2 (the leading term is (h^2/6) f'''(x)), and the error caused by inexactness in the function itself, which is proportional to (the error in f)/h. The two terms change in opposite directions with changes in h, so there is an optimum h that is neither too small (when function errors dominate) nor too large (when discretization errors dominate).
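The trade-off can be sketched numerically. Taking the total error as roughly (h^2/6)|f'''| + \delta |f|/h and setting its derivative with respect to h to zero gives h_opt = (3 \delta |f| / |f'''|)^(1/3). The Python sketch below assumes \delta is the double-precision machine epsilon and uses exp(x) at x = 1 as a test function; the function names are made up for the illustration.

```python
import math
import sys

def central_difference(f, x, h):
    """Second-order central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def optimal_step(f_scale, f3_scale, delta=sys.float_info.epsilon):
    """Step minimising (h^2/6)*f3_scale + delta*f_scale/h.

    f_scale  : typical magnitude of f near x
    f3_scale : typical magnitude of f''' near x
    delta    : relative error in f (assumed here to be machine epsilon)
    """
    return (3.0 * delta * f_scale / f3_scale) ** (1.0 / 3.0)

x = 1.0
exact = math.exp(x)                      # derivative of exp is exp
h_opt = optimal_step(math.exp(x), math.exp(x))

for label, h in (("too large", 1e-1), ("near optimum", h_opt), ("too small", 1e-13)):
    error = abs(central_difference(math.exp, x, h) - exact)
    print(f"{label:13s} h = {h:.3e}   abs error = {error:.3e}")
```

For functions whose values carry a larger error than machine epsilon (for example, the output of an iterative solver), \delta is correspondingly larger and the optimum step grows with it.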