Solved: Does Intel math library behave differently on Intel and AMD processors ?

e745200 · ‎02-01-2009

I would like to share my experience of the Intel Fortran math library (I guess) behaving differently on Intel Xeon and AMD Athlon processors in a specific situation.

It took a significant amount of time to track down this error. Should someone else run into a similar case, I hope this report saves them some time in fixing it.

I have extracted the relevant fragments from a large and complex legacy program, which had never shown the behavior reported below in several years of work on different platforms.

The case arises from a trivially bad call of a C function from a FORTRAN program.
The call was bad because the C function returned a double, while the FORTRAN program referenced it as a subroutine.
In the following example, I have simplified it as much as possible:

cfunc.c
------------------------------------------------------
double cfunc_() { return((double) 1.0); }
------------------------------------------------------

fmain.f
---------------------------------------------
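      PROGRAM FMAIN
C     A must be double precision to match the printed output
      DOUBLE PRECISION A
C     Bad call: CFUNC returns a double but is invoked as a subroutine
      CALL CFUNC()
      A=10.D0
      PRINT *,A,LOG(10.D0)
      PRINT *,A,LOG(A)
      PRINT *,A,LOG(A)
      END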
---------------------------------------------

$ gcc -c cfunc.c
$ ifort -fp-model strict fmain.f cfunc.o -o fmain
$ ./fmain
10.0000000000000 2.30258509299405
10.0000000000000 NaN <<<<<<<<<<<<<<<<<
10.0000000000000 2.30258509299405

As you can see, the second call to LOG returns a NaN.
It can be verified in several ways that an INVALID FP operation occurs while the LOG function executes.

The bad reference to CFUNC seems to cause the malfunction when:

the FORTRAN program is compiled with the -fp-model strict option (which the original full program needs);
the executable is run on an AMD Athlon 64 X2 Dual Core Processor 3600+ (the processor of the system where new development of the program was in progress);
the LOG function is referenced for the first time after the bad call with a variable as the argument (not a constant: perhaps LOG(10.D0) is computed at compile time?).

Note that the very same executable runs fine on the Intel Xeon; perhaps in this case the error does no harm because of the way the FP unit works (?).

On both systems, the OS was RHEL 5.2 and the Intel FORTRAN compiler was version 11.074.

Of course, the simplest way to fix the error is to call the C function as a function, storing the returned value in a double precision variable:

....
DOUBLE PRECISION RETVAL, CFUNC
....
RETVAL = CFUNC()
....
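
If changing the Fortran call sites is impractical, another option is to make the C side match a subroutine call. The following is only a minimal sketch, assuming the trailing-underscore, pass-by-reference convention used in the example above; the name cfunc_sub_ and its out-parameter are hypothetical and not part of the original program:

```c
#include <assert.h>
#include <stddef.h>

/* Original routine from the example: returns a double by value.
   Under the 32-bit x86 calling convention the value comes back in
   the x87 register ST(0). */
double cfunc_(void) { return 1.0; }

/* Hypothetical wrapper that is safe to CALL as a Fortran subroutine:
   it returns nothing and stores the result through a pointer, since
   Fortran passes arguments by reference by default. */
void cfunc_sub_(double *result)
{
    assert(result != NULL);
    *result = cfunc_();   /* the C compiler consumes the return value */
}
```

On the Fortran side this would be used as CALL CFUNC_SUB(RETVAL), with RETVAL declared DOUBLE PRECISION, so nothing is left behind on the FP stack.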

This trivial fix required several days of investigation, partly because:

The malfunction occurred relatively far from the bad call: the LOG function was first used in a subroutine at least five levels deeper than the one containing the bad call, and all the subroutines along the way, and those called by them, had to be investigated;
I focused primarily on the more recent changes made to the program, in a different but somewhat related area (floating-point environment management);
That section of code had been working for several years, and on different platforms;
As said, it seemed to be a matter not of languages, compilers or operating systems, but of processors, given that the same executable behaves differently in environments in which the processor is the only apparent difference.

That said, I have two questions:

Is it really a processor-level issue, or does the LOG function in the Intel math library behave differently after detecting at run time which processor is in use (with the behavior on AMD being less robust than on Intel)?
Which diagnostic tool can effectively trap this sort of error caused by mixed-language misuse (short of an expensive, thorough inspection)?

Thanks a lot.

PS.
How true is what the documentation says in its mixed-language programming overview:
"During execution, the bad call causes indeterminate results and/or a fatal error. The error, caused by memory or stack corruption due to calling errors, often occurs in a seemingly arbitrary place in the program."

TimP · ‎02-01-2009

Your information doesn't have much value, without indication of compile options or 32/64-bitness of compiler and OS. If you are breaking the x87 floating point stack, or the CPU general stack, you can't draw the conclusion that the math library has more robust behavior on Intel.
The math library apparently does check the type of CPU it is running on, and sometimes takes different code paths accordingly. We have raised the possibility that it might be more reliable to make the math library behavior depend only on the compile/link architecture switches, but it seems a long shot to hope this change might be made.

e745200 · ‎02-02-2009

Quoting - tim18

Your information doesn't have much value, without indication of compile options or 32/64-bitness of compiler and OS. If you are breaking the x87 floating point stack, or the CPU general stack, you can't draw the conclusion that the math library has more robust behavior on Intel.
The math library apparently does check the type of CPU it is running on, and sometimes takes different code paths accordingly. We have raised the possibility that it might be more reliable to make the math library behavior depend only on the compile/link architecture switches, but it seems a long shot to hope this change might be made.

Thanks for your reply.

Sorry for having omitted this relevant information.
In the systems we are working on, the Intel Xeon processor is a 32-bit one, and of course runs a 32-bit OS and compiler.
The AMD Athlon is a 64-bit processor, but it runs the same 32-bit OS and compiler.

As far as the compile options are concerned, I posted the command lines exactly as they were used. No change had been made to the compiler configuration file, so the options not shown are the defaults for version 11.074 of the compiler.

I am not drawing any conclusion; I do not have enough elements to dare that much. I am just asking questions and forming hypotheses, based on the observation that, if any stack had been broken (does the bad call have this effect? always? I'm not an expert in this field, I'm asking), this does not seem to hurt on the Intel processor, while it does on the AMD processor. This is what I would call "higher overall robustness", in some sense.
I was wondering whether this is an internal processor matter or a processor-dependent library matter; just a chance result or something that depends on the library design.

Thanks again.

Steven_L_Intel1 · ‎02-02-2009

Tim has the essence of the problem. The Intel math library does internal automatic CPU dispatching and will take different code paths depending on the processor type. Non-Intel processors are treated as generic IA-32 and will use x87 instructions for FP operations, while the Intel Xeon processor you used would take a path that used SSE instructions.
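
The dispatching described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the names fp_path and select_fp_path are hypothetical, and the rule that only the "GenuineIntel" CPUID vendor string selects the SSE path is inferred from this explanation, not taken from the library's source.

```c
#include <assert.h>
#include <string.h>

/* Code paths a dispatching library might choose between (illustrative). */
typedef enum { PATH_GENERIC_X87, PATH_SSE } fp_path;

/* Hypothetical selector keyed on the CPUID vendor string.
   Assumption: every vendor other than "GenuineIntel" falls back to
   the generic IA-32 path that uses x87 instructions. */
fp_path select_fp_path(const char *vendor)
{
    assert(vendor != NULL);
    return strcmp(vendor, "GenuineIntel") == 0 ? PATH_SSE
                                               : PATH_GENERIC_X87;
}
```

With this model, an AMD processor reporting "AuthenticAMD" lands on the x87 path, which is the only one that touches the x87 register stack.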

Because you called the function as a subroutine, the function's return value was left on the FP stack, corrupting the stack for later operations. If the log function used x87 instructions, as it would on the non-Intel processor, it could very well give incorrect results. You'd see the same issue if the program was run on some older Intel processors.

There are some compiler options that can help you diagnose odd FP problems. The set I generally recommend but which would not have helped in this case is "-gen-interface -warn interface". This would be good if the called function was Fortran, but not if it's C. I'd rather see you write explicit interfaces for external functions so that the compiler can check them.

You can also use -fp-stack-check. With this enabled, if the FP stack is corrupted around a call or other operation, you'll get a segmentation fault at the point of the error. Hopefully, with a traceback in hand you can then identify the problem.

jimdempseyatthecove · ‎02-02-2009

The problem is not the LOG statement.
The problem, as you have pointed out is the CALL statement used where a function call was to be used.
This particular problem can quickly be identified with /warn:interfaces (may require /geninterfaces at least once).

The differing symptoms observed may be due to the code optimizations reordering the placement of the CALL. The use of CALL versus a function call (is going to)/(may) corrupt the stack. If it does, and if the optimizations omit the frame pointer, then your local variables referenced by code following the corruption will be off by sizeof(intptr_t) bytes.

I suggest you enforce the checking of interfaces, at least on the Debug configuration.

Jim Dempsey

e745200 · ‎02-02-2009

Thanks a lot for your answers and your valuable explanations and hints !

I will try to use the flags you suggested, to check if other bad calls (or other strange things) need to be fixed.

Jim, for sure the LOG is not the problem in itself; its results only pointed out the existence of a problem generated by some previous operation, which I finally found to be the bad call to the C function. As I said, in the original full program they are really far from each other, and this distance made the investigation harder.

GM