Re: invalid floating point operation

ylmz · 01-21-2009

Hi all;

I am getting an invalid floating point operation error in release mode, which does not happen in debug mode. I do not think this is caused by project setting differences between the two modes, since I have already tried almost every combination.

The most interesting point is that I also get the same error in debug mode with "debug information format = line numbers only".
The error occurs on the following line:

if (x .gt. 0 .and. y .gt. 0) then
   check = .not. ((REAL(t)/x .le. p/y) .and. (REAL(t)/x .lt. 2))
else
   check = .true.
end if

Unfortunately, I cannot see the current values of these variables, since the debug information format is "line numbers only".

Interestingly, when I put a WRITE statement in front of the lines above, the problem disappears.

What could be the problem? Something to do with compiler optimization or the floating point stack?

jimdempseyatthecove · 01-22-2009

Quoting - ylmz

Hi Steve;

The function declaration is as follows:

bool __stdcall fortranCall(long *p_res);

Then, if the other values within the function remain constant for the duration of the application, the only variable is the pointer to p. What is the value of the pointer to p at the time of the error (not the value pointed to by p)?

p is likely pointing into La La Land.

Also, I believe you want extern "C" with the C calling convention (the caller cleans up the stack) as opposed to __stdcall.
If the Fortran subroutine is not using the same calling convention, you have likely corrupted the stack.
For example:
long A, B, C;
bool retA, retB;
...
retA = fortranCall(&A); // calling convention screws up the stack pointer
retB = fortranCall(&B); // *** may not be passing the address of B ***
                        // (and the calling convention screws up the stack pointer again)
C = A + B;              // may not be using the real A, B, C

And be careful about the return value. A C/C++ bool true is 1, while a Fortran LOGICAL .TRUE. is -1 (by default in Intel Fortran).

Upon a return of .TRUE., if your subsequent code uses:

if(retValue) // will work
if(retValue == true) // will not work as intended
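A minimal C++ sketch of one way to avoid this trap, assuming the Fortran function returns a one-byte LOGICAL(1) whose .TRUE. arrives as -1. The alias fortran_logical1 and the helper from_fortran_logical are hypothetical names: the idea is to receive the raw byte as a small integer and compare it against zero, instead of letting a value other than 1 land directly in a C++ bool.

```cpp
#include <cstdint>

// Hypothetical alias for the raw byte returned by a Fortran LOGICAL(1)
// function. Using it in the C++ prototype (instead of bool) keeps the
// Fortran bit pattern (-1 for .TRUE. by default) out of a C++ bool.
using fortran_logical1 = std::int8_t;

// .FALSE. is 0, so treat any nonzero value as true. This behaves like
// "if (retValue)" and avoids the "retValue == true" comparison entirely.
inline bool from_fortran_logical(fortran_logical1 raw) {
    return raw != 0;
}
```

With a prototype along the lines of extern "C" fortran_logical1 fortranCall(long *p_res); (hypothetical), the call site becomes if (from_fortran_logical(fortranCall(&res))).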

Jim Dempsey

ylmz · 01-22-2009

Yes, we declare it as extern "C".

By the way, I do not know what the COMPILER_IS_IVF preprocessor definition means. Could it be causing the problem?

ylmz · 01-22-2009

I guess it is Intel Visual Fortran :)

ylmz · 01-22-2009

Quoting - Les Neilson

For a start I would suggest you look at Project->Properties->Fortran->Run-time->Run-time Error Checking and turn on the checks for array and string bounds, and uninitialised variables.

I tried this with the "check all" option. The following is the only warning that I get:

[forrtl: warning(402): fort:(1): In call to I/O routine, an array temporary was created for argument #1.]

I guess this is only a performance drawback. Or could it be the reason for this error?

Steven_L_Intel1 · 01-22-2009

The array temporary message is just a performance warning.

I suggest adding the option Fortran > Floating Point > Check Floating Point Stack > Yes.

If you get an access violation when running with this on, this will indicate that the floating point stack has become corrupted.

I also notice that the option "Generate Interface Blocks" is NOT enabled. You should turn that on and rebuild.

It would really help if we could see a complete test case.

jimdempseyatthecove · 01-22-2009

In an earlier post you stated
>>
The function declaration is as follows:

bool __stdcall fortranCall(long *p_res);
<<

which is not an extern "C" declaration in your C++ function prototype.

To verify, place the following in the C++ code where you first call fortranCall(...):

// at the top of the file: #include <cstdint>, <cstdio>, <cstdlib>
volatile intptr_t here;          // here is a local stack variable (volatile: force a real reload after the call)
here = (intptr_t)&here;          // place address of here into here
fortranCall(YourArgumentHere);
if ((intptr_t)&here != here)     // verify stack pointer not bunged up
{
    printf("Calling convention error\n");
    exit(-1);
}

ylmz · 01-22-2009

Hi;

Actually I did not write the prototype exactly, but it was declared inside a block as follows:

extern "C"
{
...
bool __stdcall fortranCall(long *p_res);
...
}

I tried what you said, but I did not get any unusual result.
Actually, I do not quite understand the point of doing this.
"here" is a variable, and its address points to a fixed memory
location. Is it subject to change with stack pushes and pops?

ylmz · 01-22-2009

Actually I am trying to prepare a test case, but it is really difficult to reproduce the error without using all the project files together.

I added the "Check Floating Point Stack" option and ran again, but I did not get any errors. Does this mean we can be sure that the problem does not originate from floating point stack corruption?

When I enable the "Generate Interface Blocks" option, I get a lot of compilation errors stating that the actual argument and expected parameter types do not match. Some of these mismatches are:

REAL(4) --> REAL(8)
REAL(8) --> REAL(4)
INTEGER(2) --> INTEGER(4)
INTEGER(4) --> INTEGER(2)

As far as I know, the compiler handles the situation where a smaller-size value is assigned to a larger-size variable. Is this also the case in the opposite direction? If not, is the compiler passing the arguments as they are? If so, why do we not get an error when the "Check Floating Point Stack" option is on?

Steven_L_Intel1 · 01-22-2009

Those errors are the cause of your problem. Yes, in an assignment statement conversion is done but not for argument association. You are corrupting memory and/or referencing garbage data. Fix the errors detected by Generate Interface Blocks and the other errors will likely go away.
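To make the point about argument association concrete, here is a small C++ model of a by-reference call with mismatched kinds. It is a sketch under simplifying assumptions: an 8-byte buffer stands in for the caller's memory (a REAL(4) actual argument followed by an unrelated 4-byte neighbor), and the hypothetical functions callee_reads_real8 and callee_writes_real8 stand in for a routine whose dummy argument is declared REAL(8). Only the address is passed, so nothing converts the value: the callee reads or writes 8 bytes where the caller provided 4.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "model assumes 4-byte REAL(4) and 8-byte REAL(8)");

// Caller's memory: bytes 0-3 hold a REAL(4) actual argument,
// bytes 4-7 hold whatever variable happens to sit next to it.
struct CallerMemory {
    unsigned char bytes[8];
};

CallerMemory make_caller_memory(float actual, std::int32_t neighbor) {
    CallerMemory m{};
    std::memcpy(m.bytes, &actual, sizeof actual);
    std::memcpy(m.bytes + 4, &neighbor, sizeof neighbor);
    return m;
}

// Hypothetical callee with a REAL(8) dummy: it only receives an address,
// so reading the dummy pulls in 4 bytes the caller never passed.
double callee_reads_real8(const unsigned char* arg) {
    double d;
    std::memcpy(&d, arg, sizeof d);
    return d;
}

// Assigning to the REAL(8) dummy writes 8 bytes and clobbers the neighbor.
void callee_writes_real8(unsigned char* arg, double value) {
    std::memcpy(arg, &value, sizeof value);
}

std::int32_t read_neighbor(const CallerMemory& m) {
    std::int32_t n;
    std::memcpy(&n, m.bytes + 4, sizeof n);
    return n;
}
```

Passing 1.0 as REAL(4) this way, the callee does not see 1.0, and after an assignment to the dummy the neighbor no longer holds its old value; no error is reported at the call in either case.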

jimdempseyatthecove · 01-23-2009

The stack local variable "here" is supposed to stay at the same address for the duration of any individual call.
The suggested test code asserts that the location of the variable did not change across the function call.
As a side effect, it also verifies that the value stored in "here" before the call (the address of "here") was not corrupted by the call (i.e., that the called routine did not stomp on it).

The things that can cause this apparently impossible circumstance are:

The calling conventions of C++ and Fortran are not the same, and therefore the stack pointer (ESP) is not properly restored.

The calling conventions of C++ and Fortran are not the same, and the save/restore of the stack frame pointer (EBP) is not used consistently, so EBP is not properly restored.

(rare) The calling convention is messed up such that the C++ this pointer gets hosed.

The calling convention is messed up: you call the Fortran routine with the address of an array of data, but the Fortran routine expects the address of an array descriptor.

In some cases the address of "here" changes; in others you stomp on memory you ought not to be touching.

Jim

ylmz · 01-25-2009

Hi;

I enabled the "Generate Interface Blocks" option. However, it behaves nondeterministically.
That is, when I compile, I get some errors that did not appear in a previous build,
although the code was exactly the same. What could be the reason?

(I mean only errors like "formal parameter and argument type mismatch".)

Steven_L_Intel1 · 01-25-2009

This feature is dependent on the order of compilation. If you are calling a routine in a different file, it can't check the interface unless the other file was compiled first.

ylmz · 01-26-2009

Hi;

I fixed all the issues reported by "Generate Interface Blocks". However, I still get the same error (invalid floating point operation).

In most cases, I fixed the issues in the following manner:

Assume that subroutine S expects a REAL(4) argument, and S used to be called with a REAL(2).
I changed the calls to S(REAL(parameter, 4)); that is, I applied an explicit type conversion. But how does Fortran handle this?
Is a new temporary created (with its own address), or is subroutine S told that it can use the address of "parameter" as if it holds 4 bytes, not 2?

And what about cases like Q(19) ==> Q(INT(19, 2))? I mean, what happens in the case of constant arguments?

Steven_L_Intel1 · 01-26-2009

There is no REAL(2) in our implementation - just 4, 8 and 16.

The answer to your question is that if you pass an expression such as INT(19,2) then the compiler makes a temporary value and passes that to the routine. The routine can store into the value, but any changes are lost when the routine returns.
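A rough C++ analogy of this temporary, assuming by-reference argument passing as in Fortran and taking "parameter" to be a REAL(8) (double) purely for illustration. The function S is a hypothetical stand-in for the subroutine in the question, and call_S_with_converted_copy is a hypothetical name for what a call like S(REAL(parameter, 4)) amounts to.

```cpp
// Hypothetical stand-in for subroutine S with a REAL(4) dummy argument,
// passed by reference. It modifies its argument.
void S(float* x) {
    *x = *x * 2.0f;
}

// Models S(REAL(parameter, 4)): the expression is evaluated into a
// temporary and the temporary's address is what S receives.
// "parameter" is passed by reference, like a Fortran actual argument.
float call_S_with_converted_copy(double& parameter) {
    float temporary = static_cast<float>(parameter);  // REAL(parameter, 4)
    S(&temporary);      // S sees a genuine 4-byte REAL(4)
    return temporary;   // S's change stays in the temporary; parameter is untouched
}
```

S gets a correctly sized value to work on, but anything it stores is discarded when the call returns, which is the behavior described above.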

You can change the kind of literal constants by adding a kind specifier like this: 19_2. This means the number 19 of kind INTEGER(2). You can do this with reals as well - 4.5_8 means 4.5 as REAL(8). Better practice is to declare some PARAMETER constants for the kinds you want to use and then use those constants. For example:

[plain]integer, parameter :: SP = SELECTED_REAL_KIND(6)
integer, parameter :: DP = SELECTED_REAL_KIND(15)
integer, parameter :: SINT = SELECTED_INT_KIND(4)
integer, parameter :: LINT = SELECTED_INT_KIND(9)
integer(LINT) :: A ! INTEGER(4)
real(DP) :: B ! REAL(8)
call sub(3.14_DP, 4_SINT) ! REAL(8), INTEGER(2)[/plain]
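On the C++ side of a mixed-language program, the same idea can be mirrored with fixed-size types so that both languages agree on argument sizes. The aliases below are a sketch that assumes the usual correspondence (REAL(4) and REAL(8) as IEEE single and double, INTEGER(2) and INTEGER(4) as 16- and 32-bit integers); reusing the names SP, DP, SINT and LINT is only for illustration.

```cpp
#include <cstdint>

// Hypothetical C++ counterparts of the Fortran kind constants above.
using SP   = float;         // REAL(4),    SELECTED_REAL_KIND(6)
using DP   = double;        // REAL(8),    SELECTED_REAL_KIND(15)
using SINT = std::int16_t;  // INTEGER(2), SELECTED_INT_KIND(4)
using LINT = std::int32_t;  // INTEGER(4), SELECTED_INT_KIND(9)

// Fail the build if the platform does not match these assumptions.
static_assert(sizeof(SP) == 4, "REAL(4) expected to be 4 bytes");
static_assert(sizeof(DP) == 8, "REAL(8) expected to be 8 bytes");
static_assert(sizeof(SINT) == 2, "INTEGER(2) expected to be 2 bytes");
static_assert(sizeof(LINT) == 4, "INTEGER(4) expected to be 4 bytes");
```

Declaring C++ prototypes in terms of these aliases makes a size mismatch with the Fortran dummy arguments easier to spot when reading either side.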

At this point you may want to change the property Floating Point > Floating Point Exception Handling > /fpe:0 and see if you get a different error.

jimdempseyatthecove · 01-26-2009

>> And S used to be called with a REAL(2).

Simply changing the argument to REAL(2) for both caller and callee may or may not fix the problem. One example where this will not fix the problem is if you are reading a binary data file created by instrumentation that uses a 16-bit floating point number format (one of several formats). And there may be other reasons. To correct for this you will need to create function wrappers that handle this format. To do this you would use a user-defined type to express the REAL(2); the type name may have to be something like "SHORT_REAL".
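As one example of such a wrapper, here is a C++ sketch that decodes a 16-bit value into an ordinary float. It assumes the data uses the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits); instrument formats differ, so the layout of the actual data has to be checked first. The function name half_to_float is hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Decode an IEEE 754 binary16 bit pattern (assumed format) to float.
float half_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent: infinity if the fraction is zero, else NaN
        value = (mantissa == 0) ? INFINITY : NAN;
    } else {
        // Normal: (1 + mantissa/1024) * 2^(exponent-15) == (1024 + mantissa) * 2^(exponent-25)
        value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}
```

For example, 0x3C00 decodes to 1.0 and 0xC000 to -2.0 under this layout.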

Jim

anthonyrichards · 01-27-2009

Going back to your very first post, why don't you test

((y*REAL(t) .le. x*p) .and.(REAL(t) .lt. 2*x))

and avoid possible divide by zero?
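A C++ sketch comparing the two forms, assuming t is an integer and x, y, p are double-precision reals (the declarations were not shown in the thread; the function names are hypothetical). Because the branch already requires x > 0 and y > 0, multiplying through by x and y preserves both inequalities mathematically, although results can differ in the last bit near the boundary because of rounding.

```cpp
// Original test from the first post: divides by x and y.
bool check_with_division(int t, double x, double y, double p) {
    if (x > 0 && y > 0)
        return !((t / x <= p / y) && (t / x < 2));
    return true;
}

// Suggested rewrite: multiply through by the positive x and y instead.
bool check_without_division(int t, double x, double y, double p) {
    if (x > 0 && y > 0)
        return !((y * t <= x * p) && (t < 2 * x));
    return true;
}
```

If x or y holds a NaN, both versions take the else branch, since every ordered comparison with a NaN is false; under IEEE 754 such a comparison may itself raise the invalid-operation flag.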

jimdempseyatthecove · 01-27-2009

Anthony,

Although your suggestion is good, the user's code as written should have run without error, yet it is erroring out. This error is symptomatic of a problem elsewhere in his code. If your suggestion eliminates the symptom, it will hide the fact that something is seriously wrong elsewhere. It would be wise for this user to keep the code that exhibits the error symptom and track down what is causing it. Once it is found, he can then come back and insert the division-free test you suggest.

There is nothing worse than a hidden error in a program, especially if the error floats around. When you have a consistent test case that exposes the error, use it now to find and fix the error. If the error does expose a compiler problem, then the test case needs to be reported back to the developers.

One of the oddest errors I had, and one of the hardest to track down, was neither a programming error nor a compiler code generation error. Instead, it was a Visual Studio WinDbg internal problem where it inserted a breakpoint INT03 into (over the top of) code where no breakpoint was registered. What made this difficult to track down was that the INT03 was not inserted at the start of an instruction (a break would have occurred), but over the tail-end bytes of an instruction. This changed the instruction to reference data other than what it was intended to reference. What made it particularly hard to track down was that whenever the debugger was in control, such as at a breakpoint or after a GP fault, the INT03 was un-patched from the code (no evidence of a coding problem). Also, when I edited code that preceded the problem location, the INT03 got patched at the same address, which was now a different position in the code. Once I structured a test to verify the (almost undetectable) corruption of the code, I could see the culprit was an INT03 (hex CC) being poked into and out of the code, which pointed the finger at WinDbg. A quick remove all breakpoints, save project, exit, restart VS, and ta-da, the problem went away.

Jim

ylmz · 01-28-2009

Hi all;

Yes, I also do not think that this invalid floating point operation error results from a computational expression.
Where it appears varies from situation to situation.

The last thing I tried was as follows:

I have a Fortran subroutine that checks the current state of the floating-point status register. It reports invalid operations by writing them to a text file. After every detection, it clears the status register.

subroutine floating_status_check (callNo, lineNo)

use IFPORT
include 'params.cmn'

integer(4) callNo, lineNo
integer(2) status

call GETSTATUSFPQQ(status)

! check for invalid result
if (IAND(status, FPSW$INVALID) .NE. 0) then
write (nfchkstat,*) callNo, lineNo
call CLEARSTATUSFPQQ()
end if

end subroutine floating_status_check

My main Fortran function, which is called in a loop from the C++ code, is as follows
(I placed a call to floating_status_check after every line of code in this function):

logical(1) function fortranCall(rsp_ptr)

...
integer(4) callNo
data callNo /0/
...

callNo = callNo + 1

call floating_status_check (callNo, 1)
...
call floating_status_check (callNo, 2)
...
...
...
call floating_status_check (callNo, 300)

end

Finally, when I checked my output file, I got the following result:

7 1
19 1

That is, the error was detected at the 7th and 19th calls to the main Fortran function, right at the beginning (line 1),
which indicates that the invalid flag was already set before the call.

However, if I put an additional floating_status_check just before the line "callNo = callNo + 1" with lineNo = 0
(call floating_status_check (callNo, 0)), then I get the following result:

6 0
18 0

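That is, the flag is now caught at line 0. It is reported as 6 and 18 only because callNo has not been incremented yet at that point, so these are still the 7th and 19th calls, and the flag is already set on entry to the function.

Do you have any comments on this situation?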

What I can see is that there is actually no invalid floating point operation occurring in the Fortran part.
Something is going wrong on the C++ side, where there are no computational expressions.

1) Can it be a stack issue? If so, why do I not get any errors even though all the compiler options related to stack
anomalies are enabled?

2) Also, despite this error (invalid operation), all the results still seem to be meaningful. If memory is being corrupted at some point, why do strange results not appear after that point?

3) Can this be a problem related to struct alignment differences between the Fortran and C++ compilers?

4) Can the problem be related to Fortran string parameter passing? That is, it allows us to omit the string length.
However, in our code there are some calls to functions which require additional integer arguments beyond the string and its length. In this case, if we omit the string length, how does the compiler distinguish between the string length argument and the next integer argument?

Thank you all very much again for your patience :)

g_f_thomas · 01-28-2009

FWIW, the code in your first post doesn't raise an invalid floating point operation for me in VS2008, whatever the configuration. Which version of VS are you using? If it's 2003, get rid of it.

I haven't followed this discursive thread in its entirety, but in your latest post you indicate that the problem appears to be in the C client. Have you implemented an exception handler on the C side and walked the stack to determine where in the C code the invalid floating point operation occurs? If, as you say, no floating point work is going on in C, then have your handler mask invalid FP operations and see if it still crashes.
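A portable C++ sketch of checking the invalid flag on the C++ side, in the spirit of the floating_status_check subroutine above, using the standard <cfenv> functions fetestexcept and feclearexcept. The helper name invalid_flag_raised is hypothetical, and this only reads the sticky status flag; it does not unmask the exception or walk the stack.

```cpp
#include <cfenv>

// Return true if the sticky "invalid operation" flag has been raised since
// it was last cleared, then clear it so the next check starts fresh.
bool invalid_flag_raised() {
    const bool raised = std::fetestexcept(FE_INVALID) != 0;
    std::feclearexcept(FE_INVALID);
    return raised;
}
```

Because the helper clears the flag after reading it, calling it just before fortranCall(...) and again just after separates flags raised earlier on the C++ side from flags raised inside the Fortran code.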

Gerry

gib · 01-28-2009

I hope you got rid of REAL(2). Note that, as Steve said, there is no REAL(2) in Intel Fortran.

ylmz · 01-28-2009

Hi;

Yes, actually there was no REAL(2) usage in our code. I was just trying to give a quick type conversion example in my question, and I mistakenly wrote REAL(2) :)

We have been dealing with this problem for two weeks, but today we realized that it has nothing to do with the Fortran part. We found an uninitialized float array in the C++ part, and now that we initialize it, there no longer seem to be any problems, at least for now :)

We were getting this error because the memory allocated for the array held garbage values (and therefore garbage real values such as -1.#NAN, etc.). And of course it is quite normal, purely by coincidence, not to get it in the debug version (I guess that location ends up in another region of memory in the debug version, which by chance holds valid real values).

Thanks again to all of you for your invaluable comments and suggestions...
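A minimal C++ sketch of two defenses suggested by this fix, assuming the array is a plain float buffer handed to Fortran (the names make_input_array and all_finite are hypothetical): value-initialize the storage so it never holds leftover bytes, and, while debugging, scan inputs for NaN or infinity before the call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Value-initialized storage: every element starts at 0.0f rather than
// whatever bytes happened to be in that memory.
std::vector<float> make_input_array(std::size_t n) {
    return std::vector<float>(n);  // elements are zero-initialized
}

// Debugging aid: true only if every element is a finite number.
// Leftover garbage bytes can decode as NaN or infinity.
bool all_finite(const float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(data[i])) return false;
    }
    return true;
}
```

A check such as all_finite(v.data(), v.size()) right before passing v.data() to the Fortran routine would have pointed at the uninitialized array directly.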