Solved: IVF 11.1.051 and 12.0.0.104, crash on AMD processors in 32-bit

teosim · ‎11-26-2010

Hello,

The sample code below was compiled with IVF v11.1.051 (32-bit) on a PC with an Intel processor.

REAL :: A, B, C
A = 2.0E-15
B = 3.0E-24
C = 1.0
C = C + A*B
print*, C

The explicit compiler flag combination is: /fpe:0 /fp:source /opt:1. The expected result, 1.0, is printed on any Intel processor.

This code just crashes on AMD processors (Athlon or Phenom) with "Program Exception : code 0x2b4 (269)".

It looks like IVF has a problem with processing denormals on AMD processors.
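
For reference, the product in the sample is A*B = 6.0E-39, which is below the smallest normalized single-precision value (about 1.18E-38), so it can only be held as a denormal or flushed to zero. A minimal sketch to check this, assuming a compiler that accepts the VOLATILE attribute (used here only to discourage compile-time evaluation); the program name `show_denormal` is made up for illustration, and the printed product depends on whether flush-to-zero is in effect:

```fortran
program show_denormal
  implicit none
  ! VOLATILE is an assumption of this sketch: it keeps the multiply at run time.
  real, volatile :: a, b
  real :: p

  a = 2.0E-15
  b = 3.0E-24
  p = a*b                 ! mathematically 6.0E-39

  ! TINY returns the smallest positive normalized value of the kind.
  print *, 'tiny(1.0)              =', tiny(1.0)
  print *, 'a*b                    =', p
  ! True whether p is kept as a denormal or has been flushed to zero.
  print *, 'below smallest normal? ', (abs(p) < tiny(1.0))
end program show_denormal
```

Built without /fpe:0, this should run on either vendor's processor; it is only meant to show why the multiply in the sample underflows.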

There are workarounds: /fp:precise, /fpe:3, or /Qftz-. At least one of them is almost acceptable.
What is not acceptable, in my case, is building processor-dependent executables.

Is this a bug, or is it well-known and expected behavior?
Thank you.

Best regards.
Teo

Steven_L_Intel1 · ‎11-30-2010

This is the sort of bug we would consider high priority...


lklawrie · ‎11-26-2010

Try compiling with /arch:IA32

Some older AMD processors can't handle the SSE instruction set (or SSE2/SSE3).

Linda

mecej4 · ‎11-28-2010

I can reproduce the bug on Windows XP-64, SP2 (Microsoft Windows [Version 5.2.3790]), using the 32-bit version of the recently released 12.0.0.104 compiler, running on an HP Pavilion a1540n with an Athlon 64 X2 CPU and 6 GB of RAM.

On the same machine, there is no corresponding bug with the 64-bit version of the compiler.

The bug is in an exception handler routine in LIBIFCOREMD.DLL. Here is the source code:

[bash]program crash_amd
REAL :: A, B, C
A = 2.0E-15
B = 3.0E-24
C = 1.0
C = C + A*B
print*, C
end program crash_amd
[/bash]

Compiling with the 32-bit version 12 compiler and the options -fpe -traceback -Zi -MD produces the following machine code for line 6 of the source above:

[bash]C = C + A*B
00401055 F3 0F 10 45 F0       movss       xmm0,dword ptr [ebp-10h]
0040105A F3 0F 10 4D F4       movss       xmm1,dword ptr [ebp-0Ch]
0040105F F3 0F 59 C1          mulss       xmm0,xmm1            <=== EXCEPTION TAKEN HERE
00401063 F3 0F 10 4D F8       movss       xmm1,dword ptr [ebp-8]
00401068 F3 0F 58 C8          addss       xmm1,xmm0
0040106C F3 0F 11 4D F8       movss       dword ptr [ebp-8],xmm1
[/bash]

and this call stack at crash time:

[bash]AMD_CRASH(void):
......
10029F98 8B 50 28             mov         edx,dword ptr [eax+28h]  
10029F9B 0F B7 12             movzx       edx,word ptr [edx]      <== CRASH, edx=0  
10029F9E 0F B6 C2             movzx       eax,dl  
......

>	libifcoremd.dll!10029f9b() 	
 	[Frames below may be incorrect and/or missing, no symbols loaded for libifcoremd.dll]	
 	crash_amd.exe!_for__nt_handler_jacket.()  + 0x30 bytes	
 	msvcr90.dll!785a9152() 	
 	msvcr90.dll!785a268b() 	
 	ntdll.dll!7d61ca3b() 	
 	ntdll.dll!7d61cb2b() 	
 	kernel32.dll!7d4dcd0c() 	
 	msvcr90.dll!78596cb8() 	
 	msvcr90.dll!78542f8c() 	
 	msvcr90.dll!78543161() 	
 	msvcr90.dll!785427b4() 	
 	crash_amd.exe!pre_cpp_init()  Line 326 + 0x27 bytes	C
 	msvcr90.dll!78542201() 	
 	crash_amd.exe!__tmainCRTStartup()  Line 582 + 0x17 bytes	C
 	kernel32.dll!7d4e7d42() 	
[/bash]

The crash occurs when a NULL pointer is dereferenced at address 0x10029F9B. To add to the confusion, the error notification box that pops up is completely misleading.

If the -MT option is used instead of the -MD option, the error still occurs, but now the AMD_CRASH() routine code is in the .EXE file itself and the traceback is shorter. However, the immediate cause is the same: trying to access memory through a NULL pointer in EDX.

Steven_L_Intel1 · ‎11-28-2010

First of all, there is no such routine "AMD_CRASH" in the Intel libraries. Even if there were, the DLLs have no symbols so the debugger could not symbolize it. I suspect that this is the name you gave your program.

There is a known problem with the way the Fortran library handles floating point exceptions that occur during execution of SSE instructions. It is not sensitive to CPU manufacturer, but is sensitive to other things such as memory content. I believe this is fixed for update 1.

If you have set /fpe:0, then the underflow needs to get set to zero by the exception handler. This may or may not work right due to the bug we found. We should be releasing update 1 within a few weeks and I suggest you try it then. If it still fails, let us know and we'll be glad to investigate.

mecej4 · ‎11-29-2010

"I suspect that this is the name you gave your program." -- guilty as charged. I was using the VS10 debugger for the first time and misread the way it reports debuggee information.

I find it intriguing that handling a 'denormal result' exception caused by the multiplication of two registers can be dependent on the contents of memory. Are there previous threads on this question that I may read?

Martyn_C_Intel · ‎11-29-2010

I'm answering because Steve is temporarily unavailable; he may know more.

I presume you are using /fpe:0 because you wish to unmask the common floating point exceptions. For historical reasons, /fpe:0 also enables abrupt underflow (also known as flush-to-zero), whereby denormals get set to zero. For some instructions, this happens in hardware; for others, in software. It's in this area that there appears to be a bug. I've seen a somewhat similar bug in the C compiler/libraries; I'm not sure if that's what Steve is thinking of. However, I have tested the forthcoming 12.0 compiler update, and I still see the problem, so we'll need to escalate this to the developers for further investigation.

If all you need is to unmask exceptions, and you aren't concerned about the treatment of denormals, the simplest workaround is /Qftz-. Then all underflows are gradual, there's no flush-to-zero and you get IEEE arithmetic for denormals. If you intend something else, please explain, and we may be able to suggest other workarounds using RTL calls instead of command line switches.
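
As an illustration of controlling underflow from source rather than from the command line, here is a minimal sketch using the Fortran 2003 IEEE_ARITHMETIC module. It assumes the compiler and processor support underflow control, which the code checks before acting. The program name `gradual_underflow` is hypothetical, and whether this avoids the crash discussed in this thread has not been verified:

```fortran
program gradual_underflow
  use, intrinsic :: ieee_arithmetic
  implicit none
  ! VOLATILE is an assumption of this sketch: it keeps the arithmetic at run time.
  real, volatile :: a, b
  real :: c
  logical :: gradual

  ! Only touch the underflow mode if it can be controlled for default REAL.
  if (ieee_support_underflow_control(1.0)) then
     call ieee_set_underflow_mode(.true.)      ! .true. requests gradual underflow
     call ieee_get_underflow_mode(gradual)
     print *, 'gradual underflow in effect: ', gradual
  else
     print *, 'underflow control is not supported for default REAL'
  end if

  a = 2.0E-15
  b = 3.0E-24
  c = 1.0
  c = c + a*b        ! a*b underflows; the sum still rounds to 1.0
  print *, c
end program gradual_underflow
```

The intent is the same as /Qftz-: keep denormal results instead of flushing them to zero.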

Steven_L_Intel1 · ‎11-29-2010

The issue I know of deals with how the run-time library handles an unmasked FP exception. It turned out that if it was an SSE instruction, it did not do so correctly. I am not intimately familiar with the details, but it had something to do with not looking in the right place for exception information. I don't think this came from a forum issue. Unfortunately, I am traveling this week and don't have all the access to information I would have back at the office. I have sent Martyn some pointers that perhaps he can follow.

mecej4 · ‎11-30-2010

Steve and Martyn:

Thanks for your replies.

This is not a pressing question. It is seldom that I use the troublesome option, and there are other means for locating underflow "errors". Now that Intel has become aware of the bug, I am content to wait until it is resolved in future issues of the compilers (C and Fortran).

I do not know if the OP agrees with my assessment of the importance of fixing the bug.

Steven_L_Intel1 · ‎11-30-2010

This is the sort of bug we would consider high priority...

teosim · ‎12-02-2010

Your position is much appreciated.
Thank you all.

Teo

lklawrie · ‎12-02-2010

Maybe this should be a new topic. I use /fpe:0 in my compiler settings because I don't want NaNs to show up in output or exceptions that should have been caught early to show up later. That is, I've felt the other settings let invalids flow through only to show up where it's harder to figure out the real cause.

So, I'm wanting the abort on IEEE exceptions more than the underflow to abrupt 0.0 -- Is there another way to achieve that?

Linda

Martyn_C_Intel · ‎12-02-2010

Quoting lklawrie

Maybe this should be a new topic. I use /fpe:0 in my compiler settings because I don't want NaNs to show up in output or exceptions that should have been caught early to show up later. That is, I've felt the other settings let invalids flow through only to show up where it's harder to figure out the real cause.

So, I'm wanting the abort on IEEE exceptions more than the underflow to abrupt 0.0 -- Is there another way to achieve that?

Linda

Yes, you can unmask floating-point exceptions from your source code. There are older runtime library routines such as SETCONTROLFPQQ and IEEE_FLAGS; however, the best way is to use the new intrinsics introduced in the Fortran 2003 standard, such as IEEE_SET_HALTING_MODE. A typical call at the start of your main program would be:
call ieee_set_halting_mode(ieee_usual, .true.)
This gives you more detailed control than /fpe:0, though that should also work (in conjunction with /Qftz- if you are running on a non-Intel system, until the above fix is made). I always compile my code with /traceback when I unmask exceptions; that way, I know immediately where the exception occurred, if it is in the compiled code. The overhead from /traceback is small; it's designed for use in production codes.
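
To put that call in context, here is a minimal sketch of a complete program. The module name IEEE_EXCEPTIONS and the constant IEEE_USUAL (overflow, divide-by-zero and invalid) come from the Fortran 2003 standard; the program name `halt_on_exceptions` and the deliberate 0.0/0.0 are illustrative assumptions, not taken from the sample programs mentioned below:

```fortran
program halt_on_exceptions
  use, intrinsic :: ieee_exceptions
  implicit none
  ! VOLATILE is an assumption of this sketch: it keeps the division at run time.
  real, volatile :: x
  real :: y

  ! Request halting only where the processor supports it.
  ! IEEE_USUAL does not include underflow, so denormal results do not halt.
  if (all(ieee_support_halting(ieee_usual))) then
     call ieee_set_halting_mode(ieee_usual, .true.)
  end if

  x = 0.0
  y = x / x          ! invalid operation; expected to halt here when halting is on
  print *, 'not reached if halting is enabled: ', y
end program halt_on_exceptions
```

Compiling with /traceback, as described above, should then report the source line of the division when the program stops.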

I have some little sample programs that illustrate the use of the Fortran 2003 intrinsics for IEEE arithmetic. I'll try to get them reviewed and approved for posting.

lklawrie · ‎12-02-2010

Thanks! Yes, we use /traceback too even in our release mode.

Linda

Steven_L_Intel1 · ‎12-23-2010

We have found and fixed this problem and I expect the fix to be included in Update 2 to Fortran Composer XE 2011.
