AMD problems - still lingering

lklawrie · 04-22-2007

Two compile settings:

Release:
/nologo /assume:buffered_io /gen-interfaces /noaltparam /fpscomp:nolibs /warn:declarations /warn:truncated_source /warn:interfaces /real_size:64 /Qfp_port /Qftz /names:lowercase /module:"$(INTDIR)/" /object:"$(INTDIR)/" /traceback /RTCu /4Yportlib /c

Link:
/OUT:"Release/EnergyPlus.exe" /INCREMENTAL:NO /NOLOGO /SUBSYSTEM:CONSOLE /OPT:REF /OPT:NOWIN98

Release with checks:
/nologo /assume:buffered_io /gen-interfaces /noaltparam /fpscomp:nolibs /warn:declarations /warn:unused /warn:truncated_source /warn:interfaces /real_size:64 /Qfp_port /Qftz /names:lowercase /module:"$(INTDIR)/" /object:"$(INTDIR)/" /traceback /check:bounds /RTCu /4Yportlib /c

Link:
/OUT:"ReleaseWithRunTimeChecks/EnergyPlusChk.exe" /INCREMENTAL:NO /NOLOGO /SUBSYSTEM:CONSOLE /OPT:REF /OPT:NOWIN98

The first one exhibits the problems. The second doesn't.

1) Shows a NaN in one of the runs and terminates with a fatal error.

2) Shows a warning message that one number is less than another (not a big problem), but the numbers are (when printed out):

volume= 0.825000000000000 volume stability= 0.825000000000000

(The same digits when printed, yet the first tests less than the second.)
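Two doubles can print identically to 15 decimal places and still be different numbers. The sketch below uses 0.825 and the next representable double above it. These are illustrative values only; the real EnergyPlus values are unknown. Printing with 17 significant digits always separates two distinct doubles, so that is one way to see what the comparison is actually comparing:

```python
import math

# 0.825 as stored in binary64, and the next representable double above it (Python 3.9+).
a = 0.825
b = math.nextafter(a, 1.0)

# Fifteen decimal places, as in the printed warning, cannot tell them apart...
print(f"{a:.15f} {b:.15f}")
# ...but 17 significant digits always distinguish two different doubles.
print(f"{a:.17g} {b:.17g}")
print(a < b)
```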

As I say, the second compile does not show either problem (same source).

Where can I look next?

Linda

lklawrie · 04-22-2007

Half an answer to my own question:

Removing the CheckBounds (array and string) from the second compile makes it exhibit the same behavior as the first.

Is this a bug? It only happens on AMD processors (or at least on this AMD processor; I can't test the others right now).

Linda

TimP · 04-22-2007

I guess you've demonstrated a forum bug. Your first post doesn't display correctly in my Intel-configured copy of IE6, so I have to open Firefox to read it. Then, of course, I can't reply in FF and must post the reply with IE.

As to your program, it's possible that bounds checking changes the code generation and hides a problem (either in the compiler or in your source code). Did you also check for uninitialized variables?

Among the things which may be important to know:

Which compiler is it (including whether it is the 32- or 64-bit compiler)?

If you are using the 32-bit compiler, setting /Qftz without setting an SSE option (/QxK, /QxW) may not be a good idea. I've been bitten by that. I would guess that is your situation, since you set /Qfp_port, which isn't needed with SSE code.
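/Qftz asks the processor to flush subnormal (denormal) results to zero instead of using IEEE gradual underflow. On x86 this is a mode bit of the SSE unit (MXCSR). As far as I know, the x87 unit has no equivalent, so the setting can only affect instructions that run on the SSE unit. The Python below is only a software emulation of what the mode does to a single value. The helper name flush_to_zero is made up for this sketch:

```python
import math
import sys


def flush_to_zero(x: float) -> float:
    """Hypothetical software emulation of flush-to-zero for one value.

    Real FTZ is a hardware mode; this only mimics its effect: a subnormal
    value becomes a zero of the same sign, everything else is unchanged.
    """
    # sys.float_info.min is the smallest *normal* positive double.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


tiny = sys.float_info.min / 4  # subnormal: still nonzero under IEEE gradual underflow
print(tiny > 0.0)
print(flush_to_zero(tiny))
```

If the code that relies on this is split between SSE and x87 instructions, some subnormal results would be flushed and others would not.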

If you want to avoid implicit extra precision (temporary double-precision evaluation of single-precision expressions), which produces inequalities between expressions that look as if they should be identical, you might want SSE anyway.
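The effect can be reproduced one precision level down. The sketch adds two single-precision values twice. The first time, the result is rounded back to single precision. The second time, the result is kept in double precision, the way an extended register would hold a temporary. Python floats are doubles, so a round trip through the struct "f" format does the rounding to single precision. The helper to_single is made up for this sketch. With double-precision code (/real_size:64), extended-precision x87 temporaries can play the same role:

```python
import struct


def to_single(x: float) -> float:
    """Hypothetical helper: round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = to_single(0.1)
b = to_single(0.2)

# Strict evaluation: the sum is rounded back to single precision.
single_sum = to_single(a + b)
# "Extra precision" evaluation: the same sum kept in the wider format.
wide_sum = a + b

print(single_sum == wide_sum)
```

The two results differ, so a comparison involving one form or the other can go either way depending on where the compiler keeps the temporary.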

If you are using ifort 9.1, did you try -fp:precise to see if it helps with the numerical problems? With older compilers, you might need to compare results with -Op.

lklawrie · 04-22-2007

Check uninitialized is on in both settings. Here are the settings again, wrapped; maybe you can see them all now.

Release:
/nologo /assume:buffered_io /gen-interfaces /noaltparam /fpscomp:nolibs /warn:declarations /warn:truncated_source
/warn:interfaces /real_size:64 /Qfp_port /Qftz /names:lowercase /module:"$(INTDIR)/" /object:"$(INTDIR)/" /traceback /RTCu
/4Yportlib /c

V9.1-37. Same thing happens in V10. 17.

It is the 32-bit compiler. As I remember (from a previous thread), /QxK or /QxW caused the AMD processor to go belly up. (Actually it was a "use" FP extensions setting, not a "require" one, if I'm reading the Visual Studio definitions correctly.)

The compiler should be generating full double precision (/real_size:64), not just a temporary expansion.

fp:precise?

Linda

TimP · 04-22-2007

AMD Opteron CPUs should work with -QxW. 32-bit Athlon platforms from some time before that should work with -QxK.

-fp:precise should be described in the docs for 9.1. It removes some "aggressive" optimizations, and some which don't comply with IEEE standards. If you aren't going to use SSE anyway, the old /Op option may not be much different.
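As I understand it, one of the "aggressive" optimizations that -fp:precise gives up is regrouping floating-point operations as if they were real-number arithmetic. Floating-point addition is not associative, so a regrouped sum can differ in the last bit. The values below are illustrative and have no connection to the EnergyPlus source:

```python
x, y, z = 0.1, 0.2, 0.3

left = (x + y) + z   # grouping as written
right = x + (y + z)  # grouping a reassociating optimizer might choose

print(left, right)
print(left == right)
```

A last-bit difference like this is enough to flip a "less than" test between two quantities that should be equal.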

lklawrie · 04-23-2007

/fp:strict seems to be helping. Is it not available in the Visual Studio integration for 9.1?

(I did try it with the 10 beta: the program did not work as expected on the AMD processor, but I will play around a bit more with the compiler options there before I submit a problem report.)

I will try the /Q options later. The two AMD processors I have immediate access to are both K6, I believe, so pretty old.

Thanks for your help.

Linda

Steven_L_Intel1 · 04-23-2007

/fp is not in the VS integration - you have to add it under Command Line.

Are you able to determine where in your program the results start to diverge? Regardless of what you set for the /Qx or /Qax options, the math library does its own automatic CPU dispatching, and results may vary slightly depending on which instructions are used.

Before you submit the issue, please try to identify the point of divergence.
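One way to find the point of divergence is to log the same intermediate quantities from both builds and compare them offline. The sketch below measures differences in ULPs (units in the last place) and reports the first logged step where the runs disagree. A NaN in either run also counts as a disagreement. All names (ordered_bits, ulp_distance, first_divergence) and the sample traces are hypothetical:

```python
import math
import struct
from typing import Optional, Sequence


def ordered_bits(x: float) -> int:
    """Map a double to an integer so that neighbouring doubles map to neighbouring integers."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # For negative doubles, negate the magnitude bits so the mapping stays monotonic;
    # this also maps -0.0 and +0.0 to the same integer, 0.
    return bits if bits >= 0 else -(bits & 0x7FFF_FFFF_FFFF_FFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable-double steps between a and b (0 if equal)."""
    return abs(ordered_bits(a) - ordered_bits(b))


def first_divergence(run_a: Sequence[float], run_b: Sequence[float],
                     max_ulps: int = 0) -> Optional[int]:
    """Index of the first logged value where two runs disagree, or None if they agree.

    A NaN in either run, or a difference larger than max_ulps, counts as divergence.
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if math.isnan(a) or math.isnan(b) or ulp_distance(a, b) > max_ulps:
            return i
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))  # one run stopped early
    return None


# Hypothetical per-timestep values of one variable, logged from the two builds.
release = [1.0, 0.825, 2.0]
checked = [1.0, math.nextafter(0.825, 1.0), 2.0]
print(first_divergence(release, checked))
print(first_divergence(release, checked, max_ulps=1))
```

Allowing a few ULPs of slack separates real divergence from the small variations that math-library dispatching can introduce.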

lklawrie · 06-11-2007

With version 10 of the compiler, I started with my old 9.1 options and tried the AMD testing again. Basically, adding the "improve consistency" option seems to have worked, but it costs about 25% in execution time. At least it will work for now.

Linda

TimP · 06-11-2007

Ifort 10.0 has a new option, -assume:protect_parens, which observes parentheses correctly without removing other optimizations, such as short vector math calls. From the other direction, you could set -fp:precise and then try restoring some of the more aggressive optimizations with -Qno-prec-div -Qno-prec-sqrt -Qftz.
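As I understand the ifort documentation, -Qno-prec-div lets the compiler replace a division x/y with a multiplication by the reciprocal, x*(1/y). This is faster but can differ in the last bit. The hypothetical mismatches helper below uses plain double-precision Python arithmetic to search small integer pairs for cases where the two forms disagree:

```python
def mismatches(limit: int = 100):
    """Hypothetical helper: pairs (i, j) where i / j differs from i * (1 / j) in binary64."""
    return [(i, j)
            for i in range(1, limit + 1)
            for j in range(1, limit + 1)
            if i / j != i * (1.0 / j)]


found = mismatches()
print(len(found))
print(49 / 49, 49 * (1.0 / 49))
```

Even 49/49 is affected: the true division gives exactly 1.0, while the reciprocal form gives 0.9999999999999999. That is why re-enabling these options one at a time is a reasonable way to see which one disturbs the results.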

Nick2 · 06-11-2007

Are there any plans at this time to have the compiler generate all (and only) SSE2 code (no x87) for 32-bit compilers? (None of the general population really uses 64-bit operating systems...) That would help out a LOT! I feel uncomfortable telling people they will get different results on their Athlons versus Pentium 4s, whereas CVF doesn't have that problem.

lklawrie · 06-12-2007

Remove "improve consistency" and add protect_parens. Again, one has to add protect_parens on the command line; it isn't reachable from the IDE?

Is that correct?

Linda