Possible math function AMD v INTEL chipset issues?

bmchenry · ‎03-02-2011

I have a project which is two IVF DLL's which are used/called from a 3rd party C++ win32 program.

Works great on INTEL chips. No problems.

However, on an AMD, it exhibits 'bizarre' behavior...

It is a math intensive program. The DLL is mainly for storing/swapping data between C++ and the primary IVF DLL.

Again, everything works as planned (works GREAT) on Intel chips.

I'm using /arch:IA32 and no optimization (options below)

The only possible issue was the 'storage' DLL (the 2ndary DLL) did have optimization set and no /arch option set (so it would use the default). The 'storage' DLL does no calculations.

Could different compilation options for the two IVF DLL's cause an issue ONLY with AMD processors?

I won't know how it behaves on the AMD with all compile/link options synchronized until the programmer on the other side of the world links them up tomorrow (since it's after midnight there now).

But in the meantime and as a check to see if anything else I should do, can anyone enlighten me to possible issues?

Thanks!

brian

Here's the compilation options:

/nologo /Od /I"c:\wint\include" /arch:IA32 /extend_source:132 /warn:interfaces /Qsave /fpe:0 /Qftz /Qfp-speculation=strict /module:"Release\\" /object:"Release\\" /Fd"Release\vc90.pdb" /check:bounds /libs:dll /threads /winapp /4Yportlib /c

TimP · ‎03-02-2011

For ifort "12.0" "xe 2011" you should try
/Qimf-arch-consistency:true
as the default is to choose run-time math libraries according to the CPU seen at run time.
This links to a different math library which should use identical code on all CPUs.
I would be interested to know if this improves your situation.
I don't know how /fpe:0 might affect the issue.

If your application depends on /arch:IA32 (other than to be able to run on very old CPUs) or on /Qsave or /Od, it would be preferable to correct that. If you remove /Od, /fp:source would improve safety. However, those questions should not enter into CPU type dependencies.

bmchenry · ‎03-02-2011

Thanks Tim.
will try /Qimf-arch-consistency:true
Think the /arch:IA32 and some of those others were holdovers from the SSE issues back in the day
thanks for the heads up

brian

bmchenry · ‎03-02-2011

Tim
where do i find information on /Qimf-arch-consistency:true?
searches for Qimf or imf or arch or consistency or run-time math libraries...etc
can't find the option (i know i can set it in additional options, but would like the write-up on it).

Thanks

Steven_L_Intel1 · ‎03-02-2011

All options are documented in the "Compiler Options" 'book' in the on-disk documentation. This is a new option for version 12. Not all options are exposed in the IDE - this one is not.

bmchenry · ‎03-02-2011

Steve,

AHA! I had a shortcut on the desktop to fortran manuals and it was still set to 11.1
will remedy that situation!

Thanks

Brian

IDZ_A_Intel · ‎03-03-2011

Tim,

Well, one interesting thing about /Qimf-arch-consistency:true is that it causes the same type of 'strange' behavior as exhibited on AMD processors.

turned on/off most of the options (/fpe:0, /arch:IA32, /Qsave)
w/ and w/o the /Qimf-arch-consistency:true
and the only way things work 'normal' on INTEL is when i do not include /Qimf-arch-consistency:true

Any ideas what that option changes that might alter the behavior of the executable program on an Intel i7 CPU, Q720 @ 1.60 GHz, 8 GB memory, 64-bit operating system?
IVF is the (but installed and compiling only the win32 version of IVF)
Where might i dig for the possible issue which seems to be produced when /Qimf-arch-consistency:true is set?

Thanks

brian

Steven_L_Intel1 · ‎03-03-2011

Please define "strange" and "bizarre" in this context. What exactly is going wrong?

That setting /Qimf-arch-consistency:true gives you the same results on both systems indicates that your program is sensitive to things such as additional vectorization.

bmchenry · ‎03-03-2011

if i had to characterize what is happening, it is that there is a mismatch in some shared memory.
I added some arrays which were equivalenced to allow simple exchange of large amounts of information between C++ and fortran (rather than 30 different modules, a single module can be exchanged)
This was the only recent change made to some of the modules.
And so on INTEL machines it works fine.
However, on AMD machines and/or on INTEL machines with the /Qimf-arch-consistency:true option, the program exhibits the 'bizarre' behavior.
If i had to guess i'd say it is a 'memory leak' which causes the simulation to simulate 'wild' or 'bizarre' movements like it is occurring on another planet :-))
so just wondering what could be changed by that option which might help me find where the memory mismatch is (and/or whatever is happening and/or why it isn't catching the mismatch? and why it doesn't happen until well into a sample simulation)
i would expect a program crash or memory error problem or something but it goes merrily along with something not set quite correctly.

that's about the only way to characterize 'bizarre'

thanks for your assistance.

brian

Steven_L_Intel1 · ‎03-03-2011

It sounds more to me as if you have a reference to uninitialized memory and that anything that perturbs the memory layout changes the results.

bmchenry · ‎03-09-2011

Y'all ain't gonna believe this one... RESOLVED!

Well at least i've found where the issue presented itself while using /Qimf-arch-consistency:true.
Still have to wait until tests are run on an AMD processor, but i expect things will work as expected.
Or at least now results are consistent whether using /Qimf-arch-consistency:true or not.

Here was where the issue presented itself:
Two simple calculations: (out of thousands of variables, routines, etc...believe me i looked through many many many of them...)
REAL(8) :: SPSI1, CPSI1
REAL :: PSI1

THE simple calculation which produced the issues:
SPSI1 = SIN(PSI1)
CPSI1 = COS(PSI1)
For some reason, with the option /Qimf-arch-consistency:true set, the elemental intrinsic functions SIN() and COS() start producing incorrect results during the simulation
Solution? Use

SPSI1 = SIN(dble(PSI1))
CPSI1 = COS(dble(PSI1))
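
For readers following along, here is a minimal sketch of what the two forms do in terms of kinds. It assumes only standard Fortran behavior: SIN is elemental and returns a result of the same kind as its argument, so SIN(PSI1) is evaluated in single precision and widened on assignment, while SIN(dble(PSI1)) is evaluated in double precision. The program name, the variable names other than PSI1, and the angle value are hypothetical, chosen for illustration. The sketch does not reproduce the /Qimf-arch-consistency:true behavior; it only makes the two evaluation paths visible.

```fortran
program sin_kind_sketch
  implicit none
  real(8) :: spsi1_single, spsi1_double   ! hypothetical names for the two results
  real    :: psi1

  psi1 = 2.5   ! hypothetical angle in radians, for illustration only

  ! SIN returns the kind of its argument: this is a single-precision
  ! evaluation whose result is converted to REAL(8) on assignment.
  spsi1_single = sin(psi1)

  ! Promoting the argument first selects the double-precision evaluation.
  spsi1_double = sin(dble(psi1))

  print *, 'kind(sin(psi1))       =', kind(sin(psi1))
  print *, 'kind(sin(dble(psi1))) =', kind(sin(dble(psi1)))
  print *, 'single evaluation     =', spsi1_single
  print *, 'double evaluation     =', spsi1_double
  print *, 'difference            =', spsi1_single - spsi1_double
end program sin_kind_sketch
```

With a well-behaved math library the difference should stay around single-precision rounding; a much larger value would point at the single-precision routine rather than at the assignment.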
The program is being called by a C++ program and all storage is kept by the C++ calling program through pointers/targets.
So i guess i could speculate that the memory gets messed up in resolving the single to double storage somewhere in the various exchanges of information between the two programs (believe me, there are thousands of other variables and equations, this was buried deep in the program)
Inquiring minds would love to hear speculation on why? (it sure wasted my time!)
And to repeat,
1) the program, as a stand alone program can be compiled with any of the options and produces consistent results.
2) The program when setup as a DLL and called from a C++ program runs fine on INTEL machines.
3) However, the program had issues when run on AMD machines.
4) Then through the suggestion of Tim (thank you), I tried the /Qimf-arch-consistency:true option and it also started acting up on INTEL machines.
the problem was isolated to those two simple equations out of thousands and was after lots of head scratching and testing and blah blah blah...
Benefit of the option is that it presents the issues that may arise on other non INTEL machines.
So anyone have a speculation on why?

meanwhile...guess it's time for me to open the bar...

TimP · ‎03-09-2011

Short of a comment from an expert on the math library implementation, I'd comment that sin() and cos() of arguments outside the primary quadrants involve a range reduction, where your application might well require double precision evaluation. In the old IA32 32-bit mode, for scalar math functions, you might expect that to happen automatically. Range reduction in single precision svml (in case the compiler vectorizes your math functions) is likely to be less accurate.
Otherwise, if you need results consistent with the usual C double data type math functions, your explicit promotion to double precision looks like a useful precaution.
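
One way to probe the range-reduction explanation is to compare the single-precision and double-precision SIN of the same stored angle as the angle moves further outside the primary quadrants. The sketch below is a hypothetical test harness, not code from the application: the program name, the loop bounds, and the factor of 10 per step are assumptions made for illustration. Building it with and without /Qimf-arch-consistency:true would show whether the single-precision routine chosen by that option behaves differently on a given machine.

```fortran
program range_reduction_sketch
  implicit none
  integer :: i
  real    :: psi1
  real(8) :: diff, max_diff

  max_diff = 0.0d0
  psi1 = 1.0
  do i = 1, 8
     ! Difference between the single-precision SIN (widened afterwards)
     ! and the double-precision SIN of the same stored single-precision angle.
     diff = abs(dble(sin(psi1)) - sin(dble(psi1)))
     max_diff = max(max_diff, diff)
     print '(a, es12.4, a, es12.4)', ' psi1 =', psi1, '   |single - double| =', diff
     psi1 = psi1 * 10.0   ! step well outside the primary quadrants
  end do
  print '(a, es12.4)', ' largest difference seen: ', max_diff
end program range_reduction_sketch
```

If the single-precision range reduction is accurate, the differences should remain near single-precision rounding at every step; differences that grow with the angle would be consistent with the range-reduction explanation above.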

bmchenry · ‎03-09-2011

Thanks for your response. On INTEL chips it runs the same double v single precision (wasted time trying that in years gone by). It's a function of large v small numbers which wash out most needs for extra precision (within reason).
I'd venture to guess that the 'consistent across architectures' /Qimf-arch-consistency:true option somehow causes the program to incorrectly store the single precision result in the double precision variable, OR, as you guess, it is the range reduction, OR some memory storage issue between C++ and Intel Fortran.

Since it didn't present itself in earnest until the angles ran outside the primary quadrants, the consistent-across-architecture range reduction might be the primary area of issue.

Thanks again for your assistance.