Intel oneAPI and Parallel Studio compilers do not produce assembly files unless requested, so any directives you see in the assembly listings (and everything else in them, for that matter) are for information only. It would be more useful to record the two sets of compiler options that you used with the old and new compilers.
Older versions of the compiler accepted the /arch:IA32 option and generated x87 FPU instructions. The current oneAPI (correction: ifx) compiler does not generate x87 instructions and rejects the /arch:IA32 option. Since the x87 and SSE floating-point registers and instructions are quite different, it is not surprising that the x87 and SSE versions of your program gave somewhat different results.
You may confirm my speculation by producing disassembly listings (dumpbin /disasm xyz.obj) from the new and old OBJ files and comparing them.
I think ifort still supports /arch:IA32. I disassembled both binaries and I can see x87 instructions and ST registers in both. The same compiler options were used in both builds AFAIK, but I'll pull the actual options out of the build log in a second.
The actual ifort options used in oneAPI 2023.1:
/nologo
/O2
/arch:IA32
/Qdiag-error-limit:50
/warn:all
/Qsave
/Qinit:zero
/libs:static
/threads
/c
/Qm32
The same for IPSXE 2018:
/nologo
/O2
/arch:IA32
/Qdiag-error-limit:50
/warn:all
/Qsave
/libs:static
/threads
/c
/Qm32
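Comparing two option lists by eye is error-prone. A minimal Python sketch (the list and function names are hypothetical; the option strings are copied from the two logs above) reports the options present in one build but not the other. For these logs the only difference is /Qinit:zero, which appears only in the oneAPI 2023.1 build; this shows only that the option sets differ, not that /Qinit:zero causes the numerical differences.

```python
# Option strings copied from the two build logs; variable names are illustrative.
oneapi_2023_1 = [
    "/nologo", "/O2", "/arch:IA32", "/Qdiag-error-limit:50", "/warn:all",
    "/Qsave", "/Qinit:zero", "/libs:static", "/threads", "/c", "/Qm32",
]
ipsxe_2018 = [
    "/nologo", "/O2", "/arch:IA32", "/Qdiag-error-limit:50", "/warn:all",
    "/Qsave", "/libs:static", "/threads", "/c", "/Qm32",
]


def option_diff(new, old):
    """Return (only_in_new, only_in_old), keeping each list's original order."""
    old_set, new_set = set(old), set(new)
    only_new = [opt for opt in new if opt not in old_set]
    only_old = [opt for opt in old if opt not in new_set]
    return only_new, only_old


if __name__ == "__main__":
    added, removed = option_diff(oneapi_2023_1, ipsxe_2018)
    print("only in 2023.1:", added)
    print("only in 2018:  ", removed)
```

Set membership ignores option order; if a build ever passed conflicting options, their order would need to be checked separately.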
I took a short subroutine (9 lines, a single assignment statement in a DO loop) and compiled it with the options that you reported, using the current oneAPI compiler and the Parallel Studio 2013 SP1 compilers. Comparing the assembly files and the disassembled OBJ files did not show any differences that could explain the differing results. I did notice .486P versus .686P, but that should not matter.
Could you supply a short "reproducer", i.e., a minimal working example that can be used to establish that the two compiler versions produce different floating-point results at run time?
>> I did note the .486P versus the .686P, but that should not matter.
It will if SSE/AVX code is generated, as the FPU internally uses 80-bit floating point. When temporary intermediate results in an expression are held on the FPU stack, they are stored as 80-bit values (as opposed to 64 or 32 bits when held in an SSE/AVX register). This can result in a loss of precision in the SSE/AVX version relative to the x87 version.
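The effect can be modelled without an x87 at hand. The Python sketch below is a simplified model, not what either compiler actually emits: it assumes the x87 precision-control field is set to the full 64-bit significand (it can also be set to 53 or 24 bits), that intermediates stay in registers with no spills to memory, and it ignores exponent range, overflow and subnormals. `round_to_bits` and `a_plus_b_minus_a` are hypothetical helpers. The sketch evaluates `(a + b) - a` once with every intermediate rounded to 53 bits (SSE2 double) and once with intermediates kept at 64 bits (x87 extended), rounding only the final result to double.

```python
from fractions import Fraction

DOUBLE_BITS = 53   # significand bits of IEEE binary64 (SSE2 scalar double)
X87_EXT_BITS = 64  # significand bits of the x87 80-bit extended format


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even.

    Simplified model: the exponent range is unbounded, so there is no
    overflow, underflow or subnormal handling.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Choose e so that x / 2**e lies in [2**(p-1), 2**p).
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:
        e += 1
        scaled = x / Fraction(2) ** e
    while scaled < 2 ** (p - 1):
        e -= 1
        scaled = x / Fraction(2) ** e
    # Integer significand q plus the discarded remainder r / den.
    q, r = divmod(scaled.numerator, scaled.denominator)
    den = scaled.denominator
    if 2 * r > den or (2 * r == den and q % 2 == 1):
        q += 1  # more than half an ulp, or an exact tie with an odd q
    return sign * q * Fraction(2) ** e


def a_plus_b_minus_a(a: float, b: float, p: int) -> float:
    """Evaluate (a + b) - a with intermediates rounded to p bits,
    then round the final value to double, like a store to a REAL(8)."""
    fa, fb = Fraction(a), Fraction(b)
    t = round_to_bits(fa + fb, p)  # intermediate kept in a register
    r = round_to_bits(t - fa, p)
    return float(round_to_bits(r, DOUBLE_BITS))


if __name__ == "__main__":
    a, b = 2.0 ** 53, 1.0
    print("53-bit intermediates (SSE2-like):", a_plus_b_minus_a(a, b, DOUBLE_BITS))
    print("64-bit intermediates (x87-like): ", a_plus_b_minus_a(a, b, X87_EXT_BITS))
```

With a = 2**53 and b = 1, the 64-bit intermediate holds 2**53 + 1 exactly and the result is 1.0, while rounding that intermediate to 53 bits (ties to even) drops the 1 and the result is 0.0. Real x87 code can also depend on whether the compiler spills an intermediate to a 64-bit memory location, which is one way two x87 builds of the same source could differ.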
Also, optimization may change the sequence of operations, which can alter the rounding of the result.
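A minimal Python sketch of the reordering point, assuming a reduction loop that an optimizer splits into two partial sums. This is a hypothetical transformation chosen for illustration; which reorderings ifort actually applies depends on its floating-point model options. `serial_sum` and `two_accumulator_sum` are illustrative names, and Python floats are IEEE doubles, so this models 53-bit arithmetic only.

```python
import math


def serial_sum(xs):
    """Strict left-to-right summation, in source order."""
    total = 0.0
    for x in xs:
        total += x
    return total


def two_accumulator_sum(xs):
    """Sum even- and odd-indexed elements separately, then combine them:
    the same operands, added in a different order."""
    even = odd = 0.0
    for i, x in enumerate(xs):
        if i % 2 == 0:
            even += x
        else:
            odd += x
    return even + odd


values = [1e16, 1.0, -1e16, 1.0]

if __name__ == "__main__":
    print("serial:          ", serial_sum(values))           # 1.0
    print("two accumulators:", two_accumulator_sum(values))  # 2.0
    print("exact (fsum):    ", math.fsum(values))            # 2.0
```

In the serial order, 1e16 + 1.0 rounds back to 1e16 (the spacing of doubles there is 2), so one of the 1.0 terms is lost. Here the reordered sum happens to match the exact answer, but the point is only that a compiler allowed to reassociate can legitimately change the result.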
CAUTION: it has been rumored that future CPUs might eliminate the x87 FPU instructions. You are advised to migrate to AVX, which may require you to produce or find a different set of certified results data.
Jim Dempsey
We're using /arch:IA32, so AFAIK MMX/SSE/AVX instructions are not in use. I'll try to come up with some sanitized example code that reproduces the behavior we're seeing.
I suggest you read "Improving Numerical Reproducibility in C/C++/Fortran"; there are many factors that can change last-bit results, some of which you cannot control.
Steve's reference gives you the core options for FP control.
I have attached another reference for additional details.
You can also use -dryrun or -# with both compilers. This causes the driver to print all defines, libraries, paths, and options passed from the driver to the back-end compiler.
You could also try IFX instead of IFORT. In the interplay of optimization versus accuracy, IFX leans more towards accuracy.