Intel® Fortran Compiler

Issues with ifx Compiler Optimization and Migration Challenges from ifort

Frank_R_1
New Contributor I

Hi,

Background: We maintain a large Fortran codebase developed over the past 30 years, originally in Fortran 77, heavily relying on C memory allocation.

Specifically, we use a C allocation function:

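#include <stdlib.h>

/* Fortran passes arguments by reference, so size arrives as a pointer. */
void alloc_(long long* size, long long* base, long long* address, long long* offset) {
    *address = (long long)malloc((size_t)*size);
    *offset = *address - (long long)base;   /* distance from base to the new block */
}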

In Fortran, it is used like:

integer field(1)
integer*8 size
integer*8 address
integer*8 offset

size = 100
call alloc(size, field, address, offset)
...
call func(field(1+offset))

Problem:

With ifort, this non-standard use of offsets worked for decades without issue.

ifx enforces stricter optimization and fails because field is technically only of size 1 — field(1+offset) is undefined behavior, and ifx optimizes incorrectly.

To solve this, we modernized the entire codebase to Fortran 90 using allocatable arrays, replacing the old allocation mechanism via a Perl script.

After these changes, all regression tests pass bit-identical under ifort on Windows/Linux (both debug and optimized).

Current Setup:

Platforms: Windows 10/11 with Visual Studio 2019, Linux RHEL 8/9

Compiler: Intel oneAPI 2024.2, upgraded to 2025.0.1 and then 2025.1.0

Compiler flags:

Linux: -O3 -inline-level=2 -fp-model=precise -fimf-arch-consistency=true -no-fma

Windows: -O3 -Ob2 -fp:precise -Qimf-arch-consistency:true -Qfma-

Tools: Valgrind and Address Sanitizer report no issues.

Current Issues with ifx:

Incorrect Optimizations:

Some regression tests fail with ifx at -O3, though they pass in debug mode.

While compiler updates fixed some failures, a few still remain even on 2025.1.0.

In one example, inserting a simple write statement eliminates the deviation, suggesting a compiler optimization bug.

Obviously, keeping debug write statements in production is not acceptable.

Windows-specific Crash:

A crash occurs during Fortran allocation only under -O3 on Windows (not on Linux, nor in debug mode).

Performance:

With the specified compiler flags, ifort still produces slightly faster code than ifx.

Using plain -O3, ifx can outperform ifort, but this isn't viable without other flags ensuring numerical consistency.

Compiler Stability:

Both ifort and ifx occasionally crash during compilation, suggesting that while the frontend may be shared, backend differences cause instability.

Questions:

Are you aware of these or similar optimization bugs in ifx?

How can we assist you in reproducing these issues (e.g., by providing reproducer cases)?

Are there recommended workarounds or compiler flags to mitigate these problems in ifx?

Is there ongoing work to further stabilize and optimize ifx for production use, especially for large legacy codebases?

Final Comment: ifort has been a highly reliable compiler for us over the past 35 years. Unfortunately, as it is deprecated, we are forced to migrate. We are keen to continue using Intel compilers if these issues can be resolved.

Looking forward to your advice and assistance.

Best regards,
Frank

Ron_Green
Moderator

We do not have any open issues specific to -O3 on Windows. Did you try -O2, and does the issue exist at -O2?

Breaking this down, you are reporting 3 issues or questions

1) Incorrect Optimizations

2) Performance

3) General question about stability

 

#3: I will answer this one first. You are correct that the compiler front end is mostly shared code. The non-shared part of the front end is the code that lowers the AST to Intermediate Representation (IR). That part differs because ifort used IL0 as its IR, while ifx uses LLVM IR + Intel metadata, so the translation is different.

After that, ALL the optimization, vectorization, loop transform, code generation - NO shared code.  IFX is a completely different compiler wrt optimization.  Same is true for OpenMP, very different compilers. 

As for compilation crashes - if both compilers crash, it is in the Fortran front-end tokenizer, semantic analysis, or AST creation (all shared code). If only ifx crashes at compilation time, it's due to that step of translating the AST to LLVM IR.

Stability between the two: ifort had a lot of years to stabilize. If you remember, the first 2 or 3 or so major versions of ifort after Intel bought CVF were pretty unstable, particularly the first version of ifort, which was near unusable. This transition to ifx is similar: it takes a while to exercise all the code paths in a compiler, specifically this translation step that takes the AST and creates IR. With DVF/CVF the IR was something called DEC GEM. ifort used an IR named IL0. So the initial ifort instability was due to a brand new IR translation. It is very hard to do, and it takes a LOT of user code testing to tickle all the code paths in this complex translation.

LLVM uses LLVM IR. To make matters harder, LLVM IR is very primitive compared to what we had with Intel IL0 or DEC GEM. This lowering of the AST to LLVM IR is much more difficult than going from the AST to IL0.

#1 Incorrect optimizations: I get the impression that you have a good understanding of the impact of optimizations on accuracy, so I won't dwell on that much. I assume you are seeing bigger deltas in results when comparing ifx debug to ifx -O3. Hence, you are using -fp-model precise and imf arch consistency. Those are good. Also consider a test of -fp-model source -prec-div -prec-sqrt -no-fma. You can throw in the arch consistency too if you want. Oh, and try -fp-model source -fp-speculation=safe. How far off is ifx wrt ifort at -O2 and -O3? Clearly WRONG results, or just outside a tolerance threshold?

Since there is no current bug report SPECIFIC to -O3, I think your code may be unique. In this article, scroll down and look for the example of -mllvm -opt-bisect-limit=XXX. The LLVM optimizer is made up of a series of optimization passes. In the case of -O3 there are 800+ passes. This option runs opt passes 1 through XXX, then skips the remaining passes. You can use this option and a bisection (binary) search to narrow down to the opt pass that causes the problem.

Start with 

ifx -O3 -c -mllvm -opt-bisect-limit=999 <all other options, etc>
Do this on just 1 file perhaps.  There should be ~805 passes. 
Run the code, it will still crash or give bad results. 

Now bisect the pass search space:
ifx -O3 -c -mllvm -opt-bisect-limit=402 <all other options, etc>

IF still erroring, set the limit to 201. (go lower to turn off more opt passes)
IF not erroring, set the limit to 603 (go higher to enable more passes)

I think you can see the pattern:  it's a simple binary search algorithm run manually.   If you can send me the name of the optimization pass I can search the bug database AND maybe it will clue us into where the error may be occurring.  
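The manual procedure above can also be scripted. Below is a minimal Python sketch of that binary search, not an Intel tool. `first_bad_limit` and `build_and_check` are hypothetical names, `run_regression_test.sh` stands in for whatever project-specific step links and runs the test, and the search assumes that limit 0 gives good results and that results stay bad for every limit at or above the offending pass.

```python
import subprocess
from typing import Callable


def first_bad_limit(is_bad: Callable[[int], bool], max_pass: int) -> int:
    """Return the smallest -opt-bisect-limit value for which is_bad is True.

    Assumes is_bad(0) is False, is_bad(max_pass) is True, and that the
    outcome flips from good to bad exactly once as the limit grows.
    """
    lo, hi = 0, max_pass  # invariant: lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid  # failure already present: offending pass is <= mid
        else:
            lo = mid  # still good: offending pass is > mid
    return hi


def build_and_check(limit: int) -> bool:
    """Hypothetical predicate: True if a build with this limit misbehaves."""
    subprocess.run(
        ["ifx", "-O3", "-c", "-mllvm", f"-opt-bisect-limit={limit}", "calexo.f90"],
        check=True,
    )
    # Placeholder for the project-specific link and test step (an assumption).
    return subprocess.run(["./run_regression_test.sh"]).returncode != 0


if __name__ == "__main__":
    # Simulated run: pretend the failure first appears at pass 500 of 805.
    print(first_bad_limit(lambda n: n >= 500, 805))  # prints 500
```

With roughly 805 passes this converges in about ten build-and-run cycles, since 2^10 = 1024 > 805. The number returned is the first limit that shows the failure, so the pass carrying that number in the BISECT output is the one to report.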

 

#2 PERFORMANCE. There is a key difference between ifort -O2/-O3 and ifx -O2/-O3: ifort did interprocedural optimization, -qipo, by default. ifx does NOT do IPO by default at -O2 and -O3. You have to explicitly add the -qipo option to BOTH compilation and linking (this assumes you use ifx as your linker, which you should). This makes the performance comparison between ifx and ifort more fair and even. But I see you are trying to control inlining, which is the main feature of -qipo, manually. Can you try removing the inlining option and simply using -qipo?

Frank_R_1
New Contributor I

Hi,

Thank you very much for this very detailed answer, that gave some good insights!

I'll definitely try the bisection method to find out where the optimization may break.

To be clear, the code that ifx produces does not crash, but release builds with -O3 on Linux and Windows produce different results from the debug build, which is our ground truth. ifort 2024.2, by contrast, produces -O3 release code whose results are bit-identical to the debug code.

Pure -O3 code from ifx is slightly faster than ifort's, but the -fp-model=precise -fimf-arch-consistency=true -no-fma flags slow it down a bit in comparison with ifort.

The Fortran .o/.obj files are linked together with C/C++ .o/.obj files into an executable using the icx/icpx linker. We tried -qipo a couple of times but ran into link problems (maybe another topic to discuss, since the non-IPO link works; we have not yet tried it with 2025.1.0).

As soon as I have more information I'll post it here.

Best regards
Frank

Frank_R_1
New Contributor I

Hi,

 

Please find attached a zip with a reproducer. Folder 191 contains the source file and the compile options that give correct behavior.

Folder 192 gives wrong behavior (everything is the same as in folder 191 except for -opt-bisect-limit=192).

Up to and including this pass, the program runs exactly as in debug code with ifx or with the old ifort:
BISECT: running pass (191) HIRVecDirInsertPass on calexo_
Once the next pass is also enabled, it fails to reproduce the same output as ifort or debug code with ifx:
BISECT: NOT running pass (192) vpo::VPlanDriverHIRPass on calexo_
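
For anyone repeating this on their own code, the pass name can be pulled out of the BISECT output with a few lines of Python. This is only a sketch and assumes the log lines have exactly the shape quoted above; `pass_at`, `BISECT_LINE` and `SAMPLE` are names made up for this example.

```python
import re
from typing import Optional, Tuple

# Matches lines like: "BISECT: NOT running pass (192) vpo::VPlanDriverHIRPass on calexo_"
BISECT_LINE = re.compile(r"BISECT: (NOT )?running pass \((\d+)\) (.+?) on (\S+)")

# The two lines quoted in this post.
SAMPLE = """\
BISECT: running pass (191) HIRVecDirInsertPass on calexo_
BISECT: NOT running pass (192) vpo::VPlanDriverHIRPass on calexo_
"""


def pass_at(log_text: str, number: int) -> Optional[Tuple[str, str, bool]]:
    """Return (pass_name, target, was_run) for the given pass number, or None."""
    for line in log_text.splitlines():
        match = BISECT_LINE.search(line)
        if match and int(match.group(2)) == number:
            was_run = match.group(1) is None  # "NOT " absent means the pass ran
            return match.group(3), match.group(4), was_run
    return None


if __name__ == "__main__":
    print(pass_at(SAMPLE, 192))
```

Feeding it the saved compiler output and the number found by the bisection gives the pass name and the routine it was applied to, which is the information asked for in the reply above.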

#correct 191
ifx.exe -c -fp=precise -Qimf-arch-consistency:true -Qfma- -MD -bigobj -Qftz -DMSFTZ -nologo -warn:nousage,nounused,declarations,truncated_source,interfaces,general -fpp -names:lowercase -assume:underscore -QxCORE-AVX2 -check:none -O3 -Ob2 -mllvm -opt-bisect-limit=191 -DNDEBUG -Z7 -debug:all -DNDEBUG calexo.f90

#incorrect 192
ifx.exe -c -fp=precise -Qimf-arch-consistency:true -Qfma- -MD -bigobj -Qftz -DMSFTZ -nologo -warn:nousage,nounused,declarations,truncated_source,interfaces,general -fpp -names:lowercase -assume:underscore -QxCORE-AVX2 -check:none -O3 -Ob2 -mllvm -opt-bisect-limit=192 -DNDEBUG -Z7 -debug:all -DNDEBUG calexo.f90

 

So hopefully you can find the bug in the optimizer at step 192.

 

Best regards

Frank
