Re: ifort to ifx transition problems

Frank_R_1 · ‎07-04-2022

Hi,

We use Intel 2022.2.0 and want to change from ifort to ifx.

We get bit-identical results for all our regression tests on Windows and Linux in debug and release builds (-O3 -Ob2 on Windows, -O3 -inline-level=2 on Linux) with

Windows:

icl -fp:consistent

ifort -fp:consistent

Linux:

icc -fp-model consistent

ifort -fp-model consistent

Unfortunately, the ifort compiler has a bug with common blocks on Linux (same problem in debug and release), so we tried the ifx compiler.

To get, hopefully, the same behavior as with icc/icl and ifort, we use:

Windows:

icx -fp=precise -Qimf-arch-consistency=true -Qfma-

ifx -fp=precise -Qimf-arch-consistency=true -Qfma-

Linux:

icx -fp-model=precise -fimf-arch-consistency=true -no-fma

ifx -fp-model=precise -fimf-arch-consistency=true -no-fma

In debug we get the same results as with icl/icc and ifort and also the common block problem vanishes.

But in release builds we get substantially different results for the ifx tests.

Which compile flags do you recommend to get the same bit-identical results that we have with the classic compilers?

We also want to use split DWARF for smaller object files, but with icx and ifx we do not get debug information in TotalView. What is the appropriate command-line argument to achieve this?

Best regards

Frank

Steve_Lionel · ‎07-04-2022

Just a comment - expecting bit-identical results when changing ANYTHING, especially to a compiler built on entirely different code generation and optimization technology, is doomed to disappointment. More about this in a paper I presented in 2013: Improving Numerical Reproducibility in C/C++/Fortran (supercomputing.org)

Frank_R_1 · ‎07-05-2022

Hi,

Thank you for your answer and the valuable information.

What I mean is that by the above compiler options we expect to get the same results in debug and release on both platforms.

With ifort, icc/icl (classic compiler 2020.1) we achieved this goal on Linux and Windows.

If there wasn't the problem with common blocks in one of our programs on Linux (on Windows everything runs fine) with the new ifort classic 2022.2.0 we would not change to ifx (with icx we get bit identical results on release, debug, Linux and Windows).

But as far as I understand, ifort is now deprecated and Intel is pushing users to transition to the new ifx compiler.

Unfortunately, compilers also have bugs. Especially when using the combination of O3 and Ob2, we encountered bugs in several older compilers, which we reported to Intel. Intel classic icc/icl/ifort 2020.1 is stable for us, so we have used it since it was released.

Best regards

Frank

Ron_Green · ‎07-08-2022

IFORT is NOT deprecated. It is fully supported at this time.

Deprecation is an official notification from Intel, which we have NOT done for IFORT.

Deprecation: The act or process of marking the feature or product as obsolete, to discourage its use and warn users that it *may* be phased out in the future, but not removing the capability immediately, so as to allow for continued compatibility for a period of time.

When we announce deprecation, we do so in our Release Notes. We have not done this for IFORT. But you have correctly understood our longer-term goal of moving users from IFORT to IFX at some point in the future. When? As I mention in our webinars, we will look at the performance and features of IFX in mid/late 2023 and decide whether we should deprecate IFORT in the 2024 release, and then maybe remove it in the 2025 release. BUT it depends on IFX meeting our feature and performance expectations. So you have a few years ahead as we mature IFX and get it to parity with IFORT.

Steve_Lionel · ‎07-05-2022

You'll need to manage your expectations - it's rarely the case that you'll get bit-identical results between debug and release configurations simply due to changes in order of operations, vectorization and the like.
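To make the order-of-operations point concrete, here is a minimal C sketch (not taken from anyone's code in this thread; the function names are made up for illustration). It sums the same numbers once left to right, as an unoptimized loop would, and once with four partial sums combined at the end, which is roughly the shape a 4-lane vectorized loop takes:

```c
#include <assert.h>
#include <stddef.h>

/* Straight left-to-right sum, the order an unoptimized loop uses. */
double sum_sequential(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four independent partial sums that are combined at the end.
 * This mimics the evaluation order of a 4-lane vectorized loop;
 * n must be a multiple of 4 in this simplified sketch. */
double sum_four_lanes(const double *x, size_t n)
{
    assert(n % 4 == 0);
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            lane[k] += x[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With the input {1e30, 1.0, -1e30, 1.0}, the left-to-right sum is 1.0 and the four-lane sum is 0.0 in IEEE double arithmetic, although both are legitimate evaluations of the same mathematical sum.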

ifort isn't deprecated yet, and it is more feature complete than ifx. It is true that ifx is the future, but there's nothing wrong with continuing to use ifort for now.

Frank_R_1 · ‎07-06-2022

Hi,

It turns out that ifx in release mode does wrong optimizations (see below) : (

#ifx release build
-nologo -fp=precise -Qimf-arch-consistency=true -Qfma- -MD -bigobj -warn:nousage,declarations,truncated_source,interfaces,general -4I4 -4L72 -fpp -names:lowercase -assume:underscore -W1 -check:none -O3 -Ob2 -DNDEBUG -Z7 -debug:all -DNDEBUG -module:vobs\root\lib\LinAl\modules -Qopenmp

#ifx debug build
-nologo -fp=precise -Qimf-arch-consistency=true -Qfma- -MD -bigobj -warn:nousage,declarations,truncated_source,interfaces,general -4I4 -4L72 -fpp -names:lowercase -assume:underscore -W1 -check:none -debug:all -Od -Ob0 -Z7 -DDEVELOP -module:vobs\root\lib\LinAl\modules -Qopenmp

When we build with the debug configuration above, all our regression test references are reproduced bit-identically!

When we build with the release configuration above, some of our regression test references are computed incorrectly, or the program crashes.

Our software suite consists of simulation software with mesh generator and thermal simulation in C/C++/Fortran. The C/C++ code runs fine in both configurations.

I found out that when I declare a particular Fortran variable volatile in the release config, one of our programs runs as expected (before, it crashed due to wrong optimization). Also, putting in a write statement for this variable leads to the correct behavior.

In the past we encountered several problems like this on both platforms with older compilers. Often the combination of -O3 and -Ob2 was the problem (Linux and/or Windows).

Best regards

Frank

jimdempseyatthecove · ‎07-06-2022

>>I found out that when I declare a particular Fortran variable volatile in the release config, one of our programs runs as expected (before, it crashed due to wrong optimization). Also, putting in a write statement for this variable leads to the correct behavior.

That is indicative of an optimization issue, e.g. assuming a value is in a register (previously calculated) when it is not, or falsely determining that code is never executed and thus eliding it. These errors generally cause incorrect results or failure to converge. Seldom would they cause a crash (SegFault), unless the error caused an out-of-bounds index that in turn damaged code/data or referenced unmapped/protected virtual memory.

A reproducer would be handy for Intel to resolve this problem.

The volatile workaround is a good find.

FWIW I often make use of the Fortran PreProcessor to enable/disable workaround and/or other code modifications. While you can use the !DIR$ directives, I find FPP a better choice, in particular FPP has macro expansion.
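As a rough illustration of that toggle pattern, here is the same idea written with the C preprocessor, whose #ifdef/#define directives fpp also accepts. The macro names and the function are made up for this sketch; the point is only that one build flag switches the workaround on or off without touching the rest of the code:

```c
#include <assert.h>
#include <stddef.h>

/* IFX_VOLATILE_WORKAROUND is a hypothetical macro name. Define it on the
 * command line (e.g. -DIFX_VOLATILE_WORKAROUND) only for the build that
 * needs the workaround. */
#ifdef IFX_VOLATILE_WORKAROUND
#  define WORKAROUND_VOLATILE volatile
#else
#  define WORKAROUND_VOLATILE
#endif

/* Counts the nonzero entries of flags[0..n-1]. The counter is declared
 * volatile only when the workaround macro is defined. */
long count_nonzero(const int *flags, size_t n)
{
    assert(flags != NULL);
    WORKAROUND_VOLATILE long count = 0;
    for (size_t i = 0; i < n; ++i)
        if (flags[i] != 0)
            count += 1;
    return count;
}
```

The result is the same with or without the macro; only the code the optimizer is allowed to generate for the counter changes.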

Jim Dempsey

Barbara_P_Intel · ‎07-06-2022

Are you aware of this article, Porting Guide for ifort Users to ifx? There are some things to be aware of regarding compiler options between the two compilers.

Frank_R_1 · ‎07-06-2022

Hi,

Thanks for your answers. Yes, I read the ifort-to-ifx porting guide; that is where I got the compile flags

-fp=precise -Qimf-arch-consistency=true -Qfma-

which substitute for

-fp=consistent

which is not available any more on ifx,icx.

Intel ifort 2022.2.0 does the right job on Windows: in debug and release, everything is bit-identical.

Unfortunately, we have a problem on Linux in release, where ifort does strange things with a common block.

(The same code compiles perfectly with ifort from Intel 2020.1 and gives correct results.)

That was the reason for moving from ifort to ifx, but as you see, ifx has more severe problems...

When is the next release of Intel 2022.3?

Best regards

Frank

Barbara_P_Intel · ‎07-06-2022

The compiler code freeze is imminent. I expect oneAPI 2022.3 in the fall.

Frank_R_1 · ‎07-07-2022

Hi,

Unfortunately, writing a reproducer is not possible at the moment (I don't know how to extract the relevant part of the code).

Here is what we have found out so far, which you can use as a hint.

We very often use the following construct to get dynamic memory in Fortran77 (our main program is in C and a lot of sub code is written in Fortran77):

subroutine(maxstc)

integer maxstc <- from subroutine argument

integer*8 adlist
integer*8 oflist
integer*8 stlist(1)

call integeralloc (maxstc, stlist, adlist, oflist) <- call of a c routine which calls malloc

....

where adlist is the address returned by C malloc and oflist is the offset between stlist and adlist.

Then we use this as follows

stlist(1+oflist+iterator) = something
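For readers who have not seen this idiom, a minimal sketch of what such a C helper might look like follows. This is a hypothetical reconstruction from the description above, not the actual routine: the trailing underscore in the name, the 32-bit type of maxstc, the offset being counted in 8-byte elements, and the behavior on allocation failure are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper callable from Fortran as
 *     call integeralloc(maxstc, stlist, adlist, oflist)
 * Fortran passes every argument by reference.
 *   maxstc : number of 8-byte integers requested
 *   stlist : the one-element dummy array declared on the Fortran side
 *   adlist : receives the address returned by malloc (0 on failure)
 *   oflist : receives the distance from stlist to that address,
 *            counted in 8-byte elements (assumed unit) */
void integeralloc_(const int32_t *maxstc, int64_t *stlist,
                   int64_t *adlist, int64_t *oflist)
{
    assert(*maxstc > 0);
    void *block = malloc((size_t)*maxstc * sizeof(int64_t));
    if (block == NULL) {
        *adlist = 0;
        *oflist = 0;
        return;
    }
    *adlist = (int64_t)(intptr_t)block;
    /* malloc returns suitably aligned memory and stlist is an 8-byte
     * integer array, so the byte difference divides evenly by 8. */
    *oflist = (*adlist - (int64_t)(intptr_t)stlist) / (int64_t)sizeof(int64_t);
}
```

With an offset defined this way, stlist(1+oflist+iterator) on the Fortran side lands on element number iterator (counting from 0) of the malloc'ed block. The subscript runs far outside the declared bound of stlist(1), which the Fortran standard does not permit, so how a given optimizer treats it is not guaranteed.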

In some subroutines we encountered wrong optimization with ifx (-O3 -Ob2) but ifort does the right job.

When we change integer*8 stlist(1) to integer*8 stlist(2) then everything works like in debug or in ifort etc.

That would indicate that this stlist array is wrongly optimized away!?

Best regards

Frank

jimdempseyatthecove · ‎07-08-2022

The above is indicative of pre-allocatable Fortran code using Cray pointers to memory blocks. I suggest you make stlist allocatable.

integer*8, allocatable :: stlist(:)

...

allocate(stlist(sizeYouWant)) ! was call integeralloc(...

...

stlist(iterator) = something

Jim Dempsey

Frank_R_1 · ‎07-08-2022

Hi,

Thank you for your answer. We use a lot of legacy code in fixed Fortran77 style so we will not touch this code base.

From my point of view, ifx should work exactly like ifort on the same input, so I assume there is an optimization bug in the new ifx.

From what I have heard, ifx uses the LLVM frontend and optimizer, and an Intel backend creates the machine code. So the bug should be in the optimizer, since ifort behaves correctly.

Another question concerning ifort->ifx,icc/icl->icx

We also want to use split DWARF for smaller object files, but with icx and ifx we do not get debug information in TotalView like we do with icc/icl/ifort. What is the appropriate command-line argument to achieve this? (The lld linker uses the command-line argument --gdb-index.)

Best regards

Frank

Frank_R_1 · ‎07-15-2022

Hi,

Here is a screenshot from one of our programs where we encountered problems with common blocks in Intel ifort 2022.2:

As you can see there is no address behind the variables (-O3 -inline-level 2 -g):

Does anyone have a clue why this happens and what happens?

With ifort 2020.1 it works!

Best regards

Frank

Frank_R_1 · ‎07-15-2022

This only happens for this comm5.cmn!

We do not have double definitions or something like that.

In my opinion this can't be a linker problem since with ifort 2020.1 release/debug and 2022.2 debug it works!

Best regards

Frank

Steve_Lionel · ‎07-15-2022

Why do you expect there to be an address for the variable when you enable high optimization? The compiler may have figured out that it did not need to "materialize" that part of the COMMON in memory. Debug information is unreliable with optimization.

Frank_R_1 · ‎07-18-2022

Hi,

Because it is a common block variable which is used in other compile units, so it must have an address that holds the value.

And one can see in the debugger that the register currently holding its address has the value 0!

On Windows in debug and release, and on Linux in debug, it runs perfectly and one can see its address.

In the past I have reported a lot of wrongly optimized code (icl/icc/ifort on both platforms) that was compiled with -O3 -inline-level 2; when going back to less optimization (O3 alone, or O3 with -inline-level 0 or 1), these problems usually vanish.

Apparently I can only wait for the next release and hope this bug is fixed; otherwise we stay with Intel classic C/C++/Fortran 2020.1.

Best regards

Frank

jimdempseyatthecove · ‎07-18-2022

S.L.>>The compiler may have figured out that it did not need to "materialize" that part of the COMMON in memory. Debug information is unreliable with optimization.

F.R.>>Because it is a common block variable which is used in other compile units, so it must have an address that holds the value.

Additionally, the debugger's debug symbols have scope. If (when) the compiler optimizes out references to a symbol (either eliding them or keeping the value in a register), it may either eliminate the symbol from the scope or provide a NULL address (registers do not have addresses).

Hint, try compiling the top level procedure (PROGRAM) without optimizations (and the remainder of the code with optimization), then when at break point some place else, and when having an interest to see the "missing" COMMON variable, set the call-stack focus to the top level procedure. Those variables should be visible then/there.

Jim Dempsey

Frank_R_1 · ‎07-18-2022

Unfortunately another problem occurs.

Our compiled code with Intel classic 2020.1 which runs bit identical on Intel Xeon 2 or lower and also on AMD Epyc processors now has deviations on the new Intel Xeon 3 processor : (

We built our product with:

Windows:

O3 -Ob2

icl -fp:consistent

ifort -fp:consistent

Linux:

O3 -inline-level=2

icc -fp-model consistent

ifort -fp-model consistent

With the new icx and ifort classic from Intel 2022.2, the results are bit-identical on the Xeon 3 processor as well.

Our system is a dual-socket Xeon 3 Gold 6354 with 512 GB RAM.

Therefore we are highly interested in using the new 2022.2 compilers, but there is this one remaining problem with ifort and the common block on Linux.

Best regards

Frank

Steve_Lionel · ‎07-18-2022

Try adding -fimf-arch-consistency=true. This will make the Intel math library use the same code on all processors. But even so, you can't guarantee bit-identical results when you change processors. See Improving Numerical Reproducibility in C/C++/Fortran (supercomputing.org) for more info.

Frank_R_1 · ‎07-21-2022

Unfortunately this didn't help : (

I think the Intel 2020.1 compiler does not generate fp consistent code for Xeon 3 since it is too old.

We will wait for Intel 2022.3 and try again.

Best regards

Frank