Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

ifort to ifx transition problems

Frank_R_1

Hi,

We use Intel 2022.2.0 and want to change from ifort to ifx.

We get bit-identical results for all our regression tests on Windows and Linux, in both debug and release (O3 -Ob2/O3 -inline-level=2) builds, with:

Windows:

icl -fp:consistent

ifort -fp:consistent

Linux:

icc -fp-model consistent

ifort -fp-model consistent

 

Unfortunately, the ifort compiler has a bug with common blocks on Linux (the same problem occurs in debug and release builds), so we tried the ifx compiler.

Hoping to get the same behavior as with icc/icl and ifort, we use:

Windows:

icx -fp=precise -Qimf-arch-consistency=true -Qfma-

ifx -fp=precise -Qimf-arch-consistency=true -Qfma-

Linux:

icx -fp-model=precise -fimf-arch-consistency=true -no-fma

ifx -fp-model=precise -fimf-arch-consistency=true -no-fma

 

In debug builds we get the same results as with icl/icc and ifort, and the common block problem also vanishes.

But in release builds we get substantially different results for the ifx tests.

Which compiler flags do you recommend to get the same bit-identical results that we have with the classic compilers?
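
For readers following along: "bit-identical" means the stored floating-point values have exactly the same bit pattern, not merely that they agree within a tolerance. A minimal Python sketch of such a check is below. The helper names `double_bits` and `bit_identical` and the sample data are made up for illustration and are not taken from the regression suite discussed in this thread.

```python
import struct

def double_bits(x):
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bit_identical(xs, ys):
    """True only if both sequences hold exactly the same doubles, bit for bit."""
    return len(xs) == len(ys) and all(
        double_bits(a) == double_bits(b) for a, b in zip(xs, ys)
    )

# Hypothetical sample data: the second list differs from the first
# only in the last bit of its first element.
reference = [0.3, 1.0]
candidate = [0.1 + 0.2, 1.0]

print(bit_identical(reference, reference))
print(bit_identical(reference, candidate))
print(hex(double_bits(0.3)), hex(double_bits(0.1 + 0.2)))
```

A comparison like this also distinguishes 0.0 from -0.0, which a plain `==` on the values would not.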

 

We also want to use split DWARF for smaller object files, but with icx and ifx we do not get debug information in TotalView. What is the appropriate command-line argument to achieve this?

 

Best regards

Frank

jimdempseyatthecove

try using:

  -arch IVYBRIDGE

 

(for all systems)

If that works, then you might be able to isolate the specific source files that require the older ISA to produce consistent results.

 

Jim Dempsey

Frank_R_1

Thank you for your answer. What exactly does -arch IVYBRIDGE do, and how can I use it to find the source files that might cause problems on Xeon 3? Could it be that the Intel 2020.1 classic C/C++ compiler is too old to generate consistent code for Xeon 3?

 

Best regards

Frank

jimdempseyatthecove

Different architectures over the years have added (and removed) instructions, for example fused multiply-add/subtract (FMA). When the compiler incorporates such instructions, this often requires more than a simple substitution: a broader region of the generated code has to change in order to use the newer instruction. These changes from ISA to ISA generally improve performance, though with the potential of producing slightly different results (round-off differences).
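
To illustrate the kind of round-off difference a fused multiply-add can introduce, here is a small Python sketch. It does not call a hardware FMA instruction; it emulates the single rounding of a fused operation with exact rational arithmetic, and the function names and input values are chosen purely for illustration.

```python
from fractions import Fraction

def multiply_then_add(a, b, c):
    """Two roundings: the product is rounded to double, then the sum is rounded."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """One rounding: a*b + c is formed exactly, then rounded to double once.

    This emulates what a fused instruction computes; it is not a hardware FMA.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Inputs picked so that the exact product (1 - 2**-60) is not representable
# as a double and is rounded before the addition in the unfused case.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = multiply_then_add(a, b, c)
fused = fused_multiply_add(a, b, c)
print(unfused, fused, unfused == fused)
```

Both answers are valid floating-point results for the same source expression, which is why a release build that contracts multiplies and adds can differ from a debug build that does not.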

 

I believe (you must check) that the Xeon 3 you have was built on the Ivy Bridge series of processors. Therefore, if you use -arch IVYBRIDGE, the compiler will target the Ivy Bridge ISA rather than, say, the ISA of your current build system, and it will not generate two code paths (one for the host build system's ISA and one for a "generic" SSE4.n ISA).

What I am suggesting is that you instruct the compiler to generate only one code path: that of the lowest common ISA among the systems you intend to run the code on.

 

Note that this (-arch IVYBRIDGE) may need to be applied only to the files that cause the differences in the results.

 

Additional note:

 

In addition to ISA differences, if your code is multi-threaded, differences in thread count can produce different results. For example, summing the values of an array can (at times) produce different results depending on the number of partitions (threads) producing partial results.
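
A minimal Python sketch of the summation effect, assuming each "thread" sums one contiguous chunk and the partial sums are then added in order. The function names and the four input values are illustrative only. Explicit loops are used instead of the built-in `sum`, because newer Python versions apply compensated summation to floats.

```python
def sequential_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def partitioned_sum(values, parts):
    """Sum contiguous chunks separately, then add the partial results in order."""
    chunk = -(-len(values) // parts)  # ceiling division
    partials = [
        sequential_sum(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return sequential_sum(partials)

# Illustrative data: small terms next to large terms of opposite sign.
values = [1e16, 1.0, -1e16, 1.0]

one_part = partitioned_sum(values, 1)
two_parts = partitioned_sum(values, 2)
print(one_part, two_parts)
```

The inputs are identical in both calls; only the grouping of the additions changes, and that alone is enough to change the result.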

 

Jim Dempsey

Frank_R_1

We have this processor in a dual-socket configuration:

https://ark.intel.com/content/www/de/de/ark/products/212460/intel-xeon-gold-6354-processor-39m-cache-3-00-ghz.html

We use Intel 2020.1 C/C++/Fortran and the compiler settings are:

C
icl.exe -nologo -Qvc14.2 -fp:consistent -Qimf-arch-consistency:true -MD -Qrestrict -Wall -Wp64 -EHsc -Qstd=c11 -Qlong_double -Qpc80 -O3 -Ob2 -DNDEBUG -Oi -Z7 -Zo
C++
icl.exe -nologo -Qvc14.2 -fp:consistent -Qimf-arch-consistency:true -MD -Qrestrict -Wall -Wp64 -EHsc -Qstd=c++17 -Qeffc++ -Qcxx-features -O3 -Ob2 -DNDEBUG -Oi -Z7 -Zo
Fortran
ifort.exe -nologo -Qvc14.2 -fp:consistent -Qimf-arch-consistency:true -MD -warn:nousage,declarations,truncated_source,interfaces,general -4I4 -4L72 -fpp -names:lowercase -assume:underscore -W1 -check:none -O3 -Ob2 -DNDEBUG -Z7 -Zo -debug:all -debug-parameters:all

 

So we do not use AVX or any other extension explicitly, nor do we use two different code paths as with -Qax<code1>[,<code2>,...].

Is -Qx<code> with SSE4.2 the default?

 

Thanks in advance and best regards

Frank

 

 

jimdempseyatthecove

You stated that you have (at least) two different CPU generations: "Xeon 3" and "Xeon Gold 6354" and that you want consistent results between the two systems. This will require using the same instruction set (common to both/all) for the build.

Check the instruction set for the oldest generation CPU that you have.

I suggest a methodology that first isolates the problem at the expense of performance. Then step up the instruction set (newer ISA) until a difference is found. Then back off the ISA for all files (to restore consistent results). Then advance the ISA file by file to locate the problematic file.

Start with single code path.

 

Order to test: SSE4.1, SSE4.2, AVX, AVX2

*** If your Xeon 3 is quite old, it may not support AVX2, in which case your program may crash with an illegal instruction fault. If so, the newest common instruction set is AVX.

Once you find the common code path, you can consider, on a file-by-file basis, introducing a dual code path. For example:

-xAVX -axICELAKE-SERVER

 

In other words, start with all files compiled with -xAVX (assuming that gives consistent results). Then, one file at a time, introduce the dual path with ICELAKE-SERVER (assuming that is the correct ISA for your Xeon Gold 6354). If a specific file produces different results, compile that file using AVX only. Then continue introducing the dual code path for the remaining files, reverting to single-path AVX where required.
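
The file-by-file procedure could be scripted roughly as follows. This is only a sketch: `results_match` is a hypothetical callback that you would implement yourself to rebuild with the given per-file flags and run the regression tests, and the file names and the stand-in oracle below are invented for the demonstration.

```python
def assign_code_paths(files, results_match,
                      baseline="-xAVX",
                      dual_path="-xAVX -axICELAKE-SERVER"):
    """Try the dual code path on one file at a time.

    results_match(flags) is a hypothetical user-supplied callback: it should
    rebuild with the given {file: flags} mapping, run the regression tests,
    and return True when the results are still bit-identical.
    """
    flags = {name: baseline for name in files}
    for name in files:
        flags[name] = dual_path          # promote this file
        if not results_match(dict(flags)):
            flags[name] = baseline       # revert: this file is sensitive
    return flags

# Stand-in oracle for demonstration only: pretend that one file changes
# the results whenever it is not built with the single AVX path.
sensitive = {"solver.f90"}

def fake_results_match(flags):
    return all(flags[name] == "-xAVX" for name in sensitive)

final = assign_code_paths(["io.f90", "solver.f90", "mesh.f90"],
                          fake_results_match)
for name, opts in final.items():
    print(name, opts)
```

This needs one rebuild and test run per file, which is slow but simple. It also assumes the files affect the results independently of each other.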

 

Jim Dempsey

Frank_R_1

Uh, I think I didn't explain very well what I mean by Xeon 3 :)

By "Xeon 3" I mean 3rd generation Xeon processors (e.g. our "Xeon Gold 6354")!

 

So far our code runs on 2nd generation Xeon or older on thousands of customer machines (even AMD EPYC or non-server CPUs like Core i7).

We also have many different Intel and AMD machines (2nd generation Xeon or older, AMD EPYC, etc.) at our development and engineering sites, and we get bit-identical results on Linux and Windows in both debug and release builds. But when we run the tests on the "Xeon Gold 6354", our regression tests show some deviations that we never saw before. As I said, with Intel 2022.2 icx/ifort we get bit-identical results on "Xeon 3" with nearly the same compiler options as in Intel 2020.1 with icl/icc/ifort.

 

I would like to know which instruction set the compiler uses by default if we do not explicitly specify -Qx: ... or -Qax:...?

 

Thank you for your time and best regards

Frank

 

jimdempseyatthecove

>>I would like to know which instruction set the compiler uses by default if we do not explicitly specify -Qx: ... or -Qax:...?

That is subject to change depending on the version of the compiler. It would be best if you specify a desired instruction set.

Current compiler(s) default to SSE2.

You may find that AVX or AVX2 is compatible among your systems, but you could also provide an SSE2 version for legacy systems (with the caveat that the results may not precisely match those of newer generation CPUs).

 

Jim Dempsey

Frank_R_1

Hi,

 

Thanks again for your answer. Just for your information: we found out that it is the Intel 2020.1.0 MKL that does not behave as expected on 3rd gen Xeon. We use mkl_cbwr_set(MKL_CBWR_COMPATIBLE), but this does not work on 3rd gen Xeon. With the Intel 2022.2.0 MKL it works as expected, so THIS time it is not the compiler that is causing the problem :)

 

Best regards

Frank

 
