Intel® Fortran Compiler

ifort of oneAPI package generates slower executable

Jon_D
New Contributor I

I am noticing a large slowdown (around 50%) in my code when I compile it with the latest ifort (2021.0.5), distributed as part of oneAPI, compared to the Parallel Studio XE version 19.1.3.311. When I run a hotspots analysis with VTune, I see functions like

_for_ieee_quiet_eq_k8_

_for_ieee_signaling_gt_k8

showing up in the oneAPI version of the compiler but not in the Parallel Studio XE version. These functions are invoked in logical comparisons, and they seem to add quite a bit of extra overhead, which leads to the 50% slowdown.

Is there any way to turn off or limit the impact of these functions on code efficiency? Again, they don't show up with the pre-oneAPI version of the compiler.

Thanks for any help,

Jon 

6 Replies
andrew_4619
Honored Contributor II

I would suggest posting, or at least reviewing, your build options. Various run-time checking options can have a big impact. The routines you quote look like checks for NaNs.

Jon_D
New Contributor I

Here are my build options:

/nologo
/O2
/Qipo
/fpp
/standard-semantics
/Qdiag-disable:5462,8290,10212
/libs:static
/threads
/c
/fp:consistent
/assume:norealloc_lhs

andrew_4619
Honored Contributor II

I don't see anything interesting in those options, so I am out of suggestions. Maybe someone from Intel can comment.

Check out standard-semantics: some of the settings it enables, such as "ieee_compares", stand out.

Option standard-semantics enables option fpscomp logicals and the following settings for option assume: byterecl, failed_images, fpe_summary, ieee_compares, ieee_fpe_flags (if the fp-model option setting is strict or precise), minus0, nan_compares, noold_inquire_recl, noold_ldout_format, noold_ldout_zero, noold_maxminloc, noold_unit_star, noold_xor, protect_parens, realloc_lhs, recursion, std_intent_in, std_minus0_rounding, std_mod_proc_name, and std_value.

Jon_D
New Contributor I

The standard-semantics build option was indeed the issue. I set /assume:noieee_compares and the program now runs even faster than the one generated with the Parallel Studio XE version of the compiler.
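For reference, the resulting option set might look like the sketch below. It assumes the new setting was simply appended to the options posted earlier in the thread; the exact placement used is not stated, so treat the ordering as an assumption.

```text
/nologo
/O2
/Qipo
/fpp
/standard-semantics
/Qdiag-disable:5462,8290,10212
/libs:static
/threads
/c
/fp:consistent
/assume:norealloc_lhs
/assume:noieee_compares
```

This keeps the rest of what standard-semantics enables and overrides only the comparison setting, in the same way the original list already overrides realloc_lhs.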

I am curious, though: what is the benefit of turning on the ieee_compares flag, given how much overhead it introduces to the program? What exactly does it do?

Jon

andrew_4619
Honored Contributor II

I guess it gives increased quality, as errors get flagged rather than staying hidden and maybe giving bad results. But if your code is bombproof, that safety guard just slows you down.

Steve_Lionel
Honored Contributor III

If one is using the IEEE_ARITHMETIC, etc. modules, the compiler needs to know that it can't take shortcuts or make assumptions about compares, etc. The ieee_compares option is new to me, but I know that in the past it was advised to use /fp:strict when using these modules. I speculate that the options were broken out and then folded into standard-semantics, which makes sense.
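To make the kind of comparison at stake concrete, here is a minimal sketch using the standard IEEE intrinsic modules. The program and variable names are illustrative only and do not come from the code discussed in this thread. It compares a quiet NaN using == and > and then reports whether the IEEE invalid flag was raised. In IEEE 754 terms an equality test is a quiet comparison (no invalid flag for a quiet NaN), while an ordered comparison such as > is a signaling one (invalid is raised when an operand is NaN). That the routine names seen in VTune correspond to these two kinds of compare is an inference from their names, not something confirmed here.

```fortran
program nan_compare_demo
  ! Illustrative sketch: names are hypothetical, not from the original code.
  use, intrinsic :: ieee_arithmetic   ! also makes ieee_exceptions entities accessible
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), volatile :: x, y          ! volatile to discourage compile-time folding
  logical :: invalid_raised, res

  x = ieee_value(1.0_dp, ieee_quiet_nan)   ! a quiet NaN
  y = 1.0_dp

  ! Equality test: expected to be a quiet comparison under IEEE semantics.
  call ieee_set_flag(ieee_invalid, .false.)
  res = (x == x)
  call ieee_get_flag(ieee_invalid, invalid_raised)
  print *, 'x == x       :', res
  print *, 'invalid flag :', invalid_raised

  ! Ordered test: expected to be a signaling comparison under IEEE semantics.
  call ieee_set_flag(ieee_invalid, .false.)
  res = (x > y)
  call ieee_get_flag(ieee_invalid, invalid_raised)
  print *, 'x > y        :', res
  print *, 'invalid flag :', invalid_raised
end program nan_compare_demo
```

Both comparisons should evaluate to false, since NaN compares unequal and unordered to everything. Whether the invalid flag differs between the two cases depends on the floating-point options in effect, so building this once with and once without the ieee_compares setting is one way to see what the option changes. No output is listed because it varies with those options.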
