
Strange code behaviour

cr362
Beginner

Dear all,

I have a piece of code that is behaving strangely and producing different results. My question has two parts.

Part 1: I'm compiling my code with the following set of options. Is there anything wrong with selecting this combination of options? I.e. will certain options conflict with each other?

ifort -extend-source -O3 -fp-model precise -i_dynamic -mcmodel=large -parallel lininterp.for -ipo -ftz -xT -axT -no-prec-div

Part 2: Part of my code reads like this:

x=0

if (x.lt.0) write(6,*) x

Of course, the second line is not needed if x is set to 0. So, I comment out the second line. BUT, when I comment out the line, I get a different end result from my code! Why might this be? Compiling with -check bounds reveals no problems. Note that if I exclude the option -no-prec-div, I do not get this problem (hence my first question).

Sorry if this is trivial, I am really stuck. Happy to provide more info if needed. Cheers.

rreis
New Contributor I

from the man page:

---------------------------------------------------

-no-prec-div

Improves precision of floating-point divides.

Architectures: IA-32, Intel 64, IA-64 architectures

Default:

-prec-div    The compiler uses this method for floating-point divides.

Description:

This option improves precision of floating-point divides. It has a slight impact on speed.

With some optimizations, such as -xSSE2 (Linux) or /QxSSE2 (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.

However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, use this option to disable the floating-point division-to-multiplication optimization. The result is more accurate, with some loss of performance.

If you specify -no-prec-div (Linux and Mac OS X) or /Qprec-div- (Windows), it enables optimizations that give slightly less precise results than full IEEE division.
----------------------------------------------------
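To see the effect the man page describes, here is a minimal sketch (file and variable names are my own, not from the original post) that counts how often A/B and A*(1/B) round differently in double precision. Save it as free-form source, e.g. recip_demo.f90, and compile without -no-prec-div (e.g. plain ifort recip_demo.f90) so both expressions are evaluated as written:

program recip_demo
  implicit none
  integer :: i, ndiff
  real(8) :: a, b
  ndiff = 0
  do i = 1, 100000
     call random_number(a)
     call random_number(b)
     b = b + 0.5d0                 ! keep the divisor away from zero
     ! The reciprocal form incurs two roundings (1/b, then the multiply),
     ! so it can differ from correctly rounded IEEE division in the last bit:
     if (a/b .ne. a*(1.0d0/b)) ndiff = ndiff + 1
  end do
  write(6,*) 'last-bit differences:', ndiff, ' of 100000'
end program recip_demo

On typical runs a substantial fraction of the pairs differ by one unit in the last place, which is exactly the kind of variation -no-prec-div can introduce.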

TimP
Honored Contributor III

I don't see any problem with the options that would affect this behavior. It's possible that simplifying your source code permitted the compiler to perform additional optimizations, leading to approximations in division which affected your results. -no-prec-div could certainly produce different results in corner cases, particularly since you also set -ftz explicitly. I usually set -prec-div explicitly unless I can demonstrate a reason for -no-prec-div, as CPU models of the last three years don't depend on those approximations for good performance.

If you want both an old-architecture and an SSSE3 code path generated, -axT (the spelling prior to version 11.0) would do it. If you are running only on SSSE3-capable processors, -xT is sufficient.

I'm not certain what happens when you put the source file name in the middle of the option string.

0 Kudos
Martyn_C_Intel
Employee

Taking on part 1:

-xT and -axT sort of conflict. As Tim explained, -xT creates an executable that uses SSE instructions up to SSSE3. -axT creates an executable with (in places) two code paths: one using SSE instructions up to and including SSSE3, the other a default code path using instructions up to and including SSE2. But adding -xT modifies this default code path to also use instructions up to and including SSSE3; the second code path is therefore redundant, and causes an unnecessary (if small) overhead. So use one switch or the other according to your needs, as Tim explained, but not both.
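Using the original file name, the two alternatives would look like this:

ifort -xT lininterp.for      (single code path; requires an SSSE3-capable processor at run time)
ifort -axT lininterp.for     (SSSE3 code path plus a generic fallback path)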

If you are using the version 11 or later compiler, you might prefer to use the newer forms of these switches, which are more intuitive: -xSSSE3 or -axSSSE3.

-mcmodel=large allows both the code and data sections to exceed 2 GB. It is extremely rare for code to be so large, so for most normal applications that need to address more than 2 GB of static data, -mcmodel=medium -i_dynamic should be sufficient. If the only data that exceeds 2 GB total size is data that has been allocated dynamically, e.g. with ALLOCATE, -mcmodel=medium should not be needed.
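A hypothetical sketch of that distinction (array name and size invented; 400000000 real(8) elements is about 3.2 GB):

program bigdata
  implicit none
  ! A static array this large (~3.2 GB) would require -mcmodel=medium -i_dynamic:
  !     real(8) :: big(400000000)
  ! The allocatable version obtains its memory at run time via ALLOCATE,
  ! so the default (small) memory model suffices:
  real(8), allocatable :: big(:)
  allocate(big(400000000))
  big = 0.0d0
  write(6,*) big(1)
end program bigdata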

-parallel invites the compiler to look for simple loops that can be threaded automatically. I encourage you to get your application running satisfactorily without -parallel before you add this option and measure whether it helps.
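For illustration, a hypothetical loop of the kind the auto-parallelizer looks for: the iterations are independent of one another, so -parallel may split them across threads.

program simple_loop
  implicit none
  integer, parameter :: n = 10000000
  integer :: i
  real(8), allocatable :: a(:), c(:)
  allocate(a(n), c(n))
  a = 1.0d0
  do i = 1, n
     c(i) = 2.0d0*a(i) + 1.0d0   ! no cross-iteration dependence
  end do
  write(6,*) c(n)
end program simple_loop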

The same applies to -ipo. This can be powerful, but it adds to complexity. When you think you are doing a link, the real link is preceded by a second compilation phase that optimizes across source file boundaries. Build without this; then add it, and measure whether it helps.
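A hypothetical two-step build showing where that second compilation phase happens (file names invented):

ifort -c -ipo main.for sub.for     (writes intermediate representation into the .o files)
ifort -ipo main.o sub.o            (the "link" step runs the cross-file optimization, then links)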

-fp-model precise is an option that disables certain optimizations in the interest of more exact reproducibility of floating-point results, e.g. between different optimization levels. This comes at some (usually modest) cost in performance. The variations in floating-point results that might otherwise occur are usually very tiny, unless your program involves large cancellations. So you should only use this option if you have a particular need for more or less exact reproducibility. The options -ftz and -no-prec-div override the corresponding features of -fp-model precise and re-enable two optimizations that can lead to slight variations in floating-point results. So you wouldn't normally use the combination

-fp-model precise -ftz -no-prec-div unless you had a fairly specific reason for doing so, usually based on testing of your own code.

If it were me, starting on a new application and assuming a need for 64-bit addressing, I'd begin with

ifort -extend-source -mcmodel=medium -shared-intel lininterp.for

(-shared-intel is the newer form of -i_dynamic, which is deprecated).

Next, I'd add -O3 and retest. Next, I'd try -ipo; and after that -parallel, retesting each time.
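In commands, that progression might look like this (retesting after each step):

ifort -extend-source -mcmodel=medium -shared-intel lininterp.for
ifort -extend-source -mcmodel=medium -shared-intel -O3 lininterp.for
ifort -extend-source -mcmodel=medium -shared-intel -O3 -ipo lininterp.for
ifort -extend-source -mcmodel=medium -shared-intel -O3 -ipo -parallel lininterp.for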

For part 2, I agree with Tim's comments. The WRITE statement is effectively a function call that could impact other optimizations.

If the variations in results are small, consistent with variations in rounding, I wouldn't worry about it. If you want results to be as consistent as possible (e.g., between -O0 and -O3), use -fp-model precise without -no-prec-div. If you can accept tiny variations in results, then don't use any floating-point options. Note that these variations don't necessarily mean that one result is more accurate than another; all results should be accurate to within the expected uncertainty of a floating-point calculation.
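For the original example, such a reproducibility-focused build would be something like:

ifort -extend-source -O3 -fp-model precise -mcmodel=medium -shared-intel lininterp.for

i.e., keeping -fp-model precise but dropping -ftz and -no-prec-div.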

If the variations in results seem too large to be explained by rounding effects, this might be an indication either of uninitialized variables or of a compiler bug. In that case, we'd need more detail, and an example that we could compile and execute to reproduce the problem, in order to investigate further.
