Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Possible bug in architecture targeting code

Keith_R_
Novice
1,423 Views
I am attempting to build an executable from a large code that is portable across all fairly recent AMD and Intel processors, using the run-time dispatch options. But the behaviour I see makes no sense and looks like a logic bug.

If I compile with options "-O3 -msse3", the application runs on both Intel and AMD Opteron processors.

If I compile with "-O3 -msse3 -axssse3" the application fails at runtime on Opteron (see below for specific capabilities) with
"Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel Pentium 4 and compatible Intel processors with Intel Streaming SIMD Extensions 3 (Intel SSE3) instruction support."

Surely this code should just take the "-msse3" default code path and run. I expected adding the -axssse3 flag simply to allow other code paths to be taken on processors with the corresponding instruction set. Is this a bug in Composer?

Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.4.191 Build 20110427
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt

9 Replies
TimP
Honored Contributor III
If your intention is to make dual path code which takes ssse3 on Intel and sse3 on AMD, I think changing the order of the flags should accomplish it:
-axssse3 -msse3
but I wouldn't advise this. Even on a Woodcrest style CPU (the one which gets the most benefit from ssse3), the overhead of the extra code is likely to outweigh any local gain. On Nehalem and later CPUs, the ssse3 code may well be slower than sse3.
Remember that early Opteron CPUs (those as old as the Intel ssse3 CPUs) would not run sse3.
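
As a rough illustration of the dual-path idea described above, the Python sketch below models how such an executable might pick a code path at start-up. It is a conceptual model only, not the compiler's actual dispatch code: the function name `select_code_path`, the vendor strings, and the feature sets are all assumptions made for illustration.

```python
# Conceptual model only: NOT the compiler's real dispatch logic.
# Assumption: the optimized (-ax) path is taken only on Intel CPUs that
# report the feature; every other CPU falls back to the baseline (-m) path.

def select_code_path(vendor, cpu_features, ax_feature="ssse3", baseline_feature="sse3"):
    """Return the name of the code path a dual-path executable would take.

    vendor:       CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD".
    cpu_features: set of lowercase instruction-set names the CPU supports.
    """
    # Optimized path: Intel CPU that supports the -ax target.
    if vendor == "GenuineIntel" and ax_feature in cpu_features:
        return ax_feature
    # Baseline path: any CPU that supports the fall-back instruction set.
    if baseline_feature in cpu_features:
        return baseline_feature
    # Neither path is usable on this CPU.
    raise RuntimeError("CPU does not support the baseline instruction set")


# Hypothetical feature sets, for illustration.
opteron = {"sse", "sse2", "sse3", "sse4a"}
woodcrest = {"sse", "sse2", "sse3", "ssse3"}

print(select_code_path("AuthenticAMD", opteron))
print(select_code_path("GenuineIntel", woodcrest))
```

In this model the Opteron ends up on the sse3 baseline and the Intel CPU on the ssse3 path, which is the outcome the -axssse3 -msse3 ordering is meant to give.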
Keith_R_
Novice
Tim, you are correct, at least inasmuch as the executable built with the flags reversed starts on the
AMD processors and the error abort disappears. I can't tell which instruction set is actually being used.

Do you know where this order sensitivity of the flags is documented? I have read both the manual pages
and the documentation and missed that point entirely.

TimP
Honored Contributor III
In my opinion, this has been a confusing point in the documentation as well as in the operation of this combination of option flags. There should be examples which put the ax option first, followed by the option to reset the fall-back code path, but I haven't seen such examples for many months. I don't think there has been enough customer input to create an incentive to improve it. Nor, as I indicated already, is that particular combination useful enough to make a case for explaining it better. Certainly, you would be justified in filing a problem report about an order of options which results in apparently bad code when the compiler doesn't warn you to change your options. I suppose the problem report would have to be against the current release (12.0.4) for it to stand a chance of being addressed in 13.0, but the issue is becoming more complicated now that AMD CPUs are coming out which support (in principle) each of the available ifort options.
mecej4
Honored Contributor III
Apart from the question as to how the order of issue of option flags affects selection of CPU instructions, there is one fact that raised a question in my mind.

The CPU capability flags that you listed include

sse, sse2, sse4a

but not

sse3, ssse3

Why would you specify flags that are not within the capability of your CPU and expect things to work? Or am I missing something?
TimP
Honored Contributor III
I suppose sse4a capability implies sse3. According to the posts, sse3 was working on this platform.
Keith_R_
Novice
Mecej4,

For reasons I don't understand, sse3 is always reported as "pni" - see http://en.wikipedia.org/wiki/SSE3#Intel_instructions.
That is why the "-msse3" code *does* work on that processor.
The absence of ssse3 on that processor was exactly the point in question - please read the
manual page to find out what "-axssse3" should do if you still don't understand.
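
To make the "pni" naming concrete, here is a minimal Python sketch that maps a /proc/cpuinfo flags string to the SIMD levels it advertises. The helper name `simd_support`, the alias table, and the shortened sample string are assumptions for illustration; the only alias handled is pni, the Linux name for SSE3.

```python
# Linux /proc/cpuinfo names that differ from the usual instruction-set names.
# "pni" stands for Prescott New Instructions, i.e. SSE3.
CPUINFO_ALIASES = {"pni": "sse3"}

# SIMD levels of interest, spelled as /proc/cpuinfo reports them.
SIMD_LEVELS = ["sse", "sse2", "sse3", "ssse3", "sse4a", "sse4_1", "sse4_2", "avx"]


def simd_support(flags_line):
    """Return the SIMD levels advertised by a /proc/cpuinfo 'flags' string."""
    # Split into whole tokens so that e.g. "misalignsse" does not match "sse".
    names = {CPUINFO_ALIASES.get(flag, flag) for flag in flags_line.split()}
    return [level for level in SIMD_LEVELS if level in names]


# Shortened sample based on the flags posted at the top of this thread.
sample = "fpu sse sse2 pni cx16 popcnt sse4a misalignsse"
print(simd_support(sample))  # ['sse', 'sse2', 'sse3', 'sse4a']
```

For the Opteron in question this reports sse3 (via pni) and sse4a but no ssse3, consistent with the behaviour described above.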

I take Tim's point about the ssse3 instructions not being worth the bother, though. Which raises another point:
is there any documentation or benchmark data anywhere showing actual performance figures for sse3, sse4.x and AVX,
for example on double-precision floating-point programs? Do all of these instructions over and above
sse2 really gain anything?



TimP
Honored Contributor III
The advantages in the newer instructions accrue only in specific applications.
Probably the most significant gain available with sse3 is vectorization of complex arithmetic.
ssse3 promoted efficiency of certain mis-aligned vectorized memory accesses on Woodcrest CPUs, but Barcelona and Nehalem introduced superior handling of those without the special ssse3 instructions.
sse4.1 permits "vectorization" of some loops which involve scalar memory access.
AVX introduces several means of reducing the number of instructions required for a given task, again with significant advantages only in specific situations, such as with MKL ?gemm.