I am attempting to compile an executable of a large code which is portable to all fairly recent AMD as well as Intel processors using the run-time dispatch instructions. But the behaviour I see makes no sense and looks like a logic bug.
If I compile with options "-O3 -msse3", the application runs on both Intel and AMD Opteron processors.
If I compile with "-O3 -msse3 -axssse3" the application fails at runtime on Opteron (see below for specific capabilities) with
"Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel Pentium 4 and compatible Intel processors with Intel Streaming SIMD Extensions 3 (Intel SSE3) instruction support."
Surely this code should just take the "-msse3" default code path and run. I expect adding the -axssse3 flag to simply allow the possibility of taking other code paths on processors with the corresponding instruction set. Is this a bug in Composer?
Intel Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.4.191 Build 20110427
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
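The behaviour expected in the question can be written down as a small conceptual model: take an alternate path if the CPU supports it, otherwise fall back to the baseline. This is only a sketch of that expectation, not how the Intel run-time dispatcher is implemented; the function name `choose_code_path`, the path labels, and the feature sets are all hypothetical and chosen for illustration.

```python
# Conceptual model only: this is NOT the Intel run-time dispatcher.
# Function name, path labels and feature sets are hypothetical.

def choose_code_path(cpu_features, alternate_paths, baseline):
    """Pick the first alternate path whose required features are all
    present on the CPU; otherwise fall back to the baseline path.

    Raises RuntimeError when not even the baseline requirements are met.
    """
    for name, required in alternate_paths:
        if required <= cpu_features:  # subset test on sets
            return name
    base_name, base_required = baseline
    if base_required <= cpu_features:
        return base_name
    raise RuntimeError("no code path matches this processor")


# Feature sets normalised to SSE names (illustrative, not exhaustive).
opteron = {"sse", "sse2", "sse3", "sse4a"}
woodcrest = {"sse", "sse2", "sse3", "ssse3"}

# One alternate path (as requested by -axssse3) and an sse3 baseline.
alternates = [("ssse3-path", {"sse", "sse2", "sse3", "ssse3"})]
baseline = ("sse3-baseline", {"sse", "sse2", "sse3"})

for label, cpu in (("opteron", opteron), ("woodcrest", woodcrest)):
    print(label, "->", choose_code_path(cpu, alternates, baseline))
```

Under this model the Opteron would simply land on the sse3 baseline instead of aborting, which is the expectation stated in the question.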
If your intention is to make dual path code which takes ssse3 on Intel and sse3 on AMD, I think changing the order of the flags should accomplish it:
-axssse3 -msse3
but I wouldn't advise this. Even on a Woodcrest style CPU (the one which gets the most benefit from ssse3), the overhead of the extra code is likely to outweigh any local gain. On Nehalem and later CPUs, the ssse3 code may well be slower than sse3.
Remember that early Opteron CPUs (those as old as the Intel ssse3 CPUs) would not run sse3.
Tim, you are correct, at least inasmuch as the executable built with the flags reversed starts on the AMD processors and the error abort disappears. I can't tell which instruction set is actually being used, though.
Do you know where this order sensitivity of the flags is documented? I have read both the manual pages and the documentation and missed that point entirely.
In my opinion, this has been a confusing point in the documentation as well as in the operation of this combination of option flags. There should be examples which put the ax option first, followed by the option that resets the fall-back code path, but I haven't seen such examples for many months, and none presented for the 12.0 compilers. I don't think there has been enough customer input to create an incentive to improve it. Nor, as I indicated already, is that particular combination useful enough to make a case for explaining it better.
Certainly, you would be justified in filing a problem report about the order of options which results in apparently bad code, when the compiler doesn't warn you to change your options. I suppose the problem report would have to be against the current release (12.0.4) for it to stand a chance of being addressed in 13.0, but the issue is becoming more complicated now that AMD CPUs are coming out which support (in principle) each of the available ifort options.
Apart from the question as to how the order of issue of option flags affects selection of CPU instructions, there is one fact that raised a question in my mind.
The CPU capability flags that you listed include
sse, sse2, sse4a
but not
sse3, ssse3
Why would you specify flags that are not within the capability of your CPU and expect things to work? Or am I missing something?
I suppose sse4a capability implies sse3. According to the posts, sse3 was working on this platform.
Mecej4,
For reasons I don't understand, sse3 is always reported as "pni" in the flags list - see http://en.wikipedia.org/wiki/SSE3#Intel_instructions.
That is why the "-msse3" code *does* work on that processor.
The absence of ssse3 on that processor was exactly the point in question - please read the manual page to find out what "-axssse3" should do if you still don't understand.
I take Tim's point about the ssse3 instructions not being worth the bother though. Which raises another point: are there any documentation or benchmarks anywhere showing actual performance figures using sse3, sse4.x and AVX for, say, double precision floating point programs? Do all of these instructions over and above sse2 really gain anything?
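To check a flags line like the one in the first post without tripping over the naming, something along these lines might help. It is a minimal sketch: the helper names `parse_flags` and `has_feature` and the one-entry alias table are assumptions made for illustration, and the flags string is copied from the first post.

```python
# Alias table is an assumption covering only the case discussed here:
# Linux lists SSE3 as "pni" (Prescott New Instructions) in /proc/cpuinfo.
CPUINFO_ALIASES = {"sse3": "pni"}


def parse_flags(flags_line):
    """Split a /proc/cpuinfo-style 'flags : ...' line into a set of names."""
    head, sep, rest = flags_line.partition(":")
    # If there is no 'flags :' prefix, treat the whole line as the flag list.
    return set((rest if sep else head).split())


def has_feature(flags, feature):
    """Look up a feature by its SSE-style name, applying the alias table."""
    name = feature.lower()
    return CPUINFO_ALIASES.get(name, name) in flags


# Flags line of the Opteron from the first post.
FLAGS_LINE = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb "
    "rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl pni monitor cx16 "
    "popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse "
    "3dnowprefetch osvw ibs skinit wdt"
)

flags = parse_flags(FLAGS_LINE)
for feature in ("sse3", "ssse3", "sse4a"):
    print(feature, has_feature(flags, feature))
```

With the flags above, sse3 is reported as present (via "pni") and ssse3 as absent, which matches what was observed: the "-msse3" build runs, and there is no ssse3 path this CPU could take.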
The advantages in the newer instructions accrue only in specific applications.
Probably the most significant gain available with sse3 is vectorization of complex arithmetic.
ssse3 promoted efficiency of certain mis-aligned vectorized memory accesses on Woodcrest CPUs, but Barcelona and Nehalem introduced superior handling of those without the special ssse3 instructions.
sse4.1 permits "vectorization" of some loops which involve scalar memory access.
AVX introduces several means of reducing the number of instructions required for a given task, again with significant advantages only in specific situations, such as with MKL ?gemm.