Gaiger_Chen
New Contributor I
449 Views

Compiler optimization: -msse4.2, -xSSE4.2, and -arch SSE4.2?


Dear all:

I would like to know the difference between these three options.

As far as I know, -xSSE4.2 turns on auto-vectorization using the instructions supported by Nehalem CPUs.

According to icc/ifort --help, -msse4.2 and -arch SSE4.2 look very similar to -xSSE4.2, so I would like to know how they differ.

Thank you!
Dale_S_Intel
Employee

Autovectorization is enabled at -O2 by default (which assumes SSE2 support, I believe). Support for higher levels of SSE enables more opportunities for vectorization. The -x option will generate code that requires Intel processors, e.g. (from the output of 'icc -h'):

-x<code>  generate specialized code to run exclusively on processors
          indicated by <code> as described below
            SSSE3   Intel Core2 processor family with Supplemental
                    Streaming SIMD Extensions 3 (SSSE3)
            SSE4.1  Intel 45nm Hi-k next generation Intel Core
                    microarchitecture with support for Streaming SIMD
                    Extensions 4 (Intel SSE4) Vectorizing
                    Compiler and Media Accelerator instructions
            SSE4.2  Can generate Intel SSE4 Efficient Accelerated String
                    and Text Processing instructions supported by Intel
                    Core i7 processors. Can generate Intel SSE4
                    Vectorizing Compiler and Media Accelerator, Intel SSSE3,
                    SSE3, SSE2, and SSE instructions and it can optimize for
                    the Intel Core processor family.
TimP
Black Belt

-msse4.2 would be the Linux/Mac equivalent of Windows /arch:SSE4.2. In principle, when fully implemented, these options might avoid the platform identity check that -xSSE4.2 performs, and should permit use of SSE4.2 intrinsics.
A full implementation of -msse4.2 might be desirable for compatibility with recent gcc, but I'd be reluctant to depend on it in place of -xSSE4.2 or -xSSE4.1. gcc doesn't have a good -march=corei7 option until gcc 4.6, which was released only recently.
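To illustrate what "permit use of SSE4.2 intrinsics" could look like in source, here is a minimal sketch. It assumes the compiler defines the __SSE4_2__ macro when SSE4.2 code generation is enabled (gcc does this under -msse4.2; whether a given icc version does the same is an assumption to verify). The helper name crc32c_bytes is hypothetical and not from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE4_2__
#include <nmmintrin.h>   /* SSE4.2 intrinsics such as _mm_crc32_u8 */
#endif

/* Hypothetical helper: CRC-32C (Castagnoli) of a byte buffer.
 * Uses the SSE4.2 CRC32 instruction when the build enables SSE4.2,
 * otherwise a portable bitwise loop that gives the same result. */
uint32_t crc32c_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
#ifdef __SSE4_2__
        crc = _mm_crc32_u8(crc, buf[i]);
#else
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
#endif
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Note that a binary built with the SSE4.2 branch has no run-time check of its own: on a CPU without SSE4.2 the CRC32 instruction would simply fault.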
Gaiger_Chen
New Contributor I

Dear TimP:

So you mean that -xSSE4.2 does a platform check?

Do you mean that, for code like this on a Pentium III machine:

/* Pentium III supports only SSE (SSE1), which has no __m128i (4 x int) type */
:
#define N 65536
int i;
int A[N];
:
for (i = 0; i < N; i++)
    A[i] = i + 1;

:

with -xSSE2, the loop would NOT be vectorized;
with -msse2, the loop would be vectorized, even though the machine could not execute the binary?


Thank you very much.




TimP
Black Belt

-msse2 is the default option for recent Intel compilers. I wouldn't expect much difference in auto-vectorization between -msse2 and -xSSE2; certainly, -xSSE2 should vectorize wherever -msse2 does.
-xSSE4.2, as you say, would make a binary which puts up a message and refuses to run on a platform which is not recognized as supporting the option.
-axSSE4.2 would run SSE4.2 code on the platform which is known to support it, and (by default) -msse2 code on a platform which is not so recognized.
If you want to support P-III with current compilers, you must use -mia32 (no SSE code; evidently, no vectorization). I don't know whether it would refuse to accept __m128i data types, but it could not use SSE code to implement them.
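The idea behind -axSSE4.2 (an SSE4.2 path plus a baseline path, chosen at run time) can be sketched by hand. This is not how the Intel compiler implements it; it is an illustration that assumes gcc or clang on x86, using their __builtin_cpu_supports builtin and target attribute. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: run-time dispatch is only attempted with gcc/clang on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define X86_DISPATCH 1
#else
#define X86_DISPATCH 0
#endif

/* Baseline path: built with whatever the command line allows (e.g. SSE2). */
static void fill_baseline(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i + 1;
}

#if X86_DISPATCH
/* Same loop, but the compiler may use SSE4.2 instructions in this function. */
__attribute__((target("sse4.2")))
static void fill_sse42(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i + 1;
}
#endif

/* Returns 1 if the running CPU reports SSE4.2, otherwise 0. */
int cpu_has_sse42(void)
{
#if X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Dispatcher: picks the SSE4.2 path only on CPUs that support it. */
void fill(int *a, size_t n)
{
    assert(a != NULL || n == 0);
#if X86_DISPATCH
    if (cpu_has_sse42()) {
        fill_sse42(a, n);
        return;
    }
#endif
    fill_baseline(a, n);
}
```

Calling fill(A, N) gives the same result on either path; only the instructions used differ. With -xSSE4.2 there is no baseline path, which is why such a binary refuses to run on an unsupported CPU.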
Gaiger_Chen
New Contributor I

Dear TimP:

Conversely, suppose my compile machine is an inexpensive Core-architecture system
(for example, an E6500, which supports SSSE3 only),

but the machine that runs the code is Nehalem (for example, an i7-980X cluster).

Which option should I use?

-axSSE4.2 or -msse4.2?

Do you mean -msse4.2?

Which option can I use to make the compiler skip the instruction-set check against the local machine?

Thank you.
TimP
Black Belt

You can use any of the options to compile, if you don't care about running the code on the build machine. If your application benefits from the SSE4.2 options, one would expect -xSSE4.2 to perform best on Core i7 Nehalem-Westmere.
If you build with -msse4.2 and try to run on your SSSE3 platform, instead of getting a message about not being built for your platform, you would expect an instruction fault, subject to the question of whether this option is fully implemented. If you built with -axSSE4.2, it will run the non-SSE4 instruction paths on the SSSE3 platform.
-msse2 code sometimes outperforms -xSSE4.2 code when running on Westmere. If you want the answer for your application, you'll need to try it. More frequently, -msse2 code will outperform -xSSSE3 code when running on Nehalem or Westmere.
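For "trying it", a small timing harness is one option. The sketch below is only an assumption about how one might measure: it times a variant of the loop from earlier in the thread (offset by the pass number so each pass does fresh work) with the standard clock() function. The name time_fill is hypothetical. The same source would be built once per option set, e.g. with -O2 -msse2 and with -O2 -xSSE4.2, and the reported times compared on the target machine.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define N 65536

static int A[N];

/* Variant of the thread's loop; `offset` changes per pass so that
 * repeated passes cannot be collapsed into one. */
static void fill(int *a, size_t n, int offset)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i + 1 + offset;
}

/* Hypothetical helper: CPU seconds spent on `repeats` passes over A.
 * One element per pass is summed into *checksum_out (if non-NULL)
 * so the stored values are actually consumed. */
double time_fill(int repeats, long *checksum_out)
{
    long checksum = 0;
    clock_t start, stop;

    assert(repeats > 0);
    start = clock();
    for (int r = 0; r < repeats; r++) {
        fill(A, N, r);
        checksum += A[(size_t)r % N];
    }
    stop = clock();

    if (checksum_out != NULL)
        *checksum_out = checksum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A loop this simple is likely limited by memory bandwidth, so differences between option sets may be small; a real application kernel is the better test.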
Gaiger_Chen
New Contributor I


Dear TimP

thank you.


Gaiger