Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

[Question] How to build sample codes for Intel AVX


CPU: Intel Core i7-2600
OS: Windows XP SP3
Compiler: Intel(R) C++ Compiler XE [IA-32]

I use the code in vec_samples in "Tutorial: Auto-vectorization".
(path: ..\Composer XE 2011 SP1\Samples\en_US\C++\vec_samples)

When the command-line switches are /O2 /QaxSSE3 /Qstd=c99, the vectorized code is SSE2.

Then I changed align(16) to align(32) and added some "#pragma" directives.
When the command-line switch is /QxAVX, the vectorized code is still SSE2!

So my question is: how do I generate AVX code by auto-vectorization?

_declspec(align(32)) FTYPE a[ROW][COLWIDTH];
_declspec(align(32)) FTYPE b[ROW];
_declspec(align(32)) FTYPE x[COLWIDTH];

void matvec(int size1, int size2, FTYPE a[][size2], FTYPE b[], FTYPE x[])
{
    int i, j;

    for (i = 0; i < size1; i++) {
        b[i] = 0;

#pragma simd
#pragma vector aligned
        for (j = 0; j < size2; j++) {
            b[i] += a[i][j] * x[j];
        }
    }
}

3 Replies
Black Belt

It's difficult to believe the assertion that /QxAVX produces SSE2. It's possible that AVX-128 is being used to handle odd values of size2; align(32) can't do anything useful unless size2 is known to correspond to a multiple of 32 bytes.
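To illustrate the point about row size, here is a minimal sketch of padding the column count so that every row spans a whole number of 32-byte blocks. The helper name padded_cols is hypothetical and not part of vec_samples, and the values 101, 4 and 8 in the note below are assumed examples, since the thread does not show the sample's actual ROW, COLWIDTH or FTYPE definitions.

```cpp
#include <cstddef>

// Hypothetical helper (not part of vec_samples): returns the smallest column
// count >= cols for which one row occupies a multiple of `alignment` bytes.
// Assumes `alignment` is a multiple of `elem_size`, which holds for a 4- or
// 8-byte FTYPE and a 32-byte alignment.
inline std::size_t padded_cols(std::size_t cols,
                               std::size_t elem_size,
                               std::size_t alignment)
{
    const std::size_t per_block = alignment / elem_size;  // elements per aligned block
    return (cols + per_block - 1) / per_block * per_block; // round up to a whole block
}
```

With an assumed width of 101 columns, both a 4-byte and an 8-byte element type would be padded to 104 columns. Only then does every row start on a 32-byte boundary when the array itself is declared with align(32).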

/QaxSSE3 will of course produce SSE2: you are asking for a baseline SSE2 path plus a separate SSE3 path wherever the compiler sees value. In this case there is no value in SSE3 unless FTYPE is a supported complex data type.

If you choose to use #pragma simd for sum reduction, you must specify the reduction in accordance with the model.

        FTYPE bsum = 0;

#pragma simd reduction(+: bsum)
#pragma vector aligned
        for (j = 0; j < size2; j++) {
            bsum += a[i][j] * x[j];
        }

        b[i] = bsum;

I myself don't see the appeal of this; if you want an Intel extension, there are Cilk reducers; if not, there is inner_product(). The only point of the #pragma simd reduction is to override your compiler option, in case you set an option such as -fp:source, which turns off reduction vectorization.
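For reference, the inner_product() alternative could look roughly like the sketch below, which uses std::inner_product from the C++ standard library. The function name matvec_inner_product, the flat row-major layout and the choice of double for FTYPE are assumptions made for illustration; the sample's own definitions are not shown in this thread.

```cpp
#include <cstddef>
#include <numeric>

typedef double FTYPE;  // assumption: the sample's real FTYPE is not shown here

// Hypothetical variant of matvec: `a` is a row-major rows x cols matrix stored
// contiguously, `x` has cols elements, `b` receives rows results.
void matvec_inner_product(std::size_t rows, std::size_t cols,
                          const FTYPE* a, FTYPE* b, const FTYPE* x)
{
    for (std::size_t i = 0; i < rows; ++i) {
        const FTYPE* row = a + i * cols;
        // Dot product of row i with x; the sum is accumulated in an FTYPE.
        b[i] = std::inner_product(row, row + cols, x, FTYPE(0));
    }
}
```

Whether the compiler vectorizes this reduction still depends on the floating-point model in effect, just as with the plain loop.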

Valued Contributor II
The Intel Core i7-2600 supports SSE3, and it is not clear why SSE3 instructions are not generated. As for AVX, it has been mentioned many times (on the forum) that the operating system must support AVX technology. Please take a look at:
Valued Contributor II
>>...It's difficult to believe the assertion that /QxAVX produces SSE2...

The user actually proved it, because his operating system is Windows XP SP3 and I don't see another reason. Please correct me if I'm wrong.
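Whether the operating system has enabled AVX state can be checked at run time. The sketch below follows the usual CPUID and XGETBV sequence: CPUID leaf 1 reports the AVX and OSXSAVE bits in ECX, and XCR0 bits 1 and 2 show whether the OS saves XMM and YMM registers. It is written for GCC or Clang on x86; the struct and function names are hypothetical. With the Intel or Microsoft compilers on Windows, the __cpuid and _xgetbv intrinsics would play the same role. Note that this reports what the running system supports; it does not by itself show which instructions the compiler emitted.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

// Hypothetical result type for this sketch.
struct AvxStatus {
    bool cpu_has_avx;    // CPUID.1:ECX bit 28
    bool os_uses_xsave;  // CPUID.1:ECX bit 27 (OSXSAVE)
    bool os_saves_ymm;   // XCR0 bits 1 and 2 both set
};

AvxStatus query_avx_status()
{
    AvxStatus s = {false, false, false};
#ifdef HAVE_X86_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return s;
    s.os_uses_xsave = ((ecx >> 27) & 1u) != 0;
    s.cpu_has_avx   = ((ecx >> 28) & 1u) != 0;
    if (s.os_uses_xsave) {
        // XGETBV is only legal when OSXSAVE is set; ECX = 0 selects XCR0.
        unsigned lo = 0, hi = 0;
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
        s.os_saves_ymm = (lo & 0x6u) == 0x6u;
    }
#endif
    return s;  // all false on non-x86 targets or other compilers
}

// AVX code can run only if the CPU has it and the OS saves the YMM state.
bool avx_usable(const AvxStatus& s)
{
    return s.cpu_has_avx && s.os_uses_xsave && s.os_saves_ymm;
}
```

On a system where the CPU supports AVX but the OS does not enable it, cpu_has_avx would be true while avx_usable returns false.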