VS2010 installation problem

bkimbrough · 07-23-2012

Hello,

I have recently installed VS2010 and Intel Parallel Studio XE, and I have been trying to use the ArrayNotation example from the Intel website. I am unable to resolve the following error:

"ERROR: identifier __assume_aligned is undefined"

I'm sure there is something basic I need to do, but I have been unable to sort it out. Additionally, the array notation in the code is underlined in red, as if it isn't recognized syntax. How can I resolve this?

Any help would be greatly appreciated; I have been pulling my hair out.

Regards,

Brad

Georg_Z_Intel · 07-23-2012

Hello Brad,

Did you switch the solution to the Intel C++ Compiler?

In particular, the "__assume_aligned(a,n)" directive is built in and requires no further action beyond selecting "Use Intel C++".

Regards,

Georg Zitzlsberger

TimP · 07-23-2012

While gcc-compatibility in this respect has recognized value, I'm not certain it is implemented in past Windows compilers, where you may need the equivalent declspec. CEAN array notation should be recognized in any ICL 12.x version, so you should check that you have switched in ICL in place of CL.
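One quick way to confirm which compiler actually built a source file is to test its predefined macros. The following is a minimal sketch, assuming only that the classic Intel compiler defines `__INTEL_COMPILER` and that Microsoft's CL defines `_MSC_VER`; ICL on Windows defines both, so the Intel check comes first. The function name `compiler_name` is made up for illustration.

```cpp
#include <string>

// Report which compiler built this translation unit.
// The Intel check comes first because ICL on Windows also defines _MSC_VER.
std::string compiler_name() {
#if defined(__INTEL_COMPILER)
    return "Intel C++ (__INTEL_COMPILER = " + std::to_string(__INTEL_COMPILER) + ")";
#elif defined(_MSC_VER)
    return "Microsoft C++ (_MSC_VER = " + std::to_string(_MSC_VER) + ")";
#else
    return "other compiler";
#endif
}
```

Printing the returned string at program start shows whether the project is really being built with ICL or still with CL.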

Georg_Z_Intel · 07-23-2012

Hello Tim,

It is supported, and it is semantically different from the __declspec(align(...)) variants; see the documentation:

Feature	Description
`__declspec(align(n))`	Directs the compiler to align the variable to an `n`-byte boundary. Address of the variable is `address mod n=0`.
`__declspec(align(n,off))`	Directs the compiler to align the variable to an `n`-byte boundary with offset `off` within each `n`-byte boundary. Address of the variable is `address mod n=off`.
`__assume_aligned(a,n)`	Instructs the compiler to assume that array `a` is aligned on an `n`-byte boundary; used in cases where the compiler has failed to obtain alignment information.
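The difference can be sketched in code: a declaration-side attribute changes where the variable is placed, while `__assume_aligned` only states a promise at the point of use. This is a minimal sketch, assuming standard C++11 `alignas(16)` as a portable stand-in for `__declspec(align(16))`; the names `is_aligned`, `samples` and `sum` are hypothetical, and the Intel-specific directive is only compiled when `__INTEL_COMPILER` is defined.

```cpp
#include <cstddef>
#include <cstdint>

// True when p sits on an n-byte boundary, i.e. address mod n == 0.
bool is_aligned(const void* p, std::size_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Declaration-side alignment: the compiler places the array on a 16-byte
// boundary. alignas(16) is the standard spelling; __declspec(align(16)) is
// the Microsoft-compatible spelling used in this thread.
alignas(16) float samples[1024];

// Use-side alignment: inside the callee the compiler sees only a pointer,
// so the alignment has to be restated as a promise.
float sum(const float* a, std::size_t n) {
#if defined(__INTEL_COMPILER)
    __assume_aligned(a, 16);  // Intel-specific; the promise must really hold
#endif
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

If the promise is false (for example, a pointer offset by one element is passed in), the behavior of aligned vector loads is undefined, so the assertion belongs only where the caller guarantees the alignment.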

Best regards,

Georg Zitzlsberger

bkimbrough · 07-23-2012

Hello Georg,

Thank you for your suggestion.

I verified that I have been using the Intel C++ compiler, and I can build the project without errors. However, it doesn't appear to run any faster than the serial implementation. I also noticed that __assume_aligned is underlined in red. Any ideas on how to fix this?

Regards,

Brad Kimbrough

bkimbrough · 07-23-2012

Hello,

I can build the ArrayNotation example, and it runs without error. However, it is not vectorizing one of the loops as outlined in the example documentation.

Here is my code:

#include

#define S 1024
#define TCOUNT 16

// Use 16-byte alignment for a CPU with 128-bit vector registers. For the CPUs with Intel AVX support
// use 32-byte alignment, and use 64-byte alignment for Intel MIC architecture.
#define ALIGNMENT 16
#define ITERS 1024*1024*10

// Request the compiler to use 16-byte alignments for the arrays.
__declspec(align(16)) float A[S], B[S], C[S];
__declspec(align(16)) int mask[S];

int main() {
    // Initialize the global arrays
    A[:] = 0.0f;
    B[:] = 1.0f / (A[:] + 1);
    C[:] = B[:];
    mask[:] = 0; mask[0:S/2:2] = 1;

    for (int i = 0; i < ITERS; i++) {
        // Invocation of the Array Notation implementation
        startTime = clock_it();
        longvector(A,B,C,1.1f,mask);
        endTime = clock_it();
        execTime += (endTime - startTime);
    }

    printf("Time taken in seconds with default Vector Length Array Notation implementation is %2.6f\n", execTime);
    return 0;
}

__declspec(noinline) void longvector(float A[S], float B[S], float C[S], float k,
                                     int mask[S]) {
    // Let the compiler know it is safe to assume that the function arguments
    // are ALIGNMENT-byte aligned.
    __assume_aligned(A,ALIGNMENT);
    __assume_aligned(B,ALIGNMENT);
    __assume_aligned(C,ALIGNMENT);

    if (mask[:]) {
        A[:] = B[:] + C[:] * k;
    }
}

The loop in the longvector() function is not being vectorized as it should. The report is:

ArrayNotation.cpp(56): warning : loop was not vectorized: existence of vector dependence.

ArrayNotation.cpp(56:5-56:5):VEC:?longvector@@YAXQAM00MQAH@Z: loop was not vectorized: existence of vector dependence

ArrayNotation.cpp(57:7-57:7):VEC:?longvector@@YAXQAM00MQAH@Z: potential FLOW dependence between A and B.

1> potential ANTI dependence between B and A.

ArrayNotation.cpp(56): warning : loop was not vectorized: existence of vector dependence.

ArrayNotation.cpp(56:5-56:5):VEC:?longvector@@YAXQAM00MQAH@Z: loop was not vectorized: existence of vector dependence

ArrayNotation.cpp(57:7-57:7):VEC:?longvector@@YAXQAM00MQAH@Z: potential FLOW dependence between A and B.

1> potential ANTI dependence between B and A.

This is the same report given when simply trying to implement the scalar version of the code.
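For reference, the masked array-notation statement in longvector() has the same element-wise meaning as the plain loop below. This is a sketch in standard C++, not the sample's own scalar version: the name `longvector_scalar` is hypothetical, and the length is passed explicitly as `n` instead of relying on the fixed size S.

```cpp
#include <cstddef>

// Scalar equivalent of:  if (mask[:]) { A[:] = B[:] + C[:] * k; }
// Elements whose mask entry is zero are left untouched.
void longvector_scalar(float* A, const float* B, const float* C,
                       float k, const int* mask, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            A[i] = B[i] + C[i] * k;
        }
    }
}
```

With plain pointer parameters like these, a compiler cannot always prove that A does not overlap B or C, which may be the kind of situation the "potential FLOW/ANTI dependence" remarks in the report describe.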

Any help would be much appreciated.

Thank you,

Brad Kimbrough

Georg_Z_Intel · 07-24-2012

Hello Brad,

Thank you for the small code example; it makes the problem easy for us to reproduce. In the following I'm using the latest update version (Intel Composer XE 2011 Update 11).

IntelliSense in Microsoft Visual Studio* underlines some keywords/directives because our integration does not register them. I've created a ticket to fix that in a future release (DPD200294636). It's not critical, though; you can continue without problems.

Using the example you provided I see that function "longvector(...)" is vectorized (excerpt from the function):

[plain]
.B2.2: ; Preds .B2.2 .B2.1
$LN110: movaps xmm7, XMMWORD PTR [edx+esi*4] ;23.7
$LN111: cvtps2pd xmm3, xmm7 ;23.7
$LN112: movdqu xmm4, XMMWORD PTR [edi+esi*4] ;22.5
$LN113: movhlps xmm7, xmm7 ;23.7
$LN114: pcmpeqd xmm4, xmm5 ;22.5
$LN115: cvtps2pd xmm0, xmm7 ;23.7
$LN116: movaps xmm7, XMMWORD PTR [ecx+esi*4] ;23.7
$LN117: pxor xmm4, xmm6 ;22.5
$LN118: cvtps2pd xmm1, xmm7 ;23.7
$LN119: movhlps xmm7, xmm7 ;23.7
$LN120: cvtps2pd xmm7, xmm7 ;23.7
$LN121: mulpd xmm1, xmm2 ;23.7
$LN122: mulpd xmm7, xmm2 ;23.7
$LN123: addpd xmm3, xmm1 ;23.7
$LN124: addpd xmm0, xmm7 ;23.7
$LN125: movups xmm1, XMMWORD PTR [eax+esi*4] ;23.7
$LN126: cvtpd2ps xmm3, xmm3 ;23.7
$LN127: cvtpd2ps xmm0, xmm0 ;23.7
$LN128: movlhps xmm3, xmm0 ;23.7
$LN129: andps xmm3, xmm4 ;23.7
$LN130: andnps xmm4, xmm1 ;23.7
$LN131: orps xmm3, xmm4 ;23.7
$LN132: movaps XMMWORD PTR [eax+esi*4], xmm3 ;23.7
$LN133: add esi, 4 ;22.5
$LN134: cmp esi, 1024 ;22.5
$LN135: jb .B2.2 ; Prob 99% ;22.5
[/plain]
The *ps and *pd op-codes (e.g. andps, mulpd, etc.) indicate packed operations (p = packed), which is good!
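The andps/andnps/orps sequence near the end of the listing is a masked blend: lanes where the mask is set take the newly computed value, and the other lanes keep the old contents of A. Here is a rough sketch of that pattern with SSE2 intrinsics, assuming an x86 target; `masked_update4` is a hypothetical name, it handles four floats per call, and unlike the listing (which converts to double with cvtps2pd) it does the arithmetic in single precision.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Update four floats of A: A[i] = B[i] + C[i] * k where mask[i] != 0,
// otherwise A[i] keeps its old value. A, B and C must be 16-byte aligned.
void masked_update4(float* A, const float* B, const float* C,
                    float k, const int* mask) {
    __m128 b = _mm_load_ps(B);  // aligned load
    __m128 c = _mm_load_ps(C);  // aligned load
    __m128 fresh = _mm_add_ps(b, _mm_mul_ps(c, _mm_set1_ps(k)));

    // All-ones lane wherever mask[i] == 0, i.e. where the old value is kept.
    __m128i m = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask));
    __m128 keep_old = _mm_castsi128_ps(_mm_cmpeq_epi32(m, _mm_setzero_si128()));

    __m128 old = _mm_load_ps(A);
    // Blend: (~keep_old & fresh) | (keep_old & old)
    __m128 result = _mm_or_ps(_mm_andnot_ps(keep_old, fresh),
                              _mm_and_ps(keep_old, old));
    _mm_store_ps(A, result);    // aligned store
}
```

Calling it on four-element chunks of 16-byte-aligned arrays reproduces the effect of the masked assignment without a branch per element.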

So why don't you see it? Some possible reasons:

I don't have your implementation of "clock_it()". However, keep in mind that some implementations of timer functions don't work as expected on multi-core systems. I'd recommend using the "__rdtsc()" intrinsic, which reads the clock ticks.

Also, and in general for benchmarking, you might turn off Intel SpeedStep, Intel Turbo Boost and (optionally) Intel Hyper-Threading. The reason for this is to get comparable CPU performance between benchmark runs.

Maybe you're using an older compiler version that cannot vectorize the code you provided. The most recent version can: I don't get warnings about dependencies in the vectorization report, and the above assembly is created.
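The timing suggestion can be sketched as follows: read the time-stamp counter once before and once after the whole loop rather than around every call. This is a minimal sketch for x86 only; `work` is a hypothetical stand-in for the kernel, `ticks_for` is a made-up name, and the header choice assumes `<intrin.h>` for Microsoft-compatible compilers and `<x86intrin.h>` for GCC/Clang.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc with Microsoft-compatible compilers
#else
#include <x86intrin.h>   // __rdtsc with GCC/Clang
#endif

// Hypothetical stand-in for the kernel being measured.
inline void work(volatile float* x) { *x = *x * 1.0001f + 0.5f; }

// Return the number of time-stamp-counter ticks spent in `iters` calls.
// Timing the whole loop once avoids paying the timer overhead per call.
std::uint64_t ticks_for(int iters) {
    volatile float x = 1.0f;
    const std::uint64_t start = __rdtsc();
    for (int i = 0; i < iters; ++i) work(&x);
    const std::uint64_t stop = __rdtsc();
    return stop - start;
}
```

The result is in counter ticks, not seconds, so it is mainly useful for comparing two variants on the same machine under the same power settings.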

Best regards,

Georg Zitzlsberger

Georg_Z_Intel · 10-10-2012

Hello,

I'd like to inform you that we cannot fix DPD200294636 (incorrect syntax highlighting of keywords/directives). The API for integrations into Microsoft Visual Studio* would require unreasonable effort on our side. So please ignore the IntelliSense underlining in such cases.

Best regards,

Georg Zitzlsberger