Hello,

I have recently installed VS2010 and Intel Parallel Studio XE, and I have been trying to use the ArrayNotation example from the Intel website. I am unable to resolve the following error:

"ERROR: identifier __assume_aligned is undefined"

I'm sure there is something basic I need to do, but I have been unable to sort it out.

Additionally, the array notation in the code shows up with a red underline, as if it isn't recognized syntax. How can I resolve this?

Any help would be greatly appreciated; I have been pulling my hair out.

Regards,
Brad
Did you switch the solution to the Intel C++ Compiler?
In particular, the "__assume_aligned(a,n)" directive is built in and requires no further action beyond selecting "Use Intel C++".
Regards,
Georg Zitzlsberger
It is supported, and it is semantically different from the __declspec(align(...)) variants; see the documentation:
Feature | Description |
---|---|
__declspec(align(n)) | Directs the compiler to align the variable to an n-byte boundary. Address of the variable is address mod n=0. |
__declspec(align(n,off)) | Directs the compiler to align the variable to an n-byte boundary with offset off within each n-byte boundary. Address of the variable is address mod n=off. |
__assume_aligned(a,n) | Instructs the compiler to assume that array a is aligned on an n-byte boundary; used in cases where the compiler has failed to obtain alignment information. |
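
The difference between the rows is that the __declspec(align(...)) forms change where a variable is placed, while __assume_aligned(a,n) only makes a promise about a pointer that was aligned elsewhere. A minimal sketch of that split follows, under some assumptions: standard C++11 `alignas(16)` stands in for `__declspec(align(16))`, and `ASSUME_ALIGNED`, `is_aligned` and `add_aligned` are hypothetical names that do not come from this thread. The macro falls back to the GCC/Clang built-in `__builtin_assume_aligned` so the sketch also builds without the Intel compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper macro: uses the Intel built-in when the Intel C++
// Compiler is active, the GCC/Clang counterpart otherwise, and does
// nothing on compilers that offer neither.
#if defined(__INTEL_COMPILER)
  #define ASSUME_ALIGNED(p, n) __assume_aligned(p, n)
#elif defined(__GNUC__) || defined(__clang__)
  #define ASSUME_ALIGNED(p, n) \
      ((p) = static_cast<decltype(p)>(__builtin_assume_aligned((p), (n))))
#else
  #define ASSUME_ALIGNED(p, n) ((void)0)
#endif

// Returns true when p sits on an n-byte boundary (address mod n == 0).
inline bool is_aligned(const void* p, std::size_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Adds b to a element-wise. The caller promises 16-byte alignment of both
// arrays; the function cannot see how they were declared.
void add_aligned(float* a, const float* b, std::size_t count) {
    // Check the promise in debug builds before passing it to the compiler.
    assert(is_aligned(a, 16) && is_aligned(b, 16));
    ASSUME_ALIGNED(a, 16);
    ASSUME_ALIGNED(b, 16);
    for (std::size_t i = 0; i < count; ++i) {
        a[i] += b[i];
    }
}
```

A caller would declare its arrays as, for example, `alignas(16) float x[8];` and then call `add_aligned(x, y, 8)`. If the promise is false, the behavior is undefined, which is why the sketch checks it with an assertion first.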
Best regards,
Georg Zitzlsberger
Thank you for the small code example; it makes the problem easy for us to reproduce. I'm using the latest update (Intel Composer XE 2011 Update 11) in the following.
IntelliSense in Microsoft Visual Studio* underlines some keywords/directives because our integration fails to register them. I've created a ticket to fix that in a future release (DPD200294636). It's not critical, though; you can continue without problems.
Using the example you provided, I see that function "longvector(...)" is vectorized (excerpt from the function):
[plain]
.B2.2:                                               ; Preds .B2.2 .B2.1
$LN110: movaps    xmm7, XMMWORD PTR [edx+esi*4]      ;23.7
$LN111: cvtps2pd  xmm3, xmm7                         ;23.7
$LN112: movdqu    xmm4, XMMWORD PTR [edi+esi*4]      ;22.5
$LN113: movhlps   xmm7, xmm7                         ;23.7
$LN114: pcmpeqd   xmm4, xmm5                         ;22.5
$LN115: cvtps2pd  xmm0, xmm7                         ;23.7
$LN116: movaps    xmm7, XMMWORD PTR [ecx+esi*4]      ;23.7
$LN117: pxor      xmm4, xmm6                         ;22.5
$LN118: cvtps2pd  xmm1, xmm7                         ;23.7
$LN119: movhlps   xmm7, xmm7                         ;23.7
$LN120: cvtps2pd  xmm7, xmm7                         ;23.7
$LN121: mulpd     xmm1, xmm2                         ;23.7
$LN122: mulpd     xmm7, xmm2                         ;23.7
$LN123: addpd     xmm3, xmm1                         ;23.7
$LN124: addpd     xmm0, xmm7                         ;23.7
$LN125: movups    xmm1, XMMWORD PTR [eax+esi*4]      ;23.7
$LN126: cvtpd2ps  xmm3, xmm3                         ;23.7
$LN127: cvtpd2ps  xmm0, xmm0                         ;23.7
$LN128: movlhps   xmm3, xmm0                         ;23.7
$LN129: andps     xmm3, xmm4                         ;23.7
$LN130: andnps    xmm4, xmm1                         ;23.7
$LN131: orps      xmm3, xmm4                         ;23.7
$LN132: movaps    XMMWORD PTR [eax+esi*4], xmm3      ;23.7
$LN133: add       esi, 4                             ;22.5
$LN134: cmp       esi, 1024                          ;22.5
$LN135: jb        .B2.2                              ; Prob 99% ;22.5
[/plain]
The *ps and *pd op-codes (e.g. andps, mulpd, etc.) indicate packed operations (p = packed), which is good!
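As an illustration of what the packed logical instructions near the end of the listing do: the andps / andnps / orps triple is a common SSE idiom for a per-lane select, where a mask of all ones or all zeros (as produced by a packed compare such as pcmpeqd) decides whether each lane takes the newly computed value or keeps the old one. The sketch below models this in scalar C++ on the bit patterns of four floats. The names `bits_of`, `float_of`, `select_lane` and `select4` are hypothetical, and how the mask relates to the original source lines is an assumption, since the posted source is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes 32-bit floats");

// Bit pattern of a float, as the SSE logical instructions see it.
inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Inverse of bits_of: reinterpret 32 bits as a float.
inline float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// One lane of the andps/andnps/orps sequence. mask must be all ones
// (take fresh) or all zeros (keep old).
inline float select_lane(std::uint32_t mask, float fresh, float old) {
    const std::uint32_t taken = bits_of(fresh) & mask;  // andps
    const std::uint32_t kept  = ~mask & bits_of(old);   // andnps
    return float_of(taken | kept);                      // orps
}

// Four lanes, matching one 128-bit XMM register of floats.
inline void select4(const std::uint32_t mask[4], const float fresh[4],
                    float dst[4]) {
    for (int i = 0; i < 4; ++i) {
        dst[i] = select_lane(mask[i], fresh[i], dst[i]);
    }
}
```

The vectorized loop performs all four lane selections with three instructions, which is how a conditional assignment can be vectorized without a branch per element.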
So, why don't you see it?
- I don't have your implementation of "clock_it()". However, keep in mind that some implementations of timer functions don't work as expected on multi-core systems. I'd recommend using the "rdtsc()" intrinsic, which reads the clock ticks. Also, and in general for benchmarking, you might turn off Intel SpeedStep, Intel Turbo Boost and (optionally) Intel Hyper-Threading, so that CPU performance is comparable between benchmark runs.
- Maybe you're using an older compiler version that cannot vectorize the code you provided. The most recent version can: I don't get warnings about dependencies in the vectorization report, and the above assembly is created.
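A minimal sketch of tick-based timing along the lines of the first point, under some assumptions: the intrinsic is spelled `__rdtsc()` here, as declared in `<intrin.h>` for Microsoft-compatible compilers and in `<x86intrin.h>` for GCC/Clang. `read_ticks` and `min_ticks` are hypothetical helpers rather than a reconstruction of "clock_it()", and the `std::chrono` branch exists only so the sketch builds on non-x86 targets. Taking the minimum over several repetitions is one common way to reduce noise from interrupts and frequency changes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>
  #define HAVE_RDTSC 1
#else
  #define HAVE_RDTSC 0
#endif

// Reads a tick counter: the CPU time-stamp counter where available,
// otherwise a steady clock (fallback, units differ from CPU ticks).
inline std::uint64_t read_ticks() {
#if HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Runs work() the given number of times and returns the smallest tick
// difference observed across the runs.
template <typename Fn>
std::uint64_t min_ticks(Fn&& work, int repetitions) {
    assert(repetitions > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int r = 0; r < repetitions; ++r) {
        const std::uint64_t start = read_ticks();
        work();
        const std::uint64_t stop = read_ticks();
        if (stop - start < best) {
            best = stop - start;
        }
    }
    return best;
}
```

A call such as `min_ticks([&] { longvector(/* arguments as in your code */); }, 10)` would time the function from this thread. Tick counts are only comparable between runs if the CPU frequency is stable, which is the reason for the advice about SpeedStep and Turbo Boost.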
Best regards,
Georg Zitzlsberger