Hi,
I'm trying to implement a class (first in 1D) that behaves like a Fortran array. I created an "Array1d" class that contains three pieces of information: the size, the offset, and a pointer to the data.
I provide the three files needed to compile. Even with every option that should help the compiler (-O3 -xSSSE3 -restrict), one of my loops (line 30 in main.cpp) does not vectorize, and I don't understand why.
I defined operator() so that I can write
array(i)=something, where i can be a positive or negative index (depending on the offset).
The operator is implemented like this (o1 is the offset):
[cpp]T& restrict operator()(const int64_t i){return ptr[i-o1];}[/cpp]
If I write:
[cpp]for(i=min;i<=max;++i)
    array(i)=3.14f;[/cpp]
this loop does not vectorize, and I don't understand why. I thought that after inlining, the loop would become
[cpp]for(i=min;i<=max;++i)
    array.ptr[i-array.o1]=3.14f;[/cpp]
and this loop does vectorize.
So why can icpc vectorize the explicitly inlined version but not the first loop? Curiously, when I look at the assembly code (generated with -S), there does not seem to be any dependency issue...
I'm using the Intel C++ compiler 11.1.064 on Ubuntu x86_64; 11.1.069 seems to give the same result.
Thanks in advance.
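The class and the two loops from the question can be sketched as follows. This is a hypothetical minimal version (the original array_1d.h is not reproduced in the thread): storage is owned by a std::vector, the restrict qualifier is left out, and names such as min(), max(), fill_via_operator and fill_inlined are assumptions. Both loops write exactly the same elements; the question is only whether the compiler vectorizes them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal Array1d: valid Fortran-style indices are
// [o1, o1 + size_ - 1]; element i lives at ptr[i - o1].
template <typename T>
class Array1d {
public:
    Array1d(std::int64_t lo, std::int64_t hi) : size_(0), o1(lo), ptr(nullptr) {
        assert(lo <= hi && "empty ranges are not handled in this sketch");
        size_ = hi - lo + 1;
        storage_.resize(static_cast<std::size_t>(size_));
        ptr = storage_.data();
    }
    Array1d(const Array1d&) = delete;             // ptr would dangle after a copy
    Array1d& operator=(const Array1d&) = delete;

    // Fortran-like access: i may be negative, depending on the offset.
    T& operator()(std::int64_t i) { return ptr[i - o1]; }
    const T& operator()(std::int64_t i) const { return ptr[i - o1]; }
    std::int64_t min() const { return o1; }
    std::int64_t max() const { return o1 + size_ - 1; }

    std::int64_t size_;  // number of elements
    std::int64_t o1;     // offset: index of the first element
    T* ptr;              // pointer to the first element
private:
    std::vector<T> storage_;
};

// Loop written through operator() (the form that did not vectorize for the poster).
template <typename T>
void fill_via_operator(Array1d<T>& a, T value) {
    for (std::int64_t i = a.min(); i <= a.max(); ++i)
        a(i) = value;
}

// The same loop, inlined by hand (the form that did vectorize).
template <typename T>
void fill_inlined(Array1d<T>& a, T value) {
    for (std::int64_t i = a.min(); i <= a.max(); ++i)
        a.ptr[i - a.o1] = value;
}
```

Usage: `Array1d<float> a(-20, 10); fill_via_operator(a, 3.14f);` fills the 31 elements a(-20) through a(10).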
Hi, Raphael
Thank you for raising this issue. I verified that this loop can be vectorized on Windows, but it fails on Linux. I have entered this in our problem tracking system and will let you know when I have an update.
Thank you.
Thank you Yolanda for your quick answer.
Can you tell me which version you used to vectorize it on Windows, please? A colleague tried on Windows before I created this thread and he couldn't get it to vectorize.
I hope that in the future the fix will also allow vectorizing more complicated expressions on 2D, 3D... arrays.
Try adding the option -ansi-alias. That seems to work with the given test case.
$ icpc -restrict -vec-report2 -V main.cpp
Intel C++ Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
Edison Design Group C/C++ Front End, version 3.10.1 (Feb 3 2010 19:19:06)
Copyright 1988-2007 Edison Design Group, Inc.
main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.
main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.
main.cpp(30): (col. 5) remark: loop was not vectorized: existence of vector dependence.
GNU ld version 2.17.50.0.6-5.el5 20061020
$ icpc -restrict -vec-report2 -V -ansi-alias main.cpp
Intel C++ Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
Edison Design Group C/C++ Front End, version 3.10.1 (Feb 3 2010 19:19:06)
Copyright 1988-2007 Edison Design Group, Inc.
main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.
main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.
main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.
GNU ld version 2.17.50.0.6-5.el5 20061020
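The thread does not pin down why the operator() form fails without -ansi-alias. One plausible factor is that the loop reads ptr and o1 from the object on every iteration, and without type-based alias analysis the compiler may be unable to prove that the float store does not overwrite those members. A workaround that sidesteps the question regardless of compiler flags is to copy the members into local variables before the loop. A sketch, where View1d and fill_hoisted are hypothetical names standing in for the thread's class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the thread's Array1d: pointer, offset, size.
struct View1d {
    float* ptr;         // first element
    std::int64_t o1;    // Fortran-style index of the first element
    std::int64_t size;  // number of elements
    float& operator()(std::int64_t i) { return ptr[i - o1]; }
};

// Workaround sketch: copy the members into locals before the loop.
// The addresses of base/lo/hi are never taken, so a store through base
// cannot change them; the base address and bounds are loop-invariant
// without relying on -ansi-alias.
void fill_hoisted(View1d& a, float value) {
    assert(a.size >= 0);
    float* const base = a.ptr;
    const std::int64_t lo = a.o1;
    const std::int64_t hi = a.o1 + a.size - 1;
    for (std::int64_t i = lo; i <= hi; ++i)
        base[i - lo] = value;
}
```

Whether a given compiler version then vectorizes the loop still has to be checked with -vec-report.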
In this case the loop can be vectorized. But, as I said previously, I hope that the compiler will be able to vectorize more complicated expressions.
First of all, using -ansi-alias is dangerous for other parts of the code that do not follow the rules. From the icpc man page:
If your program adheres to these rules, then this option
allows the compiler to optimize more aggressively. If it
doesn't adhere to these rules, then it can cause the
compiler to generate incorrect code.
Second, if I try something slightly more complicated (but not that complicated):
[cpp]for(i=min;i<=max;++i) array_copy(i)=array(i);[/cpp]
Unfortunately, the compiler cannot vectorize this expression, even with -ansi-alias.
But the worst thing is that the following expression does not vectorize either:
[cpp]for(i=min;i<=max;++i) array_copy.ptr[i-array_copy.o1]=array.ptr[i-array.o1];[/cpp]
It seems that the restrict qualifier is not properly "propagated".
I know that vectorization is harder for a C/C++ compiler than for a Fortran compiler (because of the aliasing rules). But I thought the restrict qualifier would be enough to avoid these problems and deliver performance (especially for scientific computing).
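One common way to make the no-alias promise explicit, independent of how the compiler treats restrict on a returned reference or on struct members, is to pass the two raw pointers to a small function whose parameters are restrict-qualified. A sketch, assuming the compiler accepts the non-standard __restrict spelling (GCC, Clang, MSVC and the Intel compilers do); copy_restrict is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// The caller promises that dst[0..n) and src[0..n) do not overlap;
// breaking that promise is undefined behavior. Because the promise is
// attached to the function parameters, it does not have to "propagate"
// through operator() or through struct members.
void copy_restrict(float* __restrict dst, const float* __restrict src, std::int64_t n) {
    assert(n >= 0);
    for (std::int64_t k = 0; k < n; ++k)
        dst[k] = src[k];
}
```

With the member names used in the thread, a call could look like `copy_restrict(array_copy.ptr + (min - array_copy.o1), array.ptr + (min - array.o1), max - min + 1);`.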
Hi, Raphael
I tested with the latest Intel C++ Compiler for Windows, version 11.1.054; 11.1.048 also works. See:
C:\develop\bug\25731>icl /c /Qrestrict /Qvec-report2 main.cpp
Intel C++ Compiler Professional for applications running on IA-32, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
main.cpp
C:\develop\bug\25731\main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.
C:\develop\bug\25731\main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.
C:\develop\bug\25731\main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.
To compile on Windows I added one more header file for "stdint.h". My build files are attached.
Here's the result for Intel 64:
C:\develop\bug\25731>icl /c /Qrestrict /Qvec-report2 main.cpp
Intel C++ Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
main.cpp
C:\develop\bug\25731\main.cpp(20): (col. 5) remark: LOOP WAS VECTORIZED.
C:\develop\bug\25731\main.cpp(24): (col. 10) remark: LOOP WAS VECTORIZED.
C:\develop\bug\25731\main.cpp(30): (col. 5) remark: LOOP WAS VECTORIZED.
Thank you Yolanda.
Are you able to vectorize a more complicated expression (on windows)? Something like:
[cpp]for(i=min;i<=max;++i)
    array_copy(i)=array(i);[/cpp]
In my case I'm not able to do it. Thanks for your help.
No, I cannot vectorize this on Windows.
Thanks for raising the problem. I'll investigate this and get back to you later with an update.
Thank you.
http://software.intel.com/en-us/forums/showthread.php?t=70820
-Jeff
This can also be vectorized by the latest Intel C++ Compiler for Windows, 11.1.060.
I simplified your main program to:
[cpp]#include "array_1d.h"

int main()
{
    int64_t i,min,max,size;
    Array1d array;
    Array1d array_copy;
    float* vec_tmp;
    array.resize(Range(-20,10));
    min=array.range1().min();
    max=array.range1().max();
    for(i=min;i<=max;++i)
        array_copy(i)=array(i);
    return 0;
}
[/cpp]
Command output:
C:\develop\bug\25731>icl main.cpp /Qvec-report1 /Qrestrict /QxSSSE3 /S
Intel C++ Compiler Professional for applications running on IA-32, Version 11.1 Build 20100203 Package ID: w_cproc_p_11.1.060
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
main.cpp
C:\develop\bug\25731\main.cpp(16): (col. 2) remark: LOOP WAS VECTORIZED.
Grep from the assembler file:
[cpp].B1.16:                         ; Preds .B1.15 .B1.16
        movaps    xmm0, XMMWORD PTR [edi+eax*4]         ;17.18
        movaps    xmm1, XMMWORD PTR [16+edi+eax*4]      ;17.18
        movaps    XMMWORD PTR [-80+ebx+eax*4], xmm0     ;8.20
        movaps    XMMWORD PTR [-64+ebx+eax*4], xmm1     ;8.20[/cpp]
Thank you.
In the end, the Intel compiler works much better than I thought (using -ansi-alias).
The following lines:
[cpp]for(i=min;i<=max;++i) array_copy(i)=array(i);[/cpp]
are recognized as a memcpy. That is why the compiler does not print a vectorization message: it calls _intel_fast_memcpy instead. Using "-S -fcode-asm" I could see much more information, and I tried something like:
[cpp]for(i=min;i<=max;++i) array_copy(i)=2*array(i)+1;[/cpp]
The generated code is actually vectorized.
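The distinction the poster found can be illustrated by writing the two cases by hand over raw pointers. A plain element-by-element copy of a contiguous range has the same effect as one memcpy (for non-overlapping ranges), which is why a compiler may replace it with a library call such as _intel_fast_memcpy and report no vectorized loop; the 2*x+1 loop computes a new value per element, so it cannot be turned into a copy and has to be vectorized instead. The function names below are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Element-by-element copy of n contiguous floats...
void copy_loop(float* dst, const float* src, std::int64_t n) {
    for (std::int64_t k = 0; k < n; ++k) dst[k] = src[k];
}

// ...has the same effect as a single memcpy when the ranges don't overlap.
void copy_memcpy(float* dst, const float* src, std::int64_t n) {
    assert(n >= 0);
    std::memcpy(dst, src, static_cast<std::size_t>(n) * sizeof(float));
}

// This loop computes a new value per element, so it cannot be replaced
// by a memcpy; the compiler has to vectorize the arithmetic itself.
void affine_loop(float* dst, const float* src, std::int64_t n) {
    for (std::int64_t k = 0; k < n; ++k) dst[k] = 2.0f * src[k] + 1.0f;
}
```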
So I will try more and more complicated expressions.
My final goal is to describe a finite-difference scheme and see whether we can get roughly the same performance as a Fortran code.
Final goal (example of a finite-difference scheme):
[cpp]for(k=zmin;k<=zmax;++k)
{
    for(j=ymin;j<=ymax;++j)
    {
        for(i=xmin;i<=xmax;++i)
        {
            lapx=coefx(-1)*u(i-1,j,k)+coefx(0)*u(i,j,k)+coefx(1)*u(i+1,j,k);
            lapy=coefy(-1)*u(i,j-1,k)+coefy(0)*u(i,j,k)+coefy(1)*u(i,j+1,k);
            lapz=coefz(-1)*u(i,j,k-1)+coefz(0)*u(i,j,k)+coefz(1)*u(i,j,k+1);
            u_update(i,j,k)=-beta*u_update(i,j,k)+alpha(i,j,k)*(lapx+lapy+lapz);
        }
    }
}[/cpp]
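A self-contained sketch of the scheme above, useful for checking correctness before looking at vectorization. Array3d and Coef3 are hypothetical helper types, not from the thread: Array3d uses Fortran-style index ranges and column-major storage so that i, the innermost loop index, walks contiguous memory as it would for a Fortran array. u and u_update are separate objects; that they never alias is exactly what the compiler needs to know before it can vectorize the i loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Fortran-style 3D array: index ranges [lo, hi] per
// dimension, column-major storage (i is the contiguous index).
class Array3d {
public:
    Array3d(std::int64_t xlo, std::int64_t xhi, std::int64_t ylo, std::int64_t yhi,
            std::int64_t zlo, std::int64_t zhi)
        : ox_(xlo), oy_(ylo), oz_(zlo), nx_(xhi - xlo + 1), ny_(yhi - ylo + 1) {
        assert(xlo <= xhi && ylo <= yhi && zlo <= zhi);
        data_.assign(static_cast<std::size_t>(nx_ * ny_ * (zhi - zlo + 1)), 0.0f);
    }
    float& operator()(std::int64_t i, std::int64_t j, std::int64_t k) {
        return data_[static_cast<std::size_t>((i - ox_) + nx_ * ((j - oy_) + ny_ * (k - oz_)))];
    }
private:
    std::int64_t ox_, oy_, oz_, nx_, ny_;
    std::vector<float> data_;
};

// Hypothetical 3-point coefficient set indexed by -1, 0, +1.
struct Coef3 {
    float c[3];
    float operator()(int d) const { return c[d + 1]; }
};

// One sweep over the interior points; u needs a one-point halo around
// [xmin,xmax] x [ymin,ymax] x [zmin,zmax].
void update(Array3d& u_update, Array3d& u, Array3d& alpha,
            const Coef3& coefx, const Coef3& coefy, const Coef3& coefz, float beta,
            std::int64_t xmin, std::int64_t xmax, std::int64_t ymin, std::int64_t ymax,
            std::int64_t zmin, std::int64_t zmax) {
    for (std::int64_t k = zmin; k <= zmax; ++k)
        for (std::int64_t j = ymin; j <= ymax; ++j)
            for (std::int64_t i = xmin; i <= xmax; ++i) {
                const float lapx = coefx(-1) * u(i - 1, j, k) + coefx(0) * u(i, j, k) + coefx(1) * u(i + 1, j, k);
                const float lapy = coefy(-1) * u(i, j - 1, k) + coefy(0) * u(i, j, k) + coefy(1) * u(i, j + 1, k);
                const float lapz = coefz(-1) * u(i, j, k - 1) + coefz(0) * u(i, j, k) + coefz(1) * u(i, j, k + 1);
                u_update(i, j, k) = -beta * u_update(i, j, k) + alpha(i, j, k) * (lapx + lapy + lapz);
            }
}
```

As a quick sanity check, with a constant u and coefficients (1, -2, 1) every Laplacian term is zero, so each interior point of u_update becomes -beta times its old value.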
One last question: I have tried to understand exactly what -ansi-alias does, but it is not clear to me:
This option tells the compiler to assume that the program adheres to ISO C Standard aliasability rules.
Does it mean that ONLY pointers qualified with the restrict keyword are subject to the aliasing rules?
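As far as the ISO rules go, the answer to the last question is no: the "aliasability rules" are type-based and apply to every access in the program, not only to restrict pointers. Roughly, an object may only be accessed through an lvalue of its own type (or a signed/unsigned variant, or a character type), so the compiler may assume that a store through a float* never modifies an int object. restrict is a separate, per-pointer promise made by the programmer. A sketch of both sides of the type-based rule, roughly analogous to what GCC's -fstrict-aliasing enables; the function names are hypothetical and float_bits assumes IEEE 754 floats:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Reading a float's bits through a uint32_t lvalue, e.g.
//     std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&f);
// violates the rules, and that is the kind of code -ansi-alias allows
// the compiler to miscompile. Copying the bytes with memcpy is well defined.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Conversely, the rules are what allow the compiler to keep *n in a
// register here: a store through a float* cannot modify an int object,
// so *n need not be reloaded after each store to a[k].
void scale(float* a, const int* n, float s) {
    assert(n != nullptr);
    for (int k = 0; k < *n; ++k) a[k] *= s;
}
```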
Presumably, the reason for not setting it as a default is that one major Windows compiler doesn't perform optimizations based on the aliasing rules.
Generally, people on Linux who do numerical computing with the Intel compilers need this kind of optimization. Maybe you could consider enabling it by default on Linux?
Another suggestion: when vec-report is enabled, would it be possible to tell the user that a loop has been converted into a call to intel_fast_memcpy, in the Fortran compiler too?
I suppose the fast_memcpy notifications would be appropriate in opt-report. The next question is what you intend to do with that information: if your data moves aren't big enough to benefit from the substitution, I fear it could be awkward to override.