long const ddd(dim * dim * dim);
...
...
long h;
#pragma ivdep
#pragma vector always
for (h = 0; h < ddd; ++h) vol
...
...
Compiling, I get:
(col. 28) remark: loop was not vectorized: dereference too complex.
Similar result with a loop such as (scale being a float):
for (h = 0; h < dd6; ++h) rays
Message Edited by Quince on 08-31-2005 08:54 PM
Hi Quince,
It always helps if you give more context (full program fragment, compiler version, switches used, etc.). Something like this (no pragmas required!):
class Vec {
float vol[1000];
public:
void set(int dim) {
long const ddd(dim * dim * dim);
long h;
for (h = 0; h < ddd; ++h) vol
}
};
int main() {
Vec x;
x.set(10);
}
Vectorizes both on Windows (8.1 and up) as well as on Linux:
scxl1:abik:5> icc -xP vec.cpp
vec.cpp(14) : (col. 5) remark: LOOP WAS VECTORIZED.
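For readers who want to try this themselves, a self-contained variant of the fragment above might look like the sketch below. The loop statement in the post is cut off after `vol`, so the zero-fill body (`vol[h] = 0.0f`) and the `at()` accessor are assumptions made purely for illustration, not the original code.

```cpp
#include <cstddef>

// Sketch of the test class from the post. The fill value and the at()
// accessor are assumptions; the original loop body is not shown in full.
class Vec {
    float vol[1000];
public:
    // Fills the first dim^3 entries; the caller must keep dim^3 <= 1000.
    void set(int dim) {
        long const ddd = static_cast<long>(dim) * dim * dim;
        // Unit-stride store with a loop-invariant trip count.
        for (long h = 0; h < ddd; ++h) vol[h] = 0.0f;
    }
    // Read-back helper so the result can be checked.
    float at(std::size_t i) const { return vol[i]; }
};
```

Calling `set(10)` on a `Vec` touches all 1000 elements, which matches the `x.set(10)` call in the post.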
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 09-01-2005 09:44 AM
art.h file:
~~~~~~~~
class ART
{
public:
ART(float *const, float const *const, long const, int const, float const); // constructor
// ... // other functions
private:
float *vol;
// ... // other members
};
// ... // inlined functions
art.cpp file:
~~~~~~~~~~
// ... // includes
ART::ART(float *const nrays, float const *const nfilt, long const ndim, int const nni, float diam) : rays(nrays), filt(nfilt), dim(ndim), ni(nni), cnt(nni)
{
long const ddd(dim * dim * dim);
imin = 0;
vol = 0;
try
{
if (!(vol = new float[ddd])) throw "Couldn't allocate memory for volume data";
// ... // Other allocations
}
catch(const char e[])
{
cerr << e << endl;
good = false;
return;
}
long h;
for (h = 0; h < ddd; ++h) vol
// ... // other constructor stuff
}
// ... // other functions
In file with main(), constructed by:
ART art(rays, filt, DIM, NI, DIAM);
Compilation options:
icpc -Wall -ip -O3 -no-prec-div -xN -ssp -parallel -fomit-frame-pointer -pipe
Also, I'm linking with OpenEXR to read high dynamic range images, so in the linking line I have:
-I../OpenEXR/include -L../OpenEXR/lib-linux -lImath -lIlmImf -lIex -lHalf -lz
...but I don't see why that should make a difference.
Message Edited by Quince on 09-02-2005 02:50 AM
By the way, I just did a make clean and rebuilt the whole thing; using -vec_report5 on a full build gives me this error:
icpc: error: Fatal error in .../icc9/bin/mcpcom, terminated by segmentation violation
compilation aborted for read.cpp (code 1)
(read.cpp is another part of the program, which reads files.) I see this error was discussed for v8 of the compiler, but I'm using v9. I don't see how it can be a memory issue when the machine I'm building on has 3 GB of RAM.
Dear Quince,
Thank you for the example by email; it was very helpful in diagnosing the vectorization failure. Indeed, no vectorization occurs by default. In cases like this, the vectorization diagnostics may give more insight, as shown below with a simplified switch set:
scxl1:abik:27> icc -xN -vec_report2 -c art.cpp
art.cpp(30) : (col. 2) remark: loop was not vectorized: existence of vector dependence.
art.cpp(35) : (col. 2) remark: loop was not vectorized: existence of vector dependence.
art.cpp(39) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
Here, data dependences are reported at lines 30 and 35, which are the two loops you are interested in. These non-intuitive data dependences result from a rather pedantic aliasing assumption made by the compiler. Luckily, simply stating (with -ansi_alias) that the program adheres to the ANSI standard lets type-based disambiguation eliminate the problem:
scxl1:abik:34> icc -xN -vec_report2 -c -ansi_alias art.cpp
art.cpp(30) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(35) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(39) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
Now two loops remain to be investigated. The loop at line 41 is not a vector loop candidate, so let's move our focus to the loop at line 39, where imin is a global pointer to int.
for (i = 0; i < ni; ++i) imin[i] = i;
Unfortunately, because this loop deals with integer data, -ansi_alias does not help. Below, I give a form that works around all pedantic aliasing assumptions made by the compiler:
{ int *ptr = imin, lni = ni; for (i = 0; i < lni; ++i) ptr[i] = i; }
After this rewrite, all vector loop candidates actually vectorize:
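To make the workaround concrete, here is a minimal sketch of the same pattern in a compilable form. The `Art` struct and both function names are hypothetical stand-ins for the class in the thread. The comment on why the first form is hard for the compiler is my reading of the explanation above, not a statement about the compiler's internals.

```cpp
// Hypothetical stand-ins for the global and the member used in the post.
int *imin = nullptr;  // global pointer to int, as described above

struct Art {
    int ni;
    void fill_indices();
    void fill_indices_local();
};

// Original form: every store to imin[i] is an int store. As far as the
// compiler can tell, it might overwrite the int member ni, so the loop
// bound may have to be re-read on each iteration.
void Art::fill_indices() {
    for (int i = 0; i < ni; ++i) imin[i] = i;
}

// Workaround form: copy the pointer and the bound into locals whose
// addresses are never taken, so no store in the loop can change them.
void Art::fill_indices_local() {
    int *ptr = imin;
    int const lni = ni;
    for (int i = 0; i < lni; ++i) ptr[i] = i;
}
```

Both functions produce the same result whenever `imin` points at a buffer of at least `ni` ints that does not overlap the object itself; only the second makes that independence visible without any aliasing switch.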
scxl1:abik:70> icc -xN -vec_report2 -c -ansi_alias art.cpp
art.cpp(30) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(35) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(39) : (col. 36) remark: LOOP WAS VECTORIZED.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
I hope you will find this investigation useful. For much more background on automatic vectorization, please see the online article:
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm
and The Software Vectorization Handbook:
http://www.intel.com/intelpress/sum_vmmx.htm
Aart Bik
http://www.aartbik.com/
a = (a < b) ? c : d;
to --
cmpps xmm0, xmm1, 1;
movaps xmm2, xmm0;
andps xmm0, xmm3;
andnps xmm2, xmm4;
orps xmm0, xmm2;
(source: http://www.x86.org/articles/sse_pt2/simd2.htm bottom of page)
I'm asking because I'm walking rays through voxels and need conditional branches to stay within the array borders. The only idea I had is masking: bitwise AND the float results with 0x0 when outside the borders (this compiles if the float is in a union with a long).
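The masking idea can be sketched in scalar C++ as follows. This is only an illustration of the bit-level select that the instruction sequence above performs per lane. The helper names (`bits_of`, `float_of`, `select_lt`, `zero_outside`) are made up for this sketch. It uses `std::memcpy` rather than a union, because reading a union member other than the one last written is formally undefined in C++, even though many compilers accept it.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret float bits without a union; memcpy is the well-defined route.
inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Select by mask: returns c if a < b, otherwise d.
inline float select_lt(float a, float b, float c, float d) {
    // All ones when a < b, all zeros otherwise (the compare step).
    std::uint32_t const mask = 0u - static_cast<std::uint32_t>(a < b);
    // (c AND mask) OR (d AND NOT mask): the and / and-not / or steps.
    return float_of((bits_of(c) & mask) | (bits_of(d) & ~mask));
}

// Zeroing a value outside the borders is the special case d == 0.0f.
inline float zero_outside(float value, int idx, int n) {
    std::uint32_t const mask =
        0u - static_cast<std::uint32_t>(idx >= 0 && idx < n);
    return float_of(bits_of(value) & mask);
}
```

In practice, writing the plain conditional expression and letting the vectorizer generate the masks is usually simpler than doing the bit manipulation by hand.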
But of course :-) Given sample code like:
float a[N], b[N], c[N], d[N];
int i;
for (i = 0; i < N; i++) {
a[i] = (a[i] < b[i]) ? c[i] : d[i];
}
}
After automatic vectorization:
scxl1:abik:16> icc -c -S -xP -unroll0 bra.cpp
bra.cpp(8) : (col. 3) remark: LOOP WAS VECTORIZED.
The assembly looks like:
..B1.2:
movaps a(%eax), %xmm0 #9.14
movaps c(%eax), %xmm2 #9.29
cmpltps b(%eax), %xmm0 #9.21
movaps %xmm0, %xmm1 #9.6
andnps d(%eax), %xmm1 #9.6
andps %xmm0, %xmm2 #9.29
orps %xmm1, %xmm2 #9.29
movaps %xmm2, a(%eax) #9.6
addl $16, %eax #8.3
cmpl $256, %eax #8.3
jb ..B1.2
One caveat: in real life, the compiler may worry about moving out-of-bounds accesses or other exceptions into the always-taken path. In those cases, simply add #pragma vector always to override such pedantic analysis. Such tricks are discussed in the documentation I just gave you.
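One way to sidestep the caveat at the source level is to make the load itself unconditionally safe and apply the condition only to the result. The sketch below does this for the voxel-border case by clamping the index first. The function name `sample_clamped` and its parameters are hypothetical, and it assumes `n > 0`. Whether a particular compiler vectorizes the indexed load depends on the target, so treat this as an illustration of removing the guarded access, not as a guarantee of vector code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: sample vol at the given indices, yielding 0.0f for
// any index outside [0, n). Assumes n > 0.
void sample_clamped(float const *vol, long n, long const *idx,
                    float *out, std::size_t count) {
    for (std::size_t k = 0; k < count; ++k) {
        long const i = idx[k];
        // Clamp into the valid range so the load below can never fault.
        long const safe = std::min(std::max(i, 0L), n - 1);
        float const v = vol[safe];
        // i equals safe exactly when i was already inside the borders.
        out[k] = (i == safe) ? v : 0.0f;
    }
}
```

Because both arms of the conditional are now side-effect free, evaluating them on every iteration is harmless, which is the property the vectorizer needs in order to turn the branch into a select.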