Cannot vectorize simple loop: "dereference too complex"

quince · ‎09-01-2005

In the code below, vol is a float array, member of a class. In a function of the class, with one parameter being dim, I have this:

long const ddd(dim * dim * dim);
...
...
long h;
#pragma ivdep
#pragma vector always
for (h = 0; h < ddd; ++h) vol[h] = 0.0f;
...
...

Compiling, I get:
(col. 28) remark: loop was not vectorized: dereference too complex.

Similar result with a loop such as (scale being a float):
for (h = 0; h < dd6; ++h) rays[h] *= scale;

Message Edited by Quince on 08-31-2005 08:54 PM

Intel_C_Intel · ‎09-01-2005

Hi Quince,

It always helps if you give more context (full program fragment, compiler version, switches used, etc.). Something like this (no pragmas required!):

class Vec {
float vol[1000];
public:
void set(int dim) {
long const ddd(dim * dim * dim);
long h;
for (h = 0; h < ddd; ++h) vol[h] = 0.0f;
}
};

int main() {
Vec x;
x.set(10);
}

Vectorizes both on Windows (8.1 and up) as well as on Linux:

icl -QxP vec.cpp
vec.cpp(14) : (col. 5) remark: LOOP WAS VECTORIZED.

scxl1:abik:5> icc -xP vec.cpp
vec.cpp(14) : (col. 5) remark: LOOP WAS VECTORIZED.

Aart Bik
http://www.aartbik.com/

Message Edited by abik on 09-01-2005 09:44 AM

quince · ‎09-02-2005

Sorry I didn't post more program context. Here it is (sorry about formatting, I haven't figured out how to do it in this forum yet, all indentation disappears):

art.h file:
~~~~~~~~
class ART
{
public:
ART(float *const, float const *const, long const, int const, float const); // constructor
// ... // other functions
private:
float *vol;
// ... // other members
};

// ... // inlined functions

art.cpp file:
~~~~~~~~~~
// ... // includes

ART::ART(float *const nrays, float const *const nfilt, long const ndim, int const nni, float diam) : rays(nrays), filt(nfilt), dim(ndim), ni(nni), cnt(nni)
{
long const ddd(dim * dim * dim);
imin = 0;
vol = 0;
try
{
if (!(vol = new float[ddd])) throw "Couldn't allocate memory for volume data";
// ... // Other allocations
}
catch(const char e[])
{
cerr << e << endl;
good = false;
return;
}

long h;
for (h = 0; h < ddd; ++h) vol[h] = 0.0f;

// ... // other constructor stuff
}

// ... // other functions

In file with main(), constructed by:
ART art(rays, filt, DIM, NI, DIAM);

Compilation options:
icpc -Wall -ip -O3 -no-prec-div -xN -ssp -parallel -fomit-frame-pointer -pipe

Also, I'm linking with OpenEXR to read high dynamic range images, so in the linking line I have:
-I../OpenEXR/include -L../OpenEXR/lib-linux -lImath -lIlmImf -lIex -lHalf -lz
...but I don't see why that should make a difference.

Message Edited by Quince on 09-02-2005 02:50 AM

Intel_C_Intel · ‎09-02-2005

Dear Quince,

Can you please send me a "stand-alone compilable" version of art.cpp and art.h by email (aart.bik@intel.com)? I am not sure if it is the formatting, but many details seem lost.

Aart

quince · ‎09-02-2005

OK, I'll email you the code. Thanks for the help.

By the way, I just did a make clean and rebuilt the whole thing; using -vec_report5 on a full build gives me this error:
icpc: error: Fatal error in .../icc9/bin/mcpcom, terminated by segmentation violation compilation aborted for read.cpp (code 1)

(read.cpp is another part of the program that reads files) I see this error was discussed for v8 of the compiler, but I'm using v9... I don't see how it can be a memory issue, when the machine I'm building on has 3 GB RAM.

Intel_C_Intel · ‎09-02-2005

Dear Quince,

Thank you for the example by email; it was very helpful in diagnosing the vectorization failure. By default, indeed no vectorization occurs. In cases like this, the vectorization diagnostics may give some more insight, as shown below with a simplified switch set:

scxl1:abik:27> icc -xN -vec_report2 -c art.cpp
art.cpp(30) : (col. 2) remark: loop was not vectorized: existence of vector dependence.
art.cpp(35) : (col. 2) remark: loop was not vectorized: existence of vector dependence.
art.cpp(39) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.

Here, data dependences are reported at lines 30 and 35, which are the two loops you are interested in. These non-intuitive data dependences result from a rather pedantic aliasing assumption made by the compiler. Luckily, simply stating that the program adheres to the ANSI aliasing rules lets type-based disambiguation eliminate the problem:

scxl1:abik:34> icc -xN -vec_report2 -c -ansi_alias art.cpp
art.cpp(30) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(35) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(39) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
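
For readers who would rather not depend on a switch, the same two float loops can also be written with local copies of the member pointer and the bound, so the compiler does not have to consider that a store through the pointer might change the members themselves. The following is only a sketch under that assumption, not the ART class from this thread: `Volume`, `clear`, `scale_by` and `at` are hypothetical names chosen for illustration, and whether a given compiler version vectorizes these loops still has to be checked with its vectorization report.

```cpp
// Hypothetical stand-in for a class that owns a float buffer through a
// member pointer, similar in shape to the `vol` member discussed above.
class Volume {
public:
    explicit Volume(long dim) : n(dim * dim * dim), vol(new float[n]) {}
    ~Volume() { delete[] vol; }
    Volume(const Volume &) = delete;
    Volume &operator=(const Volume &) = delete;

    // Zero the whole buffer. The pointer and the count are copied into
    // locals first, so neither can be modified by the stores in the loop.
    void clear() {
        float *p = vol;
        long const count = n;
        for (long h = 0; h < count; ++h) p[h] = 0.0f;
    }

    // Multiply every element by `scale`, using the same local-copy pattern.
    void scale_by(float scale) {
        float *p = vol;
        long const count = n;
        for (long h = 0; h < count; ++h) p[h] *= scale;
    }

    float &at(long h) { return vol[h]; }
    long size() const { return n; }

private:
    long n;      // declared before vol so it is initialized first
    float *vol;
};
```

A typical use would be `Volume v(64); v.clear(); v.scale_by(0.5f);`.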

Now two loops remain to be investigated. The loop at line 41 is not a vector loop candidate, so let's move our focus to the loop at line 39, where imin is a global pointer to int.

for (i = 0; i < ni; ++i) imin[i] = i;

Unfortunately, because this loop deals with integer data, -ansi_alias does not help. Below, I give a form that works around all pedantic aliasing assumptions made by the compiler:

{ int *ptr = imin, lni = ni; for (i = 0; i < lni; ++i) ptr[i] = i; }

After this rewrite, all vector loop candidates actually vectorize:

scxl1:abik:70> icc -xN -vec_report2 -c -ansi_alias art.cpp
art.cpp(30) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(35) : (col. 2) remark: LOOP WAS VECTORIZED.
art.cpp(39) : (col. 36) remark: LOOP WAS VECTORIZED.
art.cpp(41) : (col. 2) remark: loop was not vectorized: unsupported loop structure.
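
Written out as a complete function, the local-copy rewrite of the integer loop might look like the sketch below. It assumes, as described above, that `imin` is a global pointer to int; that `ni` is a global int is also an assumption here, and `fill_indices` is a hypothetical name for whatever function contains the loop.

```cpp
// Globals mirroring the description in the thread (assumed declarations).
int *imin = nullptr;
int ni = 0;

// Store 0, 1, ..., ni-1 into the buffer that imin points to.
void fill_indices() {
    // Local copies: stores through `ptr` cannot change `ptr` or `lni`,
    // because neither local ever has its address taken.
    int *ptr = imin;
    int const lni = ni;
    for (int i = 0; i < lni; ++i) ptr[i] = i;
}
```

The caller is expected to point `imin` at storage for at least `ni` ints before calling `fill_indices()`.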

I hope you will find this investigation useful. For much more background on automatic vectorization, please see the online article:
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm

and The Software Vectorization Handbook:
http://www.intel.com/intelpress/sum_vmmx.htm

Aart Bik
http://www.aartbik.com/

quince · ‎09-03-2005

Thank you kindly! It works now, and #pragma ivdep got the loops in test.cpp to vectorize as well. I have just one more question, about code with conditionals. Would the compiler automatically do something like the following example, which I'm copying from another site, or is some sort of hint needed?
a[i] = (a[i] < b[i]) ? c[i] : d[i];
to --
cmpps xmm0, xmm1, 1;
movaps xmm2, xmm0;
andps xmm0, xmm3;
andnps xmm2, xmm4;
orps xmm0, xmm2;
(source: http://www.x86.org/articles/sse_pt2/simd2.htm bottom of page)
I'm asking because I'm walking rays through voxels, and I get conditional branches to stay within the array borders. The only idea I had is masking: bitwise AND the float results with 0x0 when outside the borders (this compiles if the float is in a union with a long).
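
A scalar sketch of that masking idea is shown below. It is an illustration rather than the poster's code: `mask_float` is a hypothetical helper, and it uses `std::memcpy` with a fixed-width `std::uint32_t` instead of a union with `long`, since reading the inactive member of a union is not defined behavior in standard C++ and `long` is 64 bits wide on 64-bit Linux.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Return `value` when `keep` is true and +0.0f otherwise, by ANDing the
// bit pattern of the float with an all-ones or all-zeros mask.
float mask_float(float value, bool keep) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // float -> raw bits

    // keep == true  -> 0xFFFFFFFF, keep == false -> 0x00000000
    std::uint32_t const mask = 0u - static_cast<std::uint32_t>(keep);
    bits &= mask;

    float out;
    std::memcpy(&out, &bits, sizeof out);     // raw bits -> float
    return out;
}
```

For example, `mask_float(sample, x >= 0 && x < dim)` would pass `sample` through inside the borders and yield zero outside them, with `sample`, `x` and `dim` standing in for the caller's own variables.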

Intel_C_Intel · ‎09-03-2005

But of course :-). Given sample code like:

float a[N], b[N], c[N], d[N];
void doit(void) {
int i;
for (i = 0; i < N; i++) {
a[i] = (a[i] < b[i]) ? c[i] : d[i];
}
}

After automatic vectorization:

scxl1:abik:16> icc -c -S -xP -unroll0 bra.cpp
bra.cpp(8) : (col. 3) remark: LOOP WAS VECTORIZED.

The assembly looks like:

..B1.2:
movaps a(%eax), %xmm0 #9.14
movaps c(%eax), %xmm2 #9.29
cmpltps b(%eax), %xmm0 #9.21
movaps %xmm0, %xmm1 #9.6
andnps d(%eax), %xmm1 #9.6
andps %xmm0, %xmm2 #9.29
orps %xmm1, %xmm2 #9.29
movaps %xmm2, a(%eax) #9.6
addl $16, %eax #8.3
cmpl $256, %eax #8.3
jb ..B1.2
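
The compare / and / andnot / or sequence in this listing can also be spelled out by hand with SSE intrinsics from `<xmmintrin.h>`. The sketch below is not the compiler's output; `select_lt` is a hypothetical function name, it assumes an x86 target with SSE and that `n` is a multiple of 4, and it uses unaligned loads and stores so that callers do not need to align their arrays.

```cpp
#include <xmmintrin.h>

// For each i: a[i] = (a[i] < b[i]) ? c[i] : d[i], four floats at a time.
// Assumes n is a multiple of 4.
void select_lt(float *a, const float *b, const float *c, const float *d, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);

        // Each lane becomes all ones where a < b and all zeros elsewhere.
        __m128 mask = _mm_cmplt_ps(va, vb);

        // Keep c where the mask is set, and d where it is clear.
        __m128 take_c = _mm_and_ps(mask, _mm_loadu_ps(c + i));
        __m128 take_d = _mm_andnot_ps(mask, _mm_loadu_ps(d + i)); // (~mask) & d

        _mm_storeu_ps(a + i, _mm_or_ps(take_c, take_d));
    }
}
```

In practice the auto-vectorized loop is usually preferable, since it stays portable and leaves alignment and remainder handling to the compiler.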

One caveat: in real life, the compiler may be worried about moving out-of-bounds accesses or other exceptions into the always-taken path. In those cases, simply add #pragma vector always to override such pedantic analysis. Such tricks are discussed in the documentation I just gave you.