
## loop is not vectorized

Hi,

I am experimenting with the auto-vectorization feature of the Intel compiler. However, the following loop is not vectorized.

    for (k=0; k < nz; k++) {
        for (j=0; j < ny; j++) {
            #pragma ivdep
            for (i=0; i < nx; i++) {
                ptr = i + j*nx + k*nx*ny;
                p[ptr] = 1.0f + 0.1*drand48();
                q[ptr] = 1.0f + 0.1*drand48();
            }
        }
    }

In the program, both p and q are float arrays. The message I got for the loop from using -vec-report3 is:

remark: loop was not vectorized: existence of vector dependence

However, I cannot see any dependency in the loop. On the other hand, the loop can be vectorized if I don't use the random generator. So is it safe to say that loops with random generators cannot be vectorized?

8 Replies
Black Belt
Vectorization means the operation can be performed within the SSE/AVX instruction set (or is an intrinsic function known by the compiler).

Functions that can be inlined might also preserve vectorization in the caller.

Look for a function that returns a list of random numbers (Fortran has such a function).
Generate a table of 2*nx random numbers, say into YourRanTable[2*nx]:

    double* YourRanTable = alloca(2*nx*sizeof(double));
    GenerateYourRanTable(YourRanTable, 2*nx);
    ...
    #pragma ivdep
    for (i=0; i < nx; i++) {
        ptr = i + j*nx + k*nx*ny;
        p[ptr] = 1.0f + 0.1*YourRanTable[i*2];
        q[ptr] = 1.0f + 0.1*YourRanTable[i*2+1];
    }

Jim Dempsey
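
The table approach can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: `fill_random_table` and `init_row` are hypothetical names, `restrict` assumes a C99 compiler, and the arrays are taken to be `float` as stated in the question.

```c
#define _XOPEN_SOURCE 700   /* asks <stdlib.h> to declare drand48() */
#include <stdlib.h>

/* Hypothetical helper: draw n values from drand48() in call order.
   These calls stay sequential because drand48() updates hidden state. */
static void fill_random_table(double *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        table[i] = drand48();
}

/* Hypothetical helper: initialise one row of nx elements from a table
   holding 2*nx values. No function call remains in this loop, so the
   compiler is free to consider it for vectorization. */
static void init_row(float *restrict p_row, float *restrict q_row,
                     const double *restrict table, int nx)
{
    for (int i = 0; i < nx; i++) {
        p_row[i] = (float)(1.0 + 0.1 * table[2 * i]);
        q_row[i] = (float)(1.0 + 0.1 * table[2 * i + 1]);
    }
}
```

In the original loop nest, one would presumably call `fill_random_table(table, 2*nx)` once per (j, k) pair and then `init_row(p + j*nx + k*nx*ny, q + j*nx + k*nx*ny, table, nx)`.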

Black Belt

Also consider the following code change:

    int _ptr = j*nx + k*nx*ny;
    __restrict double* p_ptr = &p[_ptr];
    __restrict double* q_ptr = &q[_ptr];
    for (i=0; i < nx; i++) {
        p_ptr[i] = 1.0f + 0.1*YourRanTable[i*2];
        q_ptr[i] = 1.0f + 0.1*YourRanTable[i*2+1];
    }

Jim Dempsey

Black Belt
Maybe make that
double * __restrict

If using a .c source file with the option -std=c99, the spelling is restrict. That C99 spelling also works with -restrict, without -std=c99, even with icpc. The frequently used spelling __restrict isn't mentioned in the Intel docs, although it is likely to work.
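
For illustration, the qualifier placement under discussion might look like this in C99. The function name `init_grid` and its parameter list are hypothetical, the element type is assumed to be `float` as in the original question, and reusing one table for every row is a simplification. The `restrict` keyword follows the `*` because it qualifies the pointer itself rather than the pointed-to type.

```c
#include <stddef.h>

/* Hypothetical function: fills the p and q grids row by row from a table
   of 2*nx values. The restrict qualifiers promise the compiler that the
   three pointers never refer to overlapping memory. */
void init_grid(float *restrict p, float *restrict q,
               const double *restrict table, int nx, int ny, int nz)
{
    for (int k = 0; k < nz; k++) {
        for (int j = 0; j < ny; j++) {
            /* Hoist the row offset so the inner loop indexes with i only. */
            size_t row = ((size_t)k * ny + j) * nx;
            float *restrict p_row = p + row;   /* qualifier goes after the '*' */
            float *restrict q_row = q + row;
            for (int i = 0; i < nx; i++) {
                p_row[i] = (float)(1.0 + 0.1 * table[2 * i]);
                q_row[i] = (float)(1.0 + 0.1 * table[2 * i + 1]);
            }
        }
    }
}
```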
Beginner
Jim,

Would it not be better to split the loop in your example in the following way?

```cpp
int _ptr = j * nx + k * nx * ny;
double * restrict p_ptr = &p[_ptr];
double * restrict q_ptr = &q[_ptr];

for (int i = 0; i < nx; ++i) {
    p_ptr[i] = 1.0 + 0.1 * YourRanTable[i];
}

double * restrict YRT2 = YourRanTable + nx;
for (int i = 0; i < nx; ++i) {
    q_ptr[i] = 1.0 + 0.1 * YRT2[i];
}
```

That way both loops read the table with the same unit stride?
Black Belt
"So is it safe to say that loops with random generators cannot be vectorized?"

There are a couple of interesting observations related to that. One is that a function with zero arguments, unless it is trivial, cannot be a deterministic function of its inputs: it must depend on hidden state. That is not hard for a compiler to infer.

Had you used, in place of drand48, some other function with at least one argument, and that argument did not vary with the loop iteration, and that function were not PURE, you might have needed to make that attribute known to the compiler to produce correct code.
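
To see why purity matters here, consider this small sketch. `noisy_scale` is a hypothetical function with hidden state (a call counter), standing in for any generator-like routine. Calling it once and reusing the value, which is what an optimizer might do for a function it believes to be pure, gives different results from calling it on every iteration.

```c
/* Hidden state: makes noisy_scale() impure even though its argument
   is the same on every call. */
static int call_count = 0;

/* Hypothetical impure function: the result depends on how many times
   it has been called so far. */
static double noisy_scale(double x)
{
    call_count++;
    return x * call_count;
}

/* One call per iteration: each element sees a different value. */
static void fill_per_iteration(double *out, int n, double x)
{
    for (int i = 0; i < n; i++)
        out[i] = noisy_scale(x);
}

/* Call hoisted out of the loop: equivalent only if the function were pure. */
static void fill_hoisted(double *out, int n, double x)
{
    double v = noisy_scale(x);
    for (int i = 0; i < n; i++)
        out[i] = v;
}
```

With `x = 2.0`, `n = 3` and the counter starting at zero, the per-iteration version stores 2, 4, 6 while the hoisted version stores 2, 2, 2.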
Black Belt

Using separate loops should perform better.

*** However

If the user were to take the original code and use the same random number generator and initial seed (which determines the pseudorandom sequence), then after splitting into multiple loops, p and q would take different regions of the sequence, whereas the original code took alternate numbers from the sequence.

Therefore, if the user requires reproducibility between the data generated before and after the change, the two-loop method would produce a false failure in the integrity test.

This could be fixed by filling the table differently:

    for (int i=0; i < nx; ++i)
    {
        ranTable[i] = someRanNum();
        ranTable[i+nx] = someRanNum();
    }

Jim Dempsey
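
The ordering argument can be checked with a small sketch. Everything here is hypothetical scaffolding: `counter_rng` is a stand-in generator that returns 0, 1, 2, ... so the draw order is easy to follow, and the function names are invented for illustration. Any function with the signature `double f(void)`, such as `drand48`, could be passed instead.

```c
#include <stddef.h>

typedef double (*rng_fn)(void);

/* Hypothetical stand-in generator: returns 0, 1, 2, ... in call order. */
static double counter_state = 0.0;
static double counter_rng(void) { return counter_state++; }

/* Original scheme: p[i] and q[i] take alternate values from the sequence. */
static void init_interleaved(double *p, double *q, size_t nx, rng_fn rng)
{
    for (size_t i = 0; i < nx; i++) {
        p[i] = 1.0 + 0.1 * rng();
        q[i] = 1.0 + 0.1 * rng();
    }
}

/* Split layout, filled in the original draw order: table[0..nx) holds
   the values destined for p, table[nx..2*nx) those destined for q. */
static void fill_split_table(double *table, size_t nx, rng_fn rng)
{
    for (size_t i = 0; i < nx; i++) {
        table[i]      = rng();
        table[i + nx] = rng();
    }
}

/* Two unit-stride loops reading the split table. */
static void init_from_split_table(double *p, double *q,
                                  const double *table, size_t nx)
{
    for (size_t i = 0; i < nx; i++)
        p[i] = 1.0 + 0.1 * table[i];
    for (size_t i = 0; i < nx; i++)
        q[i] = 1.0 + 0.1 * table[i + nx];
}
```

Starting both variants from the same generator state, `init_interleaved` and the `fill_split_table` plus `init_from_split_table` pair produce identical `p` and `q`, so a before/after comparison would still pass.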

Beginner
Jim,

They should be random, should they not? :-)
Black Belt
Andreas,

With a few exceptions, random number generators produce a pseudorandom number series. Given the same starting point, the random numbers generated will be identical.

Having the same set of random numbers available for each run lets the programmer ensure that, when the random sequence is used for testing purposes, the results will be reproducible. Reproducibility is important to have while you are making changes to your program.

When the (pseudo)random numbers are used for operational purposes in the program, as opposed to test data, you will want the random number generator to start at a different sequence point, randomly selected each time the program is run. For this, you can take the low 16 or 32 bits of the system timestamp counter and use that as the starting seed. If you require more randomness, there are other random number generators to choose from.
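
A minimal sketch of the two seeding modes, assuming the POSIX `srand48`/`drand48` pair from the original question. `seed_generator` is a hypothetical helper, and `time()` is used here only as a simple stand-in for the timestamp counter mentioned above.

```c
#define _XOPEN_SOURCE 700   /* asks <stdlib.h> to declare srand48()/drand48() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: a fixed seed for reproducible test runs,
   a clock-derived seed for operational runs. */
static void seed_generator(int reproducible, long fixed_seed)
{
    if (reproducible)
        srand48(fixed_seed);
    else
        srand48((long)time(NULL));
}
```

Calling `seed_generator(1, 12345L)` before each test run should make successive `drand48()` calls return the same series every time, while `seed_generator(0, 0L)` starts from a point that varies with the clock.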