Hi,
I am experimenting with the auto-vectorization functionality of the Intel compiler. However, the following loop cannot be vectorized.
for (k=0; k < nz; k++) {
  for (j=0; j < ny; j++) {
#pragma ivdep
    for (i=0; i < nx; i++) {
      ptr = i + j*nx + k*nx*ny;
      p[ptr] = 1.0f + 0.1*drand48();
      q[ptr] = 1.0f + 0.1*drand48();
    }
  }
}
In the program, both p and q are float arrays. The message I got for the loop from using -vec-report3 is:
remark: loop was not vectorized: existence of vector dependence
However, I cannot see any dependency in the loop. On the other hand, the loop can be vectorized if I don't use the random generator. So is it safe to say that loops with random generators cannot be vectorized?
Functions that can be inlined might also preserve vectorization in the caller.
Look for a function that returns a list of random numbers (Fortran has such a function).
Generate a table of 2*nx random numbers, say into YourRanTable[2*nx]:
double* YourRanTable = alloca(2*nx*sizeof(double));
GenerateYourRanTable(YourRanTable, 2*nx);
...
#pragma ivdep
for (i=0; i < nx; i++) {
  ptr = i + j*nx + k*nx*ny;
  p[ptr] = 1.0f + 0.1*YourRanTable[i*2];
  q[ptr] = 1.0f + 0.1*YourRanTable[i*2+1];
}
Jim Dempsey
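The table approach suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names fill_random_table and init_row are hypothetical, the table is filled with drand48() as in the original question, and p and q are assumed to be float arrays as the original poster stated. The random-number calls stay serial, but they are moved out of the loop that the compiler is asked to vectorize.

```c
#define _XOPEN_SOURCE 600   /* for drand48/srand48 under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fill table[0..n-1] with drand48() values.
   This loop is serial, but it no longer sits inside the loop
   that we want the compiler to vectorize. */
static void fill_random_table(double *table, size_t n)
{
    for (size_t t = 0; t < n; ++t)
        table[t] = drand48();
}

/* Hypothetical helper: initialize one row of nx elements of p and q
   from a prefilled table laid out as p0, q0, p1, q1, ...
   The loop body contains no function calls, only loads, arithmetic
   and stores. */
static void init_row(float *restrict p_row, float *restrict q_row,
                     const double *restrict table, int nx)
{
    assert(nx >= 0);
    for (int i = 0; i < nx; ++i) {
        p_row[i] = (float)(1.0 + 0.1 * table[2 * i]);
        q_row[i] = (float)(1.0 + 0.1 * table[2 * i + 1]);
    }
}
```

For each (j, k) pair one would call fill_random_table(table, 2*nx) and then init_row(&p[j*nx + k*nx*ny], &q[j*nx + k*nx*ny], table, nx). Whether the compiler actually vectorizes init_row still has to be checked in the vectorization report.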
Also consider the following code change:
int _ptr = j*nx + k*nx*ny;
__restrict double* p_ptr = &p[_ptr];
__restrict double* q_ptr = &q[_ptr];
for (i=0; i < nx; i++) {
  p_ptr[i] = 1.0f + 0.1*YourRanTable[i*2];
  q_ptr[i] = 1.0f + 0.1*YourRanTable[i*2+1];
}
Jim Dempsey
The qualifier goes after the asterisk: double * __restrict
If using a .c source file with the option -std=c99, the spelling is restrict. That C99 spelling also works with -restrict, without the c99 option, even with icpc. The frequently used spelling __restrict isn't mentioned in the Intel docs, although it is likely to work.
Would it not be better to split the loop in your example in the following way?
[cpp]
int _ptr = j * nx + k * nx * ny;
double * restrict p_ptr = &p[_ptr];
double * restrict q_ptr = &q[_ptr];
for (int i = 0; i < nx; ++i) {
  p_ptr[i] = 1.0 + 0.1 * YourRanTable[i];
}
double * restrict YRT2 = YourRanTable + nx;
for (int i = 0; i < nx; ++i) {
  q_ptr[i] = 1.0 + 0.1 * YRT2[i];
}
[/cpp]
So the stride is equal?
There are a couple of interesting observations related to that. One is that a function that has zero arguments, if it is not trivial, is not a deterministic function. That is not hard for a compiler to infer.
Had you used, in place of drand48, some other function with at least one argument, and that argument did not vary with the loop iteration, and that function were not PURE, you might have needed to make that attribute known to the compiler to produce correct code.
Using separate loops should perform better
*** however
If the user were to take his original code and use the same random number generator and initial seed (so that the sequence of pseudo-random numbers is deterministic), then by splitting into multiple loops, p and q would take different regions of the sequence of random numbers, whereas the original code was taking alternate numbers in the sequence.
Therefore, if the user requires reproducibility between data generated before the alteration and after it, the two-loop method would produce a false failure in the integrity test.
This could be fixed by filling the table out differently
for(int i=0; i < nx; ++i)
{
  ranTable[i] = someRanNum();
  ranTable[i+nx] = someRanNum();
}
Jim Dempsey
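To make the ordering argument concrete, here is a minimal sketch, assuming drand48() as the generator and float arrays p and q as in the original question. The names fill_split_table and init_row_split are hypothetical. The table is filled in the original alternating order but stored as two contiguous halves, so the two split loops read with unit stride and still give p and q the same values as the original single loop for a given seed.

```c
#define _XOPEN_SOURCE 600   /* for drand48/srand48 under strict C modes */
#include <assert.h>
#include <stdlib.h>

/* Draw 2*nx values in the original alternating order, but store them
   as two contiguous halves: table[0..nx-1] feeds p and
   table[nx..2*nx-1] feeds q. */
static void fill_split_table(double *table, int nx)
{
    assert(nx >= 0);
    for (int i = 0; i < nx; ++i) {
        table[i]      = drand48();  /* value the original loop gave to p */
        table[i + nx] = drand48();  /* value the original loop gave to q */
    }
}

/* Two separate unit-stride loops over the split table. */
static void init_row_split(float *restrict p_row, float *restrict q_row,
                           const double *table, int nx)
{
    const double *q_table = table + nx;  /* second half of the table */
    assert(nx >= 0);
    for (int i = 0; i < nx; ++i)
        p_row[i] = (float)(1.0 + 0.1 * table[i]);
    for (int i = 0; i < nx; ++i)
        q_row[i] = (float)(1.0 + 0.1 * q_table[i]);
}
```

With the same seed, a row initialized this way should match a row produced by the original loop that calls the generator twice per iteration, which is the property a before/after integrity test would check.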
They should be random, should they not? :-)
With a few exceptions, random number generators produce a pseudo random number series. Given the same starting point, the random numbers generated will be identical.
Having the same set of random numbers available for each run lets the programmer ensure that, when the random sequence is used for testing purposes, the results will be reproducible. Reproducibility is important to have while you are making changes to your program.
When the (pseudo)random numbers are used for operational purposes in the program, as opposed to test data, you will want the random number generator to start at a different point in the sequence, randomly selected each time the program is run. For this, you can take the low 16 or 32 bits of the system timestamp counter and use them as a starting seed. If you require more randomness, then there are other random number generators to choose from.
The short answer to your question is:
The return values from the random number generator should not be random. Instead, they should be reproducible pseudo-random numbers.
Jim Dempsey
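The two seeding modes described above might look like this in C. This is a sketch under assumptions: drand48()/srand48() are used because they appear in the original question, the helper names draw_with_seed and seed_from_clock are hypothetical, and time(NULL) is used as a portable stand-in for the timestamp counter mentioned above.

```c
#define _XOPEN_SOURCE 600   /* for drand48/srand48 under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Seed the generator with `seed`, then draw n values into out.
   The same seed yields the same sequence, which is what makes
   test runs reproducible. */
static void draw_with_seed(long seed, double *out, int n)
{
    assert(n >= 0);
    srand48(seed);
    for (int i = 0; i < n; ++i)
        out[i] = drand48();
}

/* Assumption: wall-clock time stands in for the timestamp counter;
   only its low 32 bits are kept. Intended for operational runs where
   a different starting point is wanted each time. */
static long seed_from_clock(void)
{
    return (long)((unsigned long)time(NULL) & 0xFFFFFFFFUL);
}
```

During testing one would pass a fixed constant to draw_with_seed; for operational runs one would pass seed_from_clock() instead. A one-second clock is a coarse seed source, so runs started within the same second would share a sequence.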
