Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ivdep and assumed dependencies

Matthieu_Brucher
Beginner
660 Views
Hello,

I'm trying to use restrict in C++ code, and I get assumed dependencies in the vectorization report. I've added a #pragma ivdep before the loop, but I still get:

global/computation.h(201): (col. 13) remark: loop was not vectorized: existence of vector dependence.
global/computation.h(203): (col. 15) remark: vector dependence: assumed FLOW dependence between (unknown) line 203 and model line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed ANTI dependence between model line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed FLOW dependence between (unknown) line 203 and model line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed ANTI dependence between model line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
global/computation.h(203): (col. 15) remark: vector dependence: assumed OUTPUT dependence between (unknown) line 203 and (unknown) line 203.
...

Shouldn't it vectorize, as it's only assumed dependencies and as I've specified ivdep?
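
For reference, a minimal sketch of the pattern being asked about: a loop over restrict-qualified pointers with #pragma ivdep in front of it. The function name scale_add and its parameters are hypothetical and are not taken from computation.h. __restrict is used as the widely supported extension spelling, since standard C++ has no restrict keyword. The pragma asks the compiler to disregard assumed dependences only; it does not override dependences the compiler has actually proven.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example function, not the code from the thread.
// __restrict promises that out and in never alias each other.
void scale_add(float* __restrict out, const float* __restrict in,
               float a, std::size_t n)
{
    assert(out != nullptr && in != nullptr);
#if defined(__INTEL_COMPILER)
#pragma ivdep  // Intel-specific hint: ignore assumed (unproven) dependences
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] += a * in[i];
    }
}
```

Compiling a file like this with a vectorization report option such as the -vec-report=3 used later in this thread shows whether the loop was vectorized.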
10 Replies
Dale_S_Intel
Employee
Any chance you could include a snippet of code to examine? It may help in giving you a reasonable explanation of what's happening.

Thanks!

Dale

Matthieu_Brucher
Beginner
Hi!

Thanks for the answer ;)

I can't give a snippet right away, but I will try to explain the structure.
I have a big structure (it's the model variable here) which has several sub structures. Model and the substructures have smart pointers with a restrict pointer (modified boost::scoped_array with the restrict keyword added). Then, I have a 3D loop that makes a complicated kind of convolution (it's a finite difference stencil with several boolean template parameters and the derivation is also a template). This is the call at computation.h line 203.

I can't simplify the actual convolution, but I was wondering if the fact that the restricted pointers are in several nested structures, even if they are declared nested, may lead to the pragma not being effective...

I'll try to shorten the code tomorrow if no one can suggest a direction to investigate (or confirm that the nested structures may hinder the vectorization).

Matthieu_Brucher
Beginner
I've started to refactor the code, and I now know that the (unknown) dependency is in fact model. So I have model depending on model, which I don't quite understand :|
Matthieu_Brucher
Beginner
I even have a simpler function that exhibits the behavior. I've added it to the post.
I compiled it with the following command line: icpc -O3 -g -xW -axP -vec-report=3 -restrict main.cpp 2> result

I got the attached result.

Also, I've used 11.1.046.
Dale_S_Intel
Employee
I'll take a look at it and let you know what I find.

Dale
Matthieu_Brucher
Beginner
Thank you for your time!
Dale_S_Intel
Employee
It looks like there are some problems with what we're printing out. In addition, we should probably be able to figure this out even without #pragma ivdep. I'll file issues on these and respond back here when I have more info about when it might get fixed.

One comment: I noticed that you linearized the arrays (using the XYZ macro) to emulate a 3D array with a 1D one. I don't know if you did this because of C array limitations or for other reasons, but this can actually make the code more difficult to optimize (you can lose some information, or at least some assumptions, when you do this). The ideal for the compiler is to have good old Fortran-style arrays, with the assumption that you're not going to walk off the end of any one dimension.

Thanks for the test case!

Dale

Matthieu_Brucher
Beginner
You're welcome for the test case ;) Thank you for trying to address my issue.

You say that there are problems with what is printed. Does that mean it should print something like "this kind of loop is not supported"? Without the structure, with only the P and Q pointers passed as parameters to the loop, it can vectorize the loop.

If I do not use my macro, how can I linearize as in Fortran? In my real case, my arrays are allocated on the fly depending on the size of my problem. Fortran itself uses some kind of linearization, IIRC.
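
The observation above (the loop vectorizes once P and Q are passed as plain pointer parameters) suggests a possible workaround, sketched here under assumptions: keep the nested structures, but hand the raw pointers to a small kernel function so that the loop itself only sees restrict-qualified parameters. Model, Fields, kernel, update, and the 1D averaging stencil are hypothetical stand-ins for the real code. Whether a given compiler version then vectorizes the loop still has to be checked in its vectorization report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nested layout: the restrict pointers live in a substructure.
struct Fields {
    float* __restrict P;
    float* __restrict Q;
};

struct Model {
    Fields fields;
    std::size_t n;
};

// The loop only sees plain restrict-qualified parameters, which is the
// form reported in the thread to vectorize.
inline void kernel(float* __restrict P, const float* __restrict Q,
                   std::size_t n)
{
    assert(P != nullptr && Q != nullptr);
    // Simple 1D averaging stencil as a stand-in for the real convolution.
    for (std::size_t i = 1; i + 1 < n; ++i) {
        P[i] = 0.5f * (Q[i - 1] + Q[i + 1]);
    }
}

// Unpack the structure once, outside the hot loop.
void update(Model& model)
{
    kernel(model.fields.P, model.fields.Q, model.n);
}
```

The design choice is only to move the structure accesses out of the loop body; the data layout and ownership stay as they are.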
Dale_S_Intel
Employee
Well, ideally it should say "Loop vectorized" :-) What I mean is that it's assuming dependences that it ought not to, so it's not the printing that is the problem; it's the dependence checking that is too conservative. Unless I'm missing something, the fact that it's in a struct shouldn't inhibit vectorization, but for some reason the compiler is currently thrown off by that.

On the linearization front, I'm not sure you can do anything about it in C if you're dealing with arrays that vary in the sizes of their dimensions. In a well-behaved Fortran program with multi-dimensional arrays, you don't linearize in the source code. Of course, eventually it gets mapped to a 1D string of memory locations, but you lose some information if you rewrite the source that way. If it's in its original 3D form, the compiler can more easily do dependence analysis. In any case, that probably won't matter here if you're using variable-length arrays.
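
To make the contrast concrete, here is a rough sketch of the two forms. The XYZ definition below is a guess at what such a macro might look like (x varying fastest); the real macro is not shown in this thread. The extents NX, NY, NZ and both function names are likewise made up for illustration. The 3D form only works when the extents are known at compile time, which is the limitation mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the XYZ macro: maps (x, y, z) to a 1D offset.
#define XYZ(x, y, z, nx, ny) ((x) + (nx) * ((y) + (ny) * (z)))

// Linearized form: extents are runtime values, indexing goes through the macro.
void scale_linear(float* a, std::size_t nx, std::size_t ny, std::size_t nz,
                  float s)
{
    assert(a != nullptr);
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                a[XYZ(x, y, z, nx, ny)] *= s;
}

// True 3D form: extents are compile-time constants, so each subscript is
// visibly bounded by its own dimension.
constexpr std::size_t NX = 4, NY = 3, NZ = 2;

void scale_3d(float (&a)[NZ][NY][NX], float s)
{
    for (std::size_t z = 0; z < NZ; ++z)
        for (std::size_t y = 0; y < NY; ++y)
            for (std::size_t x = 0; x < NX; ++x)
                a[z][y][x] *= s;
}
```

Both functions touch the same memory locations in the same order; the difference is only in how much of the index structure remains visible in the source.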

Thanks!

Dale

Matthieu_Brucher
Beginner
Sorry for the late answer, I have trouble getting on the Internet these days :|

For the first part of your answer: indeed, it should vectorize, as the restrict keyword is meant to tell the compiler that it can do whatever it sees fit (at least according to my interpretation of the C99 standard).

I agree with you, Fortran is a better fit for 3D arrays, except that if I pass a structure with arrays as fields, I don't know if it can do the same analysis as when the arrays are passed as arguments. In C/C++, the restrict keyword adds some beauty in the code one can write ;)