Intel® C++ Compiler

Regarding parallel dependency in case of nested loops

venkatakiran_myahoo_
Hi,

Could somebody please explain how the Intel C++ compiler handles the parallelization of nested loops?

I would also like to know how to write nested loops so that auto-parallelization with '-parallel' can be applied to them.

No matter what changes I make to try to make the loops parallelizable, the compiler emits remarks to the effect that the loop cannot be parallelized because of FLOW, ANTI, and OUTPUT dependencies between statements.

Thanks in advance.

Regards
Kiran.
TimP
I believe the compiler performs some automatic inner/outer loop swaps in an attempt to move a loop without dependencies, but with a large enough expected trip count, to the outside. The information given in opt-report isn't always helpful, but it should be much better than what you have given us. An example seems needed to move forward with this.
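
To illustrate the kind of interchange described above, here is a minimal sketch. The names `Grid`, `sweep_original`, and `sweep_interchanged` are made up for this example and are not taken from the compiler or its documentation. In the first version the outer `i` loop carries a dependence (row `i` reads row `i-1`) while the inner `j` loop does not; after the swap the dependence-free `j` loop is outermost, which is the shape an auto-parallelizer would presumably prefer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // hypothetical helper type

// Original nest: the outer i loop carries a dependence, because row i
// reads row i-1. Only the inner j loop has independent iterations.
void sweep_original(Grid& g) {
    for (std::size_t i = 1; i < g.size(); ++i)
        for (std::size_t j = 0; j < g[i].size(); ++j)
            g[i][j] = g[i - 1][j] + 1;
}

// Interchanged nest: the independent j loop is now outermost, so each
// column could be handed to a different thread.
void sweep_interchanged(Grid& g) {
    if (g.empty()) return;
    // This sketch assumes a rectangular grid.
    for (const auto& row : g) assert(row.size() == g[0].size());
    for (std::size_t j = 0; j < g[0].size(); ++j)
        for (std::size_t i = 1; i < g.size(); ++i)
            g[i][j] = g[i - 1][j] + 1;
}
```

Both functions compute the same result; the interchange is legal here because each column is updated independently of the others.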
venkatakiran_myahoo_
Thanks for the reply, Tim.
Here is a sample program together with the complete compiler remarks.

As I mentioned in the other post, this is the example program for your reference.

It is a program to reverse the order of the words in a string,

e.g. "Its a nice intel compiler" (input) -> "compiler intel nice a Its" (output).

The logic is to reverse the letters of each word in the first (nested) loop, and then to reverse the complete string as a whole in the second loop.

I am compiling the code below with the Intel C++ compiler (icpc) and the '-parallel' option, using the command:
icpc -parallel -par-report=3 stringrev_.cpp

1 #include <stdio.h>
2 #include <string.h>
3
4 char s[]="proud to be indian";
5 char temp;
6
7 int main()
8 {
9   int i=0, j=0, k=0;
10   int size = strlen(s);
11   int a[] = {0,6,9,12};   /* index of the first letter of each word */
12   int b[] = {4,7,10,17};  /* index of the last letter of each word */
13
14 /**********************THE ERROR BLOCK***************************/
15   //#pragma nounroll
16   while(i<4)
17   {
18     k = b[i];
19     j = a[i];
20     //#pragma nounroll
21     while(j<k)
22     {
23       temp = s[j];
24       s[j]=s[k];
25       s[k]=temp;
26       k--;
27       j++;
28     }
29     i++;
30   }
31 /*************************END BLOCK*****************************/
32   printf("the string is: %s\n",s);
33   i=0;
34   j = size-1;
35 /***********************NO ERRORS SHOWN*****************/
36   while(i < j)
37   {
38     temp = s[i];
39     s[i] = s[j];
40     s[j] = temp;
41     i++;
42     j--;
43   }
44 /****************************END***************************/
45   return 0;
46 }

The compiler emits the following remarks:

procedure: main
stringrev_.cpp(16): (col. 2) remark: loop was not parallelized: existence of parallel dependence.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 25.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven OUTPUT dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 24, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 24.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven OUTPUT dependence between temp line 23, and temp line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven OUTPUT dependence between temp line 23, and temp line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 24.
stringrev_.cpp(24): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 24, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.
stringrev_.cpp(23): (col. 5) remark: parallel dependence: proven ANTI dependence between s line 23, and s line 25.
stringrev_.cpp(25): (col. 5) remark: parallel dependence: proven FLOW dependence between s line 25, and s line 23.

TimP
As far as I can see, you're hoping to have multiple threads reading and writing the same data region. That meets the definition of a data race, and the compiler is obligated to report the dependencies.
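
For readers unfamiliar with the three labels in the remarks, the sketch below shows one tiny loop for each kind of loop-carried dependence. The function names are invented for illustration and the loops are not taken from the posted program; the mapping used (FLOW = read after write, ANTI = write after read, OUTPUT = write after write) is the usual textbook one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// FLOW (read after write): iteration i reads v[i-1], which the previous
// iteration has just written.
void flow_dependence(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        v[i] = v[i - 1] + 1;
}

// ANTI (write after read): iteration i reads v[i+1], which the next
// iteration overwrites afterwards.
void anti_dependence(std::vector<int>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        v[i] = v[i + 1] + 1;
}

// OUTPUT (write after write): every iteration writes the same variable,
// so the final value depends on which write happens last. A shared
// scalar such as the global `temp` in the posted code behaves this way.
int output_dependence(const std::vector<int>& v) {
    assert(!v.empty());
    int last = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
        last = v[i];
    return last;
}
```

Running any of these loops with its iterations in a different order, or at the same time on several threads, could change the result, which is why a parallelizer has to refuse them unless it can prove the accesses never overlap.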
venkatakiran_myahoo_
I've gone through the Compiler Optimizations document, and I can see that there are several optimizations the compiler can apply to a nested loop:
- Loop Interchange
I don't think this applies here, because interchanging the loops doesn't make sense in this case.

- Unrolling
I tried putting '#pragma nounroll' before the nested loop, but the result is the same, so I assume the compiler is not applying this optimization.

- Cache Blocking
I have no idea whether this is applicable here.

- Loop Distribution
I don't think this is applicable here, because the outer loop has only 4 iterations.

- Loop Fusion
This shouldn't be applied.

What do you think about the above? Could any one of them have been applied?
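
As a point of reference for the last two items in the list, here is a small sketch of what loop distribution and loop fusion mean. The functions `fused` and `distributed` and their parameters are hypothetical and unrelated to the string program. Distribution splits one loop body into several loops; fusion is the reverse. The split matters for parallelization when one part of the body carries a dependence and the other does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fused form: a serial recurrence (running total) and an independent
// element-wise update share one loop body.
void fused(const std::vector<int>& in, std::vector<int>& total,
           std::vector<int>& scaled, int factor) {
    assert(total.size() == in.size() && scaled.size() == in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        total[i] = running;          // depends on the previous iteration
        scaled[i] = in[i] * factor;  // independent of all other iterations
    }
}

// Distributed form: the same work split into two loops. The first loop
// is still serial; the second has no loop-carried dependence.
void distributed(const std::vector<int>& in, std::vector<int>& total,
                 std::vector<int>& scaled, int factor) {
    assert(total.size() == in.size() && scaled.size() == in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        total[i] = running;
    }
    for (std::size_t i = 0; i < in.size(); ++i)
        scaled[i] = in[i] * factor;
}
```

Both versions produce identical `total` and `scaled` arrays; only the second loop of `distributed` is a candidate for parallel execution.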

jimdempseyatthecove
Your code does not do what your comments state.

Your comments say, in effect, that the word order is reversed.
Your first while loop keeps the word order the same and swaps the letter order within each word.

Which is it?


Parallelizing the loop over the few bytes within each word (the byte reversal itself) is hardly worth parallelization.
Parallelizing the loop over the words, with the byte reversal done within each word, is worth parallelization.
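
A sketch of the per-word variant follows. It swaps in an explicit OpenMP pragma instead of relying on '-parallel', so it is an alternative technique rather than a fix for the posted compile line. The function name `reverse_each_word` and the parameters `first` and `last` (playing the role of the arrays `a` and `b`) are assumptions made for this example. Compared with the posted code, the outer loop is a counted `for` loop, there is no shared global `temp`, and each iteration touches only its own word.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Reverse the letters of each word in place. first[w] and last[w] are the
// inclusive index bounds of word w; the ranges are assumed not to overlap.
void reverse_each_word(std::string& s, const std::vector<int>& first,
                       const std::vector<int>& last) {
    assert(first.size() == last.size());
    if (s.empty()) return;
    char* const data = &s[0];
    const int words = static_cast<int>(first.size());
    // The pragma takes effect only when OpenMP is enabled at compile time;
    // otherwise it is ignored and the loop runs serially.
#pragma omp parallel for
    for (int w = 0; w < words; ++w) {
        // Each iteration works on a disjoint byte range of the string.
        std::reverse(data + first[w], data + last[w] + 1);
    }
}
```

With the posted data, `reverse_each_word(s, {0, 6, 9, 12}, {4, 7, 10, 17})` turns "proud to be indian" into "duorp ot eb naidni". Whether threading pays off depends on how many words there are and how long they are.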

For a word swap (no byte reversal) you would require an extra buffer of at least the size of the largest word (+/- one byte), or a larger second buffer. Without one, you will overstrike words that are still in the process of being read.

a bb ccc ... zzzzzzzzzzzzzzzzzzzzzzzzzz

1) z bb ccc ... zzzzzzzzzzzzzzzzzzzzzzzzza

Notice that you have whacked the last letter of the last word. When you complete the new first word, it will contain the overstruck letters of the last word (not the prior contents of the last word).
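
The second-buffer approach can be sketched as follows. This is a minimal illustration under the assumption that words are separated by single spaces; the function name `reverse_word_order` is invented here. It uses the "larger second buffer" option (a full-size output string), so no part of the input is overwritten while it is still being read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build the word-reversed string in a separate buffer by copying words
// from the end of the input to the front of the output.
std::string reverse_word_order(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t end = in.size();  // one past the last unread character
    while (end > 0) {
        // Find the start of the last word that has not been copied yet.
        std::size_t start = in.rfind(' ', end - 1);
        start = (start == std::string::npos) ? 0 : start + 1;
        out.append(in, start, end - start);
        if (start == 0) break;    // that was the first word of the input
        out.push_back(' ');
        end = start - 1;          // step over the separating space
    }
    assert(out.size() == in.size());
    return out;
}
```

For the example from the original post, `reverse_word_order("Its a nice intel compiler")` returns "compiler intel nice a Its".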

Jim Dempsey
