Guillaume_S_
Beginner
39 Views

struggling to vectorize code

Hey everyone, 

 

I'm looking for help with loop vectorization. I'm trying to vectorize and optimize a loop, but I don't understand what I'm doing wrong. The compiler cannot vectorize it because of assumed FLOW and ANTI dependences. I thought I could remove them by changing the code (see "My Attempt" below), but this does not work. Can someone explain why?

I cannot post the whole code because it is too large. Note that "i, j" are indices and "np, npa, cX, K, t, exprdt" are constants. Basically, I do some computation on an input vector s; the output is the vector P, which is also an input because its values are updated.

The original code:

			#pragma omp parallel for num_threads(nThreads) schedule(auto)	
			for( int k(i*npa); k<(i+1)*npa; ++k )
			{
				double CT = 0., tmp;
							
				double s_ = s[j*np+k];
				double st_ = s_ * t;
				CT += c1 + s_ *( c2 + s_ * c3 );

				if(p>3)
					CT += t * c4 + st_ * ( c5 + s_ * c6 );
				CT *= exprdt;
					
				tmp = K > s_ ? K-s_: 0. ;
				
				if( tmp <= 1.0e-8 || tmp <= CT )
					P[j*np+k] = exprdt * P[(j+1)*np+k];
				else
					P[j*np+k] = tmp;
			}


My Attempt:

#pragma omp parallel num_threads(nThreads)
			{
				int iD = omp_get_thread_num();
				int gd = npa / nThreads;
				int dg = (iD==nThreads-1? npa%nThreads:0);
				
				double *Aptr, *Bptr, *Cptr;

				Aptr = (double*)malloc((gd+dg)*sizeof(double));
				Bptr = (double*)malloc((gd+dg)*sizeof(double));
				Cptr = (double*)malloc((gd+dg)*sizeof(double));

				memcpy( Cptr, s+j*np + i*npa+iD*gd, (gd+dg)*sizeof(double));
				memcpy( Bptr, P + j *np+np + i * npa + iD*gd , (gd+dg) *sizeof(double));
				
				for( int l =0; l<gd+dg; ++l )               // 671
				{
					double CT = 0., tmp;
				
					double s_ = Cptr[l];                      // 675
					double st_ = s_ * t;
					
					CT += c1 + s_ *( c2 + s_ * c3 );
					
					if(p>3)
						CT += t * c4 + st_ * ( c5 + s_ * c6 );
					
					CT *= exprdt;
						
					tmp = K > s_ ? K-s_: 0. ;
					
					if( tmp <= 1.0e-8 || tmp <= CT )
						Aptr[l] = exprdt * Bptr[l];           // 690
					else
						Aptr[l] = tmp;                        // 692
				}
			
			        memcpy( P+j*np + i*npa + iD*gd, Aptr, (dg+gd) *sizeof(double));

			        free(Aptr); free(Bptr); free(Cptr);
			}

And here's the compiler's vectorization report:

(671): (col. 5) remark: loop was not vectorized: existence of vector dependence
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 690
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 692
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 675

3 Replies
TimP
Black Belt

If using C++ with a compiler prior to icpc 16.0, writing max(K-s_, 0.) instead of the ternary should help.
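A minimal sketch of that rewrite, for illustration only. The helper names clamp_ternary and clamp_max are hypothetical and do not appear in the posted code; the two forms return the same value for finite inputs, but they can differ if K - s_ is NaN.

```cpp
#include <algorithm>

// Hypothetical helpers (not from the original post): the same clamp
// written as in the posted loop and as the suggested max() form.
inline double clamp_ternary(double K, double s_)
{
    return K > s_ ? K - s_ : 0.0;   // form used in the posted loop
}

inline double clamp_max(double K, double s_)
{
    return std::max(K - s_, 0.0);   // suggested form
}
```

Whether this changes the vectorization outcome depends on the compiler version, so the report is worth re-checking after the change.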

It looks like you would need either a __restrict qualifier on the pointers (e.g. double * __restrict Aptr), or #pragma ivdep, or #pragma simd. Agreed that your local definitions should not require that, but the compiler diagnostics indicate that aliasing is assumed.

Even then it's a stretch.
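As a rough sketch of the __restrict suggestion, the inner loop could be moved into a function whose pointer parameters are restrict-qualified. The function name update_block, the Coeffs struct, and the parameter list are assumptions made for this example; only the arithmetic follows the posted loop. __restrict is a compiler extension (accepted by icpc, g++, clang++ and MSVC), and #pragma omp simd takes effect only when OpenMP SIMD support is enabled; otherwise it is ignored.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical container for the constants c1..c6 of the post.
struct Coeffs { double c1, c2, c3, c4, c5, c6; };

// Hypothetical kernel: out, next and s stand in for Aptr, Bptr and Cptr.
// __restrict promises the compiler that the three arrays do not overlap.
void update_block(double* __restrict out,
                  const double* __restrict next,
                  const double* __restrict s,
                  std::size_t n, const Coeffs c,
                  double K, double t, double exprdt, int p)
{
    const bool extra_terms = p > 3;   // loop-invariant test hoisted out of the loop

    #pragma omp simd
    for (std::size_t l = 0; l < n; ++l) {
        const double s_  = s[l];
        const double st_ = s_ * t;

        double CT = c.c1 + s_ * (c.c2 + s_ * c.c3);
        if (extra_terms)
            CT += t * c.c4 + st_ * (c.c5 + s_ * c.c6);
        CT *= exprdt;

        const double tmp = std::max(K - s_, 0.0);

        // Single store per iteration; the condition becomes a select.
        out[l] = (tmp <= 1.0e-8 || tmp <= CT) ? exprdt * next[l] : tmp;
    }
}
```

The promise made by __restrict is not checked: if the arrays do overlap, the behavior is undefined, so this is only appropriate when the buffers are known to be distinct, as with the separately allocated copies in the attempt above.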

Guillaume_S_
Beginner

Yes, with pragma simd it does vectorize, but I wanted to understand why I had the same problem even with the local definitions. Anyway, I'm going to check the compiler version and try with max.

Thank you for your response and help.

Guillaume,

Kittur_G_Intel
Employee

Hi,
Tim's response is correct: the compiler assumes aliasing, and using the restrict qualifier should help it vectorize the code.
_Kittur