How can I parallelize an implicit loop?

Zvi_D_Intel · 02-09-2014

I have a loop whose body calls a function with an array element (indexed by the loop variable) as its argument; the function returns a single value.
I can parallelize this loop by using cilk_for instead of a regular for, and that is simple and works well. This is explicit parallelization.
Instead of an explicit loop statement, I can use an Array Notation construct (as shown below); this is an implicit loop.
My routine is relatively long and complex and contains Array Notation constructs itself, so it cannot be declared as a vector (elemental) function.
When I use the implicit loop, it is not parallelized, and the run time increases substantially.
 
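#include <cilk/cilk.h>   // provides the cilk_for keyword

const int n = 1000;      // hypothetical array length, for illustration only

float foo(float f_in)
{
 float f_result;
 // LONG computation containing CILK+ Array Notation operations

 /////////////////////////////////////////////////////////
 return f_result;
}

int main()
{
 float af_in[n], af_out[n];

// Explicit parallelized loop: each iteration handles one element
 cilk_for(int i=0; i<n; i++)
  af_out[i] =  foo(af_in[i]);

// Implicit non-parallelized loop (Array Notation over the whole array)
 af_out[:] =  foo(af_in[:]);
}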

My question is: is there a way to tell the compiler (a pragma or something else) that my implicit loop (the Array Notation assignment) has independent iterations and should be parallelized?

Barry_T_Intel · 02-11-2014

Have you tried #pragma simd? Essentially, it tells the compiler that the loop should be vectorized, even if auto-vectorization fails.

- Barry