I have a loop whose body calls a function with an array element (selected by the loop index) as its argument; the function returns a single value.
I can parallelize this loop by using cilk_for instead of a regular for; this is simple and works well. This is explicit parallelization.
Instead of an explicit loop statement I can use an Array Notation construct (as shown below), which is an implicit loop.
My routine is relatively long and complex, and it contains Array Notation constructs itself, so it cannot be declared as a vector (elemental) function.
When I use the implicit loop, it is not parallelized, and the run time increases substantially.
#include <cilk/cilk.h>

float foo(float f_in)
{
    float f_result;
    // LONG computation containing CILK+ Array Notation operations
    /////////////////////////////////////////////////////////
    return f_result;
}

int main()
{
    // n is the array length, defined elsewhere
    float af_in[n], af_out[n];

    // Explicit parallelized loop
    cilk_for (int i = 0; i < n; i++)
        af_out[i] = foo(af_in[i]);

    // Implicit non-parallelized loop
    af_out[:] = foo(af_in[:]);
}
My question is: does anybody know whether there is a way (a pragma or something else) to tell the compiler that the steps of my implicit loop (the Array Notation assignment) are independent and that it should be parallelized?
1 Reply
Have you tried #pragma simd? Essentially, it tells the compiler that the loop should be vectorized even if auto-vectorization fails.
- Barry
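To illustrate the suggestion above, here is a minimal sketch of a SIMD pragma on the explicit form of the loop. It makes several assumptions: it uses the OpenMP 4.0 spelling `#pragma omp simd` in place of the Intel-specific `#pragma simd` so that it compiles with other compilers, the body of `foo` is a hypothetical stand-in for the long routine in the question, and `apply_foo` is a helper name introduced only for this example. The thread does not confirm whether the pragma has any effect on an Array Notation assignment. Vectorization is also a different mechanism from the multithreading that cilk_for provides.

```c
#include <stddef.h>

/* Hypothetical stand-in for the long routine in the question. */
static float foo(float f_in)
{
    return 2.0f * f_in + 1.0f;
}

/* Hypothetical helper: applies foo to each of the n elements of af_in. */
static void apply_foo(const float *af_in, float *af_out, size_t n)
{
    /* Asserts that the iterations are independent and requests
       vectorization. A compiler built without OpenMP SIMD support
       ignores the pragma and runs the loop as ordinary scalar code. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        af_out[i] = foo(af_in[i]);
}
```

With GCC or Clang the pragma is honored when compiling with `-fopenmp-simd`. Whether the call to `foo` is actually vectorized depends on the compiler being able to inline it.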