Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Livshin__David
Beginner
89 Views

code that may not be directly parallelized by available tools ( e.g. OpenMP )

 

Working on autoparallelizer and looking for benchmarks that may not be directly parallelized by using tools for generations of parallel code ( e.g.
Intel Threading Tools, OpenMP )

Being on the subject, is it possible to parallelize the following code using these tools ( e.g. OpenMP )?

1.
double a[ARRAY_DIM], c[ARRAY_DIM];
.
double ret;
ret = 0.;
for ( i = 0; i < ARRAY_DIM; i++ )
{
c = exp( ret );
ret += a;
}

return ret;

2.
double a[ARRAY_DIM], c[ARRAY_DIM];
.
double ret;
ret = 0.;
for ( i = 0; i < ARRAY_DIM; i++ )
{
if ( a > 0.01 )
{
ret = c;
}
}

return ret;



Thank you in advance,

David
Tags (1)
0 Kudos
6 Replies
TimP
Black Belt
89 Views

Your first loop has unresolvable data dependency problems.  Even optimizing unroll may do little for it, in view of the expense of exp().   A lot of work has been done on parallelizing loops a bit like that just to show it might be done, without considering whether there is any gain in it.

The second loop is barely hiding a lot of redundant work.  If you would simply reverse it and break when you find the match, it would be a simple search, except that you may need to expand it explicitly since the linear search auto-vectorization of icc may not work backwards.  gcc wouldn't auto-vectorize it anyway.  There is no point in thinking about parallel until you have optimized it in single thread and determined whether it remains a bottleneck in your application.  The search would need to be quite long before a binary search could show an advantage over vectorized linear search.  Back in the early days of vectorization (e.g. Cray) it was fairly common to write loops which did not break, as the compilers could not optimize with break, but that was 20-40 years ago, and people sometimes ignored the fact that making a loop 10 times as long might not be justified by vectorization.  The computers which lost 99% of performance without vectorization or parallelization were rapid market failures.

 

Livshin__David
Beginner
89 Views

Thank you for your answer. I provided more detailed comment to your post on another site.

Actually I am sorry that I listed the above two code samples, as the most replies I get relate to this code when the main problem I like to resolve was and still is:

looking for benchmarks that may not be directly parallelized by using tools for generations of parallel code.

 

 

jimdempseyatthecove
Black Belt
89 Views

I suggest you examine Elusive Algorithms - Parallel Scan to see if the technique used there can be exploited by your code requirements.

Note: exp(x + y) = exp(x) . exp(y)

There may be some hope for you in parallelizing the code listed in first part of #1

Jim Dempsey

Livshin__David
Beginner
89 Views

Thank you.

Actually calling 'exp' is not a point - the point is how to resolve the data dependency between "c = exp( ret );" and "ret += a;" ( what if instead of 'exp' I will use another function that doesn't behave "nicely" as exp ). And in the context of my post, the point is "if OpenMP can handle this code".

I wrote autoparallelizer that easily parallelizes this code ( without relying on properties of exp ) - search for "not parallel on OpenMP" on www.dalsoft.com/documentation_auto_parallelization.html.

jimdempseyatthecove
Black Belt
89 Views

The article I referenced in #4 addresses how to vectorize and parallelize a class of codes that appear to have loop order dependencies. In the article case and your #1.1 case, they both have carry dependencies.

Jim Dempsey

Livshin__David
Beginner
89 Views

Curious if Intel's C compiler can autoparallelize the code samples in my posting.

 

Reply