Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Code that may not be directly parallelized by available tools (e.g. OpenMP)

Livshin__David
Beginner
692 Views

 

I am working on an autoparallelizer and am looking for benchmarks that cannot be directly parallelized using tools for the generation of parallel code (e.g. Intel Threading Tools, OpenMP).

While on the subject: is it possible to parallelize the following code using these tools (e.g. OpenMP)?

1.
double a[ARRAY_DIM], c[ARRAY_DIM];
.
double ret;
ret = 0.;
for ( i = 0; i < ARRAY_DIM; i++ )
{
    c[i] = exp( ret );
    ret += a[i];
}

return ret;

2.
double a[ARRAY_DIM], c[ARRAY_DIM];
.
double ret;
ret = 0.;
for ( i = 0; i < ARRAY_DIM; i++ )
{
    if ( a[i] > 0.01 )
    {
        ret = c[i];
    }
}

return ret;



Thank you in advance,

David
6 Replies
TimP
Honored Contributor III

Your first loop has unresolvable data dependency problems. Even an optimizing unroll may do little for it, in view of the expense of exp(). A lot of work has been done on parallelizing loops a bit like that just to show it might be done, without considering whether there is any gain in it.

The second loop barely hides a lot of redundant work. If you simply reversed it and broke when you find the match, it would be a simple search, except that you may need to write the reversal explicitly, since icc's auto-vectorization of linear search may not work backwards; gcc wouldn't auto-vectorize it anyway. There is no point in thinking about parallelism until you have optimized it in a single thread and determined whether it remains a bottleneck in your application. The search would need to be quite long before a binary search could show an advantage over a vectorized linear search.

Back in the early days of vectorization (e.g. Cray) it was fairly common to write loops which did not break, as the compilers could not optimize loops containing break, but that was 20-40 years ago, and people sometimes ignored the fact that making a loop 10 times as long might not be justified by vectorization. The computers which lost 99% of performance without vectorization or parallelization were rapid market failures.
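A minimal sketch of that reversed search (the helper name and plain-C framing are illustrative, not from the original post): the original loop keeps overwriting ret, so only the last i with a[i] > 0.01 matters, and the first hit scanning backwards is that last hit.

```c
#include <stddef.h>

/* Equivalent to the original loop 2, rewritten as a backward search
 * with early exit: returns c[i] for the LAST i with a[i] > 0.01,
 * or 0. if no element qualifies. */
double last_match(const double *a, const double *c, size_t n)
{
    double ret = 0.;
    for (size_t i = n; i-- > 0; ) {   /* walk from n-1 down to 0 */
        if (a[i] > 0.01) {
            ret = c[i];
            break;                    /* first hit backwards == last hit forwards */
        }
    }
    return ret;
}
```

The break removes the redundant overwrites; whether the backward direction vectorizes is compiler-dependent, as noted above.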

 

Livshin__David
Beginner

Thank you for your answer. I provided a more detailed comment on your post on another site.

Actually I am sorry that I listed the above two code samples, as most of the replies I get relate to this code, while the main problem I would like to resolve was and still is:

finding benchmarks that cannot be directly parallelized using tools for the generation of parallel code.


jimdempseyatthecove
Honored Contributor III

I suggest you examine Elusive Algorithms - Parallel Scan to see if the technique used there can be applied to your code requirements.

Note: exp(x + y) = exp(x) · exp(y)

There may be some hope for you in parallelizing the code listed in the first part of #1.

Jim Dempsey

Livshin__David
Beginner

Thank you.

Actually the call to 'exp' is not the point - the point is how to resolve the data dependency between "c[i] = exp( ret );" and "ret += a[i];" (what if, instead of 'exp', I use another function that doesn't behave as "nicely" as exp?). And in the context of my post, the question is whether OpenMP can handle this code.

I wrote an autoparallelizer that easily parallelizes this code (without relying on properties of exp) - search for "not parallel on OpenMP" on www.dalsoft.com/documentation_auto_parallelization.html.

jimdempseyatthecove
Honored Contributor III

The article I referenced in #4 addresses how to vectorize and parallelize a class of codes that appear to have loop-order dependencies. Both the case in the article and case 1 in your post have loop-carried dependencies.

Jim Dempsey

Livshin__David
Beginner

I am curious whether Intel's C compiler can autoparallelize the code samples in my posting.

 
