Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Optimizing cilk with ternary conditional

Fabio_G_
Beginner

What is the best way to optimize a loop such as

cilk_for(i=0;i<n;i++){
    x=x<0?0:x;
}

or something like that?

Thanks, Fabio

1 Solution
TimP
Honored Contributor III

With cilk_for, it's important to make the induction variable local to each worker (thus C99 or C++):

cilk_for(int i=0;..... (there are lots of myths about appropriate data types)

icpc should tell you about this locality requirement (why not icc?).
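As a minimal sketch of the local induction variable, assuming the question meant to clamp each element `x[i]` of an array rather than a single scalar: the helper name `clamp_negatives` and the `std::vector<float>` argument are made up for illustration, and the `__cilk` check is an assumption about how a Cilk Plus-enabled compiler identifies itself. On any other compiler the sketch falls back to a plain serial `for`.

```cpp
#include <vector>

// Assumption: Cilk Plus-enabled compilers predefine __cilk. Elsewhere,
// fall back to a plain serial for so the sketch still compiles.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Hypothetical helper: clamp negative entries of x to zero.
void clamp_negatives(std::vector<float>& x) {
    const int n = static_cast<int>(x.size());
    // i is declared in the loop header, so it is local to the loop
    // instead of being one variable shared by all workers.
    cilk_for (int i = 0; i < n; ++i) {
        x[i] = x[i] < 0.0f ? 0.0f : x[i];
    }
}
```

Each iteration touches only its own `x[i]`, so the iterations are independent of each other.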

If you want combined simd and multi-core parallelism, you must write it out with each i performing an array section using extended array notation, preferably cache aligned.  This may require AVX2 if it's an integer data type.
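A portable sketch of that structure, under stated assumptions: the outer loop hands out chunks to workers and each chunk is a short unit-stride loop the compiler can vectorize. The name `clamp_chunked` and the chunk size `kChunk` (16 floats, 64 bytes) are hypothetical, the inner loop stands in for the array-section statement, and alignment of `x` is not handled here. As before, the `__cilk` check is an assumption and other compilers get a serial `for`.

```cpp
#include <algorithm>

// Assumption: Cilk Plus-enabled compilers predefine __cilk.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// Hypothetical chunk size: 16 floats = 64 bytes.
constexpr int kChunk = 16;

// Clamp negatives in x[0..n). Workers split the chunks; each chunk is a
// short unit-stride loop that is a candidate for SIMD code.
void clamp_chunked(float* x, int n) {
    const int nchunks = (n + kChunk - 1) / kChunk;
    cilk_for (int c = 0; c < nchunks; ++c) {
        const int begin = c * kChunk;
        const int len = std::min(kChunk, n - begin);  // last chunk may be short
        // With Cilk Plus array notation, this inner loop would be written
        // as one array-section statement over the len elements at begin.
        for (int j = 0; j < len; ++j) {
            x[begin + j] = x[begin + j] < 0.0f ? 0.0f : x[begin + j];
        }
    }
}
```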

The Intel compiler should optimize the alternative written with std::max(). gcc doesn't vectorize std::max but, unlike the Intel compiler, does vectorize fmax et al. under -ffast-math (-ffinite-math-only). If it weren't for these differences among compilers, I'd recommend max() [min] where it fits.
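For reference, the three spellings side by side, as a sketch with hypothetical function names. They give the same result for ordinary (non-NaN) input. They differ on NaN: `std::fmax` returns the non-NaN operand, while the ternary and `std::max` leave a NaN in place, which is presumably why the `fmax` form is tied to -ffinite-math-only. Which form a given compiler vectorizes is not something this sketch can show; check the vectorization report.

```cpp
#include <algorithm>
#include <cmath>

// Ternary form, as in the original question.
void clamp_ternary(float* x, int n) {
    for (int i = 0; i < n; ++i) x[i] = x[i] < 0.0f ? 0.0f : x[i];
}

// std::max form: std::max(a, b) returns b only if a < b.
void clamp_std_max(float* x, int n) {
    for (int i = 0; i < n; ++i) x[i] = std::max(x[i], 0.0f);
}

// fmax form: a NaN input yields 0.0f here, unlike the two forms above.
void clamp_fmax(float* x, int n) {
    for (int i = 0; i < n; ++i) x[i] = std::fmax(x[i], 0.0f);
}
```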

I'd say consider omp parallel for simd with Intel compiler; it's a bit simpler and more capable, although some similar considerations apply, along with the issues about using OpenMP and Cilk(tm) Plus in the same application.
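A minimal sketch of the OpenMP variant, with a hypothetical function name `clamp_omp`. It needs OpenMP 4.0 or later to be enabled at compile time (for example -qopenmp with the Intel compiler or -fopenmp with g++); without that, the pragma is ignored and the loop simply runs serially.

```cpp
// Hypothetical helper: clamp negative entries of x[0..n) to zero.
void clamp_omp(float* x, int n) {
    // Threads split the iteration space; each thread's share is also
    // marked as a candidate for SIMD execution.
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i) {
        x[i] = x[i] < 0.0f ? 0.0f : x[i];
    }
}
```

The loop counter declared in the `for` header is private to each thread here as well.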

3 Replies
Fabio_G_
Beginner

Hi Tim,

First, thanks!

I haven't used a local counter anywhere in the code, and the parallelization works fine. I'm wondering whether standard C allows a local counter declaration the same way C++ does. However, this is not important.

Coming back to the main issue: some of the suggestions you gave me are a bit obscure to me (that is my fault), so I need to investigate more deeply how to exploit the parallel/vector capabilities of the processor(s) through programming and the icc command line.

Thanks a lot.

 

TimP
Honored Contributor III

You must set -std=c99 for the C compiler to accept cilk_for(int i;...

There is a significant performance loss when the loop counter is shared among a large number of workers. I originally guessed, wrongly, that cilk_for would privatize it automatically, until I got the message under C++ and checked performance.
