<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Tim, in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962407#M5331</link>
    <description>&lt;P&gt;Hi Tim,&lt;/P&gt;

&lt;P&gt;First, thanks!&lt;/P&gt;

&lt;P&gt;I have not used a local loop counter anywhere in the code, and the parallelization works fine. I was wondering whether standard C allows declaring the counter locally in the loop, as C++ does, but that is not important.&lt;/P&gt;

&lt;P&gt;Coming back to the important issue: some of your suggestions are a bit obscure to me (that is my fault), so I need to dig deeper into how to exploit the parallel/vector capabilities of the processor(s) through programming and the icc command line.&lt;/P&gt;

&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
    <pubDate>Fri, 28 Mar 2014 07:59:24 GMT</pubDate>
    <dc:creator>Fabio_G_</dc:creator>
    <dc:date>2014-03-28T07:59:24Z</dc:date>
    <item>
      <title>Optimizing cilk with ternary conditional</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962405#M5329</link>
      <description>&lt;P&gt;What is the best way to optimize this loop&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;
cilk_for(i=0;i&amp;lt;n;i++){
    x[i]=x[i]&amp;lt;0?0:x[i];
}&lt;/PRE&gt;

&lt;P&gt;or something like it?&lt;/P&gt;

&lt;P&gt;Thanks, Fabio&lt;/P&gt;</description>
      <pubDate>Thu, 27 Mar 2014 11:33:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962405#M5329</guid>
      <dc:creator>Fabio_G_</dc:creator>
      <dc:date>2014-03-27T11:33:25Z</dc:date>
    </item>
    <item>
      <title>With cilk_for, it's important</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962406#M5330</link>
      <description>&lt;P&gt;With cilk_for, it's important to make the induction variable local to each worker (thus C99 or C++):&lt;/P&gt;

&lt;P&gt;cilk_for(int i=0;..... (there are lots of myths about appropriate data types)&lt;/P&gt;

&lt;P&gt;icpc should tell you about this locality requirement (why not icc?).&lt;/P&gt;

&lt;P&gt;If you want combined simd and multi-core parallelism, you must write it out so that each iteration i processes an array section in extended array notation, preferably cache-aligned. This may require AVX2 if it's an integer data type.&lt;/P&gt;

&lt;P&gt;The Intel compiler should optimize the alternative written with std::max(). gcc doesn't vectorize std::max but, unlike the Intel compiler, does vectorize fmax et al. under -ffast-math (-ffinite-math-only). If it weren't for these differences among compilers, I'd recommend max() (or min()) wherever it fits.&lt;/P&gt;
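
&lt;P&gt;For illustration, here is a minimal sketch of the two spellings discussed above: the ternary from the question and the std::max() alternative. It assumes x holds float values, which the thread does not state; clamp_ternary, clamp_max and clamp_demo are hypothetical names. Whether either loop is actually vectorized depends on the compiler and flags, so it is worth checking the vectorization report.&lt;/P&gt;

<![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption: x is an array of float; the original post does not give its type.

// Clamp negatives to zero using the ternary form from the question.
void clamp_ternary(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = x[i] < 0.0f ? 0.0f : x[i];
    }
}

// The same operation written with std::max.
void clamp_max(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = std::max(x[i], 0.0f);
    }
}

// Usage sketch: both variants should leave identical contents.
void clamp_demo() {
    float a[] = {-1.5f, 0.0f, 2.0f, -3.0f};
    float b[] = {-1.5f, 0.0f, 2.0f, -3.0f};
    clamp_ternary(a, 4);
    clamp_max(b, 4);
    for (int i = 0; i < 4; ++i) {
        assert(a[i] == b[i]);
    }
}
```
]]>

&lt;P&gt;With the arguments in this order, and without fast-math flags, std::max(x[i], 0.0f) should return the same value as the ternary for every input, so the choice between them is about which form a given compiler vectorizes.&lt;/P&gt;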

&lt;P&gt;I'd say consider omp parallel for simd with the Intel compiler; it's a bit simpler and more capable, although some similar considerations apply, along with the issues of using OpenMP and Cilk(tm) Plus in the same application.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Mar 2014 15:02:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962406#M5330</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-03-27T15:02:27Z</dc:date>
    </item>
    <item>
      <title>Hi Tim,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962407#M5331</link>
      <description>&lt;P&gt;Hi Tim,&lt;/P&gt;

&lt;P&gt;First, thanks!&lt;/P&gt;

&lt;P&gt;I have not used a local loop counter anywhere in the code, and the parallelization works fine. I was wondering whether standard C allows declaring the counter locally in the loop, as C++ does, but that is not important.&lt;/P&gt;

&lt;P&gt;Coming back to the important issue: some of your suggestions are a bit obscure to me (that is my fault), so I need to dig deeper into how to exploit the parallel/vector capabilities of the processor(s) through programming and the icc command line.&lt;/P&gt;

&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Mar 2014 07:59:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962407#M5331</guid>
      <dc:creator>Fabio_G_</dc:creator>
      <dc:date>2014-03-28T07:59:24Z</dc:date>
    </item>
    <item>
      <title>You must set -std=c99 in</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962408#M5332</link>
      <description>&lt;P&gt;You must set -std=c99 for the compiler to accept cilk_for(int i;...&lt;/P&gt;

&lt;P&gt;There is a significant performance loss when the loop counter is shared among a large number of workers. I originally guessed, wrongly, that cilk_for would privatize it automatically, until I got the message under C++ and checked performance.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Mar 2014 11:06:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Optimizing-cilk-with-ternary-conditional/m-p/962408#M5332</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-03-28T11:06:26Z</dc:date>
    </item>
  </channel>
</rss>

