<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Fast &quot;sum&quot; routine in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962923#M16003</link>
    <description>A loop such as&lt;BR /&gt;sum = 0; for (int i = 0; i &amp;lt; n; ++i) sum += a[i];&lt;BR /&gt;(with sum as a local variable declared with the same type as a[])&lt;BR /&gt;should optimize easily.  For example, on Xeon or P4, use options&lt;BR /&gt;icc -O -xW&lt;BR /&gt;or, for an SSE3 machine, -xP.&lt;BR /&gt;-O1 may be superior to -O2 for loops of moderate length.</description>
    <pubDate>Tue, 15 Nov 2005 03:21:12 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2005-11-15T03:21:12Z</dc:date>
    <item>
      <title>Fast "sum" routine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962920#M16000</link>
      <description>I need to efficiently compute the element sum of a double precision vector (a[0]+a[1]+...+a[n-1]). Is there a routine in MKL for this? The BLAS ?asum routines compute the sum of the magnitudes, unfortunately.&lt;BR /&gt;&lt;BR /&gt;Andrew</description>
      <pubDate>Tue, 15 Nov 2005 01:08:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962920#M16000</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2005-11-15T01:08:24Z</dc:date>
    </item>
    <item>
      <title>Re: Fast "sum" routine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962921#M16001</link>
      <description>Intel compiler optimizations do this effectively.</description>
      <pubDate>Tue, 15 Nov 2005 01:20:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962921#M16001</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2005-11-15T01:20:54Z</dc:date>
    </item>
    <item>
      <title>Re: Fast "sum" routine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962922#M16002</link>
      <description>I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Are there any specific optimization directives I should use?&lt;BR /&gt;It does seem that multithreading/parallelization would help here as well...&lt;BR /&gt;&lt;BR /&gt;Andrew</description>
      <pubDate>Tue, 15 Nov 2005 01:55:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962922#M16002</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2005-11-15T01:55:36Z</dc:date>
    </item>
    <item>
      <title>Re: Fast "sum" routine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962923#M16003</link>
      <description>A loop such as&lt;BR /&gt;sum = 0; for (int i = 0; i &amp;lt; n; ++i) sum += a[i];&lt;BR /&gt;(with sum as a local variable declared with the same type as a[])&lt;BR /&gt;should optimize easily.  For example, on Xeon or P4, use options&lt;BR /&gt;icc -O -xW&lt;BR /&gt;or, for an SSE3 machine, -xP.&lt;BR /&gt;-O1 may be superior to -O2 for loops of moderate length.</description>
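      <!--
The loop in this post can be written out as a small function. This is only a
minimal sketch, assuming a double precision array as in the original question;
the name vector_sum and the use of size_t for the length are illustrative
choices, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Return a[0] + a[1] + ... + a[n-1].
   The accumulator is a local variable with the same type as the
   array elements, as the post recommends. */
double vector_sum(const double *a, size_t n)
{
    assert(a != NULL || n == 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Calling vector_sum(a, n) then replaces the hand-written loop. Keeping the
accumulator local gives the compiler the simple form the post says "should
optimize easily".
      -->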
      <pubDate>Tue, 15 Nov 2005 03:21:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962923#M16003</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2005-11-15T03:21:12Z</dc:date>
    </item>
    <item>
      <title>Re: Fast "sum" routine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962924#M16004</link>
      <description>As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...&lt;BR /&gt;&lt;BR /&gt;The build log window did say the OpenMP defined loop was parallelized.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;double result=0.0;&lt;BR /&gt;const double *data=s.data();&lt;BR /&gt;int nEntries=s.rows()*s.cols();&lt;BR /&gt;#pragma omp parallel for reduction(+:result)&lt;BR /&gt;for (int i=0;i&amp;lt;nEntries;i++){&lt;BR /&gt;result+=data[i];&lt;BR /&gt;}</description>
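      <!--
The OpenMP reduction from this post can be isolated into a self-contained
function. This is a sketch under stated assumptions: the names parallel_sum
and n_entries are hypothetical, and the matrix object s from the post is
replaced by a plain pointer and length.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n_entries doubles using an OpenMP reduction.
   Each thread accumulates a private partial sum, and OpenMP combines
   the partial sums with + when the loop finishes. When the compiler is
   not run with OpenMP enabled, _OPENMP is undefined, the pragma is
   skipped, and the loop runs serially with the same result type. */
double parallel_sum(const double *data, int n_entries)
{
    assert(data != NULL || n_entries <= 0);

    double result = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:result)
#endif
    for (int i = 0; i < n_entries; ++i)
        result += data[i];
    return result;
}
```

Because the partial sums are combined in an unspecified order, the parallel
result can differ from the serial one in the last bits for general floating
point data. For short vectors the threading overhead may also outweigh the
gain, so this is worth measuring before adopting.
      -->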
      <pubDate>Tue, 15 Nov 2005 05:20:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fast-quot-sum-quot-routine/m-p/962924#M16004</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2005-11-15T05:20:24Z</dc:date>
    </item>
  </channel>
</rss>

