<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic a large array on the sum of the optimization problem in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860726#M7491</link>
    <description>&lt;SUP&gt;&lt;SPAN style="font-size: small; font-family: Times New Roman;"&gt;
&lt;P&gt;This is a question about optimizing the element-wise sum of two large arrays.&lt;/P&gt;
&lt;P&gt;There are two arrays of type double, and the code for this problem is as follows:&lt;/P&gt;
&lt;P&gt;#pragma omp parallel for&lt;/P&gt;
&lt;P&gt;for (long i=0; i&amp;lt;5000000; i++)&lt;/P&gt;
&lt;P&gt;{&lt;/P&gt;
&lt;P&gt;array1[i] += array2[i];&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;My computer is a Dell PowerEdge 2900 III (5U) with two Xeon 5420 processors and 48 GB of memory.&lt;/P&gt;
&lt;P&gt;The OS is MS Windows Server 2003 R2 Enterprise x64 Edition SP2.&lt;/P&gt;
&lt;P&gt;The C++ compilers are VC++ 2008 and Intel C++ 11.0.061, and the solution platform is x64.&lt;/P&gt;
&lt;P&gt;I compiled the program with both VC++ and Intel C++; the two results are basically the same.&lt;/P&gt;
&lt;P&gt;I then used a function from Intel MKL 10.1 to do the computation, as follows:&lt;/P&gt;
&lt;P&gt;cblas_daxpy(5000000, 1, array2, 1, array1, 1);&lt;/P&gt;
&lt;P&gt;The performance of the program was no different.&lt;/P&gt;
&lt;P&gt;I then used another function from Intel MKL 10.1:&lt;/P&gt;
&lt;P&gt;vdAdd( n, a, b, y );&lt;/P&gt;
&lt;P&gt;Program performance decreased significantly, to only about 80% of the original.&lt;/P&gt;
&lt;P&gt;I would like to know how to optimize this code to improve its performance.&lt;/P&gt;
&lt;/SPAN&gt;&lt;/SUP&gt;</description>
    <pubDate>Fri, 26 Mar 2010 07:53:12 GMT</pubDate>
    <dc:creator>fyten1985</dc:creator>
    <dc:date>2010-03-26T07:53:12Z</dc:date>
    <item>
      <title>a large array on the sum of the optimization problem</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860726#M7491</link>
      <description>&lt;SUP&gt;&lt;SPAN style="font-size: small; font-family: Times New Roman;"&gt;
&lt;P&gt;This is a question about optimizing the element-wise sum of two large arrays.&lt;/P&gt;
&lt;P&gt;There are two arrays of type double, and the code for this problem is as follows:&lt;/P&gt;
&lt;P&gt;#pragma omp parallel for&lt;/P&gt;
&lt;P&gt;for (long i=0; i&amp;lt;5000000; i++)&lt;/P&gt;
&lt;P&gt;{&lt;/P&gt;
&lt;P&gt;array1[i] += array2[i];&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;My computer is a Dell PowerEdge 2900 III (5U) with two Xeon 5420 processors and 48 GB of memory.&lt;/P&gt;
&lt;P&gt;The OS is MS Windows Server 2003 R2 Enterprise x64 Edition SP2.&lt;/P&gt;
&lt;P&gt;The C++ compilers are VC++ 2008 and Intel C++ 11.0.061, and the solution platform is x64.&lt;/P&gt;
&lt;P&gt;I compiled the program with both VC++ and Intel C++; the two results are basically the same.&lt;/P&gt;
&lt;P&gt;I then used a function from Intel MKL 10.1 to do the computation, as follows:&lt;/P&gt;
&lt;P&gt;cblas_daxpy(5000000, 1, array2, 1, array1, 1);&lt;/P&gt;
&lt;P&gt;The performance of the program was no different.&lt;/P&gt;
&lt;P&gt;I then used another function from Intel MKL 10.1:&lt;/P&gt;
&lt;P&gt;vdAdd( n, a, b, y );&lt;/P&gt;
&lt;P&gt;Program performance decreased significantly, to only about 80% of the original.&lt;/P&gt;
&lt;P&gt;I would like to know how to optimize this code to improve its performance.&lt;/P&gt;
&lt;/SPAN&gt;&lt;/SUP&gt;</description>
      <pubDate>Fri, 26 Mar 2010 07:53:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860726#M7491</guid>
      <dc:creator>fyten1985</dc:creator>
      <dc:date>2010-03-26T07:53:12Z</dc:date>
    </item>
    <item>
      <title>a large array on the sum of the optimization problem</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860727#M7492</link>
      <description>&lt;DIV&gt;I reproduced the result. It seems to me that VML is optimized much better for shorter vector lengths than for such long ones.&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV id="_mcePaste"&gt;For example for N = 100000, vdAdd / cblas_daxpy ~ 0.4 ( core 2 Duo) but for N = 10^6,&lt;/DIV&gt;
&lt;DIV id="_mcePaste"&gt;vdAdd / cblas_daxpy ~ 1.3.&lt;/DIV&gt;
&lt;DIV id="_mcePaste"&gt;I will ask the expert team of VML to shed light on this problem.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;--Gennady&lt;/DIV&gt;</description>
      <pubDate>Mon, 29 Mar 2010 09:46:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860727#M7492</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-03-29T09:46:16Z</dc:date>
    </item>
    <item>
      <title>a large array on the sum of the optimization problem</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860728#M7493</link>
      <description>&lt;P&gt;Vector sizes of 1000 to 10000 elements are the typical VML usage model (the data should fit in the caches). In this case vdAdd runs faster than, or close to, BLAS or a compiler-generated loop because of the thread-creation overhead in BLAS and the compiler loop. vdAdd doesn't use threading (this is a known limitation and we are working on it), so it cannot compete at large vector lengths in a multithreaded environment. Moreover, in your case vdAdd suffers from cache misses at large vector lengths more than BLAS or the compiler loop does, because your test case uses a separate memory array y for the results. You should see better performance if you write vdAdd( n, a, b, a ).&lt;BR /&gt;Thanks,&lt;BR /&gt;Nikita&lt;/P&gt;</description>
      <pubDate>Mon, 29 Mar 2010 11:15:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/a-large-array-on-the-sum-of-the-optimization-problem/m-p/860728#M7493</guid>
      <dc:creator>Nikita_A_Intel</dc:creator>
      <dc:date>2010-03-29T11:15:25Z</dc:date>
    </item>
  </channel>
</rss>

