<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Baseline Performance Data (STREAM) in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131431#M16</link>
    <description>&lt;P&gt;&lt;A href="https://raw.githubusercontent.com/jeffhammond/STREAM/master/stream.c"&gt;STREAM benchmark&lt;/A&gt;:&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;icc stream.c -o stream -O3 -xHost -qopenmp -DSTREAM_ARRAY_SIZE=33554432
&lt;/PRE&gt;
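&lt;P&gt;As a sanity check on the array size: stream.c asks that each array be at least four times the size of the available cache. A small Python sketch of that check follows. The last-level cache sizes (22 MB per Xeon Platinum 8153 socket, 12 MB for the Xeon E-2176G) are assumptions, not taken from the runs here, and should be confirmed on the nodes, e.g. with lscpu:&lt;/P&gt;

```python
# Check -DSTREAM_ARRAY_SIZE against the stream.c rule that each array be
# at least 4 times the size of the available cache.
# ASSUMPTION: the LLC sizes below are not from the measurements in this post;
# confirm them on each node (e.g. with lscpu).
ARRAY_SIZE = 33554432  # value passed via -DSTREAM_ARRAY_SIZE
BYTES_PER_ELEM = 8     # STREAM_TYPE defaults to double

ASSUMED_LLC_BYTES = {
    "fpga_compile (2x Xeon Platinum 8153)": 2 * 22 * 1024 ** 2,
    "gpu node (Xeon E-2176G)": 12 * 1024 ** 2,
}

array_bytes = ARRAY_SIZE * BYTES_PER_ELEM  # 256 MiB per array
for node, llc in ASSUMED_LLC_BYTES.items():
    ratio = array_bytes / llc
    print(f"{node}: array is {ratio:.1f}x the LLC, rule met: {ratio >= 4}")
```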

&lt;P&gt;on fpga_compile node (Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz)&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;KMP_AFFINITY=compact ./stream
Function &amp;nbsp; &amp;nbsp;Best Rate MB/s &amp;nbsp;Avg time &amp;nbsp; &amp;nbsp; Min time &amp;nbsp; &amp;nbsp; Max time
Copy: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;141917.2 &amp;nbsp; &amp;nbsp; 0.003796 &amp;nbsp; &amp;nbsp; 0.003783 &amp;nbsp; &amp;nbsp; 0.003813
Scale: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 139707.1 &amp;nbsp; &amp;nbsp; 0.003849 &amp;nbsp; &amp;nbsp; 0.003843 &amp;nbsp; &amp;nbsp; 0.003855
Add: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 153685.5 &amp;nbsp; &amp;nbsp; 0.005251 &amp;nbsp; &amp;nbsp; 0.005240 &amp;nbsp; &amp;nbsp; 0.005262
Triad: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 156861.5 &amp;nbsp; &amp;nbsp; 0.005183 &amp;nbsp; &amp;nbsp; 0.005134 &amp;nbsp; &amp;nbsp; 0.005525&lt;/PRE&gt;
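&lt;P&gt;For reference, "Best Rate MB/s" is the number of bytes a kernel moves divided by its Min time, with 1 MB = 10^6 bytes: Copy and Scale count two arrays (one read, one written), Add and Triad count three. A minimal Python sketch, assuming 8-byte doubles, that recomputes the fpga_compile rates from the Min time column; the results agree with the table up to the rounding of the printed times:&lt;/P&gt;

```python
# Recompute STREAM's "Best Rate MB/s" from the Min time column.
# Assumes STREAM_TYPE is double (8 bytes), the stream.c default.
ARRAY_SIZE = 33554432  # value passed via -DSTREAM_ARRAY_SIZE
BYTES_PER_ELEM = 8

# Arrays counted per kernel: Copy (c = a) and Scale (b = scalar*c) move two,
# Add (c = a + b) and Triad (a = b + scalar*c) move three.
ARRAYS_COUNTED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_rate_mb_s(kernel, min_time_s):
    """Bandwidth in MB/s (1 MB = 1e6 bytes), computed as stream.c does."""
    nbytes = ARRAYS_COUNTED[kernel] * BYTES_PER_ELEM * ARRAY_SIZE
    return 1.0e-6 * nbytes / min_time_s


# Min times from the fpga_compile run above (printed to 6 decimals).
fpga_min = {"Copy": 0.003783, "Scale": 0.003843, "Add": 0.005240, "Triad": 0.005134}
for kernel, t_min in fpga_min.items():
    print(f"{kernel:6s} {best_rate_mb_s(kernel, t_min):10.1f} MB/s")
```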

&lt;P&gt;on gpu node (Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz)&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;KMP_AFFINITY=compact ./stream
Function &amp;nbsp; &amp;nbsp;Best Rate MB/s &amp;nbsp;Avg time &amp;nbsp; &amp;nbsp; Min time &amp;nbsp; &amp;nbsp; Max time
Copy: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 30683.5 &amp;nbsp; &amp;nbsp; 0.017546 &amp;nbsp; &amp;nbsp; 0.017497 &amp;nbsp; &amp;nbsp; 0.017601
Scale: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;32258.0 &amp;nbsp; &amp;nbsp; 0.016687 &amp;nbsp; &amp;nbsp; 0.016643 &amp;nbsp; &amp;nbsp; 0.016742
Add: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;33558.9 &amp;nbsp; &amp;nbsp; 0.024043 &amp;nbsp; &amp;nbsp; 0.023997 &amp;nbsp; &amp;nbsp; 0.024099
Triad: &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;33405.5 &amp;nbsp; &amp;nbsp; 0.024130 &amp;nbsp; &amp;nbsp; 0.024107 &amp;nbsp; &amp;nbsp; 0.024161&lt;/PRE&gt;

&lt;P&gt;&lt;A href="https://github.com/UoB-HPC/BabelStream"&gt;BabelSTREAM benchmark&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;OpenMP:&lt;/P&gt;
&lt;P&gt;on fpga_compile node&amp;nbsp;(Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz)&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;icpc -O3 -xHost main.cpp OMPStream.cpp -qopenmp -DIMPLEMENTATION_STRING=\"OpenMP\" -g -DOMP
KMP_AFFINITY=compact ./a.out
Function    MBytes/sec  Min (sec)   Max         Average
Copy        141828.793  0.00379     0.01443     0.00401
Mul         121572.295  0.00442     0.01676     0.00464
Add         133730.659  0.00602     0.01396     0.00616
Triad       134717.507  0.00598     0.01672     0.00613
Dot         177794.491  0.00302     0.01040     0.00311&lt;/PRE&gt;
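&lt;P&gt;BabelStream counts bytes the same way, with Mul (b = scalar*c) and Dot (sum of a*b) each counted as two arrays. A sketch, assuming its default size of 2^25 doubles (which is consistent with these numbers), that recomputes the table above. One plausible reason Dot reports the highest rate is that it only reads, so there is no uncounted write-allocate traffic:&lt;/P&gt;

```python
# BabelStream byte accounting per kernel:
# Copy c = a, Mul b = scalar*c, Dot sum(a*b): 2 arrays;
# Add c = a + b, Triad a = b + scalar*c: 3 arrays.
# Assumption: default array size of 2**25 doubles, consistent with the rates above.
ARRAY_SIZE = 2 ** 25
BYTES_PER_ELEM = 8
ARRAYS_COUNTED = {"Copy": 2, "Mul": 2, "Add": 3, "Triad": 3, "Dot": 2}


def mbytes_per_sec(kernel, min_time_s):
    """MBytes/sec (1 MB = 1e6 bytes) from the minimum kernel time."""
    nbytes = ARRAYS_COUNTED[kernel] * BYTES_PER_ELEM * ARRAY_SIZE
    return 1.0e-6 * nbytes / min_time_s


# Min (sec) column from the fpga_compile OpenMP run above. It is printed with
# only five decimals, so recomputed rates differ from the table by up to ~0.2 %.
omp_min = {"Copy": 0.00379, "Mul": 0.00442, "Add": 0.00602,
           "Triad": 0.00598, "Dot": 0.00302}
for kernel, t in omp_min.items():
    print(f"{kernel:6s} {mbytes_per_sec(kernel, t):12.3f} MB/s")
```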

&lt;P&gt;on gpu node (Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz)&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Function    MBytes/sec  Min (sec)   Max         Average
Copy        30787.821   0.01744     0.02158     0.01753
Mul         21622.489   0.02483     0.02676     0.02494
Add         24157.226   0.03334     0.03822     0.03349
Triad       24196.118   0.03328     0.03367     0.03337
Dot         32628.879   0.01645     0.01776     0.01654&lt;/PRE&gt;

&lt;P&gt;Using oneAPI/SYCL:&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;dpcpp main.cpp SYCLStream.cpp -lsycl -lOpenCL -DIMPLEMENTATION_STRING=\"SYCL\" -O3 -DSYCL -o sycl-stream&lt;/PRE&gt;

&lt;P&gt;on fpga_compile node (CPU device):&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Function    MBytes/sec  Min (sec)   Max         Average
Copy        47386.727   0.01133     0.02763     0.01309
Mul         45257.555   0.01186     0.04275     0.01352
Add         49772.544   0.01618     0.03343     0.01742
Triad       50365.736   0.01599     0.02322     0.01720
Dot         8051.967    0.06668     6.20847     0.15319&lt;/PRE&gt;
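&lt;P&gt;To put a number on the drop on fpga_compile, the sketch below divides the SYCL CPU-device rates by the OpenMP rates measured on the same node. The streaming kernels reach roughly 33-37 % of the OpenMP bandwidth and Dot about 4.5 %; the ratios by themselves do not establish that NUMA placement is the cause.&lt;/P&gt;

```python
# MBytes/sec on fpga_compile, copied from the BabelStream tables above.
openmp = {"Copy": 141828.793, "Mul": 121572.295, "Add": 133730.659,
          "Triad": 134717.507, "Dot": 177794.491}
sycl_cpu = {"Copy": 47386.727, "Mul": 45257.555, "Add": 49772.544,
            "Triad": 50365.736, "Dot": 8051.967}


def fraction_of(reference, measured):
    """Per-kernel ratio measured / reference."""
    return {k: measured[k] / reference[k] for k in reference}


for kernel, r in fraction_of(openmp, sycl_cpu).items():
    print(f"{kernel:6s} SYCL CPU device at {100 * r:5.1f} % of OpenMP")
```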

&lt;P&gt;on gpu node (CPU device):&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;./sycl-stream --device 0
Function    MBytes/sec  Min (sec)   Max         Average
Copy        21547.783   0.02492     0.02766     0.02505
Mul         21498.313   0.02497     0.02621     0.02505
Add         24177.595   0.03331     0.03412     0.03345
Triad       24126.740   0.03338     0.03416     0.03354
Dot         31036.779   0.01730     0.02748     0.01909&lt;/PRE&gt;

&lt;P&gt;on gpu node (Intel UHD GPU):&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;./sycl-stream --device 1
Function &amp;nbsp; &amp;nbsp;MBytes/sec &amp;nbsp;Min (sec) &amp;nbsp; Max &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Average
Copy &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;36767.394 &amp;nbsp; 0.01460 &amp;nbsp; &amp;nbsp; 0.01514 &amp;nbsp; &amp;nbsp; 0.01492
Mul &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 36019.351 &amp;nbsp; 0.01491 &amp;nbsp; &amp;nbsp; 0.01537 &amp;nbsp; &amp;nbsp; 0.01504
Add &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 34204.743 &amp;nbsp; 0.02354 &amp;nbsp; &amp;nbsp; 0.02428 &amp;nbsp; &amp;nbsp; 0.02365
Triad &amp;nbsp; &amp;nbsp; &amp;nbsp; 34836.742 &amp;nbsp; 0.02312 &amp;nbsp; &amp;nbsp; 0.02378 &amp;nbsp; &amp;nbsp; 0.02322
Dot &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 28777.236 &amp;nbsp; 0.01866 &amp;nbsp; &amp;nbsp; 0.01948 &amp;nbsp; &amp;nbsp; 0.01903&lt;/PRE&gt;
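&lt;P&gt;For the GPU-versus-CPU question, the sketch below compares the GPU with each CPU measurement on the gpu node (STREAM's Scale is the same kernel as BabelStream's Mul, b = scalar*c). The GPU comes out about 1.4-1.7x faster than the SYCL CPU device, but only about 2-20 % faster than the icc-compiled STREAM binary, which suggests much of the gap lies in the CPU code path rather than in the shared memory itself. One possible factor, not verified here, is that the icc build of stream.c uses non-temporal (streaming) stores that avoid write-allocate traffic.&lt;/P&gt;

```python
# gpu node (Xeon E-2176G): MB/s copied from the tables above.
# STREAM's Scale and BabelStream's Mul are the same kernel (b = scalar * c).
cpu_runs = {
    "STREAM (icc, OpenMP)": {"Copy": 30683.5, "Mul": 32258.0,
                             "Add": 33558.9, "Triad": 33405.5},
    "BabelStream OpenMP": {"Copy": 30787.821, "Mul": 21622.489,
                           "Add": 24157.226, "Triad": 24196.118},
    "BabelStream SYCL CPU": {"Copy": 21547.783, "Mul": 21498.313,
                             "Add": 24177.595, "Triad": 24126.740},
}
gpu = {"Copy": 36767.394, "Mul": 36019.351, "Add": 34204.743, "Triad": 34836.742}


def speedup(numerator, denominator):
    """Per-kernel bandwidth ratio numerator / denominator."""
    return {k: numerator[k] / denominator[k] for k in denominator}


for name, cpu in cpu_runs.items():
    ratios = speedup(gpu, cpu)
    row = "  ".join(f"{k} {v:4.2f}x" for k, v in ratios.items())
    print(f"GPU vs {name:22s} {row}")
```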

&lt;P&gt;Any suggestions on oneAPI compile flags are welcome!&lt;/P&gt;
&lt;P&gt;Questions:&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;Is there any way to control thread affinity for the SYCL CPU device, to address the likely NUMA issues behind its low bandwidth on the dual-socket fpga_compile machine?&lt;/LI&gt;&lt;LI&gt;Why is the bandwidth from the GPU on the gpu node so much higher than from the CPU, given that, as far as I know, they share the same memory?&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Tue, 29 Oct 2019 13:24:23 GMT</pubDate>
    <dc:creator>REGULY__ISTVAN</dc:creator>
    <dc:date>2019-10-29T13:24:23Z</dc:date>
    <item>
      <title>Baseline Performance Data (STREAM)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131431#M16</link>
      <description></description>
      <pubDate>Tue, 29 Oct 2019 13:24:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131431#M16</guid>
      <dc:creator>REGULY__ISTVAN</dc:creator>
      <dc:date>2019-10-29T13:24:23Z</dc:date>
    </item>
    <item>
      <title>Istvan,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131432#M17</link>
      <description>&lt;P&gt;Istvan,&lt;/P&gt;&lt;P&gt;Thanks for the detailed question, we'll take a look and get back to you as soon as possible.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;William&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2019 23:58:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131432#M17</guid>
      <dc:creator>WILLIAM_H_Intel4</dc:creator>
      <dc:date>2019-11-01T23:58:11Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131433#M18</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;What do you mean by the fpga_compile node?&lt;BR /&gt;We are working on a feature to add thread-affinity control to DPC++.&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Varsha&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2019 02:35:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131433#M18</guid>
      <dc:creator>Varsha_M_Intel</dc:creator>
      <dc:date>2019-11-07T02:35:15Z</dc:date>
    </item>
    <item>
      <title>fpga_compile node is the node</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131434#M19</link>
      <description>&lt;P&gt;The fpga_compile node is the node I get when submitting a job with:&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;qsub -q batch@v-qsvr-nda -l nodes=1:fpga_compile:ppn=2&lt;/PRE&gt;</description>
      <pubDate>Thu, 07 Nov 2019 08:53:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Baseline-Performance-Data-STREAM/m-p/1131434#M19</guid>
      <dc:creator>REGULY__ISTVAN</dc:creator>
      <dc:date>2019-11-07T08:53:30Z</dc:date>
    </item>
  </channel>
</rss>

