<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Surprising benchmark results while comparing kernel execution speed on different processors in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675290#M7375</link>
    <description>&lt;P&gt;I have been benchmarking a simple OpenCL kernel, run through PyOpenCL, on three different CPUs, and I'm really surprised by the results: the least powerful one (i5-13500H) is significantly faster than the other two (i9-14900KF and Ryzen 9950X).&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The execution time on the i5-13500H is around 5.2s, whereas it ranges from 7s to 12s on the other two processors.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;From my estimates, the i5-13500H operates close to its theoretical peak performance (500 GFlops), which is great, but the other processors fall well below their own (by at least a factor of 4 to 5).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the kernel I'm running:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;__kernel void krn(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  float s;
  int i = 0;
  res_g[gid] = 0;
  for(i=0;i&amp;lt;"""+str(nloops)+""";i++)
  {
     s = 100.0*(a_g[gid]+b_g[gid]);
     res_g[gid] += (int)s+i;
  }
}"""&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;SPAN&gt;I'm testing it on a pair of 256-million-element arrays of random floats, with nloops set to 1024.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;On my setups, the i9-14900KF thermal-throttles very quickly under this load, but the Ryzen 9950X doesn't. Still, as expected, both processors perform significantly better than the i5-13500H on benchmarks such as CPU-Z Multi Thread (about 3x to 4x faster), which leads me to think that something is very suboptimal when running this specific OpenCL kernel on these processors.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Any clue what could be going on?&lt;/SPAN&gt;&lt;/DIV&gt;</description>
    <pubDate>Mon, 17 Mar 2025 10:34:34 GMT</pubDate>
    <dc:creator>SebastienTs</dc:creator>
    <dc:date>2025-03-17T10:34:34Z</dc:date>
    <item>
      <title>Surprising benchmark results while comparing kernel execution speed on different processors</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675290#M7375</link>
      <description>&lt;P&gt;I have been benchmarking a simple OpenCL kernel, run through PyOpenCL, on three different CPUs, and I'm really surprised by the results: the least powerful one (i5-13500H) is significantly faster than the other two (i9-14900KF and Ryzen 9950X).&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The execution time on the i5-13500H is around 5.2s, whereas it ranges from 7s to 12s on the other two processors.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;From my estimates, the i5-13500H operates close to its theoretical peak performance (500 GFlops), which is great, but the other processors fall well below their own (by at least a factor of 4 to 5).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the kernel I'm running:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;__kernel void krn(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  float s;
  int i = 0;
  res_g[gid] = 0;
  for(i=0;i&amp;lt;"""+str(nloops)+""";i++)
  {
     s = 100.0*(a_g[gid]+b_g[gid]);
     res_g[gid] += (int)s+i;
  }
}"""&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;SPAN&gt;I'm testing it on a pair of 256-million-element arrays of random floats, with nloops set to 1024.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;On my setups, the i9-14900KF thermal-throttles very quickly under this load, but the Ryzen 9950X doesn't. Still, as expected, both processors perform significantly better than the i5-13500H on benchmarks such as CPU-Z Multi Thread (about 3x to 4x faster), which leads me to think that something is very suboptimal when running this specific OpenCL kernel on these processors.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Any clue what could be going on?&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 17 Mar 2025 10:34:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675290#M7375</guid>
      <dc:creator>SebastienTs</dc:creator>
      <dc:date>2025-03-17T10:34:34Z</dc:date>
    </item>
    <item>
      <title>Re: Surprising benchmark results while comparing kernel execution speed on different processors</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675697#M7376</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The Intel OpenCL CPU RT provides OpenCL support for Intel CPU devices. We will take a look at the performance issues on the Intel CPU.&lt;/P&gt;
&lt;P&gt;Can you provide us with a complete testable program and the build command?&amp;nbsp; Also, please tell us the version of OpenCL CPU RT you used.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 02:17:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675697#M7376</guid>
      <dc:creator>cw_intel</dc:creator>
      <dc:date>2025-03-18T02:17:52Z</dc:date>
    </item>
    <item>
      <title>Re: Surprising benchmark results while comparing kernel execution speed on different processors</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675845#M7377</link>
      <description>&lt;P&gt;Please find the code attached (I encrypted the zip to avoid scanning; password: 1234).&lt;/P&gt;&lt;P&gt;It runs in a Python 3.9.20 Conda environment with pyopencl 2022.1.5 and numpy 1.23.5.&lt;/P&gt;&lt;P&gt;It was tested with Intel OpenCL CPU Runtime 24.1.968 in the following benchmark (surprising results in red):&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;NVIDIA RTX 4080 super: 0.58s&lt;/DIV&gt;&lt;DIV&gt;NVIDIA L40S: &lt;FONT color="#FF0000"&gt;0.77s&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;NVIDIA RTX 4060 (laptop): 1.5s&lt;/DIV&gt;&lt;DIV&gt;NVIDIA RTX 3050: 2.3s&lt;/DIV&gt;&lt;DIV&gt;Intel Iris Xe G7: 39s&lt;/DIV&gt;&lt;DIV&gt;-------------------------------------&lt;/DIV&gt;&lt;DIV&gt;Intel Xeon Gold 6430: 4.77s&lt;/DIV&gt;&lt;DIV&gt;Intel i5-13500H: 6.3s&lt;/DIV&gt;&lt;DIV&gt;AMD Ryzen 9950X: &lt;FONT color="#FF0000"&gt;7.23s&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;Intel i9-14900KF: &lt;FONT color="#FF0000"&gt;15.2s&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;-------------------------------------&lt;/DIV&gt;</description>
      <pubDate>Tue, 18 Mar 2025 15:35:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675845#M7377</guid>
      <dc:creator>SebastienTs</dc:creator>
      <dc:date>2025-03-18T15:35:01Z</dc:date>
    </item>
    <item>
      <title>Re: Surprising benchmark results while comparing kernel execution speed on different processors</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675879#M7378</link>
      <description>&lt;P&gt;Thanks for the details, we will take a look at it.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 13:38:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1675879#M7378</guid>
      <dc:creator>cw_intel</dc:creator>
      <dc:date>2025-03-18T13:38:39Z</dc:date>
    </item>
    <item>
      <title>Re: Surprising benchmark results while comparing kernel execution speed on different processors</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1676182#M7379</link>
      <description>&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Important note&lt;/STRONG&gt;&lt;/U&gt;: The tests for the AMD Ryzen 9950X and Intel i9-14900KF were actually performed with a slightly newer version of the runtime (2025.0.0.1166). Downgrading to the latest 2024 version (2024.2.0.980) cut the computation time from 7.23s to 2.6s (9950X) and from 15.2s to 3.2s (i9-14900KF)!&lt;/P&gt;&lt;P&gt;Once again, this proves that everything counts in a benchmark, but we can also say that the 2025 version of the runtime was an awesome update from Intel!!&lt;/P&gt;&lt;P&gt;I would still be interested to hear about kernel speed optimization and how performance can differ from processor to processor, especially between AMD (which I am more interested in) and Intel.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 11:40:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Surprising-benchmark-results-while-comparing-kernel-execution/m-p/1676182#M7379</guid>
      <dc:creator>SebastienTs</dc:creator>
      <dc:date>2025-03-19T11:40:01Z</dc:date>
    </item>
  </channel>
</rss>

