OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Surprising benchmark results while comparing kernel execution speed on different processors

SebastienTs
Novice

I have been benchmarking the execution speed of a simple OpenCL kernel, run through PyOpenCL, on three different CPUs, and I'm really surprised by the results: the least powerful one (i5-13500H) is significantly faster than the other two (i9-14900KF and Ryzen 9950X).

The execution time on the i5-13500H is around 5.2s, while it ranges from 7s to 12s on the other two processors. By my estimate, the i5-13500H runs close to its theoretical peak performance (500 GFLOPS), which is great, but the other two run well below theirs (by a factor of at least 4 to 5).

 

Here is the kernel I'm running:

 

# The loop count is spliced into the kernel source as a literal constant.
nloops = 1024

kernel_src = """
__kernel void krn(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  float s;
  int i = 0;
  res_g[gid] = 0;
  for(i=0;i<""" + str(nloops) + """;i++)
  {
     s = 100.0*(a_g[gid]+b_g[gid]);
     res_g[gid] += (int)s+i;
  }
}"""
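To check the device results on a few elements, and to spell out what each work-item computes, the kernel can be emulated in pure Python. This is a sketch under stated assumptions: `f32` and `krn_reference` are hypothetical helper names (not part of PyOpenCL), the device is assumed to round to nearest in single precision, and the emulation is far too slow for 256 million elements, so it is meant for a handful of entries only.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def krn_reference(a, b, nloops):
    """Hypothetical helper: the value krn writes to res_g for each pair (a[i], b[i])."""
    res = []
    for a_i, b_i in zip(a, b):
        ab = f32(f32(a_i) + f32(b_i))  # a_g[gid] + b_g[gid] is a float addition
        s = f32(100.0 * ab)            # 100.0 is a double literal; the product is stored in float s
        k = int(s)                     # (int)s truncates toward zero, as int() does for in-range values
        acc = 0.0                      # res_g[gid] = 0
        for i in range(nloops):
            acc = f32(acc + f32(k + i))  # int (k + i) converted to float, then a float add
        res.append(acc)
    return res

print(krn_reference([0.5, 0.0], [0.25, 0.0], 3))  # prints [228.0, 3.0]
```

Written out this way, it is also visible that `s` and `k` do not depend on `i`, so in every iteration only the accumulation into `res_g[gid]` actually changes.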

 

I'm testing it on two arrays of 256 million random floats, with nloops set to 1024.
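The estimate of how close the i5-13500H gets to its peak can be redone as a back-of-the-envelope calculation. How many operations one inner-loop iteration "costs" is a judgment call (it contains a float addition, a multiplication, a float-to-int and an int-to-float conversion, an integer addition and the accumulating addition), so this sketch keeps it as a parameter; the counts 1 and 3 below are illustrative assumptions, as is reading "256 million" as 256 × 10^6. The function names are hypothetical.

```python
def kernel_footprint_bytes(n_elems: int, n_arrays: int = 3, bytes_per_elem: int = 4) -> int:
    """Memory held by a_g, b_g and res_g (three float32 arrays by default)."""
    return n_elems * n_arrays * bytes_per_elem

def achieved_gflops(n_elems: int, nloops: int, ops_per_iter: float, seconds: float) -> float:
    """GFLOP/s if each of the n_elems * nloops inner-loop iterations counts as ops_per_iter operations."""
    return n_elems * nloops * ops_per_iter / seconds / 1e9

N = 256_000_000   # assumption: "256 million" read as 256 * 10**6 rather than 2**28
NLOOPS = 1024
I5_SECONDS = 5.2  # i5-13500H time quoted in the first post

print(f"{kernel_footprint_bytes(N) / 2**30:.2f} GiB for the three arrays")
for ops in (1, 3):  # illustrative operation counts per iteration, not measured values
    print(f"{ops} op(s)/iteration -> {achieved_gflops(N, NLOOPS, ops, I5_SECONDS):.1f} GFLOP/s")
```

Dividing the result by a CPU's theoretical peak gives the fraction of peak actually reached; the answer depends strongly on which operations one decides to count.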
 
On my setup, the i9-14900KF thermally throttles very quickly under this load, but the Ryzen 9950X does not. Still, as expected, both processors perform significantly better than the i5-13500H in benchmarks such as CPU-Z Multi Thread (roughly 3x to 4x faster), which leads me to think that something is very suboptimal when this specific OpenCL kernel runs on these processors.
 
Any clue what could be going on?

cw_intel
Moderator

 

The Intel OpenCL CPU RT provides OpenCL support for Intel CPU devices. We will take a look at the performance issues on the Intel CPU.

Can you provide us with a complete testable program and the build command?  Also, please tell us the version of OpenCL CPU RT you used. 
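The runtime version asked about here can be read from the host side. The sketch below, assuming PyOpenCL is installed, lists every platform and device with its version strings; `describe_opencl_devices` is a hypothetical helper, while `pyopencl.get_platforms()`, `Platform.get_devices()` and the `name`, `version` and `driver_version` attributes are PyOpenCL's wrappers around the standard OpenCL info queries. The exact strings printed depend on the installed runtime.

```python
def describe_opencl_devices(platforms):
    """Hypothetical helper: one line per device with platform and driver version strings.

    Works with pyopencl Platform objects, or anything exposing the same attributes.
    """
    lines = []
    for platform in platforms:
        for device in platform.get_devices():
            lines.append(f"{platform.name} | {platform.version} | "
                         f"{device.name} | driver {device.driver_version}")
    return lines

if __name__ == "__main__":
    try:
        import pyopencl as cl
        platforms = cl.get_platforms()
    except Exception:  # pyopencl not installed, or no OpenCL platform/ICD found
        platforms = []
    for line in describe_opencl_devices(platforms):
        print(line)
```

Printing this on each benchmark machine should make it easier to confirm which CPU runtime build the kernel actually ran on.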

SebastienTs
Novice

Please find the code attached (the zip is password-protected to avoid it being scanned; password: 1234).

It runs in a Python 3.9.20 Conda environment with pyopencl 2022.1.5 and numpy 1.23.5.

It has been tested with the Intel OpenCL CPU Runtime 24.1.968 in the following benchmark (the surprising results are those of the AMD Ryzen 9950X and Intel i9-14900KF):

 
Device                       Execution time
-------------------------------------
NVIDIA RTX 4080 Super        0.58s
NVIDIA L40S                  0.77s
NVIDIA RTX 4060 (laptop)     1.5s
NVIDIA RTX 3050              2.3s
Intel Iris Xe G7             39s
-------------------------------------
Intel Xeon Gold 6430         4.77s
Intel i5-13500H              6.3s
AMD Ryzen 9950X              7.23s
Intel i9-14900KF             15.2s
-------------------------------------
SebastienTs
Novice

Important note: The tests for the AMD Ryzen 9950X and Intel i9-14900KF were actually performed with a slightly newer version of the runtime (2025.0.0.1166). Downgrading to the latest 2024 version (2024.2.0.980) cut the computation time from 7.23s to 2.6s (9950X) and from 15.2s to 3.2s (i9-14900KF)!

Once again, this proves that everything counts in a benchmark, but we can also say that the 2025 version of the runtime was an awesome update from Intel!!

I would still be interested to hear about kernel speed optimization and how performance can differ from processor to processor, especially between AMD (which I am more interested in) and Intel.
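The effect of the downgrade can be quantified directly from the timings in this thread. This is only arithmetic on the reported numbers, with one caveat: the i5-13500H baseline (6.3s from the benchmark table) was measured with runtime 24.1.968, not 2024.2.0.980, so the comparison with it is not strictly like-for-like. `speedups` and the constant names are hypothetical.

```python
# Timings (seconds) reported in this thread: (2025.0.0.1166 runtime, 2024.2.0.980 runtime)
TIMINGS = {
    "AMD Ryzen 9950X": (7.23, 2.6),
    "Intel i9-14900KF": (15.2, 3.2),
}
I5_13500H = 6.3  # benchmark-table value, measured with runtime 24.1.968

def speedups(timings, baseline):
    """Map each CPU to (speedup from downgrading, speedup over the baseline time)."""
    return {cpu: (t_2025 / t_2024, baseline / t_2024)
            for cpu, (t_2025, t_2024) in timings.items()}

for cpu, (downgrade, vs_i5) in speedups(TIMINGS, I5_13500H).items():
    print(f"{cpu}: {downgrade:.2f}x faster after downgrading, "
          f"{vs_i5:.2f}x faster than the i5-13500H")
```

With these inputs, the downgrade corresponds to about a 2.8x speedup on the 9950X and 4.75x on the i9-14900KF, and both CPUs end up roughly 2x to 2.4x faster than the i5-13500H.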

cw_intel
Moderator

Thanks for the details; we will take a look at it.
