I have been benchmarking the execution speed of a simple OpenCL kernel that I run through PyOpenCL on three different CPUs, and I'm really surprised by the results: the least powerful one (i5-13500H) is significantly faster than the other two (i9-14900KF and Ryzen 9950X).
The execution time on the i5-13500H is around 5.2 s, and it ranges between 7 s and 12 s on the other two processors. By my estimates, the i5-13500H operates close to its theoretical maximum performance (500 GFLOPS), which is great, but the other processors fall well below theirs (by at least a factor of 4 to 5).
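For readers who want to reproduce this kind of estimate, here is a minimal sketch of the arithmetic. Only the 5.2 s runtime and the 500 GFLOPS peak come from the post; the function names, the problem size (`n_items`, `nloops`) and the count of three floating-point operations per loop iteration are assumptions made purely for illustration.

```python
def achieved_gflops(n_items, nloops, flops_per_iter, seconds):
    """Sustained throughput in GFLOP/s for one kernel run."""
    total_flops = n_items * nloops * flops_per_iter
    return total_flops / seconds / 1e9


def fraction_of_peak(gflops, peak_gflops):
    """Share of the theoretical peak that was actually reached."""
    return gflops / peak_gflops


# Hypothetical problem size: NOT taken from the post.
n_items = 2**20        # number of work items (assumed)
nloops = 500_000       # loop count per work item (assumed)
flops_per_iter = 3     # add, multiply, accumulate (assumed count)

rate = achieved_gflops(n_items, nloops, flops_per_iter, seconds=5.2)
print(f"{rate:.1f} GFLOP/s, {fraction_of_peak(rate, 500.0):.0%} of a 500 GFLOPS peak")
```

The result is only as good as the operation count: the int conversion and the integer add in the kernel are not floating-point work, and a compiler may hoist the loop-invariant part, so the real number of operations executed can be lower than this estimate assumes.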
Here is the kernel I'm running:
__kernel void krn(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
    int gid = get_global_id(0);
    float s;
    int i = 0;
    res_g[gid] = 0;
    for(i=0;i<"""+str(nloops)+""";i++)
    {
        s = 100.0*(a_g[gid]+b_g[gid]);
        res_g[gid] += (int)s+i;
    }
}"""
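The host program was only shared as an attachment, so the following is a hedged sketch of how such a kernel could be timed with PyOpenCL, not the poster's actual script. The kernel body is copied from the post; the `NLOOPS` macro (replacing the string concatenation), the function name `time_kernel`, the default sizes and the random input data are all assumptions.

```python
import time

# Kernel body from the post; the loop bound is passed as a build-time macro.
KERNEL_SRC = """
__kernel void krn(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
    int gid = get_global_id(0);
    float s;
    res_g[gid] = 0;
    for (int i = 0; i < NLOOPS; i++)
    {
        s = 100.0*(a_g[gid]+b_g[gid]);
        res_g[gid] += (int)s+i;
    }
}
"""


def time_kernel(n_items=1 << 16, nloops=1000):
    """Return (wall_seconds, device_seconds), or None if no OpenCL CPU device is usable."""
    try:
        import numpy as np
        import pyopencl as cl
    except ImportError:
        return None
    try:
        # Pick the first CPU device exposed by any installed platform.
        devices = []
        for platform in cl.get_platforms():
            devices += platform.get_devices(device_type=cl.device_type.CPU)
        if not devices:
            return None
        ctx = cl.Context(devices=[devices[0]])
        queue = cl.CommandQueue(
            ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

        rng = np.random.default_rng(0)
        a = rng.random(n_items, dtype=np.float32)
        b = rng.random(n_items, dtype=np.float32)
        mf = cl.mem_flags
        a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
        res_g = cl.Buffer(ctx, mf.READ_WRITE, a.nbytes)

        prg = cl.Program(ctx, KERNEL_SRC).build(options=[f"-DNLOOPS={nloops}"])

        t0 = time.perf_counter()
        evt = prg.krn(queue, a.shape, None, a_g, b_g, res_g)
        evt.wait()
        wall = time.perf_counter() - t0
        # Profiling timestamps are reported in nanoseconds.
        device = (evt.profile.end - evt.profile.start) * 1e-9
        return wall, device
    except cl.Error:
        return None


print(time_kernel())
```

Reporting both numbers helps separate launch overhead from the kernel itself: the wall-clock figure includes enqueue overhead, while the event profile covers only the execution on the device.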
Important note: the tests on the AMD Ryzen 9950X and the Intel i9-14900KF were actually performed with a slightly newer version of the runtime (2025.0.0.1166). Downgrading to the latest 2024 version (2024.2.0.980) cut the computation time from 7.23 s to 2.6 s (9950X) and from 15.2 s to 3.2 s (i9-14900KF)!
Once again, this proves that everything counts in a benchmark, but we can also say that the 2025 version of the runtime was an awesome update from Intel!!
I would still be interested to hear about kernel speed optimization and how performance can differ from processor to processor, especially between AMD (which I am more interested in) and Intel.
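To put the runtime regression in numbers, the short sketch below computes the speedup obtained by downgrading, using only the four timings quoted in this post. The function and variable names are illustrative.

```python
def speedup(before_s, after_s):
    """How many times faster the run became (before / after)."""
    return before_s / after_s


# (seconds with runtime 2025.0.0.1166, seconds with runtime 2024.2.0.980)
timings = {
    "Ryzen 9950X": (7.23, 2.6),
    "i9-14900KF": (15.2, 3.2),
}

for cpu, (t_2025, t_2024) in timings.items():
    print(f"{cpu}: {speedup(t_2025, t_2024):.2f}x faster with 2024.2.0.980")
```

That works out to roughly 2.8x on the 9950X and 4.75x on the i9-14900KF, so the Intel part was hit noticeably harder by the newer runtime.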
The Intel OpenCL CPU RT provides OpenCL support for Intel CPU devices. We will take a look at the performance issues on the Intel CPU.
Can you provide us with a complete testable program and the build command? Also, please tell us the version of OpenCL CPU RT you used.
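One possible way to collect the runtime version requested here is to query it through PyOpenCL itself. This is a sketch rather than an official procedure: the helper name `describe_opencl` and the dictionary layout are made up for illustration, while the attributes it reads (`version`, `driver_version`, `max_compute_units`) are standard PyOpenCL platform and device properties.

```python
def describe_opencl():
    """List every OpenCL device with its platform and driver version strings."""
    try:
        import pyopencl as cl
    except ImportError:
        return []  # PyOpenCL is not installed in this environment
    rows = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                rows.append({
                    "platform": platform.name,
                    "platform_version": platform.version,
                    "device": device.name,
                    "driver_version": device.driver_version,
                    "compute_units": device.max_compute_units,
                })
    except cl.Error:
        # No usable platform or device; return whatever was collected.
        pass
    return rows


for row in describe_opencl():
    print(row)
```

Pasting this output alongside the timings makes it clear which runtime build each measurement belongs to.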
Please find the code attached (I encrypted the zip to avoid scanning, password: 1234)
It runs in a Python 3.9.20 Conda environment with pyopencl 2022.1.5 and numpy 1.23.5.
It has been tested with Intel OpenCL CPU Runtime 24.1.968 in the following benchmark (surprising results in red):
Thanks for the details, we will take a look at it.
