1 Overview
I have recently been testing iGPU memory bandwidth, using kernels I implemented myself to measure and analyze memory read, write, and copy bandwidth.
When I warm up, launch the kernel many times back to back, and take the average, the resulting bandwidth comes very close to the physical bandwidth limit of the memory.
However, if I sleep for a while before launching the kernel in each loop iteration, the bandwidth drops on the Lunar Lake (LNL) machine, whereas on the Raptor Lake (RPL) and Meteor Lake (MTL) platforms there is no drop in throughput. I am curious why this happens on LNL but not on MTL or RPL.
2 MRE
`opencl_bandwidth_test.cpp` is provided in the attachment; compile it with:
`g++ -std=c++11 opencl_bandwidth_test.cpp -o opencl_bandwidth_test -lOpenCL`
The program tests read, write, and copy bandwidth on 1 GB buffers, sleeping for different intervals before each kernel launch (0 ms, 1 ms, 5 ms, 10 ms, 100 ms, and 500 ms).
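Since the attachment is not inlined here, below is a minimal sketch of the shape of the copy test. The kernel and variable names are mine for illustration, error checking is omitted, and the iteration count is arbitrary; the attachment is authoritative. It sleeps for the given interval before each launch and derives bandwidth from OpenCL profiling timestamps:

```cpp
// Minimal sketch (assumed names; error checks omitted): sleep for a given
// interval, launch a copy kernel, and compute bandwidth from the kernel's
// profiling timestamps.
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <chrono>
#include <cstdio>
#include <initializer_list>
#include <thread>

static const char* kSrc = R"CLC(
__kernel void copy_buf(__global const float4* src, __global float4* dst) {
    size_t i = get_global_id(0);
    dst[i] = src[i];
}
)CLC";

int main() {
    const size_t bytes = 1ull << 30;              // 1 GB per buffer
    const size_t gws   = bytes / sizeof(cl_float4);

    cl_platform_id plat; clGetPlatformIDs(1, &plat, nullptr);
    cl_device_id dev;    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_queue_properties props[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE, 0};
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, props, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "copy_buf", nullptr);

    cl_mem src = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, nullptr, nullptr);
    cl_mem dst = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, nullptr, nullptr);
    clSetKernelArg(k, 0, sizeof(src), &src);
    clSetKernelArg(k, 1, sizeof(dst), &dst);

    // Warm-up launches so later measurements start from a steady state.
    for (int i = 0; i < 3; ++i)
        clEnqueueNDRangeKernel(q, k, 1, nullptr, &gws, nullptr, 0, nullptr, nullptr);
    clFinish(q);

    for (int interval_ms : {0, 1, 5, 10, 100, 500}) {
        double sum_gbps = 0;
        const int iters = 10;
        for (int i = 0; i < iters; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
            cl_event ev;
            clEnqueueNDRangeKernel(q, k, 1, nullptr, &gws, nullptr, 0, nullptr, &ev);
            clWaitForEvents(1, &ev);
            cl_ulong t0, t1;  // nanoseconds
            clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, nullptr);
            clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,   sizeof(t1), &t1, nullptr);
            sum_gbps += 2.0 * bytes / (t1 - t0);  // copy moves 2x bytes; B/ns == GB/s
            clReleaseEvent(ev);
        }
        std::printf("interval %3d ms: copy %.2f GB/s\n", interval_ms, sum_gbps / iters);
    }
    return 0;
}
```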
3 Results
I ran this program on RPL, MTL, and LNL.
1) Raptor Lake Platform (normal)
- Desktop
- CPU: Intel(R) Core(TM) i5-14500
- GPU: Xe LP
- Memory: 2 × 16 GB, DDR4 3200 MT/s, theoretical bandwidth ~50 GB/s
- Motherboard: ASUSTeK COMPUTER INC., TX GAMING B760M WIFI D4, Rev 1.xx
- OS: Ubuntu 24.04.2 LTS + Linux 6.15.0
- Kernel Driver: i915 + xe (both tested, same results)
- Compute Runtime: 25.18.33578.6
| Interval | 0 ms | 1 ms | 5 ms | 10 ms | 100 ms | 500 ms |
|---|---|---|---|---|---|---|
| Read | 38.56 GB/s | 37.50 GB/s | 33.06 GB/s | 35.97 GB/s | 36.67 GB/s | 36.68 GB/s |
| Write | 40.09 GB/s | 39.00 GB/s | 38.36 GB/s | 39.37 GB/s | 39.16 GB/s | 38.99 GB/s |
| Copy (×2) | 41.24 GB/s | 41.57 GB/s | 41.55 GB/s | 41.34 GB/s | 41.82 GB/s | 40.75 GB/s |
Under this setup, the duration of the interval does not significantly affect the read and write bandwidth.
2) Meteor Lake Platform (normal)
- Mini PC / NUC: ASUS NUC 14 Pro+
- CPU: Intel(R) Core(TM) Ultra 9 185H
- GPU: Xe LPG
- Memory: 2 × 48 GB, DDR5 5600 MT/s, theoretical bandwidth ~90 GB/s
- Motherboard: ASUSTeK COMPUTER INC., NUC14RVS, 60AS0080-MB4A01
- OS: Ubuntu 22.04.5 LTS + Linux 6.8.0-60-generic
- Kernel Driver: i915
- Compute Runtime: 24.52.32224.5
| Interval | 0 ms | 1 ms | 5 ms | 10 ms | 100 ms | 500 ms |
|---|---|---|---|---|---|---|
| Read | 62.51 GB/s | 62.58 GB/s | 62.61 GB/s | 62.71 GB/s | 59.97 GB/s | 59.98 GB/s |
| Write | 73.21 GB/s | 73.19 GB/s | 73.27 GB/s | 73.02 GB/s | 70.09 GB/s | 69.86 GB/s |
| Copy (×2) | 69.65 GB/s | 69.66 GB/s | 69.50 GB/s | 69.62 GB/s | 68.28 GB/s | 68.27 GB/s |
Under this setup, the duration of the interval does not significantly affect the read and write bandwidth.
3) Lunar Lake Platform (bandwidth decreases)
- Laptop: ASUS Zenbook S14 (UX5406)
- CPU: Intel(R) Core(TM) Ultra 7 258V
- GPU: Xe2 LPG
- Memory: 8 × 4 GB, LPDDR5X 8533 MT/s, theoretical bandwidth ~130 GB/s
- Motherboard: ASUSTeK COMPUTER INC., UX5406SA, 1.0
- OS: Ubuntu 24.10 + Linux 6.15.2-061502-generic
- Kernel Driver: xe
- Compute Runtime: 25.09.32961.5
| Interval | 0 ms | 1 ms | 5 ms | 10 ms | 100 ms | 500 ms |
|---|---|---|---|---|---|---|
| Read | 89.60 GB/s | 89.18 GB/s | 87.78 GB/s | 79.25 GB/s | 71.78 GB/s | 73.11 GB/s |
| Write | 83.74 GB/s | 82.61 GB/s | 79.80 GB/s | 78.42 GB/s | 67.18 GB/s | 66.85 GB/s |
| Copy (×2) | 88.02 GB/s | 87.54 GB/s | 88.88 GB/s | 83.89 GB/s | 80.83 GB/s | 80.81 GB/s |
Under this setup, read and write bandwidth dropped from roughly 85–90 GB/s at short intervals to roughly 67–73 GB/s at 100–500 ms, and copy bandwidth decreased slightly as well.
4 Some Conjectures
- The cause is presumably a hardware architecture difference; otherwise the phenomenon would not be limited to LNL. There is very little public information on the latest microarchitecture, and I am not sure whether it is related to LNL's 8 MB memory-side cache (system-level cache, SLC). Could it be cache coherence or cache contention on the SLC?
- The issue seems strongly tied to the iGPU: when I intermittently run `memset` on the CPU instead (see the sketch after this list), it consistently sustains up to 100 GB/s of memory bandwidth on the LNL platform.
- The test code sleeps on the CPU between launches. But if a long-running, compute-bound GPU kernel is executed instead of the CPU sleep, alternating that compute kernel with the bandwidth-test kernel, the same bandwidth decrease is observed.
- I profiled the write kernel on LNL with VTune but could not see why it slows down. It does appear that XVE Thread Occupancy is higher when the bandwidth is lower.
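For the CPU-side cross-check in the second conjecture, here is a rough sketch of the kind of loop I mean (buffer size and intervals are illustrative, not the exact code I ran; a single `memset` thread may also need to be fanned out across cores to fully saturate the bus):

```cpp
// Rough sketch of the CPU cross-check: intermittently memset a 1 GB buffer
// and report the achieved fill bandwidth. Sizes/intervals are illustrative.
#include <chrono>
#include <cstdio>
#include <cstring>
#include <initializer_list>
#include <thread>
#include <vector>

int main() {
    const size_t bytes = 1ull << 30;       // 1 GB
    std::vector<char> buf(bytes);
    std::memset(buf.data(), 1, bytes);     // warm up: fault in all pages

    for (int interval_ms : {0, 10, 100, 500}) {
        std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
        auto t0 = std::chrono::steady_clock::now();
        std::memset(buf.data(), interval_ms & 0xff, bytes);
        auto t1 = std::chrono::steady_clock::now();
        double sec = std::chrono::duration<double>(t1 - t0).count();
        std::printf("interval %3d ms: memset %.2f GB/s\n",
                    interval_ms, bytes / sec / 1e9);
    }
    return 0;
}
```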