@H-Takeda, could you run the "clinfo" command and check what it reports for "Max compute units"? "clinfo" comes with the OpenCL driver.
For example, my clinfo output is below.
  Platform Name                 Intel(R) OpenCL HD Graphics
  Number of devices             1
  Device Name                   Intel(R) Graphics [0x3e92]
  Device Vendor                 Intel(R) Corporation
  Device Vendor ID              0x8086
  Device Version                OpenCL 3.0 NEO
  Driver Version                21.01.18793
  Device OpenCL C Version       OpenCL C 3.0
  Device Type                   GPU
  Device Profile                FULL_PROFILE
  Device Available              Yes
  Compiler Available            Yes
  Linker Available              Yes
  Max compute units             24
  Max clock frequency           1200MHz
I have attached the results of running clinfo as a text file.
Here is an excerpt of the important part.
  Platform Name                 Intel(R) OpenCL HD Graphics
  Number of devices             1
  Device Name                   Intel(R) Gen9 HD Graphics NEO
  Device Vendor                 Intel(R) Corporation
  Device Vendor ID              0x8086
  Device Version                OpenCL 2.1 NEO
  Driver Version                19.41.14441
  Device OpenCL C Version       OpenCL C 2.0
  Device Type                   GPU
  Device Profile                FULL_PROFILE
  Device Available              Yes
  Compiler Available            Yes
  Linker Available              Yes
  Max compute units             47
  Max clock frequency           1050MHz
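For anyone who wants to pull this value out of a saved clinfo text file instead of reading it by eye, here is a minimal Python sketch. It assumes the output contains the label "Max compute units" followed by whitespace and an integer, as in the excerpts in this thread; other clinfo versions may format it differently. The names `SAMPLE_CLINFO` and `max_compute_units` are made up for illustration.

```python
import re

# Abbreviated sample based on the excerpt posted in this thread.
SAMPLE_CLINFO = """\
  Platform Name                 Intel(R) OpenCL HD Graphics
  Number of devices             1
  Device Name                   Intel(R) Gen9 HD Graphics NEO
  Max compute units             47
  Max clock frequency           1050MHz
"""


def max_compute_units(clinfo_text):
    """Return every 'Max compute units' value found in the text, in order.

    Assumption: the label is followed by whitespace and an integer.
    A system with several devices would yield several values.
    """
    return [int(n) for n in re.findall(r"Max compute units\s+(\d+)", clinfo_text)]


print(max_compute_units(SAMPLE_CLINFO))  # [47]
```

On a live system the same function could probably be fed the output of `subprocess.run(["clinfo"], capture_output=True, text=True).stdout`, assuming clinfo is installed and on the PATH.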
@H-Takeda, the EU count that's reported via clinfo under OpenCL is correct. The value reported comes directly from the i915 kernel mode driver on Linux. This kernel mode driver queries fuse registers to determine which EUs are valid and functional as expected.
In some SKUs, there can be an additional EU -- sometimes two or more. Product SKUs are defined as having a minimum number of EUs but can be manufactured from a larger die. If the die has too many defects to meet the minimum requirements, it is not shipped to customers. However, in some cases, the die has no defects and can yield a 'bonus' EU or two.
This is similar to how some CPUs can be overclocked to a higher frequency than others. The minimum clock frequency reported is a product requirement, but to avoid possible errors and unhappy customers, the "true" minimum frequency is usually higher than the minimum on the box. The same thing happens with EU counts. In this case, it results in a 'bonus' EU for those lucky enough to receive one.
I hope this helps clarify the situation!
GPU Software Architect
@JenniferJ, I tried to look up the official specs for Iris Plus Graphics 655, and the ones that were visible didn't mention a specific EU count, so I couldn't confirm the claim that a 655 is supposed to have no fewer than 48 EUs. EUs are fused off during chip test and sorting in the manufacturing process. If an EU is bad, it is fused off, and the die is assigned to a specific SKU, such as the 655, based on what is functioning correctly before it reaches the customer. It's not uncommon to have many 'downbins' from a single die based on expected manufacturing defects. As the manufacturing process improves over time, there are fewer and fewer downbins for a given master die.
In this case, the real question becomes what the Intel product specifications for the 655 are. I suspect a 655 can have either 47 EUs or, if you're lucky, 48. I suspect this isn't a defective part. @H-Takeda just wasn't one of the lucky ones -- no different than some CPUs from the same die being capable of overclocking to 4.5GHz or 5GHz. The same thing happens with GPU frequency. In some cases, that same 48 EU part can get sorted into a higher-class SKU, especially if it's also stable at a higher GPU frequency. i7s are close to perfection, i5s less so, and i3s less still. A specific SKU is guaranteed to meet its minimum requirements. I hope this helps!
GPU Software Architect
Thank you @Brandon_F_Intel
So anyone could end up buying an Iris Plus Graphics 655 that has only 47 EUs...
If we believe your research, Intel's product specs don't guarantee that it will have 48 EUs.
This is troubling to me...
Does Intel post this anywhere publicly?
Thank you, @H-Takeda, for the prompt response. The specifications for the Intel(r) Core(tm) i3-8109U are publicly located at intel-core-i3-8109u-processor-4m-cache-up-to-3-60-ghz. Product binning in silicon manufacturing (https://en.wikipedia.org/wiki/Product_binning) is a standard technique, used by virtually all semiconductor manufacturers, for offering multiple SKUs at several different price-performance points so that as many useful products as possible reach customers. The product specifications define a minimum bar of functional and quality requirements. As Intel(r) prides itself on quality products, just barely clearing the minimum bar is not an option, since it could leave customers unsatisfied. In some cases, people receive extra features at no additional cost.
If you are seeing different product specifications for your processor, please let us know so we can correct any gaps in expectation. We're extremely pleased with Iris Plus Graphics 655's capabilities and want you to feel the same way.
GPU SW Architect
Another comment regarding 'Max compute units' as reported by clinfo and the OpenCL capabilities. It is often taken to be the number of EUs in Intel(r) GPUs. The definition of a 'compute unit' can and will change over time, and it shouldn't be equated precisely with an execution unit (EU). A 'compute unit' as reported within OpenCL means different things for different vendors and device types, including GPUs, CPUs, and FPGAs. At best it can only be read as a coarse relative indicator of compute performance across SKUs from a single vendor within a certain family. A new family could introduce a wider or narrower 'compute unit' with different instructions and capabilities. I hope this helps and look forward to your benchmarking results!
GPU SW Architect
Thank you for your kind reply.
For now, I'm satisfied.
As an extra question, what kind of impact do you think missing one EU would have on performance?
(especially the impact on benchmarks).
If you have any predictions, please let me know.
The performance impact of a single EU (out of 48) is about 2% in a purely compute-limited workload. If the workload is memory-bandwidth limited, it won't have any performance impact. If the workload is limited by complex math such as transcendentals, it also won't have any impact. If you're trying to construct a benchmark, I would encourage you to focus on a specific aspect or feature in isolation, or on multiple aspects each in isolation. Once you can do that comfortably, you can tie it to a real-world scenario. That's how we benchmark our products internally. If you encounter puzzling results, please let us know here and we can help you understand what's happening. Happy benchmarking!
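To make the "about 2%" figure concrete, here is a rough back-of-the-envelope sketch in Python. It rests on a deliberately simple assumption that is not an Intel model: runtime is bounded by whichever of compute time or memory time is larger, and only compute time scales with the EU count. The function names and the example timings are hypothetical.

```python
def throughput_loss(eus, baseline_eus=48):
    """Fraction of peak compute throughput lost relative to the baseline EU count."""
    return 1.0 - eus / baseline_eus


def runtime_increase(eus, baseline_eus, compute_time, memory_time):
    """Relative runtime increase under a simple 'slowest resource wins' model.

    Assumption: runtime = max(compute_time, memory_time), and compute_time
    scales inversely with the number of EUs while memory_time is unchanged.
    """
    baseline = max(compute_time, memory_time)
    scaled = max(compute_time * baseline_eus / eus, memory_time)
    return scaled / baseline - 1.0


# One EU missing out of 48: about 2% less peak compute throughput.
print(f"{throughput_loss(47):.2%}")                 # 2.08%
# Compute-limited example (compute 1.0, memory 0.5, arbitrary units).
print(f"{runtime_increase(47, 48, 1.0, 0.5):.2%}")  # 2.13%
# Memory-limited example (compute 0.5, memory 1.0): no change.
print(f"{runtime_increase(47, 48, 0.5, 1.0):.2%}")  # 0.00%
```

Real workloads usually fall somewhere between these two extremes, so the measured difference in a benchmark would be expected to land between 0% and roughly 2%.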
GPU Software Architect