H-Takeda
New Contributor I
678 Views

The EU count of Iris Plus Graphics 655 is wrong.

Hi, I have an issue that I asked about on another forum without getting a solution.
The problem is that the number of GPU execution units (EUs) recognized by the app is lower than the number listed in the catalog.

Please read the same question in the link for details.
I was told that I should ask the question above and get advice from the Intel® Developer Zone team.

Here's the question in a nutshell.
I'm using NUC8i3BEH.
The SoC of NUC8i3BEH is i3-8109U.
The internal GPU of the i3-8109U is "Iris Plus Graphics 655".
Many Intel documents list the "Iris Plus Graphics 655" as having 48 EUs.

However, OpenCL and C for Metal apps report one fewer EU.
This is not limited to the apps: the DRI debug log on Linux also shows one fewer EU.
It may be a bug in the Intel GPU driver for Linux.

My goal is to accurately benchmark an application written in C for Metal.
Therefore, I would like to run the benchmark in an environment where not a single EU is missing.
However, currently I have no way to make sure that the application recognizes all EUs correctly.
How can I find out how many EUs are actually recognized by the OS or application?

 

13 Replies
H-Takeda
New Contributor I
644 Views

Is anyone else seeing this question...?

ArunJ_Intel
Moderator
635 Views

Hi


We are checking on this internally. Will get back to you soon with an update.


Thanks

Arun


JenniferJ
Moderator
540 Views

@H-Takeda could you run the "clinfo" command and see what "Max compute units" reports? "clinfo" comes with the OpenCL driver.

For example, here is my clinfo output.

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Graphics [0x3e92]
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 3.0 NEO
  Driver Version                                  21.01.18793
  Device OpenCL C Version                         OpenCL C 3.0
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1200MHz

 

H-Takeda
New Contributor I
533 Views

@JenniferJ OK.
I have attached the results of running clinfo as a text file.
Here is an excerpt of the important part.

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  19.41.14441
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               47
  Max clock frequency                             1050MHz
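
For anyone who wants to check this value from a script rather than by eye, here is a minimal sketch that pulls "Max compute units" out of clinfo's text output. It assumes the human-readable layout shown in the excerpts in this thread; the function names and the `SAMPLE` string are illustrative only, and other clinfo versions may format their output differently.

```python
import re
import subprocess

# Excerpt in the same layout as the clinfo output posted in this thread.
SAMPLE = """\
  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Type                                     GPU
  Max compute units                               47
  Max clock frequency                             1050MHz
"""


def max_compute_units(clinfo_text):
    """Return a list of (device_name, max_compute_units) pairs.

    Assumes each device section prints a "Device Name" line before its
    "Max compute units" line, as in the excerpts above.
    """
    results = []
    name = None
    for line in clinfo_text.splitlines():
        m = re.match(r"\s*Device Name\s{2,}(.+?)\s*$", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"\s*Max compute units\s+(\d+)\s*$", line)
        if m:
            results.append((name, int(m.group(1))))
    return results


def read_clinfo():
    """Run the clinfo binary and return its text output (needs clinfo on PATH)."""
    return subprocess.run(
        ["clinfo"], capture_output=True, text=True, check=True
    ).stdout


print(max_compute_units(SAMPLE))
```

On a machine with the OpenCL driver installed, `max_compute_units(read_clinfo())` would report the live value instead of the sample.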

 

JenniferJ
Moderator
521 Views

@H-Takeda From your clinfo output, your driver version "19.41.14441" is really old. Do you have to use this version? Can you try the latest driver to see if it works?

Brandon_F_Intel
Employee
474 Views

@H-Takeda, the EU count that's reported via clinfo under OpenCL is correct.  The value reported comes directly from the i915 kernel mode driver on Linux.  This kernel mode driver queries fuse registers to determine which EUs are valid and functional as expected.
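
As a rough way to inspect the kernel-side count directly, the i915 driver has exposed a debugfs file with slice, subslice, and EU totals. The sketch below parses an "Available EU Total" line from that text. The path in `SSEU_PATH` is an assumption: it varies by kernel version and DRM card index, and reading it normally requires root with debugfs mounted. The `SAMPLE_SSEU` text is a hand-written illustration, not captured from real hardware.

```python
import re

# Assumption: typical location on older kernels; newer kernels may place
# the file elsewhere under the same debugfs tree.
SSEU_PATH = "/sys/kernel/debug/dri/0/i915_sseu_status"

# Hand-written illustration of the relevant lines (hypothetical values).
SAMPLE_SSEU = """\
SSEU Device Info
  Available Slice Total: 2
  Available Subslice Total: 6
  Available EU Total: 47
"""


def available_eu_total(sseu_text):
    """Return the integer after 'Available EU Total:', or None if absent."""
    m = re.search(r"^\s*Available EU Total:\s*(\d+)\s*$", sseu_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_sseu_status(path=SSEU_PATH):
    """Read the debugfs file (usually needs root and a mounted debugfs)."""
    with open(path, "r") as f:
        return f.read()


print(available_eu_total(SAMPLE_SSEU))
```

If this number matches what clinfo reports, the OpenCL runtime is passing through what the kernel driver detected rather than miscounting on its own.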

In some SKUs, there can be an additional EU -- sometimes two or more.  Product SKUs are defined as having a minimum number of EUs but can be manufactured from a larger die.  If the die has too many defects to meet the minimum requirements, it is not shipped to customers.  However, in some cases, the die has no defects and can yield a 'bonus' EU or two.  

This is similar to how some CPUs can be overclocked to a higher frequency than others.  However, the minimum clock frequency reported is a product requirement.  To avoid possible errors and unhappy customers the "true" minimum frequency is usually higher than the minimum on the box.  This happens with EU counts as well.  In this case, it results in a 'bonus' EU for those lucky enough to receive them.

I hope this helps clarify the situation!

Regards,
Brandon Fliflet
GPU Software Architect

JenniferJ
Moderator
466 Views

@Brandon_F_Intel In this case the EU count is one less than expected. It should be 48, but clinfo reports 47. Does that mean one EU is bad? Thanks!

Brandon_F_Intel
Employee
454 Views

@JenniferJ, I tried to look up the official specs for Iris Plus Graphics 655, and the ones that were visible didn't mention a specific EU count, so I couldn't confirm the claim that a 655 is supposed to have no fewer than 48 EUs.  A bad EU is fused off during chip test and sorting in the manufacturing process, and the die is then assigned to a specific SKU, such as 655 or others, based on what is functioning correctly before it reaches the customer.  It's not uncommon to have many 'downbins' from a single die based on expected manufacturing defects.  As the manufacturing process improves over time, there are fewer and fewer downbins for a certain master die.

In this case, the real question becomes what the Intel product specifications for 655 are.  I suspect 655 can have either 47 or, if you're lucky, 48.  I suspect this isn't a defective part.  @H-Takeda just wasn't one of the lucky ones -- no different than some CPUs from the same die being capable of overclocking to 4.5 GHz or 5 GHz.  The same happens with GPU frequency.  In some cases, that same 48 EU part can get sorted into a higher class SKU, especially if it's also stable at a higher GPU frequency.  i7s are close to perfection, i5s less, and even less with i3s.  A specific SKU is guaranteed to meet its minimum requirements.  I hope this helps!

Regards,
Brandon Fliflet
GPU Software Architect

H-Takeda
New Contributor I
436 Views

Thank you @Brandon_F_Intel 

Oh...
So anyone could end up buying an Iris Plus Graphics 655 with only 47 EUs...
If we believe your research, Intel's product specs don't guarantee that it will have 48 EUs.
This is troubling to me...
Does Intel post this anywhere publicly?

Brandon_F_Intel
Employee
420 Views

Thank you, @H-Takeda, for the prompt response.  The specifications for Intel(r) Core(tm) i3-8109U are publicly located at intel-core-i3-8109u-processor-4m-cache-up-to-3-60-ghz.  Product binning, https://en.wikipedia.org/wiki/Product_binning, is a regular technique used by virtually all semiconductor manufacturers to offer multiple SKUs at several different price-performance points.  The product specifications define a minimum bar of functional and quality requirements.  As Intel(r) prides itself on quality products, just clearing the minimum bar is not an option, as it can lead to unsatisfied customers.  In some cases, people receive extra features at no additional cost.

If you are seeing different product specifications for your processor, please let us know so we can correct any gaps in expectation. We're extremely pleased with Iris Plus Graphics 655's capabilities and want you to feel the same way.

Kind regards,
Brandon Fliflet
GPU SW Architect

Brandon_F_Intel
Employee
415 Views

Another comment regarding 'Max compute units' as reported by clinfo and OpenCL capabilities.  It is often interpreted as the number of EUs in Intel(r) GPUs.  The definition of a 'compute unit' can and will change over time and shouldn't be equated so precisely with an execution unit (EU).  A 'compute unit' as reported within OpenCL means something different to different vendors and device types -- GPUs, CPUs, FPGAs, etc.  At best it can be read as a coarse relative indicator of compute performance across SKUs from a single vendor within a certain family.  A new family could introduce a wider or narrower 'compute unit' with different instructions and capabilities.  I hope this helps and look forward to your benchmarking results!

Regards,
Brandon Fliflet
GPU SW Architect

H-Takeda
New Contributor I
393 Views

Thank you for your kind reply.
For now, I'm satisfied.
As an extra question, what kind of impact do you think missing one EU would have on performance?
(especially the impact on benchmarks).
If you have any predictions, please let me know.

Brandon_F_Intel
Employee
389 Views

Hi @H-Takeda,

The performance impact of a single EU (out of 48) is about 2% in a purely compute-limited workload.  If the workload is memory-bandwidth limited, it won't have any performance impact.  If the workload is limited by complex math such as transcendentals, it also won't have any impact.  If you're trying to construct a benchmark, I would encourage you to focus on a specific aspect or feature in isolation, or on multiple aspects each in isolation.  Once you can do that comfortably, you can tie it to a real-world scenario.  That's how we benchmark our products internally.  If you encounter puzzling results, please let us know here and we can help understand what's happening.  Happy benchmarking!
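
The "about 2%" figure is simply 1/48. A small sketch of that arithmetic, assuming throughput scales linearly with EU count in the purely compute-limited case (real workloads will usually see less); the function names are illustrative only:

```python
def compute_bound_loss(expected_eus, actual_eus):
    """Fraction of peak throughput lost, assuming linear scaling with EU count."""
    return (expected_eus - actual_eus) / expected_eus


def runtime_increase(expected_eus, actual_eus):
    """Relative increase in run time for the same work under the same assumption."""
    return expected_eus / actual_eus - 1


loss = compute_bound_loss(48, 47)
extra_time = runtime_increase(48, 47)
print(f"throughput loss: {loss:.2%}")        # throughput loss: 2.08%
print(f"run time increase: {extra_time:.2%}")  # run time increase: 2.13%
```

These are upper bounds for the compute-limited case; for memory-bandwidth-limited kernels the expected difference is zero, as noted above.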

Regards,
Brandon Fliflet
GPU Software Architect
