We have released a new whitepaper explaining the architecture of Intel® Processor Graphics Gen7.5, specifically the components involved in running compute applications on Processor Graphics. This is the architecture that supports compute APIs such as OpenCL, DirectX Compute Shader, RenderScript, and C++ AMP.
- You can find it, 3rd link down, on this page: https://software.intel.com/en-us/articles/intel-graphics-developers-guides
- Or more directly, here: https://software.intel.com/en-us/file/compute-architecture-of-intel-processor-graphics-gen7dot5-aug4...
We'd be grateful to hear your feedback on the whitepaper contents. Tell us what you think!
I'd be a lot more interested in Intel's OpenCL if:
1) The GPU supported fp64, ideally at a 1:2 performance ratio relative to fp32.
2) The CPU supported a vector fp64 rsqrt.
3) The native CPU and GPU fp64 rsqrt had OpenCL-level numerical precision.
It would be good if there were a document comparing the native double-precision accuracy (in ulp) of the different hardware (Nvidia, AMD, and Intel).
My perception is that AMD and Nvidia have higher native precision, or at least that they are constantly striving to improve it. E.g., the 290X has "precision improvements to the native LOG and EXP operations" and so on.
My impression is that Intel expects you to buy a software library that has to do many more iterations to achieve the same result.
Maybe this is just a case of insufficient documentation?
Your Gen8 presentation and whitepaper are excellent.
The PDFs were posted here: https://intel.activeevents.com/sf14/connect/sessionDetail.ww?SESSION_ID=1312
The Gen8 IGP looks like it has become a truly general-purpose compute platform.
One question: are FP16 FMA operations available? If so, can each EU perform 16 FP16 FMAs per clock?
Architecture, yes; driver not yet.
Intel processor graphics architecture supports 16-bit float FMAs as of Gen8.
Taking OpenCL 1.2 and 2.0 as the API example, the OpenCL C "half" data type for 16-bit floats is currently an optional feature, enabled by the Khronos cl_khr_fp16 extension.
Perhaps as feedback for feature prioritization decisions, could you possibly say what OS platform, which compute API, and what kind of device (tablet, laptop, server…) you seek to target with FP16?
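Since cl_khr_fp16 is optional, an application would normally check for it before using "half" arithmetic. OpenCL reports device extensions as a space-separated string (queried with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`), so a minimal sketch of the check is a whole-token search. The helper name `has_extension` is hypothetical, and the code deliberately avoids the OpenCL headers so that it stands alone:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
   `extensions`, e.g. the string reported for CL_DEVICE_EXTENSIONS.
   A plain strstr() would also match longer names that merely contain
   `name`, so both token boundaries are checked. */
static bool has_extension(const char *extensions, const char *name) {
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

If `has_extension(ext_string, "cl_khr_fp16")` returns true, a kernel can then turn the feature on with `#pragma OPENCL EXTENSION cl_khr_fp16 : enable`. Otherwise the application would need a float fallback path.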
Thanks for the clarification.
I'm looking to use FP16 FMAs in an OpenCL kernel on both Windows and Linux (and someday OS X).
As far as devices, anything with a display -- tablet/laptop/workstation.