Solved: Benefits of SSE/AVX processing when an integrated GPU is missing?

Toby · ‎12-16-2014

Some Intel processors have an on-chip GPU (e.g. the Intel Core i7-4770K with its HD Graphics 4600 GPU) whilst others don't (e.g. the Intel Core i7-3930K). I'm wondering what implications this has for SSE/AVX SIMD processing when such an integrated GPU is missing from the CPU. Even though many processors without the embedded GPU support SSE/AVX, I wonder whether the benefit of using SSE/AVX is significantly reduced compared to CPUs with an embedded GPU?

I reckon one benefit of using SSE/AVX, even on CPUs lacking an embedded GPU, is an improved memory access pattern, but I also reckon this can be achieved in other ways without SSE/AVX.

Any thoughts on this?

McCalpinJohn · ‎12-16-2014

There is certainly continuing interest in using on-chip GPUs to accelerate computationally intensive workloads, but this absolutely positively has nothing to do with the processing of SSE and/or AVX instructions. (The GPU is physically too far away from the CPU cores to make off-loading of CPU instructions practical.)
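To make that point concrete, here is a minimal sketch (not taken from the thread) of what SSE code looks like from C. The helper name `add_arrays` is hypothetical. The intrinsics compile to instructions that execute in the CPU core's own SIMD units and registers, so nothing in this code touches an integrated GPU, and it behaves the same on a part without one. The `__SSE__` guard is an assumption about the compiler (GCC and Clang define it); without it the loop simply runs as scalar code.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics (128-bit XMM registers) */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per instruction on the CPU core itself. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail; this is also the whole loop when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Calling `add_arrays(a, b, out, 7)` on two 7-element arrays handles the first four elements with one SSE add and the remaining three in the scalar tail.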

GPUs have their own low-level instruction set architectures that vary significantly across generations, so they are typically programmed using a higher-level API -- the DirectX and OpenGL interfaces are commonly used. For more general-purpose programming directed at a specific GPU, the most commonly used programming model is NVIDIA's CUDA, while other GPUs are typically programmed using OpenCL. OpenCL provides portability, but it can't be accused of being elegant. Many people who are interested in GPU computing are hoping that the directive-based approaches (OpenACC and version 4 of OpenMP) will gain enough traction in the market to become mature and stable enough to justify the effort required to effectively exploit these resources.
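As a rough illustration of the directive-based style mentioned above, the following sketch uses the OpenMP 4.0 `target` construct. The function name `saxpy_offload` is hypothetical, and whether the loop actually runs on a GPU is an assumption that depends entirely on the compiler and runtime: a compiler built without offload support runs it on the host, and a compiler with OpenMP disabled ignores the pragma altogether. The result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y[i] = alpha * x[i] + y[i] for i in [0, n). */
void saxpy_offload(float alpha, const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);

    /* Ask the OpenMP runtime to run the loop on a target device if one is
     * available. The map clauses describe the data movement: x is copied
     * to the device, y is copied there and back. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

Note that the loop body is ordinary C. The offload request lives entirely in the directive, which is the appeal of this approach compared with writing separate OpenCL or CUDA kernels.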

The set of applications that can effectively use on-chip GPUs for computation is rather different than the set of applications that can effectively use discrete GPUs for acceleration. The main difference is that discrete GPUs obtain much of their performance benefit from the extremely high bandwidth of GDDR5 memory, while on-chip GPUs typically share the memory bandwidth of the host processors. It is not yet clear whether the development of high-bandwidth on-package memory (e.g., Intel's "Crystal Well" products) will change this fundamental difference. To counter their lack of extremely high bandwidth, on-chip GPUs have the advantage of higher-performance access to the processor's memory. So applications that need tighter coupling between the CPU and GPU may be better suited to systems with on-chip GPUs.

On-chip GPUs have made little headway into high-performance scientific/technical/engineering computing because they typically lack support for high-performance 64-bit floating-point arithmetic. This is a common feature on discrete GPUs intended for computing, but is much harder to justify in the more cost-sensitive realm of on-chip GPUs.

Bernard · ‎12-16-2014

I am not sure whether it is possible to offload code from the CPU cores to the integrated GPU.

Toby · ‎12-16-2014

Well, this is actually part of the question, or perhaps the real question. I have come across several research articles that talk about a GPU integrated on the same die as the CPU as a heterogeneous environment for general-purpose (i.e. non-graphical) computing. For example, this paper mentions Intel Sandy Bridge technology as a possibility for doing this. So I gathered that it should be possible, and that SSE/AVX would be the way to use the on-chip GPU.

Though here is another research article that actually specifies: "[...] and Intel Ivy Bridge [21] both provide OpenCL-programmable GPUs integrated onto the same die as the CPU."

So I guess this kind of answers my question. If a CPU-integrated on-chip GPU is to be used, it will not be used automatically by SSE/AVX but requires a separate API for accessing it (OpenCL). Can someone please verify that I have interpreted this correctly?

Toby · ‎12-16-2014

Well, OpenCL definitely seems to be the way to go for the integrated GPU, I think: https://software.intel.com/en-us/node/531243

It would be nice to get confirmation, though, if possible, that Intel ISA extensions (SSE or AVX) can't be used today to program a CPU-integrated Intel GPU.

Toby · ‎12-16-2014

Thanks, John, for taking the time. Great reply, which certainly both expands my understanding in this area and aligns well with the info I have gathered so far.