Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

OpenCL & TBB?

robert_jay_gould
Beginner
1,277 Views
Not directly TBB only, but curious if Intel has anything to say about OpenCL (such as implementing it over TBB).


6 Replies
pvonkaenel
New Contributor III
If I understand correctly, OpenCL is an open version of CUDA, correct? While it would be nice to have a simple way to program a CUDA-type device, I have a feeling it would be very sub-optimal. Based on my experience with CUDA, you have to be very careful about how you structure your grid of blocks and block of threads, how you move data between host and device memory, and what you keep in the limited shared memory - not to mention the tricks you must employ to guarantee coalesced memory access for optimal performance. There is a lot of hand tuning involved.

Peter
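
For readers unfamiliar with the terms in the post above, here is a minimal CPU-side sketch of the index arithmetic behind a grid of blocks and behind coalesced versus non-coalesced access. It is plain C++, not CUDA, and the function names are hypothetical. It only shows which element each thread would touch; the exact coalescing rules depend on the GPU generation, so treat "adjacent threads touch adjacent elements" as a rule of thumb rather than a guarantee.

```cpp
#include <cstddef>

// Flat index of a thread in a 1-D launch, mirroring the common CUDA idiom
// blockIdx.x * blockDim.x + threadIdx.x. (Hypothetical helper name.)
constexpr std::size_t global_index(std::size_t block_idx,
                                   std::size_t block_dim,
                                   std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Interleaved layout: on iteration `step`, thread `tid` touches element
// step * total_threads + tid, so neighbouring threads touch neighbouring
// elements. This is the pattern that tends to coalesce.
constexpr std::size_t interleaved_element(std::size_t tid,
                                          std::size_t step,
                                          std::size_t total_threads) {
    return step * total_threads + tid;
}

// Chunked layout: thread `tid` owns a contiguous chunk of `chunk` elements,
// so on any given step neighbouring threads are `chunk` elements apart.
// This is the pattern that tends not to coalesce.
constexpr std::size_t chunked_element(std::size_t tid,
                                      std::size_t step,
                                      std::size_t chunk) {
    return tid * chunk + step;
}
```

On the same step, two adjacent threads are one element apart in the interleaved layout and `chunk` elements apart in the chunked layout, which is the kind of detail that has to be tuned by hand.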
Bartlomiej
New Contributor I
Quoting - pvonkaenel
If I understand correctly, OpenCL is an open version of CUDA, correct? While it would be nice to have a simple way to program a CUDA-type device, I have a feeling it would be very sub-optimal.
Not really, OpenCL is a different API from CUDA, just also designed (mostly) for GPGPU.
And AFAIK it's a "higher-level API" than CUDA itself.
The specification is free, but there might be problems with open implementations - they were lacking not so long ago, and probably still are... :-(

robert_jay_gould
Beginner
Quoting - pvonkaenel

If I understand correctly, OpenCL is an open version of CUDA, correct? While it would be nice to have a simple way to program a CUDA-type device, I have a feeling it would be very sub-optimal. Based on my experience with CUDA, you have to be very careful about how you structure your grid of blocks and block of threads, how you move data between host and device memory, and what you keep in the limited shared memory - not to mention the tricks you must employ to guarantee coalesced memory access for optimal performance. There is a lot of hand tuning involved.

Peter

Yes, I agree CUDA is tricky, but OpenCL is a level higher than that, and the optimizations should get done under the covers, to some degree. Also, an OpenCL "program/kernel" could run on a CPU if there is no GPU, and it would be even more interesting as the CPU-GPU gap closes, like with Larrabee. Besides, unlike CUDA it's cross-platform, so it should work on an NVIDIA, ATI, or Larrabee(?) card, with possible performance differences, but fine as long as they support the same extensions, à la OpenGL.

uj
Beginner
Regarding the position of OpenCL.

OpenCL is not a standard just for GPU computing. It's an "open standard for parallel programming of heterogeneous systems". It was proposed by Apple and submitted to the Khronos Group (which also controls OpenGL) for approval.

So OpenCL is more than just GPU computing, and formally it has nothing to do with CUDA. But of course NVIDIA is basing their OpenCL implementation on CUDA (they have a driver in beta right now). And it is hard to believe OpenCL wasn't influenced by CUDA. After all, CUDA was first and served as a proof of concept.
pvonkaenel
New Contributor III

Thanks for all the useful information on OpenCL. I'd read about the poor performance of the NVIDIA OpenCL drivers in comparison to CUDA, so I decided to ignore it for now. Based on these comments, I think I'll follow the discussions on it more closely.

Peter
uj
Beginner
It would be a big boost for Intel and TBB if one of the major OpenCL implementers used TBB in their implementation of this standard.

In a way, TBB is in the same position as CUDA, isn't it? It could constitute an integral part of an OpenCL implementation.
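
To make the idea of layering an OpenCL-style execution model over TBB a little more concrete, here is a minimal sketch. It is not how any actual OpenCL implementation works. It only shows a one-dimensional range of work-items, split into work-groups, with the groups handed to `tbb::parallel_for`. The `WorkItem` struct and the `run_ndrange_1d` and `vector_add` names are hypothetical, and barriers and local memory are not modeled. TBB is used only when `USE_TBB` is defined, so the sketch still builds (serially) without the library.

```cpp
#include <cstddef>
#include <vector>
#ifdef USE_TBB
#include <tbb/parallel_for.h>
#endif

// Hypothetical work-item context, loosely modeled on what OpenCL exposes via
// get_global_id, get_group_id and get_local_id.
struct WorkItem {
    std::size_t global_id;
    std::size_t group_id;
    std::size_t local_id;
};

// Runs `kernel` once per work-item in a 1-D range. Work-groups are spread
// across threads by TBB when USE_TBB is defined; otherwise they run serially.
template <typename Kernel>
void run_ndrange_1d(std::size_t global_size, std::size_t local_size,
                    Kernel kernel) {
    const std::size_t num_groups = (global_size + local_size - 1) / local_size;

    // One call executes every work-item of a single work-group in order.
    auto run_group = [&](std::size_t group) {
        for (std::size_t local = 0; local < local_size; ++local) {
            const std::size_t global = group * local_size + local;
            if (global < global_size) {  // guard the partial last group
                kernel(WorkItem{global, group, local});
            }
        }
    };

#ifdef USE_TBB
    tbb::parallel_for(std::size_t{0}, num_groups, run_group);
#else
    for (std::size_t group = 0; group < num_groups; ++group) run_group(group);
#endif
}

// Example "kernel": element-wise vector addition, one work-item per element.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> out(a.size());
    run_ndrange_1d(a.size(), 64, [&](const WorkItem& item) {
        out[item.global_id] = a[item.global_id] + b[item.global_id];
    });
    return out;
}
```

Calling `vector_add` on two equal-length vectors returns their element-wise sum. Building with `-DUSE_TBB` and linking against TBB switches the group loop to `tbb::parallel_for`, which is safe here because each work-item writes a distinct output element.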