OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU

Suggestion: CL algorithms library.

jogshy
New Contributor I
299 Views
It would be fantastic if you could develop something like CUDA's Thrust/CUDPP libraries, optimized for your OpenCL implementation. Ideally, I would like to have sort, reduction, parallel scan, matrix multiplication and FFT algorithms there.

Thanks.
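For readers unfamiliar with the primitives requested above, here is a minimal host-side C++ sketch of two of them: a tree reduction and a Hillis-Steele inclusive scan. It is only an illustration of the access patterns such a library would have to implement; the function names `tree_reduce_sum` and `inclusive_scan_hillis_steele` are hypothetical and are not part of any Intel, Thrust or CUDPP API. The inner loops run serially here, but each iteration is independent, which is what a kernel would map onto work-items.

```cpp
#include <cstddef>
#include <vector>

// Tree reduction (hypothetical helper): halves the active range on every
// pass, the way a work-group typically reduces values held in local memory.
// Iterations of the inner loop do not depend on each other.
float tree_reduce_sum(std::vector<float> data) {
    if (data.empty()) return 0.0f;
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // rounds up so odd sizes work
        for (std::size_t i = 0; i + half < n; ++i) {
            data[i] += data[i + half];
        }
        n = half;
    }
    return data[0];
}

// Hillis-Steele inclusive scan (hypothetical helper): log2(n) passes, each
// adding the element `stride` positions to the left. Double buffering keeps
// a pass from reading values it has already overwritten.
std::vector<int> inclusive_scan_hillis_steele(const std::vector<int>& in) {
    std::vector<int> cur = in;
    std::vector<int> next(in.size());
    for (std::size_t stride = 1; stride < cur.size(); stride *= 2) {
        for (std::size_t i = 0; i < cur.size(); ++i) {
            next[i] = (i >= stride) ? cur[i] + cur[i - stride] : cur[i];
        }
        cur.swap(next);
    }
    return cur;
}
```

A tuned library version would differ considerably (vector widths, local memory layout, work-group sizes), which is exactly the hardware-specific knowledge the request hopes the vendor would supply.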
6 Replies
ARNON_P_Intel
Employee
Hi,
I cannot comment on our future plans in this forum.
Yet, I would like to understand your vision more.
Do you expect these functions/libraries to be available:
1. From your host C/C++ code
2. As an OpenCL kernel to be enqueued
3. As a function inside the kernel code
You may also want to look into the new compile and link options in the OpenCL specification, version 1.2.
Regards
- Arnon
jogshy
New Contributor I
Ideally, I would like this from the C++ host code, but also as a function inside the kernel code, yep.
rtfss1gmail_com
Beginner
Hi,
Just a long-term idea. I don't expect this to be implemented this year, but with the ever-increasing power of Intel IGPs, scientific math libraries would be good.
Nvidia has BLAS and FFT libraries for CUDA, and AMD has BLAS and FFT for OpenCL. It would be good if you could build some BLAS and FFT libraries optimized for your GPUs and expose them in OpenCL as host code functions that take device buffers as I/O and disallow host transfers (see the new CUBLAS in CUDA 4.0, where even scalar parameters like alpha and beta in BLAS functions can be taken from the device to avoid all host-device transfers, which can take even more time than the function itself).
ARNON_P_Intel
Employee
Ok,
Great inputs.
So far I see 2 usages:
1. (jogshy) I just want to accelerate my C/C++ code.
2. (rtfss1) I want BLAS/FFT that interacts with my OpenCL code on the target device.
In both cases, you expect that BLAS/FFT (the MKL/IPP libraries) from Intel will provide the best performance for your usage on Intel Core processors with HD Graphics.
In both cases I assume that using the HD Graphics is not a must-have if the most optimized specific algorithms run better on the processor itself within the boundaries of your workloads. Right?
Arnon
jogshy
New Contributor I
>> In both cases I assume that using the HD Graphics is not a must-have if the most optimized specific algorithms run better on the processor itself within the boundaries of your workloads. Right?

Yep.

Anyway, I would prefer to rely on Intel's optimized sort/reductions instead of implementing my own, which is tedious, and I don't know your HW/implementation well enough to optimize them properly.
rtfss1gmail_com
Beginner
Well, not exactly.
Today the Intel IGP has a peak of 256 GFLOPS in SP, which is a little more than the quad-core CPU it accompanies.
Assuming we get a 2x-3x faster IGP next year, Intel IGPs could perhaps run a single-precision matrix multiplication (the BLAS3 sgemm routine) 2-3x faster than the CPU. An optimized BLAS library running on the GPU could then outperform even Intel's optimized MKL libraries by that factor, and the difference will only become more pronounced, as GPU GFLOPS seem to evolve faster than CPU GFLOPS.

As I said, this is a long-term idea, but such a library could take a good while to implement, seeing that the AMD and CUDA BLAS implementations needed some years to achieve full BLAS2 and BLAS3 compliance.
Thanks