Direct kernel execution feature request!

janez-makovsek · ‎07-02-2011

Hi!

Please allow kernels to be called directly through a function pointer, without going through clEnqueueNDRangeKernel. This would allow the following features:

1.) Zero overhead for thread start/stop, because the kernel runs on the current app thread. This makes it possible to accelerate "short" kernels which may need to run on one thread.
2.) Allows custom threading to be implemented by the caller.
3.) Allows Intel to inject the latest high-performance instructions into any language via the OpenCL interface while keeping the DLL-style calling approach. People write performance-sensitive code once in OpenCL, and for the next generation of CPUs Intel simply releases a driver update.
4.) Makes the kernels debuggable with the full range of Intel debugging tools.
5.) Provides a method to properly debug any OpenCL code.

Thanks!
Atmapuri

Yariv_A_Intel2 · ‎07-02-2011

Hello,

We share most of your requirements above and plan an OpenCL EXT extension to execute NDRange and task commands in a single-threaded fashion, on the host thread that issues the corresponding clEnqueue commands. This extension is planned for our next major release.

Thanks, --Yariv

Doron_S_Intel · ‎09-26-2011

Hello,

Please try out the "immediate command execution" extension included in our latest release, and let us know whether the feature is useful to you and whether it answers your needs.

Hoping to hear from you,
Doron

janez-makovsek · ‎10-03-2011

Hello,

Thank you, Sir. This looks very promising. If I understand correctly, the extension needs to be specified in the OpenCL source?

Does calling the same kernel from multiple threads require a code rebuild?

Thanks!
Atmapuri

Doron_S_Intel · ‎10-04-2011

I'm not sure I fully understood your questions - please don't hesitate to ask again if I've missed the point.

To use the extension, create another command queue and pass the property CL_QUEUE_IMMEDIATE_EXECUTION_ENABLE_INTEL, which is defined in the cl_ext.h header file that comes bundled with the SDK. Commands enqueued to that command queue will execute in the direct manner described.

To enqueue to this queue from multiple threads, I recommend first of all that you combine the mode above with CL_QUEUE_OUT_OF_ORDER_EXEC_ENABLE when you create the queue, so that threads don't block each other (with an in-order queue, OpenCL semantics require each command to wait for the one enqueued before it). That cl_command_queue handle can then be shared freely between threads.

Thanks,
Doron