I'm not sure I fully understood your questions - please don't hesitate to ask again if I've missed the point.
To use the extension, create another command queue and pass the property CL_QUEUE_IMMEDIATE_EXECUTION_ENABLE_INTEL, which is defined in the cl_ext.h header file bundled with the SDK. Commands enqueued to that command queue will execute in the direct manner described.
To enqueue to this queue from multiple threads, I recommend you first combine the mode above with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE when you create the queue. Otherwise the threads will block each other, because OpenCL in-order queue semantics require each command to wait for the previously enqueued one. With that in place, the cl_command_queue handle can be shared freely between threads.
Thanks,
Doron