I'm launching 4 kernels on 4 different command queues. With a single thread, clEnqueueTask takes close to 250 microseconds.
When I enable multithreading, the clEnqueueTask time increases to 1250 microseconds.
Before launching the kernels, I print the status of all the events. They all print 0, which means CL_COMPLETE (so nothing is pending in the command queues). I don't understand why clEnqueueTask takes longer when I enable threading, even though there are no pending commands in the command queues.
NOTE: The processing() function, which contains the clEnqueueTask call, is always executed by a single thread (multithreading is disabled for this function).
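For reference, here is a minimal sketch of one way the host-side time around a single enqueue call could be measured with a monotonic clock. It is not the actual measurement code from processing(): the names now_us, time_call_us, timed_call_fn and dummy_enqueue are hypothetical, and dummy_enqueue is only a placeholder where a wrapper around clEnqueueTask would go.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds. */
static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical callback type: wraps whatever call is being timed. */
typedef int (*timed_call_fn)(void *ctx);

/* Runs call(ctx) once, stores the elapsed host time in *elapsed_us,
 * and returns whatever the call returned. */
static int time_call_us(timed_call_fn call, void *ctx, int64_t *elapsed_us)
{
    assert(call != NULL && elapsed_us != NULL);
    int64_t start = now_us();
    int rc = call(ctx);
    *elapsed_us = now_us() - start;
    return rc;
}

/* Placeholder standing in for a wrapper around clEnqueueTask (assumption):
 * it only counts how often it was called and returns 0 for success. */
static int dummy_enqueue(void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
    return 0;
}
```

Calling time_call_us(dummy_enqueue, &counter, &elapsed) once in the single-threaded run and once with the other threads active would give two directly comparable numbers. Note that this measures only the time the host spends inside the call, not the time the kernel spends executing on the device.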