Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Single work item Task parallel OpenCL kernel

Altera_Forum
Honored Contributor II

Hi, does the following code launch the kernels so that they run as in the attached image, or do they run one after another?

 

command = clCreateCommandQueue(context, device, 0, &status);
clEnqueueTask(command, task1, 0, NULL, &event1);
clEnqueueTask(command, task2, 0, NULL, &event1);
clEnqueueTask(command, task3, 0, NULL, &event1);

https://alteraforum.com/forum/attachment.php?attachmentid=14715&stc=1
Altera_Forum
Honored Contributor II

If the tasks do not depend on each other such as using channels or pipes, then they should all run immediately one after the other (appearing to launch almost at the same time) once event1 has been completed. Enqueuing only places each task in the command queue; the tasks are then executed sequentially in FIFO order without blocking the host. If task2 depends on task1, the dependency needs to be managed through events, channels, or barriers.

 

So ideally it would look like this, having them overlap each other right from the start: 

| Task_3 | Task_2 | Task_1 |------------------------
Altera_Forum
Honored Contributor II

 

--- Quote Start ---
If the tasks do not depend on each other such as using channels or pipes, then they should all run immediately one after the other (appearing to launch almost at the same time) once event1 has been completed.
--- Quote End ---

 

 

Not really; the last parameter of clEnqueueTask() is "an event object that identifies this particular kernel execution instance" (https://www.khronos.org/registry/opencl/sdk/1.2/docs/man/xhtml/clenqueuetask.html). 

So I'm not sure what will happen in the code above. It will probably work, but because all three calls write to the same event1, you have no control over what is in it.
Altera_Forum
Honored Contributor II

Ah, I mistook the last parameter for an event wait list.

Altera_Forum
Honored Contributor II

Ah, I posted the wrong code. Suppose instead:

 

clEnqueueTask(command, task1, 0, NULL, &eventtask1); 

clEnqueueTask(command, task2, 0, NULL, &eventtask2); 

clEnqueueTask(command, task3, 0, NULL, &eventtask3); 

 

Say Task 1 loads 1 million floats from global memory and passes them to Task 2 through a channel.

Task 2 then performs a summation and passes the result to Task 3 through a channel.

Task 3 then writes back to global memory.
Altera_Forum
Honored Contributor II

 


 

 

If you are using channels between the kernels, you do not need events, and probably should not use them, since the channels have synchronization built-in, and will stall the reader kernel when the channel is empty.
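
To make the stalling behaviour concrete, here is a minimal software model in plain C. It is not OpenCL and not how the FPGA runtime is implemented; it only assumes that all three tasks are active at once, that each channel is a small FIFO (DEPTH is a made-up value), and that Task 2 forwards a single sum to Task 3. Channel, Report and run_pipeline are hypothetical names used only for this sketch.

```c
#define DEPTH 4  /* hypothetical channel depth, chosen only for this model */

/* A tiny bounded FIFO standing in for a kernel-to-kernel channel. */
typedef struct {
    float data[DEPTH];
    int head;
    int count;
} Channel;

static int ch_full(const Channel *c)  { return c->count == DEPTH; }
static int ch_empty(const Channel *c) { return c->count == 0; }

static void ch_write(Channel *c, float v) {
    c->data[(c->head + c->count) % DEPTH] = v;
    c->count++;
}

static float ch_read(Channel *c) {
    float v = c->data[c->head];
    c->head = (c->head + 1) % DEPTH;
    c->count--;
    return v;
}

typedef struct {
    float result;      /* value Task 3 would write back */
    int cycles;        /* total simulated cycles */
    int task2_stalls;  /* cycles Task 2 found its input channel empty */
    int task3_stalls;  /* cycles Task 3 found its input channel empty */
} Report;

/* Step all three tasks once per simulated cycle. Assumes n >= 1. */
static Report run_pipeline(const float *in, int n) {
    Channel c12 = {{0}, 0, 0};
    Channel c23 = {{0}, 0, 0};
    Report r = {0.0f, 0, 0, 0};
    int loaded = 0, summed = 0, done = 0;
    float acc = 0.0f;

    while (!done) {
        r.cycles++;

        /* Task 3: wait for the single result coming from Task 2. */
        if (!ch_empty(&c23)) {
            r.result = ch_read(&c23);
            done = 1;
        } else {
            r.task3_stalls++;
        }

        /* Task 2: accumulate n values, then forward the sum. */
        if (summed < n) {
            if (!ch_empty(&c12)) {
                acc += ch_read(&c12);
                if (++summed == n) ch_write(&c23, acc);
            } else {
                r.task2_stalls++;
            }
        }

        /* Task 1: load one value per cycle unless the channel is full. */
        if (loaded < n && !ch_full(&c12)) ch_write(&c12, in[loaded++]);
    }
    return r;
}
```

In this model no event is needed to order the tasks: each reader simply makes no progress in a cycle where its input channel is empty, and Task 1 would likewise wait if its output channel were full.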
Altera_Forum
Honored Contributor II

@HRZ 

So all the tasks run like the attached image, except that Task 2 will stall waiting for data from the channel written by Task 1, and Task 3 will stall waiting for data from the channel written by Task 2?
Altera_Forum
Honored Contributor II

 

--- Quote Start ---
Ah, I posted the wrong code. Suppose instead:

clEnqueueTask(command, task1, 0, NULL, &eventtask1);
clEnqueueTask(command, task2, 0, NULL, &eventtask2);
clEnqueueTask(command, task3, 0, NULL, &eventtask3);

Say Task 1 loads 1 million floats from global memory and passes them to Task 2 through a channel.
Task 2 then performs a summation and passes the result to Task 3 through a channel.
Task 3 then writes back to global memory.
--- Quote End ---

 

If you want the kernels to execute in pipeline fashion like your picture, you should use a separate command queue for each kernel and pass data via channels; there is no need for events.

Altera has excellent examples, such as the FFT and channelizer examples, here (https://www.altera.com/products/design-software/embedded-software-developers/opencl/developer-zone.h...).
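
As a rough illustration of why separate queues matter here, the following plain C model (again not OpenCL) compares the two setups for one producer and one consumer. It assumes that an in-order queue does not start a kernel until the previous one has completed, and that a channel write blocks while the channel is full. DEPTH and items_delivered are hypothetical names and values chosen only for this sketch.

```c
#define DEPTH 4  /* hypothetical channel depth, chosen only for this model */

/* Returns how many of n items the consumer receives within max_cycles.
 * concurrent = 1 models separate queues: both kernels are active together.
 * concurrent = 0 models one in-order queue: the consumer starts only
 * after the producer has finished. */
static int items_delivered(int concurrent, int n, int max_cycles) {
    int in_channel = 0;  /* items currently buffered in the channel */
    int produced = 0;
    int consumed = 0;

    for (int cycle = 0; cycle < max_cycles && consumed < n; cycle++) {
        int producer_done = (produced == n);

        /* Consumer may run if the queues are separate or the producer
         * has already finished. */
        if ((concurrent || producer_done) && in_channel > 0) {
            in_channel--;
            consumed++;
        }

        /* Producer blocks while the channel is full. */
        if (!producer_done && in_channel < DEPTH) {
            in_channel++;
            produced++;
        }
    }
    return consumed;
}
```

Under these assumptions, the single in-order queue only completes when the whole data set fits in the channel; with more items than DEPTH, the producer fills the channel and waits for a consumer that never starts. With both kernels active at once, every item is delivered.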
Altera_Forum
Honored Contributor II

Thanks HRZ, matt. I think I got what I came for.
