Intel® Quartus® Prime Software

Single work item Task parallel OpenCL kernel

Altera_Forum
Honored Contributor II

Hi, does the following code launch the kernels so that they run concurrently, as in the attached image, or do they run one after another?

 

cl_command_queue command = clCreateCommandQueue(context, device, 0, &status);

clEnqueueTask(command, task1, 0, NULL, &event1);

clEnqueueTask(command, task2, 0, NULL, &event1);

clEnqueueTask(command, task3, 0, NULL, &event1);

https://alteraforum.com/forum/attachment.php?attachmentid=14715&stc=1
Altera_Forum
Honored Contributor II

If the tasks do not depend on each other (for example, through channels or pipes), they should all run immediately one after the other (appearing to launch almost at the same time) once event1 has completed. Enqueuing only places a task in the command queue; the queued tasks are then executed sequentially in FIFO order, without blocking the host. If task2 depends on task1, the dependency has to be managed through events, channels, or barriers.

 

So ideally it would look like this, with the tasks overlapping each other right from the start:

| Task_3 | Task_2 | Task_1 |------------------------
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

If the tasks do not depend on each other such as using channels or pipes, then they should all run immediately one after the other (appearing to launch almost at the same time) once event1 has been completed.  

 

--- Quote End ---  

 

 

Not really; the last parameter of clEnqueueTask() is "an event object that identifies this particular kernel execution instance" (https://www.khronos.org/registry/opencl/sdk/1.2/docs/man/xhtml/clenqueuetask.html). 

So I'm not sure what the code above will do. It will probably run, but each call overwrites event1, so afterwards it refers only to the last task (task3) and you have no handle on the first two.
Altera_Forum
Honored Contributor II

Ah, I mistook the last parameter for an event wait list.

Altera_Forum
Honored Contributor II

Ah, I posted the wrong code. Suppose:

 

clEnqueueTask(command, task1, 0, NULL, &eventtask1); 

clEnqueueTask(command, task2, 0, NULL, &eventtask2); 

clEnqueueTask(command, task3, 0, NULL, &eventtask3); 

 

Say task 1 loads 1 million floats from global memory and passes them to task 2 through a channel.

Task 2 then performs the summation and sends the result to task 3 through a channel.

Task 3 then writes the result back to global memory.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Ah, I put the wrong code. Suppose,  

 

clEnqueueTask(command, task1, 0, NULL, &eventtask1); 

clEnqueueTask(command, task2, 0, NULL, &eventtask2); 

clEnqueueTask(command, task3, 0, NULL, &eventtask3); 

 

Say, Task 1 load 1 Million floats from global memory and channel pass it to task 2.  

Task 2 then perform summation and channel it to Task 3 

Task 3 then write back to Global memory. 

--- Quote End ---  

 

 

If you are using channels between the kernels, you do not need events (and probably should not use them): channels have built-in synchronization and stall the reading kernel whenever the channel is empty (and, likewise, the writing kernel whenever it is full).
Altera_Forum
Honored Contributor II

@HRZ 

So all the tasks run as in the attached image, except that task 2 will stall waiting for data from task 1's channel, and task 3 will stall waiting for data from task 2's channel?
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Ah, I put the wrong code. Suppose,  

 

clEnqueueTask(command, task1, 0, NULL, &eventtask1); 

clEnqueueTask(command, task2, 0, NULL, &eventtask2); 

clEnqueueTask(command, task3, 0, NULL, &eventtask3); 

 

Say, Task 1 load 1 Million floats from global memory and channel pass it to task 2.  

Task 2 then perform summation and channel it to Task 3 

Task 3 then write back to Global memory. 

--- Quote End ---  

 

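If you want the kernels to execute concurrently in a pipelined fashion, as in your picture, you should put each kernel in its own command queue and pass data through channels; no events are needed. A single in-order queue runs its kernels one after another, so task 2 would not start until task 1 had finished.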

Altera has excellent examples, such as the FFT and channelizer designs, here (https://www.altera.com/products/design-software/embedded-software-developers/opencl/developer-zone.html).
Altera_Forum
Honored Contributor II

Thanks, HRZ and matt. I think I got what I came for.
