Question Regarding use of Channel

Altera_Forum
Honored Contributor II

Hi everyone, 

 

I am just wondering if it is possible to transfer data between an 2D NDrange kernel and 1D NDrange kernel with loops using channel extension and still maintain the ordering of the data.  

 

I wrote kernel where each thread of the 2D NDrange kernel writes 1 data element into the channel, and each thread of the 1D NDrange kernel reads 1 data element from the channel. The block size of the kernel are identical, so each thread in the second kernel use loop to read the data elements out. The kernels returns correct result in emulation but after compiled, the results are all wrong when number of work groups is more than 1. I am wondering if there is a limitation on the channel extension to prevent this kind of arrangement to work correctly, or if I did something wrong? 

Thanks!
3 Replies
Altera_Forum
Honored Contributor II

Have you found a solution for this problem yet? 

According to the documentation, ordering should be maintained across work-groups (see the AOCL Programming Guide, pages 1-20 and 1-21). A few conditions must be met for this ordering to hold, though. Do you get a warning that the channels may not have well-defined ordering when you compile the kernels?
Altera_Forum
Honored Contributor II

It's been my experience that you get much better consistency between the emulator and the actual device when you use only single-work-item kernels (no NDRange, no get_global_id, etc.). Usually it's quite easy to take an existing NDRange kernel and wrap a loop around it to make it a single-work-item kernel. It's the first thing I'd try.
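
As a rough illustration of the loop-wrapping idea, here is a plain-C model (not OpenCL kernel code, and not the original poster's kernels). It assumes the 2D producer can be rewritten as two nested loops in row-major order and that the channel behaves like a simple FIFO. All names here (WIDTH, HEIGHT, fifo_write, fifo_read, producer, consumer, run_pipeline) are made up for the sketch, and the arithmetic is a placeholder. On the device, the FIFO calls would be replaced by the channel read and write calls of the channels extension.

```c
#include <assert.h>

#define WIDTH  4                 /* hypothetical problem width  */
#define HEIGHT 3                 /* hypothetical problem height */
#define N      (WIDTH * HEIGHT)

/* Plain-C stand-in for a channel: a FIFO deep enough for every element. */
static float fifo[N];
static int fifo_head = 0;        /* next slot to read  */
static int fifo_tail = 0;        /* next slot to write */

static void fifo_write(float v)
{
    assert(fifo_tail < N);       /* the model never overfills the FIFO */
    fifo[fifo_tail++] = v;
}

static float fifo_read(void)
{
    assert(fifo_head < fifo_tail);   /* never read an empty FIFO */
    return fifo[fifo_head++];
}

/* Producer: what used to be one work-item per (x, y) becomes two loops.
 * The write order is fixed by the loop nest: row-major, x fastest. */
static void producer(const float *in)
{
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            fifo_write(in[y * WIDTH + x] * 2.0f);   /* placeholder work */
}

/* Consumer: what used to be a 1D range becomes one flat loop that reads
 * the elements in exactly the order the producer wrote them. */
static void consumer(float *out)
{
    for (int i = 0; i < N; ++i)
        out[i] = fifo_read() + 1.0f;                /* placeholder work */
}

/* Runs both stages back to back on a fresh FIFO. */
static void run_pipeline(const float *in, float *out)
{
    fifo_head = 0;
    fifo_tail = 0;
    producer(in);
    consumer(out);
}
```

Calling run_pipeline(in, out) on two arrays of N floats leaves out[i] equal to in[i] * 2 + 1 for every i. The point of the sketch is that with explicit loops the order of channel writes and reads is set by the loop nests alone, so the consumer only has to walk the data in the same linear order as the producer.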

Altera_Forum
Honored Contributor II

Thank you for the replies, guys! I found that if I change the order of the memory accesses, the channel works correctly, so the problem was due to a bug in my code. Interestingly, if the problem size is too large, the channel may get stuck for some reason. In the end I simply used global memory, which was reasonably fast anyway.
