Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
16606 Discussions

Channel Stall for Large Data

Altera_Forum
Honored Contributor II
1,138 Views

Hi, I'm a bit confused on how channel really works and right now facing difficulties to implement it in my project. 

The channel stops when passing large amount of data from one kernel to another, but it works just nice for the small piece of data. 

After learned through the official documentation provided by Altera, we have a few doubts:  

__kernel void producer(__global const int *src, const int iterations){ for(int i = 0; i<iterations;i++){ write_channel_altera(c0,src); } } __kernel void consumer(const int iterations){ for(int i=0; i<iterations; ++i){ dst = read_channel_altera(c0); } }  

 

Consider the example above:  

 

  1. Does the read_channel_altera will be executed right after the write_channel_altera takes place? Or the data only available after the producer kernel finish it's loop? 

  2. In my case, I am using EnqueueTask kernel, Does channels works the same as in NDrange Kernel?? 

0 Kudos
3 Replies
Altera_Forum
Honored Contributor II
423 Views

 

--- Quote Start ---  

The channel stops when passing large amount of data from one kernel to another, but it works just nice for the small piece of data. 

--- Quote End ---  

 

 

Channel-related deadlocks only happen due to incorrect channel ordering. If you have multiple channel operations in a single kernel, the compiler might reorder them to achieve better pipeline performance, unless you explicitly force channel ordering. Because of this, if your code has been written assuming that data is read and written in the same order as channel operations follow each other in the code, you might run into a deadlock. 

 

 

 

--- Quote Start ---  

Does the read_channel_altera will be executed right after the write_channel_altera takes place? Or the data only available after the producer kernel finish it's loop? 

--- Quote End ---  

 

 

Neither. For such cases, you should run the two kernels in parallel in two different queues. In this case, both calls will run in parallel, with the write side automatically stalling when the buffer is full, and the read side automatically stalling if the buffer is empty. This code example will never deadlock regardless of which kernels is invoked sooner or how much data is passed. 

 

 

 

--- Quote Start ---  

In my case, I am using EnqueueTask kernel, Does channels works the same as in NDrange Kernel?? 

--- Quote End ---  

 

 

More or less the same.
0 Kudos
Altera_Forum
Honored Contributor II
423 Views

 

--- Quote Start ---  

 

Neither. For such cases, you should run the two kernels in parallel in two different queues. In this case, both calls will run in parallel, with the write side automatically stalling when the buffer is full, and the read side automatically stalling if the buffer is empty. This code example will never deadlock regardless of which kernels is invoked sooner or how much data is passed. 

 

--- Quote End ---  

 

 

Thank you for the clear explanation on the channel. So, if the write side automatically stalling when the buffer is full, whether any memory manager such as ping pong buffer will help this?? And how to know how much data does the buffer can hold ?
0 Kudos
Altera_Forum
Honored Contributor II
423 Views

 

--- Quote Start ---  

So, if the write side automatically stalling when the buffer is full, whether any memory manager such as ping pong buffer will help this?? 

--- Quote End ---  

 

 

Unless you have significant read and write imbalance, a single buffer will work just fine. Even in the case of imbalance, you can increase the channel depth to reduce the likelihood of stalls. I cannot think of a case where a ping pong buffer would be required here. 

 

 

--- Quote Start ---  

And how to know how much data does the buffer can hold ? 

--- Quote End ---  

 

 

You can set the channel depth using the attribute provided by Altera (check the documentation for more info). However, the compiler pretty much always increases this depth due to scheduling and FIFO overhead. You can see the final channel depth in the area report.
0 Kudos
Reply