Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
17253 Discussions

Channel Deadlock Debugging

Altera_Forum
Honored Contributor II
1,962 Views

I've developed a design in which there are 4 NDRange kernels all running concurrently on the FPGA and communicating with each other via the Altera OpencL Channels extension. There are many channels in the program , however it is inherently cyclical as the data is passed in a circle around the 4 kernels. I know this design isn't recommended by Altera (as it can't optimize the channel depths), however I'm trying to look at all possible architectures to get the best throughput. But I'm having a problem while running it. 

 

I developed and tested the kernel using the emulator successfully. Everything was functional and produced promising correct results during emulation. However, after completing the full compilation and running the kernels, they seem to enter a deadlock state. I'm assuming this only happens after full compilation due to the kernels operating concurrently. 

 

Does anyone have any suggestions for debugging kernels that won't come to completion? The solution I have right now uses global memory and an independent queue that continuously transfers over the data from global memory, but it seems inefficient.
0 Kudos
9 Replies
Altera_Forum
Honored Contributor II
1,143 Views

You can use printf calls in your kernels to find out where they are currently at. Yes, printf actually does work on the FPGA, it just takes up a good chunk of LEs FFs and RAMs. I've used it for debugging before. Unfortunately the emulator is useless for these kind of problems.

0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

Thanks mscharrer! I thought printf was only supported in emulation. That should make debugging a little easier.

0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

I was surprised that it works as well, and only noticed it by accidentally leaving it in after debugging in the emulator.

0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

 

--- Quote Start ---  

I was surprised that it works as well, and only noticed it by accidentally leaving it in after debugging in the emulator. 

--- Quote End ---  

 

 

For anyone following this thread, I was able to find the solution to the deadlock. By using printf statements into a csv format, I found that the deadlock was occuring due to the channel reads and writes being executed in an unexpected order.  

 

In my case I was using an NDRange kernel iterating over the channels with a loop as follows: 

 

 

--- Quote Start ---  

 

__kernel __attribute__((reqd_work_group_size(32,1,1))) ExampleKernel() { 

 

int id = get_global_id(0); 

char4 data[NUM_ELEMENTS]; 

 

// The data is processed here. 

... 

 

 

for(int i=0; i<NUM_ELEMENTS; i++) { 

switch(id) { 

case 0: write_channel_altera(outputChannel[0],data);break; 

case 1: write_channel_altera(outputchannel[1],data);break; 

case 2: write_channel_altera(outputChannel[2],data);break; 

case 3: write_channel_altera(outputchannel[3],data);break; 

case 4: write_channel_altera(outputChannel[4],data);break; 

case 5: write_channel_altera(outputchannel[5],data);break; 

case 6: write_channel_altera(outputChannel[6],data);break; 

case 7: write_channel_altera(outputchannel[7],data);break; 

case 8: write_channel_altera(outputChannel[8],data);break; 

case 9: write_channel_altera(outputchannel[9],data);break; 

case 10: write_channel_altera(outputChannel[10],data);break; 

case 11: write_channel_altera(outputchannel[11],data);break; 

case 12: write_channel_altera(outputChannel[12],data);break; 

case 13: write_channel_altera(outputchannel[13],data);break; 

case 14: write_channel_altera(outputChannel[14],data);break; 

case 15: write_channel_altera(outputchannel[15],data);break; 

case 16: write_channel_altera(outputChannel[16],data);break; 

case 17: write_channel_altera(outputchannel[17],data);break; 

case 18: write_channel_altera(outputChannel[18],data);break; 

case 19: write_channel_altera(outputchannel[19],data);break; 

case 20: write_channel_altera(outputChannel[20],data);break; 

case 21: write_channel_altera(outputchannel[21],data);break; 

case 22: write_channel_altera(outputChannel[22],data);break; 

case 23: write_channel_altera(outputchannel[23],data);break; 

case 24: write_channel_altera(outputChannel[24],data);break; 

case 25: write_channel_altera(outputchannel[25],data);break; 

case 26: write_channel_altera(outputChannel[26],data);break; 

case 27: write_channel_altera(outputchannel[27],data);break; 

case 28: write_channel_altera(outputChannel[28],data);break; 

case 29: write_channel_altera(outputchannel[29],data);break; 

case 30: write_channel_altera(outputChannel[30],data);break; 

case 31: write_channel_altera(outputchannel[31],data);break; 

 

--- Quote End ---  

 

 

The above is an example of a (32,1,1) NDRange kernel. The data arrived such that the current loop iteration was completely finished before it continued. Here is how the data arrived: 

 

 

 

id 

loop iteration data 

 

 

data[0] 

 

 

data[0] 

 

 

data[0] 

 

 

... 

... 

 

 

31 

data[0] 

 

 

data[1] 

 

 

data[1] 

 

 

data[1] 

 

 

... 

... 

 

 

... 

... 

 

 

 

data[NUM_ELEMENTS-1] 

 

 

data[NUM_ELEMENTS-1] 

 

 

data[NUM_ELEMENTS-1] 

 

 

... 

... 

 

 

31 

data[NUM_ELEMENTS-1] 

 

 

0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

Hello, now I met the same problem with you! Have you solved your problem?How?Thanks!!

0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

 

--- Quote Start ---  

Hello, now I met the same problem with you! Have you solved your problem?How?Thanks!! 

--- Quote End ---  

 

 

If you see my previous post, I could solve the issue by re-ordering how the data is taken out of the pipes/channels. Unfortunately due to how my kernels were required to execute I abandoned this design and moved towards one that used local memory to transfer data.
0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

 

--- Quote Start ---  

If you see my previous post, I could solve the issue by re-ordering how the data is taken out of the pipes/channels. Unfortunately due to how my kernels were required to execute I abandoned this design and moved towards one that used local memory to transfer data.[/QUOT,E] 

 

In your before post, I see "For anyone following this thread, I was able to find the solution to the deadlock. By using printf statements into a csv format, I found that the deadlock was occuring due to the channel reads and writes being executed in an unexpected order. ", but when I add printf to my kernel, when kernel execute in FPPFA, most of the information cann't printf out, even though at the begining of the kernel! Have you met the same situation? If not, do you know why?Thanks!!
0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

 

--- Quote Start ---  

 

--- Quote Start ---  

If you see my previous post, I could solve the issue by re-ordering how the data is taken out of the pipes/channels. Unfortunately due to how my kernels were required to execute I abandoned this design and moved towards one that used local memory to transfer data.[/QUOT,E] 

 

In your before post, I see "For anyone following this thread, I was able to find the solution to the deadlock. By using printf statements into a csv format, I found that the deadlock was occuring due to the channel reads and writes being executed in an unexpected order. ", but when I add printf to my kernel, when kernel execute in FPPFA, most of the information cann't printf out, even though at the begining of the kernel! Have you met the same situation? If not, do you know why?Thanks!! 

--- Quote End ---  

 

 

I would guess that it is due to a similar issue that I faced. I apologize for the late reply and please forgive me if you've figured out a solution to this already. If a specific work-item executes the channel reads/writes in a different order you are likely in a deadlock scenario.
0 Kudos
Altera_Forum
Honored Contributor II
1,143 Views

Try using non-blocking reads and/or writes to the channel to avoid deadlock. In one of my designs I use a non-blocking read inside a do-while loop which effectively polls the channel until something is received: 

 

bool got_val; do{ read_val = read_channel_nb_altera(accum, &got_val); }while(!got_val); 

 

note, replace 'altera' with 'intel' if you are using sdk version > 16.0 

 

Take a look at the fpga opencl sdk programming guide (https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf#page=33) (take a look at pages 33-35 for channel reads/writes)
0 Kudos
Reply