
barrier problem

Altera_Forum

Hi all,

I use one barrier(CLK_LOCAL_MEM_FENCE) in my kernel function.

When I compile the .cl file with the command

aoc -v --board de5net_a7 X.cl X.aocx

the compiler shows this warning:

Compiler Warning: Threads might reach barrier out-of-order: at most 2 concurrent workgroups are allowed.

Although it is only a warning, the build then fails in the following step:

aoc: Compiling Quartus project.
Error: Quartus compilation FAILED.
Refer to quartus_sh_compile.log for the output log.

How can I solve this problem? Or is there a way to make sure that there are 2 workgroups?

Thanks :)
Altera_Forum

 

--- Quote Start ---

Hi all,

I use one barrier(CLK_LOCAL_MEM_FENCE) in my kernel function.

When I compile the .cl file with the command

aoc -v --board de5net_a7 X.cl X.aocx

the compiler shows this warning:

Compiler Warning: Threads might reach barrier out-of-order: at most 2 concurrent workgroups are allowed.

Although it is only a warning, the build then fails in the following step:

aoc: Compiling Quartus project.
Error: Quartus compilation FAILED.
Refer to quartus_sh_compile.log for the output log.

When I remove the barrier(CLK_LOCAL_MEM_FENCE) from the kernel function, the warning does not appear and the Quartus project compiles successfully.

But I need the barrier(CLK_LOCAL_MEM_FENCE) in the kernel function to make sure that every thread has finished the preceding step before any thread continues.

How can I solve this problem? Or is there a way to make sure that there are 2 workgroups?

Thanks :)

--- Quote End ---

 

 

Look at quartus_sh_compile.log to see why the Quartus compilation is failing. That will give you an idea of what to fix.

When work-items can reach a barrier out of order, the compiler chooses expensive barriers that use a lot of resources. So the kernel may simply not fit on the target FPGA.

Look at your kernel and find out why the work-items are out of order. There may be loops or if-else statements that check the global/local ids, in which case different work-items take different paths and reach the barrier out of order. If that id-dependent control flow is not essential, restructure it so that all work-items take the same path to the barrier. When work-items are not out of order, the compiler uses cheaper (i.e. less resource-hungry) barriers.
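As an illustration of the kind of id-dependent control flow described above, here is a minimal OpenCL C sketch. The kernel names, WG_SIZE, and N_ITER are hypothetical and not taken from the original poster's code. The first kernel has a loop whose trip count depends on the local id; the second computes the same result with a fixed trip count, so every work-item follows the same path to the barrier. Whether the compiler then selects the cheaper barrier for a given kernel is something to confirm in the compile output, not something this sketch guarantees.

```c
#define WG_SIZE 64   /* hypothetical work-group size for this sketch */
#define N_ITER  16   /* hypothetical fixed trip count */

/* Before: the loop bound depends on the local id, so work-items can
   arrive at the barrier in a different order than they were issued. */
__kernel void id_dependent(__global const float *in, __global float *out)
{
    __local float tile[WG_SIZE];
    const int lid = get_local_id(0);
    const int gid = get_global_id(0);

    float acc = 0.0f;
    for (int i = 0; i <= lid % N_ITER; ++i)   /* bound varies per work-item */
        acc += in[gid] * (float)i;

    tile[lid] = acc;
    barrier(CLK_LOCAL_MEM_FENCE);
    out[gid] = tile[(lid + 1) % WG_SIZE];
}

/* After: every work-item runs exactly N_ITER iterations; the
   id-dependent condition only selects the value that is added. */
__kernel void uniform_flow(__global const float *in, __global float *out)
{
    __local float tile[WG_SIZE];
    const int lid = get_local_id(0);
    const int gid = get_global_id(0);
    const int limit = lid % N_ITER;

    float acc = 0.0f;
    for (int i = 0; i < N_ITER; ++i)          /* same bound for all work-items */
        acc += (i <= limit) ? in[gid] * (float)i : 0.0f;

    tile[lid] = acc;
    barrier(CLK_LOCAL_MEM_FENCE);
    out[gid] = tile[(lid + 1) % WG_SIZE];
}
```

Both kernels assume the host launches them with a local size of WG_SIZE in dimension 0. The second version does extra work in some iterations, which is the price of keeping the control flow uniform.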

 

Also, if you do not specify the reqd_work_group_size attribute, the compiler assumes large workgroups, which makes the barrier even more expensive. If your workgroups are smaller, set this attribute; it will help reduce the resource usage.
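A minimal sketch of setting the attribute follows. The kernel name and the size of 64 are assumptions for illustration; use the work-group size your host code actually launches.

```c
#define WG_SIZE 64   /* hypothetical: must match the local size used by the host */

/* Tells the compiler the exact work-group size, so it does not have
   to size the barrier for the largest possible work-group. */
__attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
__kernel void shift_in_group(__global const float *in, __global float *out)
{
    __local float tile[WG_SIZE];
    const int lid = get_local_id(0);
    const int gid = get_global_id(0);

    tile[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);   /* all WG_SIZE stores are visible after this */
    out[gid] = tile[(lid + 1) % WG_SIZE];
}
```

With this attribute in place, the host has to pass a local work size of exactly 64 x 1 x 1 to clEnqueueNDRangeKernel; any other local size is rejected with CL_INVALID_WORK_GROUP_SIZE.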