Altera_Forum
Honored Contributor I
1,091 Views

Reduce logic utilization

Hi,  

 

There is a part in my kernel that takes too much logic:

if (relu == 1) { if (out < 0) conv_in = 0.1f * out; else conv_in = out; }

out is a float. report.html shows this function using 4k ALUTs and 8k FFs, which is too much for my DE1-SoC to handle. Any idea how to reduce it?

Btw, the function is a leaky activation function where negative data is multiplied by 0.1.

Thanks in advance. 

 

EDIT: 

What are the pros and cons of using these two compiler flags?

1) -fp-relaxed 

2) -fpc
6 Replies
Altera_Forum
Honored Contributor I
45 Views

Since floating-point operations are not natively supported by the DSPs in Cyclone V, floating-point multiplication uses DSPs only for the mantissa multiplication; all other operations, including shifting (with barrel shifters) and rounding, use logic and FFs. This is expected behavior and cannot be avoided unless you give up IEEE 754 compliance. 

 

--fp-relaxed allows parallelizing floating-point operations in the form of a balanced tree, which requires reordering the operations. This can slightly reduce the logic/FF overhead at the cost of small changes in the output. However, it might not make any difference in your kernel unless you have chained floating-point operations. 

 

--fpc can significantly reduce logic and FF overhead of floating-point operations by reducing the area spent on rounding functions, at the cost of losing compliance with the IEEE-754 standard; i.e. if you use that switch, you could get very different (i.e. inaccurate) results compared to running the same code on a CPU/GPU. 

 

Another option you have is to use fixed-point numbers. Altera's documents outline how you can use bit masking to convert floating-point numbers to fixed-point in an OpenCL kernel.
Altera_Forum
Honored Contributor I
45 Views

jack12, try replacing "conv_in = 0.1*out" with "conv_in = 0.125*out", or with "conv_in = 0.125*out - 0.03125*out" for more precision -- these expressions are easier to implement in hardware.

Altera_Forum
Honored Contributor I
45 Views

The kernel mainly does floating-point convolutions repeatedly. Anyway, I will verify my results and compare them with the compiler flags on. Thanks HRZ

Altera_Forum
Honored Contributor I
45 Views

Hi WitFed,  

 

I am trying to reduce the logic utilization, as the design cannot fit into the FPGA. I am confused why writing conv_in = 0.125*out - 0.03125*out would reduce logic utilization. Shouldn't the extra subtractor use more logic?
Altera_Forum
Honored Contributor I
45 Views

Because 0.1 has no exact binary representation in hardware. 

If you use 0.1, the compiler has to build a full floating-point multiplier for a constant that only approximates 0.1. 

However, 0.125 and 0.03125 are both powers of two, so multiplying by them only adjusts the exponent -- (0.125 - 0.03125)*out behaves like (out >> 3) - (out >> 5) in fixed-point terms: just shifts and one subtraction.
Altera_Forum
Honored Contributor I
45 Views

I see. Thanks, aazz44ss.
