Hi,

I have this part in my kernel where it takes too much logic:

```
if (relu == 1) {
    if (out < 0)
        conv_in = 0.1 * out;
    else
        conv_in = out;
}
```

out is a float. The report.html shows this function using 4k ALUTs and 8k FFs, which is too much for my DE1-SoC to handle. Any idea how to reduce it? BTW, the function is a leaky ReLU activation where negative data is multiplied by 0.1. Thanks in advance.

EDIT: What are the pros and cons of using these two compiler flags? 1) -fp-relaxed 2) -fpc

Since floating-point operations are not natively supported by the DSPs in Cyclone V, for floating-point multiplication the mantissa multiply will use DSPs, but all other operations, including shifting (with barrel shifters) and rounding, will use logic and FFs. This is expected behavior and cannot be avoided unless you give up IEEE-754 compliance.

--fp-relaxed allows floating-point operations to be parallelized in the form of a balanced tree, which requires reordering them. This could slightly reduce the logic/FF overhead at the cost of small changes in the output; however, it will not necessarily make any difference in your kernel unless you have chained floating-point operations.

--fpc can significantly reduce the logic and FF overhead of floating-point operations by reducing the area spent on rounding, at the cost of losing compliance with the IEEE-754 standard; i.e. with that switch you could get very different (i.e. less accurate) results compared to running the same code on a CPU/GPU.

Another option is to use fixed-point numbers. Altera's documents outline how you can use bit masking to convert floating-point numbers to fixed-point in an OpenCL kernel.

jack12, try replacing "conv_in = 0.1*out" with "conv_in = 0.125*out", or with "conv_in = 0.125*out - 0.03125*out" for more precision -- these expressions are cheaper to implement.


The kernel is mainly doing floating-point convolutions repeatedly. Anyway, I will try it and verify my results against a run with the compiler flags on. Thanks HRZ.


Hi WitFed,

I am trying to reduce the logic utilization, as the design cannot fit into the FPGA. I am confused about why conv_in = 0.125*out - 0.03125*out would reduce logic utilization. Shouldn't the extra subtractor use more logic?
Because there is no exact 0.1 in hardware.

If you use 0.1, the compiler will use a lot of hardware to implement a constant as close to 0.1 as possible. However, if you use (0.125 - 0.03125)*out, both constants are exact powers of two, so it's like ((out>>3) - (out>>5)).

I see. Thanks aazz44ss.
