Hello,
I have written two kernels to compare the resource usage of fixed-point and floating-point operations.

a)
    __kernel __attribute__((task))
    void test_multiplier(global char *restrict in,
                         global char *restrict weights,
                         global int *restrict out)
    {
        int output = 0;
        #pragma unroll 100
        for (int i = 0; i < VEC_SIZE; i++) {
            output += in[i] * weights[i];
        }
        *out = output;
    }

b)
    __kernel __attribute__((task))
    void test_multiplier(global float *restrict in,
                         global float *restrict weights,
                         global float *restrict out)
    {
        int output = 0;
        #pragma unroll 100
        for (int i = 0; i < VEC_SIZE; i++) {
            output += in[i] * weights[i];
        }
        *out = output;
    }

Both kernels give me the same number of DSPs, i.e. 100 (the unroll factor). I was expecting 25 DSPs in the 8-bit (char argument) case. Does the aoc compiler optimize well for fixed-point quantization?
Quartus/AOC v16.1.2 and below do not seem to be able to infer 8-bit and 16-bit operations correctly. Your first code example only uses 50 DSPs in 17.0.2 and above. However, it is probably best to define "out" and "output" as short rather than int.
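As a rough illustration of the suggestion above, the first kernel could be narrowed as follows. This is only a sketch: the kernel name test_multiplier_short, the SINGLE_WORK_ITEM macro, the value VEC_SIZE = 100 and the host-side stubs in the #else branch are assumptions added here so that the same source can also be checked with an ordinary C compiler; none of them come from the thread. Whether this actually lowers the DSP count has to be confirmed in the report of your aoc version.

```c
/* Sketch: 8-bit kernel with a 16-bit accumulator and output. */
#ifdef __OPENCL_VERSION__
#define SINGLE_WORK_ITEM __attribute__((task))
#else
/* Assumed stubs so the file also builds as plain C on a host. */
#define __kernel
#define __global
#define SINGLE_WORK_ITEM
#endif

#define VEC_SIZE 100 /* assumed; the thread does not state the value */

__kernel SINGLE_WORK_ITEM
void test_multiplier_short(__global const char *restrict in,
                           __global const char *restrict weights,
                           __global short *restrict out)
{
    short output = 0; /* 16-bit accumulator instead of int */

#ifdef __OPENCL_VERSION__
    #pragma unroll 100
#endif
    for (int i = 0; i < VEC_SIZE; i++) {
        /* An 8-bit by 8-bit product always fits in 16 bits, so the
           cast tells the compiler that no wider result is needed. */
        output += (short)(in[i] * weights[i]);
    }
    *out = output;
}
```

Note that a short accumulator overflows once the running sum leaves the range -32768..32767, so this variant only makes sense when the input range is known to be small enough.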
I used aoc 17.1.2. The initial report after static analysis predicted 50 DSPs. After synthesis, the Quartus compilation report shows the following:

Kernel 1 - 8 bit (char) resource usage according to Quartus
    Total registers: 68810
    Total pins: 173 / 960 ( 18 % )
    Total virtual pins: 0
    Total block memory bits: 1,983,656 / 55,562,240 ( 4 % )
    Total DSP Blocks: 100 / 1,518 ( 7 % )
    Total HSSI RX channels: 8 / 72 ( 11 % )
    Total HSSI TX channels: 8 / 72 ( 11 % )
    Total PLLs: 78 / 144 ( 54 % )

Kernel 2 - 32 bit (float) resource usage according to Quartus
    Logic utilization (in ALMs): 128,593 / 427,200 ( 30 % )
    Total registers: 157318
    Total pins: 173 / 960 ( 18 % )
    Total virtual pins: 0
    Total block memory bits: 10,365,736 / 55,562,240 ( 19 % )
    Total DSP Blocks: 100 / 1,518 ( 7 % )
    Total HSSI RX channels: 8 / 72 ( 11 % )
    Total HSSI TX channels: 8 / 72 ( 11 % )
    Total PLLs: 78 / 144 ( 54 % )

Why does the DSP usage increase from static analysis to synthesis? Are there any directives to restrict the number of DSPs?
I see. I remember someone else reporting a similar situation before. This is indeed strange. Try using short or char for "output" and "out" and see what happens. I would expect that using int for these variables "promotes" all the multiplications to int, since the output is int. Furthermore, you can take a look at "Intel FPGA SDK for OpenCL Best Practices Guide, Section 3.3.1 Floating-Point versus Fixed-Point Representations" and follow its guidelines on masking out bits to see if you can get the desired result. If none of this helps, I recommend opening a ticket with Altera directly.
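The bit-masking idea could look roughly like the sketch below. It is not copied from the guide: the kernel name test_multiplier_masked, the SINGLE_WORK_ITEM macro, the masks 0xFF and 0xFFFF, the value VEC_SIZE = 100 and the host-side stubs are all assumptions made for illustration, and I have not verified that it changes the DSP count.

```c
/* Sketch: mask operands and product so only the needed bits stay live. */
#ifdef __OPENCL_VERSION__
#define SINGLE_WORK_ITEM __attribute__((task))
#else
/* Assumed stubs so the file also builds as plain C on a host. */
#define __kernel
#define __global
#define SINGLE_WORK_ITEM
#endif

#define VEC_SIZE 100 /* assumed; the thread does not state the value */

__kernel SINGLE_WORK_ITEM
void test_multiplier_masked(__global const char *restrict in,
                            __global const char *restrict weights,
                            __global int *restrict out)
{
    int output = 0;

#ifdef __OPENCL_VERSION__
    #pragma unroll 100
#endif
    for (int i = 0; i < VEC_SIZE; i++) {
        /* Keep only the low 8 bits of each operand. This treats the
           data as unsigned 8-bit values (0..255). */
        int a = in[i] & 0xFF;
        int b = weights[i] & 0xFF;
        /* 255 * 255 = 65025 fits in 16 bits, so this mask is lossless
           and only documents the product width for the compiler. */
        output += (a * b) & 0xFFFF;
    }
    *out = output;
}
```

Masking with 0xFF changes the meaning of negative inputs, because they are reinterpreted as unsigned values. If the data is signed, the narrowing cast to short or char is the closer match.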
