Strange behavior of Quartus Fitter and how to get more information

JavierHormigo
Beginner

Hi, 

 

I'm designing an accelerator for DTW computation using oneAPI on a Stratix 10, on the BittWare 520N-MX Gen3x16 board. I have a kernel (actually several different kernels connected with pipes) that I replicate as many times as possible to maximize throughput. The different kernel instances work on different input data.

 

In one of the versions, I fitted 12 kernels in the FPGA. Then, for that kernel, I simplified the external memory interfaces and the "function overhead" (using oneAPI pragmas). The compiler's estimated resource utilization shows a reduction of more than 30% per kernel. However, the Fitter still fails to place more than 12 kernels on the FPGA. What seems even stranger to me is that if I try to compile 16 kernels I get the error:

"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."

But if I try to compile 14 kernels (same clock target), I get:

"Error (170012): Fitter requires 73646 LABs to implement the design, but the device contains only 66439 LABs"

How could 14 identical kernels need more LABs than 16?
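
To put the two messages side by side, here is a minimal sketch that parses the quoted errors and computes the overshoot for each build. The helper names (`parse_lab_error`, `summarize`) are made up for illustration, and dividing the required LABs by the kernel count is only a rough assumption, since it ignores any logic that does not scale with the number of kernels.

```python
import re

# Pattern for the Fitter error 170012 text exactly as quoted in this post.
LAB_ERROR = re.compile(
    r"Fitter requires (\d+) LABs to implement the design, "
    r"but the device contains only (\d+) LABs"
)


def parse_lab_error(message):
    """Return (required, available) LAB counts from an error 170012 message."""
    match = LAB_ERROR.search(message)
    if match is None:
        raise ValueError("not a 170012 LAB error message")
    return int(match.group(1)), int(match.group(2))


def summarize(kernels, message):
    """Collect simple figures for one failed build (illustrative only)."""
    required, available = parse_lab_error(message)
    overshoot = required - available
    return {
        "kernels": kernels,
        "required": required,
        "available": available,
        "overshoot": overshoot,
        "overshoot_pct": round(100.0 * overshoot / available, 2),
        # Rough assumption: treats all LABs as belonging to the kernels.
        "labs_per_kernel": round(required / kernels, 1),
    }


# The two messages from this post, keyed by number of kernel replicas.
errors = {
    16: "Error (170012): Fitter requires 72611 LABs to implement the design, "
        "but the device contains only 66099 LABs.",
    14: "Error (170012): Fitter requires 73646 LABs to implement the design, "
        "but the device contains only 66439 LABs",
}

for kernels, message in errors.items():
    print(summarize(kernels, message))
```

Besides the required count, the available LAB count also differs between the two messages (66099 vs. 66439), so the two builds were not measured against the same usable area.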

I have tried other numbers of kernels and clock frequencies, and the results are very unpredictable.

Any idea why the estimated resource utilization is so far off? How can I get more information about the Fitter process to try to figure out what is happening?

Thanks. 

 

BoonBengT_Intel
Moderator

Hi @JavierHormigo,


Noted on the situation you explained. Allow me to take this back and align internally; we will get back to you as soon as we have an update.

Thank you for your patience.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @JavierHormigo,


Thank you for holding. After some internal alignment, we believe we are on the right path.

Basically, the report path that we talked about earlier contains the detailed reports that the Quartus compilation report GUI shows.


It contains the plan, place, route, retime, and finalize stage reports, which describe all the device resources the Fitter allocates during logic placement, such as Logic Lock regions, global signals, and other fast signals.

Hence I would suggest looking into those reports, particularly the place stage report.

Hope that clarifies.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @JavierHormigo,


Greetings. Did you by any chance manage to look into the previous clarification?

Please do let us know if there are further doubts.


Best Wishes

BB


JavierHormigo
Beginner

Hi @BoonBengT_Intel ,

 

I checked the "place" report, and I could see significant differences, but I don't understand many of the terms, such as:

[B] Estimate of ALMs recoverable by dense packing

[C][b] Due to LAB-wide signal conflicts

LAB logic registers:  Secondary logic registers

Could you clarify them? 
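
For comparing these rows between builds, a minimal sketch like the one below can pull the matching lines out of a report's text. The function name `find_report_lines` and the sample excerpt are made up for illustration: the row labels are the ones quoted above, but the layout and the `(value)` placeholders are assumptions, not real report output.

```python
def find_report_lines(report_text, terms):
    """Return report lines mentioning any of the given terms (case-insensitive)."""
    wanted = [term.lower() for term in terms]
    hits = []
    for line in report_text.splitlines():
        lowered = line.lower()
        if any(term in lowered for term in wanted):
            hits.append(line.strip())
    return hits


# Hypothetical excerpt: labels come from this post, layout and values do not.
SAMPLE = """\
; [B] Estimate of ALMs recoverable by dense packing ; (value) ;
;     [b] Due to LAB-wide signal conflicts          ; (value) ;
; Some other row                                    ; (value) ;
;     Secondary logic registers                     ; (value) ;
"""

TERMS = [
    "recoverable by dense packing",
    "LAB-wide signal conflicts",
    "Secondary logic registers",
]

hits = find_report_lines(SAMPLE, TERMS)
for line in hits:
    print(line)
```

In practice the text would come from each build's place stage report, read with something like `open(path).read()`, where `path` depends on the project setup. Running it on two builds and comparing the output by eye is one simple way to see which rows changed.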

 

Besides that, I have no idea how I can influence the values of these parameters.

 

Thank you,

Javier

BoonBengT_Intel
Moderator

Hi @JavierHormigo,

For the place report, I would suggest posting a separate query on the Quartus forum, as they would be the right people to advise in more detail on these processes.

Best Wishes
BB
