Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Strange behavior of Quartus Fitter and how to get more information

JavierHormigo
Beginner
2,625 Views

Hi, 

 

I'm designing an accelerator for DTW computation using oneAPI and a Stratix 10 on the BittWare 520N-MX Gen3x16 board. I have a kernel (it's actually several different kernels connected with pipes) that I replicate as many times as possible to get the maximum throughput. The different kernel instances work with different input data.

 

In one of the versions, I fitted 12 kernels in the FPGA. Then, for that kernel, I simplified the external memory interfaces and the "function overhead" (using oneAPI pragmas). The compiler's estimated resource utilization shows a reduction of more than 30% per kernel. However, the Fitter failed to place more than 12 kernels on the FPGA. What seems even stranger to me is that if I try to compile 16 kernels I get the error:

"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."

But if I try to compile 14 kernels (same clock target), I get:

"Error (170012): Fitter requires 73646 LABs to implement the design, but the device contains only 66439 LABs"

How could 14 identical kernels need more LABs than 16?

I have tried other numbers of kernels and other clock frequencies, and the results are very unpredictable.

Any idea why the estimate of resource utilization is so wrong? How can I get more information on the fitter process to try to figure out what is happening?
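
To put the two messages side by side, a small script can pull the numbers out of each one and compute how far the requirement overshoots what the fitter reports as available. This is only a sketch: the regular expression assumes the exact wording of the two Error (170012) messages quoted above, and the helper name `lab_shortfall` is made up for illustration.

```python
import re

# Pattern matching the Error (170012) wording quoted in this thread.
LAB_ERROR = re.compile(
    r"Error \(170012\): Fitter requires (\d+) LABs to implement the design, "
    r"but the device contains only (\d+) LABs"
)

def lab_shortfall(message):
    """Return (required, available, overshoot_percent), or None if no match."""
    match = LAB_ERROR.search(message)
    if match is None:
        return None
    required = int(match.group(1))
    available = int(match.group(2))
    overshoot = 100.0 * (required - available) / available
    return required, available, round(overshoot, 1)

# The two messages from the post, keyed by the number of kernel copies.
messages = {
    16: "Error (170012): Fitter requires 72611 LABs to implement the design, "
        "but the device contains only 66099 LABs.",
    14: "Error (170012): Fitter requires 73646 LABs to implement the design, "
        "but the device contains only 66439 LABs",
}

for kernels, text in messages.items():
    print(kernels, lab_shortfall(text))
```

Both the "required" and the "available" figures differ between the two runs, which is the inconsistency being asked about.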

Thanks. 

 

25 Replies
BoonBengT_Intel
Moderator
2,103 Views

Hi @JavierHormigo,


Thank you for posting in the Intel community forum; I hope all is well.

A quick clarification on the situation: do the errors mentioned here occur during hardware compilation?

The emulation compilation is successful, right?


Note: unfortunately, the hardware and BSP come from the vendor, BittWare, so we have limited understanding of the build and architecture involved. We might not be the right people to provide an exact solution here, but we will try our best to help.


Best Wishes

BB


JavierHormigo
Beginner
2,095 Views

Hi BoonBeng,

Yes, it is during hardware compilation. Emulation is successful. Hardware compilation is even successful for a smaller number of kernels.

Thanks,

Javier

BoonBengT_Intel
Moderator
2,045 Views

Hi @JavierHormigo,


Apologies for the delay in responding. Noted on the emulation, as that rules out code issues.

Regarding your finding that fitting different numbers of kernels reports different resource figures: as I understand it, that might be due to the partitioning in the design.

The limit seems to be 12 kernels for the device mentioned, as you noted; with more than that, the fitter reports a resource error. If more kernels need to be fitted, a bigger device would be required.

Hope that clarifies.


Best Wishes

BB


JavierHormigo
Beginner
2,037 Views

Hi @BoonBengT_Intel ,

 

Sorry, but I don't understand your answer.  How can I get more information on the fitter process to determine what is happening?

 

Thank you,

Javier

FvM
Valued Contributor III
2,026 Views

Hi,
How much of the device's resources does the "12 kernel" design use, e.g. expressed as a percentage in the summary? Does the resource map indicate that 14 or 16 kernels should fit?

JavierHormigo
Beginner
2,016 Views

The initial report with 12 kernels is:

Resource | Device | Static partition | Quartus Fitter: Total Used (Entire System) | Quartus Fitter: Kernel System | Estimated: Kernel system
ALM | 702720 | 168990 | 645,368 | 415228.0 |
 - ALUT | | | | | 553419
 - REG | 2810880 | 675960 | 1,392,304 | 1022779 | 944410
 - MLAB | | | | 3233 | 4068
RAM | 6847 | 1590 | 3,867 | 2599 | 2147
DSP | 3960 | 786 | 144 | 144 | 144

 

After optimizing the kernels by using the simplest LSU (FIFO) and adding [[intel::max_global_work_dim(0)]] to the functions to eliminate the function overhead, the report for 12 kernels is the one below:

 

Resource | Device | Static partition | Quartus Fitter: Total Used (Entire System) | Estimated: Kernel system
ALM | 702720 | 168990 | TBD |
 - ALUT | | | | 420411
 - REG | 2810880 | 675960 | TBD | 636982
 - MLAB | | | | 3288
RAM | 6847 | 1590 | TBD | 1403
DSP | 3960 | 786 | TBD | 144

 

The estimated utilization is much lower, but the implementation finished with an error similar to this one:

"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."

 

This error is not consistent: sometimes it says more LABs are needed for smaller designs, and the number of LABs the device contains also changes.

I'm not sure I know where to get the resource map.

 

Thanks.

JavierHormigo
Beginner
2,016 Views

I'm not sure if the tables will come through well enough, so I have repeated them as figures. Thanks.

JavierHormigo_0-1696954626874.png

 

JavierHormigo_1-1696954662604.png

 

BoonBengT_Intel
Moderator
1,948 Views

Hi @JavierHormigo,


As I understand it, there are a few reports available. The Quartus/HLS estimated resource utilization summary shows the total area utilization of the entire design and of each component individually. What you are showing is, I believe, the summary of that report; drilling down into it may give an idea of which components are taking up the resources.


There is also an area analysis of system report (Area Analysis > Area Analysis of System), which can be used to identify the parts of the design that have a large area overhead.

Hope that clarifies.


Best Wishes

BB


JavierHormigo
Beginner
1,937 Views

Hi @BoonBengT_Intel 

 

All these detailed reports are "estimated resource utilization" reports. In all of them, the new architecture uses much less area than the original one. However, when it is implemented, the fitter fails. So there is no useful information about this problem in those reports. I need a report on the fitting process to see what's going on, or any insight into why the estimate says the area is reduced when the fitter says it grows.

 

Thanks

BoonBengT_Intel
Moderator
1,843 Views

Hi @JavierHormigo,

 

Apologies for the hold-up. You might already be familiar with the Quartus fitter report that is available.

You can navigate to the report via System Resource Utilization Summary (Summary > System Resource Utilization Summary).

The report shows both what Quartus uses and the oneAPI compiler's estimate.

Just to add to that, you can also drill down to the individual kernels using the next table in the same tab.

Hopefully that will give some insight into which kernels are taking more resources; from there, some optimization of the design would be required to use less area.

 

BoonBengT_Intel_0-1698031149747.png

 

Best Wishes

BB

 

JavierHormigo
Beginner
1,809 Views

Hi @BoonBengT_Intel 

 

But the report you mention only has values if the fitting is successful. It is empty in my case, when the implementation stops with errors.

Thanks,

Javier

BoonBengT_Intel
Moderator
1,785 Views

Hi @JavierHormigo,


From the screenshot you have provided, yes, it is empty, and that makes sense: I suspect the reason is the failure of the hardware compilation (which is what generates the Quartus fitter data).


I would suggest using the 12-kernel design as a baseline (as it is the only one that compiled successfully), looking at both the oneAPI estimate and the Quartus figures in its report, and seeing whether there is any optimization left that we can do in the design to reduce the resources needed.

Hope that clarifies.


Best Wishes

BB


JavierHormigo
Beginner
1,753 Views

Hi @BoonBengT_Intel,

 

I did what you suggested, and that leads us back to my first question: using the 12-kernel design as a baseline, the estimation report says the new version uses around 40% fewer resources. However, the fitter fails to implement the hardware because there are not enough LABs. How is this possible? How can I get more information about the fitter process?

Thanks,

Javier

BoonBengT_Intel
Moderator
1,766 Views

Hi @JavierHormigo,


Greetings, just checking in to see if there are any further doubts regarding this matter.

Hope your doubts have been clarified.


Best Wishes

BB


BoonBengT_Intel
Moderator
1,678 Views

Hi @JavierHormigo,

 

Noted. To clarify, the 40% fewer resources you mention refers to the 12-kernel design, right?

Could you elaborate on how that was observed/calculated?

I checked internally; unfortunately, we do not have a specific report on the fitter process.

However, the closest thing is in the .prj folder created during the hardware compilation: there should be some logs from the fitter stage of the Quartus compile, with file names ending in .rpt.

I would suggest having a quick look at those.

BoonBengT_Intel_0-1698329500250.png
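
As a rough illustration of the suggestion above, the `.rpt` files under a `.prj` folder can be listed with a few lines of Python. This is a minimal sketch, not an official tool: the helper names `find_reports` and `demo`, the folder name `example.prj`, and the file names used in the demo are stand-ins, and the demo builds a throwaway directory so it runs without a real compile.

```python
import tempfile
from pathlib import Path

def find_reports(prj_dir):
    """Return every *.rpt file below prj_dir, sorted by path."""
    root = Path(prj_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.rpt"))

def demo():
    """Build a throwaway folder with placeholder files and list its reports."""
    with tempfile.TemporaryDirectory() as tmp:
        prj = Path(tmp) / "example.prj"  # hypothetical folder name
        (prj / "sub").mkdir(parents=True)
        (prj / "top.fit.rpt").write_text("placeholder\n")
        (prj / "sub" / "other.rpt").write_text("placeholder\n")
        (prj / "notes.txt").write_text("not a report\n")
        return [p.relative_to(prj).as_posix() for p in find_reports(prj)]

print(demo())
```

On a real build, `find_reports` would be pointed at the actual `.prj` folder produced by the hardware compile.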

 

If further details of the fitter process are required, could you provide details of such a report in Quartus? (A screenshot would be great.)

That would enable us to check further.

Looking forward to hearing from you.

 

Best Wishes

BB

 

JavierHormigo
Beginner
1,661 Views

Hi @BoonBengT_Intel ,

 

Sorry, I exaggerated the reduction achieved by the new version of the 12 kernels. It wasn't 40%. The detailed reduction is:

ALUT  25%

REG  33%

MLAB  20%

RAM   35%

which are still very significant. You can extract those figures from the tables I sent on October 10th.
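
For reference, the arithmetic behind these percentages can be sketched as below. The before/after values are the "Estimated: Kernel system" figures as read from the two 12-kernel tables posted earlier in the thread; the helper name `reduction_percent` is only illustrative. The computed values land within about one percentage point of the rounded figures listed above.

```python
# Estimated kernel-system figures (before, after) for the 12-kernel design,
# as read from the two tables posted earlier in this thread.
estimates = {
    "ALUT": (553419, 420411),
    "REG": (944410, 636982),
    "MLAB": (4068, 3288),
    "RAM": (2147, 1403),
}

def reduction_percent(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before

for name, (before, after) in estimates.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% lower")
```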

 

Attached you can find the fitter .rpt files for the initial, successful implementation and for the second one, which fails. I hope those files help to explain what is happening.

Thanks,

Javier

BoonBengT_Intel
Moderator
1,571 Views

Hi @JavierHormigo,


Noted, with thanks for the explanation and the reports.

As for your request for more information on the fitter process, I would suggest looking into the failed compilation logs you have provided. In the 'top.fit.rpt' file you can find detailed logs of how the fitter process went.


If you look at the bottom of the results, there is a mention of the region in which the failure occurred.

Hence I would suggest going through the logs in those files.
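
One way to go through such a log without reading it end to end is to filter it for error lines. The sketch below assumes that error lines in the report start with `Error (<id>):`, as in the message quoted earlier in this thread; the actual layout of `top.fit.rpt` may differ, and the helper name `fitter_errors` and the first line of the sample text are made up for illustration.

```python
import re

# Assumed shape of an error line, based on the message quoted in this thread.
ERROR_LINE = re.compile(r"^\s*Error \((\d+)\): (.*)$")

def fitter_errors(report_text):
    """Return (message_id, text) for each line that looks like an error."""
    found = []
    for line in report_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2).strip()))
    return found

# Sample input: one placeholder line plus the error quoted in the thread.
sample = (
    "placeholder line standing in for the rest of the report\n"
    "Error (170012): Fitter requires 72611 LABs to implement the design, "
    "but the device contains only 66099 LABs.\n"
)

for message_id, text in fitter_errors(sample):
    print(message_id, text)
```

For a real report, the text would come from reading the file, for example with `Path("top.fit.rpt").read_text(errors="replace")`.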

Hope that clarifies.


Best Wishes

BB


JavierHormigo
Beginner
1,526 Views

Hi @BoonBengT_Intel 

 

I have been going through the files you suggested, but I still don't understand why a design that occupies significantly fewer resources fails to fit while the original is implemented with no errors. The report only says that there are not enough LABs in the region, but why?

Is it because the Quartus estimate is so wrong that, instead of the resources being reduced by 30% as it said, they actually grow? Is it because, although fewer resources are needed, they are concentrated in the same region? If so, why? I have not changed anything to force the tool to put everything in the same region, so why doesn't it use a different region? How can I handle this situation?

Too many questions raised.

Thanks,

Javier

BoonBengT_Intel
Moderator
1,542 Views

Hi @JavierHormigo,

Greetings, just checking in to see if there are any further doubts regarding this matter.

Hope your doubts have been clarified.


Best Wishes

BB


BoonBengT_Intel
Moderator
1,454 Views

Hi @JavierHormigo,


Noted on the situation you explained. Allow me to take this back and align internally; I will get back to you as soon as we have an update.

Thank you for your patience.


Best Wishes

BB

