Re: Strange behavior of Quartus Fitter and how to get more information

JavierHormigo · ‎09-27-2023

Hi,

I'm designing an accelerator for DTW computation using oneAPI on a Stratix 10, on the BittWare 520N-MX Gen3x16 board. I have a kernel (actually several different kernels connected with pipes) that I replicate as many times as possible to get the maximum throughput. The different kernel instances work with different input data.

In one of the versions, I fitted 12 kernels in the FPGA. Then, for that kernel, I simplified the external memory interfaces and the "function overhead" (using oneAPI pragmas). The compiler's estimated resource utilization shows a reduction of more than 30% per kernel. However, the Fitter failed to place more than 12 kernels on the FPGA. What seems even stranger to me is that if I try to compile 16 kernels I get the error:

"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."

But if I try to compile 14 kernels (same clock target), I get:

"Error (170012): Fitter requires 73646 LABs to implement the design, but the device contains only 66439 LABs"

How could 14 identical kernels need more LABs than 16?
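To compare several failed runs side by side, the two numbers in each message can be extracted and the shortfall computed. The following is a minimal Python sketch; the regular expression is written against the wording of the two messages quoted above only, and the helper names are hypothetical.

```python
import re

# The two messages quoted above, keyed by the number of kernels compiled.
MESSAGES = {
    16: "Error (170012): Fitter requires 72611 LABs to implement the design, "
        "but the device contains only 66099 LABs.",
    14: "Error (170012): Fitter requires 73646 LABs to implement the design, "
        "but the device contains only 66439 LABs",
}

# Assumption: the message wording matches the quotes above; other Quartus
# versions may phrase it differently.
LAB_ERROR = re.compile(r"requires (\d+) LABs.*?contains only (\d+) LABs")


def parse_lab_error(message):
    """Return (required, available) LAB counts from a LAB-shortage message."""
    match = LAB_ERROR.search(message)
    if match is None:
        raise ValueError("not a LAB-shortage message: " + message)
    return int(match.group(1)), int(match.group(2))


def lab_shortfall(message):
    """Return how many LABs the fitter is missing."""
    required, available = parse_lab_error(message)
    return required - available


for kernels, message in sorted(MESSAGES.items()):
    required, available = parse_lab_error(message)
    over = 100.0 * (required - available) / available
    print(f"{kernels} kernels: required={required}, available={available}, "
          f"shortfall={required - available} LABs ({over:.1f}% over)")
```

Both the required and the available LAB counts differ between the two runs, so the two messages do not refer to the same budget.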

I have tried other numbers of kernels and clock frequencies, and the results are very unpredictable.

Any idea why the estimate of resource utilization is so wrong? How can I get more information on the fitter process to try to figure out what is happening?

Thanks.

BoonBengT_Intel · ‎10-01-2023

Hi @JavierHormigo,

Thank you for posting in the Intel community forum; I hope all is well.

A quick clarification on the situation: do the errors mentioned here occur during hardware compilation?

The emulation compilation is successful, right?

Note: unfortunately, the hardware and BSP come from the vendor, BittWare, so we have limited understanding of the build and architecture involved. I might not be the right person to provide the exact solution here, but we will try our best to help.

Best Wishes

BB

JavierHormigo · ‎10-01-2023

Hi BoonBeng,

Yes, it is during hardware compilation. Emulation is successful, and hardware compilation is even successful for a smaller number of kernels.

Thanks,

Javier

BoonBengT_Intel · ‎10-09-2023

Hi @JavierHormigo,

Apologies for the delayed response. Noted on the emulation, as that rules out code issues.

You found that fitting different numbers of kernels gives different resource figures; as I understand it, that might be due to the partitioning in the design.

The kernel limit seems to be 12 for the device mentioned, as you said, since anything more than that causes resource errors. If more kernels need to be fitted, a bigger device would be required.

Hope that clarifies.

Best Wishes

BB

JavierHormigo · ‎10-09-2023

Hi @BoonBengT_Intel ,

Sorry, but I don't understand your answer. How can I get more information on the fitter process to determine what is happening?

Thank you,

Javier

FvM · ‎10-10-2023

Hi,
How much of the device resources does the "12 kernel" design utilize, e.g. expressed as a percentage in the summary? Does the resource map indicate that 14 or 16 kernels should fit?

JavierHormigo · ‎10-10-2023

The initial report with 12 kernels is:

Resource  Device   Static     Quartus Fitter:  Quartus Fitter:  Estimated:
                   partition  Total Used       Kernel System    Kernel system
                              (Entire System)

ALM       702720   168990     645,368          415228.0
- ALUT                                                          553419
- REG     2810880  675960     1,392,304        1022779          944410
- MLAB                                         3233             4068
RAM       6847     1590       3,867            2599             2147
DSP       3960     786        144              144              144
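The percentage figures asked about earlier in the thread can be derived from this table. The following is a minimal Python sketch, assuming the first numeric column is the device total and the third is the fitter's "Total Used (Entire System)"; the function name is hypothetical.

```python
# Values copied from the 12-kernel table above: (device total, fitter total used).
# Assumption: column 1 is the device capacity and column 3 is the fitter's
# "Total Used (Entire System)" figure.
FITTER_TOTALS = {
    "ALM": (702720, 645368),
    "REG": (2810880, 1392304),
    "RAM": (6847, 3867),
    "DSP": (3960, 144),
}


def utilization_percent(device_total, used):
    """Return the used share of a device resource, in percent."""
    return 100.0 * used / device_total


for resource, (device_total, used) in FITTER_TOTALS.items():
    print(f"{resource}: {utilization_percent(device_total, used):.1f}% of device")
```

Under that reading of the columns, ALM utilization for the 12-kernel design already comes out close to 92%, while registers, RAM and DSP blocks are much less loaded.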

After optimizing the kernels by using the simplest LSU (FIFO) and [[intel::max_global_work_dim(0)]] in the functions to eliminate the function overhead, the report for 12 kernels is the one below:

Resource  Device   Static     Quartus Fitter:  Estimated:
                   partition  Total Used       Kernel system
                              (Entire System)

ALM       702720   168990     TBD
- ALUT                                         420411
- REG     2810880  675960     TBD              636982
- MLAB                                         3288
RAM       6847     1590       TBD              1403
DSP       3960     786        TBD              144

The estimated utilization is much lower, but the implementation finished with an error similar to this:

"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."

This error is not consistent: sometimes it says more LABs are needed for smaller designs, and the number of LABs the device contains also changes.

I'm not sure where I should get the resource map.

Thanks.

JavierHormigo · ‎10-10-2023

I'm not sure the tables will come through well enough, so I have repeated them as figures. Thanks.

BoonBengT_Intel · ‎10-16-2023

Hi @JavierHormigo,

As I understand it, there are a few reports available. The Quartus/HLS estimated resource utilization summary shows the total area utilization of the entire design and of each component individually. I believe what you are showing is the summary of that report; zooming in may give an idea of which components are taking up the resources.

There is also an area analysis of system report (Area Analysis > Area Analysis of System), which can be used to identify parts of the design that have a large area overhead.

Hope that clarifies.

Best Wishes

BB

JavierHormigo · ‎10-16-2023

Hi @BoonBengT_Intel

All these detailed reports are "estimated resource utilization" reports. In all of them, the new architecture uses much less area than the original one. However, when implemented, the fitter fails. So there is no useful information about this problem in those reports. I need a report on the fitting process to see what's going on, or any insight into why the estimate says the area is reduced when the fitter says it grows.

Thanks

BoonBengT_Intel · ‎10-22-2023

Hi @JavierHormigo,

Apologies for the hold-up. You might already be familiar with the available Quartus fitter report.

You can navigate to it via System Resource Utilization Summary (Summary > System Resource Utilization Summary).

The report shows both what Quartus uses and the oneAPI compiler's estimate.

In addition, you can drill down to the individual kernels in the next table on the same tab.

Hopefully that will give some insight into which kernels are taking more resources; from there, some optimization of the design would be required to use less area.

Best Wishes

BB

JavierHormigo · ‎10-23-2023

Hi @BoonBengT_Intel

But the report you mention only has values if the fitting is successful. It is empty in my case, where the implementation stops with errors.

Thanks,

Javier

BoonBengT_Intel · ‎10-24-2023

Hi @JavierHormigo,

From the screenshot you provided, yes, it is empty. That makes sense, as I suspect the reason is the failure of the hardware compilation (which is what generates the Quartus fitter data).

I would suggest using the 12-kernel design as a baseline (as it is the only one that compiled successfully), looking at both the oneAPI estimate and the Quartus figures in the report, and seeing whether there is any optimization left in the design that would reduce the resources needed.

Hope that clarifies.

Best Wishes

BB

JavierHormigo · ‎10-24-2023

Hi @BoonBengT_Intel,

I did what you suggested, and that leads us back to my first question: using the 12-kernel design as a baseline, the estimation report says the new version uses around 40% fewer resources. However, the fitter fails to implement the hardware because there are not enough LABs. How is this possible? How can I get more information about the fitter process?

Thanks,

Javier

BoonBengT_Intel · ‎10-24-2023

Hi @JavierHormigo,

Greetings, just checking in to see if there are any further doubts regarding this matter.

Hope your doubts have been clarified.

Best Wishes

BB

BoonBengT_Intel · ‎10-26-2023

Hi @JavierHormigo,

Noted. To clarify, the 40% fewer resources you mention refers to the 12-kernel design, right?

Could you elaborate on how that is being observed/calculated?

I managed to check internally; unfortunately, we do not have a specific report related to the fitter process.

However, the closest thing to that is in the .prj folder created during hardware compilation: there should be some fitter logs from the Quartus compile, which end with .rpt.

I would suggest having a quick look at those.

If further details of the fitter process are required, could you provide details of such a report in Quartus? (A screenshot would be great.)

That would enable us to check further with those details.

Looking forward to hearing from you.

Best Wishes

BB

JavierHormigo · ‎10-26-2023

Hi @BoonBengT_Intel ,

Sorry, I exaggerated the reduction achieved by the new version of the 12 kernels. It wasn't 40%. The detailed reduction is:

ALUT 25%

REG 33%

MLAB 20%

RAM 35%

which are still very significant. You can derive those figures from the tables I sent on October 10th.

Attached you can find the .rpt files from the fitter for the initial, successful implementation and for the second one, which fails. I hope those files help to understand what is happening.

Thanks,

Javier
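The derivation can be sketched as follows. This is a minimal Python sketch that assumes the "Estimated: Kernel system" column of the two October 10th tables is the basis for the comparison; the function name is hypothetical.

```python
# Estimated kernel-system figures for 12 kernels, copied from the two tables
# posted on October 10th (assumption: the "Estimated: Kernel system" column).
ORIGINAL = {"ALUT": 553419, "REG": 944410, "MLAB": 4068, "RAM": 2147}
OPTIMIZED = {"ALUT": 420411, "REG": 636982, "MLAB": 3288, "RAM": 1403}


def reduction_percent(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


for resource in ORIGINAL:
    saved = reduction_percent(ORIGINAL[resource], OPTIMIZED[resource])
    print(f"{resource}: {saved:.1f}% fewer than the original design")
```

The computed values land slightly below the rounded percentages listed above.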

BoonBengT_Intel · ‎10-29-2023

Hi @JavierHormigo,

Noted, with thanks for the explanation and the reports.

As for the request for more information on the fitter process, I would suggest looking into the failed compilation logs you provided. In the 'top.fit.rpt' file you can find detailed logs of how the fitter process went.

Near the bottom of the results, there is a mention of the region in which the failure occurred.

Hence I would suggest going through the logs in those files.

Hope that clarifies.
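One way to go through those logs is to collect the error lines from every fitter report under the compilation folder. The following is a minimal Python sketch; it assumes the reports are named like 'top.fit.rpt' and that error lines start with "Error", as in the message quoted earlier in the thread. The function name and the folder name in the usage note are hypothetical.

```python
from pathlib import Path


def find_fitter_errors(project_dir, pattern="*.fit.rpt"):
    """Collect (file name, line number, text) for lines starting with 'Error'.

    Assumption: fitter reports match `pattern` somewhere below `project_dir`,
    and error messages begin with the word 'Error'.
    """
    hits = []
    for report in sorted(Path(project_dir).rglob(pattern)):
        # errors="replace" keeps the scan going if a report has odd bytes.
        with report.open(errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if line.lstrip().startswith("Error"):
                    hits.append((report.name, line_number, line.strip()))
    return hits
```

For example, `find_fitter_errors("my_design.prj")` would return the error lines of every matching report below that folder, which makes it easier to compare the successful and the failing compilation.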

Best Wishes

BB

JavierHormigo · ‎11-01-2023

Hi @BoonBengT_Intel

I have been going through the files you suggested, but I still don't understand why a design that occupies significantly fewer resources fails to fit while the original is implemented with no errors. The report only says that there are not enough LABs in the region, but why?

Is it because the Quartus estimate is so wrong that, instead of shrinking by 30% as it said, the resources actually grow? Is it because, although there are fewer resources, they are concentrated in the same region? If so, why? I have not changed anything to force the tool to put everything in the same region, so why doesn't it use a different region? How can I handle this situation?

Too many questions raised.

Thanks,

Javier

BoonBengT_Intel · ‎10-31-2023

Hi @JavierHormigo,

Greetings, just checking in to see if there are any further doubts regarding this matter.

Hope your doubts have been clarified.

Best Wishes

BB

BoonBengT_Intel · ‎11-06-2023

Hi @JavierHormigo,

Noted on the situation you explained. Allow me to take this back and align internally; I will get back to you as soon as we have an update.

Thank you for your patience.

Best Wishes

BB
