Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs

How do I compute or test the peak floating-point performance of Arria 10?

LJing2

In the Arria 10 product table, I see that the peak floating-point performance of the Arria 10 GX1150 is 1366 GFLOPS, and the number of hardened single-precision floating-point multipliers/adders is 1518.

So, in each clock cycle the FPGA can use 1518 floating-point multipliers/adders, and its peak floating-point performance should be 1518 * frequency. Assuming a 500 MHz clock, that gives 1518 * 500 MHz = 759 GFLOPS. So I don't quite understand why the peak floating-point performance is listed as 1366 GFLOPS. What does this figure mean for real FPGA programs? Or is my understanding totally wrong?
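Written out, the calculation in the question assumes that each hardened unit completes exactly one floating-point operation per clock cycle, with N the number of hardened units and f the clock frequency:

```latex
P_{\text{peak}} = N \times f = 1518 \times 500\ \text{MHz} = 759{,}000\ \text{MFLOP/s} = 759\ \text{GFLOPS}
```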

Thanks a lot for your help!

a_x_h_75

And @ 900 MHz => 900 * 1518 = 1366 GFLOPS. (I'm not saying this is how they spec'd it, just an observation.)

Arria 10 will certainly run some logic at 900 MHz, but running 1518 multipliers at 900 MHz is another matter, especially when the datasheet specifies 548 MHz as the maximum speed for floating-point multiplication in a grade 1 part.

Nothing concrete to go on, I'm afraid. I'd put it down to the art of writing a spec.

Cheers,

Alex

HRZ

1366 GFLOP/s is actually at 450 MHz. What you guys are missing is that each DSP on Arria 10 can perform one single-precision floating-point FMA (fused multiply-add) operation per clock, which counts as two FLOPs: 450 MHz * 2 * 1518 = 1366 GFLOP/s. The peak Fmax of the DSPs on Arria 10 is around 450-500 MHz, depending on the speed grade. Realistically, even in the best-optimized designs, the achievable throughput on this device is around 700-800 GFLOP/s. Even Intel's highly optimized matrix multiplication library can hardly exceed 900 GFLOP/s, based on their own paper.
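The arithmetic in this thread can be reproduced with a small Python sketch. It assumes the idealized model peak = units * FLOPs per cycle * clock, with every DSP busy on every cycle and no routing, stall, or memory-bandwidth limits. The helper names `peak_gflops` and `efficiency` are hypothetical illustrations, not part of any Intel tool. The sketch shows that the FMA reading (2 FLOPs at 450 MHz) and the one-operation reading at 900 MHz produce the same 1366.2 GFLOP/s, and it expresses the realistic figures quoted above as a fraction of that peak.

```python
# Idealized peak-throughput model (assumption): every hardened DSP issues
# one operation per clock with no stalls, routing, or bandwidth limits.

DSP_BLOCKS = 1518  # hardened single-precision FP DSPs on Arria 10 GX1150


def peak_gflops(units: int, flops_per_cycle: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOP/s (units * FLOPs/cycle * MHz gives MFLOP/s)."""
    return units * flops_per_cycle * clock_mhz / 1000.0


def efficiency(achieved_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a design actually sustains."""
    return achieved_gflops / peak


if __name__ == "__main__":
    # FMA counted as 2 FLOPs per DSP per cycle at 450 MHz
    fma_peak = peak_gflops(DSP_BLOCKS, 2, 450)
    # One FLOP per DSP per cycle at 900 MHz
    single_op_900 = peak_gflops(DSP_BLOCKS, 1, 900)
    # One FLOP per DSP per cycle at 500 MHz (the question's assumption)
    single_op_500 = peak_gflops(DSP_BLOCKS, 1, 500)

    print(f"FMA @ 450 MHz:  {fma_peak:.1f} GFLOP/s")
    print(f"1 op @ 900 MHz: {single_op_900:.1f} GFLOP/s")
    print(f"1 op @ 500 MHz: {single_op_500:.1f} GFLOP/s")

    # Realistic sustained figures mentioned in the thread, relative to peak
    for achieved in (700, 800, 900):
        print(f"{achieved} GFLOP/s is {efficiency(achieved, fma_peak):.0%} of peak")
```

Under this model, a design sustaining 700-800 GFLOP/s reaches only about half to 60% of the headline figure, which is consistent with the point that the spec-sheet number is a theoretical ceiling rather than an expected result.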
