In the Arria 10 product table, I see the peak floating-point performance of the Arria 10 GX 1150 is 1366 GFLOPS, and the number of hardened single-precision floating-point multipliers/adders is 1518.
So, in one clock cycle, the FPGA can use 1518 floating-point multipliers/adders, and the peak floating-point performance should be 1518 × frequency. Assuming a frequency of 500 MHz, that gives 1518 × 500 MHz = 759 GFLOPS. So I don't quite understand why the peak floating-point performance is 1366 GFLOPS. What does this figure mean for FPGA programs? Or is my understanding totally wrong?
Thank you very much for your help!
And @ 900 MHz => 900 × 1518 = 1366 GFLOPs. (I'm not saying this is how they spec'd it, just my observation.)
Arria 10 will certainly run bits of logic at 900 MHz, but running 1518 multipliers at 900 MHz is another matter entirely, especially when the datasheet specifies a 548 MHz max speed for floating-point multiplication in a speed-grade-1 part.
Nothing concrete to go on, I'm afraid. I'd put it down to the art of writing a spec.
1366 GFLOP/s is actually at 450 MHz. What you guys are missing is the fact that each DSP on Arria 10 can perform one single-precision floating-point FMA (fused multiply-add) operation per clock, which counts as two FLOPs. 450 × 2 × 1518 = 1366 GFLOP/s. The peak Fmax of the DSPs on Arria 10 is around 450-500 MHz depending on the speed grade. Realistically, even in the best-optimized designs, the achievable throughput on this device is around 700-800 GFLOP/s. Even Intel's highly optimized matrix multiplication library can hardly achieve over 900 GFLOP/s, based on their own paper.
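To make the two calculations in this thread easy to compare, here's a quick sketch in Python (the 1518 DSP count and the clock figures are the ones quoted above, not an independent datasheet lookup):

```python
# Peak FLOP/s estimate for Arria 10 GX 1150, as discussed in this thread.
DSP_BLOCKS = 1518   # hardened single-precision FP DSP blocks
FLOPS_PER_FMA = 2   # one fused multiply-add counts as two FLOPs

def peak_gflops(clock_mhz, flops_per_dsp_per_cycle):
    """Peak GFLOP/s = DSP blocks * FLOPs per DSP per cycle * clock (GHz)."""
    return DSP_BLOCKS * flops_per_dsp_per_cycle * clock_mhz / 1000.0

# Counting only one multiply OR add per DSP per cycle at 500 MHz
# (the original question's assumption):
print(peak_gflops(500, 1))              # 759.0 GFLOP/s -- not the spec'd figure

# Counting each DSP as one FMA (2 FLOPs) per cycle at 450 MHz:
print(peak_gflops(450, FLOPS_PER_FMA))  # 1366.2 GFLOP/s -- matches the datasheet
```

The mismatch in the original question comes entirely from counting one FLOP per DSP per cycle instead of two.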