vector_add example - measuring the performance

Altera_Forum · 07-18-2018

Hello,

I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform a floating-point add on 1M elements, so the performance is about 144 MFLOPS. I expected it to be much higher, on the order of 100 GFLOPS. Is there a way to achieve better performance?

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)
Time: 108.505 ms
Kernel time (device 0): 6.931 ms
Verification: PASS
------------------------------------------------------------

Thanks

Pavan

Altera_Forum · 07-19-2018

Your expectation is incorrect: you would not be able to achieve 100 GFLOP/s even on a Stratix V, let alone on the low-end Cyclone V FPGA on that board. Furthermore, Altera's vector add example is only a basic example meant to demonstrate functionality; it is not designed to achieve optimal performance.
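To put rough numbers on this, here is a small sketch in Python that works only from the figures in the log above. It assumes each element costs one single-precision add plus two 4-byte loads and one 4-byte store (12 bytes of memory traffic per FLOP); the function names are made up for illustration and are not part of the SDK or the example.

```python
# Figures taken from the log in the original post.
ELEMENTS = 1_000_000
KERNEL_TIME_S = 6.931e-3  # "Kernel time (device 0): 6.931 ms"

# Assumption: one float add per element, with two 4-byte loads
# and one 4-byte store per element.
FLOPS_PER_ELEMENT = 1
BYTES_PER_ELEMENT = 3 * 4


def achieved_flops(elements, seconds):
    """Floating-point operations per second for the measured run."""
    return elements * FLOPS_PER_ELEMENT / seconds


def effective_bandwidth(elements, seconds):
    """Bytes per second moved to and from memory for the measured run."""
    return elements * BYTES_PER_ELEMENT / seconds


def bandwidth_needed(target_flops):
    """Bytes per second of memory traffic needed to sustain target_flops."""
    return target_flops * BYTES_PER_ELEMENT / FLOPS_PER_ELEMENT


if __name__ == "__main__":
    flops = achieved_flops(ELEMENTS, KERNEL_TIME_S)
    bw = effective_bandwidth(ELEMENTS, KERNEL_TIME_S)
    need = bandwidth_needed(100e9)
    print(f"achieved:  {flops / 1e6:.1f} MFLOP/s")
    print(f"bandwidth: {bw / 1e9:.2f} GB/s")
    print(f"needed for 100 GFLOP/s: {need / 1e12:.1f} TB/s")
```

Under those assumptions the measured run works out to roughly 144 MFLOP/s at about 1.7 GB/s of memory traffic, while 100 GFLOP/s of vector add would need about 1.2 TB/s. Vector add does so little arithmetic per byte that memory bandwidth, not the adders, sets the ceiling.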