Intel® Quartus® Prime Software

vector_add example - measuring the performance

Honored Contributor II



I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform a floating-point add on 1M elements, so the performance is around 145 MFLOPS. I expected it to be much higher, on the order of 100 GFLOPS. Is there a way to achieve better performance?



Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)

Time: 108.505 ms
Kernel time (device 0): 6.931 ms

Verification: PASS
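For reference, the throughput figure can be rechecked directly from the numbers in the output above. This is only a rough sketch: it assumes one floating-point add per element, and the helper name `achieved_flops` is made up for illustration. The output does not say exactly what the wall-clock "Time" line covers, so the second figure is just the same calculation applied to that number.

```python
# Figures copied from the program output above.
num_elements = 1_000_000     # "Launching for device 0 (1000000 elements)"
kernel_time_s = 6.931e-3     # "Kernel time (device 0): 6.931 ms"
total_time_s = 108.505e-3    # "Time: 108.505 ms"

# Assumption: vector_add performs exactly one float add per element.
FLOPS_PER_ELEMENT = 1


def achieved_flops(elements, seconds, flops_each=FLOPS_PER_ELEMENT):
    """Return floating-point operations per second (hypothetical helper)."""
    return elements * flops_each / seconds


kernel_mflops = achieved_flops(num_elements, kernel_time_s) / 1e6
end_to_end_mflops = achieved_flops(num_elements, total_time_s) / 1e6

print(f"Kernel only : {kernel_mflops:.1f} MFLOP/s")
print(f"Wall clock  : {end_to_end_mflops:.1f} MFLOP/s")
```

With the exact 6.931 ms kernel time this comes out at roughly 144 MFLOP/s, which matches the estimate in the post.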




Honored Contributor II

Your expectation is incorrect. You would not be able to achieve 100 GFLOP/s even on a Stratix V, let alone on the low-end Cyclone V FPGA on that board. Furthermore, Altera's vector add example is only a basic example meant to show functionality; it is not designed to achieve optimal performance.
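One way to see why 100 GFLOP/s is out of reach for this particular kernel is a back-of-the-envelope bandwidth estimate. The sketch below assumes single-precision data and that every add reads two operands from memory and writes one result with no reuse, which is how a plain vector add behaves. The function names are hypothetical, and the sketch deliberately does not state the board's memory bandwidth; compare the results against the memory specification of your own board.

```python
# Assumption: 4-byte single-precision floats.
BYTES_PER_FLOAT = 4
# Assumption: each add loads a[i] and b[i] and stores c[i], with no data reuse.
BYTES_PER_FLOP = 3 * BYTES_PER_FLOAT


def required_bandwidth_gbs(target_gflops):
    """Memory bandwidth (GB/s) needed to sustain target_gflops of vector add."""
    return target_gflops * BYTES_PER_FLOP


def achieved_bandwidth_gbs(elements, seconds):
    """Effective memory traffic (GB/s) implied by a measured kernel time."""
    return elements * BYTES_PER_FLOP / seconds / 1e9


required = required_bandwidth_gbs(100.0)               # the 100 GFLOP/s target
achieved = achieved_bandwidth_gbs(1_000_000, 6.931e-3)  # the measured run

print(f"Bandwidth needed for 100 GFLOP/s: {required:.0f} GB/s")
print(f"Bandwidth implied by the run    : {achieved:.2f} GB/s")
```

Under these assumptions, 100 GFLOP/s of vector add would need about 1200 GB/s of memory traffic, while the measured run moves about 1.7 GB/s. For a kernel like this, memory bandwidth is likely to limit throughput well before the floating-point logic does.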