
vector_add example - measuring the performance

Altera_Forum

Hello, 

 

I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform a floating-point add on 1M elements, so the performance is around 145 MFLOPS. I expected it to be much higher, on the order of 100 GFLOPS. Is there a way to achieve better performance?

 

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)

Time: 108.505 ms
Kernel time (device 0): 6.931 ms

Verification: PASS
--------------------------------------------------

Thanks 

Pavan
Altera_Forum

That expectation is unrealistic: you would not reach 100 GFLOP/s even on a Stratix V, let alone on the low-end Cyclone V FPGA on that board. Furthermore, Altera's vector_add example is only a basic demonstration of functionality and is not designed for optimal performance.
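
To put the numbers from the log in context, here is a rough back-of-the-envelope sketch in Python. It is not part of the vector_add example. It assumes single-precision floats (4 bytes each), two reads and one write per element, and one add per element; the function names are made up for illustration.

```python
# Figures taken from the log in the question.
ELEMENTS = 1_000_000
KERNEL_TIME_S = 6.931e-3  # "Kernel time (device 0): 6.931 ms"

# Assumptions (not stated in the thread): 4-byte floats, and each
# element costs two input reads, one output write, and one add.
BYTES_PER_FLOAT = 4
ACCESSES_PER_ELEMENT = 3
FLOPS_PER_ELEMENT = 1


def achieved_flops(elements, seconds):
    """Floating-point operations per second for an element-wise add."""
    return elements * FLOPS_PER_ELEMENT / seconds


def effective_bandwidth(elements, seconds):
    """Bytes per second moved to and from memory by the kernel."""
    return elements * ACCESSES_PER_ELEMENT * BYTES_PER_FLOAT / seconds


def bandwidth_needed(target_flops):
    """Memory bandwidth (bytes/s) a vector add needs to sustain target_flops."""
    return target_flops / FLOPS_PER_ELEMENT * ACCESSES_PER_ELEMENT * BYTES_PER_FLOAT


if __name__ == "__main__":
    flops = achieved_flops(ELEMENTS, KERNEL_TIME_S)
    bw = effective_bandwidth(ELEMENTS, KERNEL_TIME_S)
    need = bandwidth_needed(100e9)
    print(f"Achieved throughput:        {flops / 1e6:.1f} MFLOP/s")
    print(f"Effective memory bandwidth: {bw / 1e9:.2f} GB/s")
    print(f"Bandwidth for 100 GFLOP/s:  {need / 1e9:.0f} GB/s")
```

Under those assumptions the run above works out to roughly 144 MFLOP/s and about 1.7 GB/s of memory traffic, while a vector add at 100 GFLOP/s would need about 1.2 TB/s. A vector add does one operation per 12 bytes moved, so its speed is set mostly by memory bandwidth rather than by the amount of floating-point logic on the device.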
