Altera_Forum
Honored Contributor I

vector_add example - measuring the performance

Hello, 

 

I ran the vector_add example on the DE10-Standard board and got the following output. The kernel took 6.9 ms to perform the floating-point add on 1M elements, so the performance is around 145 MFLOP/s. I expected it to be much higher, on the order of 100 GFLOP/s. Is there a way to achieve better performance?

 

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)

Time: 108.505 ms
Kernel time (device 0): 6.931 ms

Verification: PASS
--------------------------------------------------

 

Thanks 

Pavan
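
For reference, the arithmetic behind the figure quoted in the question can be reproduced with a few lines of Python. This is only a sketch: the constants are copied from the log above, the function name `throughput_flops` is made up for illustration, and it assumes exactly one floating-point add per element.

```python
# Figures taken from the log in the question.
NUM_ELEMENTS = 1_000_000      # assumption: one floating-point add per element
KERNEL_TIME_S = 6.931e-3      # "Kernel time (device 0): 6.931 ms"
TARGET_FLOPS = 100e9          # the 100 GFLOP/s the poster hoped for


def throughput_flops(num_ops: int, seconds: float) -> float:
    """Return operations per second for num_ops completed in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_ops / seconds


measured = throughput_flops(NUM_ELEMENTS, KERNEL_TIME_S)
shortfall = TARGET_FLOPS / measured

print(f"Measured: {measured / 1e6:.1f} MFLOP/s")
print(f"Shortfall versus 100 GFLOP/s: about {shortfall:.0f}x")
```

With the exact 6.931 ms kernel time this gives roughly 144 MFLOP/s, consistent with the rounded 145 MFLOP/s in the question, and a gap of several hundred times to the 100 GFLOP/s target.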
1 Reply
Altera_Forum
Honored Contributor I

Your expectation is incorrect: you would not be able to achieve 100 GFLOP/s even on a Stratix V, let alone on the low-end Cyclone V FPGA on that board. Furthermore, Altera's vector_add example is just a basic example meant to demonstrate functionality; it is not designed to achieve optimal performance.
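
One way to see why 100 GFLOP/s is out of reach for a vector add is to look at the memory traffic it implies. The following is a back-of-the-envelope sketch, not a measurement: it assumes single-precision (4-byte) operands and that every element is read from and written to external memory with no reuse, so each add moves two loads and one store. The helper name `bytes_per_second` is hypothetical.

```python
BYTES_PER_FLOAT = 4           # assumption: single-precision operands
FLOATS_PER_ADD = 3            # assumption: two loads and one store per add, no reuse

NUM_ELEMENTS = 1_000_000      # from the log in the question
KERNEL_TIME_S = 6.931e-3      # from the log in the question
TARGET_FLOPS = 100e9          # the hoped-for 100 GFLOP/s


def bytes_per_second(adds_per_second: float) -> float:
    """Memory traffic implied by a given rate of vector-add operations."""
    return adds_per_second * FLOATS_PER_ADD * BYTES_PER_FLOAT


required_bw = bytes_per_second(TARGET_FLOPS)
achieved_bw = bytes_per_second(NUM_ELEMENTS / KERNEL_TIME_S)

print(f"Traffic needed for 100 GFLOP/s: {required_bw / 1e12:.1f} TB/s")
print(f"Traffic implied by the measured run: {achieved_bw / 1e9:.2f} GB/s")
```

Under these assumptions the target would need about 1.2 TB/s of memory bandwidth, while the measured run corresponds to under 2 GB/s of traffic. A vector add does so little arithmetic per byte that it is limited by memory bandwidth rather than by the number of adders the FPGA can fit.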
