Hello,
I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform the floating-point add on 1M elements, so the performance is around 145 MFLOPS. I expected much higher performance, on the order of 100 GFLOPS. Is there a way to achieve better performance?

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
  de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)
Time: 108.505 ms
Kernel time (device 0): 6.931 ms
Verification: PASS
--------------------------------------------------

Thanks,
Pavan
1 Reply
Your expectation is incorrect: you would not be able to achieve 100 GFLOP/s even on a Stratix V, let alone on the low-end Cyclone V FPGA on that board. Furthermore, Altera's vector add example is only a basic example that demonstrates functionality; it is not designed to achieve optimal performance.
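
For a rough sense of scale, here is a back-of-envelope sketch using the numbers from the log in the question. It assumes 32-bit floats and that each addition streams two loads and one store (12 bytes) through external memory with no data reuse; those are assumptions for illustration, not measurements from the board.

```python
# Back-of-envelope check using the numbers reported in the question's log.
ELEMENTS = 1_000_000        # "Launching for device 0 (1000000 elements)"
KERNEL_TIME_S = 6.931e-3    # "Kernel time (device 0): 6.931 ms"

# Assumption: 32-bit floats, two loads + one store per addition, no reuse.
BYTES_PER_ADD = 3 * 4


def flops(elements, seconds):
    """One floating-point add per element, so FLOP/s = elements / time."""
    return elements / seconds


def bytes_per_second(flop_rate, bytes_per_op=BYTES_PER_ADD):
    """Memory traffic implied by a given add rate under the assumption above."""
    return flop_rate * bytes_per_op


measured = flops(ELEMENTS, KERNEL_TIME_S)
print(f"measured: {measured / 1e6:.1f} MFLOP/s, "
      f"{bytes_per_second(measured) / 1e9:.2f} GB/s of traffic")

target = 100e9  # the 100 GFLOP/s figure from the question
print(f"100 GFLOP/s would need {bytes_per_second(target) / 1e12:.1f} TB/s")
```

Under these assumptions the measured run works out to about 144.3 MFLOP/s and roughly 1.73 GB/s of memory traffic, while 100 GFLOP/s of vector add would imply about 1.2 TB/s. Since vector add does one operation per 12 bytes moved, its throughput is likely limited by memory bandwidth rather than by the number of adders in the FPGA fabric.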