I have a simple vectorized code with just 4 reads from and 4 writes to DDR. At a clock frequency of 360 MHz I achieve only 10.6% occupancy for the reads, with a bandwidth of 612 MB/s, and the same for the writes. When I lower the frequency (for example to 200 MHz) and increase the vectorization (for example to 8 or 16), I get at most 25% occupancy and a bandwidth of 1600 MB/s for both reads and writes.
The FPGA board that I use is the HAN Pilot Platform. Is there a way to increase the occupancy? I have attached the .cl and host files.
Hi elias94,
May I know whether you are using the design provided by Terasic?
Regards,
Adzim
Hi Adzim,
No, it is not a Terasic design. I modified and used the vector_add design provided by Intel (examples_aoc). With 1 read and 1 write I achieve an occupancy of 77%, but when I increase the number of parallel reads and writes (for example to 2, 4, 8, or 16), my occupancy is always lower than 30%.
Hi elias94,
Are you able to share a link to the resource that you used?
Thank you.
Adzim
Hi @elias94,
Thank you for posting in the Intel community forum; I hope all is well.
My guess is that some operations/loads are not issued regularly by the work-items.
I would suggest using the profiler to check whether there are stalls and what occupancy your code achieves.
A guide explaining how to achieve no stalls and high occupancy can be found below:
- https://www.intel.com/content/www/us/en/docs/programmable/683521/22-1/low-occupancy-percentage.html
Hope that clears up some doubts.
Best Wishes
BB
Thank you for your answer.
I have a simple vectorized code in which I do 4 consecutive reads and then write the values back (4 writes) to DDR. At a clock frequency of 360 MHz I achieve only 10.6% occupancy for the reads, with a bandwidth of 612 MB/s, and the same for the writes (I used the profiler). I also see 50% stalls.
When I lower the frequency (for example to 200 MHz, which I do by inserting shift registers) and increase the vectorization (for example to 8 or 16), I get at most 25% occupancy and a bandwidth of 1600 MB/s for both reads and writes. In that case I have no stalls at all.
The trip count of my loops is the same, so I can't see how the link you sent can help me.
Best Wishes
elias94
Hi @elias94,
Apologies for the delayed response. We have looked into the report you mentioned and need some clarification, as below.
Could you let us know which version of the profiler is being used? Is it the traditional profiler or VTune? (We noticed that some files needed to view the report are missing.)
Also, which version of OpenCL is being used?
On our end we are facing some problems compiling the provided files and are troubleshooting; we will get back to you as soon as we have some results.
Hope to hear from you soon.
Best Wishes
BB
Thank you for your answer.
The version is 19.1 and I use the traditional profiler
(../altera/quartus19.1/hld/bin/aoc -board=a10s_ddr device/vector_add.cl -o bin/vector_add.aocx -profile)
Best regards
Ilias
Hi @elias94,
Thank you for your patience. As you mentioned that you are using version 19.1, I would recommend referring to section 2.6 of the guide below:
https://www.intel.com/content/www/us/en/docs/programmable/683521/19-1/introduction.html
It explains step by step how to optimize an OpenCL design example based on the generated report.
Hope that clarifies things.
Best Wishes
BB
Hi @elias94,
Good day. Just checking in to see whether you have any further questions regarding this matter.
I hope we have clarified your doubts.
Best Wishes
BB
Hi @elias94,
Greetings. As we have not received any further response to the information provided, we will assume the challenge has been overcome. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.
Best Wishes
BB