
Low DDR occupancy on the HAN Pilot Platform with OpenCL

elias94
Beginner

I have a simple vectorized kernel with just 4 reads and 4 writes to DDR. At a clock frequency of 360 MHz, I achieve only 10.6% occupancy for the reads, with a bandwidth of 612 MB/s, and the same for the writes. When I lower the frequency (for example, to 200 MHz) and increase the vectorization (for example, to 8 or 16), I get at most 25% occupancy and a bandwidth of 1600 MB/s for both reads and writes.

The FPGA board that I use is the HAN Pilot Platform. Is there a way to increase the occupancy? I have attached the .cl and host files.
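
As a sanity check on the figures above, the reported bandwidth can be related to the occupancy and the clock frequency. The sketch below is only an illustration: it assumes that bandwidth is roughly occupancy * kernel clock * bytes per access, which is a simplification and not a formula quoted from the profiler documentation. The function name `implied_bytes_per_access` is hypothetical.

```python
def implied_bytes_per_access(bandwidth_mb_s, occupancy, clock_mhz):
    """Bytes moved per occupied clock cycle.

    Assumes bandwidth = occupancy * clock * bytes_per_access
    (an assumption made for this sketch, not a documented formula).
    MB/s divided by MHz gives bytes per cycle directly.
    """
    occupied_cycles_per_us = occupancy * clock_mhz
    return bandwidth_mb_s / occupied_cycles_per_us


# Figures reported in the post: (bandwidth in MB/s, occupancy, clock in MHz)
reported = {
    "360 MHz run": (612, 0.106, 360),
    "200 MHz run": (1600, 0.25, 200),
}

for label, (bw, occ, clk) in reported.items():
    width = implied_bytes_per_access(bw, occ, clk)
    # Bandwidth the same access width would reach at 100% occupancy
    ceiling = width * clk
    print(f"{label}: ~{width:.1f} bytes/access, ~{ceiling:.0f} MB/s at full occupancy")
```

Under that assumption, the 360 MHz figures imply about 16 bytes per access and the 200 MHz figures imply 32 bytes, which would match 4-wide and 8-wide vectors of 4-byte elements (the element size is a guess, since the attached kernel is not reproduced here). The reported bandwidth therefore appears to track the occupancy directly, so the low occupancy itself is what needs explaining.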

AdzimZM_Intel
Employee

Hi elias94,


May I know if you're using the design provided by Terasic?



Regards,

Adzim


elias94
Beginner

Hi Adzim,

No, it is not a Terasic design. I modified the vector_add design provided by Intel (examples_aoc). With 1 read and 1 write, I achieve 77% occupancy. But when I increase the number of reads and writes in parallel (for example, to 2, 4, 8, or 16), my occupancy is always lower than 30%.

AdzimZM_Intel
Employee

Hi elias94,


Are you able to share a link to the resource that you used?


Thank you.

Adzim


elias94
Beginner

Here is the report.

BoonBengT_Intel
Moderator

Hi @elias94,


Thank you for posting in the Intel community forum; I hope all is well.

My guess is that some operations/loads are not accessed regularly by the work-items.

I would suggest using the profiler to check whether there are stalls and what occupancy your code achieves.


A guide explaining how to achieve no stalling and high occupancy can be found below:

- https://www.intel.com/content/www/us/en/docs/programmable/683521/22-1/low-occupancy-percentage.html

Hope that clears up some doubts.


Best Wishes

BB


elias94
Beginner

Hi @BoonBengT_Intel,

Thank you for your answer.

I have a simple vectorized kernel where I do 4 consecutive reads and then write them back (4 writes) to DDR. At a clock frequency of 360 MHz, I achieve only 10.6% occupancy for the reads, with a bandwidth of 612 MB/s, and the same for the writes (I used the profiler). I also see 50% stalls.

When I lower the frequency (for example, to 200 MHz; I lower it by inserting shift registers) and increase the vectorization (for example, to 8 or 16), I get at most 25% occupancy and a bandwidth of 1600 MB/s for both reads and writes. I have no stalls at all.

The trip count of my loops is the same, so I can't see how the link you sent can help me.

Best wishes,

elias94

BoonBengT_Intel
Moderator

Hi @elias94,


Apologies for the delay in responding. We have looked into the report you mentioned and need some clarification, as below.

Could you let us know which version of the profiler is being used? Is it the traditional profiler or VTune? (We noticed that some files needed to view the report are missing.)

Also, what version of OpenCL is being used?

On our end, we are facing some problems compiling the provided files and are troubleshooting; we will get back to you as soon as we have some results.

Hope to hear from you soon.


Best Wishes

BB


elias94
Beginner

Hi @BoonBengT_Intel,

Thank you for your answer.

The version is 19.1, and I use the traditional profiler:

(../altera/quartus19.1/hld/bin/aoc -board=a10s_ddr device/vector_add.cl -o bin/vector_add.aocx -profile)

Best regards,

Ilias

BoonBengT_Intel
Moderator

Hi @elias94,


Thank you for your patience. As you mentioned that you are using version 19.1, we would recommend referring to section 2.6 of the guide below.

https://www.intel.com/content/www/us/en/docs/programmable/683521/19-1/introduction.html


It explains step by step how to optimize an OpenCL design example based on the generated report.

Hope that clarifies things.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @elias94,


Good day. Just checking in to see whether you have any further doubts regarding this matter.

Hope we have clarified your doubts.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @elias94,


Greetings. As we have not received any further clarification on what was provided, we will assume the challenges have been overcome. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.


Best Wishes

BB

