Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Stable argument doesn't work in simulation

DorianL
Novice
4,657 Views

Hi everyone,

 

I have an issue when I run my oneAPI kernel with arguments passed as "stable" annotated_arg. When I use those "stable" arguments as loop bounds in a "for" loop, the simulation is very slow and does not behave correctly, whereas when I use a plain "int" declared inside the kernel (not a kernel argument) the same loop simulates quickly and correctly. Do you have an idea of what the issue could be? Thank you!
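For context, a minimal sketch of the kind of declaration being described, assuming the oneAPI FPGA extensions (the exact namespaces and property names vary between oneAPI versions, and this fragment requires the Intel oneAPI FPGA toolchain to compile):

```cpp
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

namespace exp = sycl::ext::oneapi::experimental;
namespace iexp = sycl::ext::intel::experimental;

struct MyKernel {
  // Loop bound passed as a kernel argument; 'stable' promises the host
  // will not change the value while the kernel is executing.
  exp::annotated_arg<int, decltype(exp::properties(iexp::stable))> n;

  void operator()() const {
    for (int i = 0; i < n; i++) {
      // ... per-iteration work ...
    }
  }
};
```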

 

DorianL

1 Solution
whitepau_altera
Employee
3,823 Views

Thanks for sharing the report, @DorianL .

 

It looks like the loop at line 137 was pipelined with II=1, but it was constrained to serial execution.

whitepau_0-1721044872123.png

This means that this outer loop is effectively un-pipelined. This doesn't explain the gaps you are seeing in the simulation waveform though.

I also see that you are getting a memory system with lots of arbitration:
whitepau_2-1721046377835.png

After some experimenting, I discovered that the warning about the variable 'fenetre' is a bit of a red herring here. I would expect for a line buffer like you are describing to have a memory system with multiple banks, and each bank having a dedicated load/store unit (LSU). From the image above, we can see that the memory system is not efficiently selecting banks. I tried using the bank_bits attribute to constrain this, but it appears the compiler is ignoring this attribute now.
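For reference, the attempt mentioned above would look roughly like this: the `bank_bits` attribute names which address bits should select the bank. A sketch only; the bit positions are illustrative, not taken from the original code, and as noted above the compiler appears to ignore the attribute here:

```cpp
// Attempted constraint: pick specific address bits as the bank-select bits.
// Bit positions are illustrative, not from the original code.
[[intel::fpga_memory("BLOCK_RAM"),
  intel::bank_bits(9, 8, 7)]]  // 8 banks selected by address bits 9..7
unsigned int line_buffer[8][NB_COLONNE_MAX];
```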

I was able to get the compiler to partition your 2D array by swapping the dimensions (transposing) so that the dimension to be split into banks (i.e., accessed simultaneously by different unrolled loop iterations) is in the least significant place. This appears to result in the desired memory system (don't forget to swap the indices in the accesses too!).

 

OLD:

 

// Ligne a retard
[[intel::fpga_memory("BLOCK_RAM")]]  // memory
unsigned int line_buffer[5][NB_COLONNE_MAX];

 

NEW:

 

// Ligne a retard
[[intel::fpga_memory("BLOCK_RAM")]]  // memory
unsigned int line_buffer[NB_COLONNE_MAX][8];

 

* Note that I changed the dimension from 5 to 8: the compiler complains if you try to create a memory system with a non-power-of-2 number of banks. Changing to 8 is ok because the compiler sees that the extra 3 banks aren't used and it optimizes them away.

The new memory system looks a lot better now:

whitepau_0-1721146677376.png

The sim looks a lot better too:

whitepau_1-1721146815040.png

I think I know how to solve these 2-cycle dips, but I'm still waiting for the test to finish.

I suspect they are a side effect of using a loop nest instead of a single while(1) loop to iterate across the image pixels.


22 Replies
whitepau_altera
Employee
758 Views

Thanks!

aikeu
Employee
634 Views

Hi whitepau,


Thanks for the help and contribution on this issue!


Hi DorianL,


I'm glad that your question has been addressed. If you have a new question, please log in to 'https://supporttickets.intel.com/s/?language=en_US', view the details of the desired request, and post a response within the next 15 days so I can continue to support you. After 15 days, this thread will be transitioned to community support, and community users will be able to help with your follow-up questions.


Thanks.

Regards,

Aik Eu

