Hello,
I am benchmarking a few common HLS tools and am having some issues with Intel HLS. I've implemented a simple histogram to test the tool, but in the test-fpga report I'm getting unusually high latency from load / store operations, which raises my II far above what I expected.
According to the report, a load / store operation takes 31 cycles. This leads me to believe that, the way I wrote the histogram, the tool does not use the embedded memory on the board (which should have a 1-cycle load / store latency, given that I expect this circuit to run in the 200-300 MHz range). What do I need to specify to make it use the on-board memory, or otherwise reduce the load / store latency?
Below you can find the C++ code I'm synthesizing. Note that the goal is to pre-initialize the RAM with the inputs to the histogram function, and the function should then iterate over them.
#include <HLS/hls.h>
#include <stdio.h>
#include <iostream>
#define N 100
using namespace ihc;
component void histogram(
    int feature[],
    float weight[],
    float hist[],
    int n
)
{
    int i;
    for(i = 0; i < n; i++)
    {
        int m = feature[i];
        float wt = weight[i];
        float x = hist[m];
        hist[m] = x + wt;
    }
}
int main()
{
    hls_memory hls_singlepump int feature[N];
    hls_memory hls_singlepump float weight[N];
    hls_memory hls_singlepump float hist[N];
    int i;
    for(i = 0; i < N; i++)
    {
        feature[i] = i + 1;
        weight[i] = (float) (2 * i);
        hist[i] = 0.0f;
    }
    histogram(feature, weight, hist, N - 1);
    bool failed = false;
    for(i = 0; i < N; i++)
    {
        float val = hist[i];
        if(i == 0)
        {
            if(val != 0.0)
            {
                failed = true;
                break;
            }
        }
        else
        {
            if(val != (float) ((i - 1) * 2))
            {
                failed = true;
                break;
            }
        }
    }
    if(failed)
    {
        printf("FAILED");
    }
    else
    {
        printf("PASSED");
    }
    return 0;
}
For reference, I'm synthesizing on the default Arria 10 board.
Also, if you have any tips on improving my code or some standard practices which I'm unaware of, I'll gladly take them.
Thanks in advance.
Update: I found a solution to my problem. It turns out that I need to initialize the on-board memory inside a component. When the arrays are initialized in the main function, the tool probably assumes that the loads and stores go through the board's I/O instead of the embedded memory. Moving the initialization inside the component solved the issue and brought the load / store latency down to the expected 1 cycle.
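For anyone landing here later, this is a minimal sketch of what moving the arrays and their initialization inside the component might look like. It is not the exact code I ended up with. The name histogram_local and the query_bin parameter are made up for illustration, and the fallback macros are only an assumption so the sketch also builds with a plain C++17 compiler when the HLS header is not installed.

```cpp
// Sketch only: arrays are declared and initialized inside the component,
// so the memory attributes apply to component-local storage.
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#else
// Assumed fallbacks so the sketch compiles without the Intel HLS header.
#define component
#define hls_memory
#define hls_singlepump
#endif

constexpr int N = 100;

// Hypothetical component: builds the same test data as the original main(),
// accumulates the histogram, and returns one bin so the result is observable.
// Assumes 0 <= query_bin < N.
component float histogram_local(int query_bin)
{
    hls_memory hls_singlepump int feature[N];
    hls_memory hls_singlepump float weight[N];
    hls_memory hls_singlepump float hist[N];

    // Initialization now happens inside the component.
    for (int i = 0; i < N; i++)
    {
        feature[i] = i + 1;
        weight[i] = (float) (2 * i);
        hist[i] = 0.0f;
    }

    // Same accumulation loop as before, over the first N - 1 inputs,
    // so the largest index used into hist is N - 1.
    for (int i = 0; i < N - 1; i++)
    {
        int m = feature[i];
        hist[m] = hist[m] + weight[i];
    }

    return hist[query_bin];
}
```

With this test data, bin 0 stays at 0 and bin i (for i >= 1) ends up holding (i - 1) * 2, which matches the check in the original main function.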
Hi,
Thank you for posting here. I have seen your last post. It looks like you found the solution, so if you don't have any further issues I will close this case. Please confirm.
Thanks,
Thank you for the confirmation.
If you want to reopen this case, please log in to https://supporttickets.intel.com, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you.
