Hello,
I am seeing strange cache-related behaviour in my OpenCL profiler results. The kernel is an NDRange kernel with a global size of 4096 and one compute unit (1 CU). I am working on a HARP machine (Xeon + Arria 10 FPGA).
There is a large global array (16M floats) that is accessed in a fully random pattern (the vertices of a graph):
for (unsigned j = si; j < ei; j++) // edge loop
{
    neighbour_vertex_id = edge
    toAdd += pg_val[neighbour_vertex_id]; // random-data access (this is what I am checking)
}
I expect every access to 'pg_val' to be a cache miss (efficiency == 1 access per cache line == 6%). However, the profiler reports an efficiency of around 20%-30%.
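The 6% figure above can be reproduced with a small calculation. This is only a sketch: it assumes a 64-byte cache line (a 512-bit memory interface), which the post does not state, and the helper name `line_efficiency_percent` is made up for illustration.

```c
/* Percentage of each fetched cache line that is actually used
   when only `bytes_used` bytes of it are read. */
double line_efficiency_percent(unsigned bytes_used, unsigned line_bytes)
{
    return 100.0 * (double)bytes_used / (double)line_bytes;
}

/* ASSUMPTION: 64-byte cache line; the thread does not give the line size. */
double expected_random_float_efficiency(void)
{
    return line_efficiency_percent((unsigned)sizeof(float), 64u);
}
```

With a 4-byte float and the assumed 64-byte line this gives 6.25%, which matches the roughly 6% expected for fully random accesses.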
I tried adding 'volatile' as below, but it did not change anything (I thought volatile disables the global-memory cache).
__global float* volatile restrict pg_val,
Then I tried removing 'restrict' as below, and after that I got the expected 6% efficiency:
__global float* volatile pg_val, //restrict
It seems that caching causes this unexpected behaviour, but I really do not know why, because the data-access pattern is random (the access indices are provided at run time). I also tried different graph benchmarks, but got the same results.
Does anybody have an explanation for this? If there is a cache inside the FPGA for each global argument, it should be implemented in BRAMs and should not be large. Is there a way to reduce the cache size per global-memory argument?
I would appreciate it if @HRZ could comment on this!
Thanks
I found the reason. I was using a local variable as if it were a private variable, not noticing that local variables are shared among all work-items in a work-group.
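The difference between a shared and a private accumulator can be illustrated with a plain C model. This is not the kernel from this thread (which was never posted in full) and it does not model the FPGA pipeline, races, or the profiler's efficiency figure. The sizes `WG_SIZE` and `N_EDGES` and both function names are hypothetical. It only shows that a single accumulator shared by a work-group mixes the contributions of all work-items, while a private one keeps them separate.

```c
#include <stddef.h>

#define WG_SIZE 4  /* hypothetical work-group size */
#define N_EDGES 3  /* hypothetical number of edges per vertex */

/* Model of the mistake: one accumulator for the whole work-group,
   as when it is declared __local in an OpenCL kernel. */
void sum_with_shared_acc(const float vals[WG_SIZE][N_EDGES], float out[WG_SIZE])
{
    float to_add = 0.0f; /* single copy seen by every work-item */
    for (size_t j = 0; j < N_EDGES; j++)
        for (size_t wi = 0; wi < WG_SIZE; wi++) /* work-items advance together */
            to_add += vals[wi][j];
    for (size_t wi = 0; wi < WG_SIZE; wi++)
        out[wi] = to_add; /* everyone reads the same mixed total */
}

/* Intended behaviour: each work-item owns its accumulator (private). */
void sum_with_private_acc(const float vals[WG_SIZE][N_EDGES], float out[WG_SIZE])
{
    for (size_t wi = 0; wi < WG_SIZE; wi++) {
        float to_add = 0.0f; /* one copy per work-item */
        for (size_t j = 0; j < N_EDGES; j++)
            to_add += vals[wi][j];
        out[wi] = to_add;
    }
}
```

In this model the shared version writes the same grand total to every output slot, whereas the private version produces one sum per work-item.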