OpenCL* for CPU

cache efficiency in random access in ND-range: volatile vs restrict

hamze60

Hello,

I see strange behaviour in my OpenCL profiler results, related to caching. The OpenCL configuration is ND-range with global size == 4096 and 1 CU (compute unit). I am working on a HARP machine (Xeon + Arria 10 FPGA).

There is a big global array (size = 16M float numbers), accessed in a fully random way (vertices of a graph).
    for(unsigned j = si; j < ei; j++) // edge loop
    {
        neighbour_vertex_id = edge[j];  // access index is known only at run time (sequential data access)
        toAdd += pg_val[neighbour_vertex_id];  // random-data access (this is what I am checking)
    }

I expect every access to 'pg_val' to be a cache miss (efficiency == 1 access per cache line == 6%). However, the profiler results show an efficiency of around 20%-30%.
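The 6% figure can be reproduced with a small calculation. This is only a sketch: the 64-byte line size is an assumption (the post does not state it, but it matches 1 float out of 16 per line), and the helper name `line_efficiency_percent` is made up for illustration.

```c
#include <stddef.h>

/* Percentage of a fetched cache line that is actually used when only
 * `useful_bytes` out of `line_bytes` are read before the line is evicted. */
static double line_efficiency_percent(size_t useful_bytes, size_t line_bytes)
{
    return 100.0 * (double)useful_bytes / (double)line_bytes;
}
```

With one 4-byte float used per assumed 64-byte line, `line_efficiency_percent(4, 64)` gives 6.25, which is the roughly 6% expected for fully random access.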

I tried adding 'volatile' as below, but it did not change anything (I thought volatile disables the global-memory cache).
    __global float* volatile restrict pg_val,
    
Then I tried removing 'restrict' as below, and after that I got an efficiency of 6%, as expected:
    __global float* volatile pg_val,  //restrict

It seems that caching causes this unexpected behaviour, but I really do not know why, because the data-access pattern is random (access indices are provided at run time). I also tried different graph benchmarks, but got the same results.

Does anybody have an explanation for this? If there is a cache inside the FPGA for each global argument, it should be implemented in BRAMs and should not be large. Is there a way to reduce the cache size per global memory argument?

I would appreciate it if @HRZ could comment on this!
Thanks

1 Solution
hamze60

I found the reason for it. I had actually used a local variable as if it were a private variable, not paying attention to the fact that local variables are shared among all work-items in a work-group.
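The difference between the two declarations can be sketched as follows. This is a hypothetical reconstruction, not the original kernel: the post does not show the declarations or say which variable was declared local, so the kernel name `sum_neighbours` and the arguments `row_start` and `out` are assumptions, and the small `#ifndef` shim exists only so the sketch can also be compiled and checked as plain C on the host.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption, not part of OpenCL): lets the sketch build as plain C. */
#include <stddef.h>
#define __kernel
#define __global
static size_t host_work_item_id;  /* stands in for the work-item ID on the host */
static size_t get_global_id(unsigned dim) { (void)dim; return host_work_item_id; }
#endif

__kernel void sum_neighbours(__global const float* restrict pg_val,
                             __global const unsigned* restrict edge,
                             __global const unsigned* restrict row_start,
                             __global float* restrict out)
{
    /* Problematic variant (what the post describes, shown as a comment):
     *     __local unsigned neighbour_vertex_id;
     * A __local variable has one copy per work-group, so every work-item
     * would read and overwrite the same index. */
    unsigned neighbour_vertex_id;  /* private: one copy per work-item */
    float toAdd = 0.0f;            /* private accumulator */

    const size_t v = get_global_id(0);   /* one vertex per work-item */
    const unsigned si = row_start[v];
    const unsigned ei = row_start[v + 1];

    for (unsigned j = si; j < ei; j++) { /* edge loop */
        neighbour_vertex_id = edge[j];          /* sequential access */
        toAdd += pg_val[neighbour_vertex_id];   /* random access */
    }
    out[v] = toAdd;
}
```

Variables declared inside a kernel without an address-space qualifier are private by default, so simply dropping `__local` from a per-work-item temporary is enough to give each work-item its own copy.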
