Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Writing to the same cache line from different kernels

Altera_Forum

Hello :), 

I have a question about possible inconsistency when different writers (kernels) write to neighboring global memory addresses.

Suppose two kernels have write access to the same array in global memory, and they write to adjacent addresses that fall into the same cache line. Since each kernel has its own copy of the cache line and updates only that copy, won't writing the cache lines back to global memory cause an inconsistency? Is there any cache coherence protocol implemented in Altera OpenCL?
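
The scenario in the question can be sketched as a toy C model. It assumes a hypothetical write-back cache in which each kernel holds a private copy of one 4-word line; it is not a description of how Altera/Intel OpenCL load/store units are actually built, and `line_copy`, `writeback_full`, `writeback_masked` and `run` are invented names. The model contrasts flushing the whole line with flushing only the words a copy actually modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model only: one cache line of 4 ints, not Intel's LSU design. */
#define LINE_WORDS 4

typedef struct {
    int  data[LINE_WORDS];
    bool dirty[LINE_WORDS];   /* per-word dirty mask (byte-enable analogue) */
} line_copy;

/* Fill a private copy of the line from global memory. */
static void fill(line_copy *c, const int *mem) {
    memcpy(c->data, mem, sizeof c->data);
    memset(c->dirty, 0, sizeof c->dirty);
}

/* Update one word in the private copy only. */
static void write_word(line_copy *c, int idx, int value) {
    assert(idx >= 0 && idx < LINE_WORDS);
    c->data[idx] = value;
    c->dirty[idx] = true;
}

/* Write back the entire line, including words this copy never touched. */
static void writeback_full(const line_copy *c, int *mem) {
    memcpy(mem, c->data, sizeof c->data);
}

/* Write back only the words this copy modified. */
static void writeback_masked(const line_copy *c, int *mem) {
    for (int i = 0; i < LINE_WORDS; ++i)
        if (c->dirty[i]) mem[i] = c->data[i];
}

/* Kernel A writes word 0 and kernel B writes word 1, each through its own
 * private copy of the same line; B is flushed last. Final memory -> out. */
static void run(bool masked, int out[LINE_WORDS]) {
    int mem[LINE_WORDS] = {0, 0, 0, 0};
    line_copy a, b;
    fill(&a, mem);
    fill(&b, mem);
    write_word(&a, 0, 10);
    write_word(&b, 1, 20);
    void (*wb)(const line_copy *, int *) = masked ? writeback_masked : writeback_full;
    wb(&a, mem);
    wb(&b, mem);
    memcpy(out, mem, sizeof mem);
}
```

Here `run(false, out)` ends with {0, 20, 0, 0}: kernel B flushes its stale copy of word 0 last and erases kernel A's update. `run(true, out)` ends with {10, 20, 0, 0}. In this model the hazard from the question only appears when dirty lines are written back in full; with per-word write masks, writes to different addresses in the same line do not clobber each other.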

I think one solution is to remove the "restrict" keyword or add the "volatile" keyword to that variable, which disables the cache for it. But even if this guarantees consistency, performance drops.

Thanks

Altera_Forum

There is no cache coherence in Intel's basic cache implementation on FPGAs; all the caches are private per global memory access, as also mentioned in the area report. Furthermore, OpenCL does not guarantee global memory consistency until kernel execution has finished. Hence, if you try to share a READ_WRITE global buffer between two or more kernels running in parallel, you will likely get incorrect output due to race conditions, unless no two kernels ever write to or read from the same location (in which case you can just split the original shared buffer into multiple non-shared buffers). You can probably try atomic operations, but they will be extremely slow.
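
The race mentioned here is easiest to see with a read-modify-write. The C sketch below replays one bad interleaving of two non-atomic increments step by step (sequentially, so the result is deterministic) and compares it with the same increments done through C11 `<stdatomic.h>`. It is an analogy only: in an OpenCL kernel the counterpart would be built-ins such as `atomic_add` on a `__global int`, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Replays one unlucky interleaving of two non-atomic increments of a shared
 * counter: both "kernels" read before either writes. */
static int interleaved_plain_increments(void) {
    int counter = 0;
    int a_seen = counter;   /* kernel A reads 0 */
    int b_seen = counter;   /* kernel B also reads 0 */
    counter = a_seen + 1;   /* A stores 1 */
    counter = b_seen + 1;   /* B stores 1, overwriting A's increment */
    return counter;
}

/* The same two increments as indivisible read-modify-write operations. */
static int atomic_increments(void) {
    atomic_int counter = 0;
    atomic_fetch_add(&counter, 1);  /* kernel A */
    atomic_fetch_add(&counter, 1);  /* kernel B */
    return atomic_load(&counter);
}
```

The plain version returns 1 instead of 2 because kernel B's store is based on a value read before kernel A's store landed. The atomic version returns 2; the cost, as noted above, is speed.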

Altera_Forum

Our program is structured such that we cannot partition the program data in global memory among different kernels. Any kernel may write to any memory address (think of a graph processing algorithm that updates the data of random nodes). Actually, our particular algorithm does not have a race condition problem, because we only write data, and even if multiple kernels write to the same global memory address (update the same graph node), they write exactly the same value. So, in any write order, the final result will be correct.
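
The claim that any write order gives the correct result holds whenever every write to a given location stores the same value, since such writes commute. A brute-force C sketch can confirm it for a small hypothetical write log by applying the writes in every possible order. `write_op`, `order_independent` and the sizes are illustrative only, and the model covers word-sized stores landing in memory, not caching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NODES 4
#define MAX_WRITES 8

/* One store of `value` to graph node `node` (hypothetical write log entry). */
typedef struct { int node; int value; } write_op;

/* Apply the n writes in the given order to zeroed memory. */
static void apply(const write_op *w, const int *order, int n, int mem[NODES]) {
    memset(mem, 0, NODES * sizeof mem[0]);
    for (int i = 0; i < n; ++i) {
        const write_op *op = &w[order[i]];
        assert(op->node >= 0 && op->node < NODES);
        mem[op->node] = op->value;
    }
}

/* Recursively permute order[k..n-1]; false if any ordering differs from expected. */
static bool orders_agree(const write_op *w, int n, int *order, int k,
                         const int expected[NODES]) {
    if (k == n) {
        int mem[NODES];
        apply(w, order, n, mem);
        return memcmp(mem, expected, sizeof mem) == 0;
    }
    for (int i = k; i < n; ++i) {
        int t = order[k]; order[k] = order[i]; order[i] = t;
        bool ok = orders_agree(w, n, order, k + 1, expected);
        t = order[k]; order[k] = order[i]; order[i] = t;   /* undo swap */
        if (!ok) return false;
    }
    return true;
}

/* True if every ordering of the n writes leaves the same final memory. */
static bool order_independent(const write_op *w, int n) {
    assert(n >= 0 && n <= MAX_WRITES);
    int order[MAX_WRITES];
    for (int i = 0; i < n; ++i) order[i] = i;
    int expected[NODES];
    apply(w, order, n, expected);
    return orders_agree(w, n, order, 0, expected);
}
```

For the log {0,3}, {1,7}, {2,5}, {1,7}, {2,5} all 120 orderings produce the same memory; adding a write of 9 to node 1 alongside the write of 7 makes the outcome depend on which store lands last.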

The real problem is cache incoherence: each writer (kernel) has its own copy of the cache and is not aware of changes made by the other writers. Given the situation above, will putting "volatile" on the global memory variable solve the cache coherence issue? Either way, the cache is not expected to improve performance, due to the very random access pattern.

Thanks a lot for your help.

Altera_Forum

Let me emphasize that the cache is not kernel-wide, but actually access-wide (at least based on what is written in the area report). Hence, the cache from one access in one kernel is not shared with the cache from another access in the same kernel, even if they are both to/from the same global buffer. If all your writers that might write data to the same global memory location always write the same value, I don't see how even the existence of a cache would cause a problem in this case since the cache data will also be the same. Either way, you can disable the cache by [falsely] marking your input buffers as volatile. You probably do not need to remove restrict, though; if you do that, you will likely get 100% sequential execution over the whole kernel.
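
The stale-read side of per-access caches can be sketched in C as well. The model assumes each load site owns a private cache entry that no other writer invalidates, and treats a `volatile`-qualified buffer as a load site with the cache bypassed. `load_site`, `load` and `reread_after_remote_write` are hypothetical names, not part of any Intel API, and the real hardware may behave differently.

```c
#include <assert.h>
#include <stdbool.h>

#define WORDS 8

/* Hypothetical model of one load site ("access") with its own private cache
 * entry. No coherence: stores from other kernels never invalidate it. */
typedef struct {
    bool bypass;   /* true models a volatile-qualified buffer: no cache */
    bool valid;
    int  addr;
    int  value;
} load_site;

static int load(load_site *s, const int *mem, int addr) {
    assert(addr >= 0 && addr < WORDS);
    if (s->bypass)
        return mem[addr];          /* always read global memory */
    if (!s->valid || s->addr != addr) {
        s->valid = true;           /* miss: fill the private entry */
        s->addr = addr;
        s->value = mem[addr];
    }
    return s->value;               /* hit: may be stale */
}

/* Read node 3, let another kernel store to it, then read it again through
 * the same load site. Returns the second read. */
static int reread_after_remote_write(bool bypass) {
    int mem[WORDS] = {0};
    load_site s = { .bypass = bypass };
    (void)load(&s, mem, 3);        /* first read returns 0 */
    mem[3] = 42;                   /* another kernel writes node 3 */
    return load(&s, mem, 3);
}
```

`reread_after_remote_write(false)` returns 0, the value cached before the other kernel's store, while `reread_after_remote_write(true)` returns 42. If every writer stores the same value, a stale hit in this model can only return the pre-write contents, never a conflicting value; whether reading the older value is acceptable depends on the algorithm.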
