Hello :), I have a question about a possible inconsistency when different writers (kernels) write to neighboring global memory addresses. Suppose two kernels have write access to the same array in global memory, and they write to adjacent addresses that fall into the same cache line. Since each kernel has its own copy of the cache line and updates only its own copy, doesn't this cause an inconsistency when the cache lines are written back to global memory? Is there any cache consistency protocol implemented in Altera OpenCL? I think one solution is to remove the "restrict" keyword or add the "volatile" keyword to that variable, which removes the cache for it. But with this solution, even if consistency is guaranteed, performance drops. Thanks
There is no cache consistency in Intel's basic cache implementation on FPGAs; all the caches are private per global memory access, as is also mentioned in the area report. Furthermore, OpenCL does not guarantee global memory consistency until kernel execution has finished. Hence, if you try to share a READ_WRITE global buffer between two or more kernels running in parallel, you will likely get incorrect output due to race conditions, unless no two kernels ever write to or read from the same location (in which case you can just split the original shared buffer into multiple non-shared buffers). You can probably try atomic operations, but they will be extremely slow.
Our program is structured in such a way that we cannot partition its data in global memory among the different kernels. Any kernel may write to any memory address (think of a graph processing algorithm that updates the data of random nodes). Actually, our particular algorithm does not have a race condition problem, because we only write data, and even if multiple kernels write to the same global memory address (update the same graph node), they write exactly the same value, so the final result is correct for any write order. The real problem is cache inconsistency, where each writer (kernel) has its own copy of the cache and is not aware of the changes made by the other writers. Considering the above situation, will putting "volatile" on the global memory variable solve the cache coherency issue? In any case, the cache is not expected to improve performance, because the access pattern is very random. Thanks a lot for your help,
Let me emphasize that the cache is not kernel-wide, but actually access-wide (at least based on what is written in the area report). Hence, the cache from one access in one kernel is not shared with the cache from another access in the same kernel, even if they are both to/from the same global buffer. If all your writers that might write data to the same global memory location always write the same value, I don't see how even the existence of a cache would cause a problem in this case since the cache data will also be the same. Either way, you can disable the cache by [falsely] marking your input buffers as volatile. You probably do not need to remove restrict, though; if you do that, you will likely get 100% sequential execution over the whole kernel.