I wrote an OpenCL kernel that multiplies a row vector by a Compressed Sparse Row (CSR) matrix, but it gives me a different answer each time I run it.
I have built a small repro case based on my matrices. As you can see, the program breaks at different values of j, even though it is expected to print Success. I think the problem is related to cache flushing around atomic_cmpxchg, since the loop containing it always runs only once, which is a little strange.
Can anybody help me with this, please?
Thanks.
Sorry, I've just found that it is a numerical error.
Thanks.
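For anyone who hits the same symptom: the post above does not say exactly where the numerical error came from, so the following is only a sketch under one common assumption, namely that the partial products are accumulated in a different order on each run (as happens with atomic updates) and that floating-point addition is not associative. The values, the `results_match` helper, and the tolerance are all hypothetical and chosen for illustration; Python doubles stand in for the floats an OpenCL kernel would typically use.

```python
import math

# Hypothetical partial products accumulated into one output element.
values = [0.1, 0.2, 0.3]

# Two accumulation orders, as might occur when work-items race on atomic updates.
left_first = (values[0] + values[1]) + values[2]
right_first = values[0] + (values[1] + values[2])


def results_match(a, b, rel_tol=1e-6, abs_tol=1e-12):
    """Compare two floating-point results with a tolerance (hypothetical helper)."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# An exact comparison reports a mismatch even though both sums are valid roundings.
print("exactly equal:", left_first == right_first)
# A tolerance-based comparison accepts both orders.
print("equal within tolerance:", results_match(left_first, right_first))
```

If the repro case checks the device result against a host reference with `==`, a check of this kind with a tolerance suited to the data may be what is needed instead.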