Job4 Job5 Job5 Job6
and so on.
It's plain to see that none of the jobs ever interfere with each other. The only job that can possibly write to the same area as a given job is its parent, and the parent never spawns its children until its partitioning is complete. After extensive testing, I am as sure as I can be that this always holds and that no other odd flaw is creeping in. So the only explanation I can think of is that cache coherency is shot, and I need some way to force the CPU to write out its cache before it spawns its children.
As far as I can tell, what I need is a memory fence, but it has to be at the CPU level, not just at the compiler level. I've been searching for how to do this, to no avail. Lots of sources say this is what I need to do, but none of them explain how.
Or, if that's not how I do it, how do I? I have come across a few instructions that flush the cache, but they all seem to be privileged instructions, so they don't do me much good. This seems like something vital for any nonblocking parallel code, so there must be some simple answer, but for the life of me I can't find one.
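To make the hand-off concrete, here is a minimal sketch of the pattern I think I'm after, assuming a C++11 compiler with `<atomic>`. The names `Handoff`, `parent_partition`, `child_check` and `run_handoff_demo` are hypothetical and not from my actual code, and I'm not certain this is the right tool for my case. As far as I understand it, `std::atomic_thread_fence` constrains the compiler and also emits a CPU-level barrier on targets that need one.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical hand-off record: the parent partitions `data`, then sets `ready`.
struct Handoff {
    std::vector<int> data;
    std::atomic<bool> ready{false};
};

// Parent side: ordinary writes to the data, then a release fence,
// then the flag store that "spawns" the child.
void parent_partition(Handoff& h, int pivot) {
    assert(!h.ready.load(std::memory_order_relaxed)); // must not publish twice
    std::partition(h.data.begin(), h.data.end(),
                   [pivot](int x) { return x < pivot; });
    // Writes above may not be reordered past the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    h.ready.store(true, std::memory_order_relaxed);
}

// Child side: wait for the flag, then an acquire fence, then read the data.
bool child_check(Handoff& h, int pivot) {
    while (!h.ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Reads below may not be reordered before the flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return std::is_partitioned(h.data.begin(), h.data.end(),
                               [pivot](int x) { return x < pivot; });
}

// Runs one parent/child pair and reports whether the child saw
// a fully partitioned array.
bool run_handoff_demo() {
    Handoff h;
    h.data = {9, 1, 8, 2, 7, 3, 6, 4, 5};
    bool ok = false;
    std::thread child([&] { ok = child_check(h, 5); });
    std::thread parent([&] { parent_partition(h, 5); });
    parent.join();
    child.join();
    return ok;
}
```

The release fence paired with the acquire fence is what is supposed to guarantee that the child sees everything the parent wrote before setting the flag. No privileged cache-flush instruction is involved.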