Intel® oneAPI Threading Building Blocks

volatile and memory model

There are still a surprising number of occurrences of "volatile" in the latest development release (tbb20_20080311oss): what are the dynamics concerning purging them from the not-specifically-Itanium code base (last mention in the forum was several months ago)?

Is there an official statement somewhere about the memory model assumptions for using TBB? In the reference manual (version 1.8), fences are mentioned only in "6.2 atomic Template Class": where else can fences be assumed (I would think before and after a task runs, and related to mutex objects, but it would inspire more confidence to have something official about that)? In Java (since JSR 133), changes become visible to other threads only if they synchronise on the exact same monitor (a volatile in Java now also behaves like a monitor for that purpose, but that is not going to happen in C++): would that not imply that TBB-style fences are more costly (but also less prone to programmer error) than might be technically possible? How does the TBB memory model (implicit or explicit) fit with the memory models for the native thread APIs used with TBB and the code written by their programmers? Has there been a code review for observance of these issues? How concerned are people about this: have any problems been observed yet, or am I just being paranoid?

Before replying to this, make sure you look up "volatile" in the forum and are aware of the issues (it basically doesn't work, as is already acknowledged by the TBB project; my concern is that any code using it betrays that it might be sticking its head in the sand concerning a data race, so you basically wouldn't want to see "volatile" anywhere in TBB except in some platform-specific code). For the memory (consistency) model, some more reading is probably required.

We're purging volatile as we go. Ideally, after the purge there should only be volatile in the machine-specific headers, and in places where we want to force a read but not necessarily a fence (e.g. the first test in the test-and-(test-and-set) idiom). I've been thinking of adding a method atomic.peek() for these atomic-but-fenceless reads. Maybe a long name like unfenced_read() would be better, to call attention to what is going on.
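To make the test-and-(test-and-set) idiom concrete, here is a minimal sketch of where a fenceless read fits. It is not TBB code: it assumes the standard std::atomic (C++11), which postdates this thread, with a relaxed load standing in for the proposed peek()/unfenced_read(). The names ttas_spin_lock and run_counter_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin lock illustrating test-and-(test-and-set).
// Assumption: std::atomic stands in for tbb::atomic here.
class ttas_spin_lock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        for (;;) {
            // First test: an atomic read with no ordering constraint.
            // It only decides whether attempting the exchange is worthwhile.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // Test-and-set: this is the operation that needs acquire ordering.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }
    void unlock() {
        // Release ordering publishes the critical section's writes.
        locked_.store(false, std::memory_order_release);
    }
};

// Hypothetical demo: threads increment a plain (non-atomic) counter under the lock.
long run_counter_demo(int num_threads, int increments_per_thread) {
    ttas_spin_lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

The relaxed load is safe only because its result is never trusted on its own: the exchange that follows is what actually takes the lock and establishes ordering.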

TBB is targeted at an audience that is more concerned about their subject matter than the nuances of memory fences, so for the most part we've assumed that fences are in the intuitively obvious spots. Obviously, "intuitively obvious" is insufficient for experts, and for such experts we should specify precisely where fences are guaranteed.

With respect to tasks and fences, I think the minimal fencing guarantee that tasks should have is that if task A is a predecessor of task B, then when B runs, it sees any memory updates by A. And a thread that executes task::wait_for_all() should see any updates by any of the tasks it is waiting on. I'm fairly sure the current implementation delivers this independent of the underlying threading API, because the fences are explicit in the reference counting mechanism upon which waiting is implemented.
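As a rough illustration of how a fenced reference count can give the predecessor guarantee described above, consider the sketch below. It is not the TBB scheduler: it assumes std::atomic (C++11) and plain std::thread in place of TBB tasks, and successor_state and run_predecessor_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical bookkeeping for a successor "task B" with two predecessors.
struct successor_state {
    std::atomic<int> ref_count{2};  // predecessors still outstanding
    int result_a = 0;               // plain data written by predecessor A1
    int result_b = 0;               // plain data written by predecessor A2
};

int run_predecessor_demo() {
    successor_state s;
    int sum = -1;

    // Each predecessor calls this after writing its result.
    auto finish = [&s, &sum] {
        // acq_rel decrement: the release half publishes this predecessor's
        // writes; the acquire half lets whoever reaches zero see the writes
        // of every predecessor that decremented earlier.
        if (s.ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // The last predecessor to finish runs the successor's body here.
            sum = s.result_a + s.result_b;
        }
    };

    std::thread a([&s, &finish] { s.result_a = 40; finish(); });
    std::thread b([&s, &finish] { s.result_b = 2; finish(); });
    a.join();
    b.join();
    return sum;
}
```

Whichever thread brings the count to zero is guaranteed to observe both results, without relying on any ordering property of the underlying threading API.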
