For a long time I understood the prominent warning that the function node body will be copied to mean that there would be at least as many instances of the body class as there are threads concurrently executing that body. That understanding was incorrect. Nothing seems to prevent multiple threads from accessing the same instance of the body class, so there is an implicit assumption (at least, I haven't seen it stated anywhere) that the operator() associated with the body is reentrant. First, can somebody confirm this? Second, what are my options for getting the desired behavior of one body class instance per thread while still having the body reside in an unlimited node? I care because I want the class to have member variables, and that's not possible with anything other than a serial node.
Why are you using mutable state (operator() is declared const)? Would your needs be served by thread-local storage? If not, why not?
(Added) Note that in C++11 the standard library expects all const operations on your types to be thread-safe, because it treats only non-const operations as needing exclusive access. Maybe for good form TBB should explicitly repeat this requirement, since it seems to be a requirement of the standard library rather than of the language itself. But generally, to survive in the new C++ world, you had better make that true everywhere anyway; otherwise you'd have to make sure that your types never come into contact with the standard library, which seems somewhat delusional.
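To make the "const operations must be thread-safe" point concrete, here is a minimal sketch in plain standard C++ (no TBB involved). CountingBody and run_shared_body are hypothetical names made up for illustration. The body keeps a counter as mutable state and guards it with a mutex, so one shared instance can be called from several threads through its const operator(). The explicit copy constructor is there only because a std::mutex member would otherwise make the class non-copyable, and a body that gets copied needs to be copyable.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical body type: operator() is const, so every mutation goes
// through mutable members that are protected by a mutex.
class CountingBody {
public:
    CountingBody() = default;

    // A std::mutex cannot be copied, so the copy gets a fresh mutex and
    // a snapshot of the other instance's counter.
    CountingBody(const CountingBody& other) : calls_(other.calls()) {}

    int operator()(int v) const {
        std::lock_guard<std::mutex> lock(mutex_);
        ++calls_;          // safe: only one thread at a time gets here
        return v * 2;
    }

    int calls() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return calls_;
    }

private:
    mutable std::mutex mutex_;
    mutable int calls_ = 0;
};

// Calls ONE shared instance from several threads and returns the number
// of calls that instance recorded.
int run_shared_body(int num_threads, int calls_per_thread) {
    CountingBody body;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&body, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) {
                body(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return body.calls();
}
```

The lock serializes the counter update, which is fine for a sketch but would become a bottleneck in a heavily used unlimited node; that cost is one reason to look at thread-local storage instead.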
You're absolutely right, the body class signature in the docs explicitly specifies that operator() be const, here for instance. I'd been ignoring that requirement without consequence until now because my classes had been stateless. And yes, thread-local storage will solve my problem. I suppose the question to ask is why TBB requires operator() to be const. Why not allow the body to have state and implement thread locality somewhere inside the TBB code? There doesn't seem to be any intrinsic reason for the constraint, and lifting it would make body objects far more versatile.
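For what it's worth, the simplest form of "thread-local storage with a const operator()" might look like the sketch below. It uses only the standard thread_local keyword, not any TBB facility, and RunningSumBody and sums_per_thread are hypothetical names for illustration. One caveat that is easy to miss: a function-local thread_local variable is one per thread per function, not one per object, so every instance of the body running on the same thread shares it.

```cpp
#include <thread>
#include <vector>

// Hypothetical body: it has no data members, so operator() can stay const.
// The per-thread state lives in a function-local thread_local variable.
struct RunningSumBody {
    int operator()(int v) const {
        thread_local int sum = 0;  // one independent copy per thread,
                                   // shared by all instances on that thread
        sum += v;
        return sum;
    }
};

// Feeds 1..n to one shared body on each of num_threads fresh threads and
// records the last value every thread saw.
std::vector<int> sums_per_thread(int num_threads, int n) {
    RunningSumBody body;
    std::vector<int> results(num_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&body, &results, t, n] {
            int last = 0;
            for (int i = 1; i <= n; ++i) {
                last = body(i);
            }
            results[t] = last;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Every thread ends up with its own running sum and none of them interfere, even though they all call the same body object.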
If you can present a convincing use case, one with obvious general applicability, perhaps such a node type could still be added, but I'm fairly sure that this was a conscious decision.
Note that I don't really know whether the node is accessed by multiple threads at once; perhaps using mutable state is enough in this case. But since there's no such promise, (mutable) TLS is at least an option that's guaranteed to work.
Note also that if you would like to retrieve the collective state after graph termination, you must not use TLS as a direct member variable of the function_node body, because the body is copied and the TLS members of the copies are unrelated to each other. Using a pointer or reference to an external TLS variable would work.
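The "pointer to external storage" idea can be sketched as follows, again in plain standard C++ so that it stands alone. PerThreadSums, AccumulatingBody and run_and_collect are hypothetical names, and PerThreadSums is only a simplified stand-in for a real thread-specific container (in TBB, tbb::enumerable_thread_specific would be the natural candidate for that role). The point is that every copy of the body points at the same external object, so the collected state is still reachable after all the work has finished.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for a thread-specific container that
// lives OUTSIDE the body. It keeps one slot per calling thread. A mutex
// guards the map for simplicity; a real thread-specific container is
// designed to avoid taking a lock on every access.
class PerThreadSums {
public:
    void add(long v) {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_[std::this_thread::get_id()] += v;  // slot starts at 0
    }

    long total() const {
        std::lock_guard<std::mutex> lock(mutex_);
        long sum = 0;
        for (const auto& slot : slots_) {
            sum += slot.second;
        }
        return sum;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::thread::id, long> slots_;
};

// Hypothetical body: it holds only a non-owning pointer, so copying the
// body copies the pointer and all copies share the same external storage.
struct AccumulatingBody {
    PerThreadSums* sums;

    int operator()(int v) const {
        sums->add(v);  // const is fine: the pointer itself is not modified
        return v;
    }
};

// Gives each thread its own COPY of the body (mimicking a node that copies
// its body), feeds 1..n to each copy, and collects the combined result.
long run_and_collect(int num_threads, int n) {
    PerThreadSums sums;
    AccumulatingBody original{&sums};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        AccumulatingBody copy = original;
        workers.emplace_back([copy, n] {
            for (int i = 1; i <= n; ++i) {
                copy(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sums.total();  // still reachable: it was never owned by a body
}
```

Had the sums been a direct member of AccumulatingBody, each copy would have accumulated into its own private object and the original would have seen nothing.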
(Added) Strictly speaking, maybe the implementation could also sometimes use more than one instance on the same thread, so never assume more than you have to.