Flow graph queueing nodes hold input references

john5 · ‎03-17-2014

I have a flow graph where the objects are too heavy to pass by copy, so instead I'm passing a shared pointer to the object through the graph. What I noticed is that my deleters for some of those objects were being deferred until graph destruction because the graph nodes were holding copies, which is too late for my needs. I had expected them to hold references for only as long as those objects were active in the graph.

I traced this to the item_buffer inside the nodes. The buffer is an expandable queue that stores elements as pairs of (item,valid), where the valid flag records whether an item is "alive" or not. When an item is popped from the buffer, the flag is cleared, but the item is left wholly intact. I patched the invalidate() routine of _flow_graph_item_buffer_impl.h (line 55) to add item(i).first=T(), i.e. to assign a default-constructed T to the slot after the pop. This solves my problem for the moment, but it's not ideal because not all objects can be default constructed. I'd rather just destruct, as is the case when you pop from most queues.
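To illustrate the effect, here is a minimal sketch of a buffer that stores (item,valid) pairs. It is not TBB's actual code: sketch_item_buffer, invalidate_flag_only, invalidate_and_reset and owners_after_pop are hypothetical names made up for this example. It only shows why clearing the flag leaves a shared_ptr reference behind, and why assigning T() releases it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node's internal buffer; NOT TBB's real class.
template <typename T>
class sketch_item_buffer {
    std::vector<std::pair<T, bool> > slots_;  // (item, valid) pairs
public:
    explicit sketch_item_buffer(std::size_t n) : slots_(n) {}

    void put(std::size_t i, const T& v) {
        assert(i < slots_.size());
        slots_[i].first = v;       // copy-assign into the slot
        slots_[i].second = true;
    }

    // Behaviour described in the post: only the flag changes,
    // so the slot keeps its copy of the item.
    void invalidate_flag_only(std::size_t i) {
        assert(i < slots_.size());
        slots_[i].second = false;
    }

    // The workaround from the post: overwrite the slot with T().
    // Requires T to be default constructible.
    void invalidate_and_reset(std::size_t i) {
        assert(i < slots_.size());
        slots_[i].first = T();
        slots_[i].second = false;
    }
};

// Returns how many owners the item has after the slot is invalidated.
inline long owners_after_pop(bool reset_slot) {
    std::shared_ptr<int> item = std::make_shared<int>(42);
    sketch_item_buffer<std::shared_ptr<int> > buf(4);
    buf.put(0, item);  // the buffer now holds a second reference
    if (reset_slot) buf.invalidate_and_reset(0);
    else            buf.invalidate_flag_only(0);
    return item.use_count();
}
```

With the flag-only version the caller still sees two owners after the pop; with the reset version it sees one.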

Can anyone shed any light on why this behavior is as it is?

RafSchietekat · ‎03-18-2014

Shared pointers sound great, but they can be costly because of the reference counting, which translates to full fences (at least on Intel hardware), although it depends on the context whether that will have a noticeable impact (perhaps not inside a graph). But not necessarily for performance reasons, have you tried simple pointers (which could be shared_ptr::get() values with the shared_ptr held outside of the graph)? For clearer results during debugging, weak_ptr might help to fail early (otherwise the pointer might still point to seemingly valid memory), although it might still be as costly as shared_ptr (I'd have to check this again).
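A rough sketch of the two alternatives, using only the standard library. The std::queue is a stand-in for a graph edge, and Payload and both function names are hypothetical. The first function passes the get() value while the shared_ptr stays outside; the second shows how a weak_ptr fails early once the owner is gone.

```cpp
#include <memory>
#include <queue>

struct Payload { int value; };  // hypothetical heavy object

// The owner lives outside the "graph"; only a non-owning pointer travels.
inline bool raw_pointer_adds_no_owner() {
    std::shared_ptr<Payload> owner = std::make_shared<Payload>(Payload{7});
    std::queue<Payload*> edge;   // stand-in for a graph edge
    edge.push(owner.get());      // no reference count traffic here
    return owner.use_count() == 1 && edge.front()->value == 7;
}

// A weak_ptr observer reports expiry instead of dangling silently.
inline bool weak_ptr_fails_early() {
    std::shared_ptr<Payload> owner = std::make_shared<Payload>(Payload{7});
    std::weak_ptr<Payload> observer = owner;
    owner.reset();               // the last owner releases the object
    return observer.lock() == nullptr;
}
```

The catch with the raw pointer version is that whoever holds the owner must keep it alive until the item has left the graph.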

I would suggest that TBB use move operations in an internal TBB namespace, that map to the real thing if C++11 is being used, so that a C++11 user can at least have that, and to simple copy assignment otherwise, because I'm not sure that move can easily be satisfactorily emulated.
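Something along these lines, as a sketch only: sketch_internal and move_assign are hypothetical names, not anything that exists in TBB, and testing __cplusplus is an assumption about how C++11 support would be detected (some compilers need a different check).

```cpp
#include <string>
#if __cplusplus >= 201103L
#include <utility>
#endif

namespace sketch_internal {  // hypothetical namespace, not part of TBB

#if __cplusplus >= 201103L
// C++11 and later: map to the real move.
template <typename T>
void move_assign(T& dst, T& src) { dst = std::move(src); }
#else
// Pre-C++11: fall back to plain copy assignment.
template <typename T>
void move_assign(T& dst, T& src) { dst = src; }
#endif

}  // namespace sketch_internal

// Usage: the destination ends up with the value either way.
inline std::string move_assign_demo() {
    std::string src("payload");
    std::string dst;
    sketch_internal::move_assign(dst, src);
    return dst;
}
```

In the C++11 branch a shared_ptr source would be left empty after the call, so a buffer slot would no longer hold a reference.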

john5 · ‎03-18-2014

I've been using shared pointers with TBB flow graphs without any perf problems, so I'm not too worried there. Most of my tasks are several milliseconds long, so the fences aren't going to be noticeable. My graph has a few exit points, so I started using the reference count to determine when a work item had exited - shared_ptr.unique() should in theory be true. This is how I noticed that the input buffer was holding a second copy, even after that item had passed through. I don't think I could get away with a weak_ptr, because I would like the object to remain alive for at least as long as it's in the flow graph.
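The exit check is roughly the following sketch. is_sole_owner is a hypothetical helper name. It uses use_count() == 1 rather than unique(), since unique() was deprecated in C++17 and removed in C++20; either way the count is only a snapshot if other threads can still copy the pointer.

```cpp
#include <memory>

// Hypothetical helper: true when this shared_ptr is the only owner left.
template <typename T>
bool is_sole_owner(const std::shared_ptr<T>& p) {
    return p && p.use_count() == 1;
}
```

A copy left behind in a node's buffer makes this return false even after the item has passed through the node.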

And I agree with Raf, it would be nice if moves were used internally. If that's too ambitious for now, I'd be content with the same mechanism that the concurrent_queue uses, which is to copy-assign into the queue and to destroy the queue's copy after a pop.
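For illustration, a single-slot sketch of that idea: copy-construct into raw storage on push, and run the destructor on pop. This is not how concurrent_queue or the flow graph is actually implemented; sketch_slot and owners_after_slot_pop are hypothetical names. Note that the slot itself never default-constructs a T.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical one-element buffer; NOT TBB code.
template <typename T>
class sketch_slot {
    alignas(T) unsigned char storage_[sizeof(T)];  // raw, uninitialized storage
    T* item_;                                      // null while the slot is empty
public:
    sketch_slot() : item_(nullptr) {}
    ~sketch_slot() { if (item_) item_->~T(); }
    sketch_slot(const sketch_slot&) = delete;
    sketch_slot& operator=(const sketch_slot&) = delete;

    void push(const T& v) {
        assert(!item_);                 // precondition: slot is empty
        item_ = new (storage_) T(v);    // copy-construct in place
    }

    // Copies the item out, then destroys the slot's own copy.
    bool pop(T& out) {
        if (!item_) return false;
        out = *item_;
        item_->~T();                    // explicit destructor call
        item_ = nullptr;
        return true;
    }
};

// Returns the owner count once the popped copy has been dropped too.
inline long owners_after_slot_pop() {
    std::shared_ptr<int> item = std::make_shared<int>(1);
    sketch_slot<std::shared_ptr<int> > slot;
    slot.push(item);            // slot holds a second reference
    std::shared_ptr<int> out;
    slot.pop(out);              // slot's copy is destroyed here
    out.reset();                // drop the popped copy as well
    return item.use_count();
}
```

After the pop nothing inside the slot refers to the object any more, so the deleter can run as soon as the last outside owner lets go.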

Christophe_H_Intel · ‎03-18-2014

Hello, John,

You are correct, the objects are forwarded, but the items in the buffer are not destroyed. The problem with using "heavy" objects is there is a lot of copy-construction in the process of passing items from one node to the next. The general advice is to use trivially-constructible objects (basic pointers) as the items to pass if the objects themselves are big.

The other consideration is that if one is passing to multiple successors (not in the case of the queueing nodes, but in broadcasting nodes), the process of forwarding items involves copy-construction, which is also time-consuming for non-trivial objects.

I believe there is a default-constructibility requirement for objects which are passed from node-to-node, but I cannot find it in the documentation yet. If it is a requirement I will add the restriction.

I do like Raf's suggestion about move semantics. If there is a way to support it in all the C++ compilers we target, it should be done. Otherwise we can call explicit destructors for the items.

Best Regards,
Chris

john5 · ‎03-18-2014

An explicit destructor would be just fine by me and would certainly solve the problem I'm seeing.

I get what you guys are saying about the heaviness of forwarded objects, and this is why I'm going with shared_ptr rather than full objects. I figure they're only barely more expensive than raw pointers and come with a lot of side benefits, like not having to worry about the lifetime of the objects I throw into the graph (for instance).