Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

bug in concurrent_bounded_queue?

pp0
Beginner
5,558 Views

Hi there,

The snippet below will crash after a while with an access violation somewhere below tbb_debug.dll!concurrent_queue_base_v3::internal_push() when running one TBB thread for consume() and ten for produce(). This is reproducible on my Vista x64 box (the snippet is compiled as 32-bit debug). I am using Visual Studio 2008 SP1. Any ideas?

struct QueuedCall
{
    void unblock() { myQueue.push(1); }
    void block() {
        int dummy;
        myQueue.pop(dummy);
    }

    tbb::concurrent_bounded_queue<int> myQueue;
};

tbb::concurrent_queue<QueuedCall*> myPendingCalls;

void consume() {
    do {
        QueuedCall *call;
        if (myPendingCalls.try_pop(call))
            call->unblock();
    } while (true);
}

void produce() {
    for (;;) {
        QueuedCall *c = new QueuedCall();
        myPendingCalls.push(c);
        c->block();
        delete c;
    }
}

88 Replies
Alexey-Kukanov
Employee
906 Views
Now let's start from the beginning.
Concurrent programs need to delete objects during run time. Period.
So at least some objects HAVE to have safe deletion property. Period.
I.e. you need to determine when it's safe to delete an object basing your reasoning only on values returned from methods of the object itself (without using external objects).

Once you can delete some objects, we may indeed allow some other objects not to feature the property (they will be dependent, in a sense, and will always require some external assistance). But on the other hand, it is desirable that as many objects as possible feature the property, because, well, we need to delete every object at some point in time.


Fortunately, there are kinds of objects that do have the "safe deletion" property - local objects (owned by a single thread at any given time) and static file-scope objects destroyed at program exit. Together with certain programming discipline, this allows building safe deletion protocols for shared objects.
For example, let's imagine a template class that holds a pointer to the object that one needs to share, also contains a reference counter, and implements smart pointers to access itself. All you need then is the programming discipline. The smart pointers should be local objects never passed by reference, so that they are safe to delete. And the object-to-share should never be used unless the thread holds such a pointer. When the last such pointer is destroyed, and so the reference counter becomes zero, both the object-to-share and the reference counting object can be destroyed.

Dmitry_Vyukov
Valued Contributor I
906 Views

Fortunately, there are kinds of objects that do have the "safe deletion" property - local objects (owned by a single thread at any given time) and static file-scope objects destroyed at program exit. Together with certain programming discipline, this allows building safe deletion protocols for shared objects.

Nope, it's not.

Destruction of non-shared object is trivial and irrelevant.

Let's leave smart pointers aside; they are no more than syntactic sugar, and I am perfectly able to call 'acquire' and 'release' manually.

So what we have is a shared object with 'acquire' and 'release' methods. One of calls to 'release' returns 'true', which means it's safe to delete the object now. So it's actually a shared object that must support 'safe deletion'. You can't bolt it on with any kind of smart pointers and other external things.

For example, let's imagine a template class that holds a pointer to the object that one needs to share, also contains a reference counter, and implements smart pointers to access itself. All you need then is the programming discipline. The smart pointers should be local objects never passed by reference, so that they are safe to delete. And the object-to-share should never be used unless the thread holds such a pointer. When the last such pointer is destroyed, and so thereference counter becomes zero, both the object-to-share and the reference counting object can be destroyed.

Ok, let's see how programming discipline will help you. Let's assume that the shared object is a TBB concurrent queue with an initial reference count of 11 (10 items in the queue). We have 11 threads, and each calls try_pop() once. We use a smart pointer (RAII) to ensure that each thread will indeed call try_pop(). When the reference counter drops to zero (try_pop() returns 0), the thread deletes the queue.

Quiz: will smart pointers help you?

See, it has nothing to do with smart pointers and the like. It's the shared object that must be implemented in such a way that *IT* will tell you when it's safe to delete it. If your shared objects do not have such a feature, there is nothing you can do to reduce the number of live objects.


Dmitry_Vyukov
Valued Contributor I
906 Views
Note that I'm not saying that each and every object must support 'safe deletion', I'm only saying that:
(1) It's a useful property. I don't care how scientists define theoretical linearizability, but I write real-world programs and I need to know when it's safe to delete a particular object. So 'touching the object's data' is an important and observable side effect for me.
(2) Guarantees in this respect must be clearly documented (at least for a widely used support library). I.e. I need to know whether each particular method touches the object's data after the 'partial linearization point' with each other method. For example, enqueue() may touch data after the linearization point with dequeue(), but not touch data after the linearization point with size() (i.e. the method by means of which I observe the commit of some other method can make a difference). In particular, "no guarantees in this respect" is also clear documentation; it just means that one must always use some external means for that (in particular, sane mutexes, threads, reference counters and safe memory reclamation algorithms are always implemented with "safe deletion" in mind, and thus can be used to safely delete other objects).


ARCH_R_Intel
Employee
906 Views
I concur with both of Dmitriy's points. I suggest that the default be "no guarantees" and we clearly document the cases where it is guaranteed. It would be good to prove that the TBB mutexes have the "safe deletion" guarantee (it's not immediately obvious with some of the complicated mutexes).
Alexey-Kukanov
Employee
906 Views
Ok, let's see how programming discipline will help you. Let's assume that the shared object is a TBB concurrent queue with an initial reference count of 11 (10 items in the queue). We have 11 threads, and each calls try_pop() once. We use a smart pointer (RAII) to ensure that each thread will indeed call try_pop(). When the reference counter drops to zero (try_pop() returns 0), the thread deletes the queue.


Dmitry, I am sorry to say this, but you likely did not get the idea again.

I did not say that counted references are kept as items in the queue, or that there are as many references as items in the queue. I said that each thread, as long as it is going to use the queue, keeps a counted reference all along, and only releases it when done with the queue.
Let's take your earlier example with producer/consumer. The producer pushes the completion signal to the queue, and then removes the reference. The consumer pops the signal, and removes its reference. If you have 11 threads, each should have its own reference before using the queue in any way, and remove it when done. Whoever is last deletes the queue.

It's the shared object that must be implemented in such a way that *IT* will say to you when it's safe to delete it. If your shared objects do not have such a feature, there is nothing you can do to reduce number of alive objects.


With the whole memory under the object being reclaimed (including any state that tells about the safety of deletion), no object can say when it's safe to delete it, because it cannot predict the time it will be accessed again, and there always exists a timing window between the check and the destruction during which some other thread can try accessing the object. So it is a question of programming discipline at the application level, not of class/object level safety techniques.

I thought more about the components of the solution I suggested above. The template reference counting class is convenient just to separate the concept of counting and apply it to an arbitrary class. Smart pointers are syntactic sugar, with the following key idea underneath: a thread only increments the reference count of a shared object when it already holds a counted reference to that object. Smart pointers just make it more convenient to follow this principle. The thread can then pass the reference to some other thread.

It's like an invitation-only club: if you are a member of that club you can invite other members, but people from street can't enter; so whoever is the last member leaving the club, destroys it. Indeed the destruction is safe, because the last member knows nobody can be inside :)

And the next thought is that the only difference related to concurrent programming is that reference counting should be done with atomic RMW instructions. Otherwise, it's the same old story of shared ownership.

RafSchietekat
Valued Contributor III
906 Views
"From a quick look, it appears that concurrent_queue::push does not touch *this after the pushed item becomes visible to the consumer."
It touches my_rep to notify the consumer. And even if the consumer received the notification before it saw the item, do we know whether the notification mechanism by itself can be "safely" deleted?

"The consumer is signaled after the point that the item became available to the consumer."
What would be the effect, including on performance, of inverting that order, i.e., making the consumer busy-wait for the item after receiving the notification (with or without the occasional lost quantum when the producer goes to sleep at exactly the wrong time)? That is, if this reordering is possible at all; and if not, why not?
Dmitry_Vyukov
Valued Contributor I
906 Views

Dmitry, I am sorry to say this, but you likely did not get the idea again.

I did not say that counted references are kept as items in the queue, or that there are as many references as items in the queue. I said that each thread, as long as it is going to use the queue, keeps a counted reference all along, and only releases it when done with the queue.
Let's take your earlier example with producer/consumer. The producer pushes the completion signal to the queue, and then removes the reference. The consumer pops the signal, and removes its reference. If you have 11 threads, each should have its own reference before using the queue in any way, and remove it when done. Whoever is last deletes the queue.



I was talking about the implementation of the reference counter itself.

I am able to implement a reference counter by means of a queue so that it works as you described. Or I am able to implement it by means of a queue so that it won't work as you described (a TBB queue as the reference counter).

I am able to implement a reference counter by means of, well, a reference counter, so that it works as you described. Or so that it won't. Everything depends on the implementation of the shared object. For example, consider:

bool release_reference(object_t* obj)
{
    int rc = obj->rc.fetch_sub(1);
    obj->last_accessed.store(get_timestamp()); // touches *obj after the decrement
    return rc == 1;
}

If your objects support 'safe deletion', then you are able to delete them. If they do not, then you can't. If the reference counter object's methods touch their own data after the linearization point, then your protocol won't help you.

Dmitry_Vyukov
Valued Contributor I
906 Views

With the whole memory under the object being reclaimed (including any state that tells about the safety of deletion), no object can say when it's safe to delete it, because it cannot predict the time it will be accessed again, and there always exists a timing window between the check and the destruction during which some other thread can try accessing the object. So it is a question of programming discipline at the application level, not of class/object level safety techniques.

I see what you mean. But I would describe it as follows.

Accessing a reference counted object only after acquiring a reference is just a matter of satisfying a contract between the component and the outer code. If the contract is not satisfied then, of course, all bets are off (it's like accessing an array out of bounds). But if it is satisfied, then one does not automatically get everything in this world; one gets only what the component promises. In particular, if a reference counted object does not promise its safe deletion (though it's as correct and as linearizable as the TBB queue), you won't get it even if you follow the contract.

And my point is that some components MUST promise safe deletion. And if they do not, sorry, you can't get it with any programming discipline.

Andrey_Marochko
New Contributor III
906 Views
"It's a useful property ... I write real-world programs and I need to know when it's safe to delete a particular object."

I would argue that the original scenario, where queue emptiness is used as a termination signal, is just a corner case. Normally, in a concurrent setup, a container being empty guarantees nothing, as there can easily be another bunch of push/pop pairs in flight in other threads. Besides, most real world applications operate with multiple objects, and making one of them moonlight as an execution flow controller is unsound from the design stability standpoint.

Thus even though it is indeed a useful property in general, its applicability area is narrow enough to sacrifice it without any qualms in favor of better performance and scalability of the primary container operations (insertion, deletion, lookup, etc.).
Dmitry_Vyukov
Valued Contributor I
906 Views

It's like an invitation-only club: if you are a member of that club you can invite other members, but people from street can't enter; so whoever is the last member leaving the club, destroys it. Indeed the destruction is safe, because the last member knows nobody can be inside :)

It seems that you are talking about reference counters with only basic thread safety. Strongly thread safe reference counters do allow people from street to enter.

Dmitry_Vyukov
Valued Contributor I
906 Views
I would argue that the original scenario, where queue emptiness is used as a termination signal, is just a corner case. Normally, in a concurrent setup, a container being empty guarantees nothing, as there can easily be another bunch of push/pop pairs in flight in other threads. Besides, most real world applications operate with multiple objects, and making one of them moonlight as an execution flow controller is unsound from the design stability standpoint.

Your opinion disagrees with the opinion of the designers of the POSIX standard. Why would they provide a socket shutdown function then? It's basically the same - a producer may send an END message, after receiving which the consumer may close the socket, because it knows that nobody uses it anymore.

Dmitry_Vyukov
Valued Contributor I
906 Views

Thus even though it is indeed a useful property in general, its applicability area is narrow enough to sacrifice it without any qualms in favor of better performance and scalability of the primary container operations (insertion, deletion, lookup, etc.).

And... why would implementing it sacrifice the performance/scalability of insert/remove at all?

There is also always implementation complexity, of course. However, that's a different matter. If you decide not to penalize performance, each and every end user benefits. If you decide not to increase complexity, you personally benefit :)

Alexey-Kukanov
Employee
906 Views
I see what you mean. But I would describe it as follows.

Accessing a reference counted object only after acquiring a reference is just a matter of satisfying a contract between the component and the outer code. If the contract is not satisfied then, of course, all bets are off (it's like accessing an array out of bounds). But if it is satisfied, then one does not automatically get everything in this world; one gets only what the component promises. In particular, if a reference counted object does not promise its safe deletion (though it's as correct and as linearizable as the TBB queue), you won't get it even if you follow the contract.

And my point is that some components MUST promise safe deletion. And if they do not, sorry, you can't get it with any programming discipline.

Ok, I think we are in consensus, or at least close enough (finally).

I agree that for the implementation of reference counters (and generally objects that control the lifetime of themselves or other objects), a stricter guarantee than linearizability is required. Fortunately, the very straightforward implementation of reference counters with atomic RMW instructions on a single machine word satisfies the required property. That's what I always had in mind, and so getting to the point of mutual understanding took longer. Anyway, I'm glad that it happened :)

As for the "invitation club" scheme, indeed this is just one possible way to design the high-level discipline (or contract, if you wish); there are hazard pointers and other schemes as well, and garbage collectors, after all. I just wanted to show that there is a simple and working solution to the problem, but it is not 100% technical. And generally the solution cannot be 100% done at the library side. Well, maybe unless it's a library for a garbage collected environment, with corresponding support from both the language and the runtime.

ARCH_R_Intel
Employee
906 Views

"From a quick look, it appears that concurrent_queue::push does not touch *this after the pushed item becomes visible to the consumer."

"It touches my_rep to notify the consumer. And even if the consumer received the notification before it saw the item, do we know whether the notification mechanism by itself can be "safely" deleted?"

I'm fairly certain that concurrent_queue::push does not touch my_rep after the item becomes visible to the consumer. A pushed item does not become visible until method push executes the atomic fetch-and-add "tail_counter+=..." inside the representation, after which it does not touch the representation. That fetch-and-add is the notification mechanism. So it is safe for the consumer to delete it.

As an informal check, I modified the original example to use concurrent_queue instead of concurrent_bounded_queue. It has been running for 100,000,000 iterations without complaint. Unfortunately, I had to replace the "pop" in the original with a busy-wait loop around "try_pop", so it lacks some of the desirable behavior of the original example.

What makes concurrent_bounded_queue different, and thus breaks in the same scenario, is that it has an additional wakeup mechanism after the update of the tail_counter.

Andrey_Marochko
New Contributor III
906 Views
"Your opinion disagress with opinion of designers of POSIX standard. Why would they provide socket shutdown function then? It's basically the same - producer may send an END message, after receiving of which consumer may close the socket, because he knows that nobody uses it anymore."

Well, design approaches to shared memory and distributed applications are a little different, aren't they? :)
mwhenson
Beginner
906 Views
Thanks Arch, that was very edifying.

Mike
Andrey_Marochko
New Contributor III
892 Views

There is always also an implementation complexity of course. However that's a different matter. If you decide to not penalize performance, each and every end user benefits. If you decide to not increase complexity, personally you benefit :)


Unfortunately, excessive implementation complexity often results in a solution completely unsuitable for real world needs. For example, there is a slew of lock-free and even wait-free algorithms out there. But how many of them are used in practice?

RafSchietekat
Valued Contributor III
892 Views
#77 "I'm fairly certain that concurrent_queue::push does not touch my_rep after the item becomes visible to the consumer. A pushed item does not become visible until method push executes the atomic fetch-and-add "tail_counter+=..." inside the representation, after which it does not touch the representation. That fetch-and-add is the notification mechanism. So it is safe for the consumer to delete it. "
Here's a piece of code from concurrent_queue_base_v3::internal_push() at concurrent_queue.cpp:385-386 in tbb30_20100406oss:
[cpp]r.choose( k ).push( src, k, *this );
r.items_avail.notify( predicate_leq(k) );
[/cpp]
First push(), which makes the item visible (right?), then a use of r, which is a reference to *my_rep. That means that seeing a "last" item is not enough to "safely" delete the queue, if that is the goal.
Dmitry_Vyukov
Valued Contributor I
892 Views
That's not the concurrent_queue. concurrent_queue does not do any notifications.
RafSchietekat
Valued Contributor III
892 Views
"That's not the concurrent_queue. concurrent_queue does not do any notifications."
Am I hallucinating, then? Or is there a less disturbing explanation? :-)