Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

bug in concurrent_bounded_queue?

pp0
Beginner
3,760 Views

Hi there,

The snippet below will crash after a while with an access violation somewhere below tbb_debug.dll!concurrent_queue_base_v3::internal_push() when running 1 TBB thread for consume() and 10 for produce(). This is reproducible on my Vista x64 box (the snippet is compiled as 32-bit debug). I am using Visual Studio 2008 SP1. Any ideas?

struct QueuedCall
{
    void unblock() { myQueue.push(1); }
    void block() {
        int dummy;
        myQueue.pop(dummy);
    }

    tbb::concurrent_bounded_queue<int> myQueue;
};

tbb::concurrent_queue<QueuedCall*> myPendingCalls;

void consume() {
    do {
        QueuedCall *call;
        if (myPendingCalls.try_pop(call))
            call->unblock();
    } while (true);
}

void produce() {
    for (;;) {
        QueuedCall *c = new QueuedCall();
        myPendingCalls.push(c);
        c->block();
        delete c;
    }
}

0 Kudos
88 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting mwhenson
And now, how would it still fail even with a smart pointer (with both producer and consumer holding a reference to the queue)? :-)

It shouldn't, but that misses the point: many people, especially those with experience with mutex-based queues, will probably find this behavior surprising and unintuitive at first.


The queue is not linearizable:

http://en.wikipedia.org/wiki/Linearizability

Linearization points

This definition of linearizability is equivalent to the following:

  • All function calls have a linearization point at some instant between their invocation and their response
  • All functions appear to occur instantly at their linearization point, behaving as specified by the sequential definition

This alternative is usually much easier to prove. It is also much easier to reason about as a user, largely due to its intuitiveness. This property of occurring instantaneously, or indivisibly, leads to the use of the term atomic as an alternative to the longer "linearizable".

mwhenson
Beginner
It's fine by me if it's not linearizable - if the documentation mentioned its non-linearizability, which it doesn't, or at least not that I can find. Because, if we don't know this, as Wikipedia mentions:

On the other hand, if an operation is not atomic, the user will also need to understand and cope with sporadic extraneous behaviour caused by concurrent operations, which by its nature is likely to be hard to reproduce and debug.

Mike
mwhenson
Beginner
Also, thanks for your help.

Mike
RafSchietekat
Valued Contributor III
"I guess it won't. But it won't make the queue linearizable either."
In what sense?

I think it's somewhat careless to add yet another overload to what "atomic" is supposed to mean. It seems a bit like sequential consistency, both in what it does and in what it means to want to be able to take it for granted: very appealing at first sight, but would you want to pay the probably unavoidable cost? Nonlinearisability just comes with the territory, I'd say.
Alexey-Kukanov
Employee
I read more info on linearizability, thought more of it, and discussed it in the team. My view on the issue is:
- Linearizability is a property of a set of functions (invocations and responses), in other words - a property of a synchronization protocol used to coordinate these functions.
- A linearizable function is not obliged to have a single linearization point; it can have distinct such points for distinct other functions from the set.
- Some methods of a concurrently accessed object might be linearizable, some might not.
- TBB concurrent_queue meets the documented promise: push() and pop() methods are linearizable, while destruction is not thread-safe at all.
- I do not see any practical way other than reference counting (smart pointers) to make object destruction thread-safe and possibly linearizable. In particular, destruction of a lock-protected queue is not linearizable either.

To prove the last item, how about the following simple history for a queue that has methods push(), pop(), and destroy(), all protected with the same lock?
thread 1 calls queue.destroy()
thread 2 calls queue.push()
thread 1 returns from queue.destroy()
thread 2 returns from queue.push()... or maybe crashes.
It is easy to see that each thread's view of the history is valid; thread 2 did not call destroy(), and it could not know some other thread did, because at the moment it called push() the queue was alive.

What I wanted to show is that a special synchronization protocol is required to make destruction thread-safe. Moreover, the protocol Dmitry assumed (the producer pushes a special signal, the consumer destroys the queue after receiving the signal) is insufficient in general, because it does not work if there are several producers or consumers. And TBB concurrent_queue is a multi-producer, multi-consumer queue.

The bottom line is that I do not see the described issue as a lack of linearizability, or as an inconsistency with the documentation. The solution for it, as I already suggested, is to use smart pointers.
Dmitry_Vyukov
Valued Contributor I
- A linearizable function is not obliged to have a single linearization point; it can have distinct such points for distinct other functions from the set.

Nope.

Every function must have a single linearization point. Linearizability is observable sequential consistency. If a function had several linearization points, the following scenario would be possible.

Thread 0 partially executes function A.

Thread 1 observes an effect of the invocation of function A by means of function B.

Thread 1 communicates to thread 2 that A has completed.

Thread 2 receives the message from thread 1.

Thread 2 tries to verify that function A has indeed completed by means of function C, but fails.

For example, consider that function queue::enqueue() is implemented as: (1) increment the queue size, (2) enqueue the message. Then:

Thread 0 partially executes queue::enqueue() (i.e. the size is incremented, but the message is not enqueued).
Thread 1 executes queue::size() and obtains the value 1.
Thread 1 communicates to thread 2 that the queue is not empty.
Thread 2 executes queue::dequeue(), but obtains a zero pointer (thread 0 still has not completed the enqueue).
Thread 2 dereferences the zero pointer. Bang!

Such a scenario is impossible with a linearizable queue.

Dmitry_Vyukov
Valued Contributor I
However, it may still be useful to think in terms of "partial linearization points" (when a function has several linearization points) in some contexts. But that does not make a data structure linearizable as a whole.


Dmitry_Vyukov
Valued Contributor I
- I do not see any practical way other than reference counting (smart pointers) to make object destruction thread-safe and possibly linearizable. In particular, destruction of a lock-protected queue is not linearizable either.

Alexey, you do not get the point.

Destruction is not linearizable, and should not be linearizable; it's a strictly single-threaded operation.

BUT one may use the linearizability of operations on a data structure to determine a point when it's safe to destroy it (we are not talking about the contents of the destructor, only about the point when it's safe to destroy).

Consider:

A consumer thread receives the last message from a producer thread. Now it "thinks": "OK, it's the last message, so the producer does not use the queue any more; now I may delete the queue." A linearizable queue (mutex-based) permits such a scenario, while the TBB queue (as well as the M&S queue) does not.

Dmitry_Vyukov
Valued Contributor I
To prove the last item, how about the following simple history for a queue that has methods push(), pop(), and destroy(), all protected with the same lock?
thread 1 calls queue.destroy()
thread 2 calls queue.push()
thread 1 returns from queue.destroy()
thread 2 returns from queue.push()... or maybe crashes.
It is easy to see that each thread's view of the history is valid; thread 2 did not call destroy(), and it could not know some other thread did, because at the moment it called push() the queue was alive.

This has nothing to do with the topic; it's a plain programming error. A thread should not destroy objects while they are in use. A thread must first determine a point when it is safe to destroy an object.

Dmitry_Vyukov
Valued Contributor I
What I wanted to show is that a special synchronization protocol is required to make destruction thread-safe.

Indeed.

What you missed is that the protocol can include return values from methods of the object (dequeue returned LAST_MESSAGE -> safe to destroy).


Quoting Alexey Kukanov (Intel)
Moreover, the protocol Dmitry assumed (the producer pushes a special signal, the consumer destroys the queue after receiving the signal) is insufficient in general, because it does not work if there are several producers or consumers. And TBB concurrent_queue is a multi-producer, multi-consumer queue.

Nope. TBB concurrent_queue IS-A single-producer/single-consumer queue, as long as you do not provide a tbb::single_producer_single_consumer_concurrent_queue.
Or do you want to say that the documentation/intention prohibits usage of tbb::concurrent_queue as an SPSC queue? Do you?

Terry_W_Intel
Employee
Consider:

A consumer thread receives the last message from a producer thread. Now it "thinks": "OK, it's the last message, so the producer does not use the queue any more; now I may delete the queue." A linearizable queue (mutex-based) permits such a scenario, while the TBB queue (as well as the M&S queue) does not.

I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?

Thanks!

Terry

mwhenson
Beginner
The bottom line is that I do not see the described issue as a lack of linearizability, or as an inconsistency with the documentation. The solution for it, as I already suggested, is to use smart pointers.

I respectfully disagree. In the example from the first post, the consumer knows the producer has stopped producing, knows he's the only consumer, knows he's returned from the only pop call, and then tries deleting. Had there been a lock around the push / pop / destroy calls, the call to pop would not have returned until it was safe to delete in that example.

I'm not an expert on the definition of linearizability, but this seems inconsistent with the definition presented here and on Wikipedia. It's definitely inconsistent with a naive implementation with a single lock around everything. When we first ran into this problem, my first action was to check the documentation, but I found no clues. I reiterate that this behavior is surprising to someone used to dealing with more naive implementations of a concurrent queue, and I suggest a note in the documentation if appealing to a less expert audience is a goal. I agree that a smart pointer should solve the problem, but that's not too useful when you have no reason to expect to need another layer of synchronization.

Mike

Alexey-Kukanov
Employee
Indeed your example is not linearizable in any sense, including my understanding. Such an implementation would be just buggy.

In my understanding, enqueue() and dequeue() implementations should also synchronize internally, e.g. via a readiness flag for the enqueued element. In this case, the atomic increment of size would serve as the linearization point for size(), and setting the flag would serve as the linearization point for dequeue().

The way to observe an object is to call some method of it. I say that making every method adhere to linearizability rules is very costly in general, and so is often impractical. Let's consider the example with the lock-based queue again. Let the producer thread push the completion signal, unlock the queue, and right after that be preempted while still inside the push() method. Let the consumer thread receive the signal, destroy the queue, _and_ unload the module that implements the queue - that seems to be safe because nobody is using the queue anymore, doesn't it? And then the producer thread resumes, and finds itself executing the epilogue of a function which no longer exists. Oops.

I will not comment on the linearizability of concurrent_queue, and I even take back my previous comment about push() and pop() being linearizable with respect to each other, because:
1) for the moment, we did not promise it in the documentation;
2) I did not prove it myself, nor do I know of anyone who did.
Alexey-Kukanov
Employee
Quoting mwhenson
I respectfully disagree. In the example from the first post, the consumer knows the producer has stopped producing, knows he's the only consumer, knows he's returned from the only pop call, and then tries deleting. Had there been a lock around the push / pop / destroy calls, the call to pop would not have returned until it was safe to delete in that example.

I'm not an expert on the definition of linearizability, but this seems inconsistent with the definition presented here and on Wikipedia. It's definitely inconsistent with a naive implementation with a single lock around everything. When we first ran into this problem, my first action was to check the documentation, but I found no clues. I reiterate that this behavior is surprising to someone used to dealing with more naive implementations of a concurrent queue, and I suggest a note in the documentation if appealing to a less expert audience is a goal. I agree that a smart pointer should solve the problem, but that's not too useful when you have no reason to expect to need another layer of synchronization.

Mike


What the consumer does not know, however, is whether the producer really returned from the enqueue call. And linearizability does not provide such a promise. All it promises is that the operation effects important for the consumer are observed by the consumer at some instant, no matter in what sequence and at what time the calls were made. See my previous post - even with the lock-based queue, no guarantee can be made that the producer has really returned from the call! Just that it has done its work.

Yes, a note in the documentation might be reasonable. The note should say that making destruction of a concurrent object thread-safe by means of the object itself is impossible in principle, and that additional means/protocols, such as smart pointers, should be used at the application level.

Alexey-Kukanov
Employee
I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?

Thanks!

Terry


It seems to me that Dmitry for some reason assumes that after the linearization point has passed, the method cannot even touch (much less change) any state of the queue. This is obviously impractical, and it does not follow from the definition of linearizability.

We have already discussed in the team that any attempt to make destruction of a concurrent object safe from inside the object is meaningless, because the container itself cannot guarantee that no other thread will touch it the very next moment. For that purpose, additional protocols should be applied, which a generic container obviously cannot know.

Dmitry suggested such a protocol, which he thinks is suitable for single-producer single-consumer queues, based on the observation that it works with the lock-based queue. However, let's apply formal logic. We have two postulates:
- the lock-based queue is linearizable;
- the lock-based queue supports the described protocol for safe destruction.
From these postulates, does it follow that any linearizable SPSC queue supports the described protocol? No, it does not.

Dmitry_Vyukov
Valued Contributor I

I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?


My reasoning is as follows.

For a linearizable data structure, the observable side effects of an operation either (1) do not take place at all or (2) all take place together. There are no partial states.

One observable side effect is "return from the function" (by return I mean that a thread has "stopped touching the object" rather than a physical return, i.e. execution of a RET instruction).

So if a consumer has observed that the message is in the queue, then the producer has stopped touching the object (provided that the object is linearizable).

And if the producer did not stop touching the object, then the consumer may observe a partial state (by unmapping the memory and catching SIGSEGV). Thus the data structure is not linearizable.

Dmitry_Vyukov
Valued Contributor I
In my understanding, enqueue() and dequeue() implementations should also synchronize internally, e.g. via a readiness flag for the enqueued element. In this case, the atomic increment of size would serve as the linearization point for size(), and setting the flag would serve as the linearization point for dequeue().


Sequential consistency (linearizability) does not have any sense for a single thread (the order is just what is observed). SC only has meaning when we consider a plurality of observers (their observed partial orders must form a consistent total order).

So what you described is not sequentially consistent (linearizable), because if the threads that execute size() and dequeue() communicate with each other (or with a third observer), they will reveal the inconsistency. Namely, size() returns 1 (i.e. the enqueue has taken place) and AFTER that dequeue() returns 0 (i.e. the enqueue has not taken place).

Alexey-Kukanov
Employee
Sequential consistency (linearizability) does not have any sense for a single thread (the order is just what is observed). SC only has meaning when we consider a plurality of observers (their observed partial orders must form a consistent total order).

So what you described is not sequentially consistent (linearizable), because if the threads that execute size() and dequeue() communicate with each other (or with a third observer), they will reveal the inconsistency. Namely, size() returns 1 (i.e. the enqueue has taken place) and AFTER that dequeue() returns 0 (i.e. the enqueue has not taken place).

This time it seems you did not get my point :)
In what I described, the dequeue would not return at all until it sees the object is fully constructed and can be obtained. It would be impossible to make a wait-free dequeue call linearizable, but it is possible with a blocking dequeue().

mwhenson
Beginner
Thanks for the clarification everyone, I found it useful.

Mike
Dmitry_Vyukov
Valued Contributor I

It seems to me that Dmitry for some reason assumes that after the linearization point has passed, the method cannot even touch (much less change) any state of the queue. This is obviously impractical, and it does not follow from the definition of linearizability.


It does follow. I can observe that the operation is partially committed: the message is available for consuming, but the thread has not stopped touching the object. "Touching the object" is an observable side effect because, well, I am able to observe it.

Dmitry_Vyukov
Valued Contributor I

We have already discussed in the team that any attempt to make destruction of a concurrent object safe from inside the object is meaningless, because the container itself cannot guarantee that no other thread will touch it the very next moment. For that purpose, additional protocols should be applied, which a generic container obviously cannot know.

So is the following implementation of reference counting OK with you?

bool release_object(object_t* obj)
{
    int rc = obj->rc.fetch_sub(1, memory_order_acq_rel);
    obj->last_access.store(get_timestamp(), memory_order_relaxed);
    return rc == 1;
}
