Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

bug in concurrent_bounded_queue?

pp0
Beginner
3,760 Views

Hi there,

The snippet below will crash after a while with an access violation somewhere below tbb_debug.dll!concurrent_queue_base_v3::internal_push() when running 1 TBB thread for consume() and 10 for produce(). This is reproducible on my Vista x64 box (the snippet is compiled as 32-bit debug). I am using Visual Studio 2008 SP1. Any ideas?

struct QueuedCall
{
    void unblock() { myQueue.push(1); }
    void block() {
        int dummy;
        myQueue.pop(dummy);
    }

    tbb::concurrent_bounded_queue<int> myQueue;
};

tbb::concurrent_queue<QueuedCall*> myPendingCalls;

void consume() {
    do {
        QueuedCall *call;
        if (myPendingCalls.try_pop(call))
            call->unblock();
    } while (true);
}

void produce() {
    for (;;) {
        QueuedCall *c = new QueuedCall();
        myPendingCalls.push(c);
        c->block();
        delete c;
    }
}

0 Kudos
88 Replies
Dmitry_Vyukov
Valued Contributor I
Quoting mwhenson
And now, how would it still fail even with a smart pointer (with both producer and consumer holding a reference to the queue)? :-)

It shouldn't, but that misses the point: many people, especially those with experience with mutex-based queues, will probably find this behavior surprising and unintuitive at first.


The queue is not linearizable:

http://en.wikipedia.org/wiki/Linearizability

Linearization points

This definition of linearizability is equivalent to the following:

  • All function calls have a linearization point at some instant between their invocation and their response
  • All functions appear to occur instantly at their linearization point, behaving as specified by the sequential definition

This alternative is usually much easier to prove. It is also much easier to reason about as a user, largely due to its intuitiveness. This property of occurring instantaneously, or indivisibly, leads to the use of the term atomic as an alternative to the longer "linearizable".

mwhenson
Beginner
It's fine by me if it's not linearizable - if the documentation mentioned its non-linearizability, which it doesn't, or at least not that I can find. Because, if we don't know this, as Wikipedia mentions:

On the other hand, if an operation is not atomic, the user will also need to understand and cope with sporadic extraneous behaviour caused by concurrent operations, which by its nature is likely to be hard to reproduce and debug.

Mike
mwhenson
Beginner
Also, thanks for your help.

Mike
RafSchietekat
Valued Contributor III
"I guess it won't. But it won't make the queue linearizable either."
In what sense?

I think it's somewhat careless to add yet another overload to what "atomic" is supposed to mean. It seems a bit like sequential consistency, both in what it does and in what it means to want to be able to take it for granted: very appealing at first sight, but would you want to pay the probably unavoidable cost? Nonlinearisability just comes with the territory, I'd say.
Alexey-Kukanov
Employee
I read more info on linearizability, thought more of it, and discussed it in the team. My view on the issue is:
- Linearizability is a property of a set of functions (invocations and responses), in other words - a property of a synchronization protocol used to coordinate these functions.
- A linearizable function is not obliged to have a single linearization point; it can have distinct such points for distinct other functions from the set.
- Some methods of a concurrently accessed object might be linearizable, some might not.
- TBB concurrent_queue meets the documented promise: push() and pop() methods are linearizable, while destruction is not thread-safe at all.
- I do not see any practical way other than reference counting (smart pointers) to make object destruction thread-safe and possibly linearizable. In particular, destruction of a lock-protected queue is not linearizable either.

To prove the last item, how about the following simple history for a queue that has methods push(), pop(), and destroy(), all protected with the same lock?
thread 1 calls queue.destroy()
thread 2 calls queue.push()
thread 1 returns from queue.destroy()
thread 2 returns from queue.push()... or maybe crashes.
It is easy to see that each thread's view of the history is valid; thread 2 did not call destroy(), and it could not know some other thread did, because at the moment it called push() the queue was alive.

What I wanted to show is that a special synchronization protocol is required to make destruction thread-safe. Moreover, the protocol Dmitry assumed (the producer pushes a special signal, the consumer destroys the queue after receiving the signal) is insufficient in general, because it does not work if there are several producers or consumers. And TBB concurrent_queue is a multi-producer, multi-consumer queue.

The bottom line is that I do not see the described issue as a lack of linearizability, or as an inconsistency with the documentation. The solution for it, as I already suggested, is to use smart pointers.
Dmitry_Vyukov
Valued Contributor I
- A linearizable function is not obliged to have a single linearization point; it can have distinct such points for distinct other functions from the set.

Nope.

Every function must have a single linearization point. Linearizability is observable sequential consistency. If a function had several linearization points, the following scenario would be possible.

Thread 0 partially executes function A.

Thread 1 observes an effect of the invocation of function A by means of function B.

Thread 1 communicates to thread 2 that A has completed.

Thread 2 receives the message from thread 1.

Thread 2 tries to verify that function A has indeed completed by means of function C, but fails.

For example, consider that function queue::enqueue() is implemented as: (1) increment the queue size, (2) enqueue the message. Then:

Thread 0 partially executes queue::enqueue() (i.e. the size is incremented, but the message is not enqueued).
Thread 1 executes queue::size() and obtains the value 1.
Thread 1 communicates to thread 2 that the queue is not empty.
Thread 2 executes queue::dequeue(), but obtains a zero pointer (thread 0 still has not completed the enqueue).
Thread 2 dereferences the zero pointer. Bang!

Such a scenario is impossible with a linearizable queue.

Dmitry_Vyukov
Valued Contributor I
However, it may still be useful to think in terms of "partial linearization points" (when a function has several linearization points) in some contexts. But that does not make a data structure linearizable as a whole.


Dmitry_Vyukov
Valued Contributor I
- I do not see any practical way other than reference counting (smart pointers) to make object destruction thread-safe and possibly linearizable. In particular, destruction of a lock-protected queue is not linearizable either.

Alexey, you do not get the point.

Destruction is not linearizable, and should not be linearizable; it's a strictly single-threaded operation.

BUT one may use the linearizability of operations on a data structure to determine a point when it's safe to destroy it (we are not talking about the contents of the destructor, only about the point when it's safe to destroy).

Consider:

A consumer thread receives the last message from a producer thread. Now it "thinks": "OK, it's the last message, so the producer does not use the queue any more; now I may delete the queue." A linearizable queue (mutex-based) permits such a scenario, while the TBB queue (as well as the M&S queue) does not.

Dmitry_Vyukov
Valued Contributor I
To prove the last item, how about the following simple history for a queue that has methods push(), pop(), and destroy(), all protected with the same lock?
thread 1 calls queue.destroy()
thread 2 calls queue.push()
thread 1 returns from queue.destroy()
thread 2 returns from queue.push()... or maybe crashes.
It is easy to see that each thread's view of the history is valid; thread 2 did not call destroy(), and it could not know some other thread did, because at the moment it called push() the queue was alive.

This has nothing to do with the topic; it's a plain programming error. A thread should not destroy objects while they are in use. A thread must first determine a point when it is safe to destroy an object.

Dmitry_Vyukov
Valued Contributor I
What I wanted to show is that a special synchronization protocol is required to make destruction thread-safe.

Indeed.

What you missed is that the protocol can include return values from methods of the object (dequeue returned LAST_MESSAGE -> safe to destroy).


Quoting Alexey Kukanov (Intel)
Moreover, the protocol Dmitry assumed (the producer pushes a special signal, the consumer destroys the queue after receiving the signal) is insufficient in general, because it does not work if there are several producers or consumers. And TBB concurrent_queue is a multi-producer, multi-consumer queue.

Nope. TBB concurrent_queue IS-A single-producer/single-consumer queue, as long as you do not provide a tbb::single_producer_single_consumer_concurrent_queue.
Or do you want to say that the documentation/intention prohibits usage of tbb::concurrent_queue as an SPSC queue? Do you?

Terry_W_Intel
Employee
Consider:

A consumer thread receives the last message from a producer thread. Now it "thinks": "OK, it's the last message, so the producer does not use the queue any more; now I may delete the queue." A linearizable queue (mutex-based) permits such a scenario, while the TBB queue (as well as the M&S queue) does not.

I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?

Thanks!

Terry

mwhenson
Beginner
The bottom line is that I do not see the described issue as a lack of linearizability, or as an inconsistency with the documentation. The solution for it, as I already suggested, is to use smart pointers.

I respectfully disagree. In the example from the first post, the consumer knows the producer has stopped producing, knows he's the only consumer, knows he's returned from the only pop call, and then tries deleting. Had there been a lock around the push / pop / destroy calls, the call to pop would not have returned until it was safe to delete in that example.

I'm not an expert on the definition of linearizability, but this seems inconsistent with the definition presented here and on Wikipedia. It's definitely inconsistent with a naive implementation with a single lock around everything. When we first ran into this problem, my first action was to check the documentation, but I found no clues. I reiterate that this behavior is surprising to someone used to dealing with more naive implementations of a concurrent queue, and I suggest a note in the documentation if appealing to a less expert audience is a goal. I agree that a smart pointer should solve the problem, but that's not too useful when you have no reason to expect to need another layer of synchronization.

Mike

Alexey-Kukanov
Employee
Indeed your example is not linearizable in any sense, including my understanding. Such an implementation would be just buggy.

In my understanding, enqueue() and dequeue() implementations should also synchronize internally, e.g. via a readiness flag for the enqueued element. In this case, the atomic increment of size would serve as the linearization point for size(), and setting the flag would serve as the linearization point for dequeue().

The way to observe an object is to call some method of it. I say that making every method adhere to linearizability rules is very costly in general, and so is often impractical. Let's consider the example with the lock-based queue again. Let the producer thread push the completion signal, unlock the queue, and right after that be preempted while still inside the push() method. Let the consumer thread receive the signal, destroy the queue, _and_ unload the module that implements the queue - that seems to be safe because nobody is using the queue anymore, doesn't it? And then the producer thread resumes, and finds itself executing the epilogue of a function which no longer exists. Oops.

I will not comment on the linearizability of concurrent_queue, and I even take back my previous comment about push() and pop() being linearizable with respect to each other, because:
1) for the moment, we did not promise it in the documentation;
2) I did not prove it myself, nor do I know of anyone who did.
Alexey-Kukanov
Employee
Quoting mwhenson
I respectfully disagree. In the example from the first post, the consumer knows the producer has stopped producing, knows he's the only consumer, knows he's returned from the only pop call, and then tries deleting. Had there been a lock around the push / pop / destroy calls, the call to pop would not have returned until it was safe to delete in that example.

I'm not an expert on the definition of linearizability, but this seems inconsistent with the definition presented here and on Wikipedia. It's definitely inconsistent with a naive implementation with a single lock around everything. When we first ran into this problem, my first action was to check the documentation, but I found no clues. I reiterate that this behavior is surprising to someone used to dealing with more naive implementations of a concurrent queue, and I suggest a note in the documentation if appealing to a less expert audience is a goal. I agree that a smart pointer should solve the problem, but that's not too useful when you have no reason to expect to need another layer of synchronization.

Mike


What the consumer does not know, however, is whether the producer really returned from the enqueue call. And linearizability does not provide such a promise. All it promises is that the operation effects important for the consumer are observed by the consumer at some instant, no matter in what sequence and at what time the calls were made. See my previous post - even with the lock-based queue, no guarantee can be made that the producer has really returned from the call! Just that it has done its work.

Yes, a note in the documentation might be reasonable. The note should say that making destruction of a concurrent object thread-safe by means of the object itself is impossible in principle, and that additional means/protocols, such as smart pointers, should be used at the application level.

Alexey-Kukanov
Employee
I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?

Thanks!

Terry


It seems to me that Dmitry for some reason assumes that after the linearization point has passed, the method cannot even touch (much less change) any state of the queue. This is obviously impractical, and it does not follow from the definition of linearizability.

We have already discussed in the team that any attempt to make destruction of a concurrent object safe from inside the object is meaningless, because the container itself cannot guarantee that no other thread will touch it the very next moment. For that purpose, additional protocols should be applied, which a generic container obviously cannot know.

Dmitry suggested such a protocol, which he thinks is suitable for single-producer single-consumer queues, based on the observation that it works with the lock-based queue. However, let's apply formal logic. We have two postulates:
- the lock-based queue is linearizable;
- the lock-based queue supports the described protocol for safe destruction.
From these postulates, does it follow that any linearizable SPSC queue supports the described protocol? No, it does not.

Dmitry_Vyukov
Valued Contributor I

I'm not sure I get this. Assume a queue is linearizable -- that its push and pop operations each have a single linearization point. These points govern what is visible by one operation, with respect to another. They occur at some point between the invocation and response of the operation. So a linearization point on the producer inserting the "last" element has passed, but the push operation may not have terminated yet. The pop operation "sees" the pushed element, takes it out, passes its linearization point, responds. Then the destructor is called. But nothing about those linearization points tells you that the push operation has completed and that it is safe to destroy the queue...? Or am I missing something?


My reasoning is as follows.

For a linearizable data structure, the observable side effects of an operation either (1) do not take place at all or (2) all take place together. There are no partial states.

One observable side effect is "return from the function" (by return I mean that a thread has "stopped touching the object" rather than a physical return, i.e. execution of a RET instruction).

So if a consumer has observed that the message is in the queue, then the producer has stopped touching the object (provided that the object is linearizable).

And if the producer did not stop touching the object, then the consumer may observe a partial state (by unmapping the memory and catching SIGSEGV). Thus the data structure is not linearizable.

Dmitry_Vyukov
Valued Contributor I
In my understanding, enqueue() and dequeue() implementations should also synchronize internally, e.g. via a readiness flag for the enqueued element. In this case, the atomic increment of size would serve as the linearization point for size(), and setting the flag would serve as the linearization point for dequeue().


Sequential consistency (linearizability) does not have any sense for a single thread (the order is just what is observed). SC only has meaning when we consider a plurality of observers (their observed partial orders must form a consistent total order).

So what you described is not sequentially consistent (linearizable), because if the threads that execute size() and dequeue() communicate with each other (or with a third observer), they will reveal the inconsistency. Namely, size() returns 1 (i.e. the enqueue has taken place) and AFTER that dequeue() returns 0 (i.e. the enqueue has not taken place).

Alexey-Kukanov
Employee
Sequential consistency (linearizability) does not have any sense for a single thread (the order is just what is observed). SC only has meaning when we consider a plurality of observers (their observed partial orders must form a consistent total order).

So what you described is not sequentially consistent (linearizable), because if the threads that execute size() and dequeue() communicate with each other (or with a third observer), they will reveal the inconsistency. Namely, size() returns 1 (i.e. the enqueue has taken place) and AFTER that dequeue() returns 0 (i.e. the enqueue has not taken place).

This time it seems you did not get my point :)
In what I described, the dequeue would not return at all until it sees the object is fully constructed and can be obtained. It would be impossible to make a wait-free dequeue call linearizable, but it is possible with a blocking dequeue().

mwhenson
Beginner
Thanks for the clarification everyone, I found it useful.

Mike
Dmitry_Vyukov
Valued Contributor I

It seems to me that Dmitry for some reason assumes that after the linearization point has passed, the method cannot even touch (much less change) any state of the queue. This is obviously impractical, and it does not follow from the definition of linearizability.


It does follow. I can observe that the operation is partially committed: the message is available for consuming, but the thread has not stopped touching the object. "Touching the object" is an observable side effect because, well, I am able to observe it.

Dmitry_Vyukov
Valued Contributor I

We have already discussed in the team that any attempt to make destruction of a concurrent object safe from inside the object is meaningless, because the container itself cannot guarantee that no other thread will touch it the very next moment. For that purpose, additional protocols should be applied, which a generic container obviously cannot know.

So is the following implementation of reference counting OK with you?

bool release_object(object_t* obj)
{
    int rc = obj->rc.fetch_sub(1, memory_order_acq_rel);
    obj->last_access.store(get_timestamp(), memory_order_relaxed);
    return rc == 1;
}
