Intel® oneAPI Threading Building Blocks

bug in concurrent_bounded_queue?

pp0
Beginner

Hi there,

The snippet below will crash after a while with an access violation somewhere below tbb_debug.dll!concurrent_queue_base_v3::internal_push() when running one TBB thread for consume() and ten for produce(). This is reproducible on my Vista 64 box (the snippet is compiled as 32-bit debug). I am using Visual Studio 2008 SP1. Any ideas?

#include "tbb/concurrent_queue.h"

struct QueuedCall
{
    void unblock() { myQueue.push(1); }
    void block() {
        int dummy;
        myQueue.pop(dummy);   // blocks until unblock() pushes
    }

    tbb::concurrent_bounded_queue<int> myQueue;
};

tbb::concurrent_queue<QueuedCall*> myPendingCalls;

void consume() {
    do {
        QueuedCall *call;
        if (myPendingCalls.try_pop(call))
            call->unblock();
    } while (true);
}

void produce() {
    for (;;) {
        QueuedCall *c = new QueuedCall();
        myPendingCalls.push(c);
        c->block();   // wait until the consumer has unblocked us
        delete c;     // may run while the consumer's push() is still
                      // executing inside myQueue
    }
}

Alexey-Kukanov
Employee
It does follow. I can observe that the operation is partially committed: the message is available for consuming, but the thread has not stopped touching the object. "Touching the object" is an observable side effect because, well, I am able to observe it.

By unmapping the memory page which the object resides in? Oh well :) This way you can observe almost everything, e.g. changes of automatic variables on the stack of another thread.

To me, linearizability is about a consistent behavior of the agreed-upon set of functions operating on an object. These functions, when executed concurrently, should agree with each other about what they do with the object, in a predictable manner. Observations are only done by calling these functions and analyzing what they returned, and there should be a sequentially consistent history of calls and returns, including argument values and return values. It does not matter if the functions can see partial activity of concurrent operations internally; actually, in most cases they do. It does matter that when a function returns, its results are explainable given what the thread may know about the history of previous operations with the object.

Dmitry_Vyukov
Valued Contributor I

What the consumer does not know, however, is whether the producer really returned from the enqueue call. And linearizability does not provide such a promise. All it promises is that the operation effects important to the consumer are observed by the consumer at some instant, no matter in what sequence and at what times the calls were made. See my previous post - even with the lock-based queue, no guarantee can be made that the producer has really returned from the call! Just that it has done its work.

Yes, a note in the documentation might be reasonable. The note should say that making destruction of a concurrent object thread-safe by means of the object itself is impossible in principle, and that additional means/protocols, such as smart pointers, should be used at the application level.
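
For illustration, a minimal sketch of such an application-level protocol, using std::shared_ptr to decide when the queue may be destroyed (the demo() function and the int payload are arbitrary, and std::thread stands in for whatever threading the application uses):

#include <memory>
#include <thread>
#include "tbb/concurrent_queue.h"

void demo() {
    // Each thread owns its own shared_ptr copy; the queue is deleted
    // only when the last copy goes away, i.e. when no thread can
    // still be inside a push() or try_pop() call.
    auto q = std::make_shared<tbb::concurrent_queue<int> >();
    std::thread producer([q] { q->push(1); });
    std::thread consumer([q] { int v; q->try_pop(v); });
    producer.join();
    consumer.join();
}   // the last remaining copy of q is destroyed here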

Good point wrt unloading the module.

I guess the problem boils down to what counts as an observable effect. I see 3 variants:

(1) return values from the object's methods

(2) (1) + touching the object's data in any way

(3) (2) + executing code from the object's methods

All of these can be observed, so strictly speaking all must be respected for a 100% linearizable object.

(1) is a kind of basic guarantee. We all agree here.

(3) IMVHO is too special a case; dynamic module loading/unloading is used in few programs, and even in those it is not a top-priority operation.

(2) IMVHO is what must be included in the definition of plain linearizability (when no special remarks are made), because object creation/destruction is present in basically all programs, and moreover it is often a frequent operation.

Note that for some concurrent data structures (2) is not only a "nice" property but an acknowledged requirement (namely, mutexes, reference-counted objects, etc.).

Dmitry_Vyukov
Valued Contributor I
And note that (2) is exactly what mutexes provide. And I guess the intent behind linearizability is "as if protected by a mutex".
Dmitry_Vyukov
Valued Contributor I

By unmapping the memory page which the object resides in? Oh well :) This way you can observe almost everything, e.g. changes of automatic variables on the stack of another thread.

Not necessarily by unmapping the memory. Just by deleting the object.

Dmitry_Vyukov
Valued Contributor I

To me, linearizability is about a consistent behavior of the agreed-upon set of functions operating on an object. These functions, when executed concurrently, should agree with each other about what they do with the object, in a predictable manner. Observations are only done by calling these functions and analyzing what they returned, and there should be a sequentially consistent history of calls and returns, including argument values and return values. It does not matter if the functions can see partial activity of concurrent operations internally; actually, in most cases they do. It does matter that when a function returns, its results are explainable given what the thread may know about the history of previous operations with the object.

It's OK to define some special agreements w.r.t. the set of methods and observable effects, and even to use the term "linearizability" with the corresponding remarks.

But if that is not done, "linearizability" means "as if protected by a mutex". That's what everybody expects.

For some concurrent algorithms, "touching the object's data" is not just an observable effect, it is THE observable side effect:

bool release_object(Object* obj)
{
    pthread_mutex_lock(&obj->mtx);
    int rc = --obj->rc;
    obj->timestamp = get_timestamp();   // touches *obj under the lock
    pthread_mutex_unlock(&obj->mtx);
    return !rc;                         // true: last reference released
}

vs:

bool release_object2(Object* obj)
{
    int rc = obj->rc.fetch_sub(1) - 1;
    obj->timestamp.store(get_timestamp()); // touches *obj AFTER the
                                           // decrement has committed
    return !rc;
}

I insist that release_object2() is not linearizable, precisely because it touches the object after committing one of its side effects.
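
To make the difference concrete, here is a hypothetical caller (two threads each dropping one of the two remaining references; Object and obj are assumed as above):

// Thread A: release_object(obj);  rc goes 2 -> 1, returns false.
// Thread B: release_object(obj);  rc goes 1 -> 0, returns true:
if (release_object(obj))
    delete obj;  // safe: A made its last access to *obj before
                 // unlocking, so B's delete cannot race with it
// With release_object2() the same "delete obj" in B may execute
// while A's obj->timestamp.store(...) is still in flight -- a
// use-after-free.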

Andrey_Marochko
New Contributor III
And note that (2) is exactly what mutexes provide. And I guess the intent behind linearizability is "as if protected by a mutex".

No, linearizability is a theoretical abstraction, and mutexes are just one way to implement it in practice.

By reasoning in terms of "too special" or "frequent operation", you are trying to attribute to the abstraction properties that are unrelated to it.

One can define the scope of applicability of the theoretical concept to a practical implementation, including both the operation set and the observable effects. In the case of tbb::concurrent_queue these are the push() and pop() operations and their argument and return values. And linearizability means that once the push() method returns, it does not access its argument anymore, and the value it inserted into the queue is available to subsequent pops. And once a pop() has returned a value, this value is inaccessible to any other pop() call. Nothing more.

Atomicity, which is often associated with linearizability (though the original definition does not involve it), just means that the effects of concurrently running operations do not interfere with each other, even when the operations overlap in time. E.g., Herlihy writes that "each operation should appear to take effect instantaneously", not that a single store (atomic at the hardware level) must be used to implement linearizability.
Dmitry_Vyukov
Valued Contributor I

By reasoning in terms of "too special" or "frequent operation", you are trying to attribute to the abstraction properties that are unrelated to it.

Indeed. That's what I meant by "All of these can be observed, so strictly speaking all must be respected for a 100% linearizable object."

Dmitry_Vyukov
Valued Contributor I

No, linearizability is a theoretical abstraction, and mutexes are just one way to implement it in practice.

Yes. But this does not contradict what I said. Linearizability is a theoretical abstraction which was built with "as if protected by a mutex" in mind (that is the intention behind it). Thus a mutex is automatically one way to implement it.

Dmitry_Vyukov
Valued Contributor I
Herlihy writes that "each operation should appear to take effect instantaneously", not that a single store (atomic at the hardware level) must be used to implement linearizability.

Yes, but "should *appear* to take effect instantaneously" means that it should be indistinguishable, in any externally observable sense, from an implementation that uses a single atomic store. Which is not the case for algorithms that touch the object's data after the linearization point (the TBB queue, the M&S queue).

Terry_W_Intel
Employee
Herlihy writes that "each operation should appear to take effect instantaneously", not that a single store (atomic at the hardware level) must be used to implement linearizability.

Yes, but "should *appear* to take effect instantaneously" means that it should be indistinguishable, in any externally observable sense, from an implementation that uses a single atomic store. Which is not the case for algorithms that touch the object's data after the linearization point (the TBB queue, the M&S queue).

I think you are reading way more into the definition of linearizability than what is there. If what you say were true, then many of the algorithms out there that claim linearizability would not be linearizable. Because the exact linearization point itself is *not always externally observable*, you can only guarantee that the point has been passed once the operation responds. Only after that is it safe to call a destructor. (In the example, we know we have passed the linearization point, but that still does not imply "safe to delete container"; it only implies "the data is ready".) This discussion is mixing the very specific and strong property of linearizability (which deals with the behavior of the container -- i.e. when are the effects of operations seen by other operations) with general thread safety. The destructor is not a linearizable operation. Linearizability doesn't apply here.

-Terry

RafSchietekat
Valued Contributor III
Maybe I missed it in this suddenly very intense discussion, but do we know for sure that a queue can be written, without any kind of performance penalty whatsoever, that still allows destruction right after receiving the last message? And even so, how important is it to emulate that aspect of a mutex-protected queue?

(Added) Or the other way around: do we know that such a queue can't be written?
mwhenson
Beginner
And even so, how important is it to emulate that aspect of a mutex-protected queue?
Personally, I don't think it's very important. I just found the implementation very opaque in this regard.

Mike
Dmitry_Vyukov
Valued Contributor I
Maybe I missed it in this suddenly very intense discussion, but do we know for sure that a queue can be written, without any kind of performance penalty whatsoever, that still allows destruction right after receiving the last message?

Perhaps it's possible to modify just the destructor of the TBB queue so that it waits for outstanding (partially committed) pushes and pops (passively spinning until some counters reach the necessary values). That would have basically zero overhead on performance.
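
A minimal sketch of that counter idea (queue_guard and its method names are hypothetical, not TBB's):

#include <atomic>
#include <thread>

// Every push()/pop() would call enter() on entry and leave() as its
// very last action; the destructor then drains all in-flight calls.
class queue_guard {
    std::atomic<int> n_inflight;
public:
    queue_guard() : n_inflight(0) {}
    void enter() { n_inflight.fetch_add(1, std::memory_order_acquire); }
    void leave() { n_inflight.fetch_sub(1, std::memory_order_release); }
    ~queue_guard() {
        // Passive spin until no push/pop is still touching the queue.
        while (n_inflight.load(std::memory_order_acquire) != 0)
            std::this_thread::yield();
    }
};

Note that leave() itself must be the operation's very last access to the queue's memory, otherwise the destructor's wait proves nothing.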

Dmitry_Vyukov
Valued Contributor I
And even so, how important is it to emulate that aspect of a mutex-protected queue?

If I were porting a large existing application that constantly creates and destroys queues from mutex-based queues to concurrent queues, I would not mind such support.

Terry_W_Intel
Employee
Maybe I missed it in this suddenly very intense discussion, but do we know for sure that a queue can be written, without any kind of performance penalty whatsoever, that still allows destruction right after receiving the last message?

Perhaps it's possible to modify just the destructor of the TBB queue so that it waits for outstanding (partially committed) pushes and pops (passively spinning until some counters reach the necessary values). That would have basically zero overhead on performance.

With any other type of global data, it's the user's responsibility to not delete something that may be in use by another thread. Why is this case different?

Alexey-Kukanov
Employee

I'd like to thank all for a good and useful discussion. At the very least, it showed me that the understanding of linearizability inside the TBB team at Intel appears to be consistent :) Below is what I, Andrey, and Terry said:
"linearizability is about a consistent behavior of the agreed-upon set of functions ... These functions, when executed concurrently, should agree with each other about what they do with the object, in a predictable manner. Observations are only done by calling these functions and analyzing what they returned."
"linearizability is a theoretical abstraction <...> One can define the scope of applicability of the theoretical concept to a practical implementation, including both operations set and observable effects."
"linearizability ... deals with the behavior of the container -- i.e. when are the effects of operations seen by other operations"
Of course I do not claim this to be the absolute truth; but knowing that we understand it the same way is great. And all three of us seem to think that safe object destruction is unrelated to the linearizability of other operations on the object.

Trying to identify some actionable points from the discussion, I think we should:
- clarify in the documentation that container destruction is not thread-safe and should only be invoked after all other operations are known to have completed (and not just been observed to take effect);
- analyze whether concurrent_queue's operations that push or pop data are linearizable (I think they are, but I have not yet worked out a proof);
- document whatever behavior guarantees we want to provide for the container.

Dmitry_Vyukov
Valued Contributor I

With any other type of global data, it's the user's responsibility to not delete something that may be in use by another thread. Why is this case different?

Because if the data structures you have do not provide such a property (safe deletion), you won't be able to delete ANY shared objects during the program's run time.

Consider that you need to delete a concurrent queue object (and it does not have such a property). You can create a separate helper reference counter to determine when it's safe to delete the queue. You determine when it's safe to delete the queue, and actually delete it. What now? Now you need to delete the helper object. How do you do that? You can create a third object to determine when it's safe to delete the second object. Then a fourth, and so on.

More precisely, the number of shared objects will be a non-decreasing function of time. Even if your helper objects occupy just one machine word each, you will run out of memory quite fast.

You may say that you will join some threads and then delete all the memory that they used (setting aside for a moment the fact that some programs just do not periodically join threads during run time). A thread is a shared object too. So you still need to determine when it's safe to release all the resources occupied by a thread. If it does not have the safe-deletion property, you need to create one more helper object, and we are back where we started.

Now let's start from the beginning.
Concurrent programs need to delete objects during run time. Period.
So at least some objects HAVE to have the safe-deletion property. Period.
I.e., you need to be able to determine when it's safe to delete an object basing your reasoning only on values returned from methods of the object itself (without using external objects).

Now that we can delete some objects, we may indeed allow some other objects not to have the property (they will be dependent, in a sense, always requiring some external assistance). But on the other hand, it's at least nice if as many objects as possible have the property, because, well, we need to delete every object at some point in time.
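
For concreteness, a sketch of such a primitive (the class and its names are made up, not from any library): an intrusive reference count whose release() bases the deletion decision only on the value returned by its own atomic decrement, which is also its very last access to the object.

#include <atomic>

// Instances must be heap-allocated, since release() may delete this.
struct self_deleting {
    std::atomic<int> rc;
    self_deleting() : rc(1) {}
    void acquire() { rc.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        // fetch_sub is the last access this thread makes to *this;
        // whoever sees the old value 1 held the final reference.
        if (rc.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;
    }
};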

RafSchietekat
Valued Contributor III
Surely Terry Wilmarth didn't mean that all objects need a helper, with smart pointers having been mentioned quite a few times already. Would you also want to be able to delete the queue after seeing evidence of N items having been pushed if you know that only N items would be coming, or would the prefix in "unsafe_size()" dissuade you in that case? Would you expect try_pop() at least to be successful? If not, why not, and if so, are you sure that there's no penalty involved, because that would seem to require flushing the pipeline before reporting a size? If it's easy to make a size-reporting function that's conservative in the sense of excluding "concurrent modifications in flight", why wouldn't concurrent_queue have one? Just wondering...

(Added) Basically, if deleting a queue based on a popped item is an important property, why not have close() and atend() instead?
ARCH_R_Intel
Employee
I was on vacation and am just catching up on this discussion.

As a practical matter, did anyone check whether concurrent_queue would work in this use case? I think that it might, but have not had time to check. From a quick look, it appears that concurrent_queue::push does not touch *this after the pushed item becomes visible to the consumer.

The difficulty in the implementation of concurrent_bounded_queue::push is that it must signal a waiting consumer thread to wake up, if there is such a waiter. The consumer is signaled after the point at which the item became available to the consumer. Thus the signaling mechanism must not be destroyed prematurely. Ironically, the given use case never has a waiting consumer, since it uses try_pop, not pop, and actually does not need the wakeup action.
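
Schematically the hazard looks like this (a paraphrase, not the actual TBB source; the helper names are made up):

// Producer side of a bounded queue with a blocking pop():
void bounded_push(queue_t* q, item_t x) {
    publish_item(q, x);            // from here on, a concurrent pop()
                                   // can return x to the consumer
    if (q->consumer_is_waiting)    // (a) reads *q after publication
        wake_consumer(q);          // (b) and so does the wakeup signal
    // If the consumer popped x and destroyed the queue between the
    // publication and (a)/(b), those accesses are use-after-free.
}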
Dmitry_Vyukov
Valued Contributor I
As a practical matter, did anyone check whether concurrent_queue would work in this use case? I think that it might, but have not had time to check. From a quick look, it appears that concurrent_queue::push does not touch *this after the pushed item becomes visible to the consumer.

The difficulty in the implementation of concurrent_bounded_queue::push is that it must signal a waiting consumer thread to wake up, if there is such a waiter. The consumer is signaled after the point at which the item became available to the consumer. Thus the signaling mechanism must not be destroyed prematurely.

I came to the same conclusion.

Btw, one of the possible ways to support the use case is to use something along the lines of hazard pointers, i.e. move the actual data into a dynamically allocated object, and prolong its lifetime if necessary.
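
A sketch of that indirection (all names here are hypothetical, and a shared_ptr copy stands in for real hazard-pointer machinery for brevity):

#include <memory>

template<typename T>
class queue_handle {
    struct rep { /* the actual queue state: buffer, head, tail, ... */ };
    std::shared_ptr<rep> my_rep;
public:
    queue_handle() : my_rep(new rep) {}
    void push(const T& x) {
        // Pin the rep before the item becomes visible: the consumer
        // cannot legitimately destroy the handle until after
        // visibility, so *this is still alive when the pin is taken.
        std::shared_ptr<rep> pin = my_rep;
        (void)x; // payload handling elided in this sketch
        // ... make x visible inside *pin, then signal, touching only
        // *pin. Even if ~queue_handle() runs concurrently from here
        // on, the rep outlives this call.
    }
};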

mwhenson
Beginner
Thus the signaling mechanism must not be destroyed prematurely. Ironically, the given use case never has a waiting consumer, since it uses try_pop, not pop, and actually does not need the wakeup action.

This is an interesting thought; though I'm not sure that's true in this case. It looks like the original example uses pop() where it appears to crash, and the queue with the try_pop() call is never explicitly deleted.

Mike
