Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2464 Discussions

concurrent_vector returns incorrect size()

jaredsf
Beginner
940 Views
Hi all,
I was debugging an issue we're seeing and noticed some strange behavior from tbb::concurrent_vector. I was hoping someone could tell me whether or not its expected.
The bottom line is that after a push_back() call actually completes, the size() of the vector does not reflect this. I've narrowed it down to be capacity related because if I capture capacity() and size(), size() == capacity() which leads me to believe that size() is returning the capacity() and not the actual number of elements.
I have created a simple program that duplicates this issue, which triggers most often when the vector is empty. For simplicity, this program simply ASSERTs that size() != 0 immediately after a push_back() call returns. I hope someone can please tell me if this is expected behavior or if its a bug.
-------------------------------------------
#include
#include
#include
#include
typedef tbb::concurrent_vector vec_type;
void *invokePushBack(void * vector)
{
vec_type *vec = (vec_type *)vector;
for(int i = 0 ; i < 10 ; ++i)
{
vec_type::iterator it = vec->push_back(1);
assert(vec->size() != 0);
}
pthread_exit(NULL);
}
int main()
{
int rc;
pthread_t tg[5];
// note: the race condition doesn't always trigger,
// so we loop until it does.
while(true)
{
vec_type vec;
for(int i = 0 ; i < 5 ; ++i)
{
rc = pthread_create(&tg, NULL, invokePushBack, &vec);
if (rc != 0)
{
std::cerr << "pthread_create failed: " << strerror(errno) << std::endl;
}
}
for (int i = 0 ; i < 5 ; ++i)
pthread_join(tg, NULL);
}
}
--------------------------------------
I compiled with the command:
g++ test.cpp -ltbb
Thanks!
Jared
0 Kudos
19 Replies
Anton_M_Intel
Employee
940 Views
Jared, it is expected behaviour though it is not clear enough from the Reference, sorry. size() just cannot describe state of the vector doing concurrent growth operations, it is serial concept. And using size() supposes accessing elements that are not requested by this particular thread/task. It means that these elements can be not constructed or even not allocated yet. Thus, size() returns only number of allocated elements from the very beginning for the safety reasons. In your case, the first elements were not yet allocated. So, it is safe to traverse only the range delimited by size(). Please refer to my blog for more details.
0 Kudos
jaredsf
Beginner
940 Views
Anton,
Thanks for the response. At the very least, I think the documentation for size() needs to be corrected. As you stated, it should indicate that not only can objects being constructed be included, but objects that have been successfully constructed may NOT be included.
Before your response, we devised a patch that we believe improves concurrent_vector (for our use case), although based on your blog entry it seems like it may not be ideal in the general case. Anyway, I've included it here (and will attach it to the bug you will presumably close) for completeness.
Thanks for your response, and thanks for a great library!
Jared
---------------------------------------------------------------
[plain]diff -r 45ec09f1beb8 tbb/src/tbb/concurrent_vector.cpp
--- a/tbb/src/tbb/concurrent_vector.cpp Fri Feb 25 11:57:53 2011 -0500
+++ b/tbb/src/tbb/concurrent_vector.cpp Wed Mar 23 11:30:53 2011 -0400
@@ -113,6 +113,16 @@
     static void extend_segment_table(concurrent_vector_base_v3 &v, size_type start);
 
     inline static segment_t &acquire_segment(concurrent_vector_base_v3 &v, size_type index, size_type element_size, bool owner) {
+        // Ensure that the previous segment has been fully allocated.  If this is not done,
+        // then size() may not report this segment even after it is fully constructed
+        // bugzilla 184
+        if (index > 0)
+        {
+            segment_t &previous_s = v.my_segment[index - 1];
+            if( !__TBB_load_with_acquire(previous_s.array) )
+                spin_wait_while_eq( previous_s.array, (void*)0 );
+        }
+
         segment_t &s = v.my_segment[index]; // TODO: pass v.my_segment as arument
         if( !__TBB_load_with_acquire(s.array) ) { // do not check for internal::vector_allocation_error_flag 
             if( owner ) {[/plain]
0 Kudos
Alexey-Kukanov
Employee
940 Views
Anton, I am not sure that what the test exhibits is really an expected behavior.

Note that each thread pushes an element into the vector *before* requesting the size. When push_back() returns, it should have this element allocated and constructed, right? Therefore neither capacity nor my_early_size can be zero, and so should not be size().

So I'm afraidit might be a race lurking somewhere. If I am mistaken here, please explain.
0 Kudos
Alexey-Kukanov
Employee
940 Views
I got it probably. There mustbe two vector segments allocated concurrently, and the one that contains elements with higher indexes was allocated so some push_back() calls can return, but the one for very first elements is not yet allocated, so capacity() still conservatively returns 0, and so is size().

ThenI agree that Reference needs an update to warn about possibility of such behavior.
0 Kudos
Anton_M_Intel
Employee
940 Views
Jared,
We'll think what to do with the vector to avoid such surprises in future. The patch is correct though serializes allocation of the segments and so degrades the scalability of the container. Thus it is not the way we can go.
Could you explain your use-case please? Why do you need concurrent size()? As for the test, it is enough to replace size()!=0 by !empty().. And another way to get the "unsafe size" is to calculate difference between begin() and iterator retuned by push_back(): difference_t unsafe_sz = vec->push_back(1) - vec->begin();
0 Kudos
jaredsf
Beginner
940 Views
Anton,
I understand that the patch is not generally applicable. My use case isthat we simultaneously push_back() a few thousand entries (calculating the index by subtracting push_back() - begin()), and also need to access the elements by index at a later time (milliseconds). For safety, we wanted to make sure that the index points to a valid element, so we were comparing against size() first. This was incorrectly reporting that the index was greater than size(), which executed failure code. In fact, the size() just hadn't caught up to the index yet.
We considered using at() which would do the correct checking for us, but wanted to avoid the extra checking and exception handling if possible since we know that the only concurrent growth operation we're performing is push_back(). I think you guys are right and the current behavior is correct and desired, we just didn't know what that behavior was. If the documentation is a little clearer this can be prevented in the future.
Thanks!
0 Kudos
vandelayyyy
Beginner
940 Views
hi Anton..
I believe I have the same issue using tbb40_20120408oss on Lion 10.7.4. Multiple threads are using push_back() and another thread is attempting to access the elements, but the size() has not caught up. Jared's sample code fails for me as well. Has this been addressed or is there a recommended solution?
thanks
0 Kudos
RafSchietekat
Valued Contributor III
940 Views
"Has this been addressed or is there a recommended solution?"
It is my impression that there is no scalable solution, but maybe a concurrent_queue is more appropriate in your situation?
0 Kudos
vandelayyyy
Beginner
940 Views
hmm.. I'm not sure. I have a pipeline where the item passed through the filters contains two concurrent vectors:
- one contains objects
- the other contains references to the object indices which are used in a search algorithm
The problem is that the filters farther down the pipeline see the second vector referring to entries in the first vector which are not yet visible to the thread handling the filter.
The trivial solution, which i think is fine for now in this case, is to spin a bit to ensure the size() reflects the latest push_back().This avoids modifying the TBB source in ways which will affect other parts of the code and is trivially correct but it might be helpful if there was a way to tell TBB to address this when creating the concurrent_vector.. maybe there is and I just don't know about it.
0 Kudos
vandelayyyy
Beginner
940 Views
hi Raf..
I was just looking into spinning and noticed you were deep into that area a while ago (http://software.intel.com/en-us/forums/showthread.php?t=59853). Even if it turns out not to be the best option, I would be interested in knowing what a good approach for this would be. I think that the task should loop a bit then, if size() is still not updated, progressively back off by giving its time slice back to the scheduler for longer periods of time.. ideally such that other tasks running on the same thread can resume. But I'm not familiar enough with this area of TBB to know if it is possible.
Alternatively, a simple, clean solution using concurrent queue(s) would be good. The current data structures do seem to fit well with the problem, though. It's actually a bit more complex than I indicate above because the second vector is embedded in a concurrent_hash_map and I don't know how to recast this to a queue(s).
thanks for your help!
0 Kudos
RafSchietekat
Valued Contributor III
940 Views
Why not store an index in the data item and ignore size()? Later filters should be able to read everything that an earlier filter has written for a particular data item, even from a different thread.
0 Kudos
vandelayyyy
Beginner
940 Views
ok.. thanks. The fact that a valid index can be greater than size() seems counter-intuitive, though.
0 Kudos
Benyu_Z_
Beginner
940 Views
Hi Anton, Why difference_t unsafe_sz = vec->push_back(1) - vec->begin() is considered as unsafe? Can I use vec[unsafe_sz] later to access the '1'?
0 Kudos
Anton_M_Intel
Employee
940 Views
Benyu Z. wrote:

Hi Anton,
Why difference_t unsafe_sz = vec->push_back(1) - vec->begin() is considered as unsafe?
Can I use vec[unsafe_sz] later to access the '1'?

Yes, since it is what actually returned as iterator by push_back(), (the name unsafe_sz is a bit misleading - it is rather the last known allocated index) All the elements in range [0; unsafe_sz-1] can be not yet constructed or even allocated thus unsafe. Only elements constructed in the same thread by calls to push_back() or grow_by() are safe to access.
0 Kudos
SergeyKostrov
Valued Contributor II
940 Views
>>...The fact that a valid index can be greater than size() seems counter-intuitive... This is a fundamental problem and let me provide some information on how concurrency problems are solved in modern relational databases. In relational databases a concept of Isolation Levels is used for more than 20 years and it was introduced in a SQL-92 standard. You can find more information on that subject in Microsoft SQL Server Books Online. It guarantees integrity (!) of processing and Isolation Levels are controlled by SET TRANSACTION ISOLATION LEVEL *level* SQL command. There are four levels: - read uncommitted ( all records are read ) - read committed ( only committed records are read ) - repeatable read ( transactions are completely isolated but it does not protect against phantoms ) - serializable ( transactions are completely isolated and it protects against phantoms ) Transaction Isolation Levels solve all concurrency problems and TBB developers could call that concept as Processing Isolation Levels in TBB terms. I think you could adopt that concept in future Releases of TBB. PS: I could provide a very simple SQL script ( if interested ) that demonstrates how Isolation Levels work in a database.
0 Kudos
RafSchietekat
Valued Contributor III
940 Views
I am curious about how you see isolation levels being applied here without a potentially significant or even excessive cost to performance?
0 Kudos
SergeyKostrov
Valued Contributor II
940 Views
>>...I am curious about how you see isolation levels being applied here without a potentially significant or even excessive cost to >>performance? First of all, I don't consider TBB library as a high-performance tool that can be used in a real-time environment especially in my case. Shortly speaking it is over-complicated. Then, Anton clearly explained that '...So, it is safe to traverse only the range delimited by size()...' and in terms what I've described in my previous post it corresponds to two isolation levels: 'read committed' - only committed elements are read - there are some uncommitted elements, or phantoms (!) or 'repeatable read' - processes are completely isolated but it does not protect against phantoms (!) Application of a new feature could look like ( PIL stands for 'Processing Isolation Level' ): ... #define _PIL_READ_COMMITTED 0x0001 #define _PIL_READ_UNCOMMITTED 0x0010 #define _PIL_REPEATABLE_READ 0x0100 #define _PIL_SERIALIZABLE_ 0x1000 ... void * invokePushBack( void *vector ) // Case 1 { vec_type *vec = ( vec_type * )vector; vec->setpilmode( _TBB_PIL_READ_UNCOMMITTED ); for( int i = 0; i < 10; i++ ) { vec_type::iterator it = vec->push_back(1); assert( vec->size() != 0 ); // It is NOT Safe processing //... } pthread_exit( NULL ); } ... or void * invokePushBack( void *vector ) // Case 2 { vec_type *vec = ( vec_type * )vector; vec->setpilmode( _TBB_PIL_READ_COMMITTED ); for( int i = 0; i < 10; i++ ) { vec_type::iterator it = vec->push_back(1); assert( vec->size() != 0 ); // It is Safe processing //... } pthread_exit( NULL ); } ... A feature like that had to be considered from the very beginning.
0 Kudos
RafSchietekat
Valued Contributor III
940 Views
First of all, I don't consider TBB library as a high-performance tool that can be used in a real-time environment especially in my case. Shortly speaking it is over-complicated.
That sounds harsh! I hope you only mean that it's not really suitable for real-time environments (which is obviously true because it typically sacrifices fairness for throughput, although many soft real-time applications are still quite happy with its characteristics), not that it's not high-performance, or over-complicated for all purposes...
Then, Anton clearly explained that '...So, it is safe to traverse only the range delimited by size()...'
Maybe he didn't really mean it? :-) It looks wrong to me, and his own blog contradicts it.
A feature like that had to be considered from the very beginning.
Are there multithreading toolkits that incorporate isolation levels? I'm not convinced from your example, which does not indicate transactions. BTW, isn't READ_UNCOMMITTED the weakest form ("Look mom, no books!")? You would have to have an API for atomically adding an element or subrange of elements to the vector so that they show up in their fully constructed form only, and that probably means that every subrange is individually allocated and then added by an atomic pointer instruction, which is just very very costly if all you want to do is build a collection for consumption after a barrier. I don't even think you can do this by just setting a mode on an existing instance; maybe as an argument to a factory that selects a particular implementation, but such a multi-isolation level API would probably be suboptimal for at least one of them. It's an interesting subject, of course, but for now I think you'll have to accept the limitations. Please convince me if I'm mistaken or behind the times!
0 Kudos
SergeyKostrov
Valued Contributor II
940 Views
>>...That sounds harsh! I hope you only mean that it's not really suitable for real-time environments... Yes. It is not suitable for real-time environments when every micro second is taken into account. >>...It's an interesting subject, of course, but for now I think you'll have to accept the limitations... Yes, I accepted these limitations and I know that it is not simple to implement the 'PIL' concept in existing version of TBB.
0 Kudos
Reply