- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I know from the documentation that I cannot really remove / erase elements from a concurrent_vector in a direct manner. This is rather prohibitive. I mean, I could create a new vector without that element... and swap... but that's going to be overkill.
Is there a trick that could be used, or ideas for supporting this in the future?
AJ
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You could use the same technique as std::vector::erase() uses - shifting items ina tail to newly free space. But it is not thread-safe in terms of parallel access to the same items. And it doesn't deallocate the memory.
In order to free a memory, you need copy anyway.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The iterator could just "skip over" these items as if they are gone, and in reality a serial or mutex-protected region of code could call such an apply_erase() operation. Leave it to the user to force the actual erase to be applied, I'd be happy with that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sure, you could add lots of hooks to concurrent_vector, but those additions would probably make it a whole lot slower. Marking vector entries as invalid requires a flag per element. You could embed it with each entry (has implications for the user type used to define the vector element) or collect them all together somewhere else (forcing more cache lines to be available for simple array access and possibly introduce false sharing issues). Either way, the accessor methods would be complicated by extra checking to avoid entries that have been marked invalid. Currently the accessor does a simple two-level lookup to find the segment and the index within the segment:
templateT& concurrent_vector ::internal_subscript( size_type index ) const { __TBB_ASSERT( index (my_segment .array) ; }
Moreover, the thing that makes TBB concurrent vector philosophically different than its STL equivalent is that the allocations are never relocated as would have to be done withthe suggested compact() operation. It's more important to preserve vector locations and any cache residency that may be based on that memory layout than to provide operations that might disrupt the current cache context. Doing a compact to remove invalid elements is directly counter to that philosophy.
There is nothing to prevent you from defining a vector element type that contains a validity flag. You could easily perform some experiments on this to convince yourself of the extra overhead that would be required to add such features.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's proving to be rather difficult to find a balance between the computer science theoretic stuff and what's going to happen with the machine...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your post indicates that you have a good understanding on many issues with parallel data structures, please share with us some of the possible avenues that you are considering could be explored, rather than erasing elements. Perhaps there are common patterns to parallel data structures that you can share with us?
Ultimately, yes, a priority queue is something that I could use a concurrent_vector forever. However parallelizing a priority queue would be non-trivial. I have read a few papers on the subject, and have not yet found a satisfactory algorithm.
Currently I am using a modified version of concurrent_vector with pop_back() to support a concurrent stack implementation. The push and pop operations still require a mutex for real concurrency.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm far from being an expert on the matter, but in case this is about that priority queue, I replicate here my suggestion to have a tbb::concurrent_queue drain into a std::priority_queue (the encapsulating object would provide direct access to the tbb::concurrent_queue for pushing new values, and synchronised access for popping the top of the tbb::priority_queue after first emptying the tbb::concurrent_queue into the tbb::priority_queue). Do you have requirements that might make this an unsuitable solution? Even then tbb::concurrent_vector seems unlikely to be a suitable foundation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've decided to approach my problem in a different manner, so I've worked around not having erase for now. At this point, I hope that this thread might help others who come up against the same question. It is my hope that talking about the issue could help us come up with a way to provide an erase in an efficient manner. Perhaps come up with an idiom to be able to tell others 'when you need to erase from a concurrent_vector, here's what you do...'
I thought about your example of marking elements invalid, and I've also thought on how this can be done using bit masks on the vector internally so that the user wouldn't have to supply the flag themselves. The problem is, as I think you pointed out, moving elements around internally will hurt cache performance. Perhaps it's not possible to avoid this... after all we are moving around data that is contiguous.
I suspect (not sure) that it is possible to use template programming tricks to ensure that two concurrent_vectors are effectively generated, one that supports erase() with the overhead and one that generates the code we have now... the generation of the erase() supporting vector would be triggered by an actual call to erase() somewhere along the line.
The code supporting erase could have an iterator that skips over invalidated data, and a "compact_erased()" routine that will actually move things around and hence cause the performance hit. The documentation can warn the user about the performance hits involved with erase() and compact_erased().
What are your thoughts on this approach?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Perhaps Arch or Alexeycould come up with appropriate "template programming tricks" to trick out the expanded concurrent_vector you suggest, but I doubt it. Everything from constructors down to accessors would have to be duplicated in order to have versions that check for validity and take the performance hit. And for what? a version of concurrent_vector that doesn't scale but at least is type-safe? I think there are more important things for the development team to worry about, but that's just me.
While I'm here, let me make one little adjustment to your statementon the problem of carrying validity bits in a bit mask. It's not moving elements around internally that will hurt performance. It's the fact that the bit mask is probably going to be in cached memory, which means, if you're really talking bits, that each flag shares a cache line with up to 511 of its neighbors. Any thread writing a bit needs to pull that part of the bit mask into its local cache/store buffers, which means that threads could serialize on the validity test. At least if the validity bits are distributed with the elements, collisions are less likely.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am not by training an engineer, I'm a computer science and mathematics major. The TBB team has a wealth of experience with processor and compiler issues. I don't have this background or training yet, but I am examining the TBB code on a regular basis, reading a lot of books, and attempting to contribute to the code base as much as I can with the skills I currently have.
Each time I do encounter something I don't know about, I buy a book and read it. Are there books I can read to help me understand the cache issues?
What do you think of these ones: http://shop.intel.com/shop/product.aspx?pid=sibk3603
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Engineers? Where? I was a physics and math major when I was in school. And I love having a job that makes me deal with questions like those you raise, because it makes me strive to become more knowledgable, in order to help you (that's the collective "you").
Books from Intel Press? They're all great! Honestly, I must say that I don't own any of the books in the quartet referenced in your last post. I did do some review work on the first edition of Rich Gerber's book, and I looked over the table of contents of Shameem's book and found it to have reasonable coverage, but Ihaven't gotten a copy yet. Stewart Taylor's book is about Intel Integrated Performance Primitives and James Reinders' is about VTuneAnalyzer; neither is likely to explain much about multi-core programming issues asa primary topic, but either of the first two could be useful.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page