Intel® oneAPI Threading Building Blocks

NFS_Allocate usage in concurrent_vector

Ivan_A_
Beginner

Hi,

I am using a concurrent_vector with a custom allocator, relying on the assumption that every allocation the vector makes goes through that allocator. My goal is to make sure that all of the vector's data resides in preallocated memory at a predefined address. This works great with all std containers.
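
For illustration, the pattern looks roughly like this. It is a minimal sketch with a hypothetical FixedRegionAllocator over a fixed buffer (a C++11-style minimal allocator, not our actual code), and it is not thread-safe as written; a concurrent container would need an atomic bump pointer:

#include <cstddef>
#include <new>
#include <vector>
#include <tbb/concurrent_vector.h>

// Hypothetical bump allocator over a fixed, preallocated region.
template <typename T>
struct FixedRegionAllocator {
    using value_type = T;
    template <typename U> struct rebind { using other = FixedRegionAllocator<U>; };

    char*        base;     // start of the preallocated region
    std::size_t  capacity; // region size in bytes
    std::size_t* offset;   // shared bump offset into the region

    FixedRegionAllocator(char* b, std::size_t cap, std::size_t* off)
        : base(b), capacity(cap), offset(off) {}
    template <typename U>
    FixedRegionAllocator(const FixedRegionAllocator<U>& o)
        : base(o.base), capacity(o.capacity), offset(o.offset) {}

    T* allocate(std::size_t n) {
        // Round the bump offset up to the element's alignment.
        std::size_t pos = (*offset + alignof(T) - 1) / alignof(T) * alignof(T);
        if (pos + n * sizeof(T) > capacity) throw std::bad_alloc();
        *offset = pos + n * sizeof(T);
        return reinterpret_cast<T*>(base + pos);
    }
    void deallocate(T*, std::size_t) {} // region is released as a whole

    friend bool operator==(const FixedRegionAllocator& a, const FixedRegionAllocator& b)
        { return a.base == b.base; }
    friend bool operator!=(const FixedRegionAllocator& a, const FixedRegionAllocator& b)
        { return !(a == b); }
};

int main() {
    static char region[1 << 20];
    std::size_t off = 0;
    FixedRegionAllocator<int> a(region, sizeof region, &off);

    std::vector<int, FixedRegionAllocator<int>> v(a);             // lives entirely in the region
    tbb::concurrent_vector<int, FixedRegionAllocator<int>> cv(a); // element segments only (see below)
    v.push_back(1);
    cv.push_back(1);
}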

Unfortunately, this does not work with tbb::concurrent_vector. I pinpointed the problem to concurrent_vector.cpp:229: in the function concurrent_vector_base_v3::helper::extend_segment_table, the new segment table is allocated with NFS_Allocate instead of the vector's allocator. Is this really necessary?

Thank you!

RafSchietekat
Valued Contributor III

At first sight, this is only for support structures, not for element data, and use of this particular allocator seems intended to help prevent false sharing. It's not clear to me whether allocation on top of the user-supplied allocator would be helpful to you, even disregarding the inevitable waste from padding, because you didn't state the purpose of that "preallocated memory at predefined address".
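
For what it's worth, here is what that cache-aligned padding buys, as a minimal sketch using tbb::cache_aligned_allocator (illustrative only; NFS_Allocate itself is an internal function):

#include <tbb/cache_aligned_allocator.h>

void demo() {
    tbb::cache_aligned_allocator<long> alloc;
    // Each allocation is padded to start on its own cache line, so these
    // two counters, updated by different threads, never falsely share a
    // line. The padding is also the "inevitable waste" mentioned above.
    long* counter_a = alloc.allocate(1);
    long* counter_b = alloc.allocate(1);
    // ... threads update *counter_a and *counter_b independently ...
    alloc.deallocate(counter_a, 1);
    alloc.deallocate(counter_b, 1);
}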

Ivan_A_
Beginner

The use case is that we generate a big database in a memory-mapped file at a predefined address. Then we distribute it to worker machines, where it is mapped again at the same address. This way, all pointers between objects in the file remain valid.
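
Roughly, the mapping side looks like this POSIX sketch (illustrative; kDatabaseBase and the flags are assumptions, not our actual tooling):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Address agreed on by the generator and all worker machines (hypothetical).
static void* const kDatabaseBase = reinterpret_cast<void*>(0x600000000000);

void* map_database(const char* path, std::size_t size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    // MAP_FIXED pins the mapping to kDatabaseBase, so absolute pointers
    // stored inside the file stay valid when the file is remapped on a
    // worker. (Note: MAP_FIXED silently replaces existing mappings there.)
    void* p = mmap(kDatabaseBase, size, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : p;
}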

While these are support structures, they are part of the vector, and being unable to control how they are allocated is a problem for us: they end up outside the predefined memory, and consequently the concurrent_vector is unusable after remapping.

RafSchietekat
Valued Contributor III

I don't think that this could easily be changed in the official version just to accommodate your use case. For one thing, it wouldn't make sense to allow it for concurrent_vector without also adapting the other container types, e.g., concurrent_hash_map (which uses cache_aligned_allocator, implemented on top of the same functions, for its control structures). Perhaps if this were a prevalent pattern...

You could of course patch this yourself, but even that might not be the best solution: if your goal is performance, which I assume it is, it makes little sense to keep paying for concurrent_vector's more complicated element access after the vector has been frozen.

 

Ivan_A_
Beginner

Thanks, I see your point. I agree that in most cases it does not matter how the control structures are allocated.

I personally feel it's cleaner if all the data (elements and control structures) is allocated from a single source, but reading the code I can see that it's not trivial and compromises have to be made. You win some, you lose some.

Keep up the good work. Great library!

RafSchietekat
Valued Contributor III

To avoid misunderstanding: I agree that it's not quite right for the user to be able to override one allocator without being told about another one used behind the scenes, but that was not my concern here. Basically, if you are freeze-drying containers for mapped memory, you don't need concurrent containers in the sense of allowing multiple writers to coexist (although I don't know exactly what the penalty is for random access), because in this new world of assumed multithreading even standard containers already allow multiple concurrent readers. Of course that may well be less convenient than being able to use the same container in a situation that also demands efficient concurrent-write access. You also wouldn't be able to use pointers/iterators that were set before the container stopped growing, as you can with concurrent_vector, but you lose that ability anyway after calling concurrent_vector::shrink_to_fit() for more efficient random access.
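
A minimal sketch of that freeze-dry idea (illustrative, not a recommendation from TBB's documentation): build concurrently, then either compact in place or copy into a flat container before the read-only phase.

#include <tbb/concurrent_vector.h>
#include <vector>

void freeze_example(tbb::concurrent_vector<int>& staging) {
    // ... growth phase is over: no thread appends to `staging` anymore ...

    // Option 1: compact the internal representation in place. Existing
    // pointers/iterators into `staging` are invalidated, as noted above.
    staging.shrink_to_fit();

    // Option 2: copy into a plain std::vector (possibly one using the
    // mapped-region allocator) and serve all further reads from that.
    std::vector<int> frozen(staging.begin(), staging.end());
    (void)frozen;
}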

Does anyone have any data about access times for sequential/random access on vector or concurrent_vector with/without shrink_to_fit(), i.e., 6 different cases?
