If I allocate memory with the scalable allocator, will that cause problems with BLAS and LAPACK calls? I just don't want to discover that it works fine on small systems, but that on larger systems the memory is no longer contiguous and breaks BLAS and LAPACK in strange ways. Right now I use std::vector<double> without problems, but on NUMA systems I run into scaling performance issues.
Is TBB normally aware of how the allocation was done, so that if I have two graph nodes ready to run, the thread for which the memory is local will run the node if possible?
Thank you
Hi William,
What problem are you facing? Do you have performance issues in some particular cases? Could you describe your algorithm to show the relations between BLAS, LAPACK, std::vector<double>, graph nodes, and the scalable allocator functionality? How do you use these interfaces together?
Regards,
Alex
If I have a std::vector<double>, it is contiguous in memory and I can make a BLAS call such as:
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, &A[0], n, &B[0], n, 1.0, &C[0], n);
where A, B, and C are std::vector<double>.
This will even run in parallel depending on the size of the matrix. However, on a shared-memory NUMA system with 128 cores I was thinking that replacing std::vector<double> with std::vector<double, cache_aligned_allocator<double>> or std::vector<double, scalable_allocator<double>> would help performance.
I just wanted to know whether using these other allocators is guaranteed to produce the correct answer. I could run many tests and, by chance, simply never hit a case where the result is incorrect. I can easily test for performance and see whether it really helps, what it does for cache locality, etc.
Thanks
Bill
Hi,
Usually the correctness of a computational function does not depend on how, or with which allocator, its memory was allocated.
Regards,
Alex
Hi,
I just wanted to make sure, since I was not entirely certain that a vector is still contiguous in memory when a custom allocator is used. Since many different BLAS implementations could be used (not just Intel's), if the vector were no longer contiguous the calculations could break. That was my only concern.
Thanks
I see your point. The guarantee that a vector uses contiguous memory comes from the vector's own semantics, not from the allocator. This holds for every STL container: the container specifies what it is, and its semantics do not depend on the allocator. One of the main ideas of the allocator approach is precisely that it does not change the properties of containers; it can only change how and where memory is allocated and deallocated (it cannot split a requested memory block into parts). So you can use any allocator with any STL container and be sure that the semantics of the container remain the same.
Regards
Alex
Thank you.
That is exactly what I needed to know. I will do various tests and see if changing the allocator helps for my case.
