Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

scalable allocator with MKL BLAS and LAPACK functions

William_H_7
Beginner

If I allocate memory with the scalable allocator, will that cause problems with BLAS and LAPACK calls? I don't want to discover that it works fine on small systems, but that on larger systems the memory is no longer contiguous and BLAS and LAPACK break in strange ways. Right now I use std::vector<double> without problems, but on NUMA systems I run into performance issues with scaling.

Is TBB normally aware of how the allocation was done, so that if two graph nodes are ready to run, a node is run on the thread for which its memory is local whenever possible?

Thank you

Alexei_K_Intel
Employee

Hi William,

What problem are you facing? Do you have performance issues in some particular cases? Could you describe your algorithm and explain how BLAS, LAPACK, std::vector<double>, graph nodes and the scalable allocator relate to each other? How do you use these interfaces together?

Regards,
Alex

William_H_7
Beginner

If I have a std::vector<double>, it is contiguous in memory, so I can make a BLAS call such as the following, where A, B and C are std::vector<double>:

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, &A[0], n, &B[0], n, 1.0, &C[0], n);

This will even run in parallel, depending on the size of the matrix. However, on a shared-memory NUMA system with 128 cores, I was thinking that replacing std::vector<double> with std::vector<double, cache_aligned_allocator<double> > or std::vector<double, scalable_allocator<double> > would help performance.
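To make the pattern above concrete, here is a minimal sketch of a vector with a non-default allocator being handed to a routine that takes raw pointers, the way cblas_dgemm does. Everything named here is an assumption for illustration: AlignedAllocator is a hypothetical stand-in for tbb::cache_aligned_allocator<double>, and naive_dgemm is a hypothetical stand-in for the BLAS call, so the sketch builds without TBB or MKL. It requires C++17 for aligned operator new.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for tbb::cache_aligned_allocator<T>: a minimal
// C++17 allocator that hands out 64-byte-aligned blocks.
template <typename T>
struct AlignedAllocator {
    using value_type = T;
    static constexpr std::align_val_t kAlign = std::align_val_t(64);

    AlignedAllocator() = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // One request from the container yields one block of n elements.
        return static_cast<T*>(::operator new(n * sizeof(T), kAlign));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, kAlign);
    }
};

template <typename T, typename U>
bool operator==(const AlignedAllocator<T>&, const AlignedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const AlignedAllocator<T>&, const AlignedAllocator<U>&) { return false; }

using Matrix = std::vector<double, AlignedAllocator<double>>;

// Hypothetical stand-in for cblas_dgemm: row-major, no transposes,
// computes C = alpha * A * B + beta * C for n x n matrices.
void naive_dgemm(int n, double alpha, const double* A, const double* B,
                 double beta, double* C) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}

// Multiplies two n x n row-major matrices stored in allocator-aware vectors.
Matrix multiply(int n, const Matrix& A, const Matrix& B) {
    Matrix C(static_cast<std::size_t>(n) * n, 0.0);
    naive_dgemm(n, 1.0, A.data(), B.data(), 0.0, C.data());
    return C;
}

// Reports whether a pointer is aligned to a 64-byte boundary.
bool is_64_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}
```

The call site looks the same as with a plain std::vector<double>: only the type alias changes, and data() (or &A[0]) still yields a pointer to one contiguous array. With oneTBB installed, the alias would presumably name the TBB allocator instead of the hypothetical one.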

I just wanted to know whether the result is guaranteed to be correct if I use these other allocators. I could run many tests and, by chance, never hit a case where the result is incorrect. I can easily test performance and see whether it really helps, what it does for cache locality, etc.

Thanks

Bill

Alexei_K_Intel
Employee

Hi,

In general, the correctness of a computational function does not depend on how the memory it operates on was allocated, or on which allocator was used.

Regards,
Alex

William_H_7
Beginner

Hi,

I just wanted to make sure, since I was not entirely certain that the vector is still contiguous in memory when a different allocator is used. Since many different BLAS implementations could be used (not just the Intel one), a vector that was no longer contiguous in memory could break the calculations. That was my only concern.

Thanks

Alexei_K_Intel
Employee

I see the point. Contiguous storage is guaranteed by the semantics of std::vector itself, not by the allocator. This holds for every STL container: the container defines its own semantics, and they do not depend on the allocator. One of the main ideas behind the allocator approach is that an allocator does not change the properties of a container. It can change only how and where memory is allocated and deallocated; it cannot split a requested memory block into parts. So you can use any allocator with any STL container and be sure that the semantics of the container remain the same.
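The point about the allocator not being able to split a block can be checked with a small sketch. CountingAllocator and g_live_blocks are hypothetical names introduced only for this illustration; the allocator simply forwards to std::allocator and counts how many blocks are currently handed out. The sketch assumes C++17 (for the inline variable).

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Number of blocks currently handed out by CountingAllocator (hypothetical helper).
inline std::size_t g_live_blocks = 0;

// Hypothetical allocator: forwards to std::allocator and counts live blocks.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_live_blocks;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        --g_live_blocks;
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Checks that element i lives exactly i elements past data(), for any allocator.
template <typename T, typename Alloc>
bool is_contiguous(const std::vector<T, Alloc>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) {
            return false;
        }
    }
    return true;
}
```

A non-empty vector built on this allocator holds exactly one live block at a time, even after push_back forces it to grow, and is_contiguous returns true for it just as it does for a plain std::vector<double>. The vector asks for one block of the size it needs, and the allocator only decides where that block comes from.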

Regards
Alex

William_H_7
Beginner

Thank you.

That is exactly what I needed to know. I will do various tests and see if changing the allocator helps for my case.
