Hi,
First of all, are you using MKL functions? mkl_malloc is effectively the same as _aligned_malloc, which means any code, MKL function or not, can access memory allocated by mkl_malloc. mkl_disable_fast_mm, however, only affects MKL functions: it makes them stop using their own internal buffer allocator (i_malloc) and use malloc for their buffers instead. The problem is that you cannot access that buffer memory from your own threading layer (TBB/pthread), because MKL does not give you pointers to it.
Your goal is not entirely clear to me, so please define what you mean by "per thread memory pool". Do you mean calling malloc in each thread? Or using TLS? If you do not use MKL functions, you could rely entirely on TBB for thread control, and TBB provides concurrent container classes which are lock free.
I advise you to provide pseudocode describing the kind of memory control you would like to use. That would help me understand.
Best regards,
Fiona
Hi,
To give you a bit of context, we use cblas (mostly level 1 and 2) and VSL functions on vectors.
I would like to know whether MKL_malloc/MKL_free are only proxies for _aligned_malloc/_aligned_free on the Windows platform, or whether they include a buffer system (memory pool?) that avoids unnecessary, repetitive malloc/free calls.
In our context, we use sequential MKL and have 1 thread per core (our platform has 72 cores). Each thread dequeues jobs, and each job looks like this:
DoJob(args):
    MKL_malloc    // init of temporary vectors
    // several calls to CBLAS or recursive call to DoJob
    MKL_free      // release of temporary vectors
When there are two successive calls to DoJob, we would like the MKL_malloc of the second call to reuse the memory that was freed at the end of the first call (if the vector is of the same size). We would like to minimize the contention caused by allocating from a single heap (malloc and free are not lockless).
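The reuse pattern described above could be sketched as follows. This is only an illustration under stated assumptions: `ThreadLocalPool`, `local_pool` and `do_job` are hypothetical names, not MKL or TBB APIs, the 64-byte alignment is an assumed value, and the block is written against C++17 with the standard library only. It keeps freed blocks in a per-thread, size-keyed cache so that the next job on the same thread gets the same memory back without touching the global heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread cache of released blocks, keyed by size in bytes.
// Not part of MKL; it only illustrates the reuse pattern described above.
class ThreadLocalPool {
public:
    static constexpr std::size_t kAlignment = 64;  // assumed alignment

    // Return a cached block of exactly `bytes` if one exists, else allocate.
    void* acquire(std::size_t bytes) {
        assert(bytes > 0);
        auto it = free_blocks_.find(bytes);
        if (it != free_blocks_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            return p;
        }
        return ::operator new(bytes, std::align_val_t{kAlignment});
    }

    // Keep the block for the next job on this thread instead of freeing it.
    void release(void* p, std::size_t bytes) {
        free_blocks_[bytes].push_back(p);
    }

    // Cached blocks go back to the heap only when the thread exits.
    ~ThreadLocalPool() {
        for (auto& entry : free_blocks_)
            for (void* p : entry.second)
                ::operator delete(p, std::align_val_t{kAlignment});
    }

private:
    std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
};

// One pool per thread, so acquire/release never take a lock.
inline ThreadLocalPool& local_pool() {
    thread_local ThreadLocalPool pool;
    return pool;
}

// Stand-in for DoJob: get a temporary vector, use it, hand it back.
inline double* do_job(std::size_t n) {
    const std::size_t bytes = n * sizeof(double);
    double* tmp = static_cast<double*>(local_pool().acquire(bytes));
    for (std::size_t i = 0; i < n; ++i) tmp[i] = 1.0;  // CBLAS calls would go here
    local_pool().release(tmp, bytes);
    return tmp;  // returned only so a caller can observe address reuse
}
```

Two successive `do_job(128)` calls on the same thread return the same address, while a call with a different size triggers a fresh allocation. The trade-off is that cached blocks stay allocated until the thread exits, which is similar in spirit to MKL keeping its internal buffers until they are explicitly released.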
As far as I understand, this buffer mechanism is in place for the internal memory of functions like DGEMM or FFT, and we can free the buffer with mkl_thread_free_buffers. I would like to know whether it is also present in MKL_malloc/MKL_free, and whether it is a local buffer for each thread or a global buffer shared across all threads.
My point about TBB and the per-thread basis is that TBB offers a memory allocator that works per thread and minimizes the contention caused by repetitive malloc/free calls (almost lock-free malloc?). Depending on the memory management in place in MKL, we may be interested in switching to cache_aligned_allocator (the padding is 128 bytes, therefore compatible with MKL functions). Any advice will be greatly appreciated.
Thank you for your help!
Arnaud
Hi,
First, mkl_malloc/mkl_free have the same functionality as _aligned_malloc/_aligned_free, whether on Windows or Linux. They are only used to allocate memory for input/output data, not for the buffers used during computation. Buffer management during computation is encapsulated inside the MKL functions and is not exposed to developers; you cannot get pointers into that buffer memory pool.
MKL only exposes a few interfaces for configuring how the buffer memory pool is used, such as mkl_disable_fast_mm, mkl_free_buffers... The internal buffer allocator of the MKL functions actually uses malloc. Unlike TBB, it does not need to deal with contention between threads, because each thread mallocs its own buffer space. MKL computational functions also do not free this buffer memory when they finish; for example, if I call dgemm and then daxpy, the buffer space for both dgemm and daxpy remains allocated. The only way to release buffer space is to call mkl_free_buffers/mkl_thread_free_buffers. You could refer to this example to see how the internal buffer management of MKL functions works. MKL functions keep buffer space by default to improve performance, since the next MKL call may reuse it.
Next, TBB. TBB can be used for threading control in any C++ project, but it does not affect the internal buffer management of MKL functions. The TBB memory allocator is meant to reduce contention between threads allocating from a single global heap (memory pool). With TBB, you could template the DoJob class on an allocator so that it allocates memory in a scalable way (scalable_allocator) or a cache-aligned way (cache_aligned_allocator).
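As a rough sketch of what templating DoJob on an allocator might look like: the class layout, the `run` method and its placeholder arithmetic are assumptions made for illustration, not code from this thread. `std::allocator` is the default here so the sketch builds without TBB installed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// DoJob templated on an allocator type. std::allocator is used as the default
// so the sketch builds without TBB; a TBB allocator can be passed instead.
template <typename Alloc = std::allocator<double>>
class DoJob {
public:
    using Vector = std::vector<double, Alloc>;

    // Allocates the temporary vector through Alloc.
    explicit DoJob(std::size_t n) : tmp_(n, 0.0) { assert(n > 0); }

    // Placeholder for the real work (CBLAS/VSL calls on tmp_.data()):
    // fill the temporary vector and return the sum of its elements.
    double run(double value) {
        for (double& x : tmp_) x = value;
        return std::accumulate(tmp_.begin(), tmp_.end(), 0.0);
    }

private:
    Vector tmp_;  // temporary vector, returned to Alloc on destruction
};
```

With TBB available, the same class could be instantiated as `DoJob<tbb::scalable_allocator<double>>` or `DoJob<tbb::cache_aligned_allocator<double>>` (from `<tbb/scalable_allocator.h>` and `<tbb/cache_aligned_allocator.h>`), leaving the job code itself unchanged.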
Best regards,
Fiona