For an OpenMP program I want to enforce the use of NUMA-local memory with threads that are permanently bound to the same CPU. These threads do frequent ALLOCATEs and DEALLOCATEs.
It seems like kmp_malloc() is the only way of enforcing NUMA locality (same as thread locality in this context) for memory that is frequently allocated and freed. The first confusing issue is that kmp_malloc() is not mentioned in the ifort documentation although it's in the libraries.
Fortran ALLOCATE does call malloc(). Even if malloc() along with proper array initialization gets me local memory at the first invocation, that does not help for long. After some time of frequent malloc() and free() it ends up with a fragmented bag of local and remote memory pages. malloc() has no knowledge of locality. Please correct me if I'm wrong.
Now I don't see a supported way of forcing ALLOCATE to use kmp_malloc() instead of malloc(). Apparently the only way might be intercepting malloc() using LD_PRELOAD. That's not a clean way of doing things.
While MALLOC is available as a Fortran intrinsic, that is not the case for KMP_MALLOC. A more desirable solution could be an environment variable that switches ALLOCATE from malloc() to kmp_malloc(). Any other ideas? They would be highly appreciated.
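To make the "malloc() plus proper array initialization" step concrete, here is a minimal C sketch. It assumes the operating system's default first-touch page placement (as on Linux with the default memory policy), and the helper name alloc_first_touch is hypothetical. It only illustrates the initial placement; it does nothing about the later fragmentation described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n doubles and write to every page right away
 * from the calling thread. Assumption: under a first-touch policy, freshly
 * mapped pages are placed on the NUMA node of the thread that first writes
 * them. Memory that malloc() recycles from its own free lists keeps whatever
 * placement it already had. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a != NULL)
        memset(a, 0, n * sizeof *a);  /* the first write decides page placement */
    return a;
}
```

The call has to be made by the bound thread that will later use the array, otherwise the pages land on whichever node the initializing thread runs on.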
Michael,
Look under C/C++ interoperability (not portable) functions.
Principally C_F_POINTER
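Something like this untested code:
USE, INTRINSIC :: ISO_C_BINDING
REAL, POINTER :: ARRAY(:,:,:)
TYPE(C_PTR) :: CallocatedArray
...
! YourCMalloc: your C allocator, declared with a BIND(C) interface returning TYPE(C_PTR)
CallocatedArray = YourCMalloc(nX * nY * nZ * SIZEOF(ARRAY(1,1,1)))
IF (.NOT. C_ASSOCIATED(CallocatedArray)) CALL Oops()
CALL C_F_POINTER(CallocatedArray, ARRAY, [nX,nY,nZ])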
Don't forget to return the allocated memory when you are done with it.
Jim Dempsey
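The C side of the placeholder "YourCMalloc" could look roughly like the sketch below. The function names numa_local_alloc and numa_local_free and the macro USE_KMP_MALLOC are hypothetical. The sketch assumes that the omp.h shipped with the Intel (or LLVM) OpenMP runtime declares kmp_malloc() and kmp_free(); without the macro it falls back to plain malloc()/free() so it builds anywhere.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_KMP_MALLOC          /* hypothetical build switch */
#include <omp.h>               /* assumed to declare kmp_malloc() and kmp_free() */
#endif

/* Hypothetical wrapper standing in for "YourCMalloc". */
void *numa_local_alloc(size_t nbytes)
{
#ifdef USE_KMP_MALLOC
    return kmp_malloc(nbytes); /* allocator of the OpenMP runtime */
#else
    return malloc(nbytes);     /* portable fallback, no locality guarantee */
#endif
}

/* Memory must be returned through the matching deallocator. */
void numa_local_free(void *p)
{
#ifdef USE_KMP_MALLOC
    kmp_free(p);
#else
    free(p);
#endif
}
```

On the Fortran side this would be declared in an INTERFACE block with BIND(C, NAME="numa_local_alloc"), an INTEGER(C_SIZE_T), VALUE argument, and a TYPE(C_PTR) result, so that the returned pointer can be handed to C_F_POINTER.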
For C++ programs that is mandatory. For most of my classic Fortran-style programs, which do all their allocs in a central place during startup (just recently migrated from COMMON blocks to ALLOCATABLEs), it is o.k. to make just a few changes as outlined above.
Anyway - your suggestions have helped develop this quite a bit. Thanks.
Michael
per-thread
per-node
per-system
Where the deallocate supplies not only the pointer but also the size of the memory node to be returned.
These are nested pools. Per-thread pool access is without locks. Transfers between the per-thread and per-node pools take a lock, but only one lock per store/fetch of a pool (not per node). A larger pool size means larger memory requirements but less overhead: a tradeoff between size and speed.
On an 8-thread system, the pool-based system is about 2.8x faster than malloc, and it appears to have a linear scaling factor of 0.532225 (measured on a Core i7 920).
Jim Dempsey
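The nested-pool idea can be sketched in C as follows. This is not the code that was measured above, only a minimal illustration under stated assumptions: a single size class, one shared pool standing in for the per-node level, malloc() standing in for the per-system level, and hypothetical names (pool_alloc, pool_free, pool_thread_cached, BLOCK_SIZE, BATCH).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 256  /* single size class served by the pools (assumption) */
#define BATCH      32   /* blocks moved per lock acquisition (assumption) */

typedef struct block { struct block *next; } block_t;
typedef struct { block_t *head; size_t count; } freelist_t;

static freelist_t shared_pool;                     /* stands in for the per-node pool */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local freelist_t thread_pool;       /* per-thread pool, no lock needed */

static void push(freelist_t *l, block_t *b) { b->next = l->head; l->head = b; l->count++; }

static block_t *pop(freelist_t *l)
{
    block_t *b = l->head;
    if (b != NULL) { l->head = b->next; l->count--; }
    return b;
}

/* Fetch up to BATCH blocks from the shared pool under one lock,
 * then top up from malloc() if the shared pool ran dry. */
static void refill_thread_pool(void)
{
    pthread_mutex_lock(&shared_lock);
    for (int i = 0; i < BATCH; i++) {
        block_t *b = pop(&shared_pool);
        if (b == NULL) break;
        push(&thread_pool, b);
    }
    pthread_mutex_unlock(&shared_lock);
    while (thread_pool.count < BATCH) {
        block_t *b = malloc(BLOCK_SIZE);
        if (b == NULL) break;
        push(&thread_pool, b);
    }
}

void *pool_alloc(size_t size)
{
    if (size > BLOCK_SIZE) return malloc(size);    /* too large for the pools */
    if (thread_pool.head == NULL) refill_thread_pool();
    return pop(&thread_pool);                      /* lock-free fast path */
}

/* The caller passes the size back, so no per-block header is needed. */
void pool_free(void *p, size_t size)
{
    if (p == NULL) return;
    if (size > BLOCK_SIZE) { free(p); return; }
    push(&thread_pool, p);
    if (thread_pool.count > 2 * BATCH) {           /* store one batch under one lock */
        pthread_mutex_lock(&shared_lock);
        for (int i = 0; i < BATCH; i++) push(&shared_pool, pop(&thread_pool));
        pthread_mutex_unlock(&shared_lock);
    }
}

/* Number of blocks currently cached by the calling thread. */
size_t pool_thread_cached(void) { return thread_pool.count; }
```

A NUMA-aware version would keep one shared pool per node and select it from the node the calling thread is bound to. BATCH is the size/speed knob mentioned above: larger batches mean fewer lock acquisitions but more memory parked in each thread.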