- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Does the speed of the mkl blas/lapack library routines change significantly when one has contiguous memory versus not?
I have a strange problem that looks like a "Memory Cache Leak" (not a memory leak) leading to a slow down of a program.
Let me set the stage first. Reproducibly (using ganglia to monitor), on a cluster I have noticed that the cached memory is increasing, relatively slowly. When it becomes large, something like 2/3 of the total memory (Intel Gold with 32 cores & 192Gb) a program is running slower by about a factor of ~1.5. If I clear the cache and sync the disc (I have not tested which matter) with "sync ; echo 3 > /proc/sys/vm/drop_caches" the speed of the program increases back (~1.5 times faster).
The issue seems to be associated with I/O -- the relevant code uses mpi and only the core that is doing any I/O shows the cache leak. The program is doing a fair amount of I/O, but not massive amounts (10-40 Mb). I compile using ifort with -assume buffered_io. My suspicion is that may leave some cached files at the end, effectively a "cache leak".
The program uses a large number of blas/lapack calls. It is reasonable that the memory is less contiguous when the cached memory is large -- fragmented RAM. Can this lead to a speed change of the blas/lapack routines?
Link Copied
4 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Thanks for your question. I will investigate it and get back to you soon.
Thanks,
Alice
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Did you find out anything?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
exporting MKL_VERBOSE=1 will you see changing the lapack/blas execution time? With the same routines and the same input problem sizes. Are you sure that there is no third party process running at the same time?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am 100% certain this had nothing to do with other processes (there were none). Very reproducibly, "sync ; echo 3 > /proc/sys/vm/drop_caches" improved the speed by about a factor of 1.5.
N.B., the code already has a number of timers in it.
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page