L__D__Marks
New Contributor I
125 Views

Is MKL speed dependent upon how contiguous memory is?

Does the speed of the MKL BLAS/LAPACK library routines change significantly when one has contiguous memory versus not? I have a strange problem that looks like a "Memory Cache Leak" (not a memory leak) leading to a slowdown of a program. Let me set the stage first.

Reproducibly (using ganglia to monitor), on a cluster I have noticed that the cached memory increases relatively slowly. When it becomes large, something like 2/3 of the total memory (Intel Gold with 32 cores and 192 GB), a program runs slower by a factor of ~1.5. If I clear the cache and sync the disc (I have not tested which of the two matters) with "sync ; echo 3 > /proc/sys/vm/drop_caches", the program returns to its original speed (~1.5 times faster).

The issue seems to be associated with I/O: the relevant code uses MPI, and only the core that is doing any I/O shows the cache leak. The program does a fair amount of I/O, but not massive amounts (10-40 MB). I compile using ifort with -assume buffered_io. My suspicion is that this may leave some cached files at the end, effectively a "cache leak".

The program makes a large number of BLAS/LAPACK calls. It is plausible that memory is less contiguous when the cached memory is large, i.e. the RAM is fragmented. Can this lead to a speed change in the BLAS/LAPACK routines?
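One way to check the correlation described above is to log the page-cache fraction next to each timed section. The following is only a minimal Python sketch, assuming a Linux /proc/meminfo; the helper names (parse_meminfo, cached_fraction, read_meminfo, timed) are hypothetical and are not part of the program discussed here.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def cached_fraction(info):
    """Fraction of total RAM currently used by the page cache ("Cached")."""
    return info["Cached"] / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling cached_fraction(read_meminfo()) before each timed BLAS/LAPACK-heavy section, and recording it with the elapsed time, would show whether the ~1.5x slowdown tracks the cached fraction or appears only past some threshold.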
0 Kudos
4 Replies
Alice_H_Intel
Employee
125 Views

Hello,

Thanks for your question. I will investigate it and get back to you soon. 

Thanks,

Alice

L__D__Marks
New Contributor I
125 Views

Did you find out anything?

Gennady_F_Intel
Moderator
125 Views

If you export MKL_VERBOSE=1, do you see the LAPACK/BLAS execution times change for the same routines and the same input problem sizes? Are you sure that there is no third-party process running at the same time?
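To compare a fast run against a slow run, the MKL_VERBOSE output can be summed per routine and problem size. This is a rough Python sketch only: it assumes each call is logged as a line of the form "MKL_VERBOSE ROUTINE(args) <time><unit> ...", with the unit being ns, us, ms or s, and it drops pointer arguments (those starting with 0x) so that identical problem sizes group together. The function name summarize is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape: "MKL_VERBOSE DGEMM(N,N,1000,...) 12.34ms CNR:OFF ..."
LINE_RE = re.compile(r"^MKL_VERBOSE\s+(\w+)\((.*)\)\s+([0-9.]+)(ns|us|ms|s)\b")
UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def summarize(lines):
    """Return {(routine, size_args): (call_count, total_seconds)}.

    size_args is the tuple of call arguments with pointer values removed,
    so calls with the same dimensions share one key.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # version banner or unrelated program output
        routine, args, value, unit = match.groups()
        sizes = tuple(a.strip() for a in args.split(",")
                      if not a.strip().startswith("0x"))
        entry = totals[(routine, sizes)]
        entry[0] += 1
        entry[1] += float(value) * UNIT_TO_SECONDS[unit]
    return {key: (count, secs) for key, (count, secs) in totals.items()}
```

Running summarize on the captured output of one run before and one run after "sync ; echo 3 > /proc/sys/vm/drop_caches" would show whether the per-call times for the same routine and sizes actually differ, or whether the slowdown lies outside the MKL calls.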

L__D__Marks
New Contributor I
125 Views

I am 100% certain this had nothing to do with other processes (there were none). Very reproducibly, "sync ; echo 3 > /proc/sys/vm/drop_caches" improved the speed by about a factor of 1.5.

N.B., the code already has a number of timers in it.
