Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Drop in performance (BLAS, MKL)




I ran into a problem when running a task implemented with OpenBLAS and MKL. The problem sizes ranged from 16000 to 18000 with a step of 64 (i.e. 16000, 16064, 16128, ..., 18000). The task was run on a cluster with 24 nodes of Haswell architecture (two sockets, cache = 30 MB). Both applications show the same deep drop in performance at size 16384. The cache miss rate also increases significantly at this size, which is why performance decreases. My question is: why does this happen at this particular size? I do not have much programming experience, so any thoughts are welcome.

Sorry for bothering you.

Size  OpenBLAS (speed, MFLOPS)  MKL (speed, MFLOPS)
16256 738278.342719 803630.559752
16320 734915.036548 805445.625905
16384 661585.465594 642552.265062
16448 719808.609165 797099.170117
16512 745339.961848 804849.076513
16576 742787.216771 803981.951285
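
One hypothesis that might be worth checking, and which nothing in the measurements above confirms, is cache set aliasing: 16384 is the only size in the table that is a power of two. The sketch below counts how many distinct cache sets are touched when walking along one row of a column-major matrix whose leading dimension equals the size. It rests on several assumptions that are not stated in the post: double-precision (8-byte) elements, leading dimension equal to n, a line-aligned base address, and a 256 KiB, 8-way L2 cache with 64-byte lines (512 sets). The function name `distinct_sets` is made up for illustration. Optimized BLAS libraries pack blocks into contiguous buffers, so the real effect may sit in the packing step or the TLB rather than in the compute kernel itself.

```python
# Sketch only: counts cache sets hit by a strided walk under assumed geometry.
LINE_BYTES = 64   # cache line size (assumption)
L2_SETS = 512     # 256 KiB / (64 B * 8 ways) (assumed L2 geometry)
ELEM_BYTES = 8    # assumes double-precision elements


def distinct_sets(n, sets=L2_SETS, elem_bytes=ELEM_BYTES, line_bytes=LINE_BYTES):
    """Distinct cache sets touched by elements spaced n * elem_bytes apart.

    Models walking along one row of a column-major n x n matrix with
    leading dimension n, starting at a line-aligned address 0.
    """
    stride_bytes = n * elem_bytes
    # The set index is (address // line size) mod number of sets.
    # Visiting `sets` elements is enough to reach every set the stride can hit
    # when the stride is a whole number of cache lines, as it is here.
    touched = {(k * stride_bytes // line_bytes) % sets for k in range(sets)}
    return len(touched)


if __name__ == "__main__":
    for n in (16256, 16320, 16384, 16448, 16512, 16576):
        print(n, distinct_sets(n))
```

Under these assumptions, size 16384 maps every element of a row to a single set, while the neighbouring sizes in the table spread over 32 or 64 sets. If this is the cause, allocating the matrices with a padded leading dimension (for example 16384 + 8 elements) and rerunning the 16384 case would be a simple way to test it.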