Drop in performance (BLAS, MKL)

Semen_K_ · 07-07-2016

Hello,

I ran into a problem when running my task with OpenBLAS and MKL. The problem sizes ranged from 16000 to 18000 in steps of 64 (i.e. 16000, 16064, 16128, ..., 18000). The runs were done on a cluster with 24 Haswell nodes (two sockets per node, 30 MB cache). My question is: why does performance drop sharply at size 16384? Both libraries show the same drop at this size. I do not have much programming experience, so I would appreciate any thoughts. The miss rate also increases significantly at this size, which is why performance goes down. But why does it happen at this particular size?
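One frequently suggested explanation for a drop at exactly 16384 is cache-set conflicts. 16384 = 2^14 is the only power of two in this sweep, and with a power-of-two leading dimension the elements of one row in consecutive columns sit a power-of-two number of bytes apart, so they all map to the same cache set. The Python sketch below checks this under assumptions that are not stated in the post: double-precision data, column-major storage with leading dimension n, and a Haswell-like L1 data cache (32 KiB, 8-way, 64-byte lines). The helper names are hypothetical, and the model ignores prefetching and the blocking done inside real BLAS kernels, so it is an illustration rather than a diagnosis.

```python
from collections import Counter

# Assumed L1 data-cache geometry (Haswell-like): 32 KiB, 8-way, 64-byte lines.
LINE_BYTES = 64
WAYS = 8
L1_SETS = 32 * 1024 // (WAYS * LINE_BYTES)  # 64 sets

# Assumptions, not stated in the post: double precision (8 bytes per
# element), column-major storage, leading dimension equal to n.
ELEMENT_BYTES = 8


def set_index(byte_offset: int) -> int:
    """L1 set for a byte offset (address bits 6..11, inside one 4 KiB page)."""
    return (byte_offset // LINE_BYTES) % L1_SETS


def worst_set_load(n: int, columns: int = 16) -> int:
    """Largest number of cache lines that map to a single L1 set when
    reading row 0 of the first `columns` columns of an n x n matrix."""
    stride = n * ELEMENT_BYTES  # bytes between consecutive columns
    counts = Counter(set_index(j * stride) for j in range(columns))
    return max(counts.values())


# Problem sizes from the post: 16000, 16064, 16128, ... in steps of 64.
sizes = range(16000, 18001, 64)
print("powers of two in the sweep:", [n for n in sizes if (n & (n - 1)) == 0])

for n in (16320, 16384, 16448):
    load = worst_set_load(n)
    print(n, load, "exceeds associativity" if load > WAYS else "fits")
```

Under these assumptions, walking 16 columns along a row puts at most 2 lines in any one set for 16320 and 16448, but all 16 lines land in a single 8-way set for 16384, which would force conflict misses of the kind the post describes.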

Sorry to bother you,
Thanks.

Size     OpenBLAS speed (MFLOPS)     MKL speed (MFLOPS)

16256    738278.342719               803630.559752
16320    734915.036548               805445.625905
16384    661585.465594               642552.265062
16448    719808.609165               797099.170117
16512    745339.961848               804849.076513
16576    742787.216771               803981.951285
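To put a number on the drop, the short Python sketch below compares the 16384 row with the mean of the other five rows in the table. The data are exactly the posted values; the `results` dictionary and the `relative_drop` helper are illustrative names, not part of either library.

```python
# Measurements copied from the table in the post, in MFLOPS.
# size: (OpenBLAS, MKL)
results = {
    16256: (738278.342719, 803630.559752),
    16320: (734915.036548, 805445.625905),
    16384: (661585.465594, 642552.265062),
    16448: (719808.609165, 797099.170117),
    16512: (745339.961848, 804849.076513),
    16576: (742787.216771, 803981.951285),
}


def relative_drop(column: int, size: int = 16384) -> float:
    """Fraction by which `size` falls below the mean of the other sizes."""
    others = [row[column] for n, row in results.items() if n != size]
    baseline = sum(others) / len(others)
    return 1.0 - results[size][column] / baseline


for name, column in (("OpenBLAS", 0), ("MKL", 1)):
    print(f"{name}: {relative_drop(column):.1%} below the other sizes")
```

With these six rows, the 16384 result is roughly 10% below the others for OpenBLAS and roughly 20% below for MKL.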