Assume that I have a 4-core machine where each core has a 2 MB LLC slice (8 MB in total), and the LLC is inclusive of L2.
1) If I use single-threaded MKL, will the MKL instance use 2 MB of the LLC or the full 8 MB?
2) If I use OpenMP threads to control the parallelism, will MKL determine the available LLC based on the number of threads?
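A back-of-the-envelope sketch of the capacity arithmetic behind these two questions, assuming the setup above (4 cores, a 2 MB slice per core, inclusive LLC) plus a hypothetical 256 KB L2 per core. On Intel parts of this generation, physical addresses are hashed across all slices, so the slices form one shared LLC that any single core can fill. The helper names are illustrative, not MKL functions, and the sketch says nothing about how MKL itself chooses its blocking.

```python
MB = 1024 * 1024
KB = 1024


def total_llc_bytes(num_slices: int, slice_bytes: int) -> int:
    """Slices are address-hashed into one shared LLC, so their capacity adds up."""
    return num_slices * slice_bytes


def unique_cache_bytes(llc_bytes: int, l2_per_core: int, cores: int,
                       inclusive: bool) -> int:
    """Upper bound on distinct data the L2s plus the LLC can hold at once.

    Inclusive LLC: every L2 line is duplicated in the LLC, so the L2s add no
    unique capacity. Non-inclusive LLC: the L2s can add at most their own size.
    """
    return llc_bytes if inclusive else llc_bytes + cores * l2_per_core


def even_share_bytes(llc_bytes: int, active_threads: int) -> int:
    """Per-thread share IF threads with similar working sets split the LLC evenly.

    The hardware does not enforce this split; it is only an average.
    """
    return llc_bytes // active_threads
```

With these numbers a lone thread can, in hardware terms, occupy all 8 MB, while four threads with similar working sets average about 2 MB each.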
Any help is appreciated. Thanks.
Thanks for your kind help.
Assume that on an i7-4770K I have a 4-thread application in which each thread calls the single-threaded sgemm routine.
My question is this: assuming the LLC is inclusive (as it is before Skylake Server), each sgemm generates its own memory traffic and may evict data belonging to other threads from the LLC (and, because the LLC is inclusive, evicting a line from the LLC also invalidates it in that core's L2). If a single-threaded sgemm sizes its working set for the whole LLC, this situation becomes much worse. May I know whether this can happen?
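A toy model of the concern in this follow-up. It assumes, hypothetically, that a blocked sgemm keeps three square float32 tiles (one each from A, B and C) resident and sizes them to fit a chosen cache budget. Optimized BLAS libraries typically use multi-level blocking with packed panels, so this is not how MKL actually picks its blocks; it only illustrates the trend. All function names are made up for the illustration.

```python
import math

MB = 1024 * 1024
FLOAT32_BYTES = 4  # sgemm operates on single-precision floats


def max_square_tile(budget_bytes: int, tiles_resident: int = 3,
                    elem_bytes: int = FLOAT32_BYTES) -> int:
    """Largest tile edge b with tiles_resident * b * b * elem_bytes <= budget_bytes."""
    return math.isqrt(budget_bytes // (tiles_resident * elem_bytes))


def tile_footprint(b: int, tiles_resident: int = 3,
                   elem_bytes: int = FLOAT32_BYTES) -> int:
    """Bytes occupied by tiles_resident square tiles of edge b."""
    return tiles_resident * b * b * elem_bytes


def llc_demand_ratio(threads: int, per_thread_bytes: int, llc_bytes: int) -> float:
    """Combined working set of all threads relative to the shared LLC (>1 = oversubscribed)."""
    return threads * per_thread_bytes / llc_bytes


if __name__ == "__main__":
    llc = 8 * MB  # i7-4770K: 4 slices x 2 MB
    threads = 4
    for label, budget in (("whole LLC", llc), ("per-thread share", llc // threads)):
        b = max_square_tile(budget)
        ratio = llc_demand_ratio(threads, tile_footprint(b), llc)
        print(f"{label}: tile {b}x{b}, demand {ratio:.2f}x LLC")
```

Run as a script, it reports a 836x836 tile for the whole-LLC budget (about 4.00x the LLC when four threads do it at once) versus 418x418 for a quarter-LLC budget (about 1.00x), which is the oversubscription the question is worried about.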
I would recommend using Intel VTune Amplifier XE. It can measure LLC misses, so you can check whether the cache contention gets worse or not.
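One way to turn the profiler output into that comparison, sketched under assumptions: collect LLC-miss and instructions-retired counts for one sgemm thread running alone and for the same thread while the other three run concurrently, normalize both to misses per thousand instructions (MPKI), and compare. Event names differ by tool and microarchitecture, so the functions below are hypothetical helpers; feed them whatever counts VTune reports.

```python
def llc_mpki(llc_misses: int, instructions_retired: int) -> float:
    """LLC misses per thousand retired instructions (MPKI)."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 1000.0 * llc_misses / instructions_retired


def mpki_ratio(mpki_alone: float, mpki_concurrent: float) -> float:
    """How many times more often sgemm misses the LLC when other threads compete for it."""
    if mpki_alone <= 0:
        raise ValueError("mpki_alone must be positive")
    return mpki_concurrent / mpki_alone
```

A ratio well above 1 suggests the concurrent sgemm calls are evicting each other's data; normalizing by instructions keeps the comparison fair even if the two runs do different amounts of work.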