Why do some parallel functions scale well with the number of cores while others do not?
I noticed that some parallel functions in MKL scale well with the number of cores. For example, ?gemm can be 2x faster if I run two threads on two cores, and 6x faster with six threads on six cores. But some others, e.g. ?gemv and ?syev, scale less well with the number of cores. Why is that?
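To make "scales well" concrete, here is a minimal sketch of how the speedup and parallel efficiency could be computed from wall-clock timings. The helper names `speedup` and `efficiency` are purely illustrative and are not part of MKL. The only numbers used are the ?gemm figures quoted above.

```python
def speedup(t_one_thread, t_n_threads):
    """Speedup = time with one thread / time with n threads."""
    if t_one_thread <= 0 or t_n_threads <= 0:
        raise ValueError("timings must be positive")
    return t_one_thread / t_n_threads


def efficiency(speedup_value, n_threads):
    """Parallel efficiency = speedup / thread count (1.0 is ideal scaling)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return speedup_value / n_threads


# ?gemm figures from the question: 2x on two threads, 6x on six threads.
for n_threads, observed_speedup in [(2, 2.0), (6, 6.0)]:
    eff = efficiency(observed_speedup, n_threads)
    print(f"?gemm, {n_threads} threads: efficiency = {eff:.2f}")
```

By this measure ?gemm stays at an efficiency of 1.0 on my machine, while ?gemv and ?syev fall below that as the thread count grows.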