mkl_dbsrmm has a parallel implementation. I have set both
MKL_NUM_THREADS to 8 and I checked that number with
[fortran] nthr = mkl_domain_get_max_threads( MKL_BLAS )[/fortran]
I do see the User Guide says it should be threaded, so we can check whether that is accurate. The scaling will depend on the sparsity pattern, so it's possible your case is not optimal. Is performance pegged at 100% or just always below it?