Hello,
I'm running some performance benchmarks on Intel MKL right now and noticed something unexpected. When I use the dgemm_() function instead of cblas_dgemm() on the same matrix, I get around 4 times fewer Gflop/s. The result is the same with both the non-parallel and the parallel version.
Has anyone experienced something similar, or could you point me to a possible mistake on my side?
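For reference, here is a minimal sketch of how a Gflop/s figure for a GEMM call is commonly derived, assuming the conventional count of 2*m*n*k floating-point operations for an m x k by k x n product. The helper names gemm_flops and gemm_gflops are hypothetical and not part of MKL; the elapsed time is assumed to be measured around the dgemm_() or cblas_dgemm() call itself.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional operation count for C = alpha*A*B + beta*C,
 * where A is m x k and B is k x n: one multiply and one add
 * per inner-product term, i.e. 2*m*n*k. */
double gemm_flops(size_t m, size_t n, size_t k) {
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Hypothetical helper: convert a measured wall-clock time (seconds)
 * for one GEMM call into Gflop/s. The time must be positive. */
double gemm_gflops(size_t m, size_t n, size_t k, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds / 1e9;
}
```

With this convention, a 1000 x 1000 x 1000 product that takes 0.5 s corresponds to 4 Gflop/s. For the comparison to be fair, both interfaces would need the same dimensions, transpose flags and leading dimensions, and it may help to leave an untimed warm-up call out of the measurement.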
Thanks