I have a Windows application, built with Intel C 18.0.2, that calls MKL extensively. I want to profile it with VTune.
If I build it with the /Zi option (debug information, which is needed for profiling), it seems to get 5 times slower when run standalone from the command line. VTune tells me that much of the time is spent in dgemm. Could /Zi add a fixed overhead per dgemm call?
How come? Any suggestions?
This is an unexpected case. We expect to see the same MKL performance with and without debug information. You could try measuring the performance of the MKL function (dgemm in your case) directly:
double t1 = dsecnd();
for (i = 0; i < m; ++i) dgemm();
double t2 = dsecnd();
and then draw a conclusion from the timings of the two builds.
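A fuller version of that measurement might look like the sketch below. It is only an illustration, with some assumptions: so that it compiles without MKL, it uses a naive triple-loop multiply (`naive_dgemm`, a hypothetical stand-in) and the C11 `timespec_get` timer in place of `dgemm` and `dsecnd`. In the real test you would swap in the MKL calls at the marked lines. The helper names `now_seconds` and `time_dgemm` are also made up for this example. One untimed warm-up call is made first, since the first call into a library can include one-time initialization cost.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for MKL's dgemm: C = A * B for n x n row-major
   matrices. Replace with the real dgemm/cblas_dgemm call when testing MKL. */
void naive_dgemm(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Wall-clock time in seconds (C11). With MKL, dsecnd() plays this role. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per call over `reps` timed calls on n x n matrices.
   Returns -1.0 if memory allocation fails. */
double time_dgemm(int n, int reps)
{
    assert(n > 0 && reps > 0);
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
        c[i] = 0.0;
    }

    naive_dgemm(n, a, b, c);            /* warm-up call, not timed */

    double t1 = now_seconds();
    for (int i = 0; i < reps; ++i)
        naive_dgemm(n, a, b, c);        /* <- the call under test */
    double t2 = now_seconds();

    volatile double sink = c[0];        /* keep the result observable */
    (void)sink;
    free(a); free(b); free(c);
    return (t2 - t1) / reps;
}
```

Calling something like `time_dgemm(512, 20)` in both the /Zi and non-/Zi builds, and then repeating with a much smaller size, would help answer the original question. If the gap between the builds stays roughly constant in absolute terms regardless of matrix size, that points to a fixed per-call overhead. If it scales with the size, the slowdown is inside the computation itself.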