Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Thread Timing Info with MKL L3 BLAS

rostauguardian
Beginner
Hi all,

I was wondering if it is possible to (easily) obtain timing information per thread for the multi-threaded BLAS calls in the MKL library?

I just want to check the efficiency of using multiple threads when calling some Level 3 BLAS routines within my Fortran 90 code.

Any suggestions greatly appreciated.

Thanks,

Tim.
TimP
Honored Contributor III
If you simply want to compare the total CPU time accumulated by all the threads with the elapsed time, you could compare the intervals reported by the Fortran intrinsics cpu_time and system_clock, provided that the interval is long enough for a meaningful comparison.
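To make the comparison concrete, here is a minimal sketch of the same idea in Python rather than Fortran. It assumes that `time.process_time` plays the role of cpu_time (CPU time summed over the process's threads) and `time.perf_counter` the role of system_clock (elapsed time). The helper names `cpu_and_wall` and `average_busy_threads` are made up for illustration and are not part of MKL.

```python
import time

def cpu_and_wall(work):
    """Run work() once and return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time of the whole process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall

def average_busy_threads(cpu_seconds, wall_seconds):
    """Ratio of accumulated CPU time to elapsed time.

    A value close to the number of threads suggests the threads were
    busy for most of the interval; it says nothing about how the work
    was divided between individual threads.
    """
    return cpu_seconds / wall_seconds
```

For example, 8 s of accumulated CPU time over 2 s of elapsed time gives a ratio of 4, which on 4 threads would mean all of them were busy almost the whole time.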
A more usual meaning of efficiency is based on the number of floating-point operations executed, compared with the maximum possible number during the same time interval. You could calculate the number of operations required from the parameters of your problem, or attempt to collect the appropriate hardware counters, e.g. using VTune.
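As a rough sketch of the operation-count approach, the snippet below uses the conventional leading-order count of 2*m*n*k floating-point operations for a matrix multiply (an m-by-k matrix times a k-by-n matrix). The function names are hypothetical, and the peak rate is an assumption you would have to supply for your own machine.

```python
def gemm_flops(m, n, k):
    """Leading-order flop count for C = A*B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k

def flop_efficiency(flops, elapsed_seconds, peak_flops_per_second):
    """Fraction of the assumed peak rate achieved during the interval."""
    return flops / (elapsed_seconds * peak_flops_per_second)

# Illustrative numbers only: a 1000 x 1000 x 1000 multiply that took 0.5 s
# on a machine assumed to peak at 8e9 flop/s.
efficiency = flop_efficiency(gemm_flops(1000, 1000, 1000), 0.5, 8e9)
print(f"efficiency: {efficiency:.2f}")
```

The elapsed time here would come from timing the BLAS call itself; the numbers in the example are invented, not measurements.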
rostauguardian
Beginner
Thanks for the reply. All I really want is to measure how much time each thread spends performing a LAPACK operation, e.g. a matrix inverse (xgetrf+xgetri), for scalability purposes.

Is there any way to query this per-thread time without resorting to profiling tools?
TimP
Honored Contributor III
I'm short of ideas on how that could be done after the parallel regions in the separately compiled MKL code have terminated. You could build the functions yourself from netlib code, adding parallel regions, but it would take a lot of effort to duplicate the threaded vectorized performance of MKL.
The usual way of measuring threaded scalability is simply to repeat the job with a varying number of threads (e.g. by setting OMP_NUM_THREADS), measuring the elapsed time of that section of your job with appropriate KMP_AFFINITY settings.
Intel Thread Profiler is fairly easy to use and should give interesting information, such as the total time of each thread and its idle time due to work imbalance.
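Once you have an elapsed time for each thread count, the scaling figures follow directly. Below is a minimal sketch, assuming each timing was collected by rerunning the same section with a different OMP_NUM_THREADS; the function name `scaling_table` and the timings are made up for illustration.

```python
def scaling_table(elapsed_by_threads):
    """Return (threads, elapsed, speedup, efficiency) rows.

    elapsed_by_threads maps a thread count to the elapsed time in seconds
    and must contain an entry for 1 thread, which serves as the baseline.
    """
    baseline = elapsed_by_threads[1]
    rows = []
    for threads in sorted(elapsed_by_threads):
        elapsed = elapsed_by_threads[threads]
        speedup = baseline / elapsed          # T(1) / T(p)
        efficiency = speedup / threads        # 1.0 means perfect scaling
        rows.append((threads, elapsed, speedup, efficiency))
    return rows

# Invented timings, not measurements.
timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}
for threads, elapsed, speedup, efficiency in scaling_table(timings):
    print(f"{threads:2d} threads: {elapsed:5.2f} s  "
          f"speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```

This does not give per-thread times, but a falling efficiency column shows where adding threads stops paying off.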