Calling MKL cblas_dgemm in a recursive function

Allam_F_ · 10-20-2013

I am trying to implement a recursive matrix multiplication on Xeon Phi. I have two implementations. The first is my own implementation of Strassen, and it works fine: when I call it with more than one level of recursion, the run time decreases as I increase the recursion level. To speed up my algorithm, I then used the MKL function cblas_dgemm for the submatrix multiplications, calling it from the Strassen algorithm. The problem is that the run time now increases when I increase the level of recursion. What is the problem?

Frances_R_Intel · 10-21-2013

Without seeing your code, whatever I say will, at best, be an educated guess.

As I recall, Strassen will multithread but not vectorize, whereas the MKL routines try to both vectorize and multithread where possible. Perhaps, as you increase the number of Strassen levels and then call dgemm at each one, you are creating too many threads. But, as I say, that is just a guess.
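To make that guess concrete, here is a minimal sketch of the kind of structure being discussed. It is not the poster's code. The names strassen, base_gemm, setup_threads, cutoff and the USE_MKL macro are all hypothetical, and the matrices are assumed to be square, row-major and contiguous. When built with USE_MKL defined (and linked against MKL), the base case calls cblas_dgemm; otherwise it falls back to a plain triple loop so the sketch can be compiled anywhere. The recursion here is serial, so MKL is free to use its own threads; if the recursion itself were run in parallel, limiting MKL to one thread per call with mkl_set_num_threads(1) would be one way to avoid oversubscribing the cores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef USE_MKL
#include <mkl.h>
#endif

/* Hypothetical helper: call once before strassen(). If the recursion is
 * run in parallel by the caller, keep each dgemm call single-threaded. */
void setup_threads(int recursion_is_parallel)
{
#ifdef USE_MKL
    if (recursion_is_parallel)
        mkl_set_num_threads(1);
#else
    (void)recursion_is_parallel;
#endif
}

/* Base case: C = A * B for n x n row-major matrices. */
void base_gemm(int n, const double *A, const double *B, double *C)
{
#ifdef USE_MKL
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
#else
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = s;
        }
#endif
}

/* Copy quadrant (qi, qj) of the n x n matrix M into the h x h matrix Q. */
void get_quad(int n, const double *M, int qi, int qj, double *Q)
{
    int h = n / 2;
    for (int i = 0; i < h; i++)
        memcpy(Q + (size_t)i * h,
               M + (size_t)(qi * h + i) * n + (size_t)qj * h,
               (size_t)h * sizeof *Q);
}

/* Z = X + s * Y, element-wise over len entries. */
void add(size_t len, const double *X, const double *Y, double s, double *Z)
{
    for (size_t i = 0; i < len; i++)
        Z[i] = X[i] + s * Y[i];
}

/* Strassen recursion; switches to base_gemm when n <= cutoff or n is odd. */
void strassen(int n, const double *A, const double *B, double *C, int cutoff)
{
    if (n <= cutoff || n % 2 != 0) { base_gemm(n, A, B, C); return; }
    int h = n / 2;
    size_t sz = (size_t)h * h;
    double *buf = malloc(17 * sz * sizeof *buf);
    assert(buf != NULL);
    double *a[4], *b[4], *m[7], *t1 = buf + 15 * sz, *t2 = buf + 16 * sz;
    for (int q = 0; q < 4; q++) {          /* index 0..3 = 11, 12, 21, 22 */
        a[q] = buf + q * sz;
        b[q] = buf + (4 + q) * sz;
        get_quad(n, A, q / 2, q % 2, a[q]);
        get_quad(n, B, q / 2, q % 2, b[q]);
    }
    for (int i = 0; i < 7; i++) m[i] = buf + (8 + i) * sz;

    /* The seven Strassen products M1..M7 are stored in m[0]..m[6]. */
    add(sz, a[0], a[3], 1, t1);  add(sz, b[0], b[3], 1, t2);
    strassen(h, t1, t2, m[0], cutoff);
    add(sz, a[2], a[3], 1, t1);  strassen(h, t1, b[0], m[1], cutoff);
    add(sz, b[1], b[3], -1, t2); strassen(h, a[0], t2, m[2], cutoff);
    add(sz, b[2], b[0], -1, t2); strassen(h, a[3], t2, m[3], cutoff);
    add(sz, a[0], a[1], 1, t1);  strassen(h, t1, b[3], m[4], cutoff);
    add(sz, a[2], a[0], -1, t1); add(sz, b[0], b[1], 1, t2);
    strassen(h, t1, t2, m[5], cutoff);
    add(sz, a[1], a[3], -1, t1); add(sz, b[2], b[3], 1, t2);
    strassen(h, t1, t2, m[6], cutoff);

    /* Combine the products into the four quadrants of C. */
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++) {
            size_t k = (size_t)i * h + j;
            C[(size_t)i * n + j]           = m[0][k] + m[3][k] - m[4][k] + m[6][k];
            C[(size_t)i * n + j + h]       = m[2][k] + m[4][k];
            C[(size_t)(i + h) * n + j]     = m[1][k] + m[3][k];
            C[(size_t)(i + h) * n + j + h] = m[0][k] - m[1][k] + m[2][k] + m[5][k];
        }
    free(buf);
}
```

In a sketch like this, the cutoff argument sets how many levels of recursion run before dgemm takes over. Timing a few cutoff values, with and without setup_threads(1), is one way to check whether thread count is really what hurts at deeper recursion levels.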

You might want to try Intel(R) VTune(TM) Amplifier XE (the article at http://software.intel.com/en-us/ARTICLES/OPTIMIZATION-AND-PERFORMANCE-TUNING-FOR-INTEL-XEON-PHI-COPROCESSORS-PART-2-UNDERSTANDING might give you some idea of what to look for), or post some sample code for us to try out.