Calling MKL cblas_dgemm in recursive function

Allam_F_
Beginner

I am trying to implement a recursive matrix multiplication on Xeon Phi. I have two implementations. The first is my own implementation of Strassen, and it works fine: when I call it with more than one level of recursion, the time decreases as I increase the recursion level. To boost my algorithm, I used the MKL function cblas_dgemm for the submatrix multiplications, calling it from the Strassen algorithm. The problem is that now the time increases when I increase the level of recursion. What is the problem?
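The poster's code is not shown, so the following is only a minimal sketch of the kind of structure described above: a Strassen recursion that stops after a given number of levels and hands the remaining block product to a leaf multiply. The names `leaf_gemm`, `block_add`, and `strassen` are hypothetical, and the triple loop in `leaf_gemm` is a stand-in for the place where cblas_dgemm would be called.

```c
#include <assert.h>
#include <stdlib.h>

/* Leaf multiply C = A * B for n x n row-major blocks with leading dimensions.
 * Hypothetical stand-in: in the setup described in the post, this is where
 * cblas_dgemm would be called instead of the triple loop. */
void leaf_gemm(int n, const double *A, int lda, const double *B, int ldb,
               double *C, int ldc) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = sum;
        }
}

/* Z = X + sign * Y for n x n blocks; Z is stored contiguously (ld = n). */
void block_add(int n, const double *X, int ldx, const double *Y, int ldy,
               double sign, double *Z) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            Z[i * n + j] = X[i * ldx + j] + sign * Y[i * ldy + j];
}

/* C = A * B using up to `levels` levels of Strassen recursion, then leaf_gemm.
 * C must not overlap A or B. Odd or tiny sizes fall back to the leaf multiply. */
void strassen(int n, const double *A, int lda, const double *B, int ldb,
              double *C, int ldc, int levels) {
    if (levels <= 0 || n < 2 || n % 2 != 0) {
        leaf_gemm(n, A, lda, B, ldb, C, ldc);
        return;
    }
    int h = n / 2;
    size_t hh = (size_t)h * (size_t)h;
    const double *A11 = A, *A12 = A + h;
    const double *A21 = A + (size_t)h * lda, *A22 = A21 + h;
    const double *B11 = B, *B12 = B + h;
    const double *B21 = B + (size_t)h * ldb, *B22 = B21 + h;

    /* Scratch: two operand buffers S, T and the seven products M[0..6]. */
    double *buf = malloc(9 * hh * sizeof *buf);
    assert(buf != NULL);
    double *S = buf, *T = buf + hh, *M[7];
    for (int p = 0; p < 7; ++p) M[p] = buf + (2 + p) * hh;

    block_add(h, A11, lda, A22, lda, 1.0, S);   /* M1 = (A11 + A22)(B11 + B22) */
    block_add(h, B11, ldb, B22, ldb, 1.0, T);
    strassen(h, S, h, T, h, M[0], h, levels - 1);
    block_add(h, A21, lda, A22, lda, 1.0, S);   /* M2 = (A21 + A22) B11 */
    strassen(h, S, h, B11, ldb, M[1], h, levels - 1);
    block_add(h, B12, ldb, B22, ldb, -1.0, T);  /* M3 = A11 (B12 - B22) */
    strassen(h, A11, lda, T, h, M[2], h, levels - 1);
    block_add(h, B21, ldb, B11, ldb, -1.0, T);  /* M4 = A22 (B21 - B11) */
    strassen(h, A22, lda, T, h, M[3], h, levels - 1);
    block_add(h, A11, lda, A12, lda, 1.0, S);   /* M5 = (A11 + A12) B22 */
    strassen(h, S, h, B22, ldb, M[4], h, levels - 1);
    block_add(h, A21, lda, A11, lda, -1.0, S);  /* M6 = (A21 - A11)(B11 + B12) */
    block_add(h, B11, ldb, B12, ldb, 1.0, T);
    strassen(h, S, h, T, h, M[5], h, levels - 1);
    block_add(h, A12, lda, A22, lda, -1.0, S);  /* M7 = (A12 - A22)(B21 + B22) */
    block_add(h, B21, ldb, B22, ldb, 1.0, T);
    strassen(h, S, h, T, h, M[6], h, levels - 1);

    /* Combine the seven products into the four quadrants of C. */
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j) {
            size_t x = (size_t)i * h + j;
            double *row_top = C + (size_t)i * ldc;
            double *row_bot = C + (size_t)(i + h) * ldc;
            row_top[j]     = M[0][x] + M[3][x] - M[4][x] + M[6][x]; /* C11 */
            row_top[j + h] = M[2][x] + M[4][x];                     /* C12 */
            row_bot[j]     = M[1][x] + M[3][x];                     /* C21 */
            row_bot[j + h] = M[0][x] - M[1][x] + M[2][x] + M[5][x]; /* C22 */
        }
    free(buf);
}
```

With `levels` set to 0 the call reduces to a single leaf multiply on the full matrix; each extra level replaces one multiply by seven multiplies on blocks half the size, plus the additions and scratch allocation shown here.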

Frances_R_Intel
Employee

Without seeing your code, whatever I say will, at best, be an educated guess. 

As I recall, Strassen will multithread but not vectorize. The MKL routines try to both vectorize and multithread where possible. Perhaps, as you increase the number of Strassen levels and then call dgemm for each submatrix product, you are creating too many threads. But, as I say, that is just a guess.
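To put rough numbers on the guess above, here is a small sketch of the arithmetic. Each Strassen level replaces one multiply with seven, so the number of leaf dgemm calls grows as 7^levels while each leaf matrix halves in size. The function names are hypothetical, and the worst case assumes that every leaf call runs concurrently and that each one starts its own team of threads, which may not match the actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Number of leaf multiplications after `levels` Strassen levels: 7^levels. */
long long leaf_count(int levels) {
    assert(levels >= 0);
    long long count = 1;
    for (int i = 0; i < levels; ++i)
        count *= 7;
    return count;
}

/* Leaf matrix dimension n / 2^levels (assumes n is divisible by 2^levels). */
long long leaf_size(long long n, int levels) {
    assert(levels >= 0);
    return n >> levels;
}

/* Worst-case software thread count, ASSUMING all leaves run at once and
 * each leaf call starts `threads_per_call` threads of its own. */
long long worst_case_threads(int levels, long long threads_per_call) {
    return leaf_count(levels) * threads_per_call;
}

/* Print one row per recursion level for a quick look at the growth. */
void print_estimates(long long n, int max_levels, long long threads_per_call) {
    printf("%6s %12s %10s %18s\n", "levels", "leaf calls", "leaf n", "worst-case threads");
    for (int l = 0; l <= max_levels; ++l)
        printf("%6d %12lld %10lld %18lld\n", l, leaf_count(l),
               leaf_size(n, l), worst_case_threads(l, threads_per_call));
}
```

For example, `print_estimates(4096, 3, 240)` uses an illustrative matrix size and thread count, neither taken from the post. If oversubscription is the cause, one experiment is to limit MKL to one thread per call (for instance by setting the MKL_NUM_THREADS environment variable to 1, or by calling mkl_set_num_threads(1)) and let the recursion supply the parallelism. Smaller leaf matrices may also give dgemm less work per call to amortize its overhead, so the timing could degrade for that reason too.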

You might want to try Intel(r) VTune(r) Amplifier XE (http://software.intel.com/en-us/ARTICLES/OPTIMIZATION-AND-PERFORMANCE-TUNING-FOR-INTEL-XEON-PHI-COPROCESSORS-PART-2-UNDERSTANDING might give you some idea of what to look for), or post some sample code for us to try out.
